We submitted Lisper to the Kaggle Gemma 4 Good Hackathon: a lisp-focused AI speech practice app built around low-pressure feedback, repeatable training flows, and a fine-tuned Gemma 4 audio model.
Read the Kaggle submission | Try the demo | Source code
Why Lisps
Speech therapy apps usually treat lisps as one small part of a much wider speech category. We wanted to build the opposite: a tool centered specifically on /s/ and /z/ practice, where the product language, examples, scoring, and feedback all assume the user is working on lisp patterns.
That focus matters because the hard part is usually repetition, not explanation. A narrow practice loop can keep the interaction short, predictable, and low-pressure while still giving feedback that is specific to the sounds being trained.
The goal was not to replace a speech-language pathologist. It was to create a free practice loop that makes daily repetition easier: record a phrase, get clear feedback, try again, and gradually move from isolated sounds into words, sentences, and natural speech.
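The progression in that loop can be sketched as a small state tracker. This is only an illustration, not Lisper's actual code: the stage names come from the paragraph above, while `PracticeSession`, `record_attempt`, and the three-in-a-row advancement rule are hypothetical choices made for the example.

```python
from dataclasses import dataclass

# Stage names follow the progression described in the post.
STAGES = ["isolated sounds", "words", "sentences", "natural speech"]

# Assumption: how many clear attempts in a row unlock the next stage.
STREAK_TO_ADVANCE = 3


@dataclass
class PracticeSession:
    """Hypothetical tracker for one user's position in the practice loop."""

    stage_index: int = 0
    streak: int = 0

    @property
    def stage(self) -> str:
        return STAGES[self.stage_index]

    def record_attempt(self, clear: bool) -> str:
        """Update progress after one recorded attempt; return the current stage."""
        if clear:
            self.streak += 1
        else:
            # A miss resets the streak but never moves the user backwards,
            # which keeps the loop low-pressure.
            self.streak = 0

        on_last_stage = self.stage_index == len(STAGES) - 1
        if self.streak >= STREAK_TO_ADVANCE and not on_last_stage:
            self.stage_index += 1
            self.streak = 0
        return self.stage
```

Resetting the streak instead of demoting the user is one way to keep retries cheap; a real app might tune the threshold per stage or per sound.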
What We Built
Lisper has three public surfaces:
- A Hugging Face Spaces demo for sampling the fine-tuned model without installing anything.
- A browser demo model packaged as ONNX/WebGPU q4f16 for local, keyless inference.
- A training and evaluation pipeline for the Gemma 4 E2B audio LoRA, including the merged model and release artifacts.
The app combines a speech-practice interface with a model trained to give concise, encouraging feedback about likely lisp patterns. The release includes the LoRA adapter, merged model, and browser-ready model so the work can be inspected, reused, and run in different environments.
Model Artifacts
The main release links are:
For the hackathon, the key distinction is between two paths: the browser package is optimized for the demo, while the model gate used a stronger held-out evaluation. The final v18 hybrid eval passed on 2,000 held-out rows with 0 hard errors, and the browser target is the q4f16 ONNX/WebGPU package.
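A zero-hard-error gate of this kind can be expressed in a few lines. The sketch below is an assumption about the shape of such a check, not the project's evaluation code: the post does not define "hard error" or the eval row format, so `EvalRow`, its label fields, and `is_hard_error` are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class EvalRow:
    """Hypothetical held-out row; the real eval schema is not given in the post."""

    row_id: str
    expected_label: str   # reference label for the clip (assumed field)
    predicted_label: str  # label parsed from the model's feedback (assumed field)


def is_hard_error(row: EvalRow) -> bool:
    # Assumption: a hard error is a prediction that contradicts the reference.
    return row.expected_label != row.predicted_label


def passes_gate(rows: Iterable[EvalRow], max_hard_errors: int = 0) -> bool:
    """Return True when the held-out set has at most `max_hard_errors` hard errors."""
    rows = list(rows)
    if not rows:
        # An empty eval set proves nothing, so it must not pass the gate.
        return False
    hard_errors = sum(is_hard_error(row) for row in rows)
    return hard_errors <= max_hard_errors
```

With the default `max_hard_errors=0`, a single contradicting row fails the whole run, which matches the "0 hard errors" bar described above.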
What I Like About This Project
The best part of Lisper is that it is small and practical. It does not ask for API keys. It does not depend on a closed hosted model. It is focused on one speech need and one practice loop.
That focus made the build more honest. The product is not “AI therapy” in the broad, hand-wavy sense. It is a practice assistant for a specific articulation challenge, with public code, public model artifacts, and a demo anyone can try.
The submission is live now on Kaggle.
