SaySo is rebuilding voice input for people who type for a living
SaySo is a voice-input layer for writers and builders. The pitch — stop typing into the prompt box. Just talk to your editor, your IDE, your AI agent.
The keyboard is the bottleneck.
Talk to any working programmer, prompt engineer, or AI-tool power user about their day and they will eventually describe the same scene: they are pacing the kitchen at 2 a.m. composing a paragraph in their head that they then have to translate, one finger-jab at a time, into a chat window. The thought is already complete. The typing is the friction.
SaySo is a small Shanghai-based team trying to remove that friction. The product is a voice-input layer that sits on top of your editor, your IDE, your AI chat, your terminal — anywhere you'd otherwise type a prompt. You hold a hotkey, you talk for a minute, and your words show up in the input box already cleaned of "um" and "uh" and false starts.
Voice dictation is not new — voice for builders is
The category is older than most of the people working in it. Dragon NaturallySpeaking shipped in 1997. Apple's been bundling system-wide dictation since 2012. What's changed isn't the recognition quality — that's been good enough for a decade — but who's typing all day.
The first wave of voice users were lawyers, doctors, journalists. People dictating long-form prose into a single document. The transcription engine had to understand domain vocabulary, sure, but it didn't really have to understand what you were trying to do with the text. You spoke, the words appeared.
Builders work differently. A working programmer in 2026 might switch between Cursor, ChatGPT, a terminal, Slack, and a Notion doc thirty times in an hour. Each context wants different formatting. A prompt to an LLM is a paragraph. A Slack message is one line. A commit message is imperative mood. The friction isn't the words — it's all the micro-decisions about where the words go and what shape they take when they get there.
SaySo's bet is that this is what the category needed: not better speech-to-text, but better speech-to-context.
What it actually does
The product itself is unfussy. A small status-bar app. A push-to-talk hotkey. A short delay after you let go.
What's interesting is what happens in the delay. The team described, in a recent thread on X, three things they do between the raw transcript and the text that lands in your input field:
- Stripping disfluency. Filler words, restarts, second-thoughts. The kind of cleanup that takes a competent editor about thirty seconds and a 7B-param model about three.
- Context shaping. If you're talking into a terminal, the model knows to render commands without quotation marks around them. If you're talking into Cursor's chat, it knows to preserve code snippets verbatim. If you're talking into a Slack DM, it shortens.
- Memory. Personal vocabulary — the names of your colleagues, your repo, your stack — stays consistent across sessions. The first time you say "Mira" it has to guess; by the third time it doesn't.
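The three steps above can be imagined as a single post-transcript pass. Here is a minimal sketch of that shape; to be clear, the filler list, the context labels, and every function name below are my own illustration of the idea, not SaySo's code:

```python
import re

def strip_disfluency(transcript: str) -> str:
    """Drop filler words and collapse immediate restarts ("the the")."""
    cleaned = []
    for word in transcript.split():
        if word.lower().strip(",.") in {"um", "uh", "er"}:
            continue  # filler word
        if cleaned and word.lower() == cleaned[-1].lower():
            continue  # naive restart collapse: "the the" -> "the"
        cleaned.append(word)
    return " ".join(cleaned)

def shape_for_context(text: str, context: str) -> str:
    """Apply per-target formatting rules (context labels are hypothetical)."""
    if context == "terminal":
        return text.strip().rstrip(".")  # commands: no trailing period
    if context == "slack":
        return text.split(". ")[0]       # DMs: keep it to one line
    return text                          # LLM prompts: leave the paragraph alone

def apply_vocabulary(text: str, vocab: dict[str, str]) -> str:
    """Replace known mishearings with the user's personal terms."""
    for heard, meant in vocab.items():
        text = re.sub(rf"\b{re.escape(heard)}\b", meant, text)
    return text
```

Chaining the three over a raw transcript, e.g. `apply_vocabulary(shape_for_context(strip_disfluency(raw), "slack"), vocab)`, is the whole trick: the speech-to-text part is commodity, the pass after it is the product.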
The third one is, in my testing, the thing that makes the difference between a voice product you try once and a voice product you keep installed. Personal-vocabulary memory is the difference between "this transcribed react as racked again" and "huh, I just talked for ninety seconds without correcting anything."
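One way to read "by the third time it doesn't" is a confirmation threshold: remember each correction the user makes, and promote a mapping once it has been confirmed enough times. A toy sketch; the promotion rule is entirely my assumption about the mechanism:

```python
from collections import Counter

class VocabMemory:
    """Hypothetical sketch of cross-session personal-vocabulary memory."""

    def __init__(self, promote_after: int = 2):
        self.corrections = Counter()  # (heard, meant) -> times the user confirmed it
        self.promote_after = promote_after
        self.vocab = {}               # heard -> meant, once promoted

    def record_correction(self, heard: str, meant: str) -> None:
        """Called whenever the user manually fixes a transcription."""
        self.corrections[(heard, meant)] += 1
        if self.corrections[(heard, meant)] >= self.promote_after:
            self.vocab[heard] = meant

    def rewrite(self, text: str) -> str:
        """Apply every promoted mapping to a fresh transcript."""
        for heard, meant in self.vocab.items():
            text = text.replace(heard, meant)
        return text
```

The first time "Mira" comes through as "mira", the system still guesses; after the second confirmed correction the mapping sticks, and subsequent transcripts are rewritten without the user touching anything.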
Who this is for
There are at least three audiences SaySo seems aimed at, and the team is candid about not yet knowing which one will pay first:
- Power prompt users — people who treat ChatGPT/Claude/Cursor as their primary work surface and find the keyboard-bound prompt UI a cognitive bottleneck. The pitch here is throughput: you can outline a 600-word brief in the time it takes to type two sentences.
- Solo founders doing customer support — people who spend their evenings answering DMs and email and find that talking the reply, then editing, is faster than typing the reply, then editing. Particularly compelling when you're tired.
- Non-native English builders — a non-obvious audience, but a real one. SaySo's roots are in Shanghai, and the team is open about building first for the bilingual case: someone whose head moves faster in Mandarin than their fingers move in English. Voice flattens the keyboard-vocabulary gap.
That last audience is, frankly, the most interesting one. There's a huge cohort of working programmers and founders worldwide whose written English is a tax — slower than their thought, slower than their native language, slower than their work needs them to be. A voice layer that handles the bilingual case well could be a quiet wedge into a category nobody else is building for.
What gives me pause
SaySo isn't fighting on a quiet hill. Apple, Google, and the OS-bundled voice dictation in every browser are right there, free, and built into the platform. The argument SaySo has to make is that "built into the platform" and "actually good for builders" are two different things — and historically they have been, but the OS vendors keep narrowing that gap.
There's also a quieter risk: voice products are sticky only if the first three minutes work. The first miscorrection you have to fix by hand breaks the trust. Voice has been around long enough that most builders have a story about giving up on Dragon in 2009 or Siri in 2013, and rebuilding that trust is a real ask.
The team seems aware of both. Their public roadmap leans hard on the parts of the experience the OS vendors don't bother with — the contextual shaping, the personal memory, the bilingual handling. That's the right place to compete.
Why I'm watching it
The honest answer is that I've spent the last six months watching the AI-tool category reach the point where the keyboard isn't just a bottleneck; it's the entire bottleneck. Every product I use most days is a text box, and every text box wants more text from me than I can comfortably type. If voice doesn't get fixed soon, the AI-tool category hits a hard ceiling nobody talks about: the point where the model is faster than the person.
SaySo isn't a sure thing. But it's pointed at the right problem, the team is shipping with the kind of clarity small teams have when they haven't been beaten up by a board yet, and the bilingual angle is a real moat in a category most US-based competitors will never bother building for.
Worth installing for a week. The keyboard isn't going to fix itself.