Typing long prompts mid-flow adds friction. Whether you are deep in a debugging session, iterating on a complex refactor, or working in a language where keyboard input is cumbersome, stopping to compose a multi-sentence instruction breaks concentration. Kolbo Code's push-to-talk voice input removes that step with a single key chord.

How It Works

Hold Ctrl+Y and speak. Release to submit.

While the key is held, Kolbo Code opens a live audio stream to ElevenLabs Scribe v2, a real-time speech transcription engine. Audio is streamed continuously rather than recorded and uploaded as a batch, so the transcript is built with low latency while you are still speaking. The moment you release Ctrl+Y, the final transcript is injected into the active prompt field and submitted automatically.

There is no intermediate "review and confirm" step. The workflow is:

1. Hold Ctrl+Y.
2. Say your prompt.
3. Release. The prompt is submitted.

This matches the mental model of a walkie-talkie: press-to-talk, release-to-send.
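The hold, speak, release cycle can be modeled as a small state machine. The sketch below is illustrative only and is not Kolbo Code's actual implementation: the class name `PushToTalkSession`, its methods, and the `submit` callback are all hypothetical, and it assumes the transcription engine delivers text in segments while the key is held. Ignoring repeated key-down events (terminals typically auto-repeat a held key) and skipping empty transcripts are design assumptions of this sketch.

```python
from typing import Callable, List, Optional


class PushToTalkSession:
    """Hypothetical model of push-to-talk: key-down opens, key-up submits."""

    def __init__(self, submit: Callable[[str], None]) -> None:
        self._submit = submit
        # None means the key is up and no audio stream is open.
        self._segments: Optional[List[str]] = None

    @property
    def recording(self) -> bool:
        return self._segments is not None

    def key_down(self) -> None:
        # Open the stream on the first key-down; ignore auto-repeat events.
        if not self.recording:
            self._segments = []

    def on_transcript(self, text: str) -> None:
        # Keep transcript segments only while the key is held.
        if self.recording and text.strip():
            self._segments.append(text.strip())

    def key_up(self) -> None:
        # Release closes the stream and submits with no confirmation step.
        if not self.recording:
            return
        prompt = " ".join(self._segments)
        self._segments = None
        if prompt:  # assumption: an empty transcript is not submitted
            self._submit(prompt)


sent: List[str] = []
session = PushToTalkSession(submit=sent.append)
session.key_down()
session.on_transcript("Add a null check")
session.on_transcript("before the getUser call")
session.key_up()
print(sent)  # ['Add a null check before the getUser call']
```

Because submission is tied to key-up rather than to silence detection, the speaker decides exactly where the prompt ends, which is the walkie-talkie behavior described above.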

Setup

If you have already authenticated Kolbo Code with your account, voice input requires no additional configuration. The ElevenLabs Scribe v2 integration is bundled into the Kolbo Code runtime: there is no separate API key to provision, and the only permission involved is your operating system's microphone access grant.

On first use, the TUI displays a brief status indicator confirming that the microphone stream is active. If your system has multiple audio input devices, Kolbo Code uses the OS default input. To switch devices, change your system's default recording device before launching Kolbo Code.

Best Use Cases

Hands-Free Debugging

When you are reading a stack trace or stepping through logic mentally, your hands are often occupied scrolling, hovering, or holding a reference. Dictating the next instruction - "Add a null check before the getUser call and log the result to the debug channel" - lets you stay in the visual context without context-switching to the keyboard.

Long, Detailed Prompts

Speaking is faster than typing for most people, and Scribe v2 handles technical vocabulary well. Prompts that would take 30 seconds to type can be spoken in 8 to 10 seconds. The difference is most noticeable for instructions with multiple steps or constraints:

"Refactor the fetchOrders function to use async/await instead of Promise chains, add error handling that surfaces the HTTP status code, and update the JSDoc to reflect the new signature."

That single sentence becomes a fully formed, accurately transcribed prompt on release.
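As a rough sanity check on those timings, the arithmetic can be worked through for the example prompt. The rates below (60 words per minute typing, 180 words per minute speaking) are assumptions picked for illustration, not measurements from Kolbo Code or ElevenLabs, and `seconds_to_enter` is a hypothetical helper.

```python
def seconds_to_enter(text: str, words_per_minute: float) -> float:
    """Estimate how long a prompt takes to enter at a given word rate."""
    words = len(text.split())
    return words * 60.0 / words_per_minute


PROMPT = (
    "Refactor the fetchOrders function to use async/await instead of "
    "Promise chains, add error handling that surfaces the HTTP status code, "
    "and update the JSDoc to reflect the new signature."
)

TYPING_WPM = 60     # assumed typing speed, illustrative only
SPEAKING_WPM = 180  # assumed speaking speed, illustrative only

typing_s = seconds_to_enter(PROMPT, TYPING_WPM)
speaking_s = seconds_to_enter(PROMPT, SPEAKING_WPM)

print(f"{len(PROMPT.split())} words")
print(f"typing:   {typing_s:.1f} s")
print(f"speaking: {speaking_s:.1f} s")
```

Under these assumed rates the 29-word prompt takes about 29 seconds to type and just under 10 seconds to say. Your own numbers will differ with typing speed and with how much the prompt leans on identifiers and punctuation.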

Accessibility

For developers with repetitive strain injuries, motor impairments, or any condition that makes sustained keyboard use difficult, voice input converts Kolbo Code into a hands-free coding assistant. The push-to-talk model - rather than always-on voice detection - gives precise control over when audio is captured, which matters in open office environments and avoids unintended submissions.

Multilingual Prompting

Kolbo Code ships with a TUI localized into 12 languages. Voice input works across all of them. You can prompt in the same language as the interface, or mix languages freely - Scribe v2 performs language detection per utterance. This is particularly useful when the most precise way to describe a domain concept is in your native language even if the codebase comments are in English.

Hebrew and Arabic Voice Prompting

Scribe v2 has strong support for both Hebrew and Arabic, including full RTL text handling. When Kolbo Code is running in Hebrew (he) or Arabic (ar) locale mode, transcribed text is rendered right-to-left and submitted in the correct direction. There is no special configuration required - the locale setting that controls the TUI language is the same one that signals RTL rendering to the prompt field.
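The idea that one locale setting drives both the TUI language and the prompt direction can be sketched as a simple lookup. This is a minimal illustration, not Kolbo Code's configuration code: `RTL_LOCALES` and `prompt_direction` are hypothetical names, and only the `he` and `ar` locale codes come from the text above.

```python
# Hypothetical lookup; the article names only `he` and `ar` as RTL locales.
RTL_LOCALES = {"he", "ar"}


def prompt_direction(locale: str) -> str:
    """Return "rtl" or "ltr" for a locale code such as "he", "ar-EG", "en_US"."""
    # Keep only the language part, so "ar-EG" and "ar_EG" both map to "ar".
    language = locale.replace("_", "-").split("-")[0].lower()
    return "rtl" if language in RTL_LOCALES else "ltr"


print(prompt_direction("he"))     # rtl
print(prompt_direction("ar-EG"))  # rtl
print(prompt_direction("en_US"))  # ltr
```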

A practical note for Hebrew and Arabic speakers: technical terms, identifiers, and English method names embedded in an otherwise Hebrew or Arabic sentence are transcribed correctly. You do not need to pause or switch register when saying something like:

"תוסיף error handling ל-fetchUser שיחזיר null אם הסטטוס הוא 404"

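In English, that prompt reads roughly: "Add error handling to fetchUser that returns null if the status is 404". Scribe v2 handles code-switched speech of this kind without requiring any preprocessing or tagging on your end.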
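To see why mixed prompts need direction-aware rendering, it helps to look at how such a sentence breaks into right-to-left and left-to-right runs. The sketch below groups characters by their Unicode bidirectional class using Python's standard `unicodedata` module. It is a deliberate simplification, not the full Unicode Bidirectional Algorithm and not how Kolbo Code or Scribe v2 process text; `directional_runs` is a hypothetical helper.

```python
import unicodedata


def directional_runs(text: str) -> list:
    """Split text into (direction, substring) runs by strong bidi class."""
    runs = []
    run_dir = None  # direction of the run being built, unknown at first
    buf = ""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):  # Hebrew and Arabic letters
            ch_dir = "rtl"
        elif bidi == "L":        # Latin and other left-to-right letters
            ch_dir = "ltr"
        else:                    # spaces, digits, punctuation: neutral here
            ch_dir = None
        # A strong character of the other direction starts a new run.
        if ch_dir is not None and run_dir is not None and ch_dir != run_dir:
            runs.append((run_dir, buf))
            buf = ""
        if ch_dir is not None:
            run_dir = ch_dir
        buf += ch  # neutral characters stay attached to the current run
    if buf:
        runs.append((run_dir or "ltr", buf))
    return runs


sample = "תוסיף error handling ל-fetchUser"
for direction, chunk in directional_runs(sample):
    print(direction, repr(chunk))
```

The identifier `fetchUser` ends up in its own left-to-right run, so it can be rendered intact inside the surrounding Hebrew text.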

Tips for Best Transcription Quality

Speak at a natural pace. Scribe v2 is trained on conversational speech. Deliberately slow dictation can sometimes produce less accurate results than normal speaking speed.

Name identifiers as you would read them aloud. For camelCase or snake_case identifiers, saying "fetch user data" produces fetchUserData in most contexts where the surrounding prompt makes the intent clear. For ambiguous cases, spelling out the casing explicitly - "fetch, capital U, user, capital D, data" - is a reliable fallback.
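For reference, the mapping from spoken words to the two identifier styles mentioned above looks like this. The helpers are hypothetical and only illustrate the target forms. The article does not describe how the transcript arrives at the right casing, and this is not a description of Scribe v2's behavior.

```python
def to_camel_case(spoken: str) -> str:
    """Join spoken words as camelCase: "fetch user data" -> "fetchUserData"."""
    words = spoken.lower().split()
    if not words:
        return ""
    return words[0] + "".join(word.capitalize() for word in words[1:])


def to_snake_case(spoken: str) -> str:
    """Join spoken words as snake_case: "fetch user data" -> "fetch_user_data"."""
    return "_".join(spoken.lower().split())


print(to_camel_case("fetch user data"))  # fetchUserData
print(to_snake_case("fetch user data"))  # fetch_user_data
```

The same three spoken words map to different identifiers depending on the convention, which is why surrounding context, or explicitly spelled-out casing, matters when the intent is ambiguous.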

Keep background noise low at the start of a session. Scribe v2 adapts to your microphone characteristics over the course of a session. The first few utterances in a noisy environment may show slightly lower accuracy than subsequent ones.

Use complete sentences. Fragment prompts like "null check, fetchUser, 404" are parsed correctly most of the time, but full imperative sentences give the transcription engine more context to resolve ambiguous words and produce cleaner output.

Press Ctrl+Y just before you start speaking. The stream opens immediately on key-down, so there is no need to hold the key well in advance. A second or two of silence at the start of an utterance is handled gracefully, but pressing the key after you have already begun a word can occasionally truncate its first syllable.

Combining Voice with the Code Editing Workflow

Voice input is additive - it does not replace keyboard entry. A common pattern is to use voice for high-level intent and keyboard for precise edits: dictate "Extract this block into a helper function called formatCurrency", review the generated code, then use keyboard shortcuts to apply or refine the diff. The two input modes compose naturally because they operate at different levels of abstraction.

Voice input in Kolbo Code is available now for all authenticated users. Open Kolbo Code, hold Ctrl+Y, and say your next prompt. If you have feedback on transcription quality, language support, or workflow integration, reach out via the in-app feedback channel or the Kolbo community Discord - voice input accuracy improvements are an active development priority.

Kolbo Code Voice Input - Hands-Free AI Coding with Push-to-Talk
