May 2026 Update: Four New Models + A Smarter Studio

Grok Imagine Quality for cinematic, photoreal imagery. Gemini Omni Video that pins specific character voices into the result. 30 new Gemini TTS voices with 40+ accents and a Voice Direction picker. Plus a redesigned Generation Progress system, Erase mode in the Inpaint tool, and Visual DNA reference video + voice samples.

By Kolbo.AI Team
A big batch shipped this week. Cinematic imagery, character-locked video voices, expressive speech with real direction, cleaner erase, and a smarter in-progress UI. Here's everything that's new.

🎨 Grok Imagine Quality - Cinematic, Photoreal Imagery

xAI's higher-quality tier is now live in both Text-to-Image and Image Edit. Expect atmospheric lighting, realistic rain and neon reflections, and nuanced expressions. It supports Visual DNA and up to 3 reference images, with selectable 1K or 2K output.

This is the model to reach for when you need photorealism with real depth - film stills, product shots that actually look shot, characters with believable skin and hair detail.

Try Grok Imagine Quality →

🗣️ Gemini Voices - 30 New TTS Voices, Accents & Direction

30 expressive Gemini voices, each with 40+ language and accent variants - English (US/UK/AU/IN), Spanish, French, Portuguese, Hebrew, Arabic, Hindi, Japanese, Korean, Mandarin and 20+ more.

Pair them with the new Voice Direction picker: 12 curated styles (Whisper, Dramatic newscast, Calm narration, Storyteller, TV commercial, Cheerful, Soft & intimate, Excited & energetic, Sad & melancholic, Serious & professional, British accent, Warm & conversational) - or a Custom prompt up to 500 characters for free-form direction.

The chosen style is shown on each generated audio card so you can see exactly how a take was directed.

Try the new voices →

🎬 Gemini Omni Video - Pin a Character's Voice Into the Video

A new text-to-video and Elements model that locks specific character voices and audio clips into the output. 4–10s outputs at 16:9 or 9:16 in 720p, 1080p, or 4K.

The magic: it reuses the voice samples attached to your character Visual DNAs. Attach a sample to the DNA once and every Gemini Omni run picks up the right voice automatically. Perfect for character-driven shorts where the voice IS the character.

Open Video Tools →

⚡ Grok Imagine Video - Any Duration, More Reliable

Pick any video length from 1 to 15 seconds - no more fixed-duration buckets. We swapped providers (fal primary, kie fallback) for dramatically more reliable generations. Same quality, far fewer "the model timed out" surprises.
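
For readers curious what a primary/fallback setup looks like in practice, here is a minimal sketch of the general pattern. It is not Kolbo.AI's actual implementation: the function names, the `GenerationError` type, and the stub providers are all hypothetical, and only the 1 to 15 second range comes from this post.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical provider signature: (prompt, duration in seconds) -> video reference.
Provider = Callable[[str, int], str]


class GenerationError(Exception):
    """Raised when a provider fails to produce a video (hypothetical type)."""


def generate_with_fallback(
    providers: Sequence[Tuple[str, Provider]], prompt: str, duration_s: int
) -> Tuple[str, str]:
    """Try each provider in order and return (provider_name, result) from the first success."""
    if not 1 <= duration_s <= 15:
        raise ValueError("duration must be between 1 and 15 seconds")
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt, duration_s)
        except GenerationError as exc:
            # Remember the failure and move on to the next provider in the list.
            errors.append(f"{name}: {exc}")
    raise GenerationError("all providers failed: " + "; ".join(errors))


# Stub providers standing in for real backends (assumptions for illustration only).
def flaky_primary(prompt: str, duration_s: int) -> str:
    raise GenerationError("the model timed out")


def steady_fallback(prompt: str, duration_s: int) -> str:
    return f"video:{duration_s}s"


used, result = generate_with_fallback(
    [("primary", flaky_primary), ("fallback", steady_fallback)],
    "a neon street in the rain",
    7,
)
```

In this sketch the caller only sees a failure when every provider in the list has failed, which is the kind of behavior that cuts down on timeout surprises.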

Try Grok Imagine →

🧽 New Erase Mode in Inpaint + Flux Pro Erase

The brush tool now has two modes:

  • Replace - paint a region, describe what should appear there
  • Erase - paint, get it cleanly removed, no prompt needed

The model list swaps automatically when you switch modes, and a new Inpaint / Erase shortcut is available straight from the image right-click menu.

We also added Flux Pro Erase - a premium, dedicated erase model from Black Forest Labs that produces cleaner removals on skin, hair, and complex textures than the default erase model.

⏱️ Generation Progress - Redesigned & Smarter

A cleaner in-progress UI, ETAs you can actually trust, and a one-click jump from the global progress panel straight to any running generation. Estimates are now resolution-aware and calibrated from real production data - no more 38-second countdowns for 4K generations that take 4 minutes.
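
As an illustration of what "resolution-aware" can mean, here is a minimal sketch of one plausible approach: keep past generation times per model and resolution, and report the median. This is an assumption for illustration, not a description of how the Studio computes its ETAs; the `EtaEstimator` class, its method names, and the sample numbers are all hypothetical.

```python
from collections import defaultdict
from statistics import median


class EtaEstimator:
    """Estimate generation time from past runs, keyed by (model, resolution)."""

    def __init__(self, default_s: float = 60.0) -> None:
        self._samples = defaultdict(list)
        self._default_s = default_s

    def record(self, model: str, resolution: str, seconds: float) -> None:
        """Store the observed duration of a finished generation."""
        self._samples[(model, resolution)].append(seconds)

    def estimate(self, model: str, resolution: str) -> float:
        """Return the median of recorded durations, or the default if none exist."""
        samples = self._samples.get((model, resolution))
        return median(samples) if samples else self._default_s


estimator = EtaEstimator()
# Made-up sample durations in seconds, purely for illustration.
for seconds in (230, 240, 250):
    estimator.record("example-video-model", "4k", seconds)
estimator.record("example-video-model", "720p", 38)

eta_4k = estimator.estimate("example-video-model", "4k")
eta_720p = estimator.estimate("example-video-model", "720p")
```

Keying on resolution keeps a 4K run from inheriting the short estimate of a 720p run, which is the mismatch described above.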

Each loader now shows the model avatar so you know exactly what's running - no more guessing whether that loader is for the Seedance generation or the Sora preset you queued.

🧬 Visual DNA: Attach Reference Video & Voice Samples

The Visual DNA creation panel now has an Advanced media section: optionally attach a 3–10s reference video and, for character DNAs, a voice sample alongside the usual reference images.

Useful when motion or speech is part of what makes the character feel right - captured once at creation time, used everywhere downstream (including Gemini Omni Video, which reads the voice sample automatically).

What's Next

Keep an eye on the in-app announcement deck (bottom-right corner) for daily updates. More model launches, workflow upgrades, and quality-of-life improvements are already in the pipeline for June.

Tags

grok-imagine, gemini, tts, text-to-image, text-to-video, image-editing, visual-dna, new-models, may-2026
