Gemini 3.1 Flash TTS: the next generation of expressive AI speech

Source: Google DeepMind News

Gemini 3.1 Flash TTS is Google's latest text-to-speech model, offering improved controllability, expressivity and quality. It is rolling out in preview to developers via the Gemini API and Google AI Studio, to enterprises on Vertex AI, and to Workspace users through Google Vids.

The model produces more natural, expressive speech, with native multi-speaker dialogue and support for more than 70 languages. On the Artificial Analysis TTS leaderboard it achieved an Elo score of 1,211, and it sits in the “most attractive quadrant” for combining high-quality output with low cost.

New audio tags let users embed natural-language commands in text to steer vocal style, pace and delivery. Google AI Studio adds configurable controls that place the developer in the “director’s chair,” including scene direction for context, speaker-level Audio Profiles and Director’s Notes to toggle pace, tone and accent, and seamless export of parameters as Gemini API code.
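
As a rough illustration of embedding delivery cues in the text itself, the sketch below assembles a multi-speaker script with inline tags. The square-bracket tag syntax, the `Scene:` header, and the names `Line`, `render_line` and `render_script` are assumptions made for this example; the announcement does not specify the tag format, and no Gemini API call is made here.

```python
from dataclasses import dataclass


@dataclass
class Line:
    """One line of dialogue plus optional delivery cues (hypothetical structure)."""
    speaker: str
    text: str
    tags: tuple = ()  # natural-language cues, e.g. ("whispering", "slowly")


def render_line(line: Line) -> str:
    # Assumed convention: each cue is wrapped in square brackets before the text.
    prefix = "".join(f"[{tag}] " for tag in line.tags)
    return f"{line.speaker}: {prefix}{line.text}"


def render_script(scene: str, lines: list) -> str:
    # Assumed convention: a one-line scene direction precedes the dialogue.
    return "\n".join([f"Scene: {scene}", *(render_line(l) for l in lines)])


script = render_script(
    "Two hosts open a late-night radio show.",
    [
        Line("Host A", "Welcome back, everyone.", ("warm", "slowly")),
        Line("Host B", "You will not believe tonight's story.", ("excited",)),
        Line("Host A", "Go on, then."),
    ],
)
print(script)
```

The resulting string is the kind of annotated text that would be sent to a TTS model as input; how the real model interprets such cues should be checked against the Gemini API documentation.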
