Gemini 3.1 Flash-Lite: fast, low-cost model for high-volume workloads

Source: Google DeepMind News

Gemini 3.1 Flash-Lite is now available in preview to developers via the Gemini API in Google AI Studio and to enterprises via Vertex AI. Built for high-volume developer workloads, it is the fastest and most cost-efficient model in the Gemini 3 series. Priced at $0.25 per 1M input tokens and $1.50 per 1M output tokens, Flash-Lite delivers improved performance at a fraction of the cost of larger models.
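To make the quoted prices concrete, here is a minimal Python sketch that estimates the cost of a batch workload. Only the two per-token prices come from the announcement; the function name `estimate_cost` and the workload figures (1M requests, 800 input and 200 output tokens each) are hypothetical values chosen for illustration, and actual billing may differ.

```python
# Prices quoted in the announcement (USD per 1M tokens, preview pricing).
INPUT_PRICE_PER_M = 0.25
OUTPUT_PRICE_PER_M = 1.50


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost for the given token counts."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Hypothetical workload (an assumption, not from the announcement):
# 1M requests, each with 800 input tokens and 200 output tokens.
requests = 1_000_000
cost = estimate_cost(requests * 800, requests * 200)
print(f"Estimated cost: ${cost:,.2f}")  # prints "Estimated cost: $500.00"
```

In this example the output tokens account for $300 of the $500 even though they are only a fifth of the volume, because output is priced at six times the input rate.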

Gemini 3.1 Flash-Lite achieves a 2.5× faster Time to First Answer Token and a 45% increase in output speed over Gemini 2.5 Flash on the Artificial Analysis benchmark, while maintaining similar or better quality. The model also posts strong benchmark results, with an Elo score of 1432 on the Arena.ai Leaderboard and high marks on reasoning and multimodal understanding tests, including 86.9% on GPQA Diamond and 76.8% on MMMU Pro. These results surpass those of some larger models from earlier Gemini generations.
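The two speed figures affect different parts of a response: time to first token is a fixed delay, while output speed scales with response length. The sketch below shows how they might combine into end-to-end latency. It assumes "2.5× faster" means the delay is divided by 2.5, and the baseline numbers (1.0 s to first token, 100 tokens per second, a 500-token response) are hypothetical, not measurements from the announcement.

```python
# Relative improvements quoted in the announcement.
TTFT_SPEEDUP = 2.5        # assumed to mean: time to first answer token / 2.5
OUTPUT_SPEED_GAIN = 0.45  # 45% more output tokens per second


def estimated_latency(baseline_ttft_s: float,
                      baseline_tokens_per_s: float,
                      output_tokens: int) -> float:
    """Estimate total response time in seconds after applying both speedups."""
    ttft = baseline_ttft_s / TTFT_SPEEDUP
    tokens_per_s = baseline_tokens_per_s * (1 + OUTPUT_SPEED_GAIN)
    return ttft + output_tokens / tokens_per_s


# Hypothetical baseline (an assumption): 1.0 s to first token, 100 tokens/s.
baseline_total = 1.0 + 500 / 100.0
improved_total = estimated_latency(1.0, 100.0, 500)
print(f"baseline: {baseline_total:.2f} s, improved: {improved_total:.2f} s")
```

Under these assumptions, short responses benefit mostly from the lower time to first token, while long responses are dominated by the output-speed gain, which shortens generation time by roughly 31% (1 - 1/1.45).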

gemini 3.1, flash-lite, gemini api, ai studio, vertex ai, token pricing, output speed, latency, gpqa diamond, mmmu pro