Google has unveiled a preview of Gemini 2.5 Flash-Lite, a reasoning model optimized for cost and speed, and announced that two other Gemini models, Gemini 2.5 Pro and Gemini 2.5 Flash, are now generally available.
Google made the announcements June 17. Gemini 2.5 models are thinking models, capable of reasoning through their thoughts before responding, resulting in enhanced performance and improved accuracy, Google said.
Gemini 2.5 Flash-Lite has the lowest cost and lowest latency in the Gemini 2.5 model family, Google said. Flash-Lite is a reasoning model that allows dynamic control of the thinking budget via an API parameter, but because Flash-Lite is optimized for low latency and low cost, thinking is turned off by default. This model is “great” for high-throughput tasks such as classification or summarization at scale, Google said. Built as an upgrade to the Gemini 1.5 Flash and 2.0 Flash models, Gemini 2.5 Flash-Lite offers better performance across most evals and lower time to first token, while also achieving higher tokens-per-second decode, according to Google. Every Gemini 2.5 model has control over the thinking budget, giving developers the ability to choose when and how much the model thinks before generating a response.
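To illustrate what setting the thinking budget through an API parameter might look like, here is a minimal Python sketch that only builds a request body and sends nothing over the network. The helper name `build_request` is hypothetical, and the field names (`generationConfig`, `thinkingConfig`, `thinkingBudget`) are assumptions based on Google's public Gemini API documentation rather than anything stated in the announcement, so check them against the current reference before relying on them.

```python
import json
from typing import Any


def build_request(prompt: str, thinking_budget: int = 0) -> dict[str, Any]:
    """Build a JSON-serializable request body with an explicit thinking budget.

    Hypothetical helper. Field names are assumed from the public Gemini API
    docs. A budget of 0 mirrors the "thinking off" default described for
    Flash-Lite; a larger value allows the model more reasoning tokens.
    """
    if thinking_budget < 0:
        # This sketch only handles explicit, non-negative budgets.
        raise ValueError("thinking_budget must be non-negative in this sketch")
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            "thinkingConfig": {"thinkingBudget": thinking_budget},
        },
    }


if __name__ == "__main__":
    # High-throughput classification: keep thinking off for latency and cost.
    fast = build_request("Classify the sentiment: 'great product'")
    # A harder prompt: allow some reasoning tokens (the value is illustrative).
    careful = build_request("Summarize the key risks in this contract.", 512)
    print(json.dumps(fast, indent=2))
    print(json.dumps(careful, indent=2))
```

The point of the sketch is the trade-off the announcement describes: developers can leave the budget at zero for bulk tasks such as classification and raise it only for requests that benefit from more reasoning.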