Google launches its own AI model designed for reasoning.

Dec 20, 2024

Discover Google's newly launched AI model, specifically designed to enhance reasoning capabilities.


Google's New Reasoning AI Model: Gemini 2.0 Flash Thinking Experimental

Google has entered the arena of reasoning AI models with the release of Gemini 2.0 Flash Thinking Experimental. Built on the Gemini 2.0 Flash model, it aims to improve accuracy and decision-making by fact-checking itself as it works. Unlike traditional LLMs, Gemini 2.0 Flash Thinking Experimental pauses to consider related prompts and explains its reasoning process before delivering a summarized answer. This approach, while potentially more accurate, comes at the cost of increased processing time: responses often take seconds to minutes.


Initial testing reveals mixed results. While the model demonstrates promise in tackling complex problems in programming, math, and physics, it also exhibits limitations. For example, in one instance, it incorrectly counted the number of "R"s in the word "strawberry." This highlights the ongoing challenges in developing truly robust reasoning AI.
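
For reference, the correct count is easy to verify deterministically. The snippet below is a minimal Python sketch of that check; the helper name `count_letter` is illustrative and does not come from Google or from the test described above.

```python
def count_letter(word: str, letter: str) -> int:
    """Count case-insensitive occurrences of a single letter in a word."""
    return word.lower().count(letter.lower())


# "strawberry" contains the letter r three times: st-r-awbe-r-r-y.
print(count_letter("strawberry", "R"))  # prints 3
```

A language model works on tokens rather than individual characters, which is one commonly cited reason this kind of question trips models up even though it is trivial for ordinary code.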

The launch of Gemini 2.0 Flash Thinking Experimental follows a recent trend in the AI industry: other companies, including OpenAI (with its o1 model) and DeepSeek, are also developing reasoning models. This surge in development is partly driven by the diminishing returns of simply scaling up traditional AI models by brute force.

Google's model card describes Gemini 2.0 Flash Thinking Experimental as "best for multimodal understanding, reasoning, and coding," capable of "reasoning over the most complex problems." However, current limitations include a 32k-token input limit, an 8k-token output limit, text and image input only, text-only output, and no built-in tool usage. Google acknowledges the experimental nature of the model and is actively seeking feedback for further refinement.
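
To make those limits concrete, here is a rough Python sketch of a pre-flight check a caller might run before sending a request. Everything in it is an assumption for illustration: the names `ModelLimits`, `estimate_tokens`, and `fits_limits` are hypothetical, the exact values behind "32k" and "8k" are taken as 32,000 and 8,000, and the four-characters-per-token estimate is a crude heuristic, not the model's real tokenizer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelLimits:
    # Assumed exact values for the "32k" and "8k" limits quoted in the article.
    max_input_tokens: int = 32_000
    max_output_tokens: int = 8_000


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate; a real tokenizer will give different counts."""
    if not text:
        return 0
    return max(1, round(len(text) / chars_per_token))


def fits_limits(prompt: str, requested_output_tokens: int,
                limits: ModelLimits = ModelLimits()) -> bool:
    """Return True if the estimated prompt size and requested output both fit."""
    return (estimate_tokens(prompt) <= limits.max_input_tokens
            and requested_output_tokens <= limits.max_output_tokens)


print(fits_limits("Explain why the sky is blue.", 1_000))
print(fits_limits("x" * 200_000, 1_000))
```

Under these assumptions, the short prompt passes and the 200,000-character prompt does not, since it is estimated at about 50,000 tokens. A production check would use the provider's own token-counting facility rather than a character heuristic.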

Google's AI Model Portfolio: A Broader Look

Beyond Gemini 2.0 Flash Thinking Experimental, Google offers a diverse range of AI models accessible through its AI Studio platform. These models cater to various needs and applications:

Gemini 2.0 Flash Experimental: A workhorse model prioritizing low latency and enhanced performance, featuring native image generation and text-to-speech capabilities.
Gemini 1.0 Ultra: The largest model, designed for highly complex tasks, excelling in multimodal reasoning and complex coding.
Gemini 1.5 Pro: Optimized for reasoning across large amounts of information, adept at complex reasoning and reasoning across modalities.
Gemini 1.0 Pro: A versatile model for scaling across a wide range of tasks, with strengths in complex reasoning systems and advanced audio understanding.
Gemini 1.0 Nano: The most efficient model, ideal for on-device tasks, excelling in summarization, reading comprehension, and text completion.
PaLM 2: A next-generation language model with improved multilingual, reasoning, and coding capabilities.
Imagen: A text-to-image model capable of generating high-quality images and understanding natural language prompts.
Codey: Generates code based on natural language descriptions, supporting code completion, generation, and chat functionalities.
Chirp: A universal speech model for automatic speech recognition (ASR) in over 100 languages.
Veo: A generative video model capable of creating high-quality videos from text prompts.
MedLM: Fine-tuned for healthcare applications, aiding in medical information access, analysis, and application.
LearnLM: Designed for education, infused with learning capabilities and grounded in pedagogical evaluations.
SecLM: A security-specialized AI API tuned for security-specific tasks, leveraging threat intelligence from various sources.
Gemma: A family of lightweight open models built with the same research and technology as Gemini models, prioritizing responsible AI development.
CodeGemma: Lightweight open code models built on Gemma, performing tasks like code completion, generation, and chat.
RecurrentGemma: A technically distinct model that uses recurrent neural networks and local attention for improved memory efficiency and higher throughput.
PaliGemma: Google's first multimodal Gemma model, designed for class-leading fine-tune performance across diverse vision-language tasks.