Google announces the launch of the new AI model Gemini 2.0

Recently, the AI circle has been as lively as during the Spring Festival. OpenAI released 12 major updates in a row, and while I was still digesting those, Google announced the launch of its new AI model, Gemini 2.0. Paula sent it to me this morning. As we all know, Google usually enters Christmas-holiday mode at the end of each year, with many important updates postponed until the next year. This December, however, Google unusually broke with tradition and chose to launch Gemini 2.0 now.

CEO's Open Letter ✉️

In the open letter from Google and Alphabet CEO Sundar Pichai, he shared Google's vision for organizing information and developing AI over the years, and announced the release of the new generation AI model Gemini 2.0.

Entering the Agent Era

Pichai frames Gemini 2.0 as the start of a new agentic era (it feels like the whole of Silicon Valley is emphasizing AI agents, especially AI code engineers), which means AI models that not only understand the world around them but can also plan several steps ahead and act on those plans under user supervision. Based on this vision, Google launched the new-generation model Gemini 2.0, whose features include:

  • Native multimodal output: supports native image and audio output, making interactions more natural.
  • Native tool use: strengthens the model's ability to use tools, laying the foundation for building new intelligent agents.

Starting today, Gemini 2.0 will be available to developers and trusted users, and new features such as Deep Research will be introduced through Gemini Advanced. This function leverages AI's advanced reasoning and long context processing capabilities to help users research complex topics and generate detailed reports.

AI-driven Search

Google Search has become one of the areas most profoundly transformed by AI. AI-based search overviews have already reached 1 billion users, becoming a very popular feature. With the advanced reasoning capabilities of Gemini 2.0, search will be able to handle:

  • More complex topics and multi-step questions.
  • Advanced math equations, multimodal queries, and coding issues.
  • Broader availability, expanding to more countries and languages.

Trillium TPU

Behind Gemini 2.0 lies a decade of Google's full-stack innovation in AI:

  • Gemini 2.0's training and inference run entirely on Google's sixth-generation TPU, Trillium.
  • Trillium is also being made available to external customers starting today, supporting more developers in building AI applications.

If Gemini 1.0 was about organizing and understanding information, Gemini 2.0 is about making it much more useful.

- Sundar

Gemini 2.0 Flash: The Peak of Speed and Performance

Gemini 2.0 Flash is an upgraded version based on 1.5 Flash. As a model deeply loved by developers, 1.5 Flash has been widely praised for its fast response and excellent performance. 2.0 Flash further improves performance and achieves breakthroughs in the following areas:

  • Better performance: in key benchmark tests, 2.0 Flash even outperforms 1.5 Pro while responding twice as fast.

  • Tool-use ability: supports calling Google Search, code execution, and user-defined third-party functions.

  • New multimodal output functionality, bringing a new interactive experience to developers and users.
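To make the tool-use point concrete, here is a minimal sketch of how a user-defined function might be declared in a request to the model. The JSON shape follows my reading of Google's public `generateContent` REST endpoint; `get_weather` is a made-up example function, not a real tool, so treat the field names as assumptions to verify against the official docs.

```python
# Sketch of a function declaration in the JSON shape used by Google's
# generateContent REST endpoint (field names per the public docs; verify).
# get_weather is a hypothetical user-defined function for illustration.

GET_WEATHER_DECL = {
    "name": "get_weather",  # hypothetical third-party function
    "description": "Look up the current weather for a city.",
    "parameters": {
        "type": "OBJECT",
        "properties": {"city": {"type": "STRING"}},
        "required": ["city"],
    },
}

def build_request(prompt: str) -> dict:
    """Assemble a request body that exposes our function to the model."""
    return {
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        "tools": [{"functionDeclarations": [GET_WEATHER_DECL]}],
    }

body = build_request("What's the weather in Tokyo right now?")
print(body["tools"][0]["functionDeclarations"][0]["name"])  # -> get_weather
```

If the model decides the question needs the tool, its reply contains a function-call part naming `get_weather` with arguments; the client runs the function and sends the result back in a follow-up turn.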

Try Gemini 2.0

I took a look at other people's test results on Twitter, and they seem pretty good:

I am also a Gemini Advanced user, but my own results were not great; maybe it is because of how I use it:

Multimodal Live API: A New Tool for Dynamic Interactive Applications

Google also introduced the Multimodal Live API, whose features include:

  • Real-time audio and video stream input
  • Combined use of multiple tools, supporting the development of more complex applications.
Try Multimodal Live
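The announcement itself contains no code, but conceptually a client streams short chunks of audio or video to the API over a persistent connection and receives audio back in real time. A minimal client-side sketch of that chunking step; the chunk duration, sample rate, and config dict are illustrative assumptions, not documented requirements:

```python
# Client-side prep for a bidirectional streaming session (sketch).
# The config values below are assumptions; check the official
# Multimodal Live API docs for the real session parameters.

LIVE_CONFIG = {
    "model": "gemini-2.0-flash-exp",   # model name seen in early demos
    "response_modalities": ["AUDIO"],  # ask for spoken replies
}

def chunk_pcm(audio: bytes, chunk_ms: int = 20, sample_rate: int = 16000) -> list[bytes]:
    """Split raw 16-bit mono PCM into fixed-duration chunks for streaming."""
    bytes_per_chunk = sample_rate * 2 * chunk_ms // 1000  # 2 bytes per sample
    return [audio[i:i + bytes_per_chunk] for i in range(0, len(audio), bytes_per_chunk)]

# One second of silence -> fifty 20 ms chunks of 640 bytes each.
chunks = chunk_pcm(b"\x00" * 32000)
print(len(chunks), len(chunks[0]))  # -> 50 640
```

Small fixed-size chunks are what keep the interaction conversational: the server can start reasoning over (and replying to) audio before the user has finished speaking.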

I found someone's demo on x.com to share:


Deep Research

This function leverages AI's advanced reasoning and long context processing capabilities to help users research complex topics and generate detailed reports. It possesses the following outstanding abilities:

  • Multi-step execution: works through complex research tasks on the user's behalf.
  • Planning: creates a step-by-step research plan and lets the user approve or revise it.
  • Browsing: explores information via Google Search in a human-like manner.
  • Reporting: generates a comprehensive research report with links to credible sources attached.

The launch of this functionality provides a new way for information retrieval and knowledge generation, particularly suitable for scenarios requiring in-depth analysis and reliable information.

Try Deep Research

Several Agent Use Scenarios:

Project Astra: Empowering Real-world AI Assistants with Multimodal Understanding

  1. Stronger conversational ability: Project Astra can engage in multilingual and mixed-language conversations, understanding user needs more accurately. It enhances the understanding of different language accents and rare words, making conversations more natural and smooth.
  2. New tool invocation: it can now call Google Search, Google Lens, and Google Maps.
  3. Better memory function: In a single session, the memory duration is extended to 10 minutes, supporting more complex contextual conversations. It can remember users' past conversations, laying the foundation for personalized interactions while maintaining user control over memory content.
  4. Improved latency performance: Using new streaming technology and native audio understanding capabilities, Project Astra's language processing latency has approached human conversation levels, providing users with real-time and smooth interaction experiences.

Project Mariner: An AI Agent Assisting with Complex Tasks

Project Mariner is an early research prototype built on Gemini 2.0, aiming to explore future interaction methods between humans and AI agents. Starting with the browser, the project demonstrates how complex web tasks can be completed through multimodal understanding and reasoning. As an experimental Chrome extension, Project Mariner reads the information on the browser screen, including text, code, images, and forms, and uses it to assist users in completing tasks.

  • Outstanding task performance: In the WebVoyager benchmark test (evaluating the performance of agents in end-to-end real web tasks), Project Mariner achieved an industry-leading result of 83.5%, performing exceptionally well as a single-agent setup.
  • Task navigation within the browser: Project Mariner shows the technical feasibility of completing tasks inside the browser; although task completion is currently slow and accuracy is unstable, both are expected to improve rapidly.

Jules: Developers' AI Code Assistant

Jules is an experimental AI-driven code agent designed to provide developers with intelligent collaboration tools. By directly integrating into GitHub workflows, Jules can perform the following tasks under the guidance and supervision of developers:

  • Problem solving: Identifying specific issues in the codebase and proposing solutions.
  • Plan formulation: Developing detailed implementation plans according to task requirements.
  • Task execution: Implementing code changes and driving the completion of the development process.

Intelligent Agents in Gaming

Google DeepMind has long used games to advance AI models' abilities in rule-following, planning, and logical reasoning. For example, just last week it launched Genie 2, an AI model capable of generating an endless variety of playable 3D worlds from a single image (introduced yesterday). Building on this tradition, DeepMind has built new AI agents on Gemini 2.0 that provide intelligent support for navigating the virtual worlds of video games.

These AI agents possess the following key functions:

  • Screen reasoning capability: Understand the current situation by observing the game screen without additional data.
  • Real-time dialogue suggestions: Interact with players through natural language, providing suggestions for the next steps to help players better complete challenges.

Google is collaborating with top game developers, such as Supercell, to test the performance of these intelligent agents in different game types:

  • Strategy games: Such as "Clash of Clans," testing the agent's interpretation and suggestion capabilities for complex rules.
  • Casual simulations: Such as "Hay Day," evaluating its understanding and optimization of casual gameplay. Through these tests, Google hopes to understand AI's adaptability to different game rules, challenges, and goals.

Capabilities of Intelligent Agents in Robotics

Beyond exploring the capabilities of AI agents in virtual worlds, Google is also attempting to apply Gemini 2.0's spatial reasoning capabilities to robotics, providing support for task execution in the physical world.

Based on Gemini 2.0's spatial reasoning capabilities, future AI agents may possess the following capabilities:

  • Understanding and manipulating objects in the physical environment.
  • Assisting in completing high-precision practical tasks, such as assembly, transportation, or navigation.
  • Achieving higher-level human-machine collaboration, enhancing efficiency in industrial, service, and daily life contexts.