Google's AI event: Multimodal masterpiece makes a stunning appearance

I'm still happy to see my former company's AI getting better and better～

Read the highlights first

Gemma 3 multilingual capabilities take the spotlight: Google has released the multimodal model Gemma 3, with parameter sizes ranging from 1B to 27B, a context window of up to 128K tokens, and support for over 140 languages. The community is eager to see it run on a single GPU or TPU.
Gemini 2.0 Flash text-to-image generation becomes more direct: Gemini 2.0 Flash adds native image generation, so users can generate context-relevant images directly within the model. Developers can try it out through Google AI Studio.
Gemini Robotics brings AI into real life: Google showcased Gemini Robotics on YouTube, demonstrating an advanced vision-language-action model that lets robots interact with the real world more naturally.

Gemma 3 Model: A heavyweight new breakthrough in the open-source field

As an open-source model, Gemma 3 achieved excellent results on the LMArena benchmark.

The release of Gemma 3 also pushed forward the Pareto frontier for models of its class, leading other models of the same scale by a significant margin.

At the same time, Gemma 3 makes visual capabilities one of its core functions, absorbing the features of the earlier PaliGemma model (while ShieldGemma continues to exist as a separate branch).
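
For a rough sense of what "running on a single GPU" means for the 1B to 27B parameter range mentioned in the highlights, here is a minimal back-of-the-envelope sketch. It only estimates the memory needed to hold the weights, assuming a fixed number of bytes per parameter for each numeric precision. The precision table and helper names are illustrative assumptions, not anything published by Google, and real usage also needs room for activations and the KV cache.

```python
# Back-of-the-envelope estimate of weight memory for a model of a given size.
# Assumption: memory = parameter count x bytes per parameter (weights only;
# activations, KV cache, and framework overhead are NOT included).

# Hypothetical precision table for illustration: bytes used per parameter.
PRECISIONS = {"fp32": 4.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """Return the memory needed to store the weights, in GiB."""
    return num_params * bytes_per_param / 2**30


def estimate_table(param_counts: dict) -> dict:
    """Map each model name to its estimated weight memory per precision."""
    return {
        name: {
            precision: round(weight_memory_gib(count, nbytes), 1)
            for precision, nbytes in PRECISIONS.items()
        }
        for name, count in param_counts.items()
    }


if __name__ == "__main__":
    # The two ends of the parameter range quoted in the article.
    sizes = {"1B": 1e9, "27B": 27e9}
    for name, row in estimate_table(sizes).items():
        print(name, row)
```

Under these assumptions, the 27B model needs roughly 50 GiB for weights alone at 16-bit precision, which is why lower-precision formats matter for fitting it on a single accelerator.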

Gemini 2.0 Flash text-to-image generation

Gemini 2.0 Flash now offers native image generation, letting users create images closely tied to the text directly within the model. The interface is somewhat complicated, but once you find the entry point, image editing is simpler than ever.

Kaushik Shivakumar, a member of the Google developer team, said: "We are very excited to publicly release Gemini's native image generation capabilities. It is still in the experimental stage. We have made a lot of progress, but we also look forward to further feedback from everyone!"

Google engineer Mostafa Dehghani excitedly stated: "In this team, every day is extraordinary, full of chaos yet brimming with creativity!"

The creator community expressed admiration for the performance of Gemini 2.0 Flash and looks forward to models such as Gemma gaining image generation capabilities in the future.

Gemini Robotics Model

Google DeepMind has launched Gemini Robotics, a new generation of robot AI models based on Gemini 2.0, emphasizing reasoning, interactivity, flexibility, and generalization.

Google DeepMind announced a collaboration with Apptronik to jointly develop humanoid robots powered by Gemini 2.0, including Apptronik's Apollo robot.

Gemini Robotics-ER lets robots draw on the embodied reasoning capabilities of the Gemini model for object detection, interaction recognition, and obstacle avoidance. With this technology, robots perform twice as well on benchmark tests as current state-of-the-art models, demonstrating strong generalization.

Google DeepMind emphasizes that the goal of the Gemini Robotics model is to enable robots to more naturally and flexibly adapt to diverse task environments, achieving truly intelligent interactions.