Veo 2 can generate realistic and high-quality video clips based on text or image prompts, with resolutions as high as
I am still applying for the whitelist. Let's take a look at the introduction first.
Prompt: An extreme close-up shot focuses on the face of a female DJ, her beautiful, voluminous black curly hair framing her features as she becomes completely absorbed in the music. Her eyes are closed, lost in the rhythm, and a slight smile plays on her lips. The camera captures the subtle movements of her head as she nods and sways to the beat, her body instinctively responding to the music pulsating through her headphones and out into the crowd. The shallow depth of field blurs the background. She’s surrounded by vibrant neon colors. The close-up emphasizes her captivating presence and the power of music to transport and transcend.
Key features of Veo 2:
for clearer and more detailed images.
The model can understand camera control instructions in text prompts, such as:
Prompt: The camera floats gently through rows of pastel-painted wooden beehives, buzzing honeybees gliding in and out of frame. The motion settles on the refined farmer standing at the center, his pristine white beekeeping suit gleaming in the golden afternoon light. He lifts a jar of honey, tilting it slightly to catch the light. Behind him, tall sunflowers sway rhythmically in the breeze, their petals glowing in the warm sunlight. The camera tilts upward to reveal a retro farmhouse with mint-green shutters, its walls dappled with shadows from swaying trees. Shot with a 35mm lens on Kodak Portra 400 film, the golden light creates rich textures on the farmer’s gloves, marmalade jar, and weathered wood of the beehives.
Prompt: The sun rises slowly behind a perfectly plated breakfast scene. Thick, golden maple syrup pours in slow motion over a stack of fluffy pancakes, each one releasing a soft, warm steam cloud. A close-up of crispy bacon sizzles, sending tiny embers of golden grease into the air. Coffee pours in smooth, swirling motion into a crystal-clear cup, filling it with deep brown layers of crema. Scene ends with a camera swoop into a fresh-cut orange, revealing its bright, juicy segments in stunning macro detail.
Performance comparison: surpassing mainstream video generation models
- dataset released by Meta.
- it performed best in overall preference and accuracy in following prompts.


Comparison conditions:
- Video durations were respectively:
- Human evaluators watched the full video samples and scored their preferences.
Model limitations
Although Veo 2 has made significant progress in generating realistic, dynamic, and high-quality videos, there are still some challenges:
- remains relatively difficult in complex scenes or high-dynamic motion.
- videos requires further improvement. (For example, the legs in the video below still show visible artifacts.)
Prompt: A tracking shot, with the subject centered in the frame, follows an ice skater gliding across an ice rink that appears to be floating amidst the clouds. The skater, clad in a flowing white costume that ripples with every move, exudes an ethereal grace. The camera smoothly keeps pace, capturing their every movement with a dreamlike quality. The background is a swirling canvas of pastel colors and soft, shifting clouds, creating a sense of otherworldly wonder. The skater's serene expression and the whisper-quiet sound of their blades on the ice add to the magical atmosphere. The overall impression is one of ethereal beauty and effortless movement, set against a backdrop of pure fantasy.
Netizen reviews
