Trying out Google's video generation model Veo 2

Google released the video generation model Veo 2 at the end of last year. It can create videos with realistic motion and rich detail, at resolutions up to 4K. Users can explore diverse visual styles through a variety of camera control options and easily create personalized works.

I had been on the waiting list, and today I saw that the Veo 2 model is openly available on the Fal platform, so I tried it:

🔗 https://fal.ai/models/fal-ai/veo2/playground

However, the price is a bit high:

Generating a 5-second video costs $2.50;
each additional second beyond that costs an extra $0.50.

I suggest planning the length and content of your videos in advance, so you can try this powerful video generation technology efficiently and economically.
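To make the pricing above concrete, here is a minimal sketch of a cost estimator. The function name and constants are my own, and it assumes whole-second durations and that anything up to 5 seconds is billed at the base price; the platform's actual billing rules may differ.

```python
# Pricing as quoted above: $2.50 for a 5-second video, $0.50 per extra second.
# Amounts are kept in cents to avoid floating-point rounding surprises.
BASE_SECONDS = 5
BASE_PRICE_CENTS = 250
EXTRA_SECOND_CENTS = 50


def estimate_cost_usd(duration_seconds: int) -> float:
    """Estimate the price in USD of one generated video.

    Assumption (not confirmed by the platform): durations of 5 seconds
    or less are billed at the base price.
    """
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    extra_seconds = max(0, duration_seconds - BASE_SECONDS)
    total_cents = BASE_PRICE_CENTS + extra_seconds * EXTRA_SECOND_CENTS
    return total_cents / 100


if __name__ == "__main__":
    for seconds in (5, 6, 8):
        print(f"{seconds}s -> ${estimate_cost_usd(seconds):.2f}")
```

Under these assumptions, an 8-second clip comes to $4.00, so a handful of retries on the same prompt adds up quickly.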

I tried text-to-video:

A professional gymnast in a brightly lit, modern indoor gymnastics arena performing an elegant and dynamic floor routine. The gymnast gracefully executes a sequence of flips, leaps, and spins, demonstrating precise body control, flexibility, and strength. Capture fluid motion and realistic details such as the gymnast's muscular definition, attire (a sleek gymnastics leotard), expressive posture, and confident facial expression. Use cinematic camera angles and smooth transitions to showcase the athletic artistry clearly, maintaining a polished, high-quality visual aesthetic.

I also tried image-to-video:

A close-up video of a man enthusiastically eating a freshly cooked fish meal in a casual dining environment. The man shows vivid facial expressions, chewing energetically and clearly enjoying the taste, occasionally nodding in approval. Realistic details include mouth movements, expressive eyes, and subtle gestures of delight. Maintain cinematic lighting with natural colors, emphasizing a lively and immersive atmosphere.

A new standard for quality and control

Veo 2 has strong comprehension and execution capabilities: it accurately follows simple or complex instructions, realistically simulates physical effects, and renders a wide range of visual styles.

Ultra-high realism and detail
Compared to other AI video models, Veo 2 shows significant advantages in rendering detail, improving realism, and reducing visual artifacts.
Advanced motion simulation
Thanks to its understanding of physics, Veo 2 can precisely render the details of motion and accurately follow video instructions.
More diverse camera controls
Users can achieve a variety of shot styles, angles, and movement combinations through precise instructions.

Performance evaluation

In head-to-head comparisons judged by human evaluators against several top video generation models, Veo 2 stood out. Using MovieGenBench, a benchmark dataset released by Meta, participants viewed a total of 1003 video generation prompts and their corresponding videos. Veo 2 ranked first both in overall preference and in accuracy of following instructions.

All comparisons were done at 720p resolution. Veo 2 videos were 8 seconds long, VideoGen videos were 10 seconds long, and videos from other models were 5 seconds long; evaluators were shown each video in full.