An open-source video generation model similar to Sora

This project was jointly initiated by the PKU-Tuzhan AIGC Lab and aims to reproduce Sora, OpenAI's text-to-video generation model. Contributions from the open-source community are welcome. The project is released under the Apache-2.0 license.

Try the demo here: https://huggingface.co/spaces/LanguageBind/Open-Sora-Plan-v1.1.0

Image generation usually needs about 50 sampling steps, while video generation may need around 150 steps to produce good results, which can take 3-4 minutes. Generating even a 2-second video is therefore slow.
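
As a rough guide to what these numbers imply, the arithmetic can be sketched in a few lines of Python. This is only an estimate under two assumptions not stated above: that the 3-4 minute figure refers to the 150-step video run, and that run time grows linearly with the number of steps. The function names are illustrative and are not part of the project.

```python
def seconds_per_step(total_minutes: float, steps: int) -> float:
    """Average wall-clock seconds per step, assuming time scales linearly with steps."""
    return total_minutes * 60.0 / steps


def estimated_minutes(steps: int, sec_per_step: float) -> float:
    """Estimated run time in minutes for a given step count (same linear assumption)."""
    return steps * sec_per_step / 60.0


VIDEO_STEPS = 150  # step count quoted above for video generation

# Assumption: the 3-4 minute range applies to the 150-step video run.
low = seconds_per_step(3, VIDEO_STEPS)
high = seconds_per_step(4, VIDEO_STEPS)
print(f"about {low:.1f}-{high:.1f} s per step")

# Under the same assumption, fewer steps trade quality for a shorter wait.
for steps in (50, 100, 150):
    print(steps, "steps:",
          f"{estimated_minutes(steps, low):.1f}-{estimated_minutes(steps, high):.1f} min")
```

Actual timings on the demo depend on the hardware and on how busy the Space is, so treat these figures as an order-of-magnitude guide only.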

prompt: Extreme close-up of chicken and green pepper kebabs grilling on a barbeque with flames. Shallow focus and light smoke. vivid colours

prompt: A robot dog trots down a deserted alley at night, its metallic paws clinking softly on the cobblestones, the glow of its LED eyes piercing the darkness. Occasionally, it pauses to scan its surroundings with a soft, whirring sound.