Voyager: Embodied agent in Minecraft

Today I read the paper on Voyager, which was developed by NVIDIA’s MineDojo team. It uses large language models (LLMs) to drive an embodied agent in the Minecraft world. The team is led by Jim (Linxi) Fan, an AI influencer who was reportedly OpenAI's first intern. I attended one of his talks at a ZhenFund event held in the U.S., where he mentioned that he was responsible for game AI at NVIDIA.

The three most starred projects on MineDojo's GitHub are:

MineCLIP, the foundation model
MineDojo, the main project
Voyager, namely An Open-Ended Embodied Agent with Large Language Models

Voyager can continuously explore the world, acquire diverse skills, and make new discoveries without human intervention. Paper link: https://voyager.minedojo.org/

Project Background

Voyager is the first LLM-driven open-ended embodied agent in Minecraft: it continuously explores, masters skills, and makes new discoveries.

Three Key Components

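An automatic curriculum, which takes into account the exploration progress and the agent's state to maximize exploration.

A skill library, used for storing and retrieving complex behaviors. Each new skill is added to the library, indexed by the embedding of its description; when a new task arrives, the library is queried to identify the top five relevant skills.

An iterative prompting mechanism, which incorporates environment feedback, execution errors, and self-verification to improve the program.

Execution errors are one such feedback signal: for example, GPT-4 realizes it should craft a wooden axe instead of an acacia axe, since there is no acacia axe in Minecraft.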
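To make the skill library's store-and-retrieve behavior concrete, here is a minimal Python sketch. It is not Voyager's implementation: the `SkillLibrary` class, the bag-of-words `embed` function, and the placeholder skill bodies are all assumptions made for illustration. A real system would use a learned text-embedding model and a vector store, but the shape of the idea is the same: index each skill by an embedding of its description, then rank stored skills by similarity to the new task.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding (assumption): bag-of-words counts instead of a real embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SkillLibrary:
    """Hypothetical skill store: each skill is indexed by its description's embedding."""

    def __init__(self) -> None:
        self._skills = []  # list of (description, embedding, code) tuples

    def add(self, description: str, code: str) -> None:
        self._skills.append((description, embed(description), code))

    def retrieve(self, task: str, k: int = 5) -> list:
        """Return the descriptions of the k skills most similar to the task."""
        query = embed(task)
        ranked = sorted(self._skills, key=lambda s: cosine(query, s[1]), reverse=True)
        return [description for description, _, _ in ranked[:k]]


library = SkillLibrary()
# The code strings are placeholders, not real skill programs.
library.add("craft wooden pickaxe", "<skill code>")
library.add("mine iron ore", "<skill code>")
library.add("mine stone", "<skill code>")

top = library.retrieve("craft stone pickaxe", k=2)
print(top)  # ['craft wooden pickaxe', 'mine stone']
```

The default `k=5` mirrors the "top five relevant skills" mentioned above; the example passes `k=2` only because the toy library holds three skills.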

Voyager interacts with GPT-4 through black-box queries, which avoids the need to fine-tune model parameters. The skills Voyager develops are temporally extended, interpretable, and compositional, which compounds its abilities rapidly and alleviates catastrophic forgetting.
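The black-box interaction can be illustrated with a short Python sketch of a feedback loop. Everything here is an assumption for illustration: `refine_program`, the `max_rounds` limit, and the stub functions standing in for GPT-4 and the game environment are hypothetical names, not Voyager's actual code. The point is that the model is only ever called as a text-in, text-out function, and improvement comes from feeding errors back into the next prompt rather than from updating weights.

```python
from typing import Callable, Optional


def refine_program(
    task: str,
    llm: Callable[[str], str],                # black-box query: prompt in, program out
    execute: Callable[[str], Optional[str]],  # returns an error message, or None on success
    verify: Callable[[str, str], bool],       # checks whether the task was completed
    max_rounds: int = 4,                      # assumed retry budget
) -> Optional[str]:
    """Query the model repeatedly, feeding execution feedback into the next prompt."""
    feedback = "none"
    for _ in range(max_rounds):
        prompt = f"Task: {task}\nFeedback from last attempt: {feedback}"
        program = llm(prompt)
        error = execute(program)
        if error is None and verify(task, program):
            return program  # success: this program could be stored as a new skill
        feedback = error or "program ran but the task was not completed"
    return None  # give up after max_rounds attempts


# Stubs standing in for GPT-4 and the game (assumptions, for demonstration only).
prompts_seen = []


def fake_llm(prompt: str) -> str:
    prompts_seen.append(prompt)
    # Switch to a valid item once the error about the invalid one appears in the prompt.
    return "craft wooden_axe" if "acacia_axe" in prompt else "craft acacia_axe"


def fake_execute(program: str) -> Optional[str]:
    return "no item named acacia_axe" if "acacia_axe" in program else None


def fake_verify(task: str, program: str) -> bool:
    return program.endswith("_axe")


result = refine_program("craft an axe", fake_llm, fake_execute, fake_verify)
print(result, len(prompts_seen))
```

In this toy run the first attempt fails with an execution error, and the second prompt carries that error, which is the same pattern as the wooden axe example above.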

Comparison

Compared with the previous state-of-the-art techniques, the paper reports the following results for Voyager:

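Obtains 3.3 times more unique items
Travels 2.3 times longer distances
Unlocks key tech tree milestones up to 15.3 times faster.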