
Meta has introduced Coconut (Chain of Continuous Thought), a reasoning paradigm that challenges traditional CoT and enhances the logical reasoning capabilities of LLMs.

Its core idea is to use the last hidden state of the LLM as the representation of the reasoning state (referred to as a "continuous thought") instead of decoding it into word tokens. These continuous representations are fed back directly as the next input embeddings, enabling reasoning in a continuous latent space.
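To make the mechanism concrete, here is a minimal sketch of that feedback loop in PyTorch, assuming a Hugging Face-style causal LM whose hidden size matches its embedding size; the model name, prompt, and loop length are illustrative, not taken from the paper:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative sketch of Coconut's latent reasoning loop: the last hidden
# state is fed back as the next input embedding instead of being decoded
# into a word token. Model choice and loop length are assumptions.
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

prompt = "Question: ..."  # latent reasoning would start after the question
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
inputs_embeds = model.get_input_embeddings()(input_ids)

num_continuous_thoughts = 3  # illustrative; the paper calls this c
for _ in range(num_continuous_thoughts):
    outputs = model(inputs_embeds=inputs_embeds, output_hidden_states=True)
    # Last layer, last position: the "continuous thought".
    thought = outputs.hidden_states[-1][:, -1:, :]
    # Feed it back directly as the next input embedding, never decoding it.
    inputs_embeds = torch.cat([inputs_embeds, thought], dim=1)
```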

Until now, the reasoning of large language models (LLMs) has been confined to the "linguistic space," where the reasoning process is typically expressed as a Chain of Thought (CoT) in order to solve complex problems. However, the linguistic space is not always the optimal medium for reasoning. For example, most word tokens serve mainly to maintain textual coherence and are irrelevant to the reasoning itself, while a few key tokens demand complex planning, which poses significant challenges for LLMs.

Comparison between CoT and Coconut

CoT (Chain of Thought)

  • Reasoning mode: CoT expresses the reasoning process by generating a sequence of word tokens, for example [x₁, x₂, …, xₙ], where each step must be decoded into language before it can condition the next.
  • Limitations:
    • Reasoning is restricted to the "language space."
    • Most word tokens are used to maintain text coherence but are not crucial to the reasoning process.
    • Key tokens may require complex planning, posing a challenge for LLMs.

Coconut (Chain of Continuous Thought)

  • Reasoning mode:
    • The model's last hidden state is used as the representation of the reasoning state (referred to as a "continuous thought").
    • The continuous thought is fed directly as the input embedding for the next step instead of being decoded into word tokens.
    • Reasoning thus occurs in an unrestricted latent space rather than in the linguistic space.
  • Advantages:
    • The model can explore multiple potential reasoning paths in parallel, in a pattern resembling breadth-first search.
    • This avoids traditional CoT's limitation of prematurely committing to a single reasoning path.

An example of a position at which a continuous thought can be decoded into language tokens (done for analysis only; during latent reasoning the thought is never decoded).
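Such a probe can be sketched by applying the model's LM head to the latent state, reusing model, tokenizer, and thought from the earlier sketch; this is an assumed analysis step for inspecting a continuous thought, not part of Coconut's reasoning loop itself:

```python
# Probing a continuous thought: project the hidden state through the LM
# head to see which word tokens it is closest to. Coconut never feeds
# these decoded tokens back during latent reasoning.
logits = model.lm_head(thought)            # shape: (1, 1, vocab_size)
top = torch.topk(logits[0, -1], k=5)
print(tokenizer.convert_ids_to_tokens(top.indices.tolist()))
```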

Case Study: ProsQA

The model trained with CoT fabricated a nonexistent edge ("every yumpus is a rempus") when it hit a dead end. The path output by Coconut (k=1, where k is the number of continuous thoughts used) ends at an irrelevant node, whereas Coconut (k=2) solves the problem correctly.

The training process of Coconut

The training data initially contains full language reasoning steps. At each successive training stage, one more language reasoning step is removed and replaced with c continuous thoughts (c = 1 in this example). The cross-entropy loss is then applied only to the remaining language tokens, i.e., those that follow the continuous thoughts.
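A schematic of how one stage's training example could be assembled under this curriculum, assuming c = 1; the <bot>/<eot> markers follow the paper's convention of delimiting the latent segment, while the helper name and the <thought> placeholder are illustrative:

```python
# Illustrative multi-stage curriculum: at stage k, the first k language
# reasoning steps are replaced by k * c continuous-thought slots, and the
# loss is applied only to the tokens after the latent segment.

def build_stage_example(question, steps, answer, stage, c=1):
    latent_slots = ["<thought>"] * (stage * c)  # placeholders; filled with
                                                # hidden states at train time
    remaining = steps[stage:]                   # language steps kept
    tokens = [question, "<bot>", *latent_slots, "<eot>", *remaining, answer]
    # Supervise only the tokens that follow the continuous thoughts.
    loss_mask = [False] * (3 + len(latent_slots)) + [True] * (len(remaining) + 1)
    return tokens, loss_mask

tokens, mask = build_stage_example(
    "Q", ["step1", "step2", "step3"], "A", stage=1, c=1
)
# tokens: ['Q', '<bot>', '<thought>', '<eot>', 'step2', 'step3', 'A']
# mask:   [False, False, False, False, True, True, True]
```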

Results on three datasets

Comparison on GSM8k, ProntoQA, and ProsQA: higher accuracy indicates stronger reasoning ability, while fewer generated tokens indicate higher reasoning efficiency.

Selected comments from users on X

Changing the base often brings new insights. I feel like I just fell off a coconut tree. 😉

Continuous thinking might redefine our understanding of emergent reasoning in AI.

This is truly a very interesting development! The concept of continuous thinking is fascinating and could significantly enhance the reasoning capabilities of large language models (LLMs).

One question leads to more questions.