Meta releases new model: Large Concept Models (LCM)

Recently, I've been busy following the competition between Google and OpenAI, and just noticed that Meta launched a new model on December 11, 2024: **Large Concept Models (LCM)**.

"This kind of 'concept' is language and modality-independent, and can represent a higher-level idea or action in a process.

The core difference between LCMs and LLMs

  1. High-dimensional embedding space

    • Instead of discrete tokens, LCMs use a high-dimensional embedding space for modeling.
    • This approach is more suitable for multimodal and multilingual tasks.

  2. Concept-level modeling

    • LCMs represent semantic units as **Concepts**, which are abstract and independent of language and modality.
    • Generation and encoding happen at the concept level rather than the token level, so the model grasps context and meaning more efficiently (see the sketch after this list).
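
To make the token-vs-concept distinction concrete, here is a rough Python sketch of a concept-level generation loop. The encoder, decoder, and next-concept predictor below are placeholder stubs invented for illustration; they are not the actual LCM or SONAR APIs.

```python
# Hypothetical sketch of concept-level autoregressive generation.
# The encode/predict/decode helpers are placeholders; the real system uses the
# SONAR encoder/decoder and a trained LCM in their place.
import torch

EMBED_DIM = 1024  # dimensionality of one "concept" (one sentence embedding)

def encode_sentences(sentences):          # placeholder for the SONAR encoder
    return torch.randn(len(sentences), EMBED_DIM)

def predict_next_concept(concepts):       # placeholder for a trained LCM
    return concepts.mean(dim=0)

def decode_concept(concept):              # placeholder for the SONAR decoder
    return "<decoded sentence>"

# A token-level LLM would loop here over subword tokens; an LCM instead loops
# over whole sentences, each represented as a single vector.
context = encode_sentences(["The prompt.", "It has two sentences."])
generated = []
for _ in range(3):
    nxt = predict_next_concept(context)               # one step = one sentence
    context = torch.cat([context, nxt.unsqueeze(0)])  # grow the concept sequence
    generated.append(decode_concept(nxt))
print(generated)
```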

    Key technologies and innovations

    1. Concept definition and representation

    • A "concept" is assumed to correspond to an entire sentence.
    • Concepts are represented in the existing SONAR sentence embedding space, which supports up to 200 languages and covers both text and speech modalities (a SONAR encoding sketch follows this list).
  • Model Design

    • The LCM is trained to perform autoregressive next-sentence prediction directly in the SONAR embedding space.
    • Exploring various generation methods, including MSE regression, diffusion-based generative variants, and models operating in the quantized SONAR space.
  • Model Scale

    • The preliminary experiment used a model with 1.6B parameters, with a training data scale of 1.3 trillion tokens.
    • It was further expanded to 7B parameters, with the training data reaching 7.7 trillion tokens.
  • Tasks and Performance

    • Evaluated on generative tasks such as summarization and summary expansion (a new task).
    • Exhibits strong zero-shot generalization, especially in multilingual settings, outperforming existing LLMs of the same scale.
  • Open source

    • The training code for the model has been open-sourced to support community research.
    • Link: https://github.com/facebookresearch/large_concept_model
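
Since a "concept" here is just a SONAR sentence embedding, the encoding step can be tried out directly with Meta's open-source SONAR package. Below is a small sketch, assuming the `sonar-space` package from github.com/facebookresearch/SONAR is installed; the encoder and tokenizer names follow its README at the time of writing and may change.

```python
# Sketch: turning sentences into "concepts" with the SONAR text encoder.
from sonar.inference_pipelines.text import TextToEmbeddingModelPipeline

t2vec = TextToEmbeddingModelPipeline(
    encoder="text_sonar_basic_encoder",
    tokenizer="text_sonar_basic_encoder",
)

sentences = [
    "Large Concept Models operate on sentence-level representations.",
    "The same embedding space covers roughly 200 languages.",
]
# Each sentence becomes one fixed-size vector -- one "concept" -- regardless of
# its language, so an LCM can reason over these vectors instead of tokens.
embeddings = t2vec.predict(sentences, source_lang="eng_Latn")
print(embeddings.shape)  # e.g. torch.Size([2, 1024]): two concept vectors
```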

    Method Exploration

    1. MSE Regression

    • The simplest variant directly regresses the SONAR embedding of the next sentence with a mean squared error (MSE) loss, and this implementation is already included in the code repository (a minimal PyTorch sketch of this objective follows the list).
  • Diffusion-based Generative Variants

    • Variants that generate the next sentence embedding through an iterative denoising (diffusion) process have also been included in the code release.
  • Models operating in a quantized SONAR space

    • The model operates within the quantized SONAR embedding space; this method has not yet been released but will be added in future updates.
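
For a feel of the MSE-regression objective mentioned above, here is a minimal, self-contained PyTorch sketch. It is not the official implementation (which lives in the repository linked above); it only illustrates the idea of a causal Transformer regressing the next sentence embedding.

```python
# Toy stand-in for the MSE-regression LCM: a causal Transformer over sentence
# embeddings with a linear head that regresses the next embedding.
import torch
import torch.nn as nn

EMBED_DIM = 1024   # SONAR sentence embeddings are 1024-dimensional
SEQ_LEN, BATCH = 16, 4

class TinyBaseLCM(nn.Module):
    def __init__(self, dim: int = EMBED_DIM, layers: int = 2, heads: int = 8):
        super().__init__()
        block = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, batch_first=True, norm_first=True
        )
        self.backbone = nn.TransformerEncoder(block, num_layers=layers)
        self.head = nn.Linear(dim, dim)

    def forward(self, concepts: torch.Tensor) -> torch.Tensor:
        # Causal mask: position t may only attend to concepts 0..t.
        mask = nn.Transformer.generate_square_subsequent_mask(concepts.size(1))
        return self.head(self.backbone(concepts, mask=mask))

model = TinyBaseLCM()
optim = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Stand-in for a batch of documents already encoded sentence-by-sentence by
# SONAR: shape (batch, num_sentences, embed_dim).
concepts = torch.randn(BATCH, SEQ_LEN, EMBED_DIM)

pred = model(concepts[:, :-1])                        # predict concept t+1 from 0..t
loss = nn.functional.mse_loss(pred, concepts[:, 1:])  # the MSE regression objective
loss.backward()
optim.step()
print(f"toy MSE loss: {loss.item():.4f}")
```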

    Reproduction and fine-tuning

    The code repository provides complete steps and configurations to support the reproduction of the training and fine-tuning of the following models:

    • 1.6B parameter MSE LCM
    • Two-tower Diffusion LCM.

    Internet user comments

    Expectations and potential

    1. "If successful, it will open up countless possibilities."

      “This is it. if this goes well, this will open a lot of possibilities. This should be the place of Self-Learning AI.”
      —— @nojukuramu (Reddit)

      Commenters see this as an important milestone: compared to traditional token-level modeling, the abstract semantic representation of LCM could bring new breakthroughs to the thinking and reasoning abilities of AI systems.

    2. Abstraction first, then concretization

      “This is what I mean by planning in latent space. Think abstractly before concretizing to language.”
      —— @fredcunningham_ (X.com)

      LCM plans concepts in the latent space and then materializes them into language, a pattern that takes the simulation of human thought further. It goes beyond mere token manipulation, offering more possibilities for flexibility in complex tasks.

    Comparison with other models

    1. LCM vs O1: A more fundamental transformation

      “O1 is not a fundamental change from LLMs in the same way as LCM could be. It's just a waste of more tokens in hopes of a pseudo system 2.”
      —— @NunyaBuzor (Reddit)

      The commenter contrasts LCM with the O1 model, arguing that O1 remains a superficial improvement at the token level, whereas LCM could represent a more fundamental paradigm shift. It breaks free from the constraints of traditional language models and focuses on higher-level semantic reasoning.

    2. Token vs Concept

      “Tokens are the low level layer, concepts are the higher level abstraction. It's like learning about concepts from transformers that were only dealing with token level and then staying at the concept level for generation and encoding.”
      —— @ethermelody (Reddit)

      Tokens are basic units, while concepts are higher-level abstractions. The idea behind LCM is to process language in a way closer to human thinking, achieving deeper semantic expression through the generation and encoding of concepts.

    Future Prospects

    1. Impact on AGI

      “LLMs should think more like humans to achieve AGI.”
      —— @Yuchenj_UW (X.com)

      Human-level artificial general intelligence (AGI) requires a thought process closer to that of humans. The emergence of LCM offers new possibilities for this goal.

    2. The Future of the Llama Series

      “Llama 4 is going to be 🔥🔥🔥🔥”
      —— @pbadeer (X.com)

      For Meta's AI ecosystem, it is anticipated that the technology of LCM will be applied to future versions of the Llama series, further advancing the evolution of language models.