
Meta's latest LLM release: Llama 3.2 Lightweight and Multimodal.

Thanks to Faith's invitation, I attended the Meta Connect conference in the U.S. Faith was very kind and gave me plenty of information about the AI-related sub-forums, but I couldn't find the sub-venue for the Llama launch myself. Still, I watched the introduction of the latest model, Llama 3.2, from outside the venue on my phone, roaming on my Singapore number (since I couldn't find Wi-Fi either). (A lonely experience 😂)

Versions

Lightweight 1B and 3B models

These are Meta's lightest and most efficient models, capable of running on mobile and edge devices. They excel at multilingual text generation and tool calling. They let developers build personalized, agentic applications that run entirely on-device, with strong privacy because data never leaves the device. For example, an application could summarize the last 10 received messages, extract key to-do items, and use tool calls to send calendar invitations for follow-up meetings.
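To make that on-device example concrete, here is a minimal sketch of how such an app could call a locally served 3B model with a tool definition. It assumes the model is exposed through an OpenAI-compatible /v1/chat/completions endpoint (for example via Ollama or a local Llama Stack distribution); the endpoint URL, model tag, tool name, and message data are all illustrative assumptions, not part of Meta's release.

```python
# A minimal sketch, not Meta's reference code. It assumes a Llama 3.2 3B model
# is already served locally behind an OpenAI-compatible /v1/chat/completions
# endpoint; the tool name and message data are made up for illustration.
import json
import requests

LOCAL_ENDPOINT = "http://localhost:11434/v1/chat/completions"  # assumed local server

# Hypothetical tool the model may call to schedule a follow-up meeting.
tools = [{
    "type": "function",
    "function": {
        "name": "send_calendar_invite",
        "description": "Create a calendar invitation for a follow-up meeting.",
        "parameters": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "start_time": {"type": "string", "description": "ISO 8601 datetime"},
                "attendees": {"type": "array", "items": {"type": "string"}},
            },
            "required": ["title", "start_time"],
        },
    },
}]

last_10_messages = ["... ten recent messages pulled from the device ..."]

response = requests.post(LOCAL_ENDPOINT, json={
    "model": "llama3.2:3b",  # assumed model tag
    "messages": [
        {"role": "system", "content": "Summarize the user's messages, list to-do items, "
                                      "and schedule follow-ups with the provided tool."},
        {"role": "user", "content": "\n".join(last_10_messages)},
    ],
    "tools": tools,
})
choice = response.json()["choices"][0]["message"]

# If the model decides a meeting is needed, it returns a structured tool call
# alongside (or instead of) plain text; the app then executes it locally.
for call in choice.get("tool_calls") or []:
    args = json.loads(call["function"]["arguments"])
    print("Would create invite:", args)
print("Summary / to-dos:", choice.get("content"))
```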

There are two main advantages to running these models locally. First, prompts and responses feel almost instantaneous because everything is processed on the device. Second, keeping messages and calendar data off the cloud preserves privacy and makes the whole application more privacy-friendly. Because processing happens locally, the application can explicitly control which queries stay on the device and which need to be handled by larger models in the cloud.
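Below is a rough sketch of that routing idea. The heuristics, thresholds, and model labels are my own illustrative assumptions, not something prescribed by Llama 3.2 or Llama Stack.

```python
# A rough sketch of the local-vs-cloud routing idea described above; the
# heuristics and model names are illustrative assumptions only.

PRIVATE_HINTS = ("message", "calendar", "contact", "photo")

def is_private(query: str) -> bool:
    """Keep anything touching personal data on the device."""
    return any(hint in query.lower() for hint in PRIVATE_HINTS)

def route(query: str) -> str:
    """Decide which model should answer a query."""
    if is_private(query) or len(query) < 200:
        return "on-device Llama 3.2 1B/3B"   # fast, data never leaves the phone
    return "cloud Llama 3.2 11B/90B"         # heavier reasoning in the cloud

if __name__ == "__main__":
    print(route("Summarize my last 10 messages"))                         # -> on-device
    print(route("Write a detailed market analysis of ..." + "x" * 300))   # -> cloud
```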

Multimodal 11B and 90B models

These models support image reasoning scenarios such as document-level understanding (including charts and graphs), image captioning, and visual grounding (such as accurately locating objects in an image from a natural language description). For example, a small-business owner can ask which month had the best sales last year, and Llama 3.2 can reason over an existing chart and quickly provide an answer. The model can also reason over maps, answering questions like where a hiking trail becomes steeper or how long a specific trail marked on the map is. The 11B and 90B models bridge the gap between vision and language: they extract image details, understand the scene, and generate concise descriptions that can serve as image captions and help tell a story.
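As a concrete illustration of the chart question above, here is a minimal sketch that sends an image plus a question to the 11B Vision model. It assumes the model sits behind an OpenAI-compatible endpoint that accepts base64-encoded image content; the endpoint URL, model tag, and the "sales_2023.png" file are made up for the example.

```python
# A minimal sketch of the chart question described above; endpoint, model tag,
# and image file are assumptions for illustration.
import base64
import requests

ENDPOINT = "http://localhost:11434/v1/chat/completions"  # assumed server

with open("sales_2023.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = requests.post(ENDPOINT, json={
    "model": "llama3.2-vision:11b",  # assumed model tag
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "Which month had the best sales last year, and why?"},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
})
print(response.json()["choices"][0]["message"]["content"])
```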

Llama Stack

Llama Stack provides a complete, seamless toolchain for building agentic applications.

This codebase contains the API specifications for Llama Stack, as well as the API providers and Llama Stack distributions. Llama Stack aims to define and standardize the core building blocks for generative AI applications across the entire development lifecycle: from model training and fine-tuning, through product evaluation, to building and running AI agents in production. Beyond defining standards, Meta is also developing providers for the Llama Stack APIs, both open-source implementations and partner integrations, so that developers can assemble AI solutions from consistent, interchangeable building blocks across platforms.

The full release content includes:

  • Llama CLI: used for building, configuring, and running Llama Stack distributions
  • Client code: in Python, Node, Kotlin, and Swift (see the sketch after this list)
  • Docker containers: for the Llama Stack Distribution Server and Agents API Provider
  • Multiple distributions
    • Single-node Llama Stack distribution: via Meta's internal implementation and Ollama
    • Cloud Llama Stack distributions: supporting AWS, Databricks, Fireworks, and Together
    • On-device Llama Stack distribution: on iOS, implemented via PyTorch ExecuTorch
    • On-premises Llama Stack distribution: supported by Dell
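As a small illustration of the pieces listed above, here is a rough sketch of calling a locally running Llama Stack distribution with the Python client. The method names, default port, and model identifier follow early llama-stack-client examples and may differ in the version you install, so treat this as an assumption rather than a stable API.

```python
# A rough sketch of talking to a local Llama Stack distribution with the
# Python client listed above. Method names, port, and model identifier are
# assumptions based on early examples and may differ by version.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:5000")  # assumed distribution server

response = client.inference.chat_completion(
    model="Llama3.2-3B-Instruct",  # assumed model identifier
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize what a Llama Stack distribution is."},
    ],
)
print(response.completion_message.content)
```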