
Llama 4 Models ‘Herd’: What to Know About Meta’s New Open-Source Multimodal AI

New open-source models announced by Meta include Scout, Maverick & Behemoth

Meta has officially released its latest family of open-source large language models: Llama 4 Scout, Llama 4 Maverick, and the forthcoming Llama 4 Behemoth. The new models are most notable for their massive parameter counts and context windows. Here’s what you need to know about Meta’s newest open-source release.

Key Features of the Llama 4 Series

1. Multimodal Capabilities

Llama 4 models are natively multimodal, designed to process and reason over text and image inputs after pre-training on large amounts of text, image, and video data. While previous models in the Llama line focused primarily on language, the Llama 4 family moves decisively into true multimodal territory, setting up direct competition with OpenAI’s GPT-4 and Google’s Gemini.

2. Extended Context Windows

Llama 4 Scout supports up to 10 million tokens of context — among the largest in the field. This enables applications like multi-document Q&A, comprehensive codebase summarization, and long-horizon planning for AI agents.
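
To make that concrete, here is a minimal sketch of multi-document Q&A against a long-context model served behind an OpenAI-compatible endpoint (for example, a local vLLM server). The base URL and model name below are placeholders, not official values.

```python
# Minimal sketch: multi-document Q&A over a very long context.
# Assumes Llama 4 Scout is served behind an OpenAI-compatible endpoint
# (e.g., a local vLLM server). The base_url and model name are placeholders.
from pathlib import Path

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

# Concatenate several documents into a single prompt. With a 10M-token window,
# whole report collections or codebases can fit without chunking or retrieval.
docs = [p.read_text() for p in Path("reports").glob("*.txt")]
context = "\n\n---\n\n".join(docs)

response = client.chat.completions.create(
    model="llama-4-scout",  # placeholder model name
    messages=[
        {"role": "system", "content": "Answer strictly from the provided documents."},
        {"role": "user", "content": f"{context}\n\nQuestion: Which risks appear in more than one report?"},
    ],
)
print(response.choices[0].message.content)
```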

3. Introduction of Mixture-of-Experts (MoE) Architecture

For the first time, Meta has implemented a Mixture-of-Experts (MoE) architecture in its Llama models. This design allows the model to activate only a subset of its parameters for a given task, enhancing computational efficiency and enabling the scaling of model capacity without a proportional increase in computational cost.

Competitors such as Google’s Gemini and DeepSeek’s V3 have also adopted MoE architectures, reflecting an industry-wide shift toward this efficient design paradigm. While there has been speculation that OpenAI’s GPT-4 employs an MoE architecture, OpenAI has not publicly confirmed these details.
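
As a rough illustration of the idea (not Meta’s actual implementation), here is a toy top-k MoE layer in PyTorch: a small gating network scores the experts for each token, and only the two highest-scoring experts actually run, so active compute stays small while total capacity grows.

```python
# Toy illustration of Mixture-of-Experts routing (not Meta's implementation).
import torch
import torch.nn as nn


class TinyMoE(nn.Module):
    def __init__(self, dim=64, num_experts=16, top_k=2):
        super().__init__()
        self.gate = nn.Linear(dim, num_experts)  # router: scores each expert per token
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        ])
        self.top_k = top_k

    def forward(self, x):                                 # x: (tokens, dim)
        scores = self.gate(x)                             # (tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)    # keep only the top-k experts
        weights = weights.softmax(dim=-1)                 # normalize their mixing weights
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                  # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out


moe = TinyMoE()
tokens = torch.randn(8, 64)
print(moe(tokens).shape)  # torch.Size([8, 64]); only 2 of 16 experts run per token
```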

4. Open Weights with Guardrails

Similar to prior Llama models, Scout and Maverick are released under a source-available license. Developers have access to the model weights, but usage is governed by Meta’s Responsible Use Guide to prevent misuse.
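
If you want to experiment locally, a minimal sketch of pulling the weights from Hugging Face might look like the following. It assumes you have accepted Meta’s license for the gated repo and are logged in; the repo ID shown is an assumption, not an official listing.

```python
# Minimal sketch: downloading the open weights from Hugging Face.
# Assumes you have accepted Meta's license for the gated repo and are logged in
# (`huggingface-cli login`); the repo ID below is an assumption.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="meta-llama/Llama-4-Scout-17B-16E-Instruct",   # assumed repo ID
    allow_patterns=["*.safetensors", "*.json"],             # weights and configs only
)
print("Weights downloaded to:", local_dir)
```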

Meet the Models: Llama 4 Scout, Maverick, and Behemoth

Llama 4 Scout

Designed for performance and efficiency, Scout activates 17B parameters per forward pass using an MoE architecture with 16 total experts (109B parameters total). It’s optimized to fit on a single NVIDIA H100 GPU with Int4 quantization, making it highly accessible for independent developers and startups.

Llama 4 Maverick

With 17B active parameters and a total capacity of 400B across 128 experts, Maverick is Meta’s flagship multimodal model. It supports image understanding and fluent multilingual reasoning in 12 languages. Early demonstrations show impressive performance across visual question answering, dense summarization, and even story generation from image inputs.
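
As an example of what that looks like in practice, here is a hedged sketch of image understanding through an OpenAI-compatible endpoint. The base URL, model name, and image URL are placeholders, not official values.

```python
# Minimal sketch: image understanding with Maverick through an
# OpenAI-compatible endpoint. The base_url, model name, and image URL
# are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="llama-4-maverick",  # placeholder model name
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this chart and summarize its key trend."},
            {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},
        ],
    }],
)
print(response.choices[0].message.content)
```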

Llama 4 Behemoth (Coming Soon)

Currently in training, Behemoth is expected to activate 288B parameters out of a total of roughly 2T. Meta claims it already outperforms Claude 3.7 Sonnet and GPT-4.5 on STEM reasoning and retrieval benchmarks. Its release will signal Meta’s entry into the ultra-large model tier dominated by OpenAI.

Meta’s latest models have been met with broad excitement in the open-source and research communities. The release of open-weight models that support high-token contexts and visual reasoning is seen as a strong statement of intent, especially as major rivals shift toward closed systems.

However, some early critiques have emerged. Researchers noted limited documentation around fine-tuning practices and raised questions about benchmark transparency. Others pointed out that Scout and Maverick don’t yet match GPT-4 or Claude 3 Opus in general-purpose reasoning, particularly in multilingual and coding tasks. Notably, Meta’s longtime VP of AI Research, Joelle Pineau, recently announced her departure from the company. Pineau led FAIR (Meta’s Fundamental AI Research team) for eight years and was instrumental in building the Llama program.

Llama 4 models, especially Scout and Maverick, open up new opportunities for developers looking to build advanced agents and real-time AI experiences. With context windows stretching into the millions and support for image inputs, these models are particularly well-suited for:

  • Complex document reasoning
  • Media-rich assistant tools
  • Data-to-answer pipelines for live, web-connected agents (see the sketch below)
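
As a rough sketch of that last pattern, the snippet below fetches fresh data from a placeholder endpoint (not Dappier’s actual API) and asks the model to answer only from it. The base URL and model name are placeholders as well.

```python
# Rough sketch of a data-to-answer pipeline: pull fresh structured data from a
# real-time source, then have the model answer only from that data.
# The data URL is a hypothetical placeholder, not an actual Dappier endpoint.
import requests
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

# 1. Fetch live data (placeholder endpoint).
headlines = requests.get("https://example.com/api/news?topic=ai", timeout=10).json()

# 2. Ground the model's answer in the fetched data rather than stale training knowledge.
response = client.chat.completions.create(
    model="llama-4-scout",  # placeholder model name
    messages=[
        {"role": "system", "content": "Answer only from the supplied headlines."},
        {"role": "user", "content": f"Headlines: {headlines}\n\nWhat happened in AI today?"},
    ],
)
print(response.choices[0].message.content)
```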

Build with Llama + Dappier

At Dappier, we see Llama 4 as further evidence that AI is shifting toward agentic systems operating across diverse modalities. With Dappier, those agents can be improved and made more reliable with real-time data — news, finance, weather, and more — syndicated directly into LLM workflows.

By pairing trusted, real-time data with models like Llama 4, AI builders can deliver more accurate, useful, and monetizable experiences — without needing to build fragile scraping pipelines or rely on outdated summaries.

Ready to make your application AI-ready?

Explore how Dappier enables AI builders to integrate real-time data with open-weight models like Scout and Maverick today.

Browse our Documentation

Schedule a Demo
