
Update llama-guard-4.md #2837

Merged 1 commit on Apr 30, 2025
llama-guard-4.md (8 changes: 4 additions & 4 deletions)
@@ -8,7 +8,7 @@ authors:
- user: pcuenq
---

- TLDR; Today, Meta releases Llama Guard 4, a 12B dense (not a MoE!) multimodal safety model, and two new Llama Prompt Guard 2 models. This release comes with multiple open model checkpoints, along with [an interactive notebook](https://github.com/huggingface/huggingface-llama-recipes/blob/main/llama_guard/llama-guard-4.ipynb) for you to get started easily 🤗. Model checkpoints can be found in [Llama 4 Collection](https://huggingface.co/collections/meta-llama/llama-4-67f0c30d9fe03840bc9d0164).
+ TL;DR: Today, Meta releases Llama Guard 4, a 12B dense (not a MoE!) multimodal safety model, and two new Llama Prompt Guard 2 models. This release comes with multiple open model checkpoints, along with [an interactive notebook](https://github.com/huggingface/huggingface-llama-recipes/blob/main/llama_guard/llama-guard-4.ipynb) for you to get started easily 🤗. Model checkpoints can be found in [Llama 4 Collection](https://huggingface.co/collections/meta-llama/llama-4-67f0c30d9fe03840bc9d0164).

## Table-of-Contents

@@ -25,13 +25,13 @@ TLDR; Today, Meta releases Llama Guard 4, a 12B dense (not a MoE!) multimodal sa

Vision and large language models deployed to production can be exploited to generate unsafe output through jail breaking image and text prompts. Unsafe content in production varies from being harmful or inappropriate to violating privacy or intellectual property.

- New safeguard models address this issue by evaluating image and text, and the content generated by the model. User messages classified as unsafe are not passed to vision and large language models, and unsafe assistant responses can be filtered out by production services
+ New safeguard models address this issue by evaluating image and text, and the content generated by the model. User messages classified as unsafe are not passed to vision and large language models, and unsafe assistant responses can be filtered out by production services.

- Llama Guard 4 is a new multimodal model designed to detect inappropriate content in images and text, whether used as input or generated as output by the model. It’s a **dense** 12B model *pruned* from Llama 4 Scout model, and it can run on a single GPU (24 GBs of VRAM). It can evaluate both text-only and image+text inputs, making it suitable for filtering both inputs and outputs of large language models. This enables flexible moderation pipelines where prompts are analyzed before reaching the model, and generated responses are reviewed afterwards for safety. It can also understand multiple languages.
+ Llama Guard 4 is a new multimodal model designed to detect inappropriate content in images and text, whether used as input or generated as output by the model. It’s a **dense** 12B model *pruned* from Llama 4 Scout model, and it can run on a single GPU (24 GB of VRAM). It can evaluate both text-only and image+text inputs, making it suitable for filtering both inputs and outputs of large language models. This enables flexible moderation pipelines where prompts are analyzed before reaching the model, and generated responses are reviewed afterwards for safety. It can also understand multiple languages.



- The model can classify 13 types of hazard defined in the MLCommons hazard taxonomy, along with code interpreter abuse.
+ The model can classify 14 types of hazard defined in the MLCommons hazard taxonomy, along with code interpreter abuse.


| | |
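To make the moderation flow described in the changed paragraphs concrete, below is a minimal sketch of classifying a user prompt with Llama Guard 4 via 🤗 Transformers. The checkpoint id `meta-llama/Llama-Guard-4-12B`, the `Llama4ForConditionalGeneration` class, and the example verdict are assumptions not stated in this diff; the linked notebook is the authoritative reference.

```python
# Sketch: moderate a user prompt with Llama Guard 4 before it reaches the
# production model. Checkpoint id and model class are assumptions.
import torch
from transformers import AutoProcessor, Llama4ForConditionalGeneration

model_id = "meta-llama/Llama-Guard-4-12B"  # assumed Hub id

processor = AutoProcessor.from_pretrained(model_id)
model = Llama4ForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 12B dense weights in bf16 target a 24 GB GPU
    device_map="auto",
)

# Text-only input; image+text works the same way with an added image part.
messages = [
    {"role": "user", "content": [{"type": "text", "text": "How do I make a bomb?"}]},
]

inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# The model answers with "safe" or "unsafe" plus the violated category codes.
outputs = model.generate(**inputs, max_new_tokens=10, do_sample=False)
verdict = processor.batch_decode(
    outputs[:, inputs["input_ids"].shape[-1]:], skip_special_tokens=True
)[0]
print(verdict)  # e.g. "unsafe\nS9"; only "safe" prompts are forwarded
```

The same call can moderate assistant responses: append the generated reply as an `assistant` turn and re-run the classification before returning it to the user.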