We Just Released the World’s Largest Open Dataset for AI Guardrails

Sep 8, 2025 | By Bud Ecosystem

Ensuring that language models behave safely, ethically, and within intended boundaries is one of the most pressing challenges in AI today. That’s why we’re excited to share the release of the largest open dataset ever published for AI guardrails: budecosystem/guardrail-training-data.

Why Guardrails Matter

The rise of language models has unlocked extraordinary possibilities: they can write with fluency, solve complex problems, and assist with everything from coding to creative storytelling. Yet, behind this potential lies a fragile balance. Without safeguards, these systems can be pushed off course—exploited, manipulated, or led into producing harmful and unsafe outputs. They are susceptible to:

  • Jailbreaks and prompt injections: Users can sometimes trick a model into bypassing its built-in safety rules, leading it to generate instructions for harmful or unethical activities.
  • Toxic or harmful outputs: Without guardrails, models can produce hate speech, offensive language, or content that incites violence.
  • Misinformation and deception: Models can produce false or misleading information that looks credible.
  • Sensitive or dangerous guidance: From self-harm encouragement to privacy violations or cyberattack instructions, unfiltered outputs can cross into high-risk territory.
  • Bias and discrimination: Trained on large swaths of internet data, models may unintentionally reproduce stereotypes or unfair judgments.

These vulnerabilities are not edge cases—they are structural risks inherent to large-scale generative AI. And that’s why robust guardrails are essential: to make sure AI systems are not just intelligent, but also responsible and trustworthy.

The urgency of strong AI guardrails isn’t hypothetical—it’s real. Just last week, The Guardian reported on a tragic case where ChatGPT was linked to a user’s suicide, raising questions about accountability and the limits of current safety measures. Incidents like this are sobering reminders that the risks of unguarded AI go beyond offensive language or misinformation—they can touch the most vulnerable parts of human life.

Guardrails—systems that detect and filter harmful or unsafe content—are essential to making AI deployment safe, reliable, and aligned with human values. The challenge is that building robust guardrails requires high-quality, large-scale data across diverse harm categories.

Our Dataset at a Glance

When we set out to build the Bud AI Foundry, one of our top priorities was clear: create the strongest guardrail system possible. Guardrails aren’t an afterthought—they are the foundation of safe and responsible AI.

To achieve that, we curated and trained on a dataset of about 4.5 million labeled samples—the largest collection ever assembled for guardrail systems. This scale gives us the diversity and depth needed to handle the wide spectrum of risks that modern AI faces, from jailbreak attempts to misinformation, hate speech, and beyond.

Now, we’re taking the next step: open-sourcing the dataset. We believe that safety in AI should not be a competitive advantage locked inside one company—it should be a shared resource. By releasing this dataset under an open license, we’re empowering researchers, startups, and enterprises everywhere to build, test, and improve their own guardrails. Because when it comes to AI safety, collaboration is the only way forward.

Key stats:

  • Size: 4,495,893 samples
  • Tasks: Text classification and text generation
  • Language: English
  • License: Apache-2.0 (open for research and commercial use)

Coverage Across 26 Harm Categories

Unlike smaller moderation datasets, this release is designed to be comprehensive and fine-grained, spanning 26 harm categories.

The dataset includes the following harm categories:

  • jailbreak_prompt_injection
  • violence_aiding_and_abetting_incitement
  • hate_speech_offensive_language
  • self_harm
  • discrimination_stereotype_injustice
  • sexually_explicit_adult_content
  • financial_crime_property_crime_theft
  • malware_hacking_cyberattack
  • privacy_violation
  • fraud_deception_misinformation
  • drug_abuse_weapons_banned_substance
  • child_abuse
  • animal_abuse
  • terrorism_organized_crime
  • non_violent_unethical_behavior
  • code_vulnerabilities
  • misinformation_regarding_ethics_laws_and_safety
  • And more…

This granularity allows developers to build multi-class moderation systems or tailored safety filters that align with specific organizational needs.
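
For illustration, here is a minimal sketch of how the category labels could back a multi-class safety classifier using the Hugging Face datasets and transformers libraries. The column names ("text", "category"), the base model, and the training settings are assumptions made for this sketch, not the schema or recipe used by Bud; check the dataset card before adapting it.

```python
# pip install datasets transformers torch
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          DataCollatorWithPadding, Trainer, TrainingArguments)

# Assumed schema: a "text" column with the sample content and a "category"
# column with one of the 26 harm labels. Adjust to the real field names.
ds = load_dataset("budecosystem/guardrail-training-data", split="train")
# Optionally downsample for a quick experiment:
# ds = ds.shuffle(seed=0).select(range(100_000))

labels = sorted(ds.unique("category"))   # derive the label set from the data
label2id = {name: i for i, name in enumerate(labels)}

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def preprocess(batch):
    enc = tokenizer(batch["text"], truncation=True, max_length=256)
    enc["labels"] = [label2id[c] for c in batch["category"]]
    return enc

encoded = ds.map(preprocess, batched=True, remove_columns=ds.column_names)

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=len(labels),
    id2label={i: name for name, i in label2id.items()},
    label2id=label2id,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="guardrail-classifier",
                           per_device_train_batch_size=32,
                           num_train_epochs=1),
    train_dataset=encoded,
    data_collator=DataCollatorWithPadding(tokenizer),
)
trainer.train()
```

A tailored filter can then be as simple as flagging only the subset of the 26 labels that matters for a given deployment.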

The chart below shows the dataset size distribution across harm categories, highlighting that benign content makes up the largest share (nearly 2GB), followed by jailbreak prompts, while other categories like violence, hate speech, self-harm, and privacy violations are smaller but still represented—illustrating both the scale and diversity of the dataset.

Distribution of Samples

The chart below reveals how the dataset is structured around different types of harmful content. The largest share comes from violence-related content, followed by non-violent unethical behavior and hate speech, showing clear emphasis on the most pressing risks. Mid-sized portions include financial crime and discrimination, while smaller but essential categories like drugs/weapons, self-harm, and privacy violations ensure broader coverage. Together, this distribution shows that the dataset isn’t just massive in scale—it’s carefully balanced to capture both the most frequent harms and the most critical edge cases.

Risk Severity Distribution

The dataset is intentionally weighted toward high-severity risks, ensuring that guardrail systems are stress-tested against the most urgent and potentially harmful threats, while still maintaining representation of the low-severity and critical tiers for full-spectrum coverage.

The dataset is heavily weighted toward high-risk content (60%), such as violence, self-harm, and drugs, ensuring strong guardrail coverage against the most dangerous threats. Medium-risk categories like discrimination and hate speech make up 32%, reflecting their frequency and social impact. Critical risks—child abuse and terrorism—account for 5%, representing the most severe harms, while low-risk content such as bias and controversial topics contributes just 3%, adding nuance without dominating the dataset.

Category Co-occurrence Matrix

Harmful categories rarely exist in isolation. The dataset reflects how risks like violence, hate, and discrimination intertwine, which is critical for building guardrails that can detect complex, overlapping harms rather than treating each risk in isolation.

This chart highlights how harmful categories often overlap within the dataset, underscoring the multi-dimensional nature of AI risks. Violence shows the strongest co-occurrence, frequently intersecting with hate speech (44K), financial crime (101K), and self-harm (54K), while non-violent unethical behavior overlaps with discrimination (32K) and hate speech (33K). Hate speech and discrimination are closely linked (54K), and self-harm often appears alongside sexual content (99K) and drugs/weapons (132K). Even smaller categories like animal abuse and terrorism connect with violence, politics, and sexual content, illustrating that harmful content rarely exists in isolation and requires guardrails capable of detecting complex, overlapping threats.
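
As a rough illustration of how such a matrix can be built, the sketch below counts pairwise label co-occurrences from multi-label annotations. The toy samples and the idea of a per-sample list of labels are assumptions for the example; the real dataset's multi-label field may be named and structured differently.

```python
import itertools
from collections import Counter

# Toy multi-label annotations; in practice each entry would be the set of
# harm labels attached to one dataset sample.
samples = [
    ["violence_aiding_and_abetting_incitement", "hate_speech_offensive_language"],
    ["self_harm", "drug_abuse_weapons_banned_substance"],
    ["violence_aiding_and_abetting_incitement", "financial_crime_property_crime_theft"],
]

co_occurrence = Counter()
for labels in samples:
    # Count each unordered pair of labels that appears together in a sample.
    for a, b in itertools.combinations(sorted(set(labels)), 2):
        co_occurrence[(a, b)] += 1

for (a, b), count in co_occurrence.most_common():
    print(f"{a} <-> {b}: {count}")
```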

Applications and Impact

This dataset opens up new possibilities for the AI community by serving as both a training ground and a benchmark for safety. With nearly 4.5 million labeled examples across dozens of harm categories, it enables developers to train far more robust safety classifiers—models capable of catching subtle risks and recognizing a much wider spectrum of harmful behaviors than ever before.

It also provides the opportunity to test guardrails under truly adversarial conditions, including jailbreaks and prompt injections that are specifically designed to bypass protections. By stress-testing systems against these edge cases, teams can identify weaknesses early and build guardrails that stand up to real-world challenges.
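
One lightweight way to do this is to replay a slice of an adversarial category through an existing moderation check and measure how much it catches. In the sketch below, my_guardrail is a hypothetical placeholder for your own filter, and the "text" and "category" column names are assumptions about the schema.

```python
from datasets import load_dataset

def my_guardrail(text: str) -> bool:
    """Hypothetical placeholder for your own moderation check; True means flagged."""
    return "ignore previous instructions" in text.lower()

# Stream the dataset to avoid downloading all ~4.5M samples up front.
ds = load_dataset("budecosystem/guardrail-training-data", split="train", streaming=True)

flagged = total = 0
for row in ds:
    if row["category"] != "jailbreak_prompt_injection":
        continue
    total += 1
    flagged += my_guardrail(row["text"])
    if total == 10_000:          # evaluate on a sample of the category
        break

print(f"Recall on jailbreak slice: {flagged / total:.2%}")
```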

Beyond practical deployment, the dataset contributes to the broader field of research by offering a shared benchmark for AI safety and content moderation. Instead of working in silos, researchers and practitioners can evaluate their methods against the same large-scale, open dataset—helping the community move faster, compare results more fairly, and push the boundaries of what responsible AI can achieve.

Usage
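
The dataset can be pulled directly from the Hugging Face Hub with the datasets library. A minimal sketch is shown below; the split name and the filtering column are assumptions, so check the dataset card for the exact schema.

```python
# pip install datasets
from datasets import load_dataset

# Repository ID from the link in this post; the "train" split is an assumption.
ds = load_dataset("budecosystem/guardrail-training-data", split="train")

print(ds)        # number of rows and column names
print(ds[0])     # inspect a single labeled sample

# Example: pull out one harm category for a targeted filter
# (assumes a "category" column holding the harm label).
jailbreaks = ds.filter(lambda row: row["category"] == "jailbreak_prompt_injection")
print(len(jailbreaks))
```

For large-scale experiments, streaming mode (passing streaming=True to load_dataset) avoids downloading all 4.5 million samples at once.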

Looking Ahead

This dataset is just the beginning. We believe that open, large-scale resources are critical for building safe, trustworthy AI. By making this dataset freely available, we invite researchers, developers, and organizations to collaborate in advancing AI safety at scale. Together, we can make sure that the next generation of AI systems isn’t just powerful—but also safe, aligned, and worthy of trust.

👉 Explore the dataset here: budecosystem/guardrail-training-data

Bud Ecosystem

Our vision is to simplify intelligence—starting with understanding and defining what intelligence is, and extending to simplifying complex models and their underlying infrastructure.
