Introducing

Bud Sentinel

AI guardrails powered by Resource Aware Attention

State-of-the-art accuracy. Runs on any CPU. Performance that surpasses SOTA models running on $15,000 GPU servers.

CPU-Native · SOTA Accuracy · Zero GPU Cost
Why Sentinel?

Today's GenAI is
built for GPUs.

On the commodity hardware most organizations actually have, today's GenAI systems struggle to perform. This often forces teams to run guardrail systems on GPUs, turning what should be lightweight safeguards into an unexpectedly expensive part of the stack.

Costly
Hard to Scale
Inaccessible

GPUs are fast but expensive. CPUs are affordable but unusably slow.

CPU · 16 Workers
~5s
Latency at 8,000 input tokens
Affordable hardware - unusable speed
GPU · A100 ($15K)
~300ms
Fast enough - but the guardrail costs
as much as the language model itself
The Optimization Wall

You can optimize the runtime.
But returns diminish.

No amount of tuning compensates for an architecture designed for GPUs.

[Chart: performance vs. optimization effort, showing diminishing returns]
The Solution

Resource Aware Attention

To truly democratize GenAI, you have to commoditize it. That requires rethinking the model architecture itself: not optimizing for GPUs, but building for the hardware most organizations actually have.

Resource Aware Attention is designed from the ground up for CPUs, maximizing their strengths while maintaining model-level accuracy. The result is a fundamentally more efficient way to run GenAI, without the cost and dependency of specialized infrastructure.

First Application

We built Sentinel with
Resource Aware Attention

Guardrails are non-negotiable in any serious GenAI deployment, and they sit on the critical path of every request: every millisecond they add is paid on the way in and again on the way out.

User Request
Authentication
⛨ Input Guardrail
Language Model
⛨ Output Guardrail
Response
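The pipeline above can be sketched as a request handler that checks the prompt before the model runs and the response before it is returned. This is a minimal illustration, not Sentinel's API: `classify`, `llm`, and the keyword-based stub are all hypothetical stand-ins (a real deployment would call the guardrail service, e.g. over gRPC).

```python
def classify(text: str) -> str:
    """Hypothetical guardrail classifier stub: returns 'safe' or 'unsafe'.
    A real system would call a trained classifier, not match keywords."""
    return "unsafe" if "ignore previous instructions" in text.lower() else "safe"


def llm(prompt: str) -> str:
    """Stand-in for the language model call."""
    return f"Echo: {prompt}"


def handle_request(prompt: str) -> str:
    # Input guardrail: runs before the model, so its latency is added
    # to every single request.
    if classify(prompt) != "safe":
        return "Request blocked by input guardrail."
    response = llm(prompt)
    # Output guardrail: runs after the model, before the user sees anything.
    if classify(response) != "safe":
        return "Response blocked by output guardrail."
    return response
```

Because both checks sit inline, the guardrail's per-classification latency is incurred twice per request — which is why it must be cheap.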
Accuracy

More Accurate Than Leading
Guardrail Systems

Sentinel achieves both a low attack success rate and a low false refusal rate; every other model trades one for the other.

Performance

Faster on any CPU
than competitors on a $15K GPU.

Every competing guardrail was tested on an NVIDIA A100. Sentinel was tested on a laptop. Sentinel won.

Sentinel on a laptop vs. everyone else on an A100
Per-classification latency · 512 tokens · Sentinel includes gRPC network overhead
Bud Sentinel
i7 Laptop · CPU
8.39ms
Prompt Guard 2
A100 · $15K GPU
18.52ms
ArchGuard
A100 · $15K GPU
19.07ms
PIGuard
A100 · $15K GPU
19.00ms
↑ 2.3x faster on a laptop CPU than competitors on a $15,000 GPU
Same CPU hardware. Completely different architecture.
Per-classification latency on Intel Xeon 8272CL · 512 tokens
Bud Sentinel
Xeon 8272CL
5.99ms
Prompt Guard 2
Xeon 8272CL
334ms
56x slower
ArchGuard
Xeon 8272CL
380ms
63x slower
Prompt Guard
Xeon 8272CL
402ms
67x slower
↑ 56-67x faster on the exact same hardware
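The 56x-67x multiples follow directly from the per-classification latencies quoted above (all measured on the same Xeon 8272CL at 512 tokens). A quick sketch to recompute them — the numbers come from this page, the script itself is only illustrative:

```python
# Per-classification latency in milliseconds, Xeon 8272CL, 512 tokens.
sentinel_ms = 5.99
competitors_ms = {
    "Prompt Guard 2": 334,
    "ArchGuard": 380,
    "Prompt Guard": 402,
}

# Slowdown relative to Sentinel on identical hardware.
for name, ms in competitors_ms.items():
    print(f"{name}: {ms / sentinel_ms:.0f}x slower")
```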
Production-grade throughput
Sentinel handles enterprise traffic on commodity CPUs alone - no GPUs in the loop.
Xeon 8592+
256 vCPU · 8K Tokens
1,432 req/s
p50: 0.70ms
16K tokens · 761 req/s
65K tokens · 89 req/s
Xeon 8272CL
16 vCPU · 512 Tokens
2,749 req/s
p50: 25.17ms
8K tokens · 508 req/s
65K tokens · 101 req/s
EPYC 9V74
16 vCPU · 8K Tokens
57 req/s
p50: 17.60ms
16K tokens · 29 req/s
65K tokens · 6 req/s
The bottom line
Sentinel redefines what hardware you need for guardrails.
Bud Sentinel
Any laptop CPU
CPU
8.39 ms
512 tokens · incl. network overhead · no GPU
Leading Guardrails
NVIDIA A100 · $15,000
GPU
~18.9 ms
512 tokens only · max seq 512 · needs 16 parallel for 8K
Bud Sentinel
Xeon 8592+ · Server
Server
0.70 ms
8K tokens · 5,000 concurrent · 1,412 req/s
Leading Guardrails
Same CPU · No GPU
CPU
~845 ms
512 tokens only · 100x slower on identical hardware
Requires a $15K GPU to reach 18ms
Ubiquitous Safety

Deploy guardrails
everywhere.

Guardrails stop being a cost center. They become infrastructure.

📱
Edge Devices
Phones, IoT, embedded
🤖
Every Agent
Per-action at zero cost
☁️
Every Cloud
Multi-cloud, on-prem, hybrid
What's Next

Sentinel is just
the beginning.

We're rebuilding the entire GenAI stack with Resource Aware Attention.

Guardrails · Sentinel · Live
Embeddings · Next
Rerankers · Next
Routing · Planned
Compression · Planned
Caching · Planned

Redesign the foundation.
Redefine what's possible.

Not by waiting for cheaper hardware - but by building an architecture that works with what already exists.