Introducing

Bud Sentinel

AI guardrails powered by Resource Aware Attention

State-of-the-art accuracy. Runs on any CPU. Performance that surpasses SOTA models running on $15,000 GPU servers.

CPU-Native · SOTA Accuracy · Zero GPU Cost
Why Sentinel?

Today's GenAI is
built for GPUs.

On the commodity hardware most organizations actually have, today's GenAI systems struggle to perform. This often forces teams to run guardrail systems on GPUs, turning what should be lightweight safeguards into an unexpectedly expensive part of the stack.

Costly
Hard to Scale
Inaccessible

GPUs are fast but expensive. CPUs are affordable but unusably slow.

CPU · 16 Workers
~5s
Latency at 8,000 input tokens
Affordable hardware - unusable speed
GPU · A100 ($15K)
~300ms
Fast enough - but the guardrail costs
as much as the language model itself
The Optimization Wall

You can optimize the runtime.
But returns diminish.

No amount of tuning compensates for an architecture designed for GPUs.

[Chart: performance vs. optimization effort, showing diminishing returns]
The Solution

Resource Aware Attention

To truly democratize GenAI, you have to commoditize it. That requires rethinking the model architecture itself: not optimizing for GPUs, but building for the hardware most organizations actually have.

Resource Aware Attention is designed from the ground up for CPUs, maximizing their strengths while maintaining model-level accuracy. The result is a fundamentally more efficient way to run GenAI, without the cost and dependency of specialized infrastructure.

First Application

We built Sentinel with
Resource Aware Attention

Guardrails are non-negotiable in any serious GenAI deployment, and they sit on the critical path of every request: every millisecond they add is paid on the way in and again on the way out.

User Request
Authentication
⛨ Input Guardrail
Language Model
⛨ Output Guardrail
Response
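The pipeline above can be sketched as a request handler that checks the prompt before the model runs and the response before it is returned. This is a minimal illustration, not Sentinel's API: `classify`, `llm`, and the keyword-based stub are all hypothetical stand-ins (a real deployment would call the guardrail service, e.g. over gRPC).

```python
def classify(text: str) -> str:
    """Hypothetical guardrail classifier stub: returns 'safe' or 'unsafe'.
    A real system would call a trained classifier, not match keywords."""
    return "unsafe" if "ignore previous instructions" in text.lower() else "safe"


def llm(prompt: str) -> str:
    """Stand-in for the language model call."""
    return f"Echo: {prompt}"


def handle_request(prompt: str) -> str:
    # Input guardrail: runs before the model, so its latency is added
    # to every single request.
    if classify(prompt) != "safe":
        return "Request blocked by input guardrail."
    response = llm(prompt)
    # Output guardrail: runs after the model, before the user sees anything.
    if classify(response) != "safe":
        return "Response blocked by output guardrail."
    return response
```

Because both checks sit inline, the guardrail's per-classification latency is incurred twice per request — which is why it must be cheap.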
Accuracy

More Accurate Than Leading
Guardrail Systems

Sentinel achieves both a low attack success rate and a low false refusal rate; every other model trades one for the other.

Performance

Faster on any CPU
than competitors on a $15K GPU.

Every competing guardrail was tested on an NVIDIA A100. Sentinel was tested on a laptop. Sentinel won.

Sentinel on a laptop vs. everyone else on an A100
Per-classification latency · 512 tokens · Sentinel includes gRPC network overhead
Bud Sentinel
i7 Laptop · CPU
8.39ms
Prompt Guard 2
A100 · $15K GPU
18.52ms
ArchGuard
A100 · $15K GPU
19.07ms
PIGuard
A100 · $15K GPU
19.00ms
↑ 2.3x faster on a laptop CPU than competitors on a $15,000 GPU
Same CPU hardware. Completely different architecture.
Per-classification latency on Intel Xeon 8272CL · 512 tokens
Bud Sentinel
Xeon 8272CL
5.99ms
Prompt Guard 2
Xeon 8272CL
334ms
56x slower
ArchGuard
Xeon 8272CL
380ms
63x slower
Prompt Guard
Xeon 8272CL
402ms
67x slower
↑ 56-67x faster on the exact same hardware
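The 56x-67x multiples follow directly from the per-classification latencies quoted above (all measured on the same Xeon 8272CL at 512 tokens). A quick sketch to recompute them — the numbers come from this page, the script itself is only illustrative:

```python
# Per-classification latency in milliseconds, Xeon 8272CL, 512 tokens.
sentinel_ms = 5.99
competitors_ms = {
    "Prompt Guard 2": 334,
    "ArchGuard": 380,
    "Prompt Guard": 402,
}

# Slowdown relative to Sentinel on identical hardware.
for name, ms in competitors_ms.items():
    print(f"{name}: {ms / sentinel_ms:.0f}x slower")
```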
Production-grade throughput
Sentinel handles enterprise traffic on commodity CPUs alone - no GPUs in the loop.
Xeon 8592+
256 vCPU · 8K Tokens
1,432 req/s
p50: 0.70ms
16K tokens · 761 req/s
65K tokens · 89 req/s
Xeon 8272CL
16 vCPU · 512 Tokens
2,749 req/s
p50: 25.17ms
8K tokens · 508 req/s
65K tokens · 101 req/s
EPYC 9V74
16 vCPU · 8K Tokens
57 req/s
p50: 17.60ms
16K tokens · 29 req/s
65K tokens · 6 req/s
The bottom line
Sentinel redefines what hardware you need for guardrails.
Bud Sentinel
Any laptop CPU
CPU
8.39 ms
512 tokens · incl. network overhead · no GPU
Leading Guardrails
NVIDIA A100 · $15,000
GPU
~18.9 ms
512 tokens only · max seq 512 · needs 16 parallel for 8K
Bud Sentinel
Xeon 8592+ · Server
Server
0.70 ms
8K tokens · 5,000 concurrent · 1,412 req/s
Leading Guardrails
Same CPU · No GPU
CPU
~845 ms
512 tokens only · 100x slower on identical hardware
Requires a $15K GPU to reach 18ms
Ubiquitous Safety

Deploy guardrails
everywhere.

Guardrails stop being a cost center. They become infrastructure.

📱
Edge Devices
Phones, IoT, embedded
🤖
Every Agent
Per-action at zero cost
☁️
Every Cloud
Multi-cloud, on-prem, hybrid
What's Next

Sentinel is just
the beginning.

We're rebuilding the entire GenAI stack with Resource Aware Attention.

Guardrails · Sentinel · Live
Embeddings · Next
Rerankers · Next
Routing · Planned
Compression · Planned
Caching · Planned

Redesign the foundation.
Redefine what's possible.

Not by waiting for cheaper hardware - but by building an architecture that works with what already exists.