Bud Model Foundry

The sovereign training platform for the agentic enterprise.

Build, fine-tune, post-train and agentic-train open models on your own infrastructure — on the GPUs you already own, with research-grade control and production-grade operations.

At a glance

Bud Model Foundry, in numbers.

The headline figures behind the platform — capability surface, performance commitments, and operational breadth.

118+
Supported open-weight models
500×
Inter-node bandwidth reduction via DiLoCo
350+
Platform REST endpoints
4
GPU vendors — NVIDIA, AMD, Qualcomm, Intel
6
Training stages — PT, SFT, RM, PPO, DPO, KTO
9
Quantization formats — BNB, GPTQ, AWQ, AQLM, FP8 and more
10
Graders, including LLM-as-judge and tool-call
260+
Data operators across text, image, audio, video, code
Design commitments

Five non-negotiables, woven through every layer.

Each commitment maps to a specific failure of the existing market — and each one is a feature of the platform from the foundation up, not a checkbox added in a release note.

01

Sovereignty by deployment

One-command on-premise install. No outbound dependency. AES-256-GCM encryption at rest. Air-gapped operation as a first-class deployment pattern.

02

Multi-vendor by design

NVIDIA, AMD, Qualcomm and Intel GPUs. PCIe form factors as first-class hardware. Mixed-vendor fleets supported within a single training job.

03

Agentic-first by purpose

Three RL training modes, four built-in environments, ten graders, five recipes, and a teaching-metaphor API for non-researcher operators.

04

End-to-end by scope

Data prep, training, RL, OpenAI-compatible serving, model registry with lineage, drift detection, feedback collection — in one platform, one auth surface, one audit log.

05

Production-grade from day one

API key authentication, OAuth/OIDC, RBAC with model-level policies, atomic quotas, rate limiting, structured audit, Prometheus metrics. Engineered as a platform.

What you can do with it

End-to-end training workflows, all inside your perimeter.

The concrete things your team can run on day one — not as separate tools stitched together, but as first-class workflows on a single platform.

i.

Fine-tune open models on domain data

Take a 7B–70B model and apply Full FT, LoRA, QLoRA, DoRA, LoRA+, OFT or top-N freeze through one configuration surface.

ii.

Continue pre-training on proprietary corpora

Inject domain vocabulary and knowledge into a base model with causal-language-modelling loss before instruction tuning.

iii.

Train custom reward models

Train a separate reward model on your preference data, then run online RLHF or DPO to align a base model to your standards.

iv.

Train agents end-to-end with reinforcement learning

Run RL against your own tools, APIs, and environments — with verifiable rewards, LLM-as-judge graders, or hybrid combinations.

v.

Run improvement sessions on deployed models

Take a deployed model, evaluate against a curriculum, identify weaknesses, and post-train to address them — through the Simplified ART API.

vi.

Curate large training corpora

Process and filter through 260+ data operators with distributed Ray-based execution and full reproducibility.

vii.

Serve fine-tuned models

OpenAI-compatible endpoints with multi-tenant adapter routing — hundreds of custom adapters from a single base model.

viii.

Track everything in a model registry

Full lineage from dataset to checkpoint, with statistical drift detection across five algorithms once the model is in production.

Capability map

Twelve capability pillars, one platform.

Each pillar addresses a specific operational need. Together they constitute a single platform with one authentication surface, one audit log, and one observability layer.

Five interfaces

One platform, five ways to drive it.

Same auth, same audit log, same governance — whichever interface you choose. Pick the one tuned to your persona.

Python SDK

Researchers · ML engineers

Sync and async clients with feature parity. Fluent builders for training, LoRA, DiLoCo, QLoRA configs.

REST API

Platform integrators

350+ endpoints with OpenAPI spec. Idempotency keys, webhooks, WebSocket subscriptions for live metrics.

Web Dashboard

Operators · managers · SMEs

35-page Next.js GUI with progressive disclosure. Visual data-pipeline DAG editor and Tinker Lab.

Server TUI

Site-reliability engineers

Textual-based terminal UI in any SSH session. Service health, GPU gauges, log tailing, air-gapped friendly.

MCP Server

Autonomous agents

Training capabilities exposed as MCP tools. Agents drive their own improvement loops with full audit governance.

Explore the developer experience in depth
Architecture & performance

Engineered for graceful degradation and operational independence.

Seven layers, each independently scalable. The core training and RL capabilities run on pure PyTorch and stay available even when optional high-level components are not. Each layer is monitored, secured and upgraded on its own schedule.

Inference
Throughput vs baseline tokens/sec on supported hardware.
<100ms
Time to first token
High-throughput serving with paged attention and continuous batching.
100–500×
DiLoCo
Inter-node bandwidth reduction. Up to 4,800× with int4 + adapter sync.
<5s
Drift detection
Per-batch latency across PSI, KL, JS, KS and chi-square.
Layer 7
Consumption
SDK · REST · Dashboard · TUI · MCP · OpenAI clients
Layer 6
Gateway
FastAPI middleware: request-ID, idempotency, rate-limit, auth, RBAC, CORS
Layer 5
Execution
Celery workers · in-process pipelines · background schedulers
Layer 4
Core engines
Bud Tinker · Training Pipelines · RL Engine · Simplified ART · DiLoCo
Layer 3
Platform subsystems
Data Pipeline · Inference Engine · Model Registry · Drift · Feedback
Layer 2
Cross-cutting services
Auth · encryption · audit · cost tracking · notifications · idempotency
Layer 1
Persistence
PostgreSQL · Redis · MinIO/S3 · external IdP
How it compares

The market splits into five archetypes.
Bud Model Foundry sits in the fifth.

An honest, capability-by-capability comparison against the four major training-platform archetypes. Full matrix and head-to-head positioning lives on the comparison page.

Requirement
Hosted
Hyperscaler
DIY OSS
Bud Foundry
Sovereignty / data residency
Fails
Partial
Pass
Pass
Predictable cost at scale
Per-token
GPU-hour + egress
CapEx
License
Multi-vendor GPU support
Single
Provider catalog
DIY
4 vendors
Agentic RL stack built-in
Limited
Partial
DIY (6–12 mo)
In-box
Time to first production job
Days
Weeks
6–12 months
Days
Air-gapped deployment
No
No
Possible
First-class
Lifecycle scope
Training only
Provider stack
Possible
Full lifecycle
Who it's for

Six audiences, six different ways to win.

The platform speaks differently to each audience. Pick the one that matches your organisation and read the use case in your register.

Deployment options

Deployable into any environment you operate.

No hosted dependency. No required outbound connection. No telemetry leaving your perimeter. Pick the pattern that matches the context.

01

Single-node Docker Compose

Pilot · development · single-team production

Eight services in containers (API, worker, frontend, PostgreSQL, Redis, MinIO, identity provider, RSA key bootstrap). Deploy in 30 minutes via the bud-install command-line tool.

02

Kubernetes via Helm

Multi-team production · sovereign cloud

Production-grade with horizontal scaling. Helm chart with HPA, PDB, network policies, persistent volumes, four conditional Bitnami subcharts. Standard K8s liveness and readiness probes.

03

Air-gapped on-premise

Maximum sovereignty · defence · classified

Same Helm chart with offline image staging. All artefacts pre-positioned, container registry mirrored, no outbound dependency. The default deployment pattern for sovereign-AI mandates.

Engagement models

Three ways to engage with the platform.

Procure the way your organisation prefers. Each model meets a different operating posture.

Model 01

Self-managed software license

Annual or multi-year software license. Bud delivers the software, documentation, and support. Your team handles deployment and operations end-to-end.

Talk to us
Model 02

Managed deployment

Software license plus a managed-services engagement. The Bud team handles deployment, configuration, upgrades and day-2 operations alongside your team.

Talk to us
Model 03

Strategic partnership

Multi-year partnership combining Bud Model Foundry, AI Foundry and other Ecosystem components, with co-engineered solutions for your specific use cases.

Talk to us

Build the AI you actually need.

Run on the GPUs you actually own. Train inside the perimeter your governance team requires. Deploy with production-grade authentication, audit, encryption and operations on day one.