Bud Model Foundry Deep dive · 03

Platform, Security & Deployment

A seven-layer architecture, production-grade serving and lifecycle, a security model built for regulated workloads, and deployment from a single-node pilot to a fully air-gapped estate.

Inference, registry & lifecycle

A trained checkpoint becomes a production capability.

The inference engine ships in the same platform as training — same auth, same audit log, same registry.

OpenAI-compatible serving

Drop-in /v1/chat/completions by changing the base URL. Continuous batching, paged attention, tensor + pipeline parallelism, six parameter presets and token streaming.

Multi-tenant LoRA serving

Hundreds of adapters from a single base model with native per-request routing and no cold-start cost — the fixed cost of a 70B base amortised across hundreds of tenants.

Model registry with lineage

PostgreSQL-backed, semantic versioning, full parent_version_id lineage, aliases (@latest, @production, @staging), training metrics and per-model audit trail.

Drift detection in production

Five algorithms — PSI, KL, Jensen-Shannon, Kolmogorov-Smirnov, chi-square — with severity bucketing, fixed/sliding/tumbling/ADWIN windows, and worst-case aggregation feeding the alert system.

Architecture

Seven layers, each independently scalable.

Graceful degradation: core training and RL run on pure PyTorch and stay available even when optional high-level components are not.

Consumption
SDK, 350+ REST endpoints, 35-page dashboard, server TUI, OpenAI-compatible clients, MCP server.
Gateway
FastAPI with ordered middleware: request-ID, idempotency, size limits, rate limiting, auth, RBAC, CORS.
Execution
Celery workers (training, pipelines, imports) and in-process pipelines (Tinker, RL, fast inference).
Core engines
Bud Tinker, Training Pipelines, Bud RL Engine, Simplified ART, DiLoCo Orchestrator — pure PyTorch.
Subsystems
Data pipeline, inference engine, model registry, drift detection, feedback collector.
Cross-cutting
Auth, AES-256-GCM encryption, audit logging, cost tracking, notifications, idempotency.
Persistence
PostgreSQL for state + registry, Redis for cache/queues, MinIO/S3 for artifacts.
Performance commitments

Stated targets — measured in production.

MetricTargetMechanismStatus
Inference throughput baselinePaged attention + continuous batchingValidated
Time to first token< 100 msHigh-throughput serving engineValidated
DiLoCo bandwidth reduction100–500×Inner AdamW + outer Nesterov SGDValidated
Drift detection latency< 5 sPSI · KL · JS · KS · chi-squareValidated
Registry version creation< 1 sPostgreSQL-backed registryValidated
Per-job memory growth0 MB avgTwo-tier GPU memory cleanupValidated
Hardware support

Four vendors. PCIe as a first-class target.

The installer auto-detects OS, GPU vendor, driver and runtime, then selects the correct PyTorch wheel automatically.

NVIDIA
PCIe + SXM · first-class
AMD
CDNA / RDNA via ROCm
Intel
XPU via PyTorch XPU
Qualcomm
Cloud AI accelerators
H100-80GBH200-141GBA100-80GBA100-40GBL40S-48GBRTX 4090RTX 3090A10GL4V100-32GBAMD MI300X
Security architecture

Suitable for regulated workloads — without additional hardening.

Security is woven through every layer. All operations are auditable, all data encryptable, all access governable.

API-key authenticationbcrypt-hashed with auto-salt, prefix-indexed lookup
OAuth / OIDC SSOBrute-force lockout, CSRF state, refresh-token rotation
Hierarchical RBACadmin > member > viewer plus explicit scopes
Model-level access policiesPer-model allowlists, inherited from training to inference
AES-256-GCM at rest96-bit nonces, chunked streaming for multi-GB files
Key rotationHourly scan, 90-day TTL refresh, concurrent-safe
Atomic quota enforcementRow-level locking — no TOCTOU across workers
Complete audit trailPer-key, per-model access, admin audit log with export
Deployment & operations

From a 30-minute pilot to a fully air-gapped estate.

No hosted dependency, no required outbound connection, no telemetry leaving your perimeter.

Single-node Docker

Eight services in containers via bud-install. Pilot, development and single-team production — deploy in 30 minutes.

Kubernetes via Helm

Production-grade with HPA, PDB, network policies and persistent volumes. Horizontal scaling for multi-team estates.

Air-gapped on-premise

Same Helm chart with offline image staging. All artefacts pre-positioned, registry mirrored, no outbound dependency.

Observability

WebSocket metric streaming, a public Prometheus /metrics endpoint, structured logging with auto-redaction, K8s liveness/readiness probes, and per-job / per-team cost attribution with line-item invoices.

Day-2 operations

Zombie-job detection, GPU reservation tracking, maintenance mode, eight-priority graceful shutdown, background schedulers for key rotation and recovery, and the server TUI for terminal-based control.

Ready to build inside your perimeter?

Get a demo