Architecture - BRICKS

BRICKS Buttress splits cleanly into four layers — clients, servers, backend cores, and shared hardware guardrails — connected by a JSON-RPC WebSocket protocol and a UDP autodiscovery transport. The diagram below is the target architecture; the Implemented vs planned section calls out what ships today and what’s still on the roadmap.

Complete architecture

Solid orange arrows are runtime data paths. Dashed green arrows mark where the Hardware Guardrails package is reused on both sides — the same code runs on the client and server. Dashed gray nodes (Thermal Printer, Future Backends) are on the roadmap but not yet shipped.

Layer responsibilities

Layer	Role
Buttress Server	Combined server with all features — WebSocket RPC, autodiscovery, file transfer
Buttress Backend	Backend without autodiscovery (for embedded use)
Autodiscovery	UDP broadcast and HTTP endpoint discovery
Buttress Client	Client library used by Foundation devices to connect to servers
Backend Core	Generator implementations (LLM, STT, MLX, future thermal printer)
Hardware Guardrails	Shared capability detection and scoring logic — the same code runs on both client and server so they grade themselves on the same scale

How capability comparison works

When a Foundation device starts a generator, the client and server exchange hardware information so the system can pick the right side to run on:

1. Client collects local capabilities: GPU/CPU info, available memory, and model metadata (layers, embedding size, KV cache requirements).
2. Client sends its capabilities to the server along with the model identifier and requested context size.
3. Server evaluates both sides by running the same guardrails code on the client’s reported capabilities and on its own. Each side gets a 0-100 performance score and a memory-fit verdict.
4. Server returns a recommendation: local, buttress, or either.
5. Client decides based on its strategy (prefer-local, prefer-buttress, prefer-best) and the recommendation. See Use Buttress from Foundation.

Because client and server share the guardrails package, the scores are directly comparable — there is no calibration drift between sides.
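The exchange described in this section can be sketched in a few lines of Python. This is an illustration only, not the shipped guardrails code: the `Verdict` type, the `recommend` and `decide` functions, the 10-point tie margin, and every tie-breaking rule are assumptions made for the example. Only the 0-100 score, the memory-fit verdict, the three recommendation values, and the three strategy names come from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    """Result of grading one side: 0-100 performance score plus memory fit."""
    score: int
    fits_memory: bool


def recommend(client: Verdict, server: Verdict, margin: int = 10) -> str:
    """Server side: return "local", "buttress", or "either" (hypothetical rule)."""
    # Assumption: if exactly one side can fit the model in memory, it wins outright.
    if client.fits_memory != server.fits_memory:
        return "local" if client.fits_memory else "buttress"
    # Assumption: scores within `margin` points count as a tie.
    # (If neither side fits, this sketch still just compares scores.)
    if abs(client.score - server.score) <= margin:
        return "either"
    return "local" if client.score > server.score else "buttress"


def decide(strategy: str, recommendation: str,
           client: Verdict, server: Verdict) -> str:
    """Client side: combine the local strategy with the server's recommendation."""
    if strategy == "prefer-local":
        # Assumption: stay local unless the server explicitly recommends offloading.
        return "buttress" if recommendation == "buttress" else "local"
    if strategy == "prefer-buttress":
        # Assumption: offload unless the server explicitly recommends staying local.
        return "local" if recommendation == "local" else "buttress"
    if strategy == "prefer-best":
        if recommendation != "either":
            return recommendation
        # Assumption: break a tie by raw score; equal scores stay local.
        return "buttress" if server.score > client.score else "local"
    raise ValueError(f"unknown strategy: {strategy}")


# Example: a modest device next to a stronger server.
device = Verdict(score=35, fits_memory=True)
server = Verdict(score=80, fits_memory=True)
recommendation = recommend(device, server)
choice = decide("prefer-best", recommendation, device, server)
print(recommendation, choice)
```

The comparison is only meaningful because both `Verdict` values are assumed to come from the same scoring function, which is the property the shared guardrails package provides.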

Implemented vs planned

Implemented

Workspace JWT authentication — Ed25519 issuer per workspace, short-lived { k:'ba', w_id, … } access tokens. See Workspace binding.
UDP autodiscovery with signed announcements — ANNOUNCE/QUERY/RESPONSE on UDP 8089, with per-backend hardware caps in the announcement. Each bound server signs every packet with its registered Ed25519 announce key; launchers verify the signature with a 30-second replay window. Protocol version is 2.0. See LAN auto-discovery.
Generator registry with reference counting — multiple clients can share a loaded model, and the server cleans up automatically when refcount hits zero.
Queue management — hardware-aware request queuing for STT, with parallel-slot tracking visible on the /status dashboard.
GGML LLM, MLX LLM, and GGML STT backends.
Capability detection and scoring — same guardrails code on both sides.
File transfer for STT — devices upload audio to POST /buttress/upload.
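As an illustration of the reference-counted generator registry listed above, here is a minimal Python sketch. The class name `GeneratorRegistry`, its `acquire`/`release`/`refcount` methods, and the `load`/`unload` callbacks are hypothetical names chosen for the example and say nothing about the server's real interface. The sketch is also single-threaded; a real server would need locking around the counters.

```python
from typing import Callable, Dict, List


class GeneratorRegistry:
    """Shares one loaded generator per key and unloads it when nobody uses it."""

    def __init__(self, load: Callable[[str], object],
                 unload: Callable[[object], None]) -> None:
        self._load = load
        self._unload = unload
        self._entries: Dict[str, List] = {}  # key -> [generator, refcount]

    def acquire(self, key: str) -> object:
        """Return the shared generator for `key`, loading it on first use."""
        entry = self._entries.get(key)
        if entry is None:
            entry = [self._load(key), 0]
            self._entries[key] = entry
        entry[1] += 1
        return entry[0]

    def release(self, key: str) -> None:
        """Drop one reference; unload the generator when the count hits zero."""
        entry = self._entries[key]  # raises KeyError if `key` was never acquired
        entry[1] -= 1
        if entry[1] == 0:
            del self._entries[key]
            self._unload(entry[0])

    def refcount(self, key: str) -> int:
        entry = self._entries.get(key)
        return entry[1] if entry is not None else 0


# Example: two clients share one model; it is unloaded after the second release.
unloaded = []
registry = GeneratorRegistry(load=lambda key: {"model": key},
                             unload=unloaded.append)
first = registry.acquire("example-model")
second = registry.acquire("example-model")
registry.release("example-model")
registry.release("example-model")
```

Keeping the load and unload steps as injected callbacks keeps the counting logic independent of any particular backend.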

Planned

Docker distribution — pre-built images with CUDA / Vulkan support.
Multi-server pool selection — caps-aware ranking when multiple bound servers are present in the same workspace (currently last-seen wins).
Thermal printer backend — additional offload target beyond LLM/STT.

Related

Workspace binding

How JWT auth fits into the architecture.

LAN auto-discovery

The UDP transport, announcement payload, and capability scoring.
