BRICKS Buttress splits cleanly into four layers (clients, servers, backend cores, and shared hardware guardrails) connected by a JSON-RPC WebSocket protocol and a UDP autodiscovery transport. The diagram below is the target architecture; the Implemented vs planned section calls out what ships today and what's still on the roadmap.
Complete architecture
Layer responsibilities
| Layer | Role |
|---|---|
| Buttress Server | Combined server with all features — WebSocket RPC, autodiscovery, file transfer |
| Buttress Backend | Backend without autodiscovery (for embedded use) |
| Autodiscovery | UDP broadcast and HTTP endpoint discovery |
| Buttress Client | Client library used by Foundation devices to connect to servers |
| Backend Core | Generator implementations (LLM, STT, MLX, future thermal printer) |
| Hardware Guardrails | Shared capability detection and scoring logic — the same code runs on both client and server so they grade themselves on the same scale |
How capability comparison works
When a Foundation device starts a generator, the client and server exchange hardware information so the system can pick the right side to run on:
- Client collects local capabilities. GPU/CPU info, available memory, model metadata (layers, embedding size, KV cache requirements).
- Client sends capabilities to the server along with the model identifier and requested context size.
- Server evaluates both sides by running the same guardrails code on the client's reported caps and on its own. Each side gets a 0-100 performance score and a memory-fit verdict.
- Server returns a recommendation: `local`, `buttress`, or `either`.
- Client decides based on its strategy (`prefer-local`, `prefer-buttress`, `prefer-best`) and the recommendation. See Use Buttress from Foundation.
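
The exchange above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the shipped guardrails logic: the `SideReport` type, the `recommend` and `decide` functions, and the exact meaning given to each strategy are hypothetical. Only the recommendation values (`local`, `buttress`, `either`), the strategy names, the 0-100 score, and the memory-fit verdict come from the description above.

```python
from dataclasses import dataclass


@dataclass
class SideReport:
    """Hypothetical result of running the guardrails code for one side."""
    score: int         # 0-100 performance score
    fits_memory: bool  # memory-fit verdict for the requested model and context


def recommend(client: SideReport, server: SideReport) -> str:
    """Server side: reduce two reports to 'local', 'buttress', or 'either'.

    Assumption: the recommendation only narrows the choice when exactly one
    side passes the memory-fit check.
    """
    if client.fits_memory and not server.fits_memory:
        return "local"
    if server.fits_memory and not client.fits_memory:
        return "buttress"
    # Both fit (or neither does, which this sketch does not treat specially).
    return "either"


def decide(strategy: str, recommendation: str,
           client_score: int, server_score: int) -> str:
    """Client side: combine the configured strategy with the recommendation.

    Assumption: the strategy only matters when the server answers 'either'.
    """
    if recommendation in ("local", "buttress"):
        return recommendation
    if recommendation != "either":
        raise ValueError(f"unknown recommendation: {recommendation}")
    if strategy == "prefer-local":
        return "local"
    if strategy == "prefer-buttress":
        return "buttress"
    if strategy == "prefer-best":
        # Assumed tie-break: stay local unless the server scores strictly higher.
        return "buttress" if server_score > client_score else "local"
    raise ValueError(f"unknown strategy: {strategy}")
```

With these assumed rules, a client scoring 40 against a server scoring 90 stays local under `prefer-local` and offloads under `prefer-best`, provided the model fits in memory on both sides.
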
Implemented vs planned
Implemented
- Workspace JWT authentication: Ed25519 issuer per workspace, short-lived `{ k:'ba', w_id, … }` access tokens. See Workspace binding.
- UDP autodiscovery with signed announcements: `ANNOUNCE`/`QUERY`/`RESPONSE` on UDP `8089`, with per-backend hardware caps in the announcement. Each bound server signs every packet with its registered Ed25519 announce key; launchers verify the signature with a 30-second replay window. Protocol version is `2.0`. See LAN auto-discovery.
- Generator registry with reference counting: multiple clients can share a loaded model, and the server cleans up automatically when refcount hits zero.
- Queue management: hardware-aware request queuing for STT, with parallel-slot tracking visible on the `/status` dashboard.
- GGML LLM, MLX LLM, and GGML STT backends.
- Capability detection and scoring: same guardrails code on both sides.
- File transfer for STT: devices upload audio to `POST /buttress/upload`.
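
To make the reference-counting item concrete, here is a minimal sketch of a registry that shares one loaded generator between clients and unloads it when the last client releases it. The class name `GeneratorRegistry`, its methods, and the `load`/`unload` callbacks are hypothetical and are not taken from the Buttress codebase. The sketch is also single-threaded; a real server would need to guard these operations against concurrent access.

```python
from typing import Callable, Dict, Generic, List, TypeVar

T = TypeVar("T")


class GeneratorRegistry(Generic[T]):
    """Hypothetical reference-counted registry keyed by model identifier."""

    def __init__(self, load: Callable[[str], T], unload: Callable[[T], None]) -> None:
        self._load = load
        self._unload = unload
        # model_id -> [generator, refcount]
        self._entries: Dict[str, List] = {}

    def acquire(self, model_id: str) -> T:
        """Return the shared generator, loading it on first use."""
        entry = self._entries.get(model_id)
        if entry is None:
            # Load before storing, so a failed load leaves no stale entry.
            entry = [self._load(model_id), 0]
            self._entries[model_id] = entry
        entry[1] += 1
        return entry[0]

    def release(self, model_id: str) -> None:
        """Drop one reference; unload when the count reaches zero."""
        entry = self._entries[model_id]  # KeyError if nothing was acquired
        entry[1] -= 1
        if entry[1] == 0:
            del self._entries[model_id]
            self._unload(entry[0])

    def refcount(self, model_id: str) -> int:
        """Number of clients currently holding the model (0 if not loaded)."""
        entry = self._entries.get(model_id)
        return entry[1] if entry else 0
```

Two clients that call `acquire` with the same model identifier receive the same object and trigger a single load; the `unload` callback runs only after both have called `release`.
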
Planned
- Docker distribution — pre-built images with CUDA / Vulkan support.
- Multi-server pool selection — caps-aware ranking when multiple bound servers are present in the same workspace (currently last-seen wins).
- Thermal printer backend — additional offload target beyond LLM/STT.
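
The difference between the current last-seen behaviour and a caps-aware ranking can be illustrated as follows. Because pool selection is still planned, everything here is an assumption made for illustration: the `DiscoveredServer` record, its fields, and the ordering used by `rank_by_caps` are hypothetical, not a description of the eventual design.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DiscoveredServer:
    """Hypothetical record for a bound server seen via autodiscovery."""
    name: str
    last_seen: float   # time of the most recent announcement, in seconds
    score: int         # 0-100 performance score derived from announced caps
    fits_memory: bool  # whether the requested model fits on this server


def pick_last_seen(servers: List[DiscoveredServer]) -> DiscoveredServer:
    """Current behaviour as described above: the most recently seen server wins."""
    return max(servers, key=lambda s: s.last_seen)


def rank_by_caps(servers: List[DiscoveredServer]) -> List[DiscoveredServer]:
    """One possible caps-aware order: memory fit first, then score, then recency."""
    return sorted(
        servers,
        key=lambda s: (s.fits_memory, s.score, s.last_seen),
        reverse=True,
    )
```

With last-seen selection a slow server that announced most recently is chosen over a faster one; under the assumed ranking, the fastest server that fits the model comes first.
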
Related
- Workspace binding: How JWT auth fits into the architecture.
- LAN auto-discovery: The UDP transport, announcement payload, and capability scoring.