The Buttress server runs from npm and exposes a single executable, bricks-buttress. It works on macOS, Linux, and Windows; for GPU acceleration on Linux, install CUDA or Vulkan drivers before starting the server.
Hardware
| Resource | Recommended |
|---|---|
| GPU | NVIDIA (CUDA), AMD/Intel (Vulkan), or Apple Silicon (Metal) |
| RAM | At least 2× the size of the largest model you plan to load |
| Disk | Enough free space in cache_dir to hold every model you download |
| Network | Wired LAN — UDP broadcasts must reach Foundation devices |
Install from npm
Requires Node.js 22+ (or Bun). A global install puts the bricks-buttress binary on your PATH.
Run the server
Without a config, the server starts on port 2080 with sensible defaults.
CLI flags
| Flag | Description |
|---|---|
| -p, --port <port> | Port to listen on (default: 2080) |
| -c, --config <path\|toml> | Path to a TOML file or an inline TOML string |
| -v, --version | Print the server version |
| -h, --help | Show help |
Port precedence: --port flag → [server] port in TOML → default 2080.
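That resolution order can be sketched as a small shell function. This is an illustration of the documented precedence only, not the server's own code; resolve_port and its two positional arguments are hypothetical names introduced here.

```shell
# Hypothetical helper mirroring the documented precedence:
#   1. the --port flag, 2. [server] port from TOML, 3. the default 2080.
# $1 = value given with -p/--port (empty if the flag was not passed)
# $2 = [server] port read from the TOML config (empty if not set)
resolve_port() {
  if [ -n "$1" ]; then
    echo "$1"
  elif [ -n "$2" ]; then
    echo "$2"
  else
    echo 2080
  fi
}
```

For instance, resolve_port 3000 4000 yields the flag value, while resolve_port "" "" falls through to the default.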
Environment variables
| Variable | Effect |
|---|---|
| NODE_ENV | Set to development for verbose logs |
| ENABLE_OPENAI_COMPAT_ENDPOINT | Set to 1 to enable the OpenAI-compat endpoint |
| ENABLE_ANTHROPIC_MESSAGES_ENDPOINT | Set to 1 to enable the Anthropic messages endpoint |
| HF_TOKEN | Hugging Face token for downloading gated models |
These variables can also be set under [env] in your TOML config.
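As a minimal sketch, the variables from the table can be exported in the shell before starting the server. The values are the ones the table lists; HF_TOKEN is left out because its value is account-specific.

```shell
# Turn on verbose logs and both optional compatibility endpoints
# for the current shell session (values taken from the table above).
export NODE_ENV=development
export ENABLE_OPENAI_COMPAT_ENDPOINT=1
export ENABLE_ANTHROPIC_MESSAGES_ENDPOINT=1

# Then start the server from this same shell so it inherits them:
#   bricks-buttress
```

Exported variables only last for the current shell session, so they need to be set again in a new terminal.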
macOS GPU memory
On Apple Silicon Macs, the GPU is allowed about 70% of system memory by default; the cap can be raised before loading large models.
Verify
When the server starts, it prints a LAN-reachable URL in a message like "Visit http://<ip>:2080/status to see status via LAN". Open that URL, or http://localhost:2080/status from the same machine, to load the status dashboard.
The dashboard shows, per backend (GGML-LLM, GGML-STT, MLX-LLM):
- The list of loaded generators and which ones currently hold an active model context
- Parallel slot usage and queued requests for STT
- Recent model-load history and completion / transcription history (collapsible)
Because /status is reachable over the LAN, host the server on a trusted LAN.
For machine-readable output, query the JSON endpoints directly.
/buttress/info is what Foundation devices read during HTTP fallback discovery; see LAN auto-discovery.
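A minimal sketch of querying /buttress/info with curl follows. The helper names buttress_info_url and fetch_buttress_info are hypothetical, the host and port defaults assume a local server on the default port, and the shape of the returned JSON is not described here, so the output is printed as-is.

```shell
# Hypothetical helper: build the /buttress/info URL.
# $1 = host (defaults to localhost), $2 = port (defaults to 2080)
buttress_info_url() {
  echo "http://${1:-localhost}:${2:-2080}/buttress/info"
}

# Hypothetical helper: fetch the JSON and print it unchanged.
# -f fails on HTTP errors, -s hides progress, -S still shows errors.
fetch_buttress_info() {
  curl -fsS "$(buttress_info_url "$1" "$2")"
}
```

With the server running, fetch_buttress_info queries the local instance, and fetch_buttress_info <ip> 2080 queries another machine on the LAN.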
Next steps
Configuration
Configure generators, caching, and compatibility endpoints.
Workspace binding
Pair the server with a workspace and enable JWT auth.