The Buttress server reads a single TOML file passed via --config. Every section is optional; omit it to use defaults.
Minimal example
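A minimal file could look like the sketch below. It uses only keys documented on this page; the repo id and quantization tag are placeholders rather than real model names, and writing the nested model.* keys as TOML dotted keys is an assumption about the expected layout.

```toml
# Minimal sketch: one server section and one hosted LLM.
[server]
port = 2080                      # documented default, shown for clarity

[[generators]]
type = "ggml-llm"
model.repo_id = "example-org/example-model-GGUF"   # placeholder repo id
model.quantization = "Q4_K_M"                      # placeholder tag; must match the repo
```

Pass the path to this file with --config when starting the server.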
[server]
| Key | Type | Default | Description |
|---|---|---|---|
| port | number | 2080 | HTTP/WebSocket port |
| log_level | string | "info" | One of debug, info, warn, error |
| id | string | buttress-<machineId> | Stable server id used for binding and discovery |
| name | string | auto-generated | Friendly name shown in BRICKS Controller |
| max_body_size | number or string | 52428800 (50 MB) | Max upload size; accepts "50MB", "1GB", etc. |
| session_timeout | number or string | 60000 (1 min) | WebSocket idle timeout; accepts "1m", "30s" |
| temp_file_dir | string | <os-tmpdir>/.buttress | Directory for STT audio uploads and other temp files |
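As an illustration of the keys above, a [server] section that overrides a few defaults might look like this. The values are arbitrary examples, and the string forms for sizes and durations follow the formats listed in the table.

```toml
[server]
port = 8080                 # example value; default is 2080
log_level = "debug"         # one of debug, info, warn, error
name = "Lab GPU box"        # hypothetical friendly name
max_body_size = "1GB"       # string form; a plain byte count also works
session_timeout = "30s"     # string form; a number is also accepted
```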
[runtime]
Where the server stores downloaded models.
| Key | Default | Description |
|---|---|---|
| cache_dir | ~/.buttress/models | Where downloaded model files live |
| huggingface_token | "" | Hugging Face auth token; falls back to HF_TOKEN env var |
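For example, a [runtime] section that moves the model cache to a larger disk could be sketched as follows. The path is hypothetical.

```toml
[runtime]
cache_dir = "/data/buttress/models"   # hypothetical path on a larger volume
huggingface_token = ""                # left empty so the HF_TOKEN env var is used instead
```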
[runtime.session_cache]
For ggml-llm generators, the server can persist KV cache state between requests, so a follow-up completion that shares a prompt prefix with an earlier one can skip reprocessing that prefix.
| Key | Default | Description |
|---|---|---|
| enabled | true | Enable persistent KV cache |
| max_size_bytes | "10GB" | Total disk budget; accepts "500MB", "50GB", or a number |
| max_entries | 1000 | Max number of cached states (LRU eviction) |
Cached states are stored under {cache_dir}/.session-state-cache/.
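A sketch of a session cache tuned for a machine with limited disk space, using only the keys from the table above with example values:

```toml
[runtime.session_cache]
enabled = true
max_size_bytes = "500MB"   # total disk budget; a plain number is also accepted
max_entries = 200          # cached states beyond this count are evicted (LRU)
```

With the default cache_dir, this would presumably place cached states under ~/.buttress/models/.session-state-cache/.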
[[generators]]
Each [[generators]] block declares one model the server can host. Repeat the block to host multiple.
LLM (llama.cpp / GGML)
LLM (MLX, Apple Silicon only)
Speech-to-text (Whisper / GGML)
| Key | Description |
|---|---|
| type | One of ggml-llm, mlx-llm, ggml-stt |
| backend.variant_preference | Ordered list of backend variants. LLM accepts cuda, vulkan, snapdragon, default. STT accepts coreml, default |
| model.repo_id | Hugging Face repo id |
| model.filename | Specific file inside the repo (STT only) |
| model.quantization | Quantization tag matching the repo (LLM only) |
| model.n_ctx | Context window length in tokens (LLM only) |
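To show how repeated blocks fit together, here is a sketch that hosts one LLM and one speech-to-text model. The repo ids, filename, and quantization tag are placeholders, not real models, and writing backend.* and model.* as TOML dotted keys inside each block is an assumption based on the key names in the table.

```toml
# First generator: an LLM served through llama.cpp / GGML.
[[generators]]
type = "ggml-llm"
backend.variant_preference = ["cuda", "vulkan", "default"]  # tried in this order
model.repo_id = "example-org/example-model-GGUF"            # placeholder repo id
model.quantization = "Q4_K_M"                               # placeholder tag (LLM only)
model.n_ctx = 8192                                          # context window in tokens (LLM only)

# Second generator: a Whisper speech-to-text model.
[[generators]]
type = "ggml-stt"
backend.variant_preference = ["coreml", "default"]
model.repo_id = "example-org/example-whisper-ggml"          # placeholder repo id
model.filename = "ggml-example.bin"                         # placeholder file (STT only)
```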
[autodiscover]
The server announces itself on UDP 8089 so Foundation devices on the same LAN can find it. Auto-discovery is on by default.
Set [autodiscover] = false to disable discovery entirely. See the autodiscovery reference for protocol details.
[env]
Environment variables applied at startup, but only if they are not already set in the system environment. System variables and command-line exports take precedence.
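A sketch of an [env] section, assuming it is a flat table of variable names mapped to string values (the exact shape is not specified here). The token value is a placeholder.

```toml
[env]
# Applied at startup only if HF_TOKEN is not already set in the system environment.
HF_TOKEN = "hf_xxxxxxxxxxxxxxxx"   # placeholder token
```

Under the precedence rule above, running the server with HF_TOKEN already exported in the shell would leave the exported value in place and ignore this entry.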
Compatibility endpoints
These endpoints are experimental. The schemas, error shapes, and CORS defaults may change.
| Endpoint | Config flag |
|---|---|
| POST /oai-compat/v1/chat/completions | [openai_compat] enabled = true |
| GET /oai-compat/v1/models | [openai_compat] enabled = true |
| POST /anthropic-messages/v1/messages | [anthropic_messages] enabled = true |
| POST /anthropic-messages/v1/messages/count_tokens | [anthropic_messages] enabled = true |
The endpoints can also be enabled by setting the environment variable ENABLE_OPENAI_COMPAT_ENDPOINT=1 or ENABLE_ANTHROPIC_MESSAGES_ENDPOINT=1.
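Written out in the config file, the flags from the table above would look like this when both endpoint groups are turned on:

```toml
# Enables POST /oai-compat/v1/chat/completions and GET /oai-compat/v1/models.
[openai_compat]
enabled = true

# Enables POST /anthropic-messages/v1/messages and .../messages/count_tokens.
[anthropic_messages]
enabled = true
```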
Next steps
Workspace binding
Pair the server with a BRICKS workspace and enable auth.
LAN auto-discovery
How Foundation devices find your server on the LAN.