BUZZ Dedicated LLM Endpoint Inference Dashboard
Monitor real-time throughput, latency, KV cache usage, and request metrics for your dedicated LLM inference endpoints.
To get started, find your metrics URL in the BUZZ LLM Inference Console under your dedicated endpoint's details, then add it below.
Select endpoints from the tab bar to compare them on the charts below.
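To sanity-check the metrics URL outside the dashboard, a short script can fetch it directly. This is a minimal sketch that assumes the endpoint serves Prometheus text-format metrics over HTTPS; the URL shown is a placeholder, so substitute the one from your endpoint's details in the console.

```python
# Minimal connectivity check for a metrics URL.
# METRICS_URL is a placeholder; use the URL from your endpoint's details.
import urllib.request

METRICS_URL = "https://your-endpoint.example.com/metrics"  # placeholder

def fetch_metrics(url: str, timeout: float = 10.0) -> str:
    """Fetch the raw Prometheus text exposition from the endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")

if __name__ == "__main__":
    text = fetch_metrics(METRICS_URL)
    print(f"fetched {len(text.splitlines())} metric lines")
```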
Live Metrics

| Metric | Value | Description |
|---|---|---|
| Requests Running | — | requests currently active in the engine |
| Requests Waiting | — | requests waiting in the queue |
| KV Cache Usage | — | share of available cache in use |
| Prefix Cache Hit Rate | — | share of prompt tokens served from the prefix cache |
| Avg TTFT | — | average time to first token |
| Avg E2E Latency | — | average end-to-end request latency |
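The card values above can also be pulled by hand. The sketch below, reusing fetch_metrics from earlier, parses the Prometheus text format naively and reads a few series; the vllm:-prefixed metric names are assumptions modeled on common LLM-serving exporters, so substitute whatever names your endpoint actually reports.

```python
# Naive parser: one {name: value} entry per series; assumes no per-sample
# timestamps, and if a name repeats across label sets the last sample wins.
from typing import Dict

def parse_simple_metrics(text: str) -> Dict[str, float]:
    values: Dict[str, float] = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name_part, _, raw_value = line.rpartition(" ")
        name = name_part.split("{", 1)[0]  # drop any {label="..."} block
        try:
            values[name] = float(raw_value)
        except ValueError:
            continue
    return values

m = parse_simple_metrics(fetch_metrics(METRICS_URL))

running = m.get("vllm:num_requests_running")   # Requests Running (assumed name)
waiting = m.get("vllm:num_requests_waiting")   # Requests Waiting (assumed name)
kv_usage = m.get("vllm:gpu_cache_usage_perc")  # KV Cache Usage (assumed name)

# Prometheus histograms expose _sum and _count series; an average is sum/count.
ttft_sum = m.get("vllm:time_to_first_token_seconds_sum", 0.0)
ttft_count = m.get("vllm:time_to_first_token_seconds_count", 0.0)
avg_ttft = ttft_sum / ttft_count if ttft_count else None  # Avg TTFT
```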
Charts

- Request Outcomes (lifetime)
- Prompt Token Size Distribution
- Active Requests & Queue (live)
- Generation Tokens/sec (comparison; see the rate sketch below)
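The tokens/sec comparison chart plots the rate of a monotonically increasing counter: sample the counter twice and divide the delta by the elapsed time. A sketch, continuing from the helpers above with the same assumed metric name:

```python
import time

def generation_tokens_per_sec(url: str, interval_s: float = 5.0) -> float:
    """Rate of the generation-token counter over a short sampling window."""
    name = "vllm:generation_tokens_total"  # assumed counter name
    first = parse_simple_metrics(fetch_metrics(url)).get(name, 0.0)
    time.sleep(interval_s)
    second = parse_simple_metrics(fetch_metrics(url)).get(name, 0.0)
    # Clamp at zero: the counter resets if the engine restarts mid-window.
    return max(second - first, 0.0) / interval_s

print(f"{generation_tokens_per_sec(METRICS_URL):.1f} generation tokens/sec")
```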
Health Checks
| Signal | Status |
|---|---|
| Engine State | — |
| Queue Backlog | — |
| Preemptions | — |
| Error Rate | — |
| Weights Offloaded | — |
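These signals can be approximated with threshold checks over the sampled metrics. The sketch below continues from the parsing helper above; both the metric names and the thresholds are illustrative assumptions, not BUZZ-defined limits.

```python
# Threshold checks over the sampled metrics; names and limits are assumptions.
from typing import Dict

def health_checks(m: Dict[str, float]) -> Dict[str, str]:
    waiting = m.get("vllm:num_requests_waiting", 0.0)
    preemptions = m.get("vllm:num_preemptions_total", 0.0)   # assumed name
    succeeded = m.get("vllm:request_success_total", 0.0)     # assumed name
    failed = m.get("vllm:request_failure_total", 0.0)        # assumed name
    total = succeeded + failed
    return {
        "Engine State": "ok" if "vllm:num_requests_running" in m else "unknown",
        "Queue Backlog": "ok" if waiting < 10 else "backed up",
        "Preemptions": "none" if preemptions == 0 else f"{int(preemptions)} lifetime",
        "Error Rate": "ok" if total == 0 or failed / total < 0.01 else "elevated",
        "Weights Offloaded": "no" if not m.get("vllm:weights_offloaded") else "yes",  # assumed gauge
    }
```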
Lifetime Totals
| Metric | Value |
|---|---|
| Total Requests | — |
| Prompt Tokens | — |
| Generation Tokens | — |
| Avg Prompt Length | — |
| Avg Response Length | — |
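The two averages are plain ratios of the lifetime counters: Avg Prompt Length is Prompt Tokens divided by Total Requests, and Avg Response Length is Generation Tokens divided by Total Requests. Continuing the sketch with the same assumed names:

```python
# Ratios of lifetime counters (names assumed, as above); reuses m from earlier.
total_requests = m.get("vllm:request_success_total", 0.0)
prompt_tokens = m.get("vllm:prompt_tokens_total", 0.0)
generation_tokens = m.get("vllm:generation_tokens_total", 0.0)

avg_prompt_len = prompt_tokens / total_requests if total_requests else None      # Avg Prompt Length
avg_response_len = generation_tokens / total_requests if total_requests else None  # Avg Response Length
```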
Process Info
| Metric | Value |
|---|---|
| Uptime | — |
| CPU Time | — |
| RSS Memory | — |
| Virtual Memory | — |
| Open FDs | — |
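These rows correspond to the standard Prometheus process metrics, assuming the endpoint exposes them (worth verifying against your metrics URL). Uptime is derived from the process start time; continuing the sketch:

```python
# Standard Prometheus process metrics, if exposed; reuses m from earlier.
import time

uptime_s = time.time() - m.get("process_start_time_seconds", time.time())  # Uptime
cpu_seconds = m.get("process_cpu_seconds_total")        # CPU Time
rss_bytes = m.get("process_resident_memory_bytes")      # RSS Memory
virtual_bytes = m.get("process_virtual_memory_bytes")   # Virtual Memory
open_fds = m.get("process_open_fds")                    # Open FDs
print(f"up {uptime_s / 3600:.1f} h, RSS {(rss_bytes or 0) / 2**20:.0f} MiB")
```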