BUZZ Dedicated LLM Endpoint Inference Dashboard
Monitor real-time throughput, latency, KV cache usage, and request metrics for your dedicated LLM inference endpoints.
To get started, find your metrics URL in the BUZZ LLM Inference Console under your dedicated endpoint's details, then add it below.
Select endpoints from the tab bar to compare them on the charts below.
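To sanity-check the metrics URL outside the dashboard, a short script can fetch it directly. This is a minimal sketch that assumes the endpoint serves Prometheus text-format metrics over HTTPS; the URL shown is a placeholder, so substitute the one from your endpoint's details in the console.

```python
# Minimal connectivity check for a metrics URL.
# METRICS_URL is a placeholder; use the URL from your endpoint's details.
import urllib.request

METRICS_URL = "https://your-endpoint.example.com/metrics"  # placeholder

def fetch_metrics(url: str, timeout: float = 10.0) -> str:
    """Fetch the raw Prometheus text exposition from the endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")

if __name__ == "__main__":
    text = fetch_metrics(METRICS_URL)
    print(f"fetched {len(text.splitlines())} metric lines")
```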
Live Metrics

| Metric | Value | Description |
|---|---|---|
| Requests Running | — | requests currently active in the engine |
| Requests Waiting | — | requests waiting in the queue |
| KV Cache Usage | — | share of available cache in use |
| Prefix Cache Hit Rate | — | share of prompt tokens served from the prefix cache |
| Avg TTFT | — | average time to first token |
| Avg E2E Latency | — | average end-to-end request latency |
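The card values above can also be pulled by hand. The sketch below, reusing fetch_metrics from earlier, parses the Prometheus text format naively and reads a few series; the vllm:-prefixed metric names are assumptions modeled on common LLM-serving exporters, so substitute whatever names your endpoint actually reports.

```python
# Naive parser: one {name: value} entry per series; assumes no per-sample
# timestamps, and if a name repeats across label sets the last sample wins.
from typing import Dict

def parse_simple_metrics(text: str) -> Dict[str, float]:
    values: Dict[str, float] = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name_part, _, raw_value = line.rpartition(" ")
        name = name_part.split("{", 1)[0]  # drop any {label="..."} block
        try:
            values[name] = float(raw_value)
        except ValueError:
            continue
    return values

m = parse_simple_metrics(fetch_metrics(METRICS_URL))

running = m.get("vllm:num_requests_running")   # Requests Running (assumed name)
waiting = m.get("vllm:num_requests_waiting")   # Requests Waiting (assumed name)
kv_usage = m.get("vllm:gpu_cache_usage_perc")  # KV Cache Usage (assumed name)

# Prometheus histograms expose _sum and _count series; an average is sum/count.
ttft_sum = m.get("vllm:time_to_first_token_seconds_sum", 0.0)
ttft_count = m.get("vllm:time_to_first_token_seconds_count", 0.0)
avg_ttft = ttft_sum / ttft_count if ttft_count else None  # Avg TTFT
```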
Charts

- Request Outcomes (lifetime)
- Prompt Token Size Distribution
- Active Requests & Queue (live)
- Generation Tokens/sec (comparison; see the rate sketch below)
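The tokens/sec comparison chart plots the rate of a monotonically increasing counter: sample the counter twice and divide the delta by the elapsed time. A sketch, continuing from the helpers above with the same assumed metric name:

```python
import time

def generation_tokens_per_sec(url: str, interval_s: float = 5.0) -> float:
    """Rate of the generation-token counter over a short sampling window."""
    name = "vllm:generation_tokens_total"  # assumed counter name
    first = parse_simple_metrics(fetch_metrics(url)).get(name, 0.0)
    time.sleep(interval_s)
    second = parse_simple_metrics(fetch_metrics(url)).get(name, 0.0)
    # Clamp at zero: the counter resets if the engine restarts mid-window.
    return max(second - first, 0.0) / interval_s

print(f"{generation_tokens_per_sec(METRICS_URL):.1f} generation tokens/sec")
```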
Health Checks
| Signal | Status |
|---|---|
| Engine State | — |
| Queue Backlog | — |
| Preemptions | — |
| Error Rate | — |
| Weights Offloaded | — |
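These signals can be approximated with threshold checks over the sampled metrics. The sketch below continues from the parsing helper above; both the metric names and the thresholds are illustrative assumptions, not BUZZ-defined limits.

```python
# Threshold checks over the sampled metrics; names and limits are assumptions.
from typing import Dict

def health_checks(m: Dict[str, float]) -> Dict[str, str]:
    waiting = m.get("vllm:num_requests_waiting", 0.0)
    preemptions = m.get("vllm:num_preemptions_total", 0.0)   # assumed name
    succeeded = m.get("vllm:request_success_total", 0.0)     # assumed name
    failed = m.get("vllm:request_failure_total", 0.0)        # assumed name
    total = succeeded + failed
    return {
        "Engine State": "ok" if "vllm:num_requests_running" in m else "unknown",
        "Queue Backlog": "ok" if waiting < 10 else "backed up",
        "Preemptions": "none" if preemptions == 0 else f"{int(preemptions)} lifetime",
        "Error Rate": "ok" if total == 0 or failed / total < 0.01 else "elevated",
        "Weights Offloaded": "no" if not m.get("vllm:weights_offloaded") else "yes",  # assumed gauge
    }
```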
Lifetime Totals
| Metric | Value |
|---|---|
| Total Requests | — |
| Prompt Tokens | — |
| Generation Tokens | — |
| Avg Prompt Length | — |
| Avg Response Length | — |
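The two averages are plain ratios of the lifetime counters: Avg Prompt Length is Prompt Tokens divided by Total Requests, and Avg Response Length is Generation Tokens divided by Total Requests. Continuing the sketch with the same assumed names:

```python
# Ratios of lifetime counters (names assumed, as above); reuses m from earlier.
total_requests = m.get("vllm:request_success_total", 0.0)
prompt_tokens = m.get("vllm:prompt_tokens_total", 0.0)
generation_tokens = m.get("vllm:generation_tokens_total", 0.0)

avg_prompt_len = prompt_tokens / total_requests if total_requests else None      # Avg Prompt Length
avg_response_len = generation_tokens / total_requests if total_requests else None  # Avg Response Length
```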
Process Info
| Metric | Value |
|---|---|
| Uptime | — |
| CPU Time | — |
| RSS Memory | — |
| Virtual Memory | — |
| Open FDs | — |
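These rows correspond to the standard Prometheus process metrics, assuming the endpoint exposes them (worth verifying against your metrics URL). Uptime is derived from the process start time; continuing the sketch:

```python
# Standard Prometheus process metrics, if exposed; reuses m from earlier.
import time

uptime_s = time.time() - m.get("process_start_time_seconds", time.time())  # Uptime
cpu_seconds = m.get("process_cpu_seconds_total")        # CPU Time
rss_bytes = m.get("process_resident_memory_bytes")      # RSS Memory
virtual_bytes = m.get("process_virtual_memory_bytes")   # Virtual Memory
open_fds = m.get("process_open_fds")                    # Open FDs
print(f"up {uptime_s / 3600:.1f} h, RSS {(rss_bytes or 0) / 2**20:.0f} MiB")
```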