Architecture
Token Router runs on Cloudflare Workers: one Worker, two surfaces. You don’t need to know any of this to use the platform — but if you like understanding what’s under the hood, here it is, kept light.
Two surfaces, one Worker
- /v1/*: the OpenAI-compatible API. Point any OpenAI SDK here with a vk_live_… bearer key. Today it serves GET /v1/models and POST /v1/chat/completions.
- /api/*: the management API. It powers the dashboard (your account, keys, and instances) and a couple of utility endpoints.
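For a quick check without an SDK, the sketch below builds the same request an OpenAI client would send to POST /v1/chat/completions. The host `gateway.example.com`, the model id `example-model`, and the key value are placeholders, not real values; GET /v1/models lists the models actually served. With the official `openai` Node SDK, the equivalent is passing your key as `apiKey` and the gateway's `/v1` URL as `baseURL` to the client constructor.

```typescript
// Sketch: build (but don't send) an OpenAI-style chat completion request.
// The host, model id, and key below are placeholders, not real values.

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatCompletionBody {
  model: string;
  messages: ChatMessage[];
}

function buildChatRequest(baseUrl: string, apiKey: string, body: ChatCompletionBody): Request {
  // An absolute path resolves against the origin of baseUrl.
  const url = new URL("/v1/chat/completions", baseUrl);
  return new Request(url.toString(), {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(body),
  });
}

const req = buildChatRequest("https://gateway.example.com", "vk_live_example_secret", {
  model: "example-model",
  messages: [{ role: "user", content: "Hello!" }],
});

// To actually call the gateway: const res = await fetch(req); console.log(await res.json());
```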
Two ways to authenticate
Both paths live in the same Worker but authenticate differently:
- Browsers use a signed vk_session cookie, set when you sign in with GitHub and backed by a database.
- API traffic uses vk_live_<prefix>_<secret> bearer tokens, resolved right at the edge from a fast key-value cache. The main database is not touched on the hot path, which keeps request latency low.
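The bearer-token path can be pictured as three steps: split the token into its public prefix and secret, fetch a cached record by prefix, and compare an HMAC-SHA256 of the secret in constant time. This is a minimal sketch under assumptions: the `key:<prefix>` cache key, the JSON record shape, the server-side `pepper` used as the HMAC key, and the rule that the prefix is alphanumeric are all hypothetical. The `KeyCache` interface mirrors only the basic `get` of Workers KV (a string or `null`), so a KV namespace or an in-memory `Map` wrapper can stand in for it.

```typescript
// Hypothetical cached record; the real schema is not documented here.
interface CachedKey {
  accountId: string;
  secretHash: string; // hex-encoded HMAC-SHA256 of the secret
}

// Minimal subset of a KV-like store: get returns the value or null.
interface KeyCache {
  get(key: string): Promise<string | null>;
}

// Assumption: the prefix is alphanumeric; the secret may contain underscores.
const KEY_PATTERN = /^vk_live_([A-Za-z0-9]+)_(.+)$/;

function parseApiKey(token: string): { prefix: string; secret: string } | null {
  const m = KEY_PATTERN.exec(token);
  return m ? { prefix: m[1], secret: m[2] } : null;
}

// HMAC-SHA256 via Web Crypto, hex-encoded.
async function hmacHex(pepper: string, message: string): Promise<string> {
  const enc = new TextEncoder();
  const key = await crypto.subtle.importKey(
    "raw",
    enc.encode(pepper),
    { name: "HMAC", hash: "SHA-256" },
    false,
    ["sign"],
  );
  const sig = await crypto.subtle.sign("HMAC", key, enc.encode(message));
  return Array.from(new Uint8Array(sig), (b) => b.toString(16).padStart(2, "0")).join("");
}

// Compare without returning early on the first mismatching character.
function timingSafeEqualHex(a: string, b: string): boolean {
  if (a.length !== b.length) return false;
  let diff = 0;
  for (let i = 0; i < a.length; i++) diff |= a.charCodeAt(i) ^ b.charCodeAt(i);
  return diff === 0;
}

// Returns the account id for a valid key, or null. No main-database access.
async function authenticate(
  authHeader: string | null,
  cache: KeyCache,
  pepper: string,
): Promise<string | null> {
  if (!authHeader?.startsWith("Bearer ")) return null;
  const parsed = parseApiKey(authHeader.slice("Bearer ".length));
  if (!parsed) return null;
  const raw = await cache.get(`key:${parsed.prefix}`);
  if (raw === null) return null;
  const record = JSON.parse(raw) as CachedKey;
  const hash = await hmacHex(pepper, parsed.secret);
  return timingSafeEqualHex(hash, record.secretHash) ? record.accountId : null;
}
```

Looking up by prefix means only the short public part is used as a cache key; the secret itself is never stored, only its HMAC.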
How a chat request flows
When you POST /v1/chat/completions, the gateway:
- Authenticates your vk_live_… key at the edge.
- Finds candidates: every active provider instance serving the requested model.
- Picks the least-loaded healthy one, leasing a slot from a global rate limiter (a token bucket with a circuit breaker). Ties break randomly.
- Forwards the call to that provider, decrypting its stored upstream secret just in time.
- Retries on failure: up to 3 attempts with exponential backoff, but only for genuine failures (timeouts, rate limits, 5xx). A 4xx caused by your input is returned to you as-is and doesn't penalize the provider.
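The selection and retry rules above can be sketched as small pure functions. The `Candidate` shape, the `inFlight` load measure, the 100 ms base delay, and the choice to retry the same `send` callback (rather than re-picking a provider per attempt) are assumptions for illustration. The rules themselves follow the description: least-loaded healthy pick with random ties, at most 3 attempts with exponential backoff, and retries only for timeouts, 429s, and 5xx.

```typescript
// Hypothetical candidate shape; field names are illustrative.
interface Candidate {
  id: string;
  healthy: boolean;
  inFlight: number; // current load
}

// Least-loaded healthy candidate; ties broken with the injected random source.
function pickLeastLoaded(candidates: Candidate[], rand: () => number = Math.random): Candidate | null {
  const healthy = candidates.filter((c) => c.healthy);
  if (healthy.length === 0) return null;
  const minLoad = Math.min(...healthy.map((c) => c.inFlight));
  const tied = healthy.filter((c) => c.inFlight === minLoad);
  return tied[Math.floor(rand() * tied.length)];
}

// Undefined status = timeout or network error (no response at all).
function isRetryable(status: number | undefined): boolean {
  if (status === undefined) return true;
  return status === 429 || status >= 500;
}

const MAX_ATTEMPTS = 3;
const BASE_DELAY_MS = 100; // assumption

// attempt is 0-based: 100 ms, 200 ms, 400 ms, ...
function backoffDelayMs(attempt: number): number {
  return BASE_DELAY_MS * 2 ** attempt;
}

interface Outcome {
  status?: number;
  body?: string;
}

async function forwardWithRetry(
  send: () => Promise<Outcome>,
  sleep: (ms: number) => Promise<void>,
): Promise<Outcome> {
  let last: Outcome = {};
  for (let attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
    try {
      last = await send();
    } catch {
      last = {}; // thrown error treated as timeout / network failure
    }
    if (last.status !== undefined && last.status < 400) return last; // success
    if (!isRetryable(last.status)) return last; // caller's 4xx: pass through as-is
    if (attempt < MAX_ATTEMPTS - 1) await sleep(backoffDelayMs(attempt));
  }
  return last;
}
```

Injecting `rand` and `sleep` keeps the behavior deterministic under test; in a Worker, `sleep` would wrap `setTimeout`.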
If every candidate is busy, the request can fall through to an asynchronous overflow queue: you submit, get a job ID, and poll for the result.
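From the client side, the overflow path is a submit-then-poll loop. This sketch assumes a hypothetical `getStatus` function (for example, one that fetches a job-status endpoint) and a hypothetical `JobStatus` shape with `pending`, `done`, and `failed` states; the real endpoint paths and fields are not specified here. Injecting `sleep` keeps the loop testable.

```typescript
// Hypothetical job status shape; not the documented API.
type JobStatus =
  | { state: "pending" }
  | { state: "done"; result: unknown }
  | { state: "failed"; error: string };

// Poll until the job finishes, fails, or maxPolls is exhausted.
async function pollJob(
  jobId: string,
  getStatus: (id: string) => Promise<JobStatus>,
  sleep: (ms: number) => Promise<void>,
  { intervalMs = 1000, maxPolls = 30 } = {},
): Promise<unknown> {
  for (let i = 0; i < maxPolls; i++) {
    const status = await getStatus(jobId);
    if (status.state === "done") return status.result;
    if (status.state === "failed") throw new Error(`job ${jobId} failed: ${status.error}`);
    await sleep(intervalMs);
  }
  throw new Error(`job ${jobId} still pending after ${maxPolls} polls`);
}

// Real-time sleep for production use.
const realSleep = (ms: number): Promise<void> => new Promise((resolve) => setTimeout(resolve, ms));
```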
The pieces
- Workers (TypeScript + Hono) — the gateway and routing logic.
- A relational store (D1) — accounts, API keys, models, provider instances (plus their hardware/software inventory), sessions, and billing.
- A KV cache — hot bearer-auth lookups, so the database stays off the critical path.
- A Durable Object (RateLimiter) — global rate-limiting and circuit-breaker state.
- Queues — the async overflow path, with a dead-letter queue for safety.
- Web Crypto — HMAC-SHA256 for API keys, AES-GCM for upstream secrets.
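The per-instance state behind "leasing a slot" could look like the class below: a token bucket that refills continuously, plus a simple circuit breaker that rejects all leases for a cooldown after a run of consecutive failures. This is a sketch, not the RateLimiter's actual code: capacity, refill rate, threshold, and cooldown are made-up parameters, there is no half-open probing state, and time is passed in explicitly. Inside a Durable Object, state like this could live on the object and be persisted through its storage API.

```typescript
class TokenBucketBreaker {
  private readonly capacity: number;
  private readonly refillPerSec: number;
  private readonly failureThreshold: number;
  private readonly cooldownMs: number;
  private tokens: number;
  private lastRefillMs: number;
  private consecutiveFailures = 0;
  private openUntilMs = 0;

  constructor(capacity: number, refillPerSec: number, failureThreshold: number, cooldownMs: number, nowMs: number) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.tokens = capacity; // start full
    this.lastRefillMs = nowMs;
  }

  // Add tokens for the elapsed time, capped at capacity.
  private refill(nowMs: number): void {
    const elapsedSec = (nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastRefillMs = nowMs;
  }

  // Lease one slot; false if the breaker is open or the bucket is empty.
  tryAcquire(nowMs: number): boolean {
    if (nowMs < this.openUntilMs) return false;
    this.refill(nowMs);
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }

  recordSuccess(): void {
    this.consecutiveFailures = 0;
  }

  // Trip the breaker after failureThreshold consecutive failures.
  recordFailure(nowMs: number): void {
    this.consecutiveFailures += 1;
    if (this.consecutiveFailures >= this.failureThreshold) {
      this.openUntilMs = nowMs + this.cooldownMs;
      this.consecutiveFailures = 0;
    }
  }
}
```

Keeping this state in a single global object, rather than per Worker isolate, is what lets every edge location see the same budget and breaker status.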
The economic layer
Routing and billing are two halves of the same system. Every served request is metered, split 70/30 between provider and treasury, and the treasury funds bounties that pull more providers toward under-supplied models. For the full story, see The Fair Ecosystem and Peer Validation.
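The 70/30 split is simple arithmetic, but rounding matters for small amounts. A minimal sketch, assuming charges are metered as non-negative integers in some minor unit (for example micro-dollars) and that the provider's share is rounded down with the treasury taking the remainder, so the two parts always sum to the original amount. The actual rounding rule and units are not documented here.

```typescript
const PROVIDER_PERCENT = 70;
// Largest amount for which amount * 70 stays an exact (safe) integer.
const MAX_AMOUNT = Math.floor(Number.MAX_SAFE_INTEGER / PROVIDER_PERCENT);

function splitRevenue(amount: number): { provider: number; treasury: number } {
  if (!Number.isInteger(amount) || amount < 0 || amount > MAX_AMOUNT) {
    throw new RangeError("amount must be a non-negative integer in range");
  }
  // Exact integer floor of amount * 70 / 100, with no floating-point rounding.
  const scaled = amount * PROVIDER_PERCENT;
  const provider = (scaled - (scaled % 100)) / 100;
  return { provider, treasury: amount - provider };
}
```

For example, a charge of 999 units gives the provider 699 and the treasury 300.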