Architecture

Token Router runs on Cloudflare Workers: one Worker, two surfaces. You don’t need to know any of this to use the platform — but if you like understanding what’s under the hood, here it is, kept light.

  • /v1/* — the OpenAI-compatible API. Point any OpenAI SDK here with a vk_live_… bearer key. Today: GET /v1/models and POST /v1/chat/completions.
  • /api/* — the management API. Powers the dashboard (your account, keys, and instances) and a couple of utility endpoints.
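
As a rough illustration of calling the /v1/* surface without an SDK, the sketch below builds a chat completion request by hand. The host name in BASE_URL, the model name, and the helper names are placeholders invented for this example, not documented values.

```typescript
// Hypothetical base URL: substitute the real gateway host.
const BASE_URL = "https://gateway.example.com";

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface PreparedRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds (but does not send) an OpenAI-style chat completion request.
function buildChatRequest(
  apiKey: string,
  model: string,
  messages: ChatMessage[],
): PreparedRequest {
  return {
    url: `${BASE_URL}/v1/chat/completions`,
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model, messages }),
  };
}

// Sends a prepared request with the standard fetch API.
async function sendChatRequest(req: PreparedRequest): Promise<unknown> {
  const res = await fetch(req.url, {
    method: req.method,
    headers: req.headers,
    body: req.body,
  });
  if (!res.ok) throw new Error(`gateway returned HTTP ${res.status}`);
  return res.json();
}
```

With an OpenAI SDK the equivalent is usually just a matter of setting the SDK's base URL and API key options; the hand-built version only makes the bearer header and path explicit.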

Both paths live in the same Worker but authenticate differently:

  • Browsers use a signed vk_session cookie, set when you sign in with GitHub and backed by a database.
  • API traffic uses vk_live_<prefix>_<secret> bearer tokens, resolved right at the edge from a fast key-value cache. The main database is not touched on the hot path, which keeps request latency low.
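
To make the bearer-token path concrete, here is a minimal sketch of splitting a vk_live_<prefix>_<secret> token and using the prefix as the cache lookup key. It assumes the prefix itself contains no underscore; the "key:" cache-key scheme, the CachedKey shape, and the Map standing in for the key-value cache are all hypothetical.

```typescript
interface ParsedKey {
  prefix: string;
  secret: string;
}

// Hypothetical shape of a cached key record.
interface CachedKey {
  accountId: string;
  secretHash: string;
}

// Splits "Bearer vk_live_<prefix>_<secret>" into its parts.
// Assumption: the prefix has no underscores, so the first underscore
// after "vk_live_" separates prefix from secret.
function parseBearerKey(authorization: string | null): ParsedKey | null {
  if (!authorization || !authorization.startsWith("Bearer ")) return null;
  const token = authorization.slice("Bearer ".length).trim();
  if (!token.startsWith("vk_live_")) return null;
  const rest = token.slice("vk_live_".length);
  const sep = rest.indexOf("_");
  // Reject an empty prefix or an empty secret.
  if (sep <= 0 || sep === rest.length - 1) return null;
  return { prefix: rest.slice(0, sep), secret: rest.slice(sep + 1) };
}

// Looks the key up by prefix. A Map stands in for the edge cache here;
// a real key-value store would be asynchronous.
function lookupKey(
  cache: Map<string, CachedKey>,
  parsed: ParsedKey,
): CachedKey | null {
  return cache.get(`key:${parsed.prefix}`) ?? null;
}
```

After a successful lookup, the presented secret would still need to be checked against the stored hash before the request is accepted.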

When you POST /v1/chat/completions, the gateway:

  1. Authenticates your vk_live_… key at the edge.
  2. Finds candidates — every active provider instance serving the requested model.
  3. Picks the least-loaded healthy one, leasing a slot from a global rate limiter (a token bucket with a circuit breaker). Ties break randomly.
  4. Forwards the call to that provider, decrypting its stored upstream secret just in time.
  5. Retries on failure: up to 3 attempts with exponential backoff, but only for genuine failures (timeouts, rate limits, 5xx responses). A 4xx caused by your input is returned to you as-is and doesn’t penalize the provider.
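
Steps 3 and 5 can be sketched as a few pure functions. This is an illustration under stated assumptions, not the gateway's actual code: load is modelled as an in-flight request count, rate limiting is taken to mean HTTP 429, and the 200 ms base delay is invented for the example.

```typescript
interface Candidate {
  id: string;
  healthy: boolean;
  inFlight: number; // assumption: "load" is the number of in-flight requests
}

// Picks the healthy candidate with the lowest load; ties break randomly.
function pickLeastLoaded(
  candidates: Candidate[],
  random: () => number = Math.random,
): Candidate | null {
  const healthy = candidates.filter((c) => c.healthy);
  if (healthy.length === 0) return null;
  const minLoad = Math.min(...healthy.map((c) => c.inFlight));
  const tied = healthy.filter((c) => c.inFlight === minLoad);
  return tied[Math.floor(random() * tied.length)] ?? null;
}

type Outcome = { kind: "timeout" } | { kind: "http"; status: number };

// Only timeouts, rate limits (assumed to be 429) and 5xx are retried.
// Any other 4xx goes straight back to the caller.
function isRetryable(outcome: Outcome): boolean {
  if (outcome.kind === "timeout") return true;
  return outcome.status === 429 || outcome.status >= 500;
}

const MAX_ATTEMPTS = 3;

// Exponential backoff: the delay doubles after each failed attempt.
// The 200 ms base is a made-up value.
function backoffMs(attempt: number, baseMs = 200): number {
  return baseMs * 2 ** (attempt - 1);
}
```

A caller would loop up to MAX_ATTEMPTS times, pick a candidate, forward the request, and sleep for backoffMs(attempt) before retrying whenever isRetryable returns true.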

If every candidate is busy, the request can fall through to an asynchronous overflow queue — you submit, get a job id, and poll for the result.
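
On the client side, the submit-then-poll pattern might look like the sketch below. The status endpoint is not specified here, so the status check is injected as a callback; the JobStatus shape, the interval, and the poll limit are all assumptions made for the example.

```typescript
// Hypothetical job states; the real response shape may differ.
type JobStatus<T> =
  | { state: "pending" }
  | { state: "done"; result: T }
  | { state: "failed"; error: string };

interface PollOptions {
  intervalMs?: number; // wait between polls
  maxPolls?: number; // give up after this many checks
}

// Polls until the job finishes, fails, or the poll budget runs out.
async function pollJob<T>(
  fetchStatus: () => Promise<JobStatus<T>>,
  options: PollOptions = {},
): Promise<T> {
  const { intervalMs = 1000, maxPolls = 30 } = options;
  for (let i = 0; i < maxPolls; i++) {
    const status = await fetchStatus();
    if (status.state === "done") return status.result;
    if (status.state === "failed") throw new Error(status.error);
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`job still pending after ${maxPolls} polls`);
}
```

Here fetchStatus would wrap whatever request returns the job's state for the id you received on submit.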

The platform is built from these components:

  • Workers (TypeScript + Hono) — the gateway and routing logic.
  • A relational store (D1) — accounts, API keys, models, provider instances (plus their hardware/software inventory), sessions, and billing.
  • A KV cache — hot bearer-auth lookups, so the database stays off the critical path.
  • A Durable Object (RateLimiter) — global rate-limiting and circuit-breaker state.
  • Queues — the async overflow path, with a dead-letter queue for safety.
  • Web Crypto — HMAC-SHA256 for API keys, AES-GCM for upstream secrets.
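
For readers curious about the Web Crypto piece, the sketch below shows the two primitives using the standard crypto.subtle API. How the gateway actually applies them is not described here: keying the HMAC with a server-side "pepper" and packing the IV and ciphertext into one hex string are assumptions made for illustration.

```typescript
const encoder = new TextEncoder();

function toHex(buf: ArrayBuffer): string {
  return Array.from(new Uint8Array(buf), (b) => b.toString(16).padStart(2, "0")).join("");
}

function fromHex(hex: string) {
  const out = new Uint8Array(hex.length / 2);
  for (let i = 0; i < out.length; i++) {
    out[i] = parseInt(hex.slice(i * 2, i * 2 + 2), 16);
  }
  return out;
}

// HMAC-SHA256 of an API key secret, keyed with a server-side pepper
// (assumption), so only the keyed hash needs to be stored.
async function hmacSha256Hex(pepper: string, secret: string): Promise<string> {
  const key = await crypto.subtle.importKey(
    "raw",
    encoder.encode(pepper),
    { name: "HMAC", hash: "SHA-256" },
    false,
    ["sign"],
  );
  return toHex(await crypto.subtle.sign("HMAC", key, encoder.encode(secret)));
}

// AES-GCM encryption with a fresh random 96-bit IV per message.
// Output format "ivHex:ciphertextHex" is invented for this example.
async function encryptSecret(key: CryptoKey, plaintext: string): Promise<string> {
  const iv = crypto.getRandomValues(new Uint8Array(12));
  const ct = await crypto.subtle.encrypt({ name: "AES-GCM", iv }, key, encoder.encode(plaintext));
  return `${toHex(iv.buffer)}:${toHex(ct)}`;
}

// Decrypts a value produced by encryptSecret; throws if it was tampered with.
async function decryptSecret(key: CryptoKey, packed: string): Promise<string> {
  const [ivHex = "", ctHex = ""] = packed.split(":");
  const plain = await crypto.subtle.decrypt(
    { name: "AES-GCM", iv: fromHex(ivHex) },
    key,
    fromHex(ctHex),
  );
  return new TextDecoder().decode(plain);
}
```

AES-GCM is authenticated, so a modified ciphertext fails to decrypt rather than yielding garbage, which suits the decrypt-just-in-time step when a stored upstream secret is read back.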

Routing and billing are two halves of the same system. Every served request is metered, and the resulting charge is split 70/30 between the provider and the treasury. The treasury in turn funds bounties that pull more providers toward under-supplied models. For the full story, see The Fair Ecosystem and Peer Validation.
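
The 70/30 split is simple arithmetic, sketched below on integer amounts to avoid floating-point drift. The unit (an integer count of some minor currency unit) and the choice to round the provider's share down, leaving the remainder to the treasury, are assumptions for this example.

```typescript
const PROVIDER_SHARE_PERCENT = 70;

interface Split {
  provider: number;
  treasury: number;
}

// Splits a metered charge 70/30. Amounts are integers in a minor unit
// (hypothetical), and the two shares always sum to the original amount.
function splitCharge(amount: number): Split {
  if (!Number.isInteger(amount) || amount < 0) {
    throw new RangeError("amount must be a non-negative integer");
  }
  // Assumption: round the provider share down; the treasury takes the rest.
  const provider = Math.floor((amount * PROVIDER_SHARE_PERCENT) / 100);
  return { provider, treasury: amount - provider };
}
```

For example, a charge of 1000 units yields 700 for the provider and 300 for the treasury.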