The Fair Ecosystem

Token Router is a two-sided marketplace: providers run model-serving nodes and earn credits; consumers spend credits calling those models through one OpenAI-compatible gateway. This page lays out the rules that keep both sides honest and happy.

1. The deal

Pooled compute is more powerful and more reliable than any single home rig. You bring a model-serving endpoint; we bring the traffic, the billing, the retries, and an SDK-compatible front door.

For every dollar of inference you serve, $0.70 lands in your account and $0.30 goes to the treasury. New accounts start with $5 in credit so they can try the gateway before they buy or earn.

The split is fixed. No per-node negotiation, no enterprise discount, no rate card.

2. What the 30% buys

The platform’s take isn’t profit — it funds bounties. A bounty is a curated emission multiplier (greater than 1×) attached to a specific model. While it’s active, every successful request served against that model pays the provider extra, out of the treasury rather than the consumer’s pocket.

The bonus is standard_payout × (multiplier − 1), capped by the bounty’s treasury budget.
When the budget runs out or the bounty expires, the model returns to baseline.

Bounties exist to direct compute toward under-supplied models. If a model keeps saturating, a bounty raises its effective payout and attracts new instances. Bounties are platform-set today; user-funded bounties are on the near-term roadmap.

3. How nodes get picked

When a request arrives, the gateway finds every active instance serving that model, asks the routing layer who has spare capacity, and picks the least-loaded healthy candidate. Ties are broken at random, so two equally idle nodes share traffic evenly over time.

Routing protects you from short-term trouble:

4xx user errors don’t count against you. If a consumer sends bad input and your upstream returns a 400, that’s a neutral outcome — the response goes back to the user and your reputation is untouched. Only 408, 429, and 5xx are treated as failures.
The circuit breaker is short-fused but forgiving. Three failures in 60 seconds takes you offline. After 15 seconds you’re eligible for a single probe request; pass it and you’re fully back. A three-minute hiccup won’t park you for an hour.

4. The minimum bar: 5 tok/s

Token Router enforces a minimum sustained throughput of 5 tokens/second for generation. The floor protects consumer experience so the marketplace can’t race to the bottom on price.

Nodes that fall persistently below it are circuit-broken, become probe-eligible after the usual 15-second cooldown, and re-enter rotation automatically once they recover. A hardware or quantization upgrade just shows up as faster requests — there’s no manual re-enrollment.

Above the floor, we don’t reward extreme speed over steady speed. A 200 tok/s H100 and a 6 tok/s consumer GPU are both fair candidates; routing is by spare capacity, not performance ranking. The marketplace is not a leaderboard.

5. Stick around and you earn more

The platform actively rewards steady operators:

Rolling health window aggregates your success/failure outcomes into a 0–1 score, used to break ties among least-loaded candidates.
Ramp-up for new instances: new nodes start at a fraction of their declared capacity and scale up over their first week of good behavior — a Sybil deterrent.
Uptime credit: tracked over the rolling window, so both probe-success and time-online count.
Anti-flapping on model switches: change the model an instance serves and your multipliers reduce for a short cooldown before recovering. This keeps model coverage stable instead of chasing the biggest bounty week to week.

None of these mechanisms punish newcomers. They reward not leaving.

6. Trust takes time

The hardest problem in a peer-to-peer compute marketplace is proving an instance is running the model it claims to be running. Token Router validates online (statistical checks on logprob distributions, opportunistic semantic-consensus replicas) and offline (asynchronous Proof of Useful Work audits) before unlocking the highest emission tiers.

The practical upshot for operators: new instances earn at baseline until they’ve produced enough validated tokens for the network to trust them. Reputation tiers gate access to higher bounty multipliers. Long-term honesty is, by construction, far more profitable than short-term model-spoofing. More in Peer Validation.