Deploying oMLX on macOS

This guide walks you through setting up oMLX, a macOS-native LLM inference engine built on Apple’s MLX framework, and serving a 27B-parameter Qwen model on your Apple Silicon Mac. By the end you’ll have a model running locally, secured with an API key, exposed to the public internet via a Cloudflare Tunnel, and registered on Token Router so it can start earning credits.

1. Why oMLX?

oMLX uses Apple Silicon’s unified memory, zero-copy arrays, and lazy computation to run inference faster than runners that are not tuned for Apple hardware. It also implements an SSD KV-caching system that pages hot and cold cache blocks to disk, which reduces wait times for tool-calling agents and long-context prompts.

2. Installing oMLX

oMLX ships as a native macOS app with a lightweight CLI shim.

Open the oMLX GitHub Releases page.
Download the latest .dmg.
Open the .dmg and drag the oMLX app into your Applications folder.
Launch oMLX from Applications.
- On first launch it installs its CLI shim to ~/.omlx/bin/omlx so you can drive it from the terminal.
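
To confirm the shim landed where the step above says, a quick check along these lines may help. It is a minimal sketch: the function name find_omlx_shim is made up for illustration, and the only path it relies on is the ~/.omlx/bin/omlx location mentioned above.

```shell
# Hypothetical helper: prints the shim path if it exists and is executable.
# The optional argument overrides the home directory (handy for testing).
find_omlx_shim() {
  shim="${1:-$HOME}/.omlx/bin/omlx"
  if [ -x "$shim" ]; then
    printf '%s\n' "$shim"
  else
    printf 'oMLX CLI shim not found at %s\n' "$shim" >&2
    return 1
  fi
}

# Usage (commented out so the snippet has no side effects when sourced):
#   find_omlx_shim && export PATH="$HOME/.omlx/bin:$PATH"
```

Adding the directory to PATH is optional; you can always call the shim by its full path instead.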

3. Downloading Qwen 3.6 27B and choosing quantization

Quantization stores the model’s weights at lower precision (fewer bits per weight), shrinking the model so it fits comfortably within your Mac’s unified memory at a small cost in output quality.

Choosing the right quantization

4-bit (oQ4, sometimes tagged -4bit): the sweet spot for most Apple Silicon Macs. A 27B model in 4-bit needs roughly 16–18 GB of unified memory, leaving room for the OS and your context window. Recommended for 32 GB machines.
8-bit (oQ8): near-unquantized precision, but roughly twice the memory of the 4-bit build (the weights alone come to about 27 GB for 27B parameters). Use it only on a 64 GB or 128 GB machine (M-series Max/Ultra).
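
The memory figures above can be sanity-checked with simple arithmetic: weight size is roughly parameters × bits per weight ÷ 8. The sketch below does only that. The function name estimate_weights_gb is hypothetical, and the estimate deliberately ignores quantization metadata, the KV cache, and runtime overhead, so treat it as a lower bound.

```shell
# Hypothetical helper: rough size of the weights alone, in GB (10^9 bytes).
#   $1 = parameters in billions, $2 = bits per weight
# Ignores quantization metadata, the KV cache, and runtime overhead,
# so real memory use will be higher.
estimate_weights_gb() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

# Usage:
#   estimate_weights_gb 27 4    # 4-bit build
#   estimate_weights_gb 27 8    # 8-bit build
#   sysctl -n hw.memsize        # installed memory in bytes (macOS)
```

For a 27B model this gives 13.5 GB at 4-bit and 27.0 GB at 8-bit for the weights alone, which is in line with the 16–18 GB guidance once overhead is added.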

Downloading the model

Open the oMLX dashboard from the macOS menu-bar icon.
Go to the Model Browser.
Search for Qwen3.6-27B.
Select the 4-bit build (tagged something like Qwen3.6-27B-A3B-oQ4).
Click Download. Models are stored in ~/.omlx/models/.

4. Enabling API authentication

Because we’re exposing this API to the public internet, an API key is mandatory. Configure it in one of two ways.

Method A — GUI (recommended)

In the oMLX dashboard, open Settings → Security.
Enable Require API Key.
Generate or paste a strong, random secret. This guide uses sk-my_super_secret_key_2026 as a placeholder; don’t use it as your real key.
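
If you would rather generate the secret yourself than rely on the dashboard, something like the following should work on macOS. It is a sketch, not an oMLX feature: generate_api_key is a made-up name, and the sk- prefix is kept only to match the example key used in this guide.

```shell
# Hypothetical helper: prints "sk-" followed by 64 random hex characters
# (32 bytes read from /dev/urandom). The "sk-" prefix only mirrors the
# example key in this guide; nothing here is specific to oMLX.
generate_api_key() {
  printf 'sk-%s\n' "$(head -c 32 /dev/urandom | od -An -tx1 | tr -dc '0-9a-f')"
}

# Usage:
#   generate_api_key    # paste the output into Settings → Security
```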

Method B — CLI

Launch the server manually with the --api-key flag:

omlx serve --model Qwen3.6-27B-A3B-oQ4 --api-key "sk-my_super_secret_key_2026"

oMLX serves OpenAI-compatible endpoints on http://127.0.0.1:8000 by default.

Confirm it’s up locally before going further:

curl http://127.0.0.1:8000/v1/models \
  -H "Authorization: Bearer sk-my_super_secret_key_2026"
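
It is also worth confirming that the same request fails without the key, since the endpoint is about to become public. The sketch below compares the two status codes. The function names and the OMLX_API_KEY variable are hypothetical, and it assumes the server rejects anonymous requests with 401 or 403, which this guide does not confirm for oMLX.

```shell
# Hypothetical helper: interprets two HTTP status codes.
#   $1 = status with the API key, $2 = status without it
# Assumption: a protected server answers 401 or 403 when the key is missing.
auth_verdict() {
  if [ "$1" != "200" ]; then
    echo "FAIL: authenticated request returned $1"
  elif [ "$2" = "401" ] || [ "$2" = "403" ]; then
    echo "OK: key accepted, anonymous request rejected ($2)"
  else
    echo "FAIL: anonymous request returned $2; the endpoint may be open"
  fi
}

# Fetches only the status code of GET <base>/v1/models.
#   $1 = base URL, $2 = API key (optional)
models_status() {
  if [ -n "${2:-}" ]; then
    curl -s -o /dev/null -w '%{http_code}' \
      -H "Authorization: Bearer $2" "$1/v1/models"
  else
    curl -s -o /dev/null -w '%{http_code}' "$1/v1/models"
  fi
}

# Usage (OMLX_API_KEY is a variable you would set yourself):
#   base=http://127.0.0.1:8000
#   auth_verdict "$(models_status "$base" "$OMLX_API_KEY")" "$(models_status "$base")"
```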

5. Exposing the endpoint via Cloudflare Tunnel

We’ll use cloudflared to create a secure reverse-proxy tunnel to your local port 8000 — no router port-forwarding required.

Step 1 — Install cloudflared

brew install cloudflare/cloudflare/cloudflared

Step 2 — Quick tunnel (no domain required)

For a temporary public URL, great for testing:

cloudflared tunnel --url http://127.0.0.1:8000

This prints an https://<random>.trycloudflare.com URL. The URL is temporary and changes each time you restart cloudflared.

Step 3 — Persistent tunnel (recommended for earning)

If you own a domain on Cloudflare, a named tunnel gives you a stable URL that survives restarts — which is what you want for a node that earns around the clock.

# Authenticate cloudflared with your Cloudflare account
cloudflared tunnel login

# Create a named tunnel
cloudflared tunnel create omlx-tunnel

# Point a hostname at it
cloudflared tunnel route dns omlx-tunnel ai.yourdomain.com

# Run it
cloudflared tunnel run --url http://127.0.0.1:8000 omlx-tunnel
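
As an alternative to passing --url on every run, cloudflared can read the origin mapping from a config file. The sketch below writes one. The function name write_tunnel_config is hypothetical, the tunnel UUID is whatever cloudflared tunnel create printed for you, and the credentials path assumes cloudflared stored them in its default ~/.cloudflared directory.

```shell
# Hypothetical helper: writes a cloudflared config file.
#   $1 = tunnel UUID (printed by `cloudflared tunnel create`)
#   $2 = public hostname, $3 = output path
# Assumes the credentials file is at ~/.cloudflared/<UUID>.json.
write_tunnel_config() {
  cat > "$3" <<EOF
tunnel: $1
credentials-file: $HOME/.cloudflared/$1.json

ingress:
  # Forward the public hostname to the local oMLX port
  - hostname: $2
    service: http://127.0.0.1:8000
  # Catch-all rule, required as the last ingress entry
  - service: http_status:404
EOF
}

# Usage:
#   write_tunnel_config <tunnel-uuid> ai.yourdomain.com ~/.cloudflared/config.yml
#   cloudflared tunnel run omlx-tunnel
```

With the file in place, the run command no longer needs the --url flag, and the hostname-to-port mapping lives in one place.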

6. Testing your setup

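Your Qwen 3.6 27B model is now reachable from anywhere. Test it with your Cloudflare URL and API key (use the trycloudflare.com URL instead if you started a quick tunnel). The "model" value should match an id that your server lists at /v1/models: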

curl https://ai.yourdomain.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-my_super_secret_key_2026" \
  -d '{
    "model": "qwen3.6-27b",
    "messages": [
      {"role": "system", "content": "You are a helpful coding assistant."},
      {"role": "user", "content": "Write a Python script to reverse a string."}
    ],
    "temperature": 0.7
  }'
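
If the request is rejected because the model name is not recognized, listing the ids the server actually reports can help. The sketch below assumes the OpenAI-style /v1/models response shape, in which each entry carries an "id" field. The function name list_model_ids is hypothetical, and the text matching is a rough stand-in for a real JSON parser.

```shell
# Hypothetical helper: pulls every "id" value out of a /v1/models response
# read from stdin. A rough text match, not a real JSON parser.
list_model_ids() {
  grep -o '"id"[[:space:]]*:[[:space:]]*"[^"]*"' | sed 's/.*"\([^"]*\)"$/\1/'
}

# Usage:
#   curl -s https://ai.yourdomain.com/v1/models \
#     -H "Authorization: Bearer sk-my_super_secret_key_2026" | list_model_ids
```

Whatever id this prints is the value to use in the "model" field of the chat request.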

7. Register it on Token Router

Now turn that endpoint into an earning node:

Sign in to the Token Router dashboard with GitHub.
Open Instances → Add instance.
Provide:
- Model — the model id you’re serving (e.g. qwen3.6-27b).
- Endpoint URL — your public tunnel URL, including /v1 (e.g. https://ai.yourdomain.com/v1).
- Upstream API key — the sk-… key you set in step 4. It’s encrypted at rest and never stored in the clear.
(Optional) Fill in the hardware and software inventory for the instance so the network understands your capacity.
Save. Once active, your node enters rotation and starts receiving traffic whenever it’s the least-loaded healthy candidate for that model.
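
Since the Endpoint URL field expects the /v1 suffix, a small helper can normalize whatever URL you copied from cloudflared before you paste it in. This is only an illustration: normalize_endpoint is a made-up name and is not part of Token Router or oMLX.

```shell
# Hypothetical helper: returns the URL with exactly one trailing "/v1".
#   $1 = tunnel URL, with or without a trailing slash or "/v1"
normalize_endpoint() {
  url="${1%/}"                      # drop a single trailing slash, if any
  case "$url" in
    */v1) printf '%s\n' "$url" ;;   # already ends in /v1
    *)    printf '%s/v1\n' "$url" ;;
  esac
}

# Usage:
#   normalize_endpoint https://ai.yourdomain.com/
```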

That’s it — your Mac is now a paid member of the Token Router network. See Earn Credits for how payouts work.