Deploying oMLX on macOS
This guide walks you through setting up oMLX, a macOS-native LLM inference engine built on Apple’s MLX framework, and serving a 27B-parameter Qwen model on your Apple Silicon Mac. By the end you’ll have a model running locally, secured with an API key, exposed to the public internet via a Cloudflare Tunnel, and registered on Token Router so it can start earning credits.
1. Why oMLX?
oMLX leverages Apple Silicon’s unified memory, zero-copy arrays, and lazy computation to deliver large speedups over generic inference runners. It also implements an SSD KV-caching system that pages hot/cold cache blocks to disk, virtually eliminating wait times for tool-calling agents and long-context prompts.
2. Installing oMLX
oMLX ships as a native macOS app with a lightweight CLI shim.
- Open the oMLX GitHub Releases page.
- Download the latest .dmg.
- Open the .dmg and drag the oMLX app into your Applications folder.
- Launch oMLX from Applications.
  - On first launch it installs its CLI shim to ~/.omlx/bin/omlx so you can drive it from the terminal.
3. Downloading Qwen 3.6 27B and choosing quantization
Quantization compresses the model to fit comfortably within your Mac’s RAM while minimizing quality loss.
Choosing the right quantization
- 4-bit (oQ4, sometimes tagged -4bit): the sweet spot for most Apple Silicon Macs. A 27B model in 4-bit needs roughly 16–18 GB of unified memory, leaving room for the OS and your context window. Recommended for 32 GB machines.
- 8-bit (oQ8): near-unquantized precision, but much more memory. Use it only on a 64 GB or 128 GB machine (M-series Max/Ultra).
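The memory figures above can be sanity-checked with simple arithmetic. The sketch below counts only the raw weights (parameters × bits ÷ 8); quantization metadata, the KV cache, and runtime overhead come on top, which presumably accounts for the gap between the raw 4-bit number and the 16–18 GB quoted above. The function name is illustrative and not part of oMLX.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in decimal gigabytes."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    # Raw weight footprint of a 27B-parameter model at common precisions.
    for label, bits in [("4-bit", 4), ("8-bit", 8), ("16-bit", 16)]:
        print(f"{label}: ~{weight_memory_gb(27, bits):.1f} GB for weights only")
```

At 8-bit the weights alone come to about 27 GB, which is why that build is only practical on machines with 64 GB or more.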
Downloading the model
- Open the oMLX dashboard from the macOS menu-bar icon.
- Go to the Model Browser.
- Search for Qwen3.6-27B.
- Select the 4-bit build (tagged something like Qwen3.6-27B-A3B-oQ4).
- Click Download. Models are stored in ~/.omlx/models/.
4. Enabling API authentication
Because we’re exposing this API to the public internet, an API key is mandatory. Configure it one of two ways.
Method A — GUI (recommended)
- In the oMLX dashboard, open Settings → Security.
- Enable Require API Key.
- Generate or paste a strong secret, e.g. sk-my_super_secret_key_2026.
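If you would rather generate the secret than invent one, Python's standard `secrets` module is one option. This is a minimal sketch: the `sk-` prefix simply mirrors the example key above, and nothing here suggests oMLX requires a particular key format.

```python
import secrets


def generate_api_key(prefix: str = "sk-", n_bytes: int = 32) -> str:
    """Return the prefix plus a URL-safe token built from n_bytes of OS randomness."""
    return prefix + secrets.token_urlsafe(n_bytes)


if __name__ == "__main__":
    # Print a fresh key to paste into the dashboard or pass on the command line.
    print(generate_api_key())
```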
Method B — CLI
Launch the server manually with the --api-key flag:
omlx serve --model Qwen3.6-27B-A3B-oQ4 --api-key "sk-my_super_secret_key_2026"
oMLX serves OpenAI-compatible endpoints on http://127.0.0.1:8000 by default.
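Because the endpoints follow the OpenAI wire format, any HTTP client that sends a Bearer token can talk to the server. The sketch below uses only the Python standard library and assumes that `/v1/models` returns the usual OpenAI list shape (`{"data": [{"id": ...}]}`); the helper names are illustrative, and the URL and key are the placeholder values from this guide.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000/v1"    # default local address from this guide
API_KEY = "sk-my_super_secret_key_2026"  # placeholder key from this guide


def build_models_request(base_url: str, api_key: str) -> urllib.request.Request:
    """Build an authenticated GET request for the model list."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )


def model_ids(payload: dict) -> list:
    """Extract model ids, assuming the OpenAI list shape {"data": [{"id": ...}]}."""
    return [entry["id"] for entry in payload.get("data", [])]


def list_models(base_url: str = BASE_URL, api_key: str = API_KEY) -> list:
    """Query the running server and return the ids it reports."""
    request = build_models_request(base_url, api_key)
    with urllib.request.urlopen(request, timeout=10) as response:
        return model_ids(json.load(response))
```

With the server running, `print(list_models())` should list the ids you can pass as `model` in later requests.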
Confirm it’s up locally before going further:
curl http://127.0.0.1:8000/v1/models \
  -H "Authorization: Bearer sk-my_super_secret_key_2026"
5. Exposing the endpoint via Cloudflare Tunnel
We’ll use cloudflared to create a secure reverse-proxy tunnel to your local port 8000, with no router port-forwarding required.
Step 1 — Install cloudflared
brew install cloudflare/cloudflare/cloudflared
Step 2 — Quick tunnel (no domain required)
For a temporary public URL, great for testing:
cloudflared tunnel --url http://127.0.0.1:8000
This prints an https://<random>.trycloudflare.com URL.
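If you script the quick tunnel, you will need to pull that URL out of whatever cloudflared prints. The sketch below matches only the `https://<random>.trycloudflare.com` pattern described above and makes no assumption about the rest of the log format; the function names are hypothetical.

```python
import re
from typing import Optional

# Matches the quick-tunnel hostname pattern: https://<random>.trycloudflare.com
TRYCLOUDFLARE_URL = re.compile(r"https://[a-z0-9-]+\.trycloudflare\.com")


def find_tunnel_url(log_text: str) -> Optional[str]:
    """Return the first trycloudflare.com URL found in log_text, or None."""
    match = TRYCLOUDFLARE_URL.search(log_text)
    return match.group(0) if match else None


def api_base(tunnel_url: str) -> str:
    """Append /v1 so OpenAI-style clients can use the tunnel URL as a base URL."""
    return tunnel_url.rstrip("/") + "/v1"
```

For example, passing captured output to `find_tunnel_url` and then to `api_base` gives a base URL that an OpenAI-style client can use directly.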
Step 3 — Persistent tunnel (recommended for earning)
If you own a domain on Cloudflare, a named tunnel gives you a stable URL that survives restarts, which is what you want for a node that earns around the clock.
# Authenticate cloudflared with your Cloudflare account
cloudflared tunnel login
# Create a named tunnel
cloudflared tunnel create omlx-tunnel
# Point a hostname at it
cloudflared tunnel route dns omlx-tunnel ai.yourdomain.com
# Run it
cloudflared tunnel run --url http://127.0.0.1:8000 omlx-tunnel
6. Testing your setup
Your Qwen 3.6 27B model is now reachable from anywhere. Test it with your Cloudflare URL and API key:
curl https://ai.yourdomain.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-my_super_secret_key_2026" \
  -d '{
    "model": "qwen3.6-27b",
    "messages": [
      {"role": "system", "content": "You are a helpful coding assistant."},
      {"role": "user", "content": "Write a Python script to reverse a string."}
    ],
    "temperature": 0.7
  }'
7. Register it on Token Router
Now turn that endpoint into an earning node:
- Sign in to the Token Router dashboard with GitHub.
- Open Instances → Add instance.
- Provide:
  - Model: the model id you’re serving (e.g. qwen3.6-27b).
  - Endpoint URL: your public tunnel URL, including /v1 (e.g. https://ai.yourdomain.com/v1).
  - Upstream API key: the sk-… key you set in step 4. It’s encrypted at rest and never stored in the clear.
- (Optional) Fill in the hardware and software inventory for the instance so the network understands your capacity.
- Save. Once active, your node enters rotation and starts receiving traffic whenever it’s the least-loaded healthy candidate for that model.
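The three values entered in the form (model id, endpoint URL, and upstream key) can be checked together with a short script. This is a sketch using only the standard library; the `Instance` class and function names are illustrative and not part of Token Router or oMLX, and the response parsing assumes the standard OpenAI chat-completions shape.

```python
import json
import urllib.request
from dataclasses import dataclass


@dataclass
class Instance:
    """The three values the registration form asks for (hypothetical container)."""
    model: str
    endpoint_url: str
    api_key: str


def build_chat_request(inst: Instance, prompt: str) -> urllib.request.Request:
    """Build a POST to <endpoint>/chat/completions, checking the URL includes /v1."""
    base = inst.endpoint_url.rstrip("/")
    if not base.endswith("/v1"):
        raise ValueError("endpoint URL should include /v1")
    body = {"model": inst.model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        base + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {inst.api_key}",
        },
        method="POST",
    )


def smoke_test(inst: Instance, prompt: str = "Say hello.") -> str:
    """Send one request and return the reply text (OpenAI response shape assumed)."""
    with urllib.request.urlopen(build_chat_request(inst, prompt), timeout=60) as resp:
        reply = json.load(resp)
    return reply["choices"][0]["message"]["content"]
```

Calling `smoke_test(Instance("qwen3.6-27b", "https://ai.yourdomain.com/v1", "<your key>"))` exercises the same path that routed traffic would take through the tunnel.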
That’s it — your Mac is now a paid member of the Token Router network. See Earn Credits for how payouts work.