
Deploying oMLX on macOS

This guide walks you through setting up oMLX, a macOS-native LLM inference engine built on Apple’s MLX framework, and serving a 27B-parameter Qwen model on your Apple Silicon Mac. By the end you’ll have a model running locally, secured with an API key, exposed to the public internet via a Cloudflare Tunnel, and registered on Token Router so it can start earning credits.


1. Why oMLX

oMLX leverages Apple Silicon’s unified memory, zero-copy arrays, and lazy computation to deliver large speedups over generic inference runners. It also implements an SSD KV-caching system that pages hot/cold cache blocks to disk, so prompts that share a long prefix (an agent's tool definitions, a large document) can reuse cached state instead of recomputing it. That sharply cuts wait times for tool-calling agents and long-context prompts.


2. Installing oMLX

oMLX ships as a native macOS app with a lightweight CLI shim.

  1. Open the oMLX GitHub Releases page.
  2. Download the latest .dmg.
  3. Open the .dmg and drag the oMLX app into your Applications folder.
  4. Launch oMLX from Applications.
    • On first launch it installs its CLI shim to ~/.omlx/bin/omlx so you can drive it from the terminal.
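If `omlx` isn't found in your terminal after the first launch, the shim's directory is probably not on your PATH. The sketch below, for bash or zsh, relies only on the ~/.omlx/bin location mentioned above; `path_prepend` is a helper name made up for this example.

```shell
# Make the oMLX CLI shim callable as plain `omlx` in the current shell.
# Only the shim location (~/.omlx/bin/omlx) comes from this guide; whether your
# shell already has that directory on PATH is what this checks.
OMLX_BIN_DIR="$HOME/.omlx/bin"

# Hypothetical helper: prepend a directory to PATH unless it is already there.
path_prepend() {
  case ":$PATH:" in
    *":$1:"*) ;;              # already present, leave PATH unchanged
    *) PATH="$1:$PATH" ;;
  esac
}

path_prepend "$OMLX_BIN_DIR"
export PATH

if [ -x "$OMLX_BIN_DIR/omlx" ]; then
  echo "oMLX CLI found at $OMLX_BIN_DIR/omlx"
else
  echo "oMLX CLI not found at $OMLX_BIN_DIR; launch the oMLX app once to install it" >&2
fi
```

To make the change permanent, add `export PATH="$HOME/.omlx/bin:$PATH"` to ~/.zshrc (zsh is the default shell on current macOS).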

3. Downloading Qwen 3.6 27B and choosing quantization


Quantization compresses the model to fit comfortably within your Mac’s RAM while minimizing quality loss.

  • 4-bit (oQ4, sometimes tagged -4bit): the sweet spot for most Apple Silicon Macs. A 27B model in 4-bit needs roughly 16–18 GB of unified memory, which on a 32 GB machine still leaves room for macOS and the KV cache for your context window. Recommended for 32 GB machines.
  • 8-bit (oQ8): near-unquantized precision, but roughly twice the memory of the 4-bit build. Use it only on a 64 GB or 128 GB machine (M-series Max/Ultra).
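As a sanity check on these figures, weight memory is roughly parameters × bits per weight ÷ 8. The sketch below assumes group-wise quantization like MLX's default (groups of 64 weights, each with a 16-bit scale and a 16-bit bias, adding 0.5 bit per weight); whether oMLX's oQ formats use exactly this layout is an assumption. It ignores the KV cache and runtime buffers, which is why the figures above are higher.

```shell
# Back-of-the-envelope weight memory for a quantized model, in decimal GB.
# Assumption: group-wise quantization with 64 weights per group and a 16-bit
# scale plus a 16-bit bias per group (MLX's default layout), i.e. +0.5 bit/weight.
# KV cache, activations and runtime buffers are NOT included.
weights_gb() {
  # $1 = parameter count in billions, $2 = bits per quantized weight
  awk -v p="$1" -v b="$2" 'BEGIN {
    overhead = (16 + 16) / 64                 # extra bits per weight for scales/biases
    printf "%.1f\n", p * (b + overhead) / 8   # billions of bytes = GB
  }'
}

weights_gb 27 4   # prints 15.2
weights_gb 27 8   # prints 28.7
```

About 15 GB of weights for the 4-bit build is consistent with the 16–18 GB figure once the KV cache is added, while the 8-bit build's roughly 29 GB of weights alone is why it doesn't fit comfortably on a 32 GB Mac.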
To download the 4-bit build:

  1. Open the oMLX dashboard from the macOS menu-bar icon.
  2. Go to the Model Browser.
  3. Search for Qwen3.6-27B.
  4. Select the 4-bit build (tagged something like Qwen3.6-27B-A3B-oQ4).
  5. Click Download. Models are stored in ~/.omlx/models/.
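To confirm the download landed where expected, you can list the model folders and their size on disk. This sketch assumes one subdirectory per model under ~/.omlx/models; `MODELS_DIR` and `list_models` are names chosen for the example, not oMLX settings.

```shell
# List downloaded models with their size on disk.
# Assumes the default location from the step above and one subdirectory per
# model; MODELS_DIR is a variable name chosen for this sketch, not an oMLX setting.
MODELS_DIR="${MODELS_DIR:-$HOME/.omlx/models}"

list_models() {
  local dir
  if [ ! -d "$MODELS_DIR" ]; then
    echo "no models directory at $MODELS_DIR" >&2
    return 1
  fi
  for dir in "$MODELS_DIR"/*/; do
    [ -d "$dir" ] || continue     # the glob matched nothing
    du -sh "$dir"
  done
}
```

Run `list_models` once the download finishes; the model's folder should be roughly in line with the memory figures for the quantization you picked.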

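4. Setting an API key and starting the server

Because we’re exposing this API to the public internet, an API key is mandatory. Configure it in one of two ways: in the dashboard, or with a flag when you launch the server from the terminal.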

  1. In the oMLX dashboard, open Settings → Security.
  2. Enable Require API Key.
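  3. Generate or paste a strong secret, e.g. sk-my_super_secret_key_2026. That value is only an example; anyone who has your key can use your model, so pick a long random string.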
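One way to get a long random key without extra tools is to read random bytes from the OS and hex-encode them. A minimal sketch using `od` and /dev/urandom (both standard on macOS); `new_api_key` is a made-up helper name, and the `sk-` prefix only mirrors the example above.

```shell
# Generate a random API key: "sk-" followed by 64 hex characters (32 random bytes
# from the OS). The "sk-" prefix only mirrors the example above; nothing here
# assumes oMLX requires it.
new_api_key() {
  printf 'sk-%s\n' "$(od -An -tx1 -N32 /dev/urandom | tr -d ' \n')"
}
```

For example, `export OMLX_API_KEY="$(new_api_key)"`, then paste the value into Settings → Security and keep a copy in your password manager: you'll need it again when registering the node.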

Alternatively, launch the server from the terminal with the --api-key flag:

omlx serve --model Qwen3.6-27B-A3B-oQ4 --api-key "sk-my_super_secret_key_2026"
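Typing the key literally leaves it in your shell history. A small wrapper can read it from an environment variable instead; `OMLX_API_KEY`, `OMLX_MODEL` and `start_omlx` are names chosen for this sketch (oMLX is not assumed to read these variables itself), and only the `--model` and `--api-key` flags come from the command above.

```shell
# Start oMLX with the key read from an environment variable so the literal key
# isn't typed into your shell history. OMLX_API_KEY and OMLX_MODEL are names
# chosen for this sketch; the --model and --api-key flags are the ones shown above.
start_omlx() {
  # Abort with an error message if no key has been set.
  : "${OMLX_API_KEY:?set OMLX_API_KEY first}"
  omlx serve --model "${OMLX_MODEL:-Qwen3.6-27B-A3B-oQ4}" --api-key "$OMLX_API_KEY"
}
```

Because the key is still passed as a command-line argument, it can show up in process listings such as `ps` while the server runs; this only keeps it out of your history file.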

oMLX serves OpenAI-compatible endpoints on http://127.0.0.1:8000 by default.
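Loading a 27B model can take a while, so a script that starts the server should wait for it rather than sleep for a fixed time. This sketch polls /v1/models once per second until it returns HTTP 200; `wait_for_server` is a hypothetical helper and the timings are arbitrary.

```shell
# Poll the local /v1/models endpoint until the server answers HTTP 200 or we
# give up. wait_for_server is a hypothetical helper; one attempt per second.
# Arguments: base URL, API key, max attempts (default 60).
wait_for_server() {
  local base_url="$1" api_key="$2" attempts="${3:-60}" i=1 code=000
  while [ "$i" -le "$attempts" ]; do
    # curl prints 000 as the status when no HTTP response arrives at all
    code=$(curl -s -o /dev/null --max-time 5 -w '%{http_code}' \
      -H "Authorization: Bearer $api_key" "$base_url/v1/models") || code=000
    if [ "$code" = "200" ]; then
      echo "server ready after $i attempt(s)"
      return 0
    fi
    if [ "$i" -lt "$attempts" ]; then sleep 1; fi
    i=$((i + 1))
  done
  echo "server not ready (last HTTP status: $code)" >&2
  return 1
}
```

Usage: `wait_for_server http://127.0.0.1:8000 sk-my_super_secret_key_2026`. A persistent 401 or 403 typically means the key you pass doesn't match the one the server was started with.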

Confirm it’s up locally before going further:

curl http://127.0.0.1:8000/v1/models \
  -H "Authorization: Bearer sk-my_super_secret_key_2026"
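The response lists the model ids the server accepts, and that id is what goes in the `model` field of requests. A grep/sed sketch that extracts the ids without needing jq; it assumes an OpenAI-style `{"data":[{"id":...}]}` response in which every `"id"` key belongs to a model entry, and `model_ids` is a made-up name.

```shell
# Print the model ids from an OpenAI-style /v1/models response read on stdin.
# grep/sed instead of jq so it runs on a stock Mac; it assumes every "id"
# key in the response belongs to a model entry.
model_ids() {
  grep -Eo '"id"[[:space:]]*:[[:space:]]*"[^"]*"' | sed -E 's/.*"([^"]*)"$/\1/'
}
```

Usage: `curl -s http://127.0.0.1:8000/v1/models -H "Authorization: Bearer sk-my_super_secret_key_2026" | model_ids`. Use the printed id both in chat requests and when you register the node.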

5. Exposing the endpoint via Cloudflare Tunnel


We’ll use cloudflared to create a secure reverse-proxy tunnel to your local port 8000, so no router port-forwarding is required.

Step 1: Install cloudflared
brew install cloudflare/cloudflare/cloudflared

Step 2 — Quick tunnel (no domain required)


For a temporary public URL, great for testing:

cloudflared tunnel --url http://127.0.0.1:8000

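This prints an https://<random>.trycloudflare.com URL. The random hostname changes every time cloudflared restarts, so use it for testing rather than for an earning node.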
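When scripting the quick tunnel, you can run cloudflared in the background and pick the URL out of its log. This sketch matches only the `*.trycloudflare.com` hostname pattern rather than any particular log layout; `extract_tunnel_url`, `start_quick_tunnel` and the log path are hypothetical choices.

```shell
# Pull the public URL out of cloudflared's log output. Only the hostname
# pattern is relied on; the surrounding log layout is not. The API host that
# cloudflared contacts to request the tunnel is skipped.
extract_tunnel_url() {
  grep -Eo 'https://[a-z0-9-]+\.trycloudflare\.com' \
    | grep -v '^https://api\.trycloudflare\.com$' \
    | head -n 1
}

# Hypothetical helper: start a quick tunnel in the background, logging to a
# file, and print its URL once cloudflared has written it (up to ~30 s).
start_quick_tunnel() {
  local log_file="${1:-/tmp/cloudflared-quick.log}" i=0 url=""
  cloudflared tunnel --url http://127.0.0.1:8000 >"$log_file" 2>&1 &
  while [ "$i" -lt 30 ]; do
    url=$(extract_tunnel_url <"$log_file")
    if [ -n "$url" ]; then
      echo "$url"
      return 0
    fi
    sleep 1
    i=$((i + 1))
  done
  echo "no trycloudflare URL found in $log_file" >&2
  return 1
}
```

For example, `url=$(start_quick_tunnel)`, then test `"$url/v1/models"` with your key. Stop the tunnel with `pkill -f 'cloudflared tunnel --url'`.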

Step 3 — Persistent tunnel (recommended for earning)

If you own a domain on Cloudflare, a named tunnel gives you a stable URL that survives restarts — which is what you want for a node that earns around the clock.

# Authenticate cloudflared with your Cloudflare account
cloudflared tunnel login
# Create a named tunnel
cloudflared tunnel create omlx-tunnel
# Point a hostname at it
cloudflared tunnel route dns omlx-tunnel ai.yourdomain.com
# Run it
cloudflared tunnel run --url http://127.0.0.1:8000 omlx-tunnel
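Instead of passing --url on every run, cloudflared can read the hostname-to-service mapping from a config file (by default ~/.cloudflared/config.yml) with `tunnel`, `credentials-file` and `ingress` keys. The sketch below writes one. The tunnel UUID is the one `cloudflared tunnel create` printed (`cloudflared tunnel list` shows it again), and the credentials JSON it created is named after that UUID. `write_tunnel_config` is a hypothetical helper.

```shell
# Hypothetical helper: write ~/.cloudflared/config.yml for the named tunnel so it
# can be started with `cloudflared tunnel run omlx-tunnel` and no --url flag.
# Arguments: tunnel UUID (printed by `cloudflared tunnel create`), public
# hostname, and optionally the config directory (default ~/.cloudflared).
write_tunnel_config() {
  local tunnel_id="$1" host="$2" config_dir="${3:-$HOME/.cloudflared}"
  mkdir -p "$config_dir"
  cat >"$config_dir/config.yml" <<EOF
tunnel: $tunnel_id
credentials-file: $config_dir/$tunnel_id.json
ingress:
  - hostname: $host
    service: http://127.0.0.1:8000
  # cloudflared requires a final catch-all rule; reject anything else
  - service: http_status:404
EOF
  echo "wrote $config_dir/config.yml"
}
```

For example, `write_tunnel_config <your-tunnel-uuid> ai.yourdomain.com`, then `cloudflared tunnel run omlx-tunnel`. cloudflared also has a `service install` subcommand for keeping the tunnel running in the background; check Cloudflare's documentation for where it expects the config file on macOS.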

Your Qwen 3.6 27B model is now reachable from anywhere. Test it with your Cloudflare URL and API key, setting "model" to the id your server lists at /v1/models:

curl https://ai.yourdomain.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-my_super_secret_key_2026" \
  -d '{
    "model": "qwen3.6-27b",
    "messages": [
      {"role": "system", "content": "You are a helpful coding assistant."},
      {"role": "user", "content": "Write a Python script to reverse a string."}
    ],
    "temperature": 0.7
  }'
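Since the endpoint is now public, it's worth confirming that the key is actually enforced through the tunnel, not just locally. This sketch requests /v1/models with and without the key and succeeds only if the first works and the second is refused; `check_auth_enforced` is a hypothetical helper, and it doesn't assume which refusal status (401 or 403) the server returns.

```shell
# Hypothetical check: through the public URL, /v1/models should succeed with
# the key and be refused without it. Arguments: public base URL (without /v1)
# and the API key. Returns 0 only if both conditions hold.
check_auth_enforced() {
  local base_url="$1" api_key="$2" anon authed
  anon=$(curl -s -o /dev/null --max-time 10 -w '%{http_code}' \
    "$base_url/v1/models") || anon=000
  authed=$(curl -s -o /dev/null --max-time 10 -w '%{http_code}' \
    -H "Authorization: Bearer $api_key" "$base_url/v1/models") || authed=000
  echo "without key: HTTP $anon; with key: HTTP $authed"
  # 000 means curl got no HTTP response at all (DNS, tunnel or network problem)
  [ "$authed" = "200" ] && [ "$anon" != "200" ] && [ "$anon" != "000" ]
}
```

Usage: `check_auth_enforced https://ai.yourdomain.com sk-my_super_secret_key_2026`. If the request without a key returns 200, fix the API key setting before registering the node.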

Now turn that endpoint into an earning node:

  1. Sign in to the Token Router dashboard with GitHub.
  2. Open Instances → Add instance.
  3. Provide:
    • Model — the model id you’re serving (e.g. qwen3.6-27b).
    • Endpoint URL — your public tunnel URL, including /v1 (e.g. https://ai.yourdomain.com/v1).
    • Upstream API key — the sk-… key you set in step 4. It’s encrypted at rest and never stored in the clear.
  4. (Optional) Fill in the hardware and software inventory for the instance so the network understands your capacity.
  5. Save. Once active, your node enters rotation and starts receiving traffic whenever it’s the least-loaded healthy candidate for that model.
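Before pasting the Endpoint URL, a quick local pre-flight check can catch common mistakes. This sketch encodes only the requirements stated above (a public https URL that includes /v1), read here as "ends with /v1"; Token Router's own validation rules are not assumed, and `check_endpoint_url` is a hypothetical helper.

```shell
# Hypothetical pre-flight check for the "Endpoint URL" field, encoding only the
# requirements stated above: a public https URL that includes /v1.
check_endpoint_url() {
  local url="${1%/}"    # tolerate one trailing slash
  case "$url" in
    https://localhost*|https://127.*)
      echo "error: $url is not publicly reachable; use your tunnel URL" >&2
      return 1 ;;
    https://*) ;;
    *)
      echo "error: endpoint should start with https://" >&2
      return 1 ;;
  esac
  case "$url" in
    */v1) ;;
    *)
      echo "error: endpoint should end with /v1" >&2
      return 1 ;;
  esac
  case "$url" in
    *.trycloudflare.com/v1)
      echo "warning: quick-tunnel hostnames change on restart; prefer a named tunnel" >&2 ;;
  esac
  echo "ok: $url"
}
```

For example, `check_endpoint_url https://ai.yourdomain.com/v1` prints an ok line, while the local address http://127.0.0.1:8000/v1 is rejected.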

That’s it — your Mac is now a paid member of the Token Router network. See Earn Credits for how payouts work.