How to Build a Multi-Model AI Router
Stop using one model for everything. Learn how to route prompts to the right model based on task complexity, cost, and speed — with working code in Python and TypeScript.
Most people pick one AI model and use it for everything. That's like using a sledgehammer to hang a picture frame.
The practitioners on ModelMix.ing run 2-5 models in combination — a daily driver for quick tasks, a heavy hitter for complex reasoning, a local model for privacy, and a fallback for when the primary is down or rate-limited.
This guide shows you how to build a working multi-model router from scratch in both Python and TypeScript.
Why Route Between Models?
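Cost. Top-tier models such as Claude Opus cost several times more per token than small ones such as Haiku; the exact multiple depends on the versions you compare. If 80% of your queries are simple, sending them all to the top tier burns money.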
Speed. A local model such as Llama 3.3 70B skips the network round trip that every cloud call pays, so on capable hardware it can start answering simple tasks sooner. Route simple tasks locally.
Reliability. APIs go down. Rate limits hit. A fallback model means your app keeps working.
Capability. Some models are better at code, others at creative writing, others at structured extraction. Match the model to the task.
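To make the cost point concrete, here is a rough back-of-the-envelope sketch. The per-query costs and the 80/20 split are placeholder assumptions for illustration, not real provider rates, and `blended_cost` is a hypothetical helper.

```python
# Hypothetical per-query costs in USD; substitute your own measured numbers.
CHEAP_COST = 0.002    # assumed cost of one query on a small model
PREMIUM_COST = 0.010  # assumed cost of one query on a top-tier model


def blended_cost(simple_share: float, queries: int) -> float:
    """Total cost when `simple_share` of the queries go to the cheap model."""
    if not 0.0 <= simple_share <= 1.0:
        raise ValueError("simple_share must be between 0 and 1")
    simple = queries * simple_share
    hard = queries - simple
    return simple * CHEAP_COST + hard * PREMIUM_COST


all_premium = blended_cost(0.0, 10_000)  # everything on the top-tier model
routed = blended_cost(0.8, 10_000)       # 80% of traffic sent to the cheap model
print(f"all premium: ${all_premium:.2f}, routed: ${routed:.2f}")
```

With these made-up numbers, routing brings 10,000 queries from about $100 down to about $36. The real saving depends on your own prices and traffic mix.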
The Architecture
User Prompt
     │
     ▼
┌─────────────┐
│   Router    │  ← classifies task type
└─────────────┘
     │
     ├── coding      → Claude Sonnet 4.6
     ├── reasoning   → Claude Opus 4.6 / o3
     ├── quick tasks → Haiku / local Llama
     ├── creative    → Claude Opus 4.6
     └── fallback    → Gemini 2.5 Flash

The router can be as simple as keyword matching or as smart as a small classifier model. Start simple.
Python Implementation
1. Install the SDKs
pip install anthropic openai google-genai
2. Set Your API Keys
export ANTHROPIC_API_KEY=your-key-here
export OPENAI_API_KEY=your-key-here
export GOOGLE_API_KEY=your-key-here
3. Define Your Stack
# stack.py: your model configuration
STACK = {
    "daily-driver": {
        "provider": "anthropic",
        "model": "claude-sonnet-4-6",
        "description": "Fast, balanced, good at everything",
    },
    "heavy-lifting": {
        "provider": "anthropic",
        "model": "claude-opus-4-6",
        "description": "Complex reasoning, agentic tasks",
    },
    "coding": {
        "provider": "anthropic",
        "model": "claude-sonnet-4-6",
        "description": "Best coding benchmarks",
    },
    "fallback": {
        "provider": "google",
        "model": "gemini-2.5-flash",
        "description": "Cheap, fast, reliable backup",
    },
}

4. Build the Router
# router.py
import anthropic
import openai
from google import genai

from stack import STACK  # the configuration defined in stack.py

def classify_task(prompt: str) -> str:
    """Simple keyword-based routing. Replace with a classifier for production."""
    lower = prompt.lower()
    if any(w in lower for w in ["code", "function", "debug", "error", "implement"]):
        return "coding"
    if any(w in lower for w in ["analyze", "compare", "reason", "proof", "explain why"]):
        return "heavy-lifting"
    # Everything else, short or long, goes to the daily driver
    return "daily-driver"

def ask(prompt: str, role: str | None = None) -> str:
    """Route to the right model and get a response."""
    if role is None:
        role = classify_task(prompt)
    # Roles missing from STACK are served by the daily driver
    config = STACK.get(role, STACK["daily-driver"])
    provider = config["provider"]
    model = config["model"]
    try:
        if provider == "anthropic":
            client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY
            response = client.messages.create(
                model=model,
                max_tokens=4096,
                messages=[{"role": "user", "content": prompt}],
            )
            return response.content[0].text
        elif provider == "openai":
            client = openai.OpenAI()  # reads OPENAI_API_KEY
            response = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
            )
            return response.choices[0].message.content
        elif provider == "google":
            client = genai.Client()  # reads GOOGLE_API_KEY
            response = client.models.generate_content(
                model=model,
                contents=prompt,
            )
            return response.text
        else:
            # Without this branch an unknown provider would silently return None
            raise ValueError(f"Unknown provider: {provider}")
    except Exception as e:
        # Any failure on a non-fallback role gets one retry on the fallback model
        if role != "fallback":
            print(f"Error with {model}, falling back: {e}")
            return ask(prompt, role="fallback")
        raise

# Usage
print(ask("Write a Python function to merge two sorted lists"))
# → routes to "coding" → Claude Sonnet 4.6

print(ask("Why is the sky blue?"))
# → routes to "daily-driver" → Claude Sonnet 4.6

print(ask("Compare the epistemological frameworks of Kant and Hume"))
# → routes to "heavy-lifting" → Claude Opus 4.6

TypeScript Implementation
1. Install
npm install @anthropic-ai/sdk openai @google/genai
2. Define Your Stack
// stack.ts
export const STACK = {
  "daily-driver": { provider: "anthropic", model: "claude-sonnet-4-6" },
  "heavy-lifting": { provider: "anthropic", model: "claude-opus-4-6" },
  "coding": { provider: "anthropic", model: "claude-sonnet-4-6" },
  "fallback": { provider: "google", model: "gemini-2.5-flash" },
} as const;

export type Role = keyof typeof STACK;

3. Build the Router
// router.ts
import Anthropic from "@anthropic-ai/sdk";
import OpenAI from "openai"; // unused until a role in STACK uses an "openai" provider
import { GoogleGenAI } from "@google/genai";
import { STACK, type Role } from "./stack";

function classifyTask(prompt: string): Role {
  const lower = prompt.toLowerCase();
  if (/code|function|debug|error|implement/.test(lower)) return "coding";
  if (/analyze|compare|reason|proof|explain why/.test(lower)) return "heavy-lifting";
  return "daily-driver";
}

export async function ask(prompt: string, role?: Role): Promise<string> {
  const resolvedRole = role ?? classifyTask(prompt);
  const { provider, model } = STACK[resolvedRole];
  try {
    if (provider === "anthropic") {
      const client = new Anthropic();
      const response = await client.messages.create({
        model,
        max_tokens: 4096,
        messages: [{ role: "user", content: prompt }],
      });
      return response.content[0].type === "text"
        ? response.content[0].text
        : "";
    }
    if (provider === "google") {
      const client = new GoogleGenAI({ apiKey: process.env.GOOGLE_API_KEY! });
      const response = await client.models.generateContent({
        model,
        contents: prompt,
      });
      return response.text ?? "";
    }
    throw new Error(`Unknown provider: ${provider}`);
  } catch (e) {
    if (resolvedRole !== "fallback") {
      console.error(`Error with ${model}, falling back:`, e);
      return ask(prompt, "fallback");
    }
    throw e;
  }
}

Making the Router Smarter
The keyword classifier is a starting point. Here's how to level up:
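A cheap first step, before reaching for a model, is tightening the keyword matcher itself. Plain substring checks fire on partial words (for example "error" inside "terror"), so the sketch below matches whole words only. The keyword lists mirror the ones in the basic router; `ROUTES`, `PATTERNS`, and `classify_task_strict` are hypothetical names, and word-boundary matching is an assumption of this sketch, not something the router above does.

```python
import re

# Keyword lists copied from the basic router; checked in order, first hit wins.
ROUTES = [
    ("coding", ["code", "function", "debug", "error", "implement"]),
    ("heavy-lifting", ["analyze", "compare", "reason", "proof", "explain why"]),
]

# Precompile one pattern per role; \b anchors each keyword to word boundaries.
PATTERNS = [
    (
        role,
        re.compile(
            r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b",
            re.IGNORECASE,
        ),
    )
    for role, words in ROUTES
]


def classify_task_strict(prompt: str) -> str:
    """Return the first role whose keyword appears as a whole word."""
    for role, pattern in PATTERNS:
        if pattern.search(prompt):
            return role
    return "daily-driver"
```

The trade-off is that whole-word matching misses inflected forms such as "debugging", so extend the lists with the variants you care about.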
Use a Small Model as Classifier
Instead of keywords, use a fast model (Haiku or Gemini Flash) to classify:
def smart_classify(prompt: str) -> str:
    client = anthropic.Anthropic()
    response = client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=20,
        messages=[{
            "role": "user",
            "content": f"""Classify this prompt into one category.
Categories: coding, heavy-lifting, daily-driver, creative
Reply with ONLY the category name.
Prompt: {prompt}""",
        }],
    )
    # "creative" has no entry in STACK, so ask() serves it with the daily driver
    return response.content[0].text.strip().lower()

Add Cost Tracking
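The tracking snippet in this section calls `calculate_cost`, which the article leaves undefined. Here is a minimal sketch of such a helper, assuming per-million-token pricing and a usage object with `input_tokens` and `output_tokens` attributes (the shape Anthropic responses expose). The `PRICES_PER_MTOK` table, its model names, and its numbers are placeholders, not real rates; fill in current prices from your providers.

```python
from types import SimpleNamespace

# Placeholder USD prices per million tokens: (input, output).
# Illustrative values only; replace them with your providers' current rates.
PRICES_PER_MTOK = {
    "example-small-model": (1.00, 5.00),
    "example-large-model": (5.00, 25.00),
}


def calculate_cost(model: str, usage) -> float:
    """Estimate the USD cost of one call from its token counts."""
    try:
        input_price, output_price = PRICES_PER_MTOK[model]
    except KeyError:
        raise ValueError(f"No price configured for model: {model}") from None
    total = usage.input_tokens * input_price + usage.output_tokens * output_price
    return total / 1_000_000


# Stand-in for response.usage so the example runs without an API call.
fake_usage = SimpleNamespace(input_tokens=1_200, output_tokens=300)
print(calculate_cost("example-small-model", fake_usage))
```

Raising on an unknown model is a deliberate choice in this sketch: a missing price shows up immediately instead of being logged as a silent zero.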
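from dataclasses import dataclass

@dataclass
class Usage:
    model: str
    input_tokens: int
    output_tokens: int
    cost_usd: float

usage_log: list[Usage] = []

# Track after each call, inside ask() once the response is back.
# The field names below match Anthropic responses; other SDKs name them
# differently. calculate_cost is a helper you provide (a per-model price lookup).
usage_log.append(Usage(
    model=model,
    input_tokens=response.usage.input_tokens,
    output_tokens=response.usage.output_tokens,
    cost_usd=calculate_cost(model, response.usage),
))

Add Local Model Support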
For local models via Ollama, add to your router:
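import httpx

# Inside ask(), as one more branch of the provider chain:
elif provider == "ollama":
    response = httpx.post(
        "http://localhost:11434/api/chat",
        json={
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            "stream": False,
        },
        timeout=120.0,  # httpx defaults to 5 seconds, which local generation can exceed
    )
    response.raise_for_status()  # turn HTTP errors into exceptions so the fallback triggers
    return response.json()["message"]["content"]

What the Community Runs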
Based on the most upvoted stacks on ModelMix.ing, here are the most common patterns:
The Coding Stack: Claude Sonnet (daily driver) + Claude Opus (hard problems) + local Llama (offline/privacy)
The Research Stack: Gemini Pro (search grounding) + Claude Opus (analysis) + GPT-4.1 (summarization)
The Budget Stack: Gemma locally + Gemini Flash as fallback — $0/mo
The Enterprise Stack: o3 (planning) + Claude Sonnet (execution) + embeddings via Cohere
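Any of these stacks can be written in the same shape as the STACK dictionary the router uses. As a rough sketch, here is the Budget Stack. The Ollama model tag, the role assignment, and the `resolve` helper are assumptions for illustration, and the "ollama" provider relies on the local-model branch from the previous section.

```python
# budget_stack.py: the Budget Stack expressed as a router config (illustrative)
BUDGET_STACK = {
    "daily-driver": {
        "provider": "ollama",   # local model served by Ollama
        "model": "gemma3",      # assumed tag; use whichever Gemma build you pulled
        "description": "Local default for everyday prompts",
    },
    "fallback": {
        "provider": "google",
        "model": "gemini-2.5-flash",
        "description": "Cloud backup when the local model is unavailable",
    },
}


def resolve(role: str) -> dict:
    """Roles this stack does not define resolve to the daily driver."""
    return BUDGET_STACK.get(role, BUDGET_STACK["daily-driver"])


print(resolve("coding")["model"])
```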
Browse all stacks at ModelMix.ing to find one that matches your use case and budget.
Start Simple
You don't need an orchestration framework to start. The ask(prompt, role) function above is a working multi-model router in under 50 lines. Start there, see which models you actually reach for, then share your stack on ModelMix.ing.