2026-04-07·8 min read·@scottshapiro__

How to Build a Multi-Model AI Router

Stop using one model for everything. Learn how to route prompts to the right model based on task complexity, cost, and speed — with working code in Python and TypeScript.

Most people pick one AI model and use it for everything. That's like using a sledgehammer to hang a picture frame.

The practitioners on ModelMix.ing run 2-5 models in combination — a daily driver for quick tasks, a heavy hitter for complex reasoning, a local model for privacy, and a fallback for when the primary is down or rate-limited.

This guide shows you how to build a working multi-model router from scratch in both Python and TypeScript.

Why Route Between Models?

Cost. Claude Opus costs several times more per token than Claude Haiku. If 80% of your queries are simple and you send them all to the premium model, you're burning money.

Speed. A local model (say, Llama 3.3 70B on hardware that can run it) has no network round-trip, while cloud models add network latency and queueing. Route simple tasks locally.

Reliability. APIs go down. Rate limits hit. A fallback model means your app keeps working.

Capability. Some models are better at code, others at creative writing, others at structured extraction. Match the model to the task.
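Here is a back-of-the-envelope sketch of the cost argument. The prices are relative units, not real list prices: the cheap model costs 1.0 per request and the premium model costs `premium_ratio` times that. Both `premium_ratio` and the helper names are assumptions for illustration; take the real ratio from your providers' pricing pages.

```python
def blended_cost(simple_share: float, premium_ratio: float) -> float:
    """Average cost per request, in units of the cheap model's price.

    Assumes simple requests go to the cheap model (cost 1.0) and the rest go
    to a premium model that costs `premium_ratio` times as much.
    """
    if not 0.0 <= simple_share <= 1.0:
        raise ValueError("simple_share must be between 0 and 1")
    return simple_share * 1.0 + (1.0 - simple_share) * premium_ratio


def savings_vs_premium_only(simple_share: float, premium_ratio: float) -> float:
    """Fraction saved compared with sending every request to the premium model."""
    return 1.0 - blended_cost(simple_share, premium_ratio) / premium_ratio


# Hypothetical numbers: 80% simple traffic, premium model 5x the cheap one.
print(f"{savings_vs_premium_only(0.8, 5.0):.0%}")  # → 64%
```

The savings shrink as the share of complex traffic grows, which is why routing pays off most when the bulk of your requests are simple.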

The Architecture

User Prompt
    │
    ▼
┌─────────────┐
│   Router    │ ← classifies task type
└─────────────┘
    │
    ├── coding → Claude Sonnet 4.6
    ├── reasoning → Claude Opus 4.6 / o3
    ├── quick tasks → Haiku / local Llama
    ├── creative → Claude Opus 4.6
    └── fallback → Gemini 2.5 Flash

The router can be as simple as keyword matching or as smart as a small classifier model. Start simple.
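If you start with keyword matching, a slightly sturdier variant is an ordered rule table with word-boundary regexes, so that "encode" no longer triggers the coding route. This is a minimal sketch: `RULES`, `DEFAULT_ROLE`, and `route` are hypothetical names, and the role names match the stack used later in this guide.

```python
import re

# Ordered routing rules: the first pattern that matches wins.
# The keyword lists are illustrative; tune them to your own traffic.
RULES: list[tuple[str, re.Pattern[str]]] = [
    ("coding", re.compile(r"\b(code|function|debug|error|implement)\b", re.IGNORECASE)),
    ("heavy-lifting", re.compile(r"\b(analy[sz]e|compare|reason\w*|proof|explain why)\b", re.IGNORECASE)),
]
DEFAULT_ROLE = "daily-driver"


def route(prompt: str) -> str:
    """Return the role of the first matching rule, or the default role."""
    for role, pattern in RULES:
        if pattern.search(prompt):
            return role
    return DEFAULT_ROLE


print(route("Please debug this stack trace"))  # → coding
print(route("How do I encode a URL?"))         # → daily-driver ("encode" is not "code")
```

Word boundaries trade some recall for precision, which is why the sketch uses `reason\w*` to keep matching "reasoning".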

Python Implementation

1. Install the SDKs

bash

pip install anthropic openai google-genai

2. Set Your API Keys

bash

export ANTHROPIC_API_KEY=your-key-here
export OPENAI_API_KEY=your-key-here
export GOOGLE_API_KEY=your-key-here
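A small startup check can catch a missing key before the first request fails. This sketch assumes the variable names exported above; the `google-genai` client also accepts `GEMINI_API_KEY`, so either one counts. `ENV_VARS` and `missing_providers` are illustrative names, not part of any SDK.

```python
import os

# Env vars each provider's SDK reads by default. google-genai accepts either name.
ENV_VARS: dict[str, tuple[str, ...]] = {
    "anthropic": ("ANTHROPIC_API_KEY",),
    "openai": ("OPENAI_API_KEY",),
    "google": ("GOOGLE_API_KEY", "GEMINI_API_KEY"),
}


def missing_providers(providers: list[str]) -> list[str]:
    """Return providers with no non-empty key set; unlisted ones (e.g. local Ollama) need none."""
    return [
        p for p in providers
        if p in ENV_VARS and not any(os.environ.get(var) for var in ENV_VARS[p])
    ]


missing = missing_providers(["anthropic", "google"])
if missing:
    print("No API key found for:", ", ".join(missing))
```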

3. Define Your Stack

python

# stack.py — your model configuration
STACK = {
    "daily-driver": {
        "provider": "anthropic",
        "model": "claude-sonnet-4-6",
        "description": "Fast, balanced, good at everything",
    },
    "heavy-lifting": {
        "provider": "anthropic",
        "model": "claude-opus-4-6",
        "description": "Complex reasoning, agentic tasks",
    },
    "coding": {
        "provider": "anthropic",
        "model": "claude-sonnet-4-6",
        "description": "Strong coding model (same as the daily driver here)",
    },
    "fallback": {
        "provider": "google",
        "model": "gemini-2.5-flash",
        "description": "Cheap, fast, reliable backup",
    },
}
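Because `ask()` leans on the `daily-driver` and `fallback` roles, it can help to sanity-check the config at startup. This is a sketch; `validate_stack` and `KNOWN_PROVIDERS` are hypothetical helpers. The same-provider check follows from the reliability argument above: a fallback hosted by the same provider as your default may go down with it.

```python
KNOWN_PROVIDERS = {"anthropic", "openai", "google", "ollama"}


def validate_stack(stack: dict[str, dict[str, str]]) -> list[str]:
    """Return human-readable problems with a stack config; [] means it looks usable."""
    problems: list[str] = []
    for required in ("daily-driver", "fallback"):
        if required not in stack:
            problems.append(f"missing required role {required!r}")
    for role, config in stack.items():
        for field in ("provider", "model"):
            if not config.get(field):
                problems.append(f"{role}: missing {field!r}")
        provider = config.get("provider")
        if provider and provider not in KNOWN_PROVIDERS:
            problems.append(f"{role}: unknown provider {provider!r}")
    # A fallback on the default's provider is likely to fail in the same outage.
    if "fallback" in stack and "daily-driver" in stack:
        if stack["fallback"].get("provider") == stack["daily-driver"].get("provider"):
            problems.append("fallback shares a provider with daily-driver")
    return problems


# Example: a config with a typo and no fallback role.
print(validate_stack({"daily-driver": {"provider": "antropic", "model": "claude-sonnet-4-6"}}))
```

The `STACK` above passes these checks, since its fallback runs on Google while the daily driver runs on Anthropic.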

4. Build the Router

python

# router.py
import anthropic
import openai
from google import genai

from stack import STACK

def classify_task(prompt: str) -> str:
    """Simple keyword-based routing. Replace with a classifier for production."""
    lower = prompt.lower()
    if any(w in lower for w in ["code", "function", "debug", "error", "implement"]):
        return "coding"
    if any(w in lower for w in ["analyze", "compare", "reason", "proof", "explain why"]):
        return "heavy-lifting"
    # Everything else (short questions, general chat) goes to the daily driver.
    return "daily-driver"

def ask(prompt: str, role: str | None = None) -> str:
    """Route to the right model and get a response, falling back on any error."""
    if role is None:
        role = classify_task(prompt)

    # Roles missing from STACK (e.g. "creative") use the daily driver.
    config = STACK.get(role, STACK["daily-driver"])
    provider = config["provider"]
    model = config["model"]

    try:
        if provider == "anthropic":
            client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY
            response = client.messages.create(
                model=model,
                max_tokens=4096,
                messages=[{"role": "user", "content": prompt}],
            )
            return response.content[0].text

        elif provider == "openai":
            client = openai.OpenAI()  # reads OPENAI_API_KEY
            response = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
            )
            return response.choices[0].message.content or ""

        elif provider == "google":
            client = genai.Client()  # reads GOOGLE_API_KEY (or GEMINI_API_KEY)
            response = client.models.generate_content(
                model=model,
                contents=prompt,
            )
            return response.text or ""

        raise ValueError(f"Unknown provider: {provider}")

    except Exception as e:
        if role != "fallback":
            print(f"Error with {model}, falling back: {e}")
            return ask(prompt, role="fallback")
        raise

# Usage
print(ask("Write a Python function to merge two sorted lists"))
# → routes to "coding" → Claude Sonnet 4.6

print(ask("Why is the sky blue?"))
# → routes to "daily-driver" → Claude Sonnet 4.6

print(ask("Compare the epistemological frameworks of Kant and Hume"))
# → routes to "heavy-lifting" → Claude Opus 4.6
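The router above falls back exactly once. Here is a sketch that generalizes this to an ordered chain of roles, with a short exponential backoff between retries of the same role. `call(role, prompt)` is a hypothetical stand-in for the provider dispatch inside `ask()`, injected so the logic can be exercised without API keys.

```python
import time
from typing import Callable

# Hypothetical signature: call(role, prompt) -> response text. In the real router
# this would be the provider dispatch from ask(); any exception counts as a failure.
Caller = Callable[[str, str], str]


def ask_with_chain(
    prompt: str,
    chain: list[str],
    call: Caller,
    retries: int = 1,
    base_delay: float = 0.5,
) -> str:
    """Try each role in order; retry a failing role with exponential backoff, then move on."""
    errors: list[str] = []
    for role in chain:
        for attempt in range(retries + 1):
            try:
                return call(role, prompt)
            except Exception as e:  # broad on purpose: any failure means "retry or move on"
                errors.append(f"{role} (attempt {attempt + 1}): {e}")
                if attempt < retries:
                    time.sleep(base_delay * 2 ** attempt)  # 0.5 s, 1 s, 2 s, ...
    raise RuntimeError("All models in the chain failed:\n" + "\n".join(errors))


# Offline demo with a fake provider call: "coding" is always rate-limited.
def fake_call(role: str, prompt: str) -> str:
    if role == "coding":
        raise TimeoutError("rate limited")
    return f"[{role}] ok"


print(ask_with_chain("Fix this bug", ["coding", "daily-driver", "fallback"], fake_call, base_delay=0.0))
# → [daily-driver] ok
```

In production, consider catching only retryable errors (for example `anthropic.RateLimitError` or `anthropic.APIConnectionError`) instead of every exception, and keep in mind that the Anthropic and OpenAI Python SDKs already retry some failures on their own before raising.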

TypeScript Implementation

1. Install

bash

npm install @anthropic-ai/sdk openai @google/genai

2. Define Your Stack

typescript

// stack.ts
export const STACK = {
  "daily-driver": { provider: "anthropic", model: "claude-sonnet-4-6" },
  "heavy-lifting": { provider: "anthropic", model: "claude-opus-4-6" },
  "coding": { provider: "anthropic", model: "claude-sonnet-4-6" },
  "fallback": { provider: "google", model: "gemini-2.5-flash" },
} as const;

export type Role = keyof typeof STACK;

3. Build the Router

typescript

// router.ts
import Anthropic from "@anthropic-ai/sdk";
import { GoogleGenAI } from "@google/genai";
import { STACK, type Role } from "./stack";

function classifyTask(prompt: string): Role {
  const lower = prompt.toLowerCase();
  if (/code|function|debug|error|implement/.test(lower)) return "coding";
  if (/analyze|compare|reason|proof|explain why/.test(lower)) return "heavy-lifting";
  return "daily-driver";
}

export async function ask(prompt: string, role?: Role): Promise<string> {
  const resolvedRole = role ?? classifyTask(prompt);
  const { provider, model } = STACK[resolvedRole];

  try {
    if (provider === "anthropic") {
      const client = new Anthropic();
      const response = await client.messages.create({
        model,
        max_tokens: 4096,
        messages: [{ role: "user", content: prompt }],
      });
      const block = response.content[0];
      return block?.type === "text" ? block.text : "";
    }

    if (provider === "google") {
      const client = new GoogleGenAI({ apiKey: process.env.GOOGLE_API_KEY! });
      const response = await client.models.generateContent({
        model,
        contents: prompt,
      });
      return response.text ?? "";
    }

    throw new Error(`Unknown provider: ${provider}`);
  } catch (e) {
    if (resolvedRole !== "fallback") {
      console.error(`Error with ${model}, falling back:`, e);
      return ask(prompt, "fallback");
    }
    throw e;
  }
}

Making the Router Smarter

The keyword classifier is a starting point. Here's how to level up:

Use a Small Model as Classifier

Instead of keywords, use a fast model (Haiku or Gemini Flash) to classify:

python

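import anthropic

VALID_ROLES = {"coding", "heavy-lifting", "daily-driver", "creative"}

def smart_classify(prompt: str) -> str:
    """Ask a small, cheap model to pick a role; default to daily-driver on odd output."""
    client = anthropic.Anthropic()
    response = client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=20,
        messages=[{
            "role": "user",
            "content": f"""Classify this prompt into one category.
Categories: coding, heavy-lifting, daily-driver, creative
Reply with ONLY the category name.
Prompt: {prompt}"""
        }],
    )
    label = response.content[0].text.strip().lower()
    # The model may add punctuation or extra words; only accept a known label.
    return label if label in VALID_ROLES else "daily-driver"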
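A classifier call adds latency and tokens to every request. One option is a hybrid: keep free keyword rules as a fast path and only ask the small model when no keyword matches. In this sketch both classifiers are passed in as functions with hypothetical signatures: a keyword variant that returns `None` when nothing matches, and an LLM classifier such as `smart_classify`.

```python
from typing import Callable, Optional

VALID_ROLES = {"coding", "heavy-lifting", "daily-driver", "creative"}


def hybrid_classify(
    prompt: str,
    keyword_classify: Callable[[str], Optional[str]],
    llm_classify: Callable[[str], str],
    default: str = "daily-driver",
) -> str:
    """Free keyword rules first; only pay for a model call when no rule matches."""
    role = keyword_classify(prompt)
    if role is not None:
        return role
    # Normalize replies like "Creative." before checking the label.
    label = llm_classify(prompt).strip().strip(".\"'").lower()
    return label if label in VALID_ROLES else default


# Hypothetical stand-ins so the sketch runs offline.
def keywords(prompt: str) -> Optional[str]:
    return "coding" if "debug" in prompt.lower() else None


def fake_llm(prompt: str) -> str:
    return "Creative."


print(hybrid_classify("Debug my loop", keywords, fake_llm))             # → coding
print(hybrid_classify("Write a haiku about rain", keywords, fake_llm))  # → creative
```

Note that "creative" is not in the example `STACK`, so the Python `ask()` would send it to the daily driver until you add a creative role.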

Add Cost Tracking

python

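from dataclasses import dataclass

import anthropic

# Placeholder prices in USD per million tokens: (input, output).
# REPLACE these zeros with current numbers from each provider's pricing page.
PRICES_PER_MTOK: dict[str, tuple[float, float]] = {
    "claude-sonnet-4-6": (0.0, 0.0),
    "claude-opus-4-6": (0.0, 0.0),
}

@dataclass
class Usage:
    model: str
    input_tokens: int
    output_tokens: int
    cost_usd: float

usage_log: list[Usage] = []

def calculate_cost(model: str, usage: anthropic.types.Usage) -> float:
    # Models missing from the price table are counted as $0.
    input_price, output_price = PRICES_PER_MTOK.get(model, (0.0, 0.0))
    return (usage.input_tokens * input_price + usage.output_tokens * output_price) / 1_000_000

def record_usage(model: str, response: anthropic.types.Message) -> None:
    """Track after each Anthropic call (OpenAI and Gemini name these fields differently)."""
    usage_log.append(Usage(
        model=model,
        input_tokens=response.usage.input_tokens,
        output_tokens=response.usage.output_tokens,
        cost_usd=calculate_cost(model, response.usage),
    ))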
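Once calls are logged, a per-model rollup shows where the money goes. This sketch repeats the `Usage` dataclass so it runs on its own; `summarize` is an illustrative helper, and the log entries are made up.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Usage:  # same shape as the usage record above
    model: str
    input_tokens: int
    output_tokens: int
    cost_usd: float


def summarize(log: list[Usage]) -> dict[str, dict[str, float]]:
    """Roll the usage log up into per-model totals: calls, tokens, and cost."""
    totals: dict[str, dict[str, float]] = defaultdict(
        lambda: {"calls": 0, "input_tokens": 0, "output_tokens": 0, "cost_usd": 0.0}
    )
    for u in log:
        t = totals[u.model]
        t["calls"] += 1
        t["input_tokens"] += u.input_tokens
        t["output_tokens"] += u.output_tokens
        t["cost_usd"] += u.cost_usd
    return dict(totals)


# Made-up entries for illustration only.
log = [
    Usage("claude-sonnet-4-6", 1200, 300, 0.01),
    Usage("claude-sonnet-4-6", 800, 150, 0.02),
    Usage("gemini-2.5-flash", 500, 100, 0.001),
]
for model, t in summarize(log).items():
    print(f"{model}: {t['calls']} calls, ${t['cost_usd']:.4f}")
```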

Add Local Model Support

For local models via Ollama, add to your router:

python

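import httpx

def call_ollama(model: str, prompt: str) -> str:
    """Send one non-streaming chat turn to a local Ollama server."""
    response = httpx.post("http://localhost:11434/api/chat", json={
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }, timeout=120.0)  # httpx's default 5 s timeout is too short for local generation
    response.raise_for_status()
    return response.json()["message"]["content"]

# In ask(), add a branch: elif provider == "ollama": return call_ollama(model, prompt)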
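Before routing to a local model, you may want to confirm that Ollama is running and has the model pulled, and otherwise go straight to a cloud role. This sketch uses only the standard library and Ollama's `/api/tags` endpoint, which lists locally available models; `ollama_has_model` is a hypothetical helper name.

```python
import json
import urllib.request


def ollama_has_model(
    model: str, base_url: str = "http://localhost:11434", timeout: float = 2.0
) -> bool:
    """True if an Ollama server answers at base_url and lists `model` locally."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            data = json.load(resp)
    except (OSError, ValueError):
        # Connection refused, timeout (URLError and TimeoutError are OSErrors), or bad JSON.
        return False
    names = {m.get("name", "") for m in data.get("models", [])}
    # Ollama reports tags such as "llama3.3:latest", so accept the bare name as well.
    return model in names or f"{model}:latest" in names


if not ollama_has_model("llama3.3"):
    print("Ollama or llama3.3 unavailable; route local tasks to a cloud role instead.")
```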

What the Community Runs

Based on the most upvoted stacks on ModelMix.ing, here are the most common patterns:

The Coding Stack: Claude Sonnet (daily driver) + Claude Opus (hard problems) + local Llama (offline/privacy)

The Research Stack: Gemini Pro (search grounding) + Claude Opus (analysis) + GPT-4.1 (summarization)

The Budget Stack: Gemma locally + Gemini Flash as fallback — $0/mo

The Enterprise Stack: o3 (planning) + Claude Sonnet (execution) + embeddings via Cohere
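As an illustration, the Budget Stack could be written in the same `STACK` format used earlier. The `gemma3` tag is an assumption (use whichever Gemma build you have pulled with `ollama pull`), and this config needs the Ollama branch from the previous section.

```python
# Hypothetical Budget Stack in the same format as STACK. "gemma3" is an assumed
# Ollama tag (pull it first with `ollama pull gemma3`); any local Gemma build works.
BUDGET_STACK = {
    "daily-driver": {"provider": "ollama", "model": "gemma3"},
    "fallback": {"provider": "google", "model": "gemini-2.5-flash"},
}
```

Because the Python `ask()` maps roles missing from the stack to `daily-driver`, coding and heavy-lifting requests would also run locally, with Gemini Flash taking over on errors.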

Browse all stacks at ModelMix.ing to find one that matches your use case and budget.

Start Simple

You don't need an orchestration framework to start. The ask(prompt, role) function above is a working multi-model router in roughly 50 lines. Start there, see which models you actually reach for, then share your stack on ModelMix.ing.