LLM routing

The right model for each job. No code.

Route summarization to a cheap model, deep reasoning to a frontier model, and regulated workflows to an on-prem one, all through simple admin rules rather than hard-wiring. Bring your own keys. Every call is metered for the spend dashboard.

Two building blocks

Profiles and Routing.

A Profile is a named model bundle. A Routing rule binds a Profile to a scope. That's the whole mental model.

LLM Profile

A reusable bundle: provider, model, temperature, max tokens, system-prompt prefix, and the Connection that holds the key.

Anthropic Claude · OpenAI GPT-4o · Google Gemini · Amazon Bedrock · Ollama (on-prem)
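For readers who think in data structures, a Profile can be pictured roughly as the record below. This is only an illustrative sketch: the class name, field names, model id, and Connection name are assumptions made for the example, not Quadrazene's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LLMProfile:
    """Hypothetical shape of a Profile; every field name here is illustrative."""
    name: str
    provider: str              # e.g. "anthropic", "openai", "ollama"
    model: str                 # provider-specific model id
    temperature: float
    max_tokens: int
    system_prompt_prefix: str
    connection: str            # name of the Connection that holds the API key


# Example values are placeholders, not real configuration.
summarizer = LLMProfile(
    name="cheap-summarizer",
    provider="anthropic",
    model="haiku",
    temperature=0.2,
    max_tokens=512,
    system_prompt_prefix="Summarize concisely.",
    connection="anthropic-prod",
)
```

The Profile stores a reference to a Connection rather than the key itself, which is what lets one set of credentials back several Profiles.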

LLM Routing

A rule that binds a Profile to a scope, with an optional purpose tag and priority. The narrowest matching rule wins.

atom → formula → chain → purpose-tag → engine → tenant → platform

Precedence chain: most specific override wins.
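One way to picture the precedence chain is as a walk from the most specific tier to the least specific, stopping at the first tier that has a matching rule. The sketch below assumes a hypothetical `RoutingRule` record and a `resolve` helper, and it assumes that a higher `priority` wins among rules in the same tier. None of these names or tie-breaking details come from the product itself.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Scope tiers ordered from most specific to least specific.
TIERS = ["atom", "formula", "chain", "purpose-tag", "engine", "tenant", "platform"]


@dataclass(frozen=True)
class RoutingRule:
    """Hypothetical rule record: binds a profile to one target within a tier."""
    tier: str
    target: str        # id of the atom, formula, engine, tenant, or the tag name
    profile: str
    priority: int = 0  # assumption: higher priority wins within the same tier


def resolve(rules: List[RoutingRule], context: Dict[str, str]) -> Optional[RoutingRule]:
    """Return the narrowest matching rule; context maps tier -> id for this call."""
    for tier in TIERS:
        matches = [r for r in rules if r.tier == tier and context.get(tier) == r.target]
        if matches:
            return max(matches, key=lambda r: r.priority)
    return None


RULES = [
    RoutingRule("purpose-tag", "summarization", "Haiku"),
    RoutingRule("engine", "governance", "Claude Opus"),
    RoutingRule("tenant", "default", "Claude Sonnet"),
]

tagged_call = {"purpose-tag": "summarization", "engine": "governance", "tenant": "default"}
untagged_call = {"engine": "governance", "tenant": "default"}
print(resolve(RULES, tagged_call).profile)
print(resolve(RULES, untagged_call).profile)
```

With these example rules, the tagged call resolves at the purpose-tag tier before the engine rule is ever considered, while the untagged call falls through to the engine rule.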

Routing rules

Say where each model runs.

quadrazene.app/admin/llm/routing

quadrazene v0.2.1


Active rules

purpose-tag · summarization → Haiku · fast + cheap

purpose-tag · deep-reasoning → GPT-4o

engine · governance → Claude Opus

system · patients (restricted) → Ollama · on-prem

tenant · default → Claude Sonnet

Resolution per call: atom → formula → … → platform. Restricted patient data never leaves the building.

Multi-provider tool-use

Anthropic Messages, OpenAI tools, and Gemini function-calling are all supported. Switch providers without rewriting Formulas.

Purpose tags cut across scopes

Tag atoms as summarization, deep-reasoning, classification, or sql-synth, then route by tag anywhere in the tree.

Regulated-data routing

Pin restricted Systems to an on-prem model so sensitive prompts never reach a hosted provider.
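Conceptually, pinning works like a guard that refuses to send a restricted System's prompts to anything but an on-prem provider. The check below is a minimal sketch of that idea. The function name, the `ON_PREM_PROVIDERS` set, and the choice of exception are assumptions made for illustration, not a description of how Quadrazene enforces the rule.

```python
# Assumption: the set of providers treated as on-prem is admin-configured.
ON_PREM_PROVIDERS = frozenset({"ollama"})


def assert_on_prem(system_name: str, restricted: bool, provider: str) -> None:
    """Raise if a restricted System would be served by a hosted provider."""
    if restricted and provider not in ON_PREM_PROVIDERS:
        raise PermissionError(
            f"System {system_name!r} is restricted; provider {provider!r} is not on-prem"
        )


# A restricted System routed to an on-prem model passes silently.
assert_on_prem("patients", restricted=True, provider="ollama")
```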

Bring your own keys

Profiles point at your Connections. Your accounts, your keys, your rate limits.

Back-compatible

Legacy single-model config keeps working. The resolver delegates to the new routing automatically.

Spend dashboard

Know what every model costs you.

By profile, engine, tier

Cost, tokens, and run counts grouped the way you actually budget.

7 / 30 / 90-day views

Trend spend over the windows that match your reporting cadence.

Drill to the Reaction

Every dollar traces back to the individual run that spent it.

Tier telemetry

Each Reaction records which profile and scope tier served it. No guessing.
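The grouping behind the dashboard can be thought of as a simple roll-up over per-Reaction usage records. The sketch below assumes a hypothetical `ReactionUsage` record carrying the profile, engine, and tier that served each run. The field names and sample figures are invented for illustration and are not real spend data.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class ReactionUsage:
    """Hypothetical per-run telemetry record; field names are illustrative."""
    reaction_id: str
    profile: str
    engine: str
    tier: str       # scope tier whose rule served the call
    tokens: int
    cost_usd: float


def group_spend(records: Iterable[ReactionUsage], key: str) -> Dict[str, dict]:
    """Sum cost, tokens, and run counts by 'profile', 'engine', or 'tier'."""
    totals = defaultdict(lambda: {"cost_usd": 0.0, "tokens": 0, "runs": 0})
    for record in records:
        bucket = totals[getattr(record, key)]
        bucket["cost_usd"] += record.cost_usd
        bucket["tokens"] += record.tokens
        bucket["runs"] += 1
    return dict(totals)


# Made-up sample records.
records = [
    ReactionUsage("r1", "Haiku", "compose", "purpose-tag", 1000, 0.25),
    ReactionUsage("r2", "Haiku", "compose", "purpose-tag", 500, 0.5),
    ReactionUsage("r3", "Claude Opus", "governance", "engine", 2000, 1.5),
]
by_profile = group_spend(records, "profile")
```

Because each record keeps its `reaction_id`, any total in a roll-up like this can be traced back to the individual runs that produced it.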

Route your models your way.

Request a demo
