Architecture
How It Works
Every request passes through 8 layers — cache, security, classification, quality matrix, routing, validation, learning, and reporting.
Semantic Cache Check
Before anything else, the prompt is hashed and matched against the semantic cache. If a similar prompt was routed before, the answer comes back instantly — no model call, zero cost.
// Neural Insight: Two-level lookup: exact SHA-256 hash (< 1ms, free) → cosine similarity via pgvector (threshold: 0.92). The cache grows smarter with every request.
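The two-level lookup can be sketched as follows. This is a minimal in-memory stand-in: the real system uses pgvector for the similarity search, and `embed`, `exact_cache`, and `vector_cache` are names assumed here for illustration.

```python
import hashlib
import math

SIMILARITY_THRESHOLD = 0.92  # cosine threshold from the docs

exact_cache = {}    # SHA-256 hex digest -> cached answer (level 1)
vector_cache = []   # (embedding, answer) pairs; pgvector in production (level 2)

def cosine(a, b):
    # Plain cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def cache_lookup(prompt, embed):
    # Level 1: exact SHA-256 match -- sub-millisecond, no embedding needed.
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key in exact_cache:
        return exact_cache[key]
    # Level 2: semantic match -- embed the prompt, find the nearest neighbor,
    # and only accept it above the similarity threshold.
    query = embed(prompt)
    best, best_sim = None, 0.0
    for vec, answer in vector_cache:
        sim = cosine(query, vec)
        if sim > best_sim:
            best, best_sim = answer, sim
    return best if best_sim >= SIMILARITY_THRESHOLD else None
```

A miss at both levels returns `None`, and the request proceeds to the security scan.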
Security Shield Scan
Every prompt is scanned by a real-time heuristic engine that detects prompt injection, DAN jailbreaks, system-prompt extraction, and token-smuggling attacks.
// Neural Insight: Pure regex/heuristic — no LLM call, under 1ms. Three tiers: CRITICAL and HIGH patterns are blocked (HTTP 403). MEDIUM patterns are flagged and logged.
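The tiered block/flag logic looks roughly like this. The patterns below are illustrative placeholders only; the production rule set is not part of these docs.

```python
import re

# Illustrative patterns only -- the real engine's rule set is much larger.
RULES = [
    ("CRITICAL", re.compile(r"ignore (all|previous) instructions", re.I)),
    ("HIGH",     re.compile(r"\bDAN\b|do anything now", re.I)),
    ("MEDIUM",   re.compile(r"repeat your system prompt", re.I)),
]

def scan(prompt):
    """Return (action, tier): block CRITICAL/HIGH, flag MEDIUM, else allow."""
    for tier, pattern in RULES:
        if pattern.search(prompt):
            if tier in ("CRITICAL", "HIGH"):
                return ("block", tier)   # surfaced to the client as HTTP 403
            return ("flag", tier)        # logged, request still proceeds
    return ("allow", None)
```

Because everything is regex, the scan adds effectively no latency and no model cost.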
Prompt Analysis & Classification
A lightweight intent model classifies the task type (coding, summarization, reasoning, chat...) and assigns a complexity score in real time.
// Neural Insight: This classification drives all downstream decisions — routing mode, confidence matrix lookup, and shadow engine triggers.
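The classifier's output shape can be sketched with a toy keyword heuristic. The real system uses a lightweight model, not keyword matching; the keyword lists and the length-based complexity proxy below are assumptions for illustration.

```python
# Toy stand-in for the intent model: returns (task_type, complexity in [0, 1]).
KEYWORDS = {
    "coding":        ("def ", "function", "bug", "compile"),
    "summarization": ("summarize", "tl;dr", "summary"),
    "reasoning":     ("prove", "why", "step by step"),
}

def classify(prompt):
    lowered = prompt.lower()
    task = "chat"  # default bucket when nothing matches
    for task_type, words in KEYWORDS.items():
        if any(w in lowered for w in words):
            task = task_type
            break
    # Toy complexity proxy: longer prompts score higher, capped at 1.0.
    complexity = min(len(prompt) / 2000, 1.0)
    return task, complexity
```

Everything downstream keys off this pair: the confidence matrix is indexed by task type, and the complexity score feeds the routing decision.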
Confidence Matrix Lookup
Before routing, the engine checks a live quality matrix: has this model/task-type combination historically produced poor results? If yes, it auto-escalates to a premium model.
// Neural Insight: Matrix is rebuilt from shadow audit data every 30 minutes. Pairs with < 20 samples are skipped to avoid cold-start false positives.
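The lookup, including the cold-start guard, can be sketched like this. The 20-sample minimum comes from the docs; the failure-rate cutoff is an assumed value for illustration.

```python
MIN_SAMPLES = 20          # pairs with fewer samples are skipped (cold-start guard)
FAILURE_THRESHOLD = 0.15  # assumed cutoff -- not specified in the docs

# (model, task_type) -> (failure_count, sample_count),
# rebuilt from shadow audit data every 30 minutes.
matrix = {}

def should_escalate(model, task_type):
    failures, samples = matrix.get((model, task_type), (0, 0))
    if samples < MIN_SAMPLES:
        return False  # not enough evidence -- trust the default route
    return failures / samples > FAILURE_THRESHOLD
```

A `True` result overrides the cost-optimized route and sends the request straight to a premium model.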
Dynamic Model Routing
The router selects the cheapest model capable of handling the task at the required quality level — Auto, Cost, Speed, Quality, or your Custom rules.
// Neural Insight: Why pay for GPT-4o if a 10x cheaper model handles summarization at 99.9% accuracy? The router makes that call per request, in milliseconds.
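"Cheapest model that clears the quality bar" reduces to a filter-then-min. The model names, prices, and quality scores below are hypothetical placeholders, not the actual routing table.

```python
# Hypothetical model table: (name, cost per 1M tokens, quality score 0-1).
MODELS = [
    ("econ-small",  0.15, 0.80),
    ("mid-tier",    1.00, 0.90),
    ("premium-4o", 10.00, 0.98),
]

def route(required_quality):
    """Pick the cheapest model meeting the quality bar for this task."""
    capable = [m for m in MODELS if m[2] >= required_quality]
    if not capable:
        return MODELS[-1][0]  # nothing qualifies: fall back to the best model
    return min(capable, key=lambda m: m[1])[0]
```

The required quality level is derived from the routing mode (Auto, Cost, Speed, Quality, or Custom) plus the task type and complexity from the classification step.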
Shadow Quality Validation
For economy-tier responses, a silent A/B check runs in the background to validate quality before committing the routing decision to the confidence matrix.
// Neural Insight: If the shadow check flags a poor response, the result is escalated to a premium model automatically. This data feeds back into the confidence matrix.
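The escalation path can be sketched as below. `judge` and `premium_call` are stand-ins for the real audit check and premium-model client; the names are assumptions for illustration.

```python
def shadow_validate(prompt, economy_answer, judge, premium_call):
    """Silently audit an economy-tier answer; escalate if it fails.

    judge(prompt, answer) -> bool and premium_call(prompt) -> str are
    injected here because the real implementations are not public.
    """
    if judge(prompt, economy_answer):
        # Quality confirmed: keep the cheap answer, record a pass.
        return economy_answer, "economy"
    # Poor response: re-answer with a premium model. The failure is also
    # recorded against this model/task pair in the confidence matrix.
    return premium_call(prompt), "escalated"
```

Because the check runs in the background, the user only ever sees the final (possibly escalated) answer.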
Feedback Loop & Learning
Every shadow audit result updates the confidence matrix. Over time, routing decisions improve automatically without any manual configuration.
// Neural Insight: This is the data moat: a competitor running the same code today starts with zero historical quality data. Your routing gets better as your traffic grows.
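The write side of the loop is a simple per-pair counter that the matrix rebuild later snapshots. The structure below is an assumption about shape, not the actual storage schema.

```python
from collections import defaultdict

# (model, task_type) -> [failure_count, sample_count]; periodically
# snapshotted into the confidence matrix the router consults.
audit_log = defaultdict(lambda: [0, 0])

def record_audit(model, task_type, passed):
    # Every shadow audit result lands here, pass or fail.
    entry = audit_log[(model, task_type)]
    entry[1] += 1
    if not passed:
        entry[0] += 1

def failure_rate(model, task_type):
    failures, samples = audit_log[(model, task_type)]
    return failures / samples if samples else 0.0
```

The accumulated counts are what make routing traffic-dependent: more audits per pair means tighter failure-rate estimates and better escalation decisions.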
FinOps Attribution & Reporting
Every cent saved is recorded. Per-user attribution, daily cost series, and ROI vs the GPT-4o benchmark are all tracked in real time.
// Neural Insight: Use the User Attribution field to track spend per end-user, generate monthly PDF reports, and show your CFO exactly how much AI routing saves.
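Per-user savings attribution amounts to comparing each request's actual cost against the benchmark price. The baseline rate below is a hypothetical number for the arithmetic, not the real GPT-4o price.

```python
# Toy savings ledger vs a GPT-4o-style baseline. The rate is assumed.
BASELINE_COST_PER_1M = 10.00  # hypothetical benchmark $/1M tokens

def record_request(ledger, user_id, tokens, actual_cost):
    # What the request would have cost on the benchmark model...
    baseline = tokens / 1_000_000 * BASELINE_COST_PER_1M
    # ...minus what it actually cost = savings attributed to this user.
    ledger[user_id] = ledger.get(user_id, 0.0) + (baseline - actual_cost)

ledger = {}
record_request(ledger, "alice", 2_000_000, 0.30)
# alice's 2M tokens: 20.00 baseline - 0.30 actual = 19.70 saved
```

Summing the ledger per user, per day, or per month is all the reporting layer needs for the cost series and ROI views.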