Our LLM bill was $12,400 in March. By the end of the first week of April, the run rate was $3,350/month. Same features, same output quality, same users. Nobody noticed the change — except finance.
This isn't a theoretical exercise. It's a step-by-step account of what I did, what I found, and how the numbers moved each day.
Day 0: The audit
Before touching anything, I needed to understand where the money was going.
I pulled a week of API logs — every request, every model, every token count. The breakdown was sobering:
- Total requests/week: ~84,000
- Model used: GPT-4o for 100% of requests
- Average input tokens/request: ~800
- Average output tokens/request: ~350
- Weekly cost: ~$3,100 (extrapolating to $12,400/month)
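The audit itself is trivial to script. A minimal sketch, assuming the logs are exported as dicts with per-request token counts — the field names and the price table are illustrative assumptions, not our actual logging schema:

```python
# Minimal cost-audit sketch. Log field names and the PRICES table are
# illustrative assumptions, not a real logging schema.
PRICES = {
    # $ per 1M tokens: (input, output)
    "gpt-4o": (2.50, 10.00),
}

def total_cost(logs):
    """Sum API spend across a batch of request logs."""
    cost = 0.0
    for req in logs:
        in_price, out_price = PRICES[req["model"]]
        cost += req["input_tokens"] / 1e6 * in_price
        cost += req["output_tokens"] / 1e6 * out_price
    return cost

# 1,000 requests at the averages from our audit (~800 in / ~350 out)
sample = [{"model": "gpt-4o", "input_tokens": 800, "output_tokens": 350}] * 1000
print(f"${total_cost(sample):.2f}")
```

Grouping the same logs by endpoint or system prompt is what produces the task-type breakdown below.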
Then I categorized the requests by what they were actually doing:
| Task Type | % of Requests | Example |
|---|---|---|
| Classification/routing | 28% | Categorize support tickets into 6 types |
| Entity extraction | 22% | Pull names, dates, amounts from text |
| Simple Q&A | 18% | Answer FAQs from a knowledge base |
| Summarization | 15% | Condense customer conversations |
| Complex generation | 12% | Draft detailed responses, reports |
| Multi-step reasoning | 5% | Analyze data, draw conclusions |
50% of our requests were classification and extraction. Tasks where the output is a single label or a structured data object. Tasks that Llama 3.1 8B handles identically to GPT-4o.
Another 18% were FAQ-style Q&A — essentially lookup against known content. GPT-4o-mini handles this without breaking a sweat.
Only 17% of our traffic genuinely benefited from GPT-4o's reasoning capabilities. We were paying premium rates on 83% of requests for no quality gain.
That gap is the Model Tax.
Day 1-2: Setting up the routing layer
I implemented a three-tier model cascade:
Tier 1 — Economy (Llama 3.1 8B on Groq): Classification, extraction, reformatting, simple Q&A. Cost: $0.05/1M input, $0.08/1M output.
Tier 2 — Mid-range (GPT-4o-mini): Summarization, moderate Q&A, template-based generation. Cost: $0.15/1M input, $0.60/1M output.
Tier 3 — Premium (GPT-4o): Complex generation, multi-step reasoning, anything the lower tiers can't handle confidently. Cost: $2.50/1M input, $10.00/1M output.
The routing logic started simple: I wrote a fast classifier that looked at the task type (extracted from the system prompt or API endpoint) and routed accordingly. Classification endpoints → Tier 1. Summarization → Tier 2. Generation → Tier 3.
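In code, that first version was little more than two lookup tables. A sketch — the task-type mapping and model IDs below are illustrative stand-ins for our actual endpoint list:

```python
# Illustrative stand-in for the endpoint -> tier routing; the mapping and
# model IDs are assumptions, not our production config.
TASK_TIER = {
    "classification": "economy",
    "extraction": "economy",
    "simple_qa": "economy",
    "summarization": "mid",
    "moderate_qa": "mid",
    "generation": "premium",
    "reasoning": "premium",
}

TIER_MODEL = {
    "economy": "llama-3.1-8b-instant",  # served on Groq
    "mid": "gpt-4o-mini",
    "premium": "gpt-4o",
}

def route(task_type: str) -> str:
    """Pick a model for a request; unknown task types fall back to premium."""
    tier = TASK_TIER.get(task_type, "premium")
    return TIER_MODEL[tier]

print(route("classification"))  # llama-3.1-8b-instant
print(route("novel_task"))      # gpt-4o (safe fallback)
```

The fallback direction matters: an unknown task type escalates to the premium tier, so a routing gap costs money rather than quality.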
Total setup time: about 6 hours. Most of that was mapping our internal API endpoints to task types and testing that the economy models produced acceptable output.
Day 3: First results
After 24 hours of routed traffic:
| Tier | % of Traffic | Daily Cost |
|---|---|---|
| Economy (Llama 3.1 8B) | 50% | $1.20 |
| Mid-range (GPT-4o-mini) | 33% | $14.80 |
| Premium (GPT-4o) | 17% | $67.50 |
| Total | 100% | $83.50 |
Previous daily cost: ~$443/day. New daily cost: $83.50/day. That's an 81% reduction on day one.
But I wasn''t celebrating yet. The numbers meant nothing if quality had degraded.
Day 3-4: Quality validation
I ran a shadow comparison. For every economy-tier response, I also sent the same prompt to GPT-4o in the background and compared outputs.
The results for classification tasks: 97.3% agreement between Llama 3.1 8B and GPT-4o. The 2.7% divergence was almost entirely on edge cases where even GPT-4o's classification was arguable.
For extraction tasks: 99.1% agreement. Dates, names, amounts — the economy model was functionally identical.
For simple Q&A: 94.8% agreement between GPT-4o-mini and GPT-4o. The 5.2% gap was on questions that required more nuance. I adjusted the routing to escalate ambiguous Q&A to GPT-4o, which moved about 3% of traffic from Tier 2 to Tier 3.
After the adjustment, the effective accuracy was above 98% across all tiers.
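A shadow comparison needs surprisingly little machinery. A sketch, with toy stand-in functions in place of the two model API calls — for classification, "agreement" is plain label equality:

```python
import random

def shadow_compare(prompts, economy_fn, premium_fn, sample_rate=1.0):
    """Send each sampled prompt to both models and measure agreement.

    economy_fn / premium_fn are stand-ins for the two API calls; for
    classification, agreement is simply label equality.
    """
    agree = total = 0
    for prompt in prompts:
        if random.random() > sample_rate:
            continue  # shadow only a fraction of traffic if cost matters
        total += 1
        if economy_fn(prompt) == premium_fn(prompt):
            agree += 1
    return agree / total if total else 0.0

# Toy stand-ins: the two "models" disagree only on one edge-case prompt
economy = lambda p: "billing" if "invoice" in p else "other"
premium = lambda p: "billing" if "invoice" in p or "charge" in p else "other"
prompts = ["invoice overdue", "reset password", "double charge"]
print(shadow_compare(prompts, economy, premium))  # 2/3 agreement on this toy set
```

In production the premium call runs in the background so it never adds latency to the user-facing response; the sample_rate knob caps how much the validation itself costs.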
Day 5: Adding semantic caching
While reviewing the logs, I noticed heavy repetition. Our support bot was getting variations of the same 50-60 questions, worded differently each time. Each variation was a fresh API call.
I added a semantic cache layer:
- Exact match cache: Hash the prompt, store the response. If the identical prompt comes in, return the cached response. Hit rate: ~12%.
- Similarity cache: Embed prompts with a lightweight model, store in a vector database. If a new prompt is semantically similar (cosine similarity > 0.95) to a cached one, return the cached response. Hit rate: ~22%.
Combined cache hit rate: 34%. Meaning 34% of requests never reached an LLM at all. Cost for those requests: $0.
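Both layers fit in one small class. A sketch, assuming an embed_fn stand-in for the lightweight embedding model and a linear scan in place of the vector database:

```python
import hashlib
import math

class SemanticCache:
    """Two-layer cache sketch: exact hash match, then cosine similarity
    over embeddings. embed_fn is an assumed stand-in for a lightweight
    embedding model; a linear scan stands in for the vector database."""

    def __init__(self, embed_fn, threshold=0.95):
        self.embed_fn = embed_fn
        self.threshold = threshold
        self.exact = {}      # sha256(prompt) -> response
        self.vectors = []    # (embedding, response)

    def _cosine(self, a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb) if na and nb else 0.0

    def get(self, prompt):
        key = hashlib.sha256(prompt.encode()).hexdigest()
        if key in self.exact:                   # layer 1: exact match
            return self.exact[key]
        emb = self.embed_fn(prompt)
        for vec, response in self.vectors:      # layer 2: similarity
            if self._cosine(emb, vec) > self.threshold:
                return response
        return None                             # miss: call the LLM

    def put(self, prompt, response):
        key = hashlib.sha256(prompt.encode()).hexdigest()
        self.exact[key] = response
        self.vectors.append((self.embed_fn(prompt), response))

# Toy embedding: character counts, enough to show both layers working
embed = lambda s: [s.count(c) for c in "abcdefghijklmnopqrstuvwxyz "]
cache = SemanticCache(embed)
cache.put("reset password", "Use the 'Forgot password' link.")
print(cache.get("reset  password"))  # near-duplicate, served by the similarity layer
```

In production the linear scan over self.vectors becomes a vector-database lookup, and the 0.95 threshold is the knob that trades hit rate against the risk of serving a cached answer to a genuinely different question.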
Day 6-7: Final numbers
After a full week of routing + caching:
| Component | Impact |
|---|---|
| Model routing (3-tier cascade) | -81% |
| Semantic caching (exact + vector) | -34% of remaining calls |
| Prompt optimization (trimmed system prompts) | -15% input tokens |
| Combined new run rate | ~$3,350/month |
From $12,400 to $3,350. A 73% reduction in total LLM costs.
And the quality metrics I tracked throughout the week:
| Metric | Before | After |
|---|---|---|
| Classification accuracy | 94.2% | 94.0% |
| Extraction accuracy | 97.8% | 97.6% |
| User satisfaction (CSAT on support) | 4.3/5 | 4.3/5 |
| Average response latency | 1.8s | 1.2s |
Quality was flat. Latency actually improved because economy models on Groq respond faster than GPT-4o. Users didn't notice anything, except that the support bot was slightly snappier.
What I'd do differently
If I did this again, I'd skip the manual routing logic and use an automated router from day one. Mapping endpoints to task types worked for our codebase, but it doesn't generalize, and it broke every time we added a new feature. A system that analyzes prompt complexity in real time, like NeuralRouting's Model Cascading, would have saved me the 6 hours of manual mapping and handled edge cases I hadn't anticipated.
I'd also start with shadow validation from the beginning, not day 3. The anxiety between "I flipped the switch" and "I've confirmed quality is fine" was unnecessary. Shadow Engine-style background validation eliminates that gap entirely.
Your turn
The specific numbers will differ for your workload. Maybe your split is 60% simple / 20% moderate / 20% complex instead of our 50/33/17. Maybe you're spending $3K/month, not $12K. The principle is the same: if you're not routing by complexity, you're overpaying.
Start by measuring your Model Tax — the gap between what you're spending and what you'd spend with intelligent routing.