Cache as L1
Stable prompt fingerprints turn repeated AI requests into edge reads.
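A stable fingerprint means that trivially different renderings of the same prompt hash to the same cache key. A minimal sketch, assuming a hypothetical normalization policy (lowercase, collapsed whitespace) and SHA-256 hashing; the real cache key scheme is not published in this repository:

```typescript
import { createHash } from "node:crypto";

// Hypothetical normalization: lowercase and collapse whitespace so that
// superficial formatting differences do not produce distinct cache keys.
function normalizePrompt(prompt: string): string {
  return prompt.toLowerCase().replace(/\s+/g, " ").trim();
}

// Fingerprint = SHA-256 over the normalized prompt plus the model tier,
// so identical questions aimed at the same tier become repeat edge reads.
function promptFingerprint(prompt: string, modelTier: string): string {
  const canonical = `${modelTier}\u0000${normalizePrompt(prompt)}`;
  return createHash("sha256").update(canonical).digest("hex");
}
```

With this policy, "What is Rust?" and "  what is   RUST? " produce the same fingerprint, while the same prompt routed to a different tier produces a different one.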
A Cloudflare-first product blueprint for routing, caching, and governing GPU-backed AI inference without exposing private API implementation code.
The public project describes the control plane: prompt classification, cache-first execution, model dispatch, memory tiers, and cost gates. Private deployment details stay outside the repository.
The project positions Cloudflare as the fast decision layer and external GPU inference as the expensive execution layer.
Prompt length, intent, and context pressure select the cheapest viable model.
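Cheapest-viable-model dispatch can be sketched as a pure function over those three signals. The signal names, intent labels, and thresholds below are illustrative assumptions, not the project's published cutoffs:

```typescript
type ModelChoice = "small" | "medium" | "large";

interface PromptSignals {
  tokenCount: number;                            // rough prompt length in tokens
  intent: "lookup" | "transform" | "reasoning";  // hypothetical classifier output
  contextTokens: number;                         // attached context size ("context pressure")
}

// Hypothetical dispatch rules: short lookups go to the cheapest model,
// heavy reasoning or large contexts escalate to the expensive tier,
// everything else lands in the middle. Real cutoffs would come from
// cost/quality benchmarks.
function selectModel(s: PromptSignals): ModelChoice {
  if (s.intent === "lookup" && s.tokenCount < 64) return "small";
  if (s.intent === "reasoning" || s.contextTokens > 4096) return "large";
  return "medium";
}
```

Because the decision is a cheap pure function, it can run at the edge before any GPU-backed call is made.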
Short-term state, durable history, and object storage remain separate tiers.
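Keeping the tiers separate implies a routing rule for where each kind of state lives. A minimal sketch, assuming hypothetical item kinds and a made-up size cutoff (on Cloudflare these tiers would map naturally to KV, Durable Objects or D1, and R2, though this blueprint does not prescribe that):

```typescript
type Tier = "short-term" | "durable" | "object";

interface StoredItem {
  kind: "session" | "history" | "artifact"; // hypothetical classification
  bytes: number;
}

// Hypothetical routing: ephemeral session state stays in the fast
// short-term tier, large blobs and artifacts go to object storage,
// and conversation history lands in the durable tier.
function storageTier(item: StoredItem): Tier {
  if (item.kind === "session") return "short-term";
  if (item.kind === "artifact" || item.bytes > 1_000_000) return "object";
  return "durable";
}
```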
The repository is intentionally presentation-grade: it can be shared, starred, and connected to Cloudflare Pages without publishing private service code.