Edge AI Assistant

A Cloudflare-first product blueprint for routing, caching, and governing GPU-backed AI inference without exposing private API implementation code.

edge-control-plane

- Prompt fingerprint: cache hit
- Context window: 4.1k chars
- Route decision: small model
- Public surface: no API source
- Runtime target: Cloudflare Pages
- Core thesis: manage AI, not models

Architecture without leaking implementation.

The public project describes the control plane: prompt classification, cache-first execution, model dispatch, memory tiers, and cost gates. Private deployment details stay outside the repository.

[Diagram: client → Cloudflare edge → inference provider]

Designed as a low-cost inference router.

The project positions Cloudflare as the fast decision layer and external GPU inference as the expensive execution layer.
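That split can be sketched as a cache-first handler. This is a minimal illustration, not the project's private code: an in-memory `Map` stands in for an edge cache (Workers KV or the Cache API), and `callModel` stands in for the external GPU provider.

```typescript
// Illustrative sketch: all names (handle, callModel, edgeCache) are assumptions.
const edgeCache = new Map<string, string>();

// Stand-in for the expensive external inference call.
function callModel(prompt: string): string {
  return `answer:${prompt}`;
}

// Cheap decision at the edge first; expensive inference only on a miss.
function handle(prompt: string): { answer: string; cached: boolean } {
  const key = `v1:${prompt}`;
  const hit = edgeCache.get(key);
  if (hit !== undefined) {
    return { answer: hit, cached: true };
  }
  const answer = callModel(prompt);
  edgeCache.set(key, answer);
  return { answer, cached: false };
}
```

The second identical request never leaves the edge: it resolves from the cache instead of re-invoking the provider.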

01. Cache as L1: Stable prompt fingerprints turn repeated AI requests into edge reads.
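A stable fingerprint usually means normalizing the prompt before hashing, so trivially different requests share a cache key. A hedged sketch, assuming whitespace-collapsing and lowercasing as the normalization and SHA-256 as the hash; the function names are illustrative:

```typescript
// Illustrative fingerprinting sketch; normalizePrompt and fingerprint are
// assumed names, not the project's API.
import { createHash } from "node:crypto";

// Collapse whitespace and case so near-identical prompts share one key.
function normalizePrompt(prompt: string): string {
  return prompt.trim().replace(/\s+/g, " ").toLowerCase();
}

// Hash the normalized prompt (scoped by model) into a stable cache key.
function fingerprint(prompt: string, model: string): string {
  return createHash("sha256")
    .update(`${model}:${normalizePrompt(prompt)}`)
    .digest("hex");
}
```

Scoping the key by model matters: the same prompt routed to a different model must not collide with a cached answer from another tier.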

02. Routing as control logic: Prompt length, intent, and context pressure select the cheapest viable model.
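The routing rule can be shown as a small decision function. The thresholds, intent labels, and model names below are assumptions for illustration; the 4.1k context figure mirrors the stat quoted above:

```typescript
// Illustrative routing sketch; thresholds and names are assumptions.
type Route = "small" | "large";

interface PromptSignal {
  length: number;                       // characters in the prompt
  intent: "chat" | "code" | "analysis"; // assumed intent classes
  contextChars: number;                 // accumulated conversation context
}

// Cheapest viable model: short chat prompts under low context pressure
// stay on the small model; everything else escalates.
function routeModel(s: PromptSignal): Route {
  const contextPressure = s.contextChars > 4100;
  if (s.intent === "chat" && s.length < 500 && !contextPressure) {
    return "small";
  }
  return "large";
}
```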

03. Memory as hierarchy: Short-term state, durable history, and object storage remain separate tiers.
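Keeping the tiers separate can be expressed as an explicit tier-to-backend map. The Cloudflare primitives named here (Workers KV, Durable Objects, R2) are a plausible pairing, not a confirmed deployment detail:

```typescript
// Illustrative tier map; the backend choices are assumptions.
type MemoryTier = "short_term" | "durable_history" | "object_storage";

// One primitive per tier, so tiers never blur together.
const tierBackend: Record<MemoryTier, string> = {
  short_term: "Workers KV",           // low-latency, eventually consistent state
  durable_history: "Durable Objects", // strongly consistent per-session history
  object_storage: "R2",               // large artifacts and transcripts
};

function backendFor(tier: MemoryTier): string {
  return tierBackend[tier];
}
```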

Launch path

The repository is intentionally presentation-grade: it can be shared, starred, and connected to Cloudflare Pages without publishing private service code.

  1. Public blueprint: Architecture, positioning, and product narrative.
  2. Private control plane: Worker logic, tokens, routing policy, and billing rules remain internal.
  3. Hosted demo shell: Cloudflare Pages publishes the polished surface globally.