Cache as L1
Stable prompt fingerprints turn repeated AI requests into edge reads.
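A stable fingerprint means that trivially different renderings of the same prompt hash to the same cache key. A minimal sketch, assuming a hypothetical normalization policy (lowercase, collapsed whitespace) and SHA-256 hashing; the real cache key scheme is not published in this repository:

```typescript
import { createHash } from "node:crypto";

// Hypothetical normalization: lowercase and collapse whitespace so that
// superficial formatting differences do not produce distinct cache keys.
function normalizePrompt(prompt: string): string {
  return prompt.toLowerCase().replace(/\s+/g, " ").trim();
}

// Fingerprint = SHA-256 over the normalized prompt plus the model tier,
// so identical questions aimed at the same tier become repeat edge reads.
function promptFingerprint(prompt: string, modelTier: string): string {
  const canonical = `${modelTier}\u0000${normalizePrompt(prompt)}`;
  return createHash("sha256").update(canonical).digest("hex");
}
```

With this policy, "What is Rust?" and "  what is   RUST? " produce the same fingerprint, while the same prompt routed to a different tier produces a different one.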
A Cloudflare-first product blueprint for routing, caching, and governing GPU-backed AI inference without exposing private API implementation code.
The public project describes the control plane: prompt classification, cache-first execution, model dispatch, memory tiers, and cost gates. Private deployment details stay outside the repository.
The project positions Cloudflare as the fast decision layer and external GPU inference as the expensive execution layer.
Prompt length, intent, and context pressure select the cheapest viable model.
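Cheapest-viable-model dispatch can be sketched as a pure function over those three signals. The signal names, intent labels, and thresholds below are illustrative assumptions, not the project's published cutoffs:

```typescript
type ModelChoice = "small" | "medium" | "large";

interface PromptSignals {
  tokenCount: number;                            // rough prompt length in tokens
  intent: "lookup" | "transform" | "reasoning";  // hypothetical classifier output
  contextTokens: number;                         // attached context size ("context pressure")
}

// Hypothetical dispatch rules: short lookups go to the cheapest model,
// heavy reasoning or large contexts escalate to the expensive tier,
// everything else lands in the middle. Real cutoffs would come from
// cost/quality benchmarks.
function selectModel(s: PromptSignals): ModelChoice {
  if (s.intent === "lookup" && s.tokenCount < 64) return "small";
  if (s.intent === "reasoning" || s.contextTokens > 4096) return "large";
  return "medium";
}
```

Because the decision is a cheap pure function, it can run at the edge before any GPU-backed call is made.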
Short-term state, durable history, and object storage remain separate tiers.
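Keeping the tiers separate implies a routing rule for where each kind of state lives. A minimal sketch, assuming hypothetical item kinds and a made-up size cutoff (on Cloudflare these tiers would map naturally to KV, Durable Objects or D1, and R2, though this blueprint does not prescribe that):

```typescript
type Tier = "short-term" | "durable" | "object";

interface StoredItem {
  kind: "session" | "history" | "artifact"; // hypothetical classification
  bytes: number;
}

// Hypothetical routing: ephemeral session state stays in the fast
// short-term tier, large blobs and artifacts go to object storage,
// and conversation history lands in the durable tier.
function storageTier(item: StoredItem): Tier {
  if (item.kind === "session") return "short-term";
  if (item.kind === "artifact" || item.bytes > 1_000_000) return "object";
  return "durable";
}
```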
The repository is intentionally presentation-grade: it can be shared, starred, and connected to Cloudflare Pages without publishing private service code.