Intelligence & Bots

CF Messenger integrates AI not as an external bolt-on, but as a first-class citizen within the edge mesh.

Bots embody deterministic personas. Each character definition includes vocabulary, punctuation quirks, and stylistic prompts, all hard-coded into the orchestrator to maintain the 2005-era “L33T speak” voice and aesthetic.
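
As an illustration of a hard-coded persona, the sketch below turns a character definition into a system prompt. The `Persona` shape, its field names, the example bot, and `buildSystemPrompt` are all hypothetical; the actual orchestrator schema is not shown here.

```typescript
// Hypothetical persona shape; field names are illustrative, not the project's schema.
interface Persona {
  name: string;
  vocabulary: string[];        // slang the character favours
  punctuationQuirks: string[]; // e.g. "lowercase only"
  stylePrompt: string;         // free-form stylistic instruction
}

// Example character, invented for this sketch.
const exampleBot: Persona = {
  name: "xX_sk8r_Xx",
  vocabulary: ["pwned", "n00b", "w00t"],
  punctuationQuirks: ["lowercase only", "multiple exclamation marks"],
  stylePrompt: "Reply like a 2005 instant-messenger user writing in l33t speak.",
};

// Pure function: the same persona always yields the same system prompt,
// so the character definition itself never drifts between requests.
function buildSystemPrompt(p: Persona): string {
  return [
    `You are ${p.name}.`,
    p.stylePrompt,
    `Preferred vocabulary: ${p.vocabulary.join(", ")}.`,
    `Punctuation rules: ${p.punctuationQuirks.join("; ")}.`,
  ].join("\n");
}
```

Keeping the definition in code rather than in user-editable storage means a persona can only change through a deploy, which is one way to read "deterministic" here.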

Interactive contacts are powered by Llama 3.2 1B, running directly on Cloudflare’s GPUs. We utilise the instruction-tuned model for character-accurate responses with minimal latency.
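
A minimal sketch of one bot reply through a Workers AI binding follows. It assumes the binding exposes `run(model, inputs)` and that the model is addressed as `@cf/meta/llama-3.2-1b-instruct`. The `AiBinding` type is a cut-down stand-in for the real binding type, and `replyAs` and the `max_tokens` value are illustrative choices.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Cut-down structural type for the Workers AI binding (assumption: only the
// parts used below; the real binding type is richer).
interface AiBinding {
  run(
    model: string,
    inputs: { messages: ChatMessage[]; max_tokens?: number },
  ): Promise<{ response?: string }>;
}

// Assumed model identifier for the instruction-tuned Llama 3.2 1B.
const MODEL = "@cf/meta/llama-3.2-1b-instruct";

// Sends the persona's system prompt plus the user's message and returns the text.
async function replyAs(ai: AiBinding, systemPrompt: string, userText: string): Promise<string> {
  const result = await ai.run(MODEL, {
    messages: [
      { role: "system", content: systemPrompt },
      { role: "user", content: userText },
    ],
    max_tokens: 128, // short chat-style replies keep latency and cost down
  });
  return (result.response ?? "").trim();
}
```

In a Worker, `ai` would be the binding from the environment (for example `env.AI`, if the binding is named `AI`).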

To prevent “bill shock” from automated or runaway AI interactions, the system employs several layers of protection:

  • Daily Quota: 10,000 Workers AI interactions per day, tracked in KV (eventually consistent).
  • Fallback Logic: When the daily quota is depleted, the bot emits a “Bot is sleeping” message and rejects new mentions.
  • Circuit Breaker: A global “Kill Switch” managed via Cloudflare KV can instantly disable all AI features across the zone.
  • Resource Budgeting: Workers AI billing is based on Neurons (a unit that measures the GPU compute a request consumes). 1,000 Neurons provide approximately 130 LLM responses for the 1B model, or roughly 7.7 Neurons per response.
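
The layers above can be sketched as a single gate that runs before any model call. This is a sketch under assumptions, not the project's code: the key names `ai:kill-switch` and `quota:YYYY-MM-DD`, the `"on"` flag value, and the function names are hypothetical. `KVLike` is the small subset of the KV `get`/`put` API that the sketch needs.

```typescript
// Subset of the Workers KV API used here (string values only).
interface KVLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string, options?: { expirationTtl?: number }): Promise<void>;
}

const DAILY_QUOTA = 10_000;              // daily interaction cap from the text
const NEURONS_PER_RESPONSE = 1000 / 130; // about 7.7, derived from the text

type Gate = { allowed: true } | { allowed: false; reason: "kill-switch" | "quota" };

// Pure decision: the kill switch wins, then the daily quota.
function decide(killSwitch: string | null, usedToday: number): Gate {
  if (killSwitch === "on") return { allowed: false, reason: "kill-switch" };
  if (usedToday >= DAILY_QUOTA) return { allowed: false, reason: "quota" };
  return { allowed: true };
}

// Reads both flags from KV and counts the interaction if it is allowed.
async function checkAndCount(kv: KVLike, now: Date = new Date()): Promise<Gate> {
  const dayKey = `quota:${now.toISOString().slice(0, 10)}`; // one counter per UTC day
  const [kill, rawCount] = await Promise.all([kv.get("ai:kill-switch"), kv.get(dayKey)]);
  const used = Number(rawCount ?? "0");
  const gate = decide(kill, used);
  if (gate.allowed) {
    // Read-modify-write is not atomic in KV: concurrent requests can undercount.
    await kv.put(dayKey, String(used + 1), { expirationTtl: 172_800 }); // expire after 2 days
  }
  return gate;
}

// Rough Neuron budget for a number of 1B-model responses.
function estimateNeurons(responses: number): number {
  return responses * NEURONS_PER_RESPONSE;
}
```

A caller would answer with “Bot is sleeping” when `reason` is `"quota"`. By this estimate, a fully used daily quota comes to about 76,923 Neurons (10,000 × 1,000 / 130).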

Planned improvements:

  1. Stateful Memory: Migrating bot conversation history into dedicated Durable Objects for multi-turn context.
  2. Quota Ledger: Moving the quota tracking from KV to a Durable Object to eliminate consistency races during high-traffic demos.
  3. Advanced Personas: Upgrading to 8B models for deeper character nuance while maintaining cost efficiency via location-hint pinning.
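
To illustrate the Quota Ledger item, here is a minimal sketch of the counting logic a Durable Object could wrap. `QuotaLedger`, `tryConsume`, and `StorageLike` are hypothetical names. `StorageLike` mirrors only the `get`/`put` part of Durable Object storage. The sketch assumes all quota checks are routed to a single object instance, so that increments are serialised instead of racing as they can in KV.

```typescript
// Subset of Durable Object storage used here (assumption: get/put only).
interface StorageLike {
  get<T>(key: string): Promise<T | undefined>;
  put<T>(key: string, value: T): Promise<void>;
}

// Core ledger logic; a real Durable Object class would own the storage
// handle and expose tryConsume through its request handler.
class QuotaLedger {
  private storage: StorageLike;
  private limit: number;

  constructor(storage: StorageLike, limit: number = 10_000) {
    this.storage = storage;
    this.limit = limit;
  }

  // Returns true and records one interaction if the day's quota still has room.
  async tryConsume(dayKey: string): Promise<boolean> {
    const used = (await this.storage.get<number>(dayKey)) ?? 0;
    if (used >= this.limit) return false;
    await this.storage.put(dayKey, used + 1);
    return true;
  }
}
```

The read-then-write shape is the same as with KV. What changes is that one authoritative instance owns the counter, which is what removes the consistency race.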