Platform AI Part 1: Building a Personal AI Chatbot with Claude and Cloudflare

January 11, 2026 · 12 min read

This is Part 1 of the Platform AI series. Part 2 adds agentic tool use for managing AKS clusters.

I wanted a specialized AI assistant that could answer my DevOps and Infrastructure as Code questions without the distraction of general-purpose chatbots. The goal was simple: build a focused tool that knows Kubernetes, Docker, AKS, Azure CLI, Istio, Terraform, Helm, and related IaC tools inside and out, and politely declines everything else.

What started as a weekend project turned into a surprisingly elegant solution using Claude's API, React, Cloudflare Pages Functions, and Firebase Realtime Database for persistent chat history. The result is a clean, minimal chat interface that feels like Claude in VS Code. No fluff, just helpful answers to infrastructure questions.

The assistant lives at /tools/ai on this blog and includes persistent multi-chat conversations, real-time usage tracking, configurable AI parameters (temperature and max tokens), fullscreen mode for deep-dive sessions, and a visual design that stays out of your way while you work through deployment issues or troubleshoot pod crashes.

Try it now →


Architecture Overview

The stack is straightforward. The frontend is a Gatsby static site with a React chat component. The backend is a single Cloudflare Pages Function running serverless. Claude 3 Haiku handles all the AI heavy lifting via the Anthropic API, and Firebase Realtime Database stores persistent chat history.

Authentication uses simple PAT validation where the backend checks the token against an environment variable. Scope control happens through a system prompt that enforces topic restrictions.

The backend acts as a thin proxy between the frontend and Claude API, validating the PAT and injecting the system prompt that restricts Claude's responses to only Kubernetes, Docker, AKS, Azure CLI, Istio, Terraform, Helm, and Infrastructure as Code topics. Firebase Realtime Database stores chat conversations and usage statistics, keyed by a SHA-256 hash of the user's PAT for privacy.


Backend Implementation

Cloudflare Pages Function

The backend is a single file at functions/api/ai-chat.js. Cloudflare Pages automatically discovers functions in the /functions directory and maps them to routes.

The function handles PAT validation against an environment variable, forwards messages to Claude API with the system prompt, extracts usage statistics from the API response, calculates cost estimates based on token usage, and returns the response with usage metadata.

Authentication and Parameters:

const { messages, pat, temperature = 0.3, maxTokens = 2048 } = await request.json();

if (!pat || pat !== env.ASSISTANT_PAT) {
  return new Response(JSON.stringify({ error: 'Invalid access token' }), {
    status: 401,
    headers: { ...corsHeaders, 'Content-Type': 'application/json' }
  });
}

The backend accepts configurable temperature (defaults to 0.3 for consistent technical responses) and maxTokens (defaults to 2048) from the frontend, with validation to ensure they're within safe ranges (temperature: 0-1, maxTokens: 256-4096).
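The `corsHeaders` object referenced in the snippet above isn't shown; a minimal version, plus the handler Cloudflare Pages Functions route `OPTIONS` preflight requests to, might look like this (the wildcard origin is a simplification — in production you'd pin it to your own domain):

```javascript
// CORS headers spread into every response; adjust the origin for production.
const corsHeaders = {
  'Access-Control-Allow-Origin': '*',
  'Access-Control-Allow-Methods': 'POST, OPTIONS',
  'Access-Control-Allow-Headers': 'Content-Type'
};

// Answers the browser's CORS preflight (exported as onRequestOptions in the real file).
async function onRequestOptions() {
  return new Response(null, { status: 204, headers: corsHeaders });
}
```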

Scope Enforcement System Prompt:

const systemPrompt = `You are a specialized DevOps and Infrastructure as Code assistant focused exclusively on Kubernetes, Docker, AKS (Azure Kubernetes Service), Azure CLI (az), Istio, Terraform, Helm, and related IaC tools.

STRICT RULES:
1. ONLY answer questions about:
   - Kubernetes (kubectl, resources, YAML, concepts, troubleshooting)
   - Docker (Dockerfile, images, containers, compose, build)
   - AKS (Azure Kubernetes Service)
   - Azure CLI (az commands for AKS, ACR, and related services)
   - Istio (service mesh, VirtualServices, DestinationRules, traffic management)
   - Terraform (HCL, providers, state management, modules, workspaces)
   - Helm (charts, values, releases, templating, repositories)
   - Infrastructure as Code tools (Ansible, Pulumi, CloudFormation, ARM templates when related to Kubernetes/Azure)
   - GitOps tools (ArgoCD, Flux when related to K8s deployments)

2. For ANY other topics, respond with:
   "I can only help with Kubernetes, Docker, AKS, Azure CLI, Istio, Terraform, Helm, and Infrastructure as Code questions. Please ask about those topics."

3. When answering in-scope questions:
   - Provide practical, actionable answers
   - Include relevant command examples in code blocks
   - Be concise but thorough
   - Explain WHY when helpful (not just HOW)
   - Use proper formatting for commands, YAML, and HCL`;

Claude API Call:

const anthropicResponse = await fetch('https://api.anthropic.com/v1/messages', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'x-api-key': env.ANTHROPIC_API_KEY,
    'anthropic-version': '2023-06-01'
  },
  body: JSON.stringify({
    model: 'claude-3-haiku-20240307',
    max_tokens: Math.min(Math.max(maxTokens, 256), 4096),
    temperature: Math.min(Math.max(temperature, 0), 1),
    system: systemPrompt,
    messages: messages
  })
});

Usage Tracking:

const usage = data.usage; // { input_tokens, output_tokens }

const inputCost = (usage.input_tokens / 1000000) * 0.25;   // $0.25 per million input tokens
const outputCost = (usage.output_tokens / 1000000) * 1.25; // $1.25 per million output tokens
const totalCost = inputCost + outputCost;

return new Response(JSON.stringify({
  content,
  usage: {
    input_tokens: usage.input_tokens,
    output_tokens: usage.output_tokens,
    total_tokens: usage.input_tokens + usage.output_tokens,
    estimated_cost: totalCost.toFixed(4)
  }
}));
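For completeness, the `content` value returned above comes from the Anthropic response body. A small helper (the name `extractText` is mine, not from the original code) shows one way to pull the text out of Claude's content-block array:

```javascript
// Claude returns `content` as an array of blocks; concatenate the text ones.
// Called in the handler as: const content = extractText(await anthropicResponse.json())
function extractText(data) {
  return (data.content || [])
    .filter(block => block.type === 'text')
    .map(block => block.text)
    .join('');
}
```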

Frontend Implementation

React Chat Component

The chat interface lives in src/components/tools/AiAssistant.js with ~680 lines of React code managing state, API calls, Firebase integration, and UI rendering.

State Management:

const [pat, setPat] = useState(() =>
  localStorage.getItem('ai_assistant_pat') || ''
)
const [patHash, setPatHash] = useState(null)
const [chats, setChats] = useState({})
const [currentChatId, setCurrentChatId] = useState(null)
const [messages, setMessages] = useState([])
const [input, setInput] = useState('')
const [loading, setLoading] = useState(false)
const [error, setError] = useState(null)
const [sessionUsage, setSessionUsage] = useState({
  messages: 0, tokens: 0, cost: 0
})
const [totalUsage, setTotalUsage] = useState({
  messages: 0, tokens: 0, cost: 0
})
const [temperature, setTemperature] = useState(() => {
  // parseFloat(...) || 0.3 would discard a stored 0, which is a valid temperature
  const stored = parseFloat(localStorage.getItem('ai_assistant_temperature'))
  return Number.isFinite(stored) ? stored : 0.3
})
const [maxTokens, setMaxTokens] = useState(() =>
  parseInt(localStorage.getItem('ai_assistant_max_tokens')) || 2048
)
const [contextWindow, setContextWindow] = useState(() =>
  // key name assumed to follow the same pattern as the settings above
  parseInt(localStorage.getItem('ai_assistant_context_window')) || 20
)

The PAT, temperature, maxTokens, and context window settings persist in localStorage. Chat history and usage statistics persist in Firebase Realtime Database.

PAT Hashing for Privacy:

async function hashPat(pat) {
  const encoder = new TextEncoder()
  const data = encoder.encode(pat)
  const hashBuffer = await crypto.subtle.digest('SHA-256', data)
  const hashArray = Array.from(new Uint8Array(hashBuffer))
  return hashArray.map(b => b.toString(16).padStart(2, '0')).join('')
}

The PAT is hashed with SHA-256 before being used as a Firebase database key, so a leaked database dump doesn't directly expose PAT values. (SHA-256 is one-way, but this only holds up if the PAT itself is long and random enough to resist brute-force guessing.)

API Call Flow:

const sendMessage = async () => {
  let chatId = currentChatId
  if (!chatId) {
    chatId = Date.now().toString()
    setCurrentChatId(chatId)
  }

  const newMessages = [...messages, { role: 'user', content: input }]
  setMessages(newMessages)
  setInput('')
  setLoading(true)

  try {
    const response = await fetch('/api/ai-chat', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ messages: newMessages, pat, temperature, maxTokens })
    })

    const data = await response.json()
    if (!response.ok) {
      throw new Error(data.error || `Request failed with status ${response.status}`)
    }

    const finalMessages = [...newMessages, {
      role: 'assistant',
      content: data.content
    }]
    setMessages(finalMessages)

    // Track usage
    let newSessionUsage = sessionUsage
    if (data.usage) {
      newSessionUsage = {
        messages: sessionUsage.messages + 1,
        tokens: sessionUsage.tokens + data.usage.total_tokens,
        cost: sessionUsage.cost + parseFloat(data.usage.estimated_cost)
      }
      setSessionUsage(newSessionUsage)
      updateStatsInFirebase(data.usage)
    }

    // Save chat to Firebase
    const chatData = {
      id: chatId,
      title: generateChatTitle(finalMessages),
      messages: finalMessages,
      usage: newSessionUsage,
      createdAt: chats[chatId]?.createdAt || Date.now(),
      updatedAt: Date.now()
    }
    saveChatToFirebase(chatId, chatData)
  } catch (err) {
    setError(err.message)
  } finally {
    setLoading(false)
  }
}

Firebase Integration & Chat History

Firebase Realtime Database stores all chat conversations and usage statistics, keyed by the SHA-256 hash of your PAT. When you enter your PAT, the app loads your complete chat history from Firebase. Every message is automatically saved, so you can close the browser and come back later to continue any conversation.

The database structure is simple: chats are stored under ai-assistant/sessions/{patHash}/chats/{chatId} with messages, usage stats, and timestamps. Aggregate usage statistics are stored separately under ai-assistant/stats/{patHash}.
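The pure pieces behind this layout can be sketched as path builders plus the stats fold used by `updateStatsInFirebase`. The helper names here are mine; the actual component wires these into `set`/`update` calls from `firebase/database`:

```javascript
// Database paths, as described above: chats and stats keyed by the PAT hash.
const chatPath = (patHash, chatId) =>
  `ai-assistant/sessions/${patHash}/chats/${chatId}`;
const statsPath = (patHash) => `ai-assistant/stats/${patHash}`;

// Fold one API usage record into the aggregate stats object.
function mergeStats(prev, usage) {
  const base = prev || { messages: 0, tokens: 0, cost: 0 };
  return {
    messages: base.messages + 1,
    tokens: base.tokens + usage.total_tokens,
    cost: base.cost + parseFloat(usage.estimated_cost)
  };
}
```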

Chat History Sidebar:

A separate AiChatHistory component displays all previous conversations in a right sidebar. It lists all chats sorted by most recently updated and shows the chat title (first 40 characters of the first user message). Dates use relative formatting like "Today", "Yesterday", or "X days ago". The currently active chat gets highlighted, each chat has a delete button, and there's a new chat button in the sidebar header.
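The two bits of display logic described above can be sketched as pure functions. `generateChatTitle` follows the component's usage in `sendMessage`; the exact truncation and the `formatRelativeDate` name and wording are my assumptions:

```javascript
// Title = first 40 characters of the first user message.
function generateChatTitle(messages) {
  const firstUser = messages.find(m => m.role === 'user')
  if (!firstUser) return 'New chat'
  return firstUser.content.length > 40
    ? firstUser.content.slice(0, 40) + '…'
    : firstUser.content
}

// Relative date labels for the sidebar; `now` is injectable for testing.
function formatRelativeDate(timestamp, now = Date.now()) {
  const days = Math.floor((now - timestamp) / 86400000)
  if (days === 0) return 'Today'
  if (days === 1) return 'Yesterday'
  return `${days} days ago`
}
```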


Configurable AI Parameters

The UI includes sliders in the settings panel to adjust Claude's behavior.

The temperature slider (0 to 1, step 0.1) controls response randomness and creativity. It defaults to 0.3 for consistent, focused technical answers. Lower values between 0 and 0.3 give you more deterministic responses, which works better for kubectl commands and YAML configs. Higher values between 0.7 and 1.0 are more creative and useful for brainstorming solutions.

The max tokens slider (256 to 4096, step 256) controls maximum response length. The default is 2048 tokens, roughly 1500 words. Lower values give you shorter, more concise answers. Higher values let Claude write longer, more detailed explanations.

The context window slider (10 to 50 messages, step 5) controls how many recent messages are sent to the API. It defaults to 20 messages. Lower values like 10 or 15 reduce context but cost less and respond faster. Higher values like 30 or 50 give more context for complex troubleshooting sessions. The full conversation history is always preserved locally and in Firebase regardless of this setting.

All settings persist in localStorage and are sent with each API request. The backend validates and clamps temperature and maxTokens to safe ranges before passing them to the Claude API.
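The backend's clamping, shown inline earlier with `Math.min`/`Math.max`, can be factored into a small helper. The `fallback` for non-numeric input is my addition, guarding against a malformed request body:

```javascript
// Clamp a numeric setting to a safe range; fall back if it isn't a number.
const clamp = (value, min, max, fallback) =>
  Number.isFinite(value) ? Math.min(Math.max(value, min), max) : fallback;

const safeTemperature = clamp(1.7, 0, 1, 0.3);            // clamped to 1
const safeMaxTokens = clamp(Number('abc'), 256, 4096, 2048); // falls back to 2048
```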

UI Implementation:

<input
  type="range"
  min="0"
  max="1"
  step="0.1"
  value={temperature}
  onChange={(e) => setTemperature(parseFloat(e.target.value))}
  className="ai-chat-settings-slider"
/>

<input
  type="range"
  min="256"
  max="4096"
  step="256"
  value={maxTokens}
  onChange={(e) => setMaxTokens(parseInt(e.target.value))}
  className="ai-chat-settings-slider"
/>

<input
  type="range"
  min="10"
  max="50"
  step="5"
  value={contextWindow}
  onChange={(e) => setContextWindow(parseInt(e.target.value))}
  className="ai-chat-settings-slider"
/>

Sliding Window Context Management

The assistant implements a sliding window to keep long conversations from becoming expensive. Without it, each new message would include the entire conversation history, so per-call token usage would grow without bound as the conversation lengthens.

How it works:

// Apply sliding window: only send last N messages to API
const messagesToSend = newMessages.length <= contextWindow
  ? newMessages
  : newMessages.slice(-contextWindow)

const response = await fetch('/api/ai-chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ messages: messagesToSend, pat, temperature, maxTokens })
})

The full conversation history remains stored locally and in Firebase, but only the most recent messages (configurable, default 20) are sent to the Claude API. This keeps cost roughly constant per message, since each API call uses about the same number of tokens regardless of conversation length, and responses come back faster because a smaller context processes more quickly. It also prevents quadratic growth in total traffic: over a 50-message conversation, sending the full history each time would transmit 1,275 messages in total (1+2+3...+50), while a 20-message window transmits only 810. Your full conversation is never lost; it's just not all sent to the API each time.
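The message-count arithmetic is easy to check in a few lines (the helper name is mine):

```javascript
// Total messages transmitted to the API over an n-message conversation,
// where call k sends min(k, windowSize) messages.
function totalMessagesSent(n, windowSize = Infinity) {
  let total = 0;
  for (let k = 1; k <= n; k++) {
    total += Math.min(k, windowSize);
  }
  return total;
}

totalMessagesSent(50);     // 1275 with no window
totalMessagesSent(50, 20); // 810 with a 20-message window
```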


Deployment Configuration

Environment Variables:

Local development uses .dev.vars:

ANTHROPIC_API_KEY=TOP_SECRET...
ASSISTANT_PAT=YOURPAT

Production uses Cloudflare secrets (set via dashboard or CLI).

Firebase Setup:

Create a Firebase project and enable Realtime Database. Head to the Firebase Console and create a new project or use an existing one. Enable Realtime Database in the "Build" section and set the database rules for public read/write since we're using PAT hashing for privacy:

{
  "rules": {
    ".read": true,
    ".write": true
  }
}
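If leaving the entire database world-readable feels too broad, a slightly tighter variant (my suggestion — the post uses the fully open rules above) scopes access to just the subtree this app uses, since Realtime Database rules default to deny and grants cascade downward:

```json
{
  "rules": {
    "ai-assistant": {
      ".read": true,
      ".write": true
    }
  }
}
```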

Add your Firebase config to src/lib/firebase.js:

const firebaseConfig = {
  apiKey: "YOUR_API_KEY",
  authDomain: "your-project.firebaseapp.com",
  databaseURL: "https://your-project-default-rtdb.firebaseio.com",
  projectId: "your-project"
}

The Firebase SDK is modular and tree-shakeable, adding only ~15-20KB gzipped to the bundle size.

wrangler.jsonc:

{
  "name": "blog",
  "compatibility_date": "2025-12-25",
  "pages_build_output_dir": "./public"
}

Build and Deploy:

# Build Gatsby site
npm run build

# Deploy to Cloudflare Pages
npx wrangler pages deploy public --project-name=blog

The Cloudflare Pages deployment automatically serves the Gatsby static site and makes the Pages Function available at /api/ai-chat.


Usage Statistics

The UI displays real-time usage information in multiple locations.

The footer (always visible after the first message) shows current chat session stats including message count, total tokens, and cost, plus all-time cumulative stats across all chats.

The settings panel displays all-time total usage statistics, the last message's token usage and cost, and an estimate of remaining messages in a $5 API balance.

In fullscreen mode, session stats appear in the header for quick reference.

Each chat conversation tracks its own usage in terms of messages, tokens, and cost. Total usage gets aggregated across all chats and persisted in Firebase. Token counts come from the Claude API response, so they're exact rather than estimates. Cost calculation uses Claude 3 Haiku pricing at $0.25 per million input tokens and $1.25 per million output tokens.

When switching between chats, the session usage stats update to reflect that specific conversation's usage.
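A sketch of what that switch might look like (the function and setter wiring are my assumptions, not the component's actual code):

```javascript
// Restore a stored chat: its messages become the active conversation and
// its per-chat usage becomes the session stats shown in the footer.
function switchChat(chatId, chats, { setCurrentChatId, setMessages, setSessionUsage }) {
  const chat = chats[chatId];
  if (!chat) return;
  setCurrentChatId(chatId);
  setMessages(chat.messages || []);
  setSessionUsage(chat.usage || { messages: 0, tokens: 0, cost: 0 });
}
```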


Cost Analysis

Based on initial testing, a typical question/answer exchange uses approximately 500-1500 tokens total, costing between $0.0005 and $0.002 per message. A $5 prepaid balance on the Claude API should provide several hundred interactions depending on response length.
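Those figures line up with Claude 3 Haiku pricing ($0.25 per million input tokens, $1.25 per million output tokens); the token splits below are illustrative, not measured:

```javascript
// Estimated cost of one exchange at Claude 3 Haiku pricing.
const exchangeCost = (inputTokens, outputTokens) =>
  (inputTokens / 1e6) * 0.25 + (outputTokens / 1e6) * 1.25;

exchangeCost(400, 350);   // ≈ $0.0005 for a short exchange
exchangeCost(500, 1000);  // ≈ $0.0014 for a longer one
```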

Sliding window prevents cost escalation:

Without the sliding window, per-call token usage grows linearly with conversation length, so cumulative cost grows quadratically. Message 1 uses roughly 500 tokens. Message 2 jumps to 1,000 tokens because it includes the previous exchange. By message 10 you're at 5,000 tokens, message 20 hits 10,000 tokens, and message 50 balloons to 25,000 tokens per API call.

With the default 20-message sliding window, messages 1 through 20 grow naturally up to around 10,000 tokens, but message 21 and beyond stay constant at roughly 10,000 tokens per API call. This means a 50-message conversation costs roughly the same as two 25-message conversations instead of several times more. The sliding window is the difference between spending $0.50 on a long troubleshooting session versus $5 or more.

I went with Claude 3 Haiku (claude-3-haiku-20240307) for the model. I originally planned to use Claude 3.5 Sonnet, but it's not available on the prepaid API tier. Haiku turned out to be sufficient for Kubernetes and Docker questions at a lower cost, and the response times are faster than Sonnet anyway.

There's no rate limiting implemented currently. PAT access control is sufficient for personal use, though I could add rate limiting via Cloudflare Workers KV or Durable Objects if this ever needs to support more users.


Technical Decisions

Why Cloudflare Pages Functions instead of direct API calls:

Going through a backend keeps the API key secure, enables PAT validation, and allows injection of the system prompt without exposing it client-side. Plus it's serverless, so there's no infrastructure to manage.

Why PAT instead of OAuth:

This is a personal use case with no multi-user requirements. A simple PAT implementation means no additional dependencies or auth providers to deal with.

Why Firebase Realtime Database for persistence:

The free tier is generous at 1GB storage and 10GB per month bandwidth. Firebase has real-time sync capabilities that aren't used yet but are there if needed. The key-value structure is perfect for chat storage with no server or database management required. The modular SDK only adds about 15-20KB gzipped to the bundle. I considered Cloudflare KV or Durable Objects but they're more complex for this use case.

Why hash PAT before using as database key:

SHA-256 is a one-way hash, so the PAT can't be recovered from the database even if it gets compromised. Each user's data is isolated by their unique hash without needing an additional authentication or encryption layer.

Why system prompt for scope enforcement:

Claude 3 Haiku is excellent at following instructions. No complex filtering logic needed. It works reliably across the entire conversation context as long as you give it a specific topic list rather than general categories.

Why Claude 3 Haiku instead of Sonnet:

Already covered this above, but the short version is Sonnet isn't available on prepaid, Haiku is sufficient for the use case, and it's faster and cheaper. Match the model capability to your task requirements.

Why sliding window for context management:

The sliding window prevents unbounded token growth in long conversations and maintains constant cost per message after reaching the window size. I considered sending the full conversation (simple but expensive for long sessions), conversation compacting or summarization (complex and loses detail), or manual context reset (annoying user experience). The sliding window balances cost, performance, and UX without requiring manual intervention. Users can adjust the window size from 10 to 50 messages based on their needs, and the full history is still preserved for reference and chat switching.


What's Next

Future improvements could include conversation search and filtering, exporting chat history to markdown or JSON, code syntax highlighting in responses, and sharing individual conversations via URL.


Conclusion

Building this took maybe 4-5 hours spread across a weekend for the initial version, plus another few hours to add Firebase persistence and multi-chat support. The result is a specialized AI assistant that lives at /tools/ai on this blog, answers DevOps questions without distraction, remembers all my conversations, tracks its own costs, and feels like a native part of my workflow.

The architecture is elegantly simple: a static React frontend, a serverless Cloudflare Function as a thin proxy, Claude's API doing the heavy lifting, and Firebase Realtime Database storing chat history. No authentication servers, no complex infrastructure, no container orchestration. Just four components working together.

What surprised me most was how well the system prompt enforcement works. I expected to need keyword filtering or some kind of ML-based topic classification. Nope, just clear instructions in natural language, and Claude 3 Haiku follows them reliably. The Firebase integration was similarly straightforward. Hash the PAT for privacy, save chats on each message, load them on mount. Done.

The persistent chat history turned out to be more valuable than I expected. Being able to revisit past troubleshooting sessions or reference previous kubectl commands makes the tool feel less like a chatbot and more like a personal knowledge base that happens to answer questions.

If you're building something similar, here's my advice: start simple, match the model to the task, and don't over-engineer. Add features iteratively based on actual usage. The best tools are the ones that disappear, letting you focus on the actual problem you're trying to solve.


Series: Platform AI — Part 1 of 4
