January 11, 2026 · 5 min read
I wanted a specialized AI assistant that could answer my DevOps questions without the distraction of general-purpose chatbots. The goal was simple: build a focused tool that knows Kubernetes, Docker, AKS, Azure CLI, and Istio inside and out, and politely declines everything else.
What started as a weekend project turned into a surprisingly elegant solution using Claude's API, React, and Cloudflare Pages Functions. The result is a clean, minimal chat interface that feels like Claude in VS Code. No fluff, just helpful answers to infrastructure questions.
The assistant lives at /tools/ai on this blog and includes real-time usage tracking, fullscreen mode for deep-dive sessions, and a visual design that stays out of your way while you work through deployment issues or troubleshoot pod crashes.
Architecture Overview
Frontend: Gatsby static site with React chat component
Backend: Cloudflare Pages Function (serverless)
AI Model: Claude 3 Haiku via Anthropic API
Authentication: Simple PAT validation (environment variable match)
Scope Control: System prompt enforces topic restrictions
The backend acts as a thin proxy between the frontend and Claude API, validating the PAT and injecting the system prompt that restricts Claude's responses to only Kubernetes, Docker, AKS, Azure CLI, and Istio topics.
Backend Implementation
Cloudflare Pages Function
The backend is a single file at functions/api/ai-chat.js. Cloudflare Pages automatically discovers functions in the /functions directory and maps them to routes.
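The mapping is purely convention-based:

functions/api/ai-chat.js   →   /api/ai-chat   (an exported onRequestPost handles POST requests)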
Key responsibilities:
- Validate PAT against environment variable
- Forward messages to Claude API with system prompt
- Extract usage statistics from API response
- Calculate cost estimates based on token usage
- Return response with usage metadata
Authentication:
export async function onRequestPost({ request, env }) {
  const corsHeaders = { 'Access-Control-Allow-Origin': '*' };
  const { messages, pat } = await request.json();
  if (!pat || pat !== env.ASSISTANT_PAT) {
    return new Response(JSON.stringify({ error: 'Invalid access token' }), {
      status: 401,
      headers: { ...corsHeaders, 'Content-Type': 'application/json' }
    });
  }
  // ...system prompt, Claude API call, and usage tracking follow

Scope Enforcement System Prompt:
const systemPrompt = `You are a specialized DevOps assistant focused exclusively on Kubernetes, Docker, AKS (Azure Kubernetes Service), Azure CLI (az), and Istio.
STRICT RULES:
1. ONLY answer questions about:
- Kubernetes (kubectl, resources, YAML, concepts, troubleshooting)
- Docker (Dockerfile, images, containers, compose, build)
- AKS (Azure Kubernetes Service)
- Azure CLI (az commands for AKS, ACR, and related services)
- Istio (service mesh, VirtualServices, DestinationRules, traffic management)
2. For ANY other topics, respond with:
"I can only help with Kubernetes, Docker, AKS, Azure CLI, and Istio questions. Please ask about those topics."
3. When answering in-scope questions:
- Provide practical, actionable answers
- Include relevant command examples in code blocks
- Be concise but thorough
- Explain WHY when helpful (not just HOW)`;

Claude API Call:
const anthropicResponse = await fetch('https://api.anthropic.com/v1/messages', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'x-api-key': env.ANTHROPIC_API_KEY,
    'anthropic-version': '2023-06-01'
  },
  body: JSON.stringify({
    model: 'claude-3-haiku-20240307',
    max_tokens: 2048,
    system: systemPrompt,
    messages: messages
  })
});

Usage Tracking:
const data = await anthropicResponse.json();
const content = data.content[0].text;
const usage = data.usage; // { input_tokens, output_tokens }
// Claude 3 Haiku pricing: $0.25 per million input tokens, $1.25 per million output tokens
const inputCost = (usage.input_tokens / 1000000) * 0.25;
const outputCost = (usage.output_tokens / 1000000) * 1.25;
const totalCost = inputCost + outputCost;
return new Response(JSON.stringify({
  content,
  usage: {
    input_tokens: usage.input_tokens,
    output_tokens: usage.output_tokens,
    total_tokens: usage.input_tokens + usage.output_tokens,
    estimated_cost: totalCost.toFixed(4)
  }
}), { headers: { ...corsHeaders, 'Content-Type': 'application/json' } });

Frontend Implementation
React Chat Component
The chat interface lives in src/components/tools/AiAssistant.js with ~370 lines of React code managing state, API calls, and UI rendering.
State Management:
const [pat, setPat] = useState(() =>
  localStorage.getItem('ai_assistant_pat') || ''
)
const [messages, setMessages] = useState([])
const [input, setInput] = useState('')
const [loading, setLoading] = useState(false)
const [sessionUsage, setSessionUsage] = useState({
  messages: 0, tokens: 0, cost: 0
})
const [totalUsage, setTotalUsage] = useState(() => {
  const saved = localStorage.getItem('ai_assistant_total_usage')
  return saved ? JSON.parse(saved) : { messages: 0, tokens: 0, cost: 0 }
})

The PAT and total usage statistics persist in localStorage. Session usage resets on page reload.
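The write side of that persistence isn't shown above; one minimal way to keep the stored totals in sync is a small effect (a sketch, assuming useEffect is imported from React):

useEffect(() => {
  // Write cumulative usage back whenever it changes
  localStorage.setItem('ai_assistant_total_usage', JSON.stringify(totalUsage))
}, [totalUsage])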
API Call Flow:
const sendMessage = async () => {
  const newMessages = [...messages, { role: 'user', content: input }]
  setMessages(newMessages)
  setInput('')
  setLoading(true)
  try {
    const response = await fetch('/api/ai-chat', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ messages: newMessages, pat })
    })
    const data = await response.json()
    setMessages([...newMessages, {
      role: 'assistant',
      content: data.content
    }])
    // Track usage
    if (data.usage) {
      setSessionUsage(prev => ({
        messages: prev.messages + 1,
        tokens: prev.tokens + data.usage.total_tokens,
        cost: prev.cost + parseFloat(data.usage.estimated_cost)
      }))
      setTotalUsage(prev => ({
        messages: prev.messages + 1,
        tokens: prev.tokens + data.usage.total_tokens,
        cost: prev.cost + parseFloat(data.usage.estimated_cost)
      }))
    }
  } catch (err) {
    setError(err.message) // error state omitted from the excerpt above
  } finally {
    setLoading(false)
  }
}

Deployment Configuration
Environment Variables:
Local development uses .dev.vars:
ANTHROPIC_API_KEY=TOP_SECRET...
ASSISTANT_PAT=YOURPAT

Production uses Cloudflare secrets (set via dashboard or CLI).
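For example, via the wrangler CLI (exact flags may vary between wrangler versions):

npx wrangler pages secret put ANTHROPIC_API_KEY --project-name=blog
npx wrangler pages secret put ASSISTANT_PAT --project-name=blog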
wrangler.jsonc:
{
  "name": "blog",
  "compatibility_date": "2025-12-25",
  "pages_build_output_dir": "./public"
}

Build and Deploy:
# Build Gatsby site
npm run build
# Deploy to Cloudflare Pages
npx wrangler pages deploy public --project-name=blog

The Cloudflare Pages deployment automatically serves the Gatsby static site and makes the Pages Function available at /api/ai-chat.
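For local testing before a deploy, wrangler can serve the build output with the function attached and .dev.vars loaded:

# Serve the static site plus /api/ai-chat locally
npx wrangler pages dev public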
Usage Statistics
The UI displays real-time usage information in a footer below the input area:
Session stats:
- Message count
- Total tokens
- Total cost for current session
All-time stats:
- Total messages sent
- Total tokens used
- Total cost across all sessions
Implementation:
- Session usage resets on page reload
- Total usage persists in localStorage
- Token counts from Claude API response (exact, not estimates)
- Cost calculation uses Claude 3 Haiku pricing:
- Input: $0.25 per million tokens
- Output: $1.25 per million tokens
The footer appears after the first message, showing both session and cumulative statistics.
Cost Analysis
Based on initial testing, a typical question/answer exchange uses approximately 500-1500 tokens total, costing between $0.0005 and $0.002 per message. At those rates, a $5 prepaid balance on the Claude API covers a few thousand interactions, depending on response length.
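To make that concrete: a 1,000-token exchange split as 300 input and 700 output tokens works out to

300 / 1,000,000 × $0.25 + 700 / 1,000,000 × $1.25 ≈ $0.000075 + $0.000875 ≈ $0.00095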
Model selection: Claude 3 Haiku (claude-3-haiku-20240307)
- Originally planned Claude 3.5 Sonnet, but not available on prepaid API tier
- Haiku sufficient for Kubernetes/Docker questions at lower cost
- Faster response times than Sonnet
Rate limiting: None implemented currently
- PAT access control sufficient for personal use
- Could add rate limiting via Cloudflare Workers KV or Durable Objects for broader use
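A sketch of what a KV-based limiter could look like inside the Pages Function; RATE_LIMIT is a hypothetical KV namespace binding, and the hourly bucket size is an arbitrary choice:

// Hypothetical per-PAT limiter backed by a Workers KV binding named RATE_LIMIT
async function allowRequest(env, pat, limit = 30) {
  const hourBucket = new Date().toISOString().slice(0, 13); // e.g. "2026-01-11T09"
  const key = `rl:${pat}:${hourBucket}`;
  const count = parseInt((await env.RATE_LIMIT.get(key)) || '0', 10);
  if (count >= limit) return false; // over the hourly limit
  await env.RATE_LIMIT.put(key, String(count + 1), { expirationTtl: 3600 });
  return true;
}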
Technical Decisions
Why Cloudflare Pages Functions instead of direct API calls:
- Keeps API key secure on backend
- Enables PAT validation
- Allows injection of system prompt without client-side visibility
- Serverless, no infrastructure to manage
Why PAT instead of OAuth:
- Personal use case, no multi-user requirements
- Simple implementation
- No additional dependencies or auth providers needed
Why system prompt for scope enforcement:
- Claude 3 Haiku excellent at following instructions
- No complex filtering logic needed
- Works reliably across conversation context
- Requires specific topic list rather than general categories
Why Claude 3 Haiku instead of Sonnet:
- Claude 3.5 Sonnet not available on prepaid API tier
- Haiku sufficient for DevOps Q&A use case
- Faster responses, lower cost
- Match model capability to task requirements
What's Next
Future improvements could include:
- Custom temperature/max_tokens settings
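Wiring those through would mostly be a change to the request body in the Pages Function; a sketch, where userMaxTokens and userTemperature are hypothetical values passed up from the UI:

body: JSON.stringify({
  model: 'claude-3-haiku-20240307',
  max_tokens: userMaxTokens ?? 2048,   // hypothetical user setting
  temperature: userTemperature ?? 1.0, // hypothetical user setting
  system: systemPrompt,
  messages: messages
})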
Conclusion
Building this took maybe 4-5 hours spread across a weekend, including the blog post you're reading now. The result is a specialized AI assistant that lives at /tools/ai on this blog, answers DevOps questions without distraction, tracks its own costs, and feels like a native part of my workflow.
The architecture is almost boring in its simplicity: a static React frontend, a serverless Cloudflare Function as a thin proxy, and Claude's API doing the heavy lifting. No databases, no authentication servers, no complex infrastructure. Just three components working together.
What surprised me most was how well the system prompt enforcement works. I expected to need keyword filtering or some kind of ML-based topic classification. Nope—just clear instructions in natural language, and Claude 3 Haiku follows them reliably.
If you're building something similar, here's my advice: start simple, match the model to the task, and don't over-engineer. The best tools are the ones that disappear, letting you focus on the actual problem you're trying to solve.