February 01, 2026 · 8 min read
This is Part 4 of the Platform AI series. Part 1 covers building the basic chatbot. Part 2 adds agentic tool use. Part 3 introduces the multi-agent system.
The AI assistant from the first three parts of this series has grown into a real system. Six specialized agents, a hallucination guard with four verification layers, 81 end-to-end tests, tool retry logic, parallel execution, budget tracking, and a self-evaluation pipeline. Making changes to it now requires understanding how all those pieces fit together.
I found myself repeating the same workflow every time I needed to fix a bug or add a feature. Export a chat session from the assistant to see what went wrong. Read through the codebase to find the relevant files. Make changes. Run the e2e suite. Check for regressions. That process is exactly the kind of structured, multi-step workflow that AI excels at.
So I built a Claude Code plugin to do it for me.
Live: https://chrishouse.io/tools/ai
What Is a Claude Code Skill?
Claude Code is Anthropic's CLI tool for using Claude directly in your terminal. It reads your codebase, edits files, runs commands, and understands project context. But out of the box, it's general-purpose. It doesn't know that my project has a specific agent architecture, that tests live in e2e/, or that every async function needs try/catch with bracket-prefixed logging.
Skills (also called slash commands) let you inject domain-specific expertise into Claude Code. A skill is a markdown file that acts as a system prompt, activated by typing /skill-name in the CLI. When you invoke a skill, Claude Code loads that markdown as additional context, effectively turning a general-purpose AI into a specialist for your project.
The skill lives at .claude/commands/ai-dev-studio.md in the repo. There's also a plugin registration that tells Claude Code this skill exists:
{
"name": "ai-dev-studio",
"version": "1.0.0",
"description": "AI Development Studio - Multi-agent system for building and improving the AI DevOps assistant",
"author": {
"name": "Chris House",
"email": "[email protected]"
}
}

The AI Dev Studio
The skill is called /ai-dev-studio and it transforms Claude Code into a six-agent development team. Each agent has a specific role in the development lifecycle:
User Request
│
▼
┌─────────────┐
│ Classify │ ── New Feature? Bug Fix? Improvement? Review?
└─────────────┘
│
▼
┌─────────────────────────────────────────────────┐
│ Agent Pipeline │
│ │
│ ARCHITECT → DEVELOPER → QA → REVIEWER → PRODUCT│
│ │
│ Bug fix: DEBUGGER → DEVELOPER → QA │
│ Improvement: REVIEWER → DEVELOPER → QA │
└─────────────────────────────────────────────────┘

When I type /ai-dev-studio and paste a chat session export or describe a bug, the skill classifies the request and activates the right agents in sequence.
The Six Agents
Architect
Architect handles design work. When I ask it to add a new feature, it reads the existing codebase first, studies how similar features are built, then produces a blueprint with specific file paths, line numbers, data flows, and integration points. It knows about the BaseAgent pattern, the agent orchestrator, the tool definition conventions, and the module structure.
Developer
Developer writes the actual code. It follows the existing conventions exactly because the skill prompt includes specific code style examples from the codebase. Named functions instead of arrow functions. Try/catch with bracket-prefixed console logs like [ModuleName]. Zod schemas for tool inputs. The skill enforces these patterns so the output matches the rest of the codebase without manual cleanup.
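To make those conventions concrete, here's a minimal sketch of the style the skill pushes the Developer agent toward. The tool name, module name, and log prefix are hypothetical, not taken from the repo:

import { z } from 'zod';

// Zod schema for a tool input, per the skill's convention.
// The tool and its fields are made-up examples.
const restartDeploymentSchema = z.object({
  namespace: z.string().describe('Kubernetes namespace'),
  deployment: z.string().describe('Deployment name to restart')
});

// Named function (not an arrow function), try/catch, bracket-prefixed logging.
async function restartDeployment(input) {
  try {
    const parsed = restartDeploymentSchema.parse(input);
    console.log(`[OperatorAgent] Restarting ${parsed.deployment} in ${parsed.namespace}`);
    // ...call the real implementation here...
    return { ok: true };
  } catch (error) {
    console.error('[OperatorAgent] restartDeployment failed:', error);
    throw error;
  }
}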
QA
QA writes Playwright e2e tests and runs them. The skill includes the exact test fixture pattern from e2e/fixtures.js so QA knows to call clearBrowserState() first, use aiAssistant.sendMessage() with appropriate timeouts, and check for both positive results and error states. After writing tests, it runs them and reports results.
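A QA-generated test might look roughly like this. It's a sketch built around the fixture calls mentioned above; the exact exports of e2e/fixtures.js, the timeout value, and the warning selector are assumptions:

// Sketch only: assumes fixtures.js exports test/expect plus clearBrowserState,
// and provides an aiAssistant fixture. The real API may differ.
import { test, expect, clearBrowserState } from './fixtures.js';

test('assistant reports cluster status without fabricating data', async ({ page, aiAssistant }) => {
  // Reset browser state first so earlier conversations don't leak into this run.
  await clearBrowserState(page);

  // Send a prompt through the shared fixture, with a generous timeout for model + tool calls.
  const response = await aiAssistant.sendMessage('check my clusters status', { timeout: 60000 });

  // Positive check: we got a response at all.
  expect(response).toBeTruthy();

  // Negative check: when tools fail, a warning should appear instead of a fabricated table.
  // The selector here is hypothetical.
  const warning = page.locator('.fabrication-warning');
  // ...assert on warning visibility depending on the scenario under test...
});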
Debugger
Debugger is the investigator. When a test fails or a session export shows unexpected behavior, Debugger reads the error, traces the execution path through the code, forms hypotheses ranked by confidence, and proposes minimal fixes. It uses a structured format with root cause analysis, evidence, and validation steps.
Reviewer
Reviewer checks for code quality, security, DRY violations, and convention adherence. It produces prioritized findings: P0 (must fix), P1 (should fix), P2 (nice to have). The skill includes specific checklists for pattern adherence, error handling, security (input validation, injection risks), and code quality.
Product
Product is the final quality gate. It reviews user-facing elements like error messages, success states, and response formatting against what the skill calls "Apple-level quality standards." The idea is that every user-visible string should be clear, actionable, and helpful rather than generic.
The Session Export Workflow
The most powerful pattern is feeding the plugin exported chat sessions from the AI assistant itself. The assistant's frontend has an export button that dumps the full session as JSON, including every message, tool call, tool result, agent routing decision, thinking blocks, and warnings.
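The export shape looks roughly like this, shown as a JavaScript literal. It's a simplified sketch; the real schema the frontend emits has more fields and different names:

// Simplified sketch of an exported session -- not the exact schema.
const sessionExport = {
  sessionId: 'abc123',
  messages: [
    { role: 'user', content: 'check my clusters status' },
    {
      role: 'assistant',
      agent: 'triage',           // agent routing decision
      confidence: 0.70,
      toolCalls: [
        { name: 'get_cluster_status', input: {}, result: { error: 'timeout' } }
      ],
      thinking: '...',           // thinking blocks, if captured
      warnings: []               // e.g. fabrication warnings shown to the user
    }
  ]
};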
When I paste that export into /ai-dev-studio, the Debugger agent can trace exactly what happened:
User asked: "check my clusters status"
→ Triage agent selected (confidence: 0.70)
→ get_cluster_status called → ERROR (4 times)
→ Model generated response with cluster data table
→ No fabrication warning shown
Problem: All tools failed but the model fabricated data

This is how I found and fixed the fabrication detector issues. The session showed that get_cluster_status failed four times with timeouts, but the assistant still presented a complete table with cluster names, regions, versions, and node counts. All fabricated. The Debugger agent traced the issue through the hallucination guard's four verification layers, identified why the detection missed it (markdown tables aren't code blocks, so the config-pattern matching didn't trigger), and proposed the exact fix.
In a later session, the IaC agent gave legitimate YAML suggestions for improving ArgoCD configurations, but the fabrication detector flagged them as hallucinations because no tools were called. The Debugger traced that false positive to the processHallucinationGuard function where the logic assumed "no tools + YAML = fabrication" without considering that advisory agents work from pre-loaded context.
Both fixes, across three files, came from the same workflow: export session, paste into /ai-dev-studio, let the agents trace the problem.
How It Knows the Codebase
The skill prompt is 600 lines of markdown that encodes the project's architecture, conventions, and patterns. It includes the agent capability system:
const AgentCapability = {
READ_ONLY: 'read_only', // Triage, Advisor
SUGGEST: 'suggest', // Suggestions requiring confirmation
EXECUTE: 'execute', // Operator - confirmation required
AUTONOMOUS: 'autonomous' // Rare
};

It also encodes the BaseAgent class pattern (how to define tools, canHandle, system prompts, and handoff logic), the DynamicStructuredTool pattern for defining new tools with Zod schemas, the e2e test fixture pattern, and the meta-tool pattern for composite operations like diagnose_pod_crash.
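For the tool-definition pattern specifically, the skill documents something along these lines. This is a sketch using LangChain's DynamicStructuredTool; the tool name, schema, and exact import path are illustrative and may differ from the repo:

// Import path varies by LangChain version; this is the current @langchain/core location.
import { DynamicStructuredTool } from '@langchain/core/tools';
import { z } from 'zod';

// Illustrative tool in the DynamicStructuredTool + Zod style the skill documents.
const getPodLogs = new DynamicStructuredTool({
  name: 'get_pod_logs',
  description: 'Fetch recent logs for a pod in a given namespace',
  schema: z.object({
    namespace: z.string().describe('Kubernetes namespace'),
    pod: z.string().describe('Pod name'),
    lines: z.number().optional().describe('Number of log lines to return')
  }),
  func: async function getPodLogsFunc({ namespace, pod, lines }) {
    try {
      console.log(`[Tools] Fetching logs for ${namespace}/${pod}`);
      // ...call the cluster API here...
      return JSON.stringify({ namespace, pod, lines, logs: [] });
    } catch (error) {
      console.error('[Tools] get_pod_logs failed:', error);
      throw error;
    }
  }
});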
This context means Claude Code doesn't have to rediscover these patterns every time. When the Developer agent writes a new agent, it follows the exact pattern from the skill. When QA writes tests, it uses the exact fixture API.
Request Classification
The skill routes requests to different agent pipelines based on keywords:
| Request Type | Keywords | Pipeline |
|---|---|---|
| New Feature | "add", "create", "build" | Architect, Developer, QA, Reviewer, Product |
| Bug Fix | "fix", "broken", "error" | Debugger, Developer, QA |
| Improvement | "improve", "optimize", "refactor" | Reviewer, Developer, QA |
| Test Coverage | "test", "e2e", "playwright" | QA |
| Architecture | "design", "architect", "plan" | Architect |
| Code Review | "review", "check", "quality" | Reviewer, Product |
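The classification itself happens in the skill prompt, with Claude reading the table and deciding, so any code version is purely illustrative. But expressed as JavaScript, the routing reduces to something like this:

// Purely illustrative: the skill does this in prose, not code.
const PIPELINES = [
  { type: 'bug-fix',     keywords: ['fix', 'broken', 'error'],          agents: ['Debugger', 'Developer', 'QA'] },
  { type: 'new-feature', keywords: ['add', 'create', 'build'],          agents: ['Architect', 'Developer', 'QA', 'Reviewer', 'Product'] },
  { type: 'improvement', keywords: ['improve', 'optimize', 'refactor'], agents: ['Reviewer', 'Developer', 'QA'] }
];

function classifyRequest(request) {
  const text = request.toLowerCase();
  const match = PIPELINES.find(function hasKeyword(pipeline) {
    return pipeline.keywords.some((keyword) => text.includes(keyword));
  });
  // Fall back to a review pipeline when nothing matches.
  return match ?? { type: 'code-review', agents: ['Reviewer', 'Product'] };
}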
This mirrors how the AI assistant itself routes requests to its own agents. There's something recursive about an AI development tool that uses the same multi-agent routing pattern as the AI it's building.
Context Preservation Between Agents
When agents hand off to each other, the skill specifies a structured handoff format:
### Handoff: Debugger → Developer
**Work Completed**:
- Traced fabrication detection through 4 layers of hallucination guard
- Identified false positive trigger at response.js:135
**Key Findings**:
- Advisory agents (iac, advisor) legitimately produce YAML without tools
- The "no tools + config = fabrication" heuristic is too aggressive
**Artifacts**:
- Root cause analysis with file:line references
**Next Steps**:
- Add agent type exemption for advisory agents
- Broaden suggestion detection patterns in guard.js

This means the Developer agent gets exactly what it needs to implement the fix without re-reading the entire codebase. The Debugger already did the investigation and narrowed down the specific lines that need to change.
Quality Gates
Each agent phase has a quality gate before handing off:
- Architect: Blueprint must be complete with specific file paths and line numbers
- Developer: Code must parse without errors
- QA: Tests must be valid and pass
- Debugger: Root cause must have supporting evidence
- Reviewer: All P0 issues must be addressed
- Product: User-facing strings must be clear and actionable
If a gate fails, the pipeline loops back. QA failure goes to Debugger for diagnosis, then back to Developer for the fix, then back to QA.
Real Example: Fixing the Fabrication Detector
Here's how a real bug fix flowed through the plugin. I exported a session where the IaC agent's YAML suggestions triggered a false fabrication warning, and pasted it into /ai-dev-studio.
Debugger analyzed the session export and traced the issue through three code paths:
- response.js:135 - The "no tools + suspicious content" check didn't account for advisory agents
- guard.js:384-389 - The suggestion detection only matched narrow K8s manifest patterns (apiVersion, kind), missing ArgoCD patterns like retry:, syncOptions:, finalizers:
- The all-tools-failed case where the model fabricated data presented in markdown tables rather than code blocks
Developer implemented three targeted fixes (the first is sketched below):
- Added isAdvisoryAgent check in response.js to exempt IaC and Advisor agents
- Added iacSuggestionPatterns array in guard.js with 14 ArgoCD/Helm-specific patterns
- Added toolCallsAttempted counter in ai-chat.js to detect when all tool calls failed but the response contains structured data
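The first fix, the advisory-agent exemption, looks conceptually like this. It's a sketch of the heuristic described above, not the actual response.js code:

// Conceptual sketch -- the real response.js logic is more involved.
const ADVISORY_AGENTS = ['iac', 'advisor'];

function isAdvisoryAgent(agentName) {
  return ADVISORY_AGENTS.includes(agentName);
}

function shouldFlagFabrication({ agentName, toolCallCount, containsConfigLikeContent }) {
  // Advisory agents legitimately produce YAML from pre-loaded context, so the
  // "no tools + config-looking output = fabrication" heuristic must not apply to them.
  if (isAdvisoryAgent(agentName)) {
    return false;
  }
  return toolCallCount === 0 && containsConfigLikeContent;
}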
QA ran the full 81-test e2e suite. Result: 81 passed, 3 flaky (API rate limits), 1 failed (pre-existing wrangler network issue). No regressions from the changes.
Three files changed, 51 lines added, 7 removed. The entire flow from session export to validated fix happened in a single Claude Code conversation.
Agents Building Agents
The most interesting aspect of this setup is that it's agents all the way down. The AI assistant running at /tools/ai uses six specialized agents (Triage, Operator, Debugger, IaC, Advisor, Network) to handle user requests. The development tool that builds and maintains that assistant uses six different specialized agents (Architect, Developer, QA, Debugger, Reviewer, Product) coordinated through a Claude Code skill.
Both systems use the same core patterns: intent classification to route to the right specialist, structured handoffs to preserve context, capability levels to control what each agent can do, and quality gates to catch issues before they ship.
The difference is scope. The runtime agents work with Kubernetes clusters and infrastructure. The development agents work with the codebase that defines those runtime agents. But the architecture is the same.
Setting It Up
The plugin requires two files in your repo:
Plugin registration at .claude/plugins/ai-dev-studio/.claude-plugin/plugin.json:
{
"name": "ai-dev-studio",
"version": "1.0.0",
"description": "AI Development Studio for the AI DevOps assistant"
}

Skill definition at .claude/commands/ai-dev-studio.md:
---
name: ai-dev-studio
description: Multi-agent AI Development Studio...
---
# AI Development Studio
You are the AI Development Studio...
[600 lines of agent specs, patterns, and workflows]

Then in Claude Code, type /ai-dev-studio followed by your request. The skill activates and Claude Code becomes the development studio.
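An invocation looks like this (illustrative):

/ai-dev-studio The IaC agent's YAML suggestions are triggering a false fabrication warning. Here's the exported session: [paste JSON export]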
Conclusion
Building a development tool for an AI system using the same multi-agent patterns as the AI system itself felt like the natural evolution of this project. The AI assistant at /tools/ai manages Kubernetes infrastructure through specialized agents. The Claude Code plugin at /ai-dev-studio manages the assistant's codebase through specialized agents. Same architecture, different domain.
The practical value is real. Exporting a broken session from the assistant, pasting it into the plugin, and getting a traced root cause with a validated fix across multiple files in a single conversation is a workflow I now use regularly. The plugin knows the codebase patterns deeply enough that the Developer agent's output matches the existing code style without manual cleanup, and the QA agent runs the actual e2e suite to catch regressions before they ship.
The 600-line skill definition is just markdown describing how the codebase works and what quality standards to follow. The leverage you get from having an AI that understands your specific architecture, conventions, and testing patterns compounds with every bug fix and feature you build through it.
- Platform AI Part 1: Building a Personal AI Chatbot with Claude and Cloudflare
- Platform AI Part 2: Adding Agentic Tool Use for AKS Management
- Platform AI Part 3: Building a Multi-Agent System for DevOps
- Platform AI Part 4: Building a Claude Code Plugin to Develop the AI Itself