December 22, 2025 · 10 min read
This is the first in a multi-part series documenting the actual implementation of a golden path for microservices deployment. Not the theory, but the real commands, the actual errors, and what it takes to go from empty cluster to working self-service platform.
There's no shortage of conference talks about platform engineering. Plenty of blog posts explaining why golden paths matter. What's harder to find is someone walking through the actual implementation: the YAML files, the permission errors, the "why isn't this working" moments that don't make it into the polished demos.
That's what this series is. By the end of today's work: a developer fills in 4 fields in Backstage, and 5 minutes later has a running microservice deployed to Kubernetes with health checks responding.
Here's how we built it.
What We're Building
A self-service platform where developers can deploy microservices without:
- Manually creating Kubernetes namespaces
- Configuring CI/CD pipelines
- Setting up container registries
- Writing ArgoCD applications
- Requesting infrastructure tickets
The goal: minimize the distance between "I want to deploy a service" and "my service is running in production."
Phase 1: Shared Development Cluster Foundation
The Problem
Every team wants their own Kubernetes cluster. Understandable: clean isolation, no noisy neighbors, full control.
It's also expensive, operationally complex, and usually overkill for development workloads.
The alternative: one shared development cluster with proper namespace isolation. Teams get their own space, resource limits prevent noisy neighbor problems, and operations stays sane. The tradeoff is that you need to actually implement the isolation properly, which is what this section covers.
The Infrastructure Stack
Cluster:
- Azure Kubernetes Service (AKS)
- Named aks-shared-dev
- Single cluster for all development teams
Namespace Management:
- Crossplane for declarative namespace provisioning
- Custom Resource Definition: XNamespaceClaim
- Teams request namespaces, Crossplane creates them with isolation
Isolation Mechanisms:
# Resource quotas per team namespace
resourceQuota:
  requestsCpu: "2"
  requestsMemory: "4Gi"
  limitsCpu: "4"
  limitsMemory: "8Gi"
  pods: 50

# Default limits for containers
limitRange:
  cpuRequest: "100m"
  cpuLimit: "500m"
  memoryRequest: "128Mi"
  memoryLimit: "512Mi"

# Network policies enabled
enableNetworkPolicy: true

Each team gets a namespace with hard resource limits. No single team can consume the entire cluster. This sounds obvious, but without these limits, one team's runaway process or misconfigured HPA can starve everyone else.
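The enableNetworkPolicy flag only says that a policy exists; it doesn't show what the policy looks like. As a rough sketch (not the composition's actual output), a common baseline is a default-deny ingress policy that only allows traffic from pods in the team's own namespace:

# Sketch only: a typical baseline policy for a team namespace.
# The real policy is rendered by the Crossplane composition and may differ.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-namespace
  namespace: red-dev
spec:
  podSelector: {}            # applies to every pod in the namespace
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector: {}    # ...but only from pods in this same namespace

Anything outside the namespace (other teams, other workloads) is denied by default, and cross-namespace access has to be granted explicitly.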
Why Crossplane Instead of kubectl?
We could just run kubectl create namespace red-dev and be done. It's one command. Why introduce another abstraction layer?
I had this debate with myself for longer than I'd like to admit. The answer comes down to what happens after day one. Creating a namespace is easy. Creating a namespace with the right resource quotas, limit ranges, network policies, RBAC bindings, and labels every single time, consistently, without forgetting anything? That's where things fall apart.
Instead, we use Crossplane to create a NamespaceClaim:
apiVersion: platform.chrishouse.io/v1alpha1
kind: NamespaceClaim
metadata:
  name: red-dev
  namespace: crossplane-system
  labels:
    team: red
    environment: dev
    created-by: backstage
spec:
  parameters:
    targetCluster: shared-dev-cluster
    namespaceName: red-dev
    enableResourceQuota: true
    enableNetworkPolicy: true
    enableLimitRange: true

Why the extra abstraction?
- Consistency - Every namespace gets resource quotas, limit ranges, and network policies automatically
- GitOps - Namespace configuration is declarative and version-controlled
- Self-service - Backstage templates can create namespaces without cluster credentials
- Auditability - Clear record of who requested what and when
The Crossplane composition handles the actual Kubernetes API calls. Teams just declare what they need. The composition is essentially a contract: you give me a namespace name and a team label, I give you a fully configured namespace with all the guardrails in place.
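To make that contract concrete, here is a heavily abridged sketch of what such a composition can look like, assuming the classic resources-mode style with provider-kubernetes. The real one lives in infra/crossplane/platform/namespace-composition.yaml and also renders the ResourceQuota, LimitRange, and NetworkPolicy objects; this only shows the namespace itself.

# Abridged sketch, not the actual composition. The NamespaceClaim shown above
# is the claim; the XRD maps it to the XNamespaceClaim composite referenced here.
apiVersion: apiextensions.crossplane.io/v1
kind: Composition
metadata:
  name: namespace-composition
spec:
  compositeTypeRef:
    apiVersion: platform.chrishouse.io/v1alpha1
    kind: XNamespaceClaim
  resources:
    - name: namespace
      base:
        # Object apiVersion depends on your provider-kubernetes version;
        # providerConfigRef is omitted here for brevity.
        apiVersion: kubernetes.crossplane.io/v1alpha2
        kind: Object
        spec:
          forProvider:
            manifest:
              apiVersion: v1
              kind: Namespace
              metadata:
                labels:
                  managed-by: crossplane
      patches:
        # Copy the requested name from the claim onto the managed Namespace
        - fromFieldPath: spec.parameters.namespaceName
          toFieldPath: spec.forProvider.manifest.metadata.name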
Verification
After setting up the cluster and Crossplane:
$ kubectl get namespaceclaim -n crossplane-system
NAME      SYNCED   READY   AGE
red-dev   True     True    15m

$ kubectl get namespace red-dev -o yaml
apiVersion: v1
kind: Namespace
metadata:
  labels:
    team: red
    environment: dev
    managed-by: crossplane
  name: red-dev

$ kubectl get resourcequota -n red-dev
NAME            AGE
red-dev-quota   15m

$ kubectl get limitrange -n red-dev
NAME             AGE
red-dev-limits   15m

Namespace exists. Quotas applied. Limits enforced. This might not look like much, but it's the foundation everything else builds on. Get this wrong, and you'll be debugging resource contention issues and namespace configuration drift for months.
Phase 1 complete.
Phase 2: Microservice Deployment Template
The Goal
Now we have a place to put things. The next question: how do developers actually get their code running there?
The developer experience should be:
- Click "Create Component" in Backstage
- Fill in: Service Name, Team, Description, Owner
- Wait 5 minutes
- Service is running
Everything else happens automatically. No terminal. No YAML editing. No waiting for someone in ops to process a ticket.
What "Everything Else" Means
That "everything else" is doing a lot of heavy lifting. Behind that 4-field form, the platform needs to:
- Create GitHub repository with proper structure
- Configure repository permissions for GHCR (GitHub Container Registry)
- Set up CI/CD pipeline to build and publish container images
- Create namespace in shared dev cluster (via Crossplane)
- Create ArgoCD application manifest
- Commit GitOps configuration to platform repo
- Let ArgoCD discover and deploy the service
That's seven separate systems that need to coordinate correctly. Any one of them failing silently would leave developers confused about why their service isn't running. This is where the abstraction of "fill in a form" starts looking a lot more complex than it sounds.
Let's break it down.
The Backstage Template
Core structure:
apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: nodejs-microservice-quickstart
  title: Node.js Microservice (Quick Start)
spec:
  parameters:
    - title: Service Information
      properties:
        service_name:
          title: Service Name
          type: string
        team:
          title: Team Name
          type: string
        description:
          title: Description
          type: string
        owner:
          title: Owner
          type: string

Four input fields. That's it. Everything else is either defaulted or derived from these values. The temptation to add more fields is strong, but every additional input is friction. Resist it.
Template Steps
Step 1: Scaffold Repository
- id: fetch-template
  name: Fetch Template
  action: fetch:template
  input:
    url: ./skeleton
    values:
      serviceName: ${{ parameters.service_name }}
      team: ${{ parameters.team }}
      description: ${{ parameters.description }}

Backstage copies the template skeleton and replaces variables.
The skeleton includes:
- Node.js application with Express
- Dockerfile for container builds
- GitHub Actions workflow for CI/CD
- Helm chart for Kubernetes deployment (defaults sketched below)
- Health check endpoints (/health, /ready)
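The chart's defaults are worth a glance because they explain the numbers that show up later during verification: two replicas, a ClusterIP service on port 80 fronting the app on port 3000, probes on /health and /ready, and the ServiceMonitor switched off. A plausible values.yaml for this skeleton, using the pricing-api example from later in the post (illustrative field names, not the actual file):

# Illustrative sketch of the skeleton's helm/values.yaml; keys are assumptions,
# but the values match what the rest of the post shows.
replicaCount: 2

image:
  repository: ghcr.io/crh225/pricing-api
  tag: latest

service:
  type: ClusterIP
  port: 80          # service port
  targetPort: 3000  # Express listens here

probes:
  liveness:
    path: /health
  readiness:
    path: /ready

serviceMonitor:
  enabled: false    # see the ServiceMonitor story later in the post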
Step 2: Create GitHub Repository
- id: publish-github
  name: Publish to GitHub
  action: publish:github
  input:
    repoUrl: github.com?owner=crh225&repo=${{ parameters.service_name }}
    defaultBranch: main

Step 3: Configure Repository Permissions (Critical Step)
This is where it gets interesting, and where I wasted more time than I'd like to admit.
GitHub Container Registry requires specific repository permissions. By default, new repositories have workflow permissions set to read. This seems fine until you try to push a container image. The build passes, the login succeeds, and then:
denied: permission_denied: write_package

The error message is unhelpful. You'll search for PAT token issues, GHCR authentication problems, Docker login failures. None of that is the problem. The problem is a single checkbox in the repository settings that nobody told you about.
We could fix this manually for each repository, but that defeats the purpose of automation. Better: automate it with a custom Backstage action that configures permissions immediately after repository creation.
Custom action: armportal:github:configure-repo
// Imports assume the newer scaffolder-node package; older Backstage versions
// export createTemplateAction from @backstage/plugin-scaffolder-backend.
import { createTemplateAction } from '@backstage/plugin-scaffolder-node';
import { Octokit } from '@octokit/rest';

export const configureGitHubRepoAction = (options?: { token?: string }) => {
  return createTemplateAction({
    id: 'armportal:github:configure-repo',
    async handler(ctx) {
      const { repoUrl, token } = ctx.input;
      const [, owner, repo] = repoUrl.match(/github\.com\?.*owner=([^&]+).*repo=([^&]+)/);
      const octokit = new Octokit({ auth: token });

      // Set workflow permissions to 'write' so GITHUB_TOKEN can push packages
      await octokit.request('PUT /repos/{owner}/{repo}/actions/permissions/workflow', {
        owner,
        repo,
        default_workflow_permissions: 'write',
        can_approve_pull_request_reviews: true,
      });

      // Enable GitHub Actions for the repository
      await octokit.request('PUT /repos/{owner}/{repo}/actions/permissions', {
        owner,
        repo,
        enabled: true,
        allowed_actions: 'all',
      });
    },
  });
};

Used in template:
- id: configure-repo
  name: Configure Repository Permissions
  action: armportal:github:configure-repo
  input:
    repoUrl: github.com?owner=crh225&repo=${{ parameters.service_name }}

Now GHCR publishing works automatically. Every new repository gets the right permissions from the start. No manual intervention, no mysterious failures on the first build.
Step 4: Create GitOps Configuration
- id: create-gitops-pr
  name: Create GitOps PR
  action: armportal:create-pr
  input:
    repoUrl: github.com?owner=crh225&repo=ARMServicePortal
    branch: add-${{ parameters.service_name }}
    title: "Add ${{ parameters.service_name }} to platform"
    files:
      - path: infra/crossplane/claims/aks-shared-dev/${{ parameters.team }}-dev/namespace-claim.yaml
        content: ${{ steps['prepare-gitops'].output.namespaceClaim }}
      - path: infra/quickstart-services/${{ parameters.service_name }}/argocd-application.yaml
        content: ${{ steps['prepare-gitops'].output.argoApplication }}

This creates a pull request in the platform repository with:
- Namespace claim for Crossplane
- ArgoCD application manifest
Merge the PR. GitOps takes over. This separation matters: the template creates the intent (a PR), but a human approves the actual deployment. You can make this automatic if you want, but having that gate gives teams a moment to review what's about to happen.
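For reference, the per-service argocd-application.yaml that lands in that PR looks roughly like this. It's a sketch using the pricing-api example from later in the post; the actual manifest is generated by the prepare-gitops step, so field values here are assumptions based on the rest of the setup (Helm chart in the service repo, deployed into the team's namespace on the shared dev cluster):

# Sketch of a generated per-service Application; values are illustrative.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: pricing-api
  namespace: argocd
spec:
  project: applications
  source:
    repoURL: https://github.com/crh225/pricing-api.git
    targetRevision: main
    path: helm
  destination:
    server: https://kubernetes.default.svc
    namespace: red-dev
  syncPolicy:
    automated:
      prune: true
      selfHeal: true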
The CI/CD Pipeline
The template generates a complete GitHub Actions workflow for each new service. Nothing fancy here, just the standard build-and-push pattern:
name: Build and Push to GHCR

on:
  push:
    branches: [main]

jobs:
  build:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    steps:
      - uses: actions/checkout@v3

      - name: Convert repository name to lowercase
        id: repo
        run: echo "repository=${{ github.repository }}" | tr '[:upper:]' '[:lower:]' >> $GITHUB_OUTPUT

      - name: Log in to GitHub Container Registry
        uses: docker/login-action@v2
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Build and push
        uses: docker/build-push-action@v4
        with:
          context: .
          push: true
          tags: ghcr.io/${{ steps.repo.outputs.repository }}:latest

Key detail: lowercase repository name. GHCR requires it, and GitHub repository names can have uppercase letters. This mismatch will cause failures if you forget to normalize. The workflow handles it with that tr command, but I learned this the hard way after debugging "image not found" errors for an embarrassingly long time.
ArgoCD Service Discovery
Problem: Every time we create a new service, we'd need to manually register it with ArgoCD. That's exactly the kind of manual step that makes developers go around the platform.
Solution: App-of-Apps pattern. ArgoCD watches a directory. When new files appear, it creates applications automatically.
Created infra/argocd/apps/quickstart-services-app.yaml:
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: quickstart-services
  namespace: argocd
spec:
  project: applications
  source:
    repoURL: https://github.com/crh225/ARMServicePortal.git
    targetRevision: main
    path: infra/quickstart-services
    directory:
      recurse: true
      include: '*/argocd-application.yaml'
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true
      selfHeal: true

ArgoCD watches infra/quickstart-services/. When it finds a new argocd-application.yaml, it creates the application automatically. The template commits the application manifest, and within a few minutes ArgoCD notices the new file and deploys the service.
No manual ArgoCD registration required. No "please add my app to ArgoCD" tickets.
The First Error: ServiceMonitor CRD Missing
Everything was working. Repository created, permissions configured, image built and pushed. ArgoCD picked up the application manifest and started syncing.
Then it failed. ArgoCD showed:
OutOfSync
Missing
Error: failed to discover server resources for group version monitoring.coreos.com/v1:
the server could not find the requested resource

The template included a ServiceMonitor resource for Prometheus metrics, because of course you want metrics for your services. But Prometheus Operator wasn't installed on the cluster yet. The CRD didn't exist. ArgoCD couldn't apply something that the cluster didn't understand.
This is a common trap with golden path templates: including "best practices" resources that depend on infrastructure that doesn't exist yet. The fix is simple, but the lesson is important. Don't put aspirational resources in your default template. :D
Fix: Disable ServiceMonitor by default in template.
# helm/values.yaml
serviceMonitor:
  enabled: false  # Set to true after installing Prometheus Operator

Re-sync. Success. Enable it later when Prometheus is actually running.
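For that flag to actually stop ArgoCD from trying to apply the resource, the chart's ServiceMonitor manifest has to be wrapped in a conditional. A minimal sketch of that guard, assuming the chart follows the usual Helm convention (names and labels are illustrative, not the skeleton's actual template):

# Sketch of helm/templates/servicemonitor.yaml guarded by the values flag.
{{- if .Values.serviceMonitor.enabled }}
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: {{ .Release.Name }}
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: {{ .Release.Name }}
  endpoints:
    - port: http
      path: /metrics
{{- end }}

With enabled: false the template renders to nothing, so the cluster never sees a resource it can't understand; flipping the value to true after installing Prometheus Operator brings the monitoring back without touching the application code.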
Verification: pricing-api Deployment
Time to test the whole pipeline end-to-end. I created pricing-api via the Backstage template, filled in the four fields, and waited:
$ kubectl get pods -n red-dev
NAME                          READY   STATUS    RESTARTS   AGE
pricing-api-6f4b8d9c7-8xk2p   1/1     Running   0          2m
pricing-api-6f4b8d9c7-m7n4q   1/1     Running   0          2m

$ kubectl port-forward -n red-dev svc/pricing-api 8080:80
Forwarding from 127.0.0.1:8080 -> 3000

$ curl http://localhost:8080/health
{"status":"healthy","timestamp":"2025-12-22T15:30:00.000Z"}

Two pods running. Health checks responding. The entire chain worked: Backstage created the repo, configured GHCR permissions, GitHub Actions built and pushed the image, the GitOps PR got merged, ArgoCD deployed the application.
Time from Backstage form submission to running service: 4 minutes, 37 seconds.
That number matters. If this took 30 minutes, developers would start looking for shortcuts. If it took 2 days waiting for approvals, they'd definitely route around the platform. Under 5 minutes is fast enough that the golden path becomes the path of least resistance.
Phase 2 complete.
What Actually Happened
Starting from an empty shared development cluster, we built:
Infrastructure:
- Crossplane namespace provisioning with resource quotas
- Shared AKS cluster for all development teams
- Namespace isolation with limits and network policies
Automation:
- Backstage template for Node.js microservices
- Custom action for automatic GHCR permissions
- CI/CD pipeline for container builds
- ArgoCD App-of-Apps for service discovery
- GitOps workflow for all deployments
Developer Experience:
- 4 input fields in Backstage
- ~5 minutes to running service
- No manual configuration required
- No cluster credentials needed
- No infrastructure tickets
What worked:
- End-to-end automation from form to deployment
- Automatic GHCR configuration
- Service discovery via App-of-Apps pattern
- Resource isolation in shared cluster
What didn't (initially):
- ServiceMonitor CRD dependency
- Repository name case sensitivity in GHCR
- ArgoCD not watching quickstart-services directory
All fixed. System working. Each of these issues took time to debug, but once fixed, they stay fixed. Every developer who uses the template after this benefits from the lessons learned.
What's Next
This gets us from zero to deployed service. A developer can fill out a form and have a running microservice in under 5 minutes. But we're missing some important pieces:
Networking:
- External ingress (currently ClusterIP only)
- Istio service mesh for mTLS
- DNS configuration
Developer Workflow:
- Preview environments per pull request
- Automated testing in pipelines
- Rollback mechanisms
Security:
- Policy enforcement (OPA/Kyverno)
- Secret management (External Secrets Operator)
- Image scanning
Those are the next phases. Each one adds complexity, and each one needs to be optional until it's required.
The principle remains: make the golden path faster than the alternative.
If creating a service this way takes 5 minutes, and the manual alternative takes days of tickets and approvals, developers will use the platform.
If the platform becomes slower or more restrictive than doing it manually, they'll route around it.
That's the balance we're building toward.
Implementation Repository
Full implementation available at: github.com/crh225/ARMServicePortal
Key files referenced:
- Backstage template: backstage/templates/nodejs-quickstart/template.yaml
- Custom GitHub action: backstage/plugins/arm-portal-backend/src/scaffolder/actions/configureGitHubRepo.ts
- App-of-Apps: infra/argocd/apps/quickstart-services-app.yaml
- Crossplane composition: infra/crossplane/platform/namespace-composition.yaml
Next in series: Part 2 will cover preview environments: automatically creating ephemeral deployments for every pull request, with DNS, TLS certificates, and automatic cleanup.