How I Built Ephemeral PR Environments on AKS

November 28, 2025·4 min read

"Can you deploy this to staging so I can test it?"

If you've heard this a hundred times, you know the pain. Shared staging environments become bottlenecks. Developers step on each other's changes. QA can't reproduce bugs because someone else deployed over the fix.

I wanted something better for my Cloud Self-Service Portal project: every pull request gets its own isolated environment, automatically, with zero manual intervention.

Here's how I built it.


The Architecture

When a PR is opened against my repo, GitHub Actions:

  1. Builds a Docker image tagged with the PR number
  2. Creates a dedicated Kubernetes namespace (armportal-pr-{PR_NUMBER})
  3. Deploys the app with its own ingress to AKS
  4. Posts the live URL back to the PR as a comment

When the PR is closed or merged? Everything gets cleaned up automatically.

PR #42 gets:

  • Namespace: armportal-pr-42
  • URL: https://portal-api-pr-42.pr.chrishouse.io
  • Its own copy of the secrets, configs, and wildcard TLS certificate
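
Because every name hangs off the PR number, it helps to derive them in one place. The following is a minimal sketch, not the workflow's actual code: the function names are made up for illustration, while the naming patterns are the ones shown above.

```shell
# Derive every per-PR name from the PR number so nothing drifts out of sync.
# Function names are hypothetical; the patterns come from the examples above.
pr_namespace() { printf 'armportal-pr-%s\n' "$1"; }
pr_hostname()  { printf 'portal-api-pr-%s.pr.chrishouse.io\n' "$1"; }
pr_url()       { printf 'https://%s\n' "$(pr_hostname "$1")"; }
pr_image_tag() { printf 'armportal-backend:pr-%s\n' "$1"; }

# Accept only positive integers before a value is used in a resource name.
valid_pr_number() {
  case "$1" in
    ''|0*|*[!0-9]*) return 1 ;;
    *) return 0 ;;
  esac
}
```

Validating the number first keeps anything unexpected out of namespace names and hostnames, even though GitHub itself only hands out integers.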

The Wildcard Certificate Trick

Here's where it gets interesting. You might think each PR environment needs its own TLS certificate. That would mean:

  • Running into Let's Encrypt rate limits
  • Managing dozens of certificates
  • Slower deployments

Instead, I use a single wildcard certificate for *.pr.chrishouse.io.

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: pr-wildcard-tls
  namespace: cert-manager
spec:
  secretName: pr-wildcard-tls
  dnsNames:
    - "*.pr.chrishouse.io"
  issuerRef:
    name: letsencrypt-dns01
    kind: ClusterIssuer

One certificate, unlimited PR environments. PR #1 through PR #9999 all just work.

The catch? Wildcard certificates require DNS-01 validation (HTTP-01 can't verify wildcards). That's where Cloudflare comes in.
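
One detail worth keeping in mind: a wildcard matches exactly one DNS label, so portal-api-pr-42.pr.chrishouse.io is covered but a.b.pr.chrishouse.io is not. Here is a small sketch of that rule; the helper name is hypothetical and it only checks names, it does not touch any certificate.

```shell
# Does a hostname fall under a wildcard such as "*.pr.chrishouse.io"?
# The "*" stands for exactly one DNS label, so nested names are not covered.
wildcard_covers() {
  wc_pattern=$1
  wc_host=$2
  wc_suffix=${wc_pattern#\*}        # "*.pr.chrishouse.io" -> ".pr.chrishouse.io"
  case "$wc_host" in
    *"$wc_suffix") wc_label=${wc_host%"$wc_suffix"} ;;
    *) return 1 ;;
  esac
  # Whatever replaced "*" must be a single, non-empty label (no dots).
  case "$wc_label" in
    ''|*.*) return 1 ;;
    *) return 0 ;;
  esac
}
```

This is why the PR number goes into the first label of the hostname rather than into an extra subdomain level.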


Cloudflare DNS-01 Challenge

Let's Encrypt needs to verify that I control pr.chrishouse.io before it will issue a certificate for *.pr.chrishouse.io. With DNS-01, this happens automatically:

  1. cert-manager creates a TXT record in Cloudflare: _acme-challenge.pr.chrishouse.io
  2. Let's Encrypt verifies the record exists
  3. The certificate is issued
  4. cert-manager cleans up the TXT record

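apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-dns01
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    # Required by cert-manager: the Secret that stores the ACME account key.
    # The name below is a placeholder.
    privateKeySecretRef:
      name: letsencrypt-dns01-account-key
    solvers:
    - dns01:
        cloudflare:
          apiTokenSecretRef:
            name: cloudflare-api-token
            key: api-token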
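
The TXT record name is derived mechanically from the certificate's DNS name: the leading "*." is dropped and _acme-challenge. is prepended. A tiny sketch of that derivation, with a hypothetical helper name:

```shell
# Name of the TXT record that has to exist for a given dnsName.
# For wildcards the leading "*." is dropped before the prefix is added.
acme_challenge_record() {
  printf '_acme-challenge.%s\n' "${1#\*.}"
}
```

While a certificate is being issued, something like dig +short TXT "$(acme_challenge_record '*.pr.chrishouse.io')" should show the challenge value, which is a handy first check when issuance seems stuck.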

The Cloudflare API token lives in Azure Key Vault and is synced into Kubernetes by a daily CronJob that uses the Secrets Store CSI Driver. If I rotate the token in Azure, Kubernetes picks up the new value on the next sync, with no manual step.


The Deployment Flow

When a PR is opened, here's what happens:

1. Build & Push

Docker image gets built and pushed to Azure Container Registry with a PR-specific tag:

- name: Build and push Docker image
  run: |
    docker build -t $ACR_REGISTRY/armportal-backend:pr-${{ github.event.pull_request.number }} .
    docker push $ACR_REGISTRY/armportal-backend:pr-${{ github.event.pull_request.number }}

2. Create Namespace & Copy Secrets

The clever bit: I copy production secrets into the PR namespace, using jq to strip the server-managed metadata (resourceVersion, uid, creationTimestamp) and rewrite the namespace so the apply doesn't fail. The wildcard TLS certificate follows the same pattern: its secret is copied from the cert-manager namespace into the PR namespace.

kubectl get secret backend-secrets -n armportal-backend -o json | \
  jq --arg ns "armportal-pr-${PR_NUMBER}" \
    'del(.metadata.resourceVersion, .metadata.uid, .metadata.creationTimestamp)
     | .metadata.namespace = $ns' | \
  kubectl apply -f -
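
Since the same pipeline runs for both the app secrets and the wildcard TLS secret, it can be wrapped in a function. This is a sketch under a few assumptions: the function name is hypothetical, kubectl is already authenticated against the cluster, and jq is installed on the runner. It also drops ownerReferences and managedFields, which the original command does not.

```shell
# Copy a Secret from one namespace to another, dropping server-managed fields.
# Assumes an authenticated kubectl and jq on the PATH.
copy_secret() {
  cs_name=$1
  cs_src_ns=$2
  cs_dst_ns=$3
  kubectl get secret "$cs_name" -n "$cs_src_ns" -o json |
    jq --arg ns "$cs_dst_ns" '
      del(.metadata.resourceVersion, .metadata.uid,
          .metadata.creationTimestamp, .metadata.ownerReferences,
          .metadata.managedFields)
      | .metadata.namespace = $ns' |
    kubectl apply -f -
}
```

Usage would then look like copy_secret backend-secrets armportal-backend "armportal-pr-$PR_NUMBER" followed by copy_secret pr-wildcard-tls cert-manager "armportal-pr-$PR_NUMBER".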

3. Deploy with Templating

I use envsubst to inject PR-specific values into a deployment template:

export PR_NUMBER=${{ github.event.pull_request.number }}
envsubst < pr-deployment-template.yaml | kubectl apply -f -
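
By default envsubst replaces every $VARIABLE it finds, which can mangle a manifest that contains other dollar signs (a shell command in a container's args, for example). Passing a list of variables limits substitution to those names. A minimal sketch, with a hypothetical wrapper name:

```shell
# Render a template, substituting only ${PR_NUMBER} and leaving every other
# "$..." sequence in the file untouched.
# $1 = PR number, $2 = path to the template file.
render_pr_template() {
  PR_NUMBER=$1 envsubst '${PR_NUMBER}' < "$2"
}
```

In the workflow this would be piped straight into kubectl: render_pr_template "$PR_NUMBER" pr-deployment-template.yaml | kubectl apply -f -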

4. Post the URL

GitHub Actions comments on the PR with the live environment URL and health check endpoint:

- name: Comment on PR
  uses: actions/github-script@v7
  with:
    script: |
      // context.issue.number is the PR number on pull_request events
      const prNumber = context.issue.number;
      const baseUrl = `https://portal-api-pr-${prNumber}.pr.chrishouse.io`;
      await github.rest.issues.createComment({
        owner: context.repo.owner,
        repo: context.repo.repo,
        issue_number: prNumber,
        body: [
          '## PR Environment Deployed!',
          '',
          `**API URL:** ${baseUrl}`,
          `**Health Check:** ${baseUrl}/api/health`,
        ].join('\n'),
      });

Auto-Cleanup: The Unsung Hero

The deployment is cool. The cleanup is what makes it sustainable.

When a PR is closed (merged or abandoned), GitHub Actions:

  • Deletes the Kubernetes namespace, which cascades to everything inside it (pods, services, ingress, secrets, configmaps)
  • Deletes the Docker image from ACR, with continue-on-error so a PR whose build never pushed an image doesn't break cleanup

on:
  pull_request:
    types: [closed]

jobs:
  cleanup:
    runs-on: ubuntu-latest
    steps:
      # Azure login and AKS credential steps not shown
      - name: Delete namespace
        run: kubectl delete namespace armportal-pr-${{ github.event.pull_request.number }}

      - name: Delete Docker image
        run: |
          az acr repository delete \
            --name $ACR_NAME \
            --image armportal-backend:pr-${{ github.event.pull_request.number }} \
            --yes
        continue-on-error: true

One namespace delete cleans up everything in the cluster.
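
Cleanup is most useful when it can be re-run safely, for instance after a deploy that failed before the namespace or image existed. The sketch below shows one way to make both steps tolerant of missing resources; the function name is hypothetical, and ACR_NAME is expected in the environment as in the workflow.

```shell
# Idempotent cleanup for one PR: safe to re-run, and safe when the deploy
# never got far enough to create the namespace or push the image.
cleanup_pr() {
  cp_pr=$1
  # --ignore-not-found turns "namespace does not exist" into a no-op.
  kubectl delete namespace "armportal-pr-$cp_pr" --ignore-not-found
  # A missing image is reported but does not fail the cleanup.
  az acr repository delete \
    --name "$ACR_NAME" \
    --image "armportal-backend:pr-$cp_pr" \
    --yes || echo "image for PR $cp_pr not deleted (it may not exist), skipping"
}
```

The trade-off is the same as with continue-on-error: a genuine ACR failure is also swallowed, so the log line is the only trace of it.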


Cost Optimization

PR environments are intentionally lightweight:

Setting        | Value
---------------|------------------------------
Replicas       | 1 (no HA needed for testing)
CPU Request    | 100m
Memory Request | 128Mi
Storage        | None (stateless by design)

The AKS cluster runs on a single B2s node (2 vCPU, 4GB RAM). With proper resource limits, I can run 10+ PR environments simultaneously without issues.
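
The arithmetic behind that number is simple: the node fits as many environments as the scarcer of CPU and memory allows. The sketch below uses a hypothetical helper, and the allocatable figures passed to it are inputs you would read from kubectl describe node, not values measured on my cluster.

```shell
# Rough upper bound on PR environments per node: whichever of CPU or memory
# runs out first. Arguments: allocatable CPU (millicores), allocatable memory
# (Mi), CPU request per environment (m), memory request per environment (Mi).
max_pr_envs() {
  mp_by_cpu=$(( $1 / $3 ))
  mp_by_mem=$(( $2 / $4 ))
  if [ "$mp_by_cpu" -lt "$mp_by_mem" ]; then
    echo "$mp_by_cpu"
  else
    echo "$mp_by_mem"
  fi
}
```

With the raw node size, max_pr_envs 2000 4096 100 128 gives a ceiling of 20 (CPU-bound). The real allocatable capacity is lower once system reservations and kube-system pods are subtracted, which is why 10+ is the realistic figure.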


What I Learned

  1. Wildcard certs scale with no extra work. One certificate covers every PR hostname under *.pr.chrishouse.io, so new environments never wait on issuance or run into Let's Encrypt rate limits.

  2. DNS-01 > HTTP-01 for automation. No need to expose port 80 or deal with ingress routing during cert validation.

  3. Namespace-per-PR is the right abstraction. Kubernetes namespaces keep each PR's resources, secrets, and names separate, and deleting the namespace cascades to everything in it.

  4. Copy secrets, don't recreate them. The jq metadata stripping pattern is simple but powerful.

  5. Post URLs to PRs. Discoverability matters. If developers can't find the environment, they won't use it.


The Result

Every PR now gets:

  • A live, isolated environment in ~2 minutes
  • HTTPS with valid certificates
  • Automatic cleanup when merged/closed
  • Zero manual intervention required

Reviewers can test changes in production-like conditions. QA can reproduce bugs on the exact commit. No more "works on my machine."
