December 23, 2025 · 15 min read
In Part 1, we built the foundation: a shared development cluster, namespace isolation, and a Backstage template that deploys a running service in under 5 minutes.
Today we add something developers actually love: preview environments.
If you've ever waited for a staging environment to free up so you could test your changes, or merged a PR only to find it broke something obvious that would have been caught with a quick manual test, you understand the problem. Shared staging environments create bottlenecks. Developers queue up behind each other, or worse, deploy over each other's changes and then spend time debugging phantom issues.
Preview environments solve this by giving every pull request its own isolated deployment with a unique URL. Code reviewers can click a link in the PR and see the actual running application. Not screenshots, not local recordings, the real thing. QA can test changes before they hit the main branch. Product managers can review features without asking developers to deploy something special for them.
The concept is simple: PR opens, environment spins up, PR closes, environment disappears. The implementation? That's where it gets interesting.
Here's what we built, and what broke along the way.
What We're Building
When a developer opens a pull request:
- GitHub Actions builds the PR branch into a container image
- A GitOps workflow creates an ephemeral namespace
- ArgoCD deploys the preview environment
- A unique URL is generated: pricing-api-pr-1-red.chrishouse.io
- The PR gets a comment with the preview link
- When the PR closes, everything is cleaned up automatically
The hard part isn't the workflow. It's the routing.
The Architecture Challenge
Here's where our design decisions from Part 1 created an interesting puzzle. We have a hub-spoke cluster topology:
- Hub Cluster: Handles all external ingress, runs ArgoCD, Crossplane, Backstage
- Dev Spoke Cluster: Runs application workloads, has no external ingress
This separation is intentional. The hub cluster is the control plane: it manages infrastructure, handles GitOps, and serves as the single entry point from the internet. The spoke cluster runs actual workloads, isolated from the management plane. It's a common pattern for enterprise Kubernetes deployments where you want to keep your cattle separate from your pets.
But preview environments run on the dev spoke. And DNS points to the hub.
Traffic flow needs to be:
Internet → Hub Istio Gateway → ??? → Dev Spoke → Preview Pod
That middle part is the question mark we need to solve. How do you route traffic from one cluster to another when they're on different networks? The clusters can talk to each other through VNet peering, but Kubernetes services don't automatically span clusters. The hub's Istio gateway has no native way to forward traffic to a service running in a completely different cluster.
This is a solved problem in the Kubernetes ecosystem. There are several approaches, but each comes with tradeoffs. Let's walk through what we tried.
Phase 1: The Preview Workflow
Before tackling the routing problem, let's set up the workflow that will trigger everything. This part is relatively straightforward: a GitHub Actions workflow that fires on pull request events.
The workflow lives in the Backstage template skeleton, which means every service created through the golden path automatically gets preview environment support. Developers don't have to configure anything; it just works.
# .github/workflows/preview.yml
name: Preview Environment
on:
pull_request:
types: [opened, synchronize, reopened, closed]
env:
REGISTRY: ghcr.io
SERVICE_NAME: pricing-api
TEAM_NAME: red
GITOPS_REPO: crh225/ARMServicePortal
jobs:
deploy-preview:
if: github.event.action != 'closed'
runs-on: ubuntu-latest
permissions:
contents: read
packages: write
pull-requests: write
steps:
- uses: actions/checkout@v3
- name: Set environment variables
id: vars
run: |
echo "pr_number=${{ github.event.pull_request.number }}" >> $GITHUB_OUTPUT
echo "namespace=${TEAM_NAME}-dev-pr-${{ github.event.pull_request.number }}" >> $GITHUB_OUTPUT
echo "hostname=${SERVICE_NAME}-pr-${{ github.event.pull_request.number }}-${TEAM_NAME}.chrishouse.io" >> $GITHUB_OUTPUT
echo "image_tag=pr-${{ github.event.pull_request.number }}" >> $GITHUB_OUTPUT
- name: Build and push preview image
uses: docker/build-push-action@v4
with:
push: true
tags: ghcr.io/${{ github.repository }}:pr-${{ github.event.pull_request.number }}
- name: Create GitOps manifests
run: |
# Creates NamespaceClaim and ArgoCD Application
# Commits to ARMServicePortal repo
# ArgoCD discovers and deploys
The workflow creates two files in the platform repository:
NamespaceClaim - Creates the isolated namespace:
apiVersion: platform.chrishouse.io/v1alpha1
kind: NamespaceClaim
metadata:
name: red-dev-pr-1
labels:
ephemeral: "true"
pr-number: "1"
spec:
parameters:
targetCluster: shared-dev-cluster
namespaceName: red-dev-pr-1
enableResourceQuota: true
ArgoCD Application - Deploys the preview:
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: pricing-api-pr-1
namespace: argocd
labels:
ephemeral: "true"
spec:
source:
repoURL: https://github.com/crh225/pricing-api
targetRevision: feature-branch
path: helm
helm:
parameters:
- name: image.tag
value: pr-1
- name: istio.preview.enabled
value: "true"
- name: istio.preview.hostname
value: pricing-api-pr-1-red.chrishouse.io
destination:
name: shared-dev-cluster
namespace: red-dev-pr-1
The workflow uses a personal access token (GITOPS_TOKEN) to commit to the platform repository. This feels like a hack but is actually the standard GitOps pattern: your application repo triggers changes in your infrastructure repo, and your GitOps tool (ArgoCD in our case) picks up those changes and applies them.
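For reference, here's roughly what that condensed "Create GitOps manifests" step expands to — a sketch that reuses the paths from the cleanup job later in this post; the real workflow's templating details may differ:

- name: Create GitOps manifests
  env:
    GITOPS_TOKEN: ${{ secrets.GITOPS_TOKEN }}
  run: |
    git clone "https://x-access-token:${GITOPS_TOKEN}@github.com/${GITOPS_REPO}.git" gitops
    cd gitops
    mkdir -p "infra/quickstart-services/${SERVICE_NAME}/preview-pr-${{ steps.vars.outputs.pr_number }}"
    # Render the NamespaceClaim and ArgoCD Application manifests (shown below) into that
    # directory, substituting the namespace, hostname, and image tag from the vars step.
    git config user.name "github-actions"
    git config user.email "actions@github.com"
    git add .
    git commit -m "Add preview environment for ${SERVICE_NAME} PR #${{ steps.vars.outputs.pr_number }}"
    git push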
This part worked immediately. Within a couple minutes of opening a PR, pods were running and the service was created. The Crossplane NamespaceClaim created the isolated namespace, ArgoCD deployed the application, and we had a working preview environment.
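A quick way to see that for yourself (the context names are placeholders for the two clusters' kubeconfig contexts):

# On the dev spoke: the namespace and preview pods created for the PR
kubectl --context shared-dev get pods -n red-dev-pr-1

# On the hub, where ArgoCD runs: the ephemeral Application
kubectl --context hub get application pricing-api-pr-1 -n argocd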
Except nobody could reach it. Now comes routing.
Phase 2: The Routing Problem
This is the part that took the longest to solve. Not because the concepts are hard, but because cloud networking has a way of surprising you with limitations that seem arbitrary until you understand the underlying infrastructure.
Attempt 1: Internal LoadBalancer
The obvious first attempt: put an Istio east-west gateway on the dev spoke with an internal (private) LoadBalancer. The hub and dev spoke VNets are peered, so the hub should be able to reach an internal IP in the dev spoke's VNet.
service:
type: LoadBalancer
annotations:
service.beta.kubernetes.io/azure-load-balancer-internal: "true"
Hub cluster gets the internal IP. ServiceEntry points to it. VirtualService routes preview traffic.
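For completeness, the hub-side ServiceEntry in this attempt pointed straight at that internal IP — the same shape as the working version shown later, just a different endpoint:

apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: shared-dev-cluster
  namespace: istio-ingress
spec:
  hosts:
    - shared-dev.internal
  location: MESH_EXTERNAL
  ports:
    - number: 80
      name: http
      protocol: HTTP
  resolution: STATIC
  endpoints:
    - address: "10.1.0.158"   # dev spoke's internal LoadBalancer IP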
Result:
$ curl http://10.1.0.158/health
# ... hangs ...
# 100% packet loss
Root cause: Azure internal LoadBalancers are not accessible across VNet peering by default. The hub cluster (10.0.0.0/16) cannot reach the dev spoke's internal LoadBalancer (10.1.0.158 in 10.1.0.0/16).
This was frustrating because everything looked correct. The VNet peering showed a "Connected" status in the Azure portal. NSG rules allowed VirtualNetwork traffic in both directions. Route tables looked fine. I spent a good hour checking every setting.
The issue is that Azure internal LoadBalancers use a different networking path than regular VNet traffic. They're implemented with a load balancer frontend IP that lives in a special Azure networking plane, and that plane doesn't traverse VNet peering without additional configuration. The VM-to-VM traffic works fine over peering; the traffic to the LoadBalancer frontend IP doesn't.
Azure networking strikes again.
Attempt 2: Azure Private Link Service
The Azure-recommended solution for this exact problem is Private Link. You create a Private Link Service that fronts the internal LoadBalancer, then create a Private Endpoint in the hub VNet that connects to that service. Traffic flows through Azure's backbone, never touching the public internet, and you get a private IP in the hub VNet that routes to the dev spoke's LoadBalancer.
Pros: Proper Azure-native cross-VNet LoadBalancer access. Fully private. Works reliably.
Cons: Adds complexity (two more Azure resources to manage), cost (~$7/month for the private endpoint), and another moving part that can break. For production environments handling sensitive traffic, this is the right answer. For dev preview environments where I'm trying to minimize costs, it felt like overkill.
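For the record, AKS can create the Private Link Service directly from Service annotations, so the Kubernetes side would look something like this (the azure-pls-* annotations come from the Azure cloud provider; the name and subnet are placeholders). The Private Endpoint in the hub VNet would still have to be created separately:

service:
  type: LoadBalancer
  annotations:
    service.beta.kubernetes.io/azure-load-balancer-internal: "true"
    service.beta.kubernetes.io/azure-pls-create: "true"
    service.beta.kubernetes.io/azure-pls-name: "pls-eastwest"                   # placeholder
    service.beta.kubernetes.io/azure-pls-ip-configuration-subnet: "snet-pls"    # placeholder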
Attempt 3: NodePort with Hardcoded IPs
I considered bypassing the LoadBalancer entirely. Kubernetes NodePort services expose a port on every node's IP address. Since VNet peering does work for node-to-node traffic, I could add all the dev spoke's node IPs to the ServiceEntry.
endpoints:
- address: "10.1.0.4" # node1
- address: "10.1.0.5" # node2
- address: "10.1.0.6" # node3Pros: Works with VNet peering. No additional Azure services needed.
Cons: Hardcoded IPs. The moment the cluster scales up, scales down, or nodes get replaced (which happens regularly with AKS upgrades), this breaks. I'd have to build automation to keep the ServiceEntry in sync with the node pool, and that felt like fighting Kubernetes rather than working with it. Not acceptable for anything beyond a quick test.
Attempt 4: Istio Multi-Cluster with Remote Secrets
Istio was designed for exactly this scenario. Its multi-cluster support lets a service mesh span multiple Kubernetes clusters, with traffic routed seamlessly between them. It's how large organizations run Istio across regions, clouds, and network boundaries.
This is what worked.
Phase 3: Istio Multi-Cluster
The Concept
Istio multi-cluster is one of those features that sounds complex but solves a real problem elegantly. At its core, it allows Istio's control plane (istiod) to discover and route to services running in other clusters, as if they were local services.
The key insight is that cross-cluster communication doesn't have to be complicated if your service mesh understands the topology. Instead of manually configuring routing rules and endpoints, you tell Istio "here's another cluster you should know about" and it handles the rest.
The key components:
- Remote Secrets: Kubeconfig credentials that allow istiod in one cluster to query the Kubernetes API of another cluster. Once istiod can list services and endpoints in the remote cluster, it can route traffic there.
- East-West Gateway: A dedicated ingress point for cross-cluster traffic. Unlike the north-south gateway that handles external traffic, the east-west gateway handles internal mesh traffic between clusters.
- Network Topology Labels: Labels on the istio-system namespace that tell Istio which network each cluster belongs to. This helps Istio understand when traffic needs to cross a network boundary.
Implementation
Step 1: Label namespaces with network topology
# Hub cluster
kubectl label namespace aks-istio-system \
topology.istio.io/network=hub-network
# Dev spoke
kubectl label namespace istio-system \
topology.istio.io/network=shared-dev-network
Step 2: Create remote secrets
Remote secrets allow istiod in one cluster to discover services running in another cluster. The Istio documentation covers this well. Use istioctl create-remote-secret to generate the secret for each cluster, then apply it to the other cluster's Istio namespace.
See the Istio multi-cluster installation guide for the complete setup process.
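In practice it boils down to two commands run in each direction; a sketch, assuming kubeconfig contexts named hub and shared-dev (your context names will differ, and with the AKS Istio add-on the secret may need to land in aks-istio-system rather than the default istio-system):

# Give the hub's istiod credentials to discover services on the dev spoke
istioctl create-remote-secret --context=shared-dev --name=shared-dev-cluster | \
  kubectl apply -f - --context=hub

# And the reverse, so the spoke's istiod can see the hub
istioctl create-remote-secret --context=hub --name=hub-cluster | \
  kubectl apply -f - --context=shared-dev

# Sanity check: list the remote clusters istiod knows about
istioctl remote-clusters --context=hub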
Step 3: Change east-west gateway to public LoadBalancer
Here's where we make the pragmatic choice. The internal LoadBalancer didn't work due to Azure's networking model, and Private Link adds cost and complexity. So we use a public LoadBalancer instead.
# istio-eastwest-gateway-argocd-app.yaml
service:
type: LoadBalancer
annotations:
# Removed: azure-load-balancer-internal annotation
service.beta.kubernetes.io/azure-load-balancer-health-probe-request-path: /healthz/ready
Result: East-west gateway gets public IP 52.255.217.180.
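You can confirm the assigned address on the dev spoke (the Service name and context here are assumptions; adjust for your install):

kubectl --context shared-dev get svc istio-eastwestgateway -n istio-system
# EXTERNAL-IP should show the public address — 52.255.217.180 in our case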
This means the east-west gateway is technically internet-accessible. I'll discuss the security implications later, but the short version is: for a dev environment with no sensitive data, it's an acceptable tradeoff.
Why public instead of private?
This is a cost/complexity tradeoff. The "proper" Azure solution would be:
- Create an Azure Private Link Service exposing the east-west gateway
- Create a Private Endpoint in the hub VNet
- Route through the private endpoint
That adds ~$7/month for the private endpoint, plus complexity. For a personal lab environment running on my credit card, the public LoadBalancer at ~$3.65/month is acceptable.
Security implications:
The east-west gateway is now internet-accessible on port 80. However:
- It only routes traffic to services with explicit VirtualService configurations
- Services require the correct Host header to match
- No sensitive data is exposed without intentional configuration
- This is a dev environment for preview URLs, not production traffic
For production environments, I'd recommend:
- Azure Private Link for fully private cross-cluster routing
- Cilium ClusterMesh when Azure CNI adds support (blocked by GitHub issue #5194)
- IP whitelisting on the east-west gateway if public access is required (see the sketch below)
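On that last point, the simplest allow-list doesn't need Istio at all: the Kubernetes Service behind the east-west gateway supports loadBalancerSourceRanges, which Azure enforces as NSG rules on the LoadBalancer. A sketch, with a placeholder CIDR standing in for the hub cluster's egress IP:

service:
  type: LoadBalancer
  loadBalancerSourceRanges:
    - 203.0.113.10/32   # placeholder: the hub cluster's outbound public IP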
Step 4: Configure hub routing
ServiceEntry tells the hub where to find the dev spoke:
apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
name: shared-dev-cluster
namespace: istio-ingress
spec:
hosts:
- shared-dev.internal
location: MESH_EXTERNAL
ports:
- number: 80
name: http
protocol: HTTP
resolution: STATIC
endpoints:
- address: "52.255.217.180" # East-west gateway public IPVirtualService routes preview traffic to the ServiceEntry:
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
name: preview-envs-vs
namespace: istio-ingress
spec:
hosts:
- "*.chrishouse.io"
gateways:
- main-gateway
http:
- match:
- headers:
":authority":
regex: ".*-pr-[0-9]+-[a-z0-9]+\\.chrishouse\\.io"
route:
- destination:
host: shared-dev.internal
port:
number: 80
The regex matches preview URLs like pricing-api-pr-1-red.chrishouse.io.
Step 5: Update Helm template for preview VirtualServices
The application's VirtualService needs to reference both the hub gateway and the cross-network gateway:
# helm/templates/virtualservice.yaml
spec:
hosts:
- {{ .Values.istio.preview.hostname }}
gateways:
- {{ .Values.istio.gateway }}
{{- if .Values.istio.preview.enabled }}
- istio-system/cross-network-gateway
{{- end }}
Verification
The moment of truth. After all this configuration, does it actually work?
$ curl https://pricing-api-pr-1-red.chrishouse.io/health
{"status":"healthy","service":"pricing-api","timestamp":"2025-12-23T15:13:18.004Z"}That healthy response took way too long to see, but there it is. Traffic is flowing from the internet, through the hub cluster, across the cluster boundary, and into the preview environment running on the dev spoke.
Here's the full traffic flow:
- DNS resolution: Browser looks up pricing-api-pr-1-red.chrishouse.io, gets 48.194.61.98 (hub Istio ingress)
- TLS termination: Hub's Istio gateway terminates TLS using the *.chrishouse.io wildcard certificate
- Pattern matching: Hub's VirtualService sees the hostname matches the preview regex pattern
- Cross-cluster routing: Request is forwarded to ServiceEntry shared-dev.internal
- External routing: ServiceEntry resolves to 52.255.217.180 (dev spoke's east-west gateway)
- Service routing: East-west gateway routes to the pricing-api service based on the Host header
- Response: The whole chain reverses, and the response arrives at the browser
Phase 4: The TLS Certificate Issue
Just when I thought we were done, there was one more surprise waiting.
The Problem
Testing the preview URL with the original naming scheme:
$ curl https://pricing-api-pr-1.red.chrishouse.io/health
curl: (60) SSL certificate problem: unable to get local issuer certificate
The routing works (we verified that over HTTP), so why is HTTPS failing with a certificate error?
The original URL pattern was {service}-pr-{number}.{team}.chrishouse.io, something like pricing-api-pr-1.red.chrishouse.io. That's a two-level subdomain: pricing-api-pr-1 under red under chrishouse.io.
The existing wildcard certificate is *.chrishouse.io. And here's the thing about wildcard certificates that trips people up: they only match one level of subdomain, not arbitrary depth.
- pricing-api.chrishouse.io → matches *.chrishouse.io ✓
- anything.chrishouse.io → matches *.chrishouse.io ✓
- pricing-api-pr-1.red.chrishouse.io → does NOT match *.chrishouse.io ✗
The wildcard only replaces the single * portion. It doesn't recursively match nested subdomains.
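A quick way to check exactly what a certificate covers is to read its SANs off the wire; here against our preview host (any endpoint serving the cert works):

openssl s_client -connect pricing-api-pr-1-red.chrishouse.io:443 \
  -servername pricing-api-pr-1-red.chrishouse.io </dev/null 2>/dev/null | \
  openssl x509 -noout -text | grep -A1 "Subject Alternative Name"
# Expect only DNS:*.chrishouse.io — one level deep, nothing nested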
Options Considered
Option A: Per-team wildcard certs (*.red.chrishouse.io)
I could create a wildcard certificate for each team's subdomain. *.red.chrishouse.io would match pricing-api-pr-1.red.chrishouse.io just fine.
Rejected. That means managing N certificates where N is the number of teams. Each new team requires provisioning a new certificate, configuring it in the gateway, and keeping track of renewals. Certificate management is already tedious; multiplying it doesn't help.
Option B: SAN certificate with all patterns
A single certificate with Subject Alternative Names (SANs) for each team pattern: *.chrishouse.io, *.red.chrishouse.io, *.blue.chrishouse.io, etc.
Rejected. Same problem: the certificate needs to be regenerated every time a new team is added. Plus there are limits on SAN entries, and it adds operational overhead.
Option C: Change URL pattern to single-level subdomain
Instead of {service}-pr-{number}.{team}.chrishouse.io, use {service}-pr-{number}-{team}.chrishouse.io. The team name becomes part of the single subdomain rather than its own level.
Accepted. It's the simplest solution and works with the existing wildcard certificate without any changes to certificate management.
The Fix
Changed URL pattern from:
{service}-pr-{number}.{team}.chrishouse.io (two-level)
To:
{service}-pr-{number}-{team}.chrishouse.io (single-level)
Updated files:
- backstage/templates/nodejs-quickstart/skeleton/.github/workflows/preview.yml
- backstage/templates/nodejs-quickstart/skeleton/helm/values.yaml
- infra/kubernetes/istio-hub/virtualservice-preview-envs.yaml
- pricing-api/.github/workflows/preview.yml
New regex pattern:
headers:
":authority":
regex: ".*-pr-[0-9]+-[a-z0-9]+\\.chrishouse\\.io"DNS Update
The final piece of the puzzle was DNS. I updated the wildcard DNS record *.chrishouse.io to point to the hub Istio ingress at 48.194.61.98. This means any subdomain that doesn't have a more specific A record will resolve to the hub gateway.
Importantly, existing services with explicit A records (like argohub.chrishouse.io and backstage.chrishouse.io pointing to nginx-ingress at 20.253.73.108) are unaffected. DNS resolution prefers more specific records over wildcards, so those services continue to work exactly as before.
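For what it's worth, if the zone is hosted in Azure DNS (an assumption here), the wildcard is a single A record; a sketch with a placeholder resource group:

az network dns record-set a add-record \
  --resource-group <dns-zone-rg> \
  --zone-name chrishouse.io \
  --record-set-name "*" \
  --ipv4-address 48.194.61.98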
Phase 5: Automatic Cleanup
Preview environments are only useful if they don't accumulate. Without cleanup, you'd end up with dozens of abandoned namespaces consuming cluster resources, each one a forgotten artifact of a PR from three months ago.
When a PR is merged or closed, the preview environment should be deleted automatically. This is the "ephemeral" part of ephemeral environments.
cleanup-preview:
if: github.event.action == 'closed'
runs-on: ubuntu-latest
steps:
- name: Checkout GitOps repository
uses: actions/checkout@v3
with:
repository: crh225/ARMServicePortal
token: ${{ secrets.GITOPS_TOKEN }}
path: gitops
- name: Remove preview environment manifests
working-directory: gitops
run: |
rm -rf infra/quickstart-services/${SERVICE_NAME}/preview-pr-${PR_NUM}
- name: Commit and push cleanup
  working-directory: gitops
  run: |
    git add .
    git commit -m "Cleanup preview environment for PR #${PR_NUM}"
    git push
The cleanup follows the same GitOps pattern as creation. The workflow deletes the manifests from Git, commits the change, and lets ArgoCD handle the rest. ArgoCD has prune: true in its sync policy, which means when it detects that a resource exists in the cluster but not in Git, it deletes it.
This is one of the elegant things about GitOps: cleanup is just another commit. Namespace, pods, services, VirtualService, all removed automatically when the ArgoCD Application manifest disappears from the repository. No custom cleanup scripts, no cron jobs scanning for orphaned resources, no manual intervention.
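For reference, the relevant fragment of the sync policy on the ArgoCD Application that watches this directory of the platform repo (a sketch; other sync options omitted):

spec:
  syncPolicy:
    automated:
      prune: true   # delete cluster resources whose manifests have been removed from Git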
The Final Architecture
Internet
│
▼
┌─────────────────────────────────────────┐
│ Hub Cluster (aks-mgmt-hub) │
│ │
│ ┌─────────────────────────────────────┐ │
│ │ Istio Gateway (48.194.61.98) │ │
│ │ - TLS termination (*.chrishouse.io) │ │
│ │ - VirtualService regex matching │ │
│ └──────────────┬──────────────────────┘ │
│ │ │
│ ┌──────────────▼──────────────────────┐ │
│ │ ServiceEntry (shared-dev.internal) │ │
│ │ → 52.255.217.180 │ │
│ └─────────────────────────────────────┘ │
└─────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────┐
│ Dev Spoke (aks-shared-dev) │
│ │
│ ┌─────────────────────────────────────┐ │
│ │ East-West Gateway (52.255.217.180) │ │
│ │ - Public LoadBalancer │ │
│ └──────────────┬──────────────────────┘ │
│ │ │
│ ┌──────────────▼──────────────────────┐ │
│ │ VirtualService (pricing-api) │ │
│ │ - Routes to service in namespace │ │
│ └──────────────┬──────────────────────┘ │
│ │ │
│ ┌──────────────▼──────────────────────┐ │
│ │ pricing-api Service │ │
│ │ Namespace: red-dev-pr-1 │ │
│ └─────────────────────────────────────┘ │
└─────────────────────────────────────────┘
What We Learned
Building this feature took longer than expected, but most of the time wasn't spent on the preview workflow itself (that was straightforward). The complexity was in the cross-cluster routing, and specifically in working around Azure's networking limitations.
What Worked
- GitOps-driven preview environments: The pattern of PR workflow → GitOps commit → ArgoCD deploy → cleanup commit is clean and reliable. Every state change is recorded in Git, which makes debugging and auditing trivial.
- Istio multi-cluster with remote secrets: Once configured, this just works. Istio handles service discovery across clusters without any per-service configuration.
- Single-level subdomain URLs: A simple URL pattern change avoided certificate complexity entirely.
- Automatic cleanup: GitOps makes cleanup as reliable as deployment. If the manifest isn't in Git, the resource doesn't exist.
What Didn't Work (Initially)
- Azure internal LoadBalancers: Not accessible across VNet peering without Private Link. This was a frustrating discovery because everything else about VNet peering works fine.
- Two-level subdomain URLs: Wildcard certs only match one level. This is well-documented behavior, but easy to forget when designing URL schemes.
Key Decisions
| Decision | Choice | Rationale |
|---|---|---|
| Cross-cluster routing | Istio multi-cluster | Industry standard, mTLS, service discovery |
| East-west gateway | Public LoadBalancer | Azure internal LBs not accessible cross-VNet; cheaper than Private Link |
| URL pattern | Single-level subdomain | Works with existing wildcard cert |
| Cleanup trigger | PR close event | Immediate, no TTL complexity |
Cost vs Security Tradeoff
This is a personal lab environment running on my Azure subscription (and credit card). The architecture choices reflect that:
| Option | Monthly Cost | Complexity | Security |
|---|---|---|---|
| Public LoadBalancer (chosen) | ~$3.65 | Low | Dev-acceptable |
| Azure Private Link | ~$10.65 | Medium | Production-ready |
| VPN/ExpressRoute | $50+ | High | Enterprise-grade |
For production workloads, we could use Azure Private Link. For a dev environment where the only exposed services are ephemeral preview deployments with no sensitive data, the public LoadBalancer is a reasonable tradeoff.
The east-west gateway only routes traffic to services with explicit VirtualService configurations. It's not an open proxy.
Operational Costs
- East-west gateway public IP: ~$3.65/month (Azure)
- Additional Istio overhead: Minimal (already running Istio on both clusters)
- Network egress: Hub → dev spoke traffic stays within Azure
Implementation Repository
Full implementation: github.com/crh225/ARMServicePortal
Key files:
- Preview workflow template: backstage/templates/nodejs-quickstart/skeleton/.github/workflows/preview.yml
- Hub VirtualService: infra/kubernetes/istio-hub/virtualservice-preview-envs.yaml
- ServiceEntry: infra/kubernetes/istio-hub/service-entry-shared-dev.yaml
- East-west gateway: infra/cluster-bootstrap-shared-dev/istio-eastwest-gateway-argocd-app.yaml
- VirtualService Helm template: backstage/templates/nodejs-quickstart/skeleton/helm/templates/virtualservice.yaml
Next in series: Part 3 will cover cost visibility, showing developers the real cost of their applications directly in Backstage with Azure Cost Management integration and resource tagging.
Enjoyed this post? Give it a clap!