December 20, 2025 · 4 min read
After running a single-cluster Kubernetes setup for a while, I recently migrated to a hub-spoke architecture. This post covers why I made the change, how the traffic routing works, and the key decisions around where to place the Istio ingress gateways.
Why Hub-Spoke?
The single-cluster approach is simple but has limitations:
- Blast radius - A misconfiguration or resource exhaustion affects everything
- Scaling constraints - Platform services compete with applications for resources
- Upgrade risk - Cluster upgrades put everything at risk simultaneously
The hub-spoke pattern separates concerns:
- Hub cluster - Runs platform/control plane services (ArgoCD, Crossplane, Backstage)
- Spoke cluster(s) - Runs application workloads (APIs, frontends, microservices)
The Architecture
Here's what my setup looks like:
```
                         +---------------------------+
                         |    Cloudflare (DNS/CDN)   |
                         +-------------+-------------+
                                       |
              +------------------------+------------------------+
              |                                                 |
              v                                                 v
+---------------------------+                     +---------------------------+
|        Hub Traffic        |                     |       Spoke Traffic       |
|  backstage.chrishouse.io  |                     |   portal.chrishouse.io    |
|   argocd.chrishouse.io    |                     | portal-api.chrishouse.io  |
+---------------------------+                     |    blog.chrishouse.io     |
              |                                   +---------------------------+
              v                                                 |
+--------------------------------------+                        |
|   AKS Hub Cluster (aks-mgmt-hub)     |                        |
|  +--------------------------------+  |                        |
|  |     Istio Ingress Gateway      |  |                        |
|  |      (Hub Services Only)       |  |                        |
|  +--------------------------------+  |                        |
|                                      |                        |
|  ArgoCD | Crossplane | Cert-Manager  |                        |
|  Backstage | Argo Rollouts           |                        |
+--------------------------------------+                        |
              |                                                 |
              | Manages via ArgoCD                              |
              v                                                 v
+------------------------------------------------------------------------+
|                   AKS Spoke Cluster (aks-app-spoke)                     |
|  +------------------------------------------------------------------+  |
|  |                      Istio Ingress Gateway                        |  |
|  |          (Application Traffic - Direct from Cloudflare)           |  |
|  +------------------------------------------------------------------+  |
|                                                                        |
|      portal-api (Node.js)  |  blog (Gatsby)  |  frontend (React)       |
+------------------------------------------------------------------------+
```
Key Design Decision: Decentralized Ingress
The critical decision was where to place the Istio ingress gateways. There are two patterns:
Centralized (Hub Ingress)
Internet -> Hub Gateway -> Routes to Spoke clusters

Decentralized (Spoke Ingress) - What I chose

Internet -> Each cluster's own Gateway

I went with decentralized ingress for one main reason: fault isolation. If the hub cluster goes down (maintenance, failed upgrade, resource issues), my applications remain accessible. The hub is a control plane, not a data plane.
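In practice, decentralized ingress just means each cluster defines its own Gateway for the hostnames it owns and gets its own public IP. As a counterpart to the spoke Gateway shown later, the hub's gateway would look roughly like this (a sketch - the resource name and selector on the hub are assumptions on my part):

```yaml
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: hub-external-gateway         # assumed name on the hub cluster
  namespace: istio-ingress
spec:
  selector:
    istio: aks-istio-ingressgateway-external
  servers:
    - port:
        number: 443
        name: https
        protocol: HTTPS
      hosts:
        - "backstage.chrishouse.io"  # hub-only hostnames from the diagram above
        - "argocd.chrishouse.io"
      tls:
        mode: SIMPLE
        credentialName: wildcard-tls # the hub manages its own copy of the wildcard cert
```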
Cluster Breakdown
Hub Cluster Services
| Service | Purpose |
|---|---|
| ArgoCD | GitOps controller - manages deployments to all clusters |
| Crossplane | Infrastructure as Code - provisions cloud resources |
| Cert-Manager | TLS certificate automation via Let's Encrypt |
| Backstage | Developer portal and service catalog |
| Argo Rollouts | Progressive delivery controller |
Spoke Cluster Platform Services
| Service | Purpose |
|---|---|
| Cert-Manager | Independent TLS certificates for spoke ingress |
| Istio Service Mesh | Traffic management and mTLS |
Spoke Cluster Workloads
| Application | Description |
|---|---|
| portal-api | Node.js backend API |
| portal-frontend | React SPA |
| blog | Static site (Gatsby) |
Istio Configuration
Each cluster runs its own Istio service mesh with an ingress gateway. The spoke cluster handles HTTPS termination with its own TLS certificates:
```yaml
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: external-gateway
  namespace: istio-ingress
spec:
  selector:
    istio: aks-istio-ingressgateway-external
  servers:
    - port:
        number: 80
        name: http
        protocol: HTTP
      hosts:
        - "portal.chrishouse.io"
        - "portal-api.chrishouse.io"
        - "blog.chrishouse.io"
      tls:
        httpsRedirect: true
    - port:
        number: 443
        name: https
        protocol: HTTPS
      hosts:
        - "portal.chrishouse.io"
        - "portal-api.chrishouse.io"
        - "blog.chrishouse.io"
      tls:
        mode: SIMPLE
        credentialName: wildcard-tls
```
The wildcard-tls secret is created by cert-manager using a wildcard certificate for *.chrishouse.io. This means the spoke cluster is fully independent for TLS - it doesn't rely on the hub for certificate management.
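For reference, the wildcard certificate can be requested with a cert-manager Certificate resource along these lines (a sketch - the ClusterIssuer name and the DNS-01 solver behind it are assumptions, not lifted from my repo):

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: wildcard-chrishouse-io
  namespace: istio-ingress       # the namespace the ingress gateway reads TLS secrets from
spec:
  secretName: wildcard-tls       # referenced by credentialName in the Gateway above
  dnsNames:
    - "*.chrishouse.io"
  issuerRef:
    name: letsencrypt-dns01      # assumed ClusterIssuer using DNS-01 (wildcards require it)
    kind: ClusterIssuer
```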
Each application gets a VirtualService that routes traffic:
```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: portal-api-vs
  namespace: istio-ingress
spec:
  hosts:
    - "portal-api.chrishouse.io"
  gateways:
    - external-gateway
  http:
    - route:
        - destination:
            host: portal-api.portal-api.svc.cluster.local
            port:
              number: 80
```
Removing Redundant Ingress Resources
One issue I ran into: my Helm charts still had nginx Ingress resources defined, even though Istio handles all traffic routing. This caused ArgoCD to show applications as "Progressing" indefinitely.
Why? ArgoCD's health check for Ingress resources waits for a load balancer IP to be assigned. Since nginx-ingress wasn't assigning IPs (Istio handles traffic instead), the Ingress stayed in a pending state forever.
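(As an aside, ArgoCD's health assessment for a resource kind can be overridden in the argocd-cm ConfigMap with a small Lua script - a sketch of what that would look like is below - but since my Ingress objects were genuinely redundant, removing them was the cleaner fix.)

```yaml
# argocd-cm ConfigMap (sketch) - force Ingress resources to report Healthy
# Only relevant if you need to keep unused Ingress objects around; I removed mine instead.
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
data:
  resource.customizations.health.networking.k8s.io_Ingress: |
    hs = {}
    hs.status = "Healthy"
    hs.message = "Skipping load balancer IP check"
    return hs
```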
The fix was simple - disable the Ingress in Helm values:
```yaml
# values.yaml
ingress:
  enabled: false # Istio VirtualService handles routing
```
And remove the Ingress from Kustomize resources:
```yaml
# kustomization.yaml
resources:
  - namespace.yaml
  - serviceaccount.yaml
  - configmap.yaml
  - deployment.yaml
  - service.yaml
  # - ingress.yaml # Removed - using Istio
```
Traffic Flow Explained
- DNS: Cloudflare manages DNS for *.chrishouse.io, pointing each hostname at the external IP of the cluster that serves it
- TLS: Terminated at the Istio ingress gateway using wildcard certificates issued by cert-manager (each cluster manages its own certs)
- Service Mesh: Istio routes to the correct service based on VirtualService rules
- mTLS: All pod-to-pod traffic within the mesh is encrypted (see the policy sketch after this list)
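The mesh-wide mTLS in that last point can be enforced with an Istio PeerAuthentication policy. A minimal sketch, assuming the AKS Istio add-on's root namespace of aks-istio-system (placing the policy there makes it mesh-wide):

```yaml
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: aks-istio-system   # assumed mesh root namespace for the AKS Istio add-on
spec:
  mtls:
    mode: STRICT                # reject any plaintext pod-to-pod traffic inside the mesh
```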
For hub services (backstage.chrishouse.io):
Cloudflare -> Hub Istio Gateway -> Backstage Pod

For spoke services (portal-api.chrishouse.io):

Cloudflare -> Spoke Istio Gateway -> Portal API Pod

The hub is never in the path for spoke traffic.
ArgoCD Multi-Cluster Management
ArgoCD on the hub manages applications across both clusters. Each Application specifies its destination:
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: portal-api
  namespace: argocd
spec:
  destination:
    name: aks-app-spoke # Target cluster
    namespace: portal-api
  source:
    repoURL: https://github.com/crh225/ARMServicePortal.git
    path: infra/kubernetes/portal-api
    targetRevision: main
```
Each cluster is registered as a destination in ArgoCD, allowing centralized management while keeping workloads distributed.
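Cluster registration itself can also be handled declaratively: ArgoCD treats any Secret in its namespace labeled as a cluster secret as a registered destination. A sketch with placeholder server URL and credentials (the real values come from `argocd cluster add` or your own bootstrap process):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: cluster-aks-app-spoke
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: cluster   # marks this Secret as an ArgoCD cluster registration
type: Opaque
stringData:
  name: aks-app-spoke                         # matches destination.name in the Application above
  server: https://<spoke-api-server>:443      # placeholder - the spoke's API server URL
  config: |
    {
      "bearerToken": "<service-account-token>",
      "tlsClientConfig": {
        "insecure": false,
        "caData": "<base64-encoded-ca-cert>"
      }
    }
```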
Pros and Cons
Advantages
- Fault isolation - Hub issues don't affect running applications
- Independent scaling - Clusters scale based on their workload type
- Cleaner upgrades - Upgrade hub without touching production apps
- Security boundaries - Platform credentials isolated from app workloads
Tradeoffs
- Complexity - Two clusters to manage instead of one
- Cost - Additional control plane costs (though node pools can be sized appropriately)
- Networking - Cross-cluster communication requires additional configuration
- Observability - Metrics and logs spread across clusters
When to Use This Pattern
Good fit:
- Multiple teams deploying to Kubernetes
- High availability requirements for applications
- Frequent platform upgrades
- Compliance requirements for separation of concerns
Overkill for:
- Single small application
- Development/testing environments
- Cost-sensitive projects with low traffic
The hub-spoke pattern provides a solid foundation for scaling the platform as needs grow.
Cost was another reason I chose this pattern: because the hub is purely a control plane, I can shut it down when I'm not actively developing, while my apps and blog stay available.
A key improvement was making the spoke cluster fully independent with its own cert-manager and TLS certificates. This means the spoke can serve HTTPS traffic even when the hub is completely offline - true fault isolation for production workloads.