
Chapter 15: Mesh, Multi-Cluster, and the Edge

L7 authorization policies (Chapter 13) need a caller identity to evaluate. Where does that identity come from? How does it work across clusters? And what happens at the edge, where the caller has no mesh identity at all?

The decisions — cryptographic identity as the mesh's primary purpose, pipeline-transparent infrastructure, cross-cluster trust — are universal. The reference implementation chose Istio ambient mesh; other meshes (Linkerd, Cilium service mesh) make different trade-offs but face the same questions:

  • Why does the platform need a mesh? (Not for observability or traffic management — for cryptographic identity.)
  • Sidecar or ambient? (Does the mesh modify the pod spec the pipeline produced, or is it invisible infrastructure?)
  • How does identity work across clusters? (Shared root CA, or independent CAs with trust bundle federation?)
  • Where does the platform’s trust model end? (At the edge — where mesh identity meets browser HTTPS.)

A service mesh is not an observability tool. For a platform, the mesh serves one irreplaceable function: cryptographic identity for service-to-service communication.

Every workload gets a SPIFFE ID: spiffe://cluster.local/ns/commerce/sa/checkout. This identity is an X.509 certificate, issued by the mesh’s CA (Istio’s istiod), short-lived (24 hours), and automatically rotated. When two services communicate, they perform mutual TLS — each side presents its certificate. The identity is unforgeable (absent CA compromise) and independent of network topology (survives pod rescheduling, IP changes).
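In Istio, the default-mTLS posture can be pinned explicitly with a mesh-wide PeerAuthentication. A minimal sketch (the `default`/`istio-system` naming follows Istio's documented convention for mesh-wide policy):

```yaml
# Mesh-wide PeerAuthentication: require mTLS for all
# workload-to-workload traffic. Placing it in the mesh root
# namespace (istio-system by default) makes it apply mesh-wide.
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system
spec:
  mtls:
    mode: STRICT
```

With STRICT mode, a workload without a valid mesh certificate cannot talk to anything — plaintext is refused, not downgraded.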

Without the mesh, the AuthorizationPolicies from Chapter 13 have nothing to evaluate. There’s no verified caller identity, no mTLS, no SPIFFE ID. The bilateral agreement model requires knowing who is calling — and the mesh is what provides that knowledge.
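A sketch of what that evaluation looks like in practice — an Istio AuthorizationPolicy keyed to the caller's SPIFFE identity (service names, methods, and paths are illustrative):

```yaml
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: payments-allow-checkout
  namespace: commerce
spec:
  selector:
    matchLabels:
      app: payments
  action: ALLOW
  rules:
    - from:
        - source:
            # The principal is the caller's SPIFFE ID without the
            # spiffe:// scheme — verified via mTLS, not via a header.
            principals: ["cluster.local/ns/commerce/sa/checkout"]
      to:
        - operation:
            methods: ["POST"]
            paths: ["/v1/charges"]
```

The `principals` field is the bilateral agreement made enforceable: it matches only a caller that presented checkout's certificate during the mTLS handshake.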

Everything else the mesh offers — traffic shifting, retries, distributed tracing — is secondary. The platform needs the mesh for identity. Identity enables enforcement. Enforcement enables default-deny.

This decision affects the derivation pipeline directly.

Sidecar injection (traditional Istio, Linkerd). Every pod gets an Envoy sidecar. All traffic enters and exits through the sidecar. The sidecar handles mTLS, policy, and telemetry.

The problem for a platform: the derivation pipeline produces clean pod specs. Sidecar injection modifies them after derivation — adding a container, volumes, and init containers that the platform didn’t produce. The config hash changes. Upgrades require pod restarts. The platform’s derived output and the actual running pod diverge.

Ambient mesh (Istio ambient). No sidecar injection. L4 (mTLS, identity) is handled by ztunnel — a per-node DaemonSet. L7 (authorization, routing) is handled by waypoint proxies — per-namespace or per-service. Traffic is redirected to ztunnel transparently at the node level (via the CNI's iptables or eBPF redirection, depending on configuration).

```mermaid
graph LR
    subgraph "Node 1"
        Pod1[Pod: checkout]
        ZT1[ztunnel<br/>L4 mTLS]
    end
    subgraph "commerce namespace"
        WP[Waypoint Proxy<br/>L7 AuthorizationPolicy]
    end
    subgraph "Node 2"
        Pod2[Pod: payments]
        ZT2[ztunnel<br/>L4 mTLS]
    end
    Pod1 --> ZT1
    ZT1 -->|mTLS| WP
    WP -->|evaluates SPIFFE ID<br/>method + path| ZT2
    ZT2 --> Pod2
```

The advantage: the derivation pipeline produces clean pod specs, and they stay clean. The mesh is infrastructure — installed during bootstrapping (Chapter 5), invisible to the workload spec. Upgrades to ztunnel or waypoint proxies don’t restart application pods.

The reference implementation uses Istio ambient. The platform derives a Deployment, and the Deployment runs as derived — no injected sidecars, no modified pod specs, no config hash drift.
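Enrollment is a namespace label, not a pod-spec change — which is exactly why the derived output stays clean. A sketch using Istio ambient's documented conventions (the waypoint is itself a Gateway API resource):

```yaml
# Opting a namespace into ambient mode: a label on the namespace,
# nothing injected into pods.
apiVersion: v1
kind: Namespace
metadata:
  name: commerce
  labels:
    istio.io/dataplane-mode: ambient
---
# L7 enforcement requires a waypoint proxy, deployed declaratively
# as a Gateway with the istio-waypoint class.
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: waypoint
  namespace: commerce
  labels:
    istio.io/waypoint-for: service
spec:
  gatewayClassName: istio-waypoint
  listeners:
    - name: mesh
      port: 15008       # HBONE: Istio's mTLS tunnel protocol
      protocol: HBONE
```

Neither resource touches the Deployment the pipeline produced; both can be installed, upgraded, or removed without restarting application pods.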

The trade-off. Ambient is newer and less battle-tested. Ztunnel is per-node — if it fails, every pod on the node loses L4 mTLS. Sidecar mode is per-pod — one sidecar failure affects one pod. The platform accepts the per-node risk because clean pod specs and non-disruptive upgrades matter more for the derivation model.

Alternatives. Cilium’s mesh capabilities are maturing and may eventually replace Istio for L7. Linkerd is simpler but sidecar-only. Calico + no mesh (L4 only) works for platforms that don’t need request-level authorization. The principle — the mesh provides identity, identity enables enforcement — applies regardless of implementation.

15.3 Multi-Cluster: Identity Federation and Service Discovery


Within a single cluster, the mesh provides identity and policy enforcement. Across clusters, two problems emerge.

Each cluster has its own CA and trust domain. A SPIFFE ID from cluster-east isn’t trusted by cluster-west by default. The reference implementation handles trust through the cluster hierarchy: the parent’s CA is the root, child CAs are intermediates signed by the parent. Any SPIFFE ID is verifiable up to the shared root. After pivot (Chapter 6), the trust chain survives self-management.

The trade-off: a shared root means a compromised parent CA compromises every cluster’s identities. Independent CAs with trust bundle federation (each cluster distributes its CA certificate to every other cluster’s trust store) limit the blast radius but require O(N²) trust relationships.

A service in cluster A needs to find a service in cluster B. The reference implementation uses the gRPC stream (Chapter 7): SubtreeState carries service inventories, PeerRouteSync distributes routing information, and ServiceLookupRequest/Response handles synchronous lookups.

Checkout runs on cluster-east. Payments runs on cluster-west. Both connected to a shared parent.

  1. Checkout’s spec declares payments: outbound. Payments’ spec declares checkout: inbound.
  2. Both agents send SubtreeState to the parent. The parent matches the bilateral declarations across clusters.
  3. The parent sends PeerRouteSync to each cluster with the remote service’s address and SPIFFE identity.
  4. Each cluster’s mesh-member controller generates the Istio multi-cluster configuration — remote service entries, destination rules, and AuthorizationPolicies referencing the remote identity.
  5. Checkout sends a request. The mesh on cluster-east routes it directly to cluster-west using the Istio multi-cluster configuration the parent compiled. The parent is not on the data path — it compiled the configuration, but the traffic flows cluster-to-cluster through the mesh. The waypoint proxy on cluster-west verifies checkout’s SPIFFE identity against the AuthorizationPolicy. The request is authorized.

The parent’s role is compilation, not relay. It matches bilateral declarations across clusters and distributes the resulting Istio configuration. Once the configuration is in place, cross-cluster traffic flows directly between clusters through the mesh — the same way standard Istio multi-cluster works. The parent never sees application traffic.

Convergence delay. SubtreeState updates are piggybacked on heartbeats (~30 seconds). If cluster-east deploys a service at t=0 and cluster-west deploys its counterpart at t=15s, the parent may not see both declarations until the next heartbeat from each cluster. Worst case: each cluster's deployment lands just after a heartbeat, so the parent waits nearly a full heartbeat window for each declaration — two 30-second windows plus compilation and PeerRouteSync distribution, up to roughly 90 seconds after the first deployment. During this window, cross-cluster traffic is denied because the bilateral match hasn't been compiled. For services with tight cross-cluster dependencies, this convergence delay is operationally relevant. The alternative — pushing SubtreeState on every spec change — trades latency for stream bandwidth.

For each cross-cluster bilateral match, the parent instructs each cluster’s mesh-member controller to generate:

  • ServiceEntry on cluster-east pointing to payments’ endpoint on cluster-west
  • DestinationRule with mTLS settings for the remote service
  • AuthorizationPolicy on cluster-west permitting checkout’s SPIFFE identity

This is the same set of resources that standard Istio multi-cluster uses — the difference is that Lattice compiles them from bilateral declarations instead of requiring manual configuration.
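A sketch of the compiled resources on cluster-east (hostnames, addresses, and ports are illustrative; the exact endpoint wiring depends on the east-west gateway topology):

```yaml
# ServiceEntry on cluster-east: makes payments on cluster-west
# addressable inside the local mesh.
apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: payments-cluster-west
  namespace: commerce
spec:
  hosts:
    - payments.commerce.global   # illustrative cross-cluster hostname
  location: MESH_INTERNAL
  resolution: STATIC
  ports:
    - name: http
      number: 8080
      protocol: HTTP
  endpoints:
    - address: 203.0.113.10      # cluster-west east-west gateway (illustrative)
      ports:
        http: 15443              # Istio's conventional east-west mTLS port
---
# DestinationRule: require Istio mTLS for the remote service.
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: payments-cluster-west
  namespace: commerce
spec:
  host: payments.commerce.global
  trafficPolicy:
    tls:
      mode: ISTIO_MUTUAL
```

On cluster-west, the matching AuthorizationPolicy permits checkout's SPIFFE identity — the same bilateral shape as the single-cluster case, with a remote trust domain in the principal.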

Cross-cluster mesh traffic requires network connectivity between clusters. This is a real constraint. The outbound-only model from Chapter 7 applies to the management plane (agent-cell gRPC stream) — workload clusters don’t accept inbound management connections. But the data plane (application traffic between meshed services) requires that clusters can reach each other’s service endpoints, typically through a shared network, VPN, or east-west gateway.

If direct connectivity between clusters isn’t available (strict network isolation, different cloud providers, air-gapped environments), cross-cluster mesh doesn’t work. For these cases, an API gateway at the cluster boundary is the alternative — see below.

The parent compiled the Istio multi-cluster configuration and distributed it. If the parent goes down, the existing configuration persists — cross-cluster traffic continues to flow because the ServiceEntries, DestinationRules, and AuthorizationPolicies are already on each cluster. New cross-cluster bilateral agreements can’t be established (the parent isn’t there to match them), but existing ones keep working.

This is consistent with the self-management principle from Chapter 6: the parent is needed for coordination, not for ongoing operation.

When multi-cluster mesh is worth it. Tight coupling (synchronous calls, shared data). Compliance mandating mTLS across all inter-cluster traffic. Same authorization model across cluster boundaries.

When an API gateway at the boundary is simpler. Loosely-coupled services with a few well-defined cross-cluster APIs. Clusters in different trust domains by design. Operational cost of mesh federation exceeds the benefit.

Inside the mesh: cryptographic identity, mutual authentication, policy enforcement. At the edge: a browser sends HTTPS without a SPIFFE ID.

DNS. When the pipeline processes a service with an ingress block, the platform manages DNS records through the configured provider (Route53, Cloud DNS, Cloudflare). The developer declares a hostname; the platform ensures the record exists.

Gateway API role separation. The Gateway API splits edge networking into three resources, and this separation maps directly onto the derivation model.

  • GatewayClass — the infrastructure provider (e.g., istio, envoyproxy, gke-l7). Installed during cluster bootstrapping. The platform team selects it; developers never touch it.
  • Gateway — the platform team’s domain. Defines which ports are open, which TLS certificates to use, what type of load balancer backs it. The platform team controls TLS policy (minimum version, cipher suites), connection limits, and access logging. One Gateway per cluster or per environment — not per service.
  • HTTPRoute — derived from the service spec’s ingress block. This is the only edge resource the derivation pipeline produces per service.

Why this separation matters: the platform team controls infrastructure decisions (load balancer type, TLS policy, IP allocation) through the Gateway, while the pipeline derives only the routing rules from the developer’s intent. A developer adding a new service with an ingress block doesn’t need to know whether the Gateway is backed by an NLB or an ALB, whether TLS 1.2 is allowed, or how many Gateway instances exist. They declare intent; the pipeline maps it to an HTTPRoute that attaches to the platform’s Gateway.
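A sketch of the platform-owned Gateway (listener layout and the wildcard hostname are illustrative; the resource name matches the parentRef the pipeline fills into derived HTTPRoutes):

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: platform-gateway
  namespace: gateway-system
spec:
  gatewayClassName: istio          # selected at bootstrap; platform-owned
  listeners:
    - name: https
      port: 443
      protocol: HTTPS
      hostname: "*.example.com"    # illustrative wildcard
      tls:
        mode: Terminate
        certificateRefs:           # Secrets maintained by cert-manager
          - name: checkout-tls
      allowedRoutes:
        namespaces:
          from: All                # HTTPRoutes from any namespace may attach
```

Everything in this resource is a platform decision — ports, TLS mode, which namespaces may attach routes. Nothing in it is per-service.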

HTTPRoute derivation. The developer’s service spec contains an ingress block:

```yaml
ingress:
  hostname: checkout.example.com
  paths:
    - path: /
      port: 8080
```

The pipeline derives an HTTPRoute:

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: checkout
  namespace: commerce
spec:
  parentRefs:
    - name: platform-gateway
      namespace: gateway-system
  hostnames:
    - checkout.example.com
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /
      backendRefs:
        - name: checkout
          port: 8080
```

The developer specified four things (hostname, path, port, and implicitly the service name). The pipeline filled in the parentRef (which Gateway to attach to), the namespace, and the resource metadata. This is the same pattern as every other derived resource — developer intent in, Kubernetes resource out.

cert-manager integration. Every external hostname needs a TLS certificate. When the pipeline creates an HTTPRoute with a hostname, it also creates a Certificate resource targeting the same hostname:

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: checkout-tls
  namespace: gateway-system
spec:
  secretName: checkout-tls
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  dnsNames:
    - checkout.example.com
```

cert-manager handles the ACME challenge (DNS-01 or HTTP-01, depending on the ClusterIssuer configuration), obtains the certificate from Let’s Encrypt (or another CA), stores it as a Secret in gateway-system, and renews it before expiry. The Gateway’s TLS configuration references this Secret. The developer declared a hostname; the platform handled the rest — challenge, issuance, storage, rotation, and Gateway binding.

An alternative approach: annotate the Gateway with cert-manager.io/cluster-issuer and let cert-manager automatically issue certificates for any hostname that appears in a Gateway listener. This reduces the per-service work but gives less control over certificate lifecycle per hostname.
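A sketch of the annotation-driven variant (this relies on cert-manager's Gateway API support, which is behind a feature gate; names are illustrative):

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: platform-gateway
  namespace: gateway-system
  annotations:
    # With Gateway API support enabled, cert-manager watches Gateways
    # carrying this annotation and issues a certificate for every
    # HTTPS listener hostname.
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  gatewayClassName: istio
  listeners:
    - name: https-checkout
      hostname: checkout.example.com
      port: 443
      protocol: HTTPS
      tls:
        mode: Terminate
        certificateRefs:
          - name: checkout-tls   # Secret cert-manager creates and renews
```

One annotation replaces the per-service Certificate resources; the cost is that issuer choice and renewal behavior can no longer vary per hostname.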

TLS termination vs. passthrough. For most services, the Gateway terminates TLS. The platform manages the certificate, the Gateway decrypts the traffic, the HTTPRoute inspects headers and paths for routing, and the request enters the mesh as plaintext (re-encrypted by ztunnel’s mTLS for the internal hop). This is the default and the common case.

Some services need end-to-end encryption — the service itself holds a certificate and terminates TLS. Financial services with compliance requirements, services proxying to external TLS backends, or services using client certificate authentication. For these, the ingress block specifies passthrough:

```yaml
ingress:
  hostname: payments.example.com
  tls: passthrough
```

The pipeline derives a TLSRoute instead of an HTTPRoute. The Gateway forwards the encrypted TCP stream directly to the service without decrypting it. The trade-off: the Gateway can’t inspect headers, can’t do path-based routing, can’t add headers — and a WAF at the edge can’t inspect request content. Passthrough is more secure in transit but blind at the boundary. The architect must decide if end-to-end encryption is worth losing request-level inspection at the Gateway. The service is responsible for its own certificate management. This is a per-service decision expressed in the service spec — the platform supports both models.
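The derived passthrough route, sketched (TLSRoute is still alpha in the Gateway API, hence `v1alpha2`; names mirror the ingress block above, and the backend port is illustrative):

```yaml
apiVersion: gateway.networking.k8s.io/v1alpha2
kind: TLSRoute
metadata:
  name: payments
  namespace: commerce
spec:
  parentRefs:
    - name: platform-gateway
      namespace: gateway-system
      # The matching Gateway listener must use tls.mode: Passthrough.
  hostnames:
    - payments.example.com
  rules:
    - backendRefs:
        - name: payments
          port: 8443   # the service terminates TLS itself
```

Because the stream is never decrypted, the only routing signal available to the Gateway is the SNI hostname — which is why the TLSRoute has hostnames but no path matches.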

An ingress request, step by step. A user hits https://checkout.example.com/cart.

  1. DNS resolves to the Gateway load balancer IP (managed by the platform’s DNS provider integration).
  2. The Gateway terminates TLS using a certificate from cert-manager.
  3. The HTTPRoute (derived from checkout’s ingress block) matches the hostname and routes to the checkout Service.
  4. The request enters the mesh with the Gateway’s SPIFFE identity. Internal traffic from checkout to its dependencies uses the normal bilateral model.

Edge authentication. The platform provides transport-level security: TLS termination, certificate management, DNS. This is infrastructure. Application-level authentication — OAuth flows, API key validation, session management — is the application’s responsibility. The platform can’t know which users are authorized for which endpoints.

But there’s a useful middle ground. Many platforms add JWT validation at the Gateway level. The Gateway verifies the token signature (using a JWKS endpoint), checks expiration, and rejects requests with invalid or missing tokens before they reach the application. This doesn’t require the platform to understand application-specific authorization logic — it only verifies that the caller has a valid token from a trusted issuer. The application still decides what a valid token is authorized to do. The Gateway just prevents obviously unauthenticated traffic from consuming application resources. Whether to implement Gateway-level JWT validation is a platform decision. It adds operational complexity (JWKS endpoint configuration, token format assumptions) but reduces the attack surface for every service behind the Gateway.
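If the Gateway is Istio-backed, gateway-level JWT validation can be sketched with two resources (issuer, JWKS URL, and the gateway selector label are illustrative):

```yaml
# Verify token signatures at the gateway against the issuer's JWKS.
apiVersion: security.istio.io/v1
kind: RequestAuthentication
metadata:
  name: jwt-validation
  namespace: gateway-system
spec:
  selector:
    matchLabels:
      istio.io/gateway-name: platform-gateway
  jwtRules:
    - issuer: https://auth.example.com
      jwksUri: https://auth.example.com/.well-known/jwks.json
---
# RequestAuthentication alone rejects only *invalid* tokens; requests
# with no token still pass. Requiring a request principal closes that gap.
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: require-jwt
  namespace: gateway-system
spec:
  selector:
    matchLabels:
      istio.io/gateway-name: platform-gateway
  action: ALLOW
  rules:
    - from:
        - source:
            requestPrincipals: ["*"]
```

Note the division of labor: the gateway proves the token is genuine and unexpired; what the token's subject may do remains the application's decision.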

The trust boundary. Inside: SPIFFE identity, mTLS, AuthorizationPolicy. At the edge: client identity comes from the application’s auth model (OAuth, API keys), optionally pre-validated by the Gateway’s JWT filter. The platform manages transport security (TLS termination, DNS, routing). The application manages authorization. The Gateway sits at the boundary — it can verify credentials but should not make authorization decisions.

Scenario: ztunnel crash on a node. The ztunnel DaemonSet pod on node-12 crashes due to a memory leak. L4 mTLS for every pod on that node is gone. Pods continue running — the application doesn’t know the mesh is down. But traffic between those pods and other nodes fails mTLS handshake.

What the developer sees: intermittent connection failures from services running on node-12. Behavior during ztunnel failure depends on PeerAuthentication mode and the node-level redirect configuration — in STRICT mode, same-node pod-to-pod traffic fails rather than bypassing ztunnel, because the CNI's redirection (iptables or eBPF) routes all traffic through ztunnel. In PERMISSIVE mode, traffic may fall back to plaintext. Cross-node traffic fails the mTLS handshake regardless. The failures are confusing because they depend on which node the caller and callee are on and the mesh's authentication mode.

What the on-call sees: the ztunnel DaemonSet reports 9/10 ready. Kubernetes restarts the crashed pod. Within 30-60 seconds, ztunnel is back and mTLS resumes. If the crash loops (memory leak recurs), the node loses mTLS indefinitely until the DaemonSet is fixed.

Detection: monitor ztunnel DaemonSet readiness. Alert when any node’s ztunnel is not ready for > 60 seconds. The compliance controller (Chapter 14) can probe: “submit a cross-node mTLS request and verify it succeeds.”
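A detection rule sketch, assuming kube-state-metrics and the Prometheus operator are installed (the metric names are standard kube-state-metrics; the DaemonSet name, namespace, and threshold are illustrative):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: ztunnel-alerts
  namespace: istio-system
spec:
  groups:
    - name: mesh-dataplane
      rules:
        - alert: ZtunnelNotReady
          # Fires when any ztunnel pod has been unready for over a
          # minute — that node has lost L4 mTLS.
          expr: |
            kube_daemonset_status_number_ready{daemonset="ztunnel"}
              < kube_daemonset_status_desired_number_scheduled{daemonset="ztunnel"}
          for: 1m
          labels:
            severity: critical
          annotations:
            summary: "ztunnel not ready on one or more nodes; L4 mTLS degraded"
```

The readiness alert catches the crash; the compliance probe catches the subtler case where ztunnel reports ready but mTLS handshakes still fail.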

Why this matters for the sidecar-vs-ambient decision (Section 15.2). In sidecar mode, a crash affects one pod. In ambient mode, a crash affects one node. The blast radius is larger but the failure is rarer (one DaemonSet vs. N sidecars) and recovery is faster (DaemonSet restart vs. pod restart). The trade-off is explicit: fewer failure points, larger blast radius per failure.

The networking layer:

  • Chapter 15: Ambient mesh for L7 identity, cross-cluster service discovery compiled by the parent, edge networking through Gateway API.

Part VI tests whether the platform’s architecture extends to workloads that don’t fit the stateless service model — batch jobs (Chapter 16), GPU infrastructure (Chapter 17), and inference (Chapter 18).

15.1. [M10] A namespace has no waypoint proxy deployed. L4 (ztunnel) still works — mTLS and identity. But L7 AuthorizationPolicies are not evaluated. What traffic is allowed that shouldn’t be? What’s the security implication?

15.2. [H30] Design the independent-CA alternative to shared-root federation. Each cluster has its own root CA. Cross-cluster trust is established through trust bundle federation — each cluster distributes its CA certificate to every other cluster’s trust store. What are the trade-offs vs. shared root? How does adding a new cluster work (O(N) trust bundle updates)? What happens during root CA rotation?

15.3. [R] Ambient mesh gives you per-node failure blast radius. Sidecar gives you per-pod. When is per-pod isolation worth the cost of modified pod specs? Is there a workload type where sidecar mode is the right choice even on a platform?

15.4. [H30] Design the API gateway approach for cross-cluster communication. Where does the gateway run? How does it authenticate requests (no mesh identity at the boundary)? How does it interact with bilateral agreements? Does a gateway approach violate default-deny?

15.5. [M10] The edge trust boundary says “application-level authentication is the application’s responsibility.” But many platforms provide JWT validation at the gateway. Should the platform handle edge auth? Where is the boundary?

15.6. [R] Mesh technology evolves fast. Cilium’s L7 is maturing. Istio ambient is new. How should a platform team evaluate mesh technology? How do you migrate meshes without disrupting services? Does the derivation model make migration easier or harder?

15.7. [M10] A service’s ingress block declares hostname checkout.example.com. cert-manager fails to issue the certificate. Should the service deploy without TLS? Should the pipeline wait? What does the status report?