Glossary
AuthorizationPolicy. Istio’s L7 authorization resource, evaluated by waypoint proxies. Permits or denies requests based on SPIFFE identity, HTTP method, path, and headers. Generated by the mesh-member controller from bilateral agreement declarations. See Chapter 13.
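The shape of such a generated policy can be sketched as follows; the namespace, service account, and path are illustrative, not taken from the reference implementation:

```yaml
# Illustrative only: permits the checkout identity to POST to one
# path on the payments service; everything else stays denied.
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: payments-allow-checkout
  namespace: commerce
spec:
  action: ALLOW
  rules:
  - from:
    - source:
        # Istio principals are SPIFFE IDs without the spiffe:// prefix
        principals: ["cluster.local/ns/commerce/sa/checkout"]
    to:
    - operation:
        methods: ["POST"]
        paths: ["/v1/charges"]
```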
Agent. The component that runs on each workload cluster and establishes an outbound gRPC connection to the parent cluster’s cell. The agent handles health reporting (heartbeats), receives pivot commands and policy updates, and proxies Kubernetes API requests from the parent to the local API server. It is installed during cluster bootstrapping and authenticated via mTLS. See Chapter 7.
Ambient mesh. Istio’s sidecar-free service mesh architecture, in which L4 concerns (mTLS, identity) are handled by a per-node ztunnel DaemonSet and L7 concerns (authorization, routing) are handled by per-namespace waypoint proxies. The reference implementation uses ambient mesh because it leaves the derivation pipeline’s pod specs unmodified — no injected sidecars, no config hash drift. See Chapter 15.
Bilateral agreement. The network security model in which traffic between two services flows only when both sides declare the relationship: the caller declares an outbound dependency and the callee declares an inbound dependency. If only one side declares, no network policies are generated and default-deny blocks the traffic. This enables cross-service reasoning that template-based approaches cannot replicate. See Chapter 13.
CAPI (Cluster API). Kubernetes project for declarative cluster lifecycle management. Manages Cluster, Machine, MachineDeployment resources. See Chapter 5.
CedarPolicy. The CRD that carries Cedar authorization policies to workload clusters. A webhook validates each policy (syntax and type checking) before it is distributed via the gRPC stream. See Chapter 13.
Cedar. The policy engine used for derivation-time authorization. Cedar evaluates focused questions (gates) against the service spec before any Kubernetes resources are generated. Its forbid-overrides-permit semantics align with the platform’s default-deny posture: if any matching policy says forbid, the result is deny regardless of permits. Policies are distributed to workload clusters as CRDs via the gRPC stream. See Chapter 13.
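A pair of hypothetical policies illustrates the semantics; the entity types and attributes here are invented for illustration, though AccessSecret is one of the platform's gates:

```cedar
// Permit the checkout service to pass the AccessSecret gate
// for one specific secret.
permit (
  principal == Service::"checkout",
  action == Action::"AccessSecret",
  resource == Secret::"payments-api-key"
);

// A matching forbid always wins over any permit: a hypothetical
// guard keeping production secrets out of other environments.
forbid (
  principal,
  action == Action::"AccessSecret",
  resource
) when { resource.tier == "production" && principal.environment != "production" };
```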
Cell. The component that runs on the parent cluster and accepts inbound gRPC connections from agents on workload clusters. The cell verifies mTLS certificates, extracts cluster identity from the certificate CN field, multiplexes management communication over established streams, and serves as the parent-side endpoint for pivot commands, API proxying, policy distribution, and health monitoring. See Chapter 7.
CiliumNetworkPolicy. The L4 (network-level) policy resource generated by the mesh-member controller from bilateral agreement declarations. CiliumNetworkPolicies enforce packet-level filtering via eBPF in the kernel, controlling which pods can send or receive traffic on which ports. They operate independently of L7 identity-based authorization. See Chapter 13.
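A minimal sketch of a generated L4 policy; the labels, namespace, and port are hypothetical:

```yaml
# Illustrative only: pods labeled app=checkout may reach the
# payments pods on TCP 8080; all other ingress stays denied.
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: payments-from-checkout
  namespace: commerce
spec:
  endpointSelector:
    matchLabels:
      app: payments
  ingress:
  - fromEndpoints:
    - matchLabels:
        app: checkout
    toPorts:
    - ports:
      - port: "8080"
        protocol: TCP
```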
Compliance controller / LatticeComplianceProfile. The controller that continuously evaluates whether the platform’s security model is working, not just configured. The LatticeComplianceProfile CRD specifies which compliance frameworks to evaluate (e.g., NIST 800-53, CIS Kubernetes, SOC 2) and the scan interval. The controller runs probes — Cedar authorization probes, resource existence probes, configuration probes — and records pass/fail results with evidence in the CRD status. See Chapter 14.
Cosign. Sigstore tool for container image signing and verification. Integration with the derivation pipeline is described in Chapter 14 but not yet implemented in the reference code. See the Implementation Roadmap.
Default-deny. The security posture in which every layer of the platform starts closed: no network traffic is permitted (L4), no service requests are authorized (L7), and every Cedar gate denies by default. Every capability a service has must be explicitly granted. The derivation pipeline absorbs the operational cost by generating the necessary permit policies from dependency declarations. See Chapter 12.
Derivation / derivation pipeline. The process by which the platform transforms a developer’s intent-level CRD spec into the full set of Kubernetes infrastructure resources (Deployments, network policies, external secrets, scrape targets, disruption budgets, etc.). The pipeline authorizes (Cedar gates), derives (shared WorkloadCompiler), and applies resources in dependency-ordered layers. It runs continuously via the reconciliation loop, not as a one-shot operation. See Chapter 8.
Enforcement layer. One of the independent security mechanisms in the platform’s defense-in-depth architecture. The reference implementation uses four layers: admission webhooks (schema validation), Cedar authorization (derivation-time policy), Cilium (L4 kernel-level packet filtering), and Istio (L7 identity-based request authorization). Each layer catches threats the others cannot, and no single failure opens all access. See Chapter 12.
Escape hatch. A mechanism for deploying workloads that do not fit the derivation pipeline, such as third-party Helm charts (Redis, Prometheus) or edge cases requiring Kubernetes features the CRD does not expose. The primary escape hatch is the LatticePackage CRD for Helm charts, bridged into the security model via LatticeMeshMember. Escape hatches trade the full derivation pipeline’s guarantees for flexibility. See Chapter 11.
ECC (Error Correcting Code). GPU memory error detection. Single-bit errors (SBE) are correctable; double-bit errors (DBE) may cause hard failures or silent data corruption depending on location and firmware. See Chapter 17.
ExternalSecret / ESO. External Secrets Operator (ESO) and its ExternalSecret CRD, which the derivation pipeline uses to sync secrets from external backends (Vault, AWS Secrets Manager) into standard Kubernetes Secrets before pod startup. This decouples applications from the secret backend, enables derivation-time authorization via Cedar AccessSecret, and solves startup ordering. The pipeline routes secrets through five paths based on consumption pattern: pure secret variable, mixed-content variable, file mount, image pull secret, and bulk extraction. See Chapter 9.
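The pattern can be sketched as follows; the store name and secret keys are hypothetical:

```yaml
# Illustrative ExternalSecret: syncs one key from the configured
# backend into a standard Kubernetes Secret before pod startup.
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: payments-api-key
  namespace: commerce
spec:
  secretStoreRef:
    kind: ClusterSecretStore
    name: vault-production     # hypothetical store name
  refreshInterval: 1h
  target:
    name: payments-api-key     # resulting Kubernetes Secret
  data:
  - secretKey: API_KEY
    remoteRef:
      key: commerce/payments
      property: api_key
```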
FIPS (Federal Information Processing Standards). US government cryptographic standards. FIPS 140-3 specifies approved algorithms for encryption modules. Relevant to mesh encryption (WireGuard is not FIPS-approved; standard TLS with AES-GCM is), certificate algorithms, and secret backend configuration. See Chapter 22.
Gateway API. The Kubernetes API for edge networking (GatewayClass, Gateway, HTTPRoute). Replaces legacy Ingress with a role-oriented model: platform teams configure GatewayClass and Gateway, developers configure HTTPRoute. See Chapter 15.
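The role split can be sketched with a hypothetical HTTPRoute attaching to a platform-owned Gateway; all names and hostnames are illustrative:

```yaml
# Illustrative developer-owned route bound to a Gateway that the
# platform team manages.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: checkout
  namespace: commerce
spec:
  parentRefs:
  - name: edge-gateway          # Gateway owned by the platform team
    namespace: gateway-system
  hostnames: ["shop.example.com"]
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /checkout
    backendRefs:
    - name: checkout
      port: 8080
```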
Gang scheduling. The atomic placement of a group of pods as a unit, ensuring either all members of the group are scheduled or none are. The default Kubernetes scheduler commits per-pod and cannot roll back partial placements, which wastes resources when batch workloads (distributed training, MPI jobs) need all participants to proceed. The reference implementation uses Volcano’s two-phase commit: tentatively place all pods, then commit or discard the entire batch. See Chapter 16.
Ghost loss. A GPU failure mode in which the GPU produces incorrect computation results without crashing the pod. Caused by double-bit ECC errors that do not trigger a device reset, XID errors in certain categories, or driver instability. Ghost losses are the most expensive GPU failure mode because the workload continues running with corrupted results, potentially wasting an entire training run before detection. See Chapter 17.
gRPC stream (agent-cell). The persistent, bidirectional gRPC stream initiated outbound from the workload cluster’s agent to the parent cluster’s cell. All management communication — health reporting, pivot commands, API proxying, policy distribution, exec sessions — is multiplexed over this single stream. The connection uses mTLS with short-lived certificates. The workload cluster exposes zero inbound ports. See Chapter 7.
InfraProvider. The CRD that holds provider-specific credentials and defaults for cluster provisioning (e.g., Proxmox API endpoint, AWS region, Docker socket). One InfraProvider per provider per environment. See Chapter 5.
Intent (vs. infrastructure). The foundational distinction of the book. An intent-based spec describes what the developer needs (dependencies, secrets, replica count); an infrastructure-based spec describes what Kubernetes resources to produce (NetworkPolicy rules, ServiceMonitor intervals). When the developer describes intent, the platform controls the infrastructure and can evolve independently — changing policy models, secret backends, or compliance requirements without modifying any service spec. See Chapter 2.
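As a sketch of the distinction (the field names below are illustrative, not the book's actual CRD schema), an intent-level spec states needs and leaves every resource decision to the pipeline:

```yaml
# Hypothetical intent-level spec; field names are illustrative only.
kind: LatticeService
metadata:
  name: checkout
spec:
  replicas: 3
  dependencies:
    outbound:
    - service: payments        # "I call payments"
  secrets:
  - name: payments-api-key     # "I need this secret at startup"
```

From a spec like this the pipeline derives the Deployment, both policy layers, the external secret, and the scrape target; none of those resources appear in the spec itself.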
KEDA. Kubernetes Event-Driven Autoscaler. Scales workloads based on external metrics. See Chapter 10.
KV cache. Key-value cache in transformer model inference. Stores computed attention keys and values to avoid recomputation during token generation. KV cache memory usage is the primary scaling metric for inference workloads, unlike request-driven services, which typically scale on CPU utilization. See Chapter 18.
LatticeCluster. The CRD that declares a cluster’s desired state: Kubernetes version, provider configuration, node pools, and platform services. The platform provisions infrastructure, bootstraps components in dependency order, executes the pivot, and reports status. Applying the same LatticeCluster spec produces an identical cluster, enabling clusters-as-cattle. See Chapter 5.
LatticeJob. The CRD for batch workloads. It supports gang scheduling (minAvailable), Volcano queue assignment, multiple task groups, and completion tracking. The LatticeJob compiler wraps the shared WorkloadCompiler’s output in a Volcano VCJob with PodGroup, rather than a Deployment. Jobs get the same security model as services (image verification, Cedar authorization, bilateral agreements) through the shared compiler. See Chapter 16.
LatticeModel. The CRD for inference workloads. It supports disaggregated serving roles (prefill and decode), model source configuration, GPU-aware autoscaling on metrics like KV cache usage, and long stabilization windows to account for cold-start costs. The LatticeModel compiler produces separate Deployments per role with inference-specific routing configuration. See Chapter 18.
LatticeMeshMember. The CRD that declares a workload’s participation in the bilateral agreement model. For derived services, the derivation pipeline generates it automatically. For escape-hatch workloads (Helm-installed packages), operators create it manually to specify the target pod selector, ports, peer authentication mode, and allowed callers. The mesh-member controller uses LatticeMeshMembers to generate CiliumNetworkPolicies and AuthorizationPolicies. See Chapters 8 and 11.
LatticePackage. The CRD for third-party software installed via Helm charts. Designed to provide lifecycle management (install, upgrade, rollback, uninstall), values interpolation with the platform’s secret resolution system, dependency ordering, and garbage collection. Packages get Cedar AccessSecret authorization and secret backend abstraction but not the full derivation pipeline — network policy integration requires a separate LatticeMeshMember. Not yet implemented in the reference code. See Chapter 11 and the Implementation Roadmap.
LatticeService. The primary CRD for long-running workloads (web services, APIs). The developer declares containers, dependencies, secrets, and replica count; the derivation pipeline produces the full infrastructure set — Deployment, Service, network policies, external secrets, scrape targets, disruption budgets, and more. It is the central example of intent-over-infrastructure throughout the book. See Chapters 2 and 4.
Mesh member controller. The controller that watches all LatticeMeshMember resources across the cluster, matches bilateral declarations (outbound on one service with inbound on another), and generates the corresponding CiliumNetworkPolicies (L4) and Istio AuthorizationPolicies (L7). It performs the cross-service reasoning that makes bilateral agreements work, operating on the full service graph rather than individual resources. See Chapter 13.
Pivot. The operation that transfers CAPI resource ownership from the management cluster to the workload cluster, making the workload cluster self-managing. The parent serializes CAPI resources (Cluster, MachineDeployment, Machine, etc.) and sends them to the agent over the gRPC stream. The agent imports them in topological order, remaps UIDs, and unpauses them for local CAPI controllers to reconcile. The pivot is idempotent, ordered, reversible (via unpivot), and completes in approximately 13 seconds. See Chapter 6.
PeerRouteSync. A message in the agent-cell gRPC protocol that distributes cross-cluster routing information. When bilateral agreements span clusters, the parent matches declarations from both sides and sends PeerRouteSync to each cluster with the remote service’s address and SPIFFE identity. Each cluster’s mesh-member controller uses this information to generate policies referencing the remote identity. See Chapters 7 and 15.
RPO / RTO. Recovery Point Objective (how much data loss is acceptable) and Recovery Time Objective (how long recovery can take). For self-managing clusters, RPO is bounded by etcd snapshot frequency; RTO depends on whether the cluster rebuilds from scratch or restores from snapshot. See Chapter 20.
RED metrics. Rate, Errors, Duration — the three standard signals for monitoring request-driven services. Rate is requests per second, Errors is the fraction of requests that fail, and Duration is the distribution of response times (typically measured as percentiles). RED metrics provide the minimum viable observability for any service handling requests. The derivation pipeline auto-generates scrape targets (VMServiceScrape) so RED metrics are collected for every compiled service without developer configuration. See Chapter 19.
Reconciliation loop. The continuous control loop in which a controller watches CRD specs, derives the desired state, diffs it against the current state, applies changes incrementally, updates status, and requeues. Unlike a CI/CD pipeline that derives infrastructure once at deploy time, the reconciliation loop derives continuously — correcting drift, re-deriving when policies change, and maintaining the platform’s guarantees over time rather than at a point in time. See Chapter 2.
SecretProvider. The CRD that abstracts the secret backend (Vault, AWS Secrets Manager) into an ESO ClusterSecretStore. Platform teams create one per backend per environment. The derivation pipeline resolves secret resources against the configured ClusterSecretStore. See Chapter 9.
Self-managing cluster. A cluster that owns its own CAPI resources and infrastructure lifecycle after the pivot. It scales, upgrades, and replaces failed nodes locally without any parent cluster involvement. It survives total network isolation. Self-managing does not mean unmanaged (a spec still governs it) or self-provisioned (something must create it initially). The independence test verifies self-management by deleting the management cluster and confirming operations continue. See Chapter 6.
SPIFFE / SVID. SPIFFE (Secure Production Identity Framework for Everyone) provides cryptographic workload identity. Each workload receives a SPIFFE ID (e.g., spiffe://cluster.local/ns/commerce/sa/checkout) encoded as an X.509 certificate called an SVID (SPIFFE Verifiable Identity Document). The mesh’s CA (istiod) issues short-lived SVIDs that are automatically rotated. SPIFFE IDs are the identity that Istio AuthorizationPolicies evaluate for L7 request authorization. See Chapter 15.
Tetragon / TracingPolicyNamespaced. Cilium’s eBPF-based runtime security tool. Intercepts syscalls (including execve) to enforce binary allowlists. The derivation pipeline generates a TracingPolicyNamespaced for every compiled service, deriving the allowlist from the container image’s entrypoint and declared binaries. See Chapter 14.
TrustPolicy. The CRD that maps signing keys to container registries for image verification. When a service deploys an image, the DeployImage Cedar gate checks the image signature against the matching TrustPolicy. Not yet implemented in the reference code. See Chapter 14 and the Implementation Roadmap.
Trust bundle federation. A mechanism for establishing cross-cluster identity trust. In the reference implementation, the parent cluster’s CA is the root and child cluster CAs are intermediates signed by the parent, so any SPIFFE ID is verifiable up to the shared root. The alternative — independent CAs with mutual trust bundle distribution — limits blast radius but requires O(N²) trust relationships. See Chapter 15.
VPA (Vertical Pod Autoscaler). Adjusts container resource requests based on observed usage. The reference implementation does not use VPA due to conflicts with the derivation pipeline’s server-side apply field management. See Chapter 10.
Volcano. Batch scheduling system for Kubernetes. Provides gang scheduling via two-phase commit. See Chapter 16.
VCJob. Volcano’s job CRD, the output of the LatticeJob compiler. Supports task groups, gang scheduling via PodGroup, and queue assignment. See Chapter 16.
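A minimal sketch of the shape, with hypothetical names (Volcano's job kind is Job in the batch.volcano.sh API group):

```yaml
# Illustrative VCJob: two task groups scheduled as a gang of 5 —
# either all five pods place or none do.
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: train-job
spec:
  schedulerName: volcano
  minAvailable: 5              # gang threshold
  queue: default
  tasks:
  - name: master
    replicas: 1
    template:
      spec:
        restartPolicy: Never
        containers:
        - name: master
          image: example/trainer:latest
  - name: worker
    replicas: 4
    template:
      spec:
        restartPolicy: Never
        containers:
        - name: worker
          image: example/trainer:latest
```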
Velero. Kubernetes backup tool for etcd snapshots and persistent volume snapshots. The platform schedules Velero backups via LatticeClusterBackup CRDs. See Chapter 20.
VMServiceScrape. The VictoriaMetrics CRD that defines a metrics scrape target. The derivation pipeline generates a VMServiceScrape for every compiled service automatically — there is no metrics.enabled: true field. The developer’s service exists, therefore it is observable. VMAgent discovers VMServiceScrape resources and begins scraping the service’s /metrics endpoint. See Chapter 19.
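A generated scrape target could look like this sketch; the names and labels are illustrative:

```yaml
# Illustrative auto-generated scrape target for a compiled service.
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMServiceScrape
metadata:
  name: checkout
  namespace: commerce
spec:
  selector:
    matchLabels:
      app: checkout
  endpoints:
  - port: metrics
    path: /metrics
```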
Waypoint proxy. The per-namespace (or per-service) Envoy proxy in Istio’s ambient mesh architecture that handles L7 concerns: evaluating AuthorizationPolicies against SPIFFE identities, HTTP method and path constraints, and request-level routing. Waypoint proxies complement ztunnel, which handles L4. If a waypoint proxy crashes, L7 enforcement is lost for the affected namespace, but L4 (Cilium) enforcement continues independently. See Chapter 15.
WorkloadCompiler (shared). The shared compilation core used by all CRD-specific controllers (LatticeService, LatticeJob, LatticeModel). It handles everything common across workload types: container compilation, image verification, Cedar authorization gates, secret resolution, environment and file compilation, and mesh member generation. Type-specific wrappers add lifecycle-specific resources (Deployment vs. VCJob vs. disaggregated Deployments). A bug fix or new gate in the shared compiler applies to all workload types simultaneously. See Chapter 8.
XID error. An NVIDIA GPU error code reported through the kernel driver. Different XID codes indicate different failure types: XID 79 (GPU fallen off the bus) is a hard loss, XID 48 (double-bit ECC error) typically indicates a hard loss, and XID 63 (ECC page retirement) indicates degradation. Monitored via the DCGM_FI_DEV_XID_ERRORS metric. See Chapter 17.
Ztunnel. The per-node DaemonSet in Istio’s ambient mesh that handles L4 concerns: mTLS termination and initiation, SPIFFE identity assertion, and encrypted transport between nodes. Ztunnel operates transparently via eBPF traffic redirection, requiring no sidecar injection or pod spec modification. If a ztunnel pod crashes, every pod on that node loses L4 mTLS — a larger blast radius than sidecar mode but a rarer failure. See Chapter 15.