Chapter 2: Intent Over Infrastructure
Chapter 1 argued that a platform delivers security, observability, and governance as automatic properties of every deployment. What makes those properties automatic?
Not a particular mechanism — not CRDs over Helm, not controllers over pipelines. A CRD spec is a values file. A controller is a template engine that runs continuously. The tooling is equivalent in kind. The difference is in what the developer is describing.
In a Helm values file, the developer describes infrastructure: network policy rules, service monitor settings, pod disruption budget parameters, secret store references. They are configuring the output directly — deciding which Kubernetes resources to produce and what those resources should contain.
In a platform’s CRD spec, the developer describes intent: I have a service, it runs this image, it depends on these other services, it reads these secrets, I want this many replicas. They are not configuring infrastructure. They are declaring what they need. The platform derives the infrastructure from the intent.
This distinction — intent versus infrastructure — is the foundation of everything that follows. When the developer describes intent, the platform controls the infrastructure. When the developer describes infrastructure, the platform is just a template engine with extra steps.
```mermaid
graph LR
  Dev[Developer writes spec<br/>image, replicas,<br/>dependencies, secrets] --> Pipeline[Derivation Pipeline]
  Pipeline --> D[Deployment]
  Pipeline --> S[Service]
  Pipeline --> NP[NetworkPolicy]
  Pipeline --> ES[ExternalSecret]
  Pipeline --> SM[VMServiceScrape]
  Pipeline --> PDB[PodDisruptionBudget]
```
The practical consequence: if compliance requirements change, the developer shouldn’t care. If the organization adopts a new network policy model, switches secret backends, adds a runtime enforcement layer, or tightens image signing requirements — none of that changes the developer’s spec. The intent stands. The developer declared what their service needs. The platform’s job is to satisfy that intent under whatever constraints currently apply. The developer wrote “I depend on the orders service and the orders-db secret” a year ago. Since then, the platform migrated from unilateral to bilateral network policies, switched from Vault to AWS Secrets Manager, added Tetragon binary enforcement, and started requiring FIPS-validated TLS. The developer’s spec is unchanged. The derived infrastructure is completely different. That’s the point.
2.1 The Infrastructure Description Problem
A Helm values file for a web service asks the developer to describe infrastructure:
```yaml
networkPolicy:
  enabled: true
  egress:
    - to: orders
      port: 8080
serviceMonitor:
  enabled: true
  interval: 30s
podDisruptionBudget:
  enabled: true
  maxUnavailable: 1
image:
  repository: registry.example.com/myapp
  tag: v1.2.3
resources:
  requests:
    cpu: 500m
    memory: 512Mi
```

Each of these fields describes a Kubernetes resource the developer wants produced. networkPolicy.egress is the content of a NetworkPolicy’s egress rules. serviceMonitor.interval is a field on the ServiceMonitor resource. podDisruptionBudget.maxUnavailable is a field on the PDB.
The developer is authoring infrastructure through an indirection layer. The chart templates the resources, but the developer controls their content. This is equivalent to writing the Kubernetes YAML directly, with the convenience that the chart handles boilerplate.
The problem, as Chapter 1 established, is that every infrastructure field is a decision the developer must make correctly — and an enabled: false the developer can set to skip a capability entirely. But there is a deeper problem: the developer is operating at the wrong level of abstraction.
A developer building a payments service thinks: “I have a service. It calls the orders service. It reads the database password from a secret. I want 3 replicas.” They do not think: “I need a CiliumNetworkPolicy with an egress rule to the orders service’s label selector on port 8080, a CiliumNetworkPolicy on the orders service with an ingress rule from my label selector, an Istio AuthorizationPolicy on the orders service permitting my SPIFFE identity, an ExternalSecret referencing a ClusterSecretStore with a Go template for mixed-content interpolation, a VMServiceScrape with a 30-second interval, and a PodDisruptionBudget with maxUnavailable 1.”
The first description is intent. The second is infrastructure. They express the same information, but at different levels of abstraction. The developer knows the first. The platform should derive the second.
2.2 What an Intent-Based Spec Looks Like
A platform CRD spec captures intent:
```yaml
apiVersion: lattice.dev/v1alpha1
kind: LatticeService
metadata:
  name: checkout
  namespace: commerce
spec:
  replicas: 3
  workload:
    containers:
      api:
        image: registry.example.com/checkout@sha256:abc123
        variables:
          DATABASE_URL: "postgres://app:${resources.orders-db.password}@db:5432/orders"
        resources:
          requests:
            cpu: 500m
            memory: 512Mi
          limits:
            cpu: 500m
            memory: 512Mi
  resources:
    orders:
      type: service
      direction: outbound
    orders-db:
      type: secret
      params:
        keys: [password]
```

There is no networkPolicy block. There is no serviceMonitor block. There is no podDisruptionBudget block. The developer declared: I have a container, it calls the orders service, it reads the orders-db secret, I want 3 replicas.
The platform derives the infrastructure:
- From type: service, direction: outbound on orders → CiliumNetworkPolicy (egress), AuthorizationPolicy (when matched with orders’ inbound declaration).
- From type: secret on orders-db and ${resources.orders-db.password} in a connection string → ExternalSecret with mixed-content interpolation template, referencing the cluster’s configured secret store.
- From replicas: 3 → PodDisruptionBudget with appropriate maxUnavailable.
- From the existence of a compiled Deployment → VMServiceScrape for metrics collection.
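Two of these derivations are simple enough to sketch as pure functions from spec to resources. A minimal sketch in Python, with abbreviated resource shapes and a hypothetical PDB rule (a single replica gets no PDB; multi-replica services tolerate one unavailable pod) standing in for the platform's actual policy:

```python
def derive_pdb(spec: dict):
    """Derive a PodDisruptionBudget from the declared replica count.

    Assumed rule: no PDB for single-replica services (evicting the
    only pod means downtime regardless); otherwise maxUnavailable: 1.
    """
    if spec.get("replicas", 1) < 2:
        return None
    return {"kind": "PodDisruptionBudget", "spec": {"maxUnavailable": 1}}


def derive_egress_policies(spec: dict):
    """Derive one egress network policy per outbound service dependency.

    The developer declared a dependency; the egress rule is a
    consequence, not a field they wrote.
    """
    policies = []
    for name, dep in spec.get("resources", {}).items():
        if dep.get("type") == "service" and dep.get("direction") == "outbound":
            policies.append(
                {"kind": "CiliumNetworkPolicy", "spec": {"egress": [{"toService": name}]}}
            )
    return policies


spec = {
    "replicas": 3,
    "resources": {
        "orders": {"type": "service", "direction": "outbound"},
        "orders-db": {"type": "secret"},
    },
}
print(derive_pdb(spec))              # PDB with maxUnavailable: 1
print(derive_egress_policies(spec))  # one egress policy, to "orders"
```

The point of the sketch is the shape of the functions: the input is intent, the output is infrastructure, and the rules live in the platform's code rather than in the developer's values file.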
The developer filled in a values file — it’s YAML with fields, just like Helm. The difference is that the fields describe what the service needs, not what Kubernetes resources to produce. The mapping from intent to infrastructure is the platform’s job.
This is not a distinction between CRDs and Helm charts as technologies. You could build a Helm chart that accepts intent-level values and produces the full infrastructure set. The distinction is between what the values describe: infrastructure (NetworkPolicy rules, ServiceMonitor intervals) or intent (dependencies, secrets, replica counts). A CRD whose schema mirrors Kubernetes resource fields is just a Helm chart with a different API. A CRD whose schema captures developer intent and derives infrastructure from it is a platform.
But even an intent-level Helm chart hits a ceiling that no amount of schema validation removes. Helm renders templates and produces manifests. It cannot reason across services — it has no access to the other service’s spec when generating your NetworkPolicy. It cannot re-derive your infrastructure when a platform-wide policy changes, because it only runs at install or upgrade time. It cannot evaluate policy at compile time, because it has no compile time. These are not limitations of a bad chart. They are limitations of templating. A derivation pipeline written in a programming language can do all of these things because it holds the full state of the fleet in memory during compilation.
This doesn’t mean Helm has no role. Helm is excellent at last-mile rendering — taking a set of decisions and producing YAML — and that role doesn’t conflict with the derivation model. The platform derives intent into infrastructure decisions; Helm (or kustomize, or plain manifests) can handle the final packaging if that’s useful. A Helm chart can even create and manage the CRDs themselves. They are different layers, not competing approaches. The mistake is building the orchestration logic inside the templating layer, where it doesn’t have the power to do the job.
2.3 Why Intent Enables Automatic Features
When the developer describes infrastructure, every capability is a field they must set. Network policies require networkPolicy.enabled: true. Observability requires serviceMonitor.enabled: true. Each field is an opt-in. Forgetting one means the capability is absent.
When the developer describes intent, capabilities are derived from the intent — not from explicit fields. The developer doesn’t enable network policies. They declare a dependency, and the platform produces network policies as a consequence. The developer doesn’t enable observability. They declare a service, and the platform produces a scrape target as a consequence. There is no enabled flag because there is no opt-in. The capabilities are properties of the platform, not parameters of the spec.
This is what makes platform features automatic. The developer can’t forget to enable security because security isn’t an option. They can’t misconfigure the network policy because they don’t write one. The platform derives the correct policy from the dependency declaration, using its own policy model, its own knowledge of other services, and its own understanding of the network enforcement stack.
The key capability this unlocks — which no amount of infrastructure-level configuration can replicate — is cross-service reasoning. A Helm chart templates one service at a time. It sees one values file. It cannot produce a bilateral network policy that requires both the caller and callee to agree, because it only sees one side.
A platform that processes intent-level specs sees the full graph. Service A declares orders: type: service, direction: outbound. Service B declares checkout: type: service, direction: inbound. The platform matches these declarations across service boundaries and produces the correct policies on both sides. If one side hasn’t declared, no policy is generated — default-deny blocks the traffic. This cross-service policy derivation is impossible when the developer is writing infrastructure directly, because infrastructure descriptions are per-service.
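The bilateral matching described above can be sketched as a pairwise check across the fleet. A hypothetical sketch, with field names following the example spec and inbound declarations assumed to name the calling service:

```python
def match_bilateral(services: dict):
    """Return sorted (caller, callee) pairs where BOTH sides declared.

    An outbound declaration on the caller must be matched by an
    inbound declaration on the callee naming that caller. Unmatched
    declarations produce no policy, so default-deny blocks traffic.
    """
    allowed = []
    for caller, spec in services.items():
        for callee, dep in spec.get("resources", {}).items():
            if dep.get("type") != "service" or dep.get("direction") != "outbound":
                continue
            inbound = services.get(callee, {}).get("resources", {}).get(caller, {})
            if inbound.get("type") == "service" and inbound.get("direction") == "inbound":
                allowed.append((caller, callee))
    return sorted(allowed)


services = {
    "checkout": {"resources": {"orders": {"type": "service", "direction": "outbound"}}},
    "orders":   {"resources": {"checkout": {"type": "service", "direction": "inbound"}}},
    # billing declared its side, but orders never declared billing inbound:
    "billing":  {"resources": {"orders": {"type": "service", "direction": "outbound"}}},
}
print(match_bilateral(services))  # only the checkout -> orders pair; billing stays blocked
```

Because the function only inspects declarations, its output is the same regardless of the order services are processed, which is the ordering-independence property discussed below.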
The honest cost: bilateral agreements create a deploy-time cross-team dependency. You ship your service, and nothing works until the other team adds their side of the declaration. That’s a Slack message, a PR, maybe a 2-hour wait. It’s annoying. But in any environment with real compliance requirements, that coordination was going to happen anyway — through a security review ticket instead of a one-line YAML change. The bilateral model makes the coordination explicit and fast. Chapter 13 covers this friction in depth.
Circular dependencies (A depends on B, B depends on A) are valid in this model — both sides declare both directions, and the platform produces policies for each. The converged output is ordering-independent: running the derivation twice produces the same result regardless of the order services are processed. The graph doesn’t need to be acyclic because the platform isn’t traversing a build dependency chain — it’s matching bilateral declarations. But the convergence path is ordering-dependent: if Service A is reconciled before Service B exists, A’s status shows “no matching inbound declaration found” until B is processed. The system is eventually consistent — intermediate states during partial reconciliation may show unmatched dependencies that resolve on the next cycle.
2.4 The Reconciliation Loop
Whether you call it compilation, derivation, or transformation, the platform takes intent and produces infrastructure. The mechanism that makes this continuous rather than one-shot is the Kubernetes reconciliation loop.
The controller watches CRD specs and the resources it has produced. When a spec changes, it re-derives the infrastructure. When a produced resource drifts from the desired state, it corrects it. The loop runs on events, on schedule, and on errors.
The pattern has six steps:
1. Observe. Read the CRD spec and the current state of the cluster.
2. Derive desired state. From the spec, produce the full set of infrastructure resources that should exist.
3. Diff. Compare desired state to current state.
4. Apply. Make one change to move current toward desired.
5. Update status. Report the result to the CRD’s status subresource.
6. Requeue. Schedule the next reconciliation.
In pseudocode:
```
function reconcile(service_crd):
    spec = service_crd.spec

    // Phase 1: Authorization
    for gate in [DeployImage, AccessSecret, OverrideSecurity, ...]:
        result = policy_engine.evaluate(gate, spec)
        if result == DENY:
            set_status(service_crd, Failed, gate.denial_reason)
            return requeue_after(60s)

    // Phase 2: Derive infrastructure from intent
    desired_resources = []
    desired_resources.append(derive_deployment(spec))
    desired_resources.append(derive_service(spec))
    desired_resources.append(derive_network_policies(spec, all_services))  // indexed cache, not API list
    desired_resources.append(derive_external_secrets(spec))
    desired_resources.append(derive_scrape_target(spec))
    desired_resources.append(derive_pdb(spec))

    // Phase 3: Apply
    current_resources = get_owned_resources(service_crd)
    diff = compute_diff(current_resources, desired_resources)
    apply_one_change(diff)

    // Phase 4: Status
    set_status(service_crd, Ready, compiled_count=len(desired_resources))
    return requeue_after(60s)
```

Note the all_services parameter in derive_network_policies. The controller watches all LatticeService CRDs — not just the one being reconciled — because bilateral network agreements require matching declarations across services. A change to service B’s spec can trigger re-derivation of service A’s network policies. The reconciliation model is graph-based, not per-resource.
A performance note: all_services is not a live API server query on every reconciliation. Real controllers use indexed in-memory caches (Kubernetes informers or equivalent) that mirror the API server’s state via watch streams. The bilateral matching requires two custom indexes: a forward index (what does this service depend on?) for O(1) dependency lookups during derivation, and a reverse index (which services depend on this one?) so that a change to service B’s spec triggers re-derivation of the services that reference B — not every service in the cluster. Without the reverse index, every spec change triggers fleet-wide reconciliation. With it, only affected services are re-derived. At 500 services, bilateral matching is sub-millisecond. At 5,000+, cache memory and watch fan-out become real concerns — but the derivation itself remains fast.
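The two indexes can be sketched as maps rebuilt from the cached specs. A hypothetical sketch in Python; the informer wiring that keeps the cache current is omitted:

```python
from collections import defaultdict


def build_indexes(services: dict):
    """Build a forward index (service -> its service dependencies)
    and a reverse index (service -> services that depend on it)."""
    forward = {}
    reverse = defaultdict(set)
    for name, spec in services.items():
        deps = {
            dep_name
            for dep_name, dep in spec.get("resources", {}).items()
            if dep.get("type") == "service"
        }
        forward[name] = deps
        for dep_name in deps:
            reverse[dep_name].add(name)
    return forward, reverse


def affected_by_change(changed: str, reverse) -> set:
    """Services to re-derive when `changed`'s spec is updated:
    the service itself plus everything that references it."""
    return {changed} | reverse.get(changed, set())


services = {
    "checkout": {"resources": {"orders": {"type": "service", "direction": "outbound"}}},
    "billing":  {"resources": {"orders": {"type": "service", "direction": "outbound"}}},
    "orders":   {"resources": {}},
}
forward, reverse = build_indexes(services)
print(affected_by_change("orders", reverse))  # re-derive orders, checkout, billing; not the fleet
```

Without the reverse lookup, the only safe response to a spec change is re-deriving every service, which is exactly the fleet-wide reconciliation the text warns about.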
This continuous loop is what separates a platform from a pipeline. A CI/CD pipeline derives infrastructure from intent at deploy time — once. Between deploys, drift is invisible. The reconciliation loop derives continuously. If someone deletes a NetworkPolicy, the next reconciliation recreates it. If the platform team changes the policy model, every service’s infrastructure is re-derived on the next cycle. The platform’s guarantees are not point-in-time — they hold continuously.
In practice, people do sometimes edit derived resources directly during debugging or emergencies. The reconciliation loop reverts these edits on the next cycle. This is deliberate: the CRD spec is the source of truth, and derived resources are always regenerated from it. The platform should provide adequate debugging tools (detailed status, a CLI that shows derived output) so that direct editing is unnecessary for diagnosis.
2.5 The Intent/Infrastructure Contract
The separation of intent from infrastructure creates a contract between the platform team and the application teams.
The developer’s side: Express what your service needs through the CRD spec. Containers, dependencies, secrets, resource requests, replica count. These are things only you know.
The platform’s side: Derive the correct infrastructure from the developer’s intent. Network policies, authorization policies, external secrets, scrape targets, disruption budgets. These are things the platform knows how to produce for the current tool stack, the current policy model, and the current cluster configuration.
The contract’s power is that intent is stable across implementation changes. The platform team can upgrade Cilium, switch metrics backends, adopt new compliance requirements, add enforcement layers — and the developer’s spec doesn’t change. The spec says “I depend on orders and I read orders-db.” That statement was true a year ago, it’s true today, and it will be true next year. What the platform derives from it — which network policy format, which secret sync mechanism, which compliance controls — changes as the platform evolves. The developer’s declaration of intent survives all of it.
This is the property that makes a platform maintainable at scale. When NIST 800-53 adds a new control, the platform team updates the derivation logic to produce the required resources. Two hundred services are re-derived on the next reconciliation cycle. No developer files a ticket. No spec is modified. The compliance requirement changed; the intent didn’t.
One caveat: this works as long as the CRD schema itself is stable. When the schema changes — v1alpha1 to v1beta1 — every spec must be migrated. The platform can automate this (conversion webhooks, migration controllers), but schema evolution is a real cost. Chapter 23 covers this honestly.
Consider a concrete migration. The platform team switches from VictoriaMetrics to Prometheus. In an infrastructure-description model, every service’s values file references VictoriaMetrics-specific resources (VMServiceScrape). Every team must update to Prometheus-specific resources (ServiceMonitor). This is a fleet-wide migration.
In an intent-description model, no service’s spec mentions the metrics backend. The platform team updates the derivation logic: produce ServiceMonitor instead of VMServiceScrape. On the next reconciliation, every service’s infrastructure is re-derived with the new resource type. No developer changes anything. Most developers never know it happened.
To be honest about the limits: this covers the derived resource migration. A full metrics backend migration also involves query language changes, alerting rule migration, and dashboard updates. The intent model eliminates one category of migration cost (updating every service’s spec) while leaving others. This is a genuine advantage, not a complete solution.
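In code, the migration reduces to changing one derivation function behind a platform-level setting. A hypothetical sketch with abbreviated resource shapes; the backend name is platform configuration, never part of any service's spec:

```python
def derive_scrape_target(spec: dict, metrics_backend: str) -> dict:
    """Emit the scrape resource for the cluster's configured backend.

    The developer's spec never mentions metrics. The platform team
    flips `metrics_backend` in platform config, and every service's
    scrape resource is re-derived on the next reconciliation cycle.
    """
    target = {"port": "metrics", "interval": "30s"}
    if metrics_backend == "prometheus":
        return {"kind": "ServiceMonitor", "spec": target}
    return {"kind": "VMServiceScrape", "spec": target}


spec = {"replicas": 3}  # nothing metrics-related in the developer's intent
print(derive_scrape_target(spec, "victoriametrics")["kind"])  # VMServiceScrape
print(derive_scrape_target(spec, "prometheus")["kind"])       # ServiceMonitor
```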
2.6 The Costs
The intent-based model has real costs.
You must build the derivation logic. This is a controller that watches CRDs, evaluates policies, resolves secrets, derives network policies across the service graph, and manages resource lifecycle. It is a stateful, distributed program running in production — comparable in scope to building a production CI/CD system. It needs testing, observability, and operational discipline. Chapter 21 covers this.
You must design the intent language. The CRD schema must be expressive enough to capture real workloads and constrained enough that the platform can derive correct infrastructure from any valid spec. This is harder than designing an infrastructure-description schema, because the derivation must handle every combination of fields the schema permits. Chapter 4 covers CRD design.
You own the derived output — and its blast radius. When the platform produces a broken NetworkPolicy, that’s the platform team’s bug — not the developer’s misconfiguration. A common objection: “a Helm chart bug only affects teams that pull that version — a controller bug affects everyone at once.” This is true if you treat the controller like a Helm chart. You shouldn’t. The controller owns the reconciliation loop, so it can implement canary rollouts natively: re-derive 5% of services, verify health metrics, expand to 100%. This is strictly better than Helm’s accidental blast radius limiter (version adoption lag), because it’s intentional and observable. But you have to build it — and until you do, the default posture is full blast radius. A naive controller reconciles every matching CRD on every cycle; a bug in the derivation logic hits every service in the cluster on the next reconciliation. This is a genuine trade-off that the platform team must address early, not defer as an optimization. The canary mechanism is not a nice-to-have — it is load-bearing infrastructure that makes centralized derivation safe at scale. Without it, you’ve traded Helm’s distributed failure modes for a centralized one that’s faster and more severe. Chapter 21 covers the implementation.
Escape hatches are harder. When a developer needs infrastructure the platform doesn’t derive from intent — a custom volume, a host-network pod, a non-standard probe — they can’t just add a field. They need an escape hatch mechanism (Chapter 11) or a schema extension.
The controller must perform — and must be operated. This is production software running on the critical path. It needs its own monitoring (reconciliation latency, queue depth, error rate), its own alerting, its own upgrade strategy, and its own incident response. When it breaks at 3 AM, there is no community Slack channel — your team owns it. This is the operational cost of building custom infrastructure, and it’s real. Helm + ArgoCD are maintained by large open-source communities. Your controller is maintained by your team. Platform engineers are software engineers; the controller is their production system, and they must operate it like one.
The trade-off: The intent model moves complexity from many teams to one team. It doesn’t reduce total complexity — it concentrates it where it can be solved with software engineering instead of process. The developer’s experience is simpler. The platform team’s operational surface is larger. Whether this is worth it depends on scale (more services = more value from consistent derivation), risk tolerance (regulated industries pay more for the infrastructure-description model’s gaps), and platform team capability (you need engineers who can build and operate production software, not just configure tools).
2.7 Looking Ahead
The distinction between intent and infrastructure raises a question: what should the derivation logic be written in? The developer’s spec is YAML — that’s appropriate for declaring intent. But the logic that reads the spec, evaluates policies, resolves secrets, and produces Kubernetes resources is not data. It’s code. Chapter 3 makes the case for writing it as such.
Exercises
2.1. [M10] A Helm chart includes a NetworkPolicy with networkPolicy.enabled: true as the default. A platform derives a NetworkPolicy from the service’s dependency declarations. Describe a scenario where the Helm approach produces a correct NetworkPolicy and the platform produces an incorrect one. What does this tell you about where bugs live in each model?
2.2. [H30] This chapter argues that a CRD spec is equivalent to a values file — both are YAML with fields that the developer fills in. Identify the precise property that makes one an “intent description” and the other an “infrastructure description.” Is this property inherent in the technology (CRDs vs Helm), or could you build an intent-based platform on Helm? If so, what would it look like? What would you lose compared to CRDs?
2.3. [R] Section 2.4 describes continuous reconciliation as the mechanism that makes the platform’s guarantees hold over time, not just at deployment. Consider an alternative: the derivation runs once per spec change, and a separate GitOps tool detects and corrects drift. Under what conditions is this equivalent to continuous reconciliation? Under what conditions does it fail? Is the reconciliation loop essential to the intent model, or is it an optimization?
2.4. [H30] Section 2.5 argues that the intent/infrastructure contract allows the platform to evolve independently of its users. Consider a platform that changes its network policy model from unilateral to bilateral. The CRD schema doesn’t change — direction: outbound means the same thing. But the derived behavior changes: now both sides must declare for traffic to flow. Is this a breaking change? The schema is the same. The semantics are different. How should the platform handle this transition?
2.5. [R] The chapter argues that cross-service reasoning (bilateral agreements, dependency graph) is the key capability enabled by intent-based specs. Critique this claim. Could you achieve cross-service reasoning with infrastructure-level descriptions? For example: a controller that watches all NetworkPolicies in the cluster and validates that egress and ingress rules are consistent. Would this provide the same guarantees as deriving both from intent? What are the differences?
2.6. [H30] Section 2.6 states that the platform team “owns the derived output.” Model this concretely: the platform derives network policies for 300 services. A bug in the derivation logic produces an egress rule that’s too broad — it allows traffic to services the developer didn’t declare as dependencies. Describe the blast radius and recovery path. Compare this to a similar bug in a Helm chart used by 300 teams. In which model is the bug more dangerous? In which is it faster to fix?
2.7. [M10] The reconciliation loop applies one change per cycle. Why not apply all changes at once? Construct a scenario where applying all changes simultaneously produces a worse outcome than applying them incrementally.
2.8. [R] This chapter claims the distinction between intent and infrastructure is what matters — not the distinction between CRDs and Helm, or controllers and templates. Test this claim against: (a) a Helm chart whose values file contains only intent-level fields (image, replicas, dependency names) and whose templates derive all infrastructure, (b) a CRD whose spec mirrors Kubernetes Deployment fields one-to-one, (c) Crossplane Compositions. Which of these are intent-based? Which are infrastructure-based? Does the technology determine the abstraction level, or does the schema design?