Chapter 3: Code Over Configuration

Chapter 2 established the distinction between intent and infrastructure — the developer declares what they need, the platform derives the rest. The question that follows: what should the platform itself be built from?

The Kubernetes ecosystem’s default answer is configuration. Helm charts are Go templates over YAML. Kustomize is YAML patches over YAML. Jsonnet is a configuration language that produces JSON (which is YAML-compatible). Even CRD controllers are often generated from YAML-like specifications using code generators. The ecosystem has a deep bias toward configuration languages and away from general-purpose programming.

This chapter argues that bias is wrong. Your platform should be a program — written in a real programming language, with types, tests, abstractions, and the full power of a mature software engineering ecosystem. Configuration languages feel simpler up front. Once the logic gets complex — and derivation logic always gets complex — they cost more to maintain, test, and evolve than a real programming language.

This is not an argument against YAML for the developer-facing spec. The CRD schema (Chapter 4) should absolutely be YAML — it’s the right format for declaring intent. The argument is about what processes that YAML: should the derivation logic, the policy evaluation, the secret routing, the network policy generation be written in a configuration language or a programming language?

3.1 Why Configuration Languages Are Popular

Configuration languages are popular because they’re genuinely easy to start with. A Helm chart template is a YAML file with some {{ }} expressions. A Kustomize overlay is a YAML patch. A Jsonnet file is a JSON-like structure with functions. The learning curve is shallow. The first 20 lines are easy to write.

The problems emerge at line 200.

No type system. A Helm chart has no types. .Values.replicas might be an integer, a string, or absent. The template doesn’t know until it renders. A typo in a value name — .Values.replicsa — produces a silent empty string, not a compile error. The bug manifests at deploy time (or later, when someone notices the Deployment has 0 replicas), not at authoring time.

No testing. You can test a Helm chart by rendering it and checking the output. This is integration testing — you’re testing the template engine, the values, and the template together. You cannot unit test a single function in a Helm chart because Helm charts don’t have typed, testable functions. A complex conditional in a template — “if this value is set and that value is not set and the other value matches this regex” — is either correct or it isn’t, and the only way to know is to render every combination of values.

No abstraction. Helm charts have _helpers.tpl for shared template fragments. This is string concatenation, not abstraction. You can pass data to a fragment via include with dict arguments, but the mechanism is stringly-typed — no compile-time validation, no composable interfaces. You can’t build a library of reusable logic that multiple charts depend on with versioned interfaces. Every chart is an island.

No refactoring. Rename a value in a Helm chart and you must find every reference in every template file by text search. There is no IDE support for “rename symbol.” There is no compiler that tells you which references you missed. A Kustomize base that changes a field name silently breaks every overlay that references the old name.

No error handling. A Helm template that encounters an unexpected value can {{ fail "message" }} — but this is a render-time error, not a type error. The error message is a string the template author wrote. There is no stack trace, no structured error type, no way to catch the error and handle it differently in different contexts.

No state. A Helm template is a pure function of its values — it can’t query external systems during rendering. A controller written in code can call an IPAM to allocate an IP, query a metrics backend for right-sizing data, or check an asset inventory during reconciliation. This isn’t a theoretical advantage — it’s what makes features like derivation-time quota checking (Chapter 10) and cross-cluster bilateral matching (Chapter 13) possible. Templates can’t do this because they have no concept of “the current state of the world.”

These limitations are manageable for small, simple templates. They become engineering liabilities for platform-scale derivation logic that handles policy evaluation, cross-service graph traversal, secret routing through five different paths, and compliance verification.

A platform’s derivation logic written in a general-purpose language (Rust, Go, TypeScript, Python — the choice matters less than the decision to use one) provides:

A type system. The CRD spec is a typed struct. replicas is a u32, not a maybe-integer maybe-string maybe-absent value. A dependency is a ResourceSpec with a ResourceType enum (Service, Secret, Volume, ExternalService, Gpu). The compiler rejects invalid field access at build time. A typo in a field name is a compile error, not a silent empty value.

pub struct LatticeServiceSpec {
    pub workload: WorkloadSpec,
    pub replicas: u32,
    pub autoscaling: Option<AutoscalingSpec>,
    // ...
}

pub enum ResourceType {
    Service,
    Secret,
    Volume,
    ExternalService,
    Gpu,
}

When you write derivation logic against these types, the compiler guarantees you’ve handled every variant. A match on ResourceType that doesn’t cover Gpu is a compile error. A function that expects an AutoscalingSpec can’t be called with a String. The type system catches an entire class of bugs that configuration languages defer to runtime.
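A minimal sketch of this guarantee, with the ResourceType enum restated so the example stands alone. The default_port function is purely illustrative (it is not part of the reference implementation); the point is that removing any arm from the match turns into a compile error, not a runtime surprise.

```rust
#[derive(Debug, PartialEq)]
enum ResourceType {
    Service,
    Secret,
    Volume,
    ExternalService,
    Gpu,
}

// Illustrative only: a default port per resource type, where one makes sense.
// The compiler rejects this function if any enum variant is left unhandled.
fn default_port(resource: &ResourceType) -> Option<u16> {
    match resource {
        ResourceType::Service => Some(80),
        ResourceType::ExternalService => Some(443),
        ResourceType::Secret | ResourceType::Volume | ResourceType::Gpu => None,
    }
}
```

Add a sixth variant to the enum and every match over it fails to compile until the new case is handled — the opposite of a Helm template, which renders happily with an unknown value.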

Unit tests. The derivation logic is a function: given this spec, produce these resources. You can test each derivation phase independently. “Given a service with an outbound dependency on payments, the network policy derivation produces a CiliumNetworkPolicy with egress to payments’ label selector.” This is a unit test. It runs in milliseconds, with no cluster, no API server, no deployment.

#[test]
fn outbound_dependency_produces_egress_policy() {
    // service_spec_with_dependency and all_services are test fixtures
    // standing in for a full spec and the rest of the service graph.
    let spec = service_spec_with_dependency("payments", Direction::Outbound);
    let policies = derive_network_policies(&spec, &all_services);
    assert!(policies.iter().any(|p| p.is_egress_to("payments")));
}

You can’t write this test in Helm. You can test Helm’s rendered output, but you can’t isolate the network policy logic from the template rendering engine, from the values parsing, from the YAML serialization. In a programming language, you test the logic directly.

Abstraction and reuse. The shared WorkloadSpec type is used by LatticeService, LatticeJob, and LatticeModel. The container derivation logic, the secret resolution logic, and the resource validation logic are shared functions called by all three CRD controllers. A bug fix in the shared secret resolution code fixes it for services, jobs, and models simultaneously. In Helm, each chart has its own copy of similar-but-not-identical template logic. A bug fix in one chart doesn’t propagate to the others.
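The mechanics of that sharing can be sketched in a few lines. All names here are illustrative stand-ins for the chapter's types: one derive_container helper, two call sites, and any fix to the helper reaches both controllers at once.

```rust
#[derive(Clone, Debug, PartialEq)]
struct WorkloadSpec {
    image: String,
}

#[derive(Debug, PartialEq)]
struct Container {
    image: String,
}

// The single shared helper. A bug fix here propagates to every caller.
fn derive_container(workload: &WorkloadSpec) -> Container {
    Container { image: workload.image.clone() }
}

struct LatticeServiceSpec { workload: WorkloadSpec }
struct LatticeJobSpec { workload: WorkloadSpec }

// Two controllers, one implementation of container derivation.
fn service_containers(spec: &LatticeServiceSpec) -> Vec<Container> {
    vec![derive_container(&spec.workload)]
}

fn job_containers(spec: &LatticeJobSpec) -> Vec<Container> {
    vec![derive_container(&spec.workload)]
}
```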

Refactoring. Rename a field in the WorkloadSpec struct and the compiler tells you every file that references the old name. The IDE finds all usages. The rename is atomic — either every reference is updated or the build fails. In configuration languages, renaming is a text search with hope.

Error handling. The derivation logic can return structured errors: PolicyDenied { gate: AccessSecret, resource: "payments/api-key", reason: "no permit policy found" }. The controller catches this error, writes it to the CRD status, and the developer sees a specific, actionable message. In Helm, the error is {{ fail "something went wrong" }}.
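A sketch of what a structured error can look like, assuming hypothetical names rather than the reference implementation's actual types. The caller can match on the variant, render it into a CRD status condition, or assert on it in a test — none of which is possible with a string from {{ fail }}.

```rust
// Hypothetical structured error type for derivation failures.
#[derive(Debug, PartialEq)]
enum DerivationError {
    PolicyDenied { gate: String, resource: String, reason: String },
}

// Illustrative gate check: `permitted` stands in for a real policy evaluation.
fn check_secret_access(resource: &str, permitted: bool) -> Result<(), DerivationError> {
    if permitted {
        Ok(())
    } else {
        Err(DerivationError::PolicyDenied {
            gate: "AccessSecret".to_string(),
            resource: resource.to_string(),
            reason: "no permit policy found".to_string(),
        })
    }
}
```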

Ecosystem. A programming language gives you package managers, dependency versioning, linting, formatting, documentation generators, benchmarking, profiling, and the entire tooling ecosystem the language community has built over decades. Configuration languages have a fraction of this.

This argument applies to validation as well as derivation. Many teams keep validation in YAML-based tools (Kyverno policies, OPA Rego) because the syntax feels lighter. For simple rules (“every Deployment must have resource limits”), this works. For complex validation (“this service’s secret references must match existing SecretProvider backends, and the referenced keys must exist in the backend, and the Cedar policy must permit access”), the validation logic is as complex as the derivation logic — and benefits from the same treatment: types, tests, structured errors. The reference implementation’s admission webhook is code, not Kyverno policies, for this reason.
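A minimal sketch of validation-as-code, with illustrative names and a deliberately simplified rule: every secret reference must name a known backend. The real checks described above are more involved, but they have this same shape — a typed function returning a structured error that can be unit tested without a cluster.

```rust
// Hypothetical validation error; a real webhook would carry more context.
#[derive(Debug, PartialEq)]
enum ValidationError {
    UnknownSecretBackend { reference: String },
}

// Simplified rule: a reference like "vault/payments/api-key" must start
// with the name of a configured SecretProvider backend.
fn validate_secret_refs(
    refs: &[&str],
    known_backends: &[&str],
) -> Result<(), ValidationError> {
    for reference in refs {
        let backend = reference.split('/').next().unwrap_or("");
        if !known_backends.contains(&backend) {
            return Err(ValidationError::UnknownSecretBackend {
                reference: reference.to_string(),
            });
        }
    }
    Ok(())
}
```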

The historical objection to “write it in a real language” was the learning curve. Not every infrastructure engineer is a software engineer. Helm templates are accessible to someone who knows YAML and can learn a few {{ }} constructs. Writing a Kubernetes controller in Rust or Go requires understanding async programming, API clients, reconciliation patterns, and error handling.

This objection has weakened significantly. AI coding tools — language models that generate, explain, and refactor code — have reduced the effective learning curve for general-purpose languages. An infrastructure engineer who has never written Rust can describe what they want (“a function that takes a service spec and produces a CiliumNetworkPolicy with egress rules for each outbound dependency”) and get working code that they review, test, and iterate on.

This is not a claim that AI replaces understanding. The engineer must still understand the reconciliation model, the Kubernetes API, and the platform’s derivation logic. What AI tools eliminate is the syntax barrier — the gap between knowing what you want to express and knowing the language mechanics to express it. That gap was the primary argument for configuration languages, and it’s closing fast.

The honest objection. Writing code is 10% syntax and 90% operational context. An AI can generate a Rust function, but it won’t debug the Arc<RwLock<T>> deadlock at 3 AM or diagnose why the async runtime panicked under load. You are effectively asking infrastructure engineers to become software engineers — and many will resist or struggle. This is a real cultural and skill-gap cost. The mitigation is not “AI will handle it” — it’s team composition. The platform team needs at least one strong software engineer who owns the controller codebase. The rest of the team can contribute through AI-assisted development, code review, and testing. But someone must understand the runtime deeply. If nobody on the team can debug the controller under pressure, the platform is a liability, not an asset.

More importantly, AI tools are better at assisting with typed, structured code than with configuration languages. An AI can understand a Rust struct, generate correct field access, suggest match arms for an enum, and write tests against typed interfaces. It struggles with Helm templates because the template language has no types, no structure, and no semantic model for the AI to reason against. The tools that make programming languages accessible work because programming languages have the properties that configuration languages lack.

The practical consequence: the upfront investment in writing platform logic as code — rather than configuration — is smaller than it used to be and shrinking. The long-term return — types, tests, refactoring, abstraction, AI assistance — is larger than it has ever been.

3.4 Configuration Languages Have Their Place


This is not an argument against all configuration. It’s an argument about where the boundary should be.

The developer-facing spec should be YAML. The CRD schema is a configuration interface. Developers declare intent in YAML because YAML is universally understood, requires no build step, and can be written in any editor. This is the right format for the input. Chapter 4 covers how to design it.

Platform configuration should be YAML (or similar). Policy rules, quota definitions, secret backend configuration, cluster parameters — these are platform team configurations that change infrequently and benefit from the simplicity of declarative formats. A Cedar policy file, a YAML-based quota definition, or a ConfigMap with cluster settings are all appropriate uses of configuration.

The derivation logic should be code. The logic that reads a CRD spec, evaluates policies, resolves secrets, derives network policies across the service graph, and produces Kubernetes resources — this should be a program. It has conditional logic, error handling, cross-resource reasoning, and compositional structure that configuration languages handle poorly and programming languages handle well.
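The overall shape of that program can be sketched as a pure function from typed intent to a list of desired resources. Everything here is an illustrative simplification (the real pipeline spans policy, secrets, and the service graph), but the shape is the point: pure input to output, and therefore trivially unit-testable.

```rust
// Simplified stand-ins for real Kubernetes resources.
#[derive(Debug, PartialEq)]
enum Resource {
    Deployment { name: String, replicas: u32 },
    NetworkPolicy { name: String, egress_to: Vec<String> },
}

// Simplified stand-in for the developer's declared intent.
struct ServiceSpec {
    name: String,
    replicas: u32,
    outbound: Vec<String>,
}

// The derivation pipeline in miniature: intent in, resources out.
fn derive(spec: &ServiceSpec) -> Vec<Resource> {
    let mut resources = vec![Resource::Deployment {
        name: spec.name.clone(),
        replicas: spec.replicas,
    }];
    // Only services with outbound dependencies get an egress policy.
    if !spec.outbound.is_empty() {
        resources.push(Resource::NetworkPolicy {
            name: format!("{}-egress", spec.name),
            egress_to: spec.outbound.clone(),
        });
    }
    resources
}
```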

The boundary: data is configuration, logic is code. The developer’s intent is data — declare it in YAML. The platform’s opinions are data — declare them in policy files and config maps. The transformation from intent to infrastructure is logic — write it in a programming language.

An honest tension: Cedar policies have when clauses, attribute matching, and conditional logic. They’re logic expressed as configuration — exactly what this chapter argues against for derivation code. The distinction is scope: Cedar policies express focused authorization decisions (permit/forbid for a specific resource). The derivation pipeline expresses complex multi-step transformations across the service graph. Policy languages are appropriate for the former because the logic is bounded and domain-specific. General-purpose languages are required for the latter because the logic is unbounded and compositional. The boundary isn’t “all logic in code” — it’s “use the right language for the complexity of the logic.”

The book’s reference implementation uses Rust. This is not a prescription. The principles of this chapter — types, tests, abstraction, refactoring — apply in any general-purpose language. The choice depends on your team’s expertise, your performance requirements, and your ecosystem preferences.

Go is the Kubernetes ecosystem’s lingua franca. controller-runtime and client-go are mature, well-documented, and widely used. Most existing Kubernetes controllers are written in Go. If your team already writes Go, this is the path of least resistance.

Rust offers stronger type safety (no nil panics, exhaustive pattern matching, ownership-based memory safety) and better performance (no garbage collector). The kube-rs library provides a controller runtime comparable to controller-runtime in Go. The trade-off is a steeper learning curve and a smaller ecosystem of Kubernetes-specific libraries.

TypeScript (via the Kubernetes JavaScript client and frameworks like cdk8s) is accessible and fast to prototype with. The type system is weaker than Rust’s or Go’s, but far stronger than any configuration language. Suitable for platforms that prioritize developer velocity over runtime performance.

Python (via kopf or the Kubernetes Python client) has the lowest learning curve. It lacks the type safety and performance of compiled languages but is adequate for smaller platforms where development speed matters more than runtime characteristics.

The choice of language is less important than the choice to use a language. A platform controller written in any of these is more testable, more maintainable, and more correct than the equivalent logic expressed in Helm templates or Kustomize overlays.

Part I’s foundation is now:

  • Chapter 1: The platform problem — developers shouldn’t assemble infrastructure from primitives.
  • Chapter 2: Intent over infrastructure — the developer declares what they need, the platform derives the rest, and the intent survives compliance and tooling changes.
  • Chapter 3: Code over configuration — the derivation logic should be a program, not a template, because programs have types, tests, abstractions, and AI-assisted development.
  • Chapter 4: Designing the API surface — how to design the CRD schema that captures developer intent.

Chapter 4 takes on the CRD schema — the YAML interface that developers write against. The schema is configuration (appropriately). The logic that processes it is code (also appropriately). The boundary between them is the subject of the next chapter.

3.1. [M10] A Helm chart has a conditional: if .Values.networkPolicy.enabled is true, render a NetworkPolicy; if .Values.networkPolicy.egress is set, add egress rules; if .Values.networkPolicy.egressPorts is also set, use port-specific rules instead of the default. Write the equivalent logic as a typed function in pseudocode (or any language). Compare the two: which one makes the edge cases visible? Which one can be unit tested? Which one will an AI tool generate more reliably?

3.2. [H30] An organization has 50 Helm charts, each with 100-200 lines of template logic. They want to migrate to a code-based platform. Estimate the migration effort. Consider: how much template logic is duplicated across charts? How much is chart-specific? What is the testing burden for the Helm charts today vs. the code-based approach? At what point does the migration investment pay for itself in reduced maintenance?

3.3. [R] This chapter argues that AI tools are better at assisting with typed code than with configuration languages. Test this claim empirically. Take a complex Helm chart template (one with nested conditionals and cross-value dependencies) and the equivalent logic as a typed function. Ask an AI tool to (a) explain the logic, (b) add a new feature, (c) find a bug you introduce. Compare the quality of the AI’s assistance in each case. Does the type system help?

3.4. [H30] Section 3.4 draws the boundary: “data is configuration, logic is code.” Identify a case where this boundary is ambiguous. Cedar policies are data (policy rules) that encode logic (authorization decisions). Kyverno policies are YAML that express conditional logic. Are these configuration or code? Where exactly does the boundary lie, and what happens when you put logic into a configuration format?

3.5. [R] A team objects: “We’re infrastructure engineers, not software engineers. We know YAML and Bash. Asking us to write Rust or Go is unrealistic.” Evaluate this objection. Is it a permanent constraint or a transitional one? How does AI-assisted development change the calculus? What is the minimum programming language competence required to maintain a platform controller, and how does it compare to the competence required to maintain 50 Helm charts with 200 values each?

3.6. [M10] The chapter claims that renaming a field in a Helm chart requires a text search, while renaming a field in a typed struct gives you compiler errors. Construct a scenario where the Helm approach is actually safer — where the compiler-error approach introduces a risk that the text-search approach avoids.