
Chapter 9: Secrets

Every service needs secrets — database passwords, API keys, registry credentials. The standard approach on Kubernetes is runtime fetching: Vault sidecars, init containers, SDK calls. The application starts, fetches its secrets, and proceeds.

This works. But it has three problems that a platform should solve:

Coupling. The application knows about Vault (or AWS Secrets Manager, or the sidecar filesystem). Changing the secret backend means changing every application.

Startup ordering. The application starts before the secret is available. The sidecar hasn’t authenticated yet. The init container hasn’t finished. The application crash-loops until the secret arrives.

Authorization opacity. Who authorized this application to read this secret? With runtime fetching, the answer is “whoever configured the Vault role” or “whatever IAM policy the service account has.” These are separate authorization systems from the platform’s Cedar model.

The reference implementation eliminates all three: secrets are declared in the service spec, authorized at derivation time (Cedar AccessSecret), and resolved into standard Kubernetes resources before the pod starts. The application receives secrets through environment variables and file mounts — the same mechanisms it would use for non-secret configuration. It never knows how the secrets got there. The decisions — decouple applications from secret backends, resolve before startup, authorize through the platform’s policy engine — are universal. The specific routing paths and tools shown here are one valid implementation.

Here’s what the developer writes:

workload:
  containers:
    api:
      variables:
        STRIPE_KEY: "${resources.stripe.api-key}"
        DATABASE_URL: "postgres://app:${resources.orders-db.password}@db:5432/orders"
      files:
        /etc/app/config.yaml:
          content: |
            database:
              host: db.internal
              password: "${resources.orders-db.password}"
  resources:
    stripe:
      type: secret
      params:
        keys: [api-key]
    orders-db:
      type: secret
      params:
        keys: [password]

Three secret references, two secret declarations. The developer didn’t specify Vault paths, AWS ARNs, or which backend to use. The platform resolves all of it.

This chapter covers the design decisions behind this approach: how to choose between runtime and derivation-time resolution, how to handle the different ways secrets are consumed, how to abstract the backend, and how to handle rotation.

9.1 The Decision: Runtime Fetching or Derivation-Time Resolution?


This is the foundational choice. It determines whether the application is coupled to the secret backend.

Runtime fetching. The application (or a sidecar) calls the secret backend at startup. Vault Agent injects a sidecar that writes secrets to a shared volume. The CSI Secrets Store Driver mounts secrets directly from the backend. The application imports the Vault SDK and fetches at startup.

The advantage: secrets never pass through Kubernetes. No Secret object in etcd. No RBAC risk from Secret read access. The secret travels directly from the backend to the pod.

The disadvantage: the application knows about the backend. A Vault-to-AWS migration touches every application. Startup ordering is the application’s problem. Authorization happens at the backend layer (Vault ACLs), not at the platform layer (Cedar).

Derivation-time resolution. The platform resolves secret references at derivation time. The pipeline produces ExternalSecret resources. ESO syncs the secret from the backend into a Kubernetes Secret. The pod mounts the Secret through standard env vars or file mounts.

The advantage: the application is decoupled from the backend. The developer writes ${resources.orders-db.password} and the platform handles the routing. Backend migration is a platform-team change, not an application change. Authorization is at the platform layer (Cedar AccessSecret). Startup ordering is solved — the pipeline waits for ESO to sync before creating the Deployment (Chapter 8, Section 8.4).

The disadvantage: the secret value exists as a Kubernetes Secret in etcd. Anyone with RBAC read access to Secrets in the namespace can read it. Mitigation: enable KMS encryption at rest for etcd — cloud providers (EKS, GKE) handle this automatically; on-premises deployments need an explicit KMS provider configuration. For most organizations, encrypted etcd is sufficient. For organizations that can’t accept secrets in etcd under any circumstances, the runtime approach (CSI Secrets Store Driver) is necessary.

The reference implementation uses derivation-time resolution. The trade-off (secrets in etcd) is acceptable for the gains (decoupling, authorization, startup ordering). The platform should support CSI as an alternative path for organizations that need it.
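For teams that need the runtime alternative, the CSI path is configured per workload rather than per platform. A sketch of a Vault-backed SecretProviderClass, assuming the Vault provider for the Secrets Store CSI Driver is installed; the resource name and Vault role here are illustrative, not part of the reference implementation:

```yaml
# Sketch: runtime-path alternative via the Secrets Store CSI Driver.
# Assumes the Vault CSI provider; names are illustrative.
apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: checkout-vault          # hypothetical name
spec:
  provider: vault
  parameters:
    vaultAddress: "https://vault.internal:8200"
    roleName: "checkout"        # Vault Kubernetes-auth role (assumed)
    objects: |
      - objectName: "orders-db-password"
        secretPath: "secret/data/orders-db"
        secretKey: "password"
```

The pod mounts this through a `csi` volume, and the secret value travels from Vault to the pod filesystem without ever becoming a Kubernetes Secret object — which is exactly the trade this path makes: no etcd exposure, but the workload spec is now coupled to the backend.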

9.2 The Decision: How to Handle Different Consumption Patterns


Secrets are consumed in different ways: as environment variables, as interpolated strings, as files, as image pull credentials. Each consumption pattern produces different Kubernetes resources. A platform that routes all secrets through one mechanism produces a lowest-common-denominator implementation that handles no case well.

The reference implementation routes through five paths based on how the secret is used:

Path 1: Pure secret variable. STRIPE_KEY: "${resources.stripe.api-key}" — the entire value is a single secret reference. Output: a secretKeyRef in the container’s env block. No ESO template needed — ESO syncs the key directly, kubelet injects it.
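The Path 1 output can be sketched as the env entry the pipeline emits into the container spec. The Secret name checkout-stripe is an assumed naming convention, not mandated by the text:

```yaml
# Path 1 sketch: a pure secret variable becomes a secretKeyRef.
# "checkout-stripe" is a hypothetical name for the ESO-synced Secret.
env:
  - name: STRIPE_KEY
    valueFrom:
      secretKeyRef:
        name: checkout-stripe
        key: api-key
```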

Path 2: Mixed-content variable. DATABASE_URL: "postgres://app:${resources.orders-db.password}@db:5432/orders" — secret embedded in a larger string. Output: an ExternalSecret with a Go template that interpolates the secret into the static content. This is the most common path — connection strings almost always mix secrets with non-secrets.
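A sketch of the Path 2 output, using ESO's target template to render the mixed-content value. The resource name and ClusterSecretStore reference are illustrative:

```yaml
# Path 2 sketch: mixed-content variable rendered by an ESO Go template.
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: checkout-orders-db      # hypothetical name
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: vault-production
  target:
    name: checkout-orders-db
    template:
      data:
        # Static content with the secret interpolated at sync time.
        DATABASE_URL: "postgres://app:{{ .password }}@db:5432/orders"
  data:
    - secretKey: password
      remoteRef:
        key: orders-db
        property: password
```

The container then consumes the rendered DATABASE_URL key with an ordinary secretKeyRef, so the application sees a complete connection string.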

Path 3: File mount. A config file with embedded secrets:

files:
  /etc/app/config.yaml:
    content: |
      database:
        host: db.internal
        password: "${resources.orders-db.password}"
      cache:
        token: "${resources.redis.auth-token}"

Output: an ExternalSecret with target.template.data that renders the file content with secrets interpolated. The resulting Secret is mounted as a file. This handles both pure-secret files and mixed-content files — the pipeline detects ${resources.*} references and routes accordingly.

This path matters because real applications have config files that mix static settings with secrets. Without it, developers split their config into secret and non-secret files — awkward and error-prone.
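A sketch of the Path 3 output. One detail worth noting: Go templates cannot reference keys containing hyphens with dot notation, so the sketch aliases each remote property to a camelCase secretKey. All names are illustrative:

```yaml
# Path 3 sketch: config file rendered with secrets interpolated.
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: checkout-app-config     # hypothetical name
spec:
  secretStoreRef:
    kind: ClusterSecretStore
    name: vault-production
  target:
    name: checkout-app-config
    template:
      data:
        config.yaml: |
          database:
            host: db.internal
            password: "{{ .ordersDbPassword }}"
          cache:
            token: "{{ .redisAuthToken }}"
  data:
    - secretKey: ordersDbPassword   # alias: Go templates can't do .orders-db
      remoteRef:
        key: orders-db
        property: password
    - secretKey: redisAuthToken
      remoteRef:
        key: redis
        property: auth-token
```

The resulting Secret's config.yaml key is then mounted into the pod at /etc/app/config.yaml through a volume.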

Path 4: Image pull secrets. A secret resource with secretType: kubernetes.io/dockerconfigjson:

workload:
  resources:
    registry-creds:
      type: secret
      params:
        secretType: kubernetes.io/dockerconfigjson

Output: an ExternalSecret that produces a Docker config JSON Secret, injected into the pod’s imagePullSecrets. The developer declares the secret; the pipeline handles the pod spec injection. The secretType field is unique to this path — it tells the pipeline to produce a typed Kubernetes Secret rather than the default Opaque type.
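A sketch of the Path 4 output, assuming the backend entry holds a ready-made Docker config JSON under a property named dockerconfig (the property name and resource names are assumptions):

```yaml
# Path 4 sketch: typed dockerconfigjson Secret for imagePullSecrets.
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: checkout-registry-creds   # hypothetical name
spec:
  secretStoreRef:
    kind: ClusterSecretStore
    name: vault-production
  target:
    name: checkout-registry-creds
    template:
      type: kubernetes.io/dockerconfigjson   # typed, not Opaque
      data:
        .dockerconfigjson: "{{ .dockerconfig }}"
  data:
    - secretKey: dockerconfig
      remoteRef:
        key: registry-creds
        property: dockerconfig    # assumed property name in the backend
```

The pipeline then adds imagePullSecrets: [{name: checkout-registry-creds}] to the pod spec; the developer never touches that field.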

Path 5: Bulk extraction. A secret with no explicit keys — meaning “sync all keys.” Output: an ExternalSecret with dataFrom.extract. ESO syncs every key from the backend entry. Used when key names aren’t known at derivation time (dynamically generated credentials, rotating keys).
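A sketch of the Path 5 output. No keys are enumerated; ESO extracts whatever the backend entry contains:

```yaml
# Path 5 sketch: sync every key under the backend entry.
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: checkout-orders-db-all   # hypothetical name
spec:
  secretStoreRef:
    kind: ClusterSecretStore
    name: vault-production
  target:
    name: checkout-orders-db-all
  dataFrom:
    - extract:
        key: orders-db   # all keys under this entry are synced
```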

graph TD
    Ref["Secret reference in spec<br/>${resources.*.key}"] --> P1[Path 1: Pure env var<br/>secretKeyRef]
    Ref --> P2[Path 2: Mixed-content env var<br/>ESO Go template]
    Ref --> P3[Path 3: File mount<br/>ESO template → volume mount]
    Ref --> P4[Path 4: imagePullSecrets<br/>dockerconfigjson Secret]
    Ref --> P5[Path 5: Bulk extraction<br/>dataFrom.extract]

The design principle: the consumption pattern determines the routing path, not the developer’s configuration. The developer writes STRIPE_KEY: "${resources.stripe.api-key}" and the pipeline detects it’s a pure secret variable (Path 1). The developer writes DATABASE_URL: "postgres://...${resources.orders-db.password}..." and the pipeline detects it’s mixed content (Path 2). The routing decision is derived from the intent, not configured separately.

These five paths are the reference implementation’s answer to ESO’s capabilities. Your platform may have different paths — if you use the CSI Secrets Store Driver instead of ESO, the routing is different (mount-based, no ExternalSecret objects). The transferable principle is: identify how your developers consume secrets, and build a routing path for each consumption pattern. If you only support env vars, you need one path. If you support env vars, files, and image pull credentials, you need three. The paths mirror the consumption surface.

9.3 The Decision: How to Abstract the Backend

Section titled “9.3 The Decision: How to Abstract the Backend”

The developer writes ${resources.orders-db.password}. The platform must resolve this to a real secret value from a real backend. The developer shouldn’t know which backend.

The reference implementation uses a SecretProvider CRD:

apiVersion: lattice.dev/v1alpha1
kind: SecretProvider
metadata:
  name: vault-production
spec:
  backend: vault
  config:
    server: https://vault.internal:8200
    path: secret/data
    authMethod: kubernetes

The platform team creates one SecretProvider per backend per environment. The SecretProvider controller reconciles this CRD into an ESO ClusterSecretStore — the platform’s abstraction over ESO’s backend configuration. The pipeline resolves each secret resource against the configured ClusterSecretStore. If the platform team migrates from Vault to AWS Secrets Manager, they update the SecretProvider. No service spec changes.
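A sketch of the ClusterSecretStore the controller might reconcile the SecretProvider above into, using ESO's Vault provider with Kubernetes auth. The Vault role name is an assumption:

```yaml
# Sketch: reconciled ESO ClusterSecretStore for the Vault backend.
apiVersion: external-secrets.io/v1beta1
kind: ClusterSecretStore
metadata:
  name: vault-production
spec:
  provider:
    vault:
      server: "https://vault.internal:8200"
      path: "secret"        # KV mount; the /data/ segment is implied by v2
      version: "v2"
      auth:
        kubernetes:
          mountPath: "kubernetes"
          role: "external-secrets"   # Vault role for ESO (assumed)
```

Swapping this provider block for an AWS Secrets Manager configuration is the entirety of a backend migration from the platform's side — no ExternalSecret or service spec changes.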

This is the intent/infrastructure separation from Part I applied to secrets. The developer’s intent is “I need the orders-db password.” The infrastructure detail is “it’s in Vault at secret/data/orders-db.” The intent survives the backend migration — as long as the key names are preserved (if Vault uses password and AWS uses db-password, the developer’s ${resources.orders-db.password} reference breaks).

The platform team must ensure key name compatibility during migration, or provide a key mapping layer in the SecretProvider. This is an infrastructure concern, not a developer concern.

9.4 The Decision: How to Handle Rotation

When a secret rotates in the backend, the platform propagates the change automatically.

ExternalSecrets have a refreshInterval. ESO re-syncs from the backend at this interval. If the value changed, ESO updates the Kubernetes Secret.

For file-mounted Secrets: the kubelet propagates the change to the pod’s filesystem — but not immediately. The propagation delay depends on the kubelet sync period plus the Secret cache TTL (combined default: up to roughly 1-2 minutes). In the worst case, the application reads a stale secret for up to the combined sync-plus-cache window after the Kubernetes Secret was updated. No restart needed, but the application must handle the delay.

For environment variable Secrets: a pod restart is required. Kubernetes doesn’t update env vars on running containers. The pipeline handles this: it computes a config hash from all Secrets and ConfigMaps. When ESO updates a Secret, the hash changes on the next reconciliation. The controller detects the change and triggers a rolling restart.
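The hash-based rollout can be sketched as an annotation on the Deployment's pod template. The annotation key is a hypothetical convention; the mechanism is what matters — any change to the pod template triggers a standard rolling update:

```yaml
# Sketch: config-hash annotation driving restarts on secret rotation.
spec:
  template:
    metadata:
      annotations:
        # SHA-256 over the data of every Secret and ConfigMap the pod
        # references. ESO updates a Secret -> the hash changes -> the
        # pod template changes -> Kubernetes rolls the Deployment.
        lattice.dev/config-hash: "<sha256 of referenced config>"
```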

The developer doesn’t manage rotation. The backend handles it (Vault dynamic secrets, AWS rotation lambdas). The platform propagates it. The application receives the new value through the same mechanism as the original.

The inconsistency between file mounts (no restart) and env vars (restart) is real. The developer’s choice of variables vs files in their spec implicitly determines whether secret rotation causes a restart. The platform should document this — and teams that can’t tolerate restarts for a specific secret should consume it as a file mount.

Scenario: the secret backend is unreachable. The developer applies a spec with two secrets. The pipeline passes authorization (Cedar permits both). The pipeline creates the ExternalSecrets. ESO tries to sync from Vault — but Vault is down (maintenance, network issue, authentication failure).

The pipeline polls for ESO sync (2-second interval, 120-second timeout). After 120 seconds, the sync hasn’t completed. The pipeline times out.

What should happen? The pipeline follows the all-or-nothing rule from Chapter 8: it doesn’t create the Deployment without the secrets (that would produce a pod that crash-loops on missing env vars). The status reports: phase: Failed, conditions: [SecretsResolved: False, reason: ExternalSecret checkout-orders-db did not sync within 120s. Check secret backend connectivity.]
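Rendered as a status block, the failure might look like the sketch below. The field layout follows common Kubernetes condition conventions; the exact shape and reason code are implementation details:

```yaml
# Sketch: status reported after the secret sync timeout.
status:
  phase: Failed
  conditions:
    - type: SecretsResolved
      status: "False"
      reason: SyncTimeout        # assumed reason code
      message: >-
        ExternalSecret checkout-orders-db did not sync within 120s.
        Check secret backend connectivity.
```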

The developer sees: their spec failed because the secret backend is unreachable. This is an infrastructure issue, not a spec issue. They contact the platform team. The platform team fixes Vault. On the next reconciliation (60 seconds later), the pipeline retries. ESO syncs successfully. The pipeline proceeds. The Deployment is created. The service reaches Ready.

The hidden dependency: ESO itself. The derivation pipeline depends on ESO to perform the sync. If the ESO controller pods are down (crashed, evicted, OOMKilled), every service that declares secrets stalls at the sync step. ESO’s health is as critical as the platform operator’s health — it’s on the critical path for every deployment. Monitor ESO pod readiness with the same urgency as the platform operator. A cluster where ESO is down is a cluster where no new service can deploy.

Scenario: the wrong secret value. The developer’s spec references ${resources.orders-db.password}. ESO syncs the key from Vault. The value is oldpassword123 — it was rotated in the database but not in Vault. The service starts, connects to the database with the wrong password, and crashes.

The platform can’t catch this. The pipeline verified that the secret exists and the service is authorized to access it. It can’t verify that the secret’s value is correct for the application’s use case. This is the boundary between platform responsibility (secret exists, authorized, synced) and application responsibility (secret value is correct for the application’s purpose).

Exercises

9.1. [M10] A developer writes LOG_LEVEL: "${resources.config.level}". The config resource has type: secret (stored in Vault) but the value is "info" — not sensitive. Should the pipeline route this through the secret path? Should there be a type: config for non-sensitive external values?

9.2. [H30] A file references secrets from two different SecretProviders — one in Vault, one in AWS. A single ExternalSecret can only reference one ClusterSecretStore. How should the pipeline handle this? Reject? Split into two ExternalSecrets and merge? What does the error message look like?

9.3. [R] Environment variable secrets require pod restarts on rotation. File-mounted secrets don’t. The developer doesn’t control which path the pipeline chooses — it’s derived from how they consume the secret. Is this inconsistency a problem? Should the platform always use file mounts? What are the trade-offs?

9.4. [H30] Design the unit test matrix for secret compilation. How many test cases for: each path, mixed paths in one spec, authorization denial for one secret but not others, backend unavailability, and the config hash rollout trigger?

9.5. [R] The backend abstraction says “the intent survives the migration.” A team migrates from Vault (key: password) to AWS (key: db-password). The developer’s ${resources.orders-db.password} breaks. Does the spec survive? What must the platform team ensure? Should the platform provide a key mapping layer?

9.6. [M10] Path 5 (bulk extraction) syncs all keys without knowing them at derivation time. The application expects key password but the secret contains db-password. When is the error discovered? How should the platform report it?