Infrastructure
LatticeCluster
Defines a Kubernetes cluster's infrastructure, node topology, and lifecycle. After provisioning via Cluster API (CAPI), the CAPI resources are pivoted into the new cluster, which then owns them and becomes self-managing.
group: lattice.dev, version: v1alpha1, scope: cluster
Example
cluster.yaml
apiVersion: lattice.dev/v1alpha1
kind: LatticeCluster
metadata:
  name: production
spec:
  providerRef: aws-prod
  provider:
    kubernetes:
      version: "1.32.0"
      bootstrap: kubeadm
    config:
      aws:
        region: us-west-2
  nodes:
    controlPlane:
      replicas: 3
      instanceType: m5.xlarge
    workerPools:
      general:
        replicas: 10
        instanceType: m5.2xlarge
      gpu:
        replicas: 2
        instanceType: g5.xlarge
        labels:
          workload: gpu
        taints:
          - key: nvidia.com/gpu
            effect: NoSchedule
  parentConfig:      # makes this a parent cluster
    host: "172.18.255.1"
    grpcPort: 50051
    bootstrapPort: 8443
    proxyPort: 8081
    service:
      type: LoadBalancer
  services: true     # Istio ambient + bilateral agreements
  gpu: true          # NFD + NVIDIA device plugin + HAMi
  monitoring: true   # VictoriaMetrics + KEDA
  backups: true      # Velero backup infrastructure
Spec
| Field | Type | Description |
|---|---|---|
| providerRef | string | Reference to a CloudProvider resource for credentials. |
| provider | ProviderSpec | Infrastructure provider configuration including K8s version, bootstrap provider, and provider-specific config. |
| nodes | NodeSpec | Node topology with control plane count and worker pools. |
| networking | NetworkingSpec? | Optional network configuration (CIDR pools). |
| parentConfig | EndpointsSpec? | Makes this a parent cluster capable of provisioning children. Includes gRPC, bootstrap, and proxy endpoints. |
| services | bool | Enable Istio ambient mesh and bilateral service agreements. Default: true. |
| gpu | bool | Enable GPU infrastructure (NFD + NVIDIA device plugin + HAMi). GPUs are discovered automatically by NFD from instance types. Default: false. |
| monitoring | bool | Enable monitoring infrastructure (VictoriaMetrics + KEDA for autoscaling). Default: true. |
| backups | bool | Enable backup infrastructure (Velero). Default: true. |
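As a rough sketch of how the defaults above interact, the manifest below omits the optional fields and overrides two of the boolean defaults. The names "staging" and "aws-staging" are hypothetical, and the node sizes are arbitrary; only the field names come from the table.

```yaml
# Illustrative only: "staging" and "aws-staging" are hypothetical names.
apiVersion: lattice.dev/v1alpha1
kind: LatticeCluster
metadata:
  name: staging
spec:
  providerRef: aws-staging
  provider:
    kubernetes:
      version: "1.32.0"
      bootstrap: kubeadm
    config:
      aws:
        region: us-west-2
  nodes:
    controlPlane:
      replicas: 1
      instanceType: m5.xlarge
    workerPools:
      general:
        replicas: 2
        instanceType: m5.xlarge
  # networking and parentConfig are optional and omitted here,
  # so this cluster is not a parent.
  # services and gpu are omitted and keep their documented
  # defaults (true and false respectively).
  monitoring: false   # overrides the default of true
  backups: false      # overrides the default of true
```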
ProviderSpec
| Field | Type | Description |
|---|---|---|
| kubernetes.version | string | Kubernetes version (e.g., "1.32.0"). |
| kubernetes.bootstrap | string | Bootstrap provider: kubeadm or rke2. |
| config.aws | object? | AWS-specific config: region, instance types, VPC, subnets. |
| config.proxmox | object? | Proxmox-specific config: server URL, node, storage. |
| config.openstack | object? | OpenStack-specific config: auth URL, network, floating IP pool. |
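A provider block that selects the rke2 bootstrap provider might look like the fragment below. This is a sketch, not a verified configuration: it reuses the version string and AWS region from the example above, and it assumes that only the config entry for the provider in use needs to be set, which the table implies by marking each entry optional but does not state outright.

```yaml
# Fragment of a LatticeCluster spec; values mirror the example above.
spec:
  provider:
    kubernetes:
      version: "1.32.0"
      bootstrap: rke2      # documented alternative to kubeadm
    config:
      aws:                 # assumption: other config.* entries may be left unset
        region: us-west-2
```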
NodeSpec
| Field | Type | Description |
|---|---|---|
| controlPlane | ControlPlaneSpec | Control plane configuration including replica count, instance type, and root volume. |
| workerPools | map<string, WorkerPoolSpec> | Named worker pools with independent scaling, instance types, labels, and taints. Keys are pool identifiers (e.g., "general", "gpu"). |
ControlPlaneSpec
| Field | Type | Description |
|---|---|---|
| replicas | u32 | Number of control plane nodes. Must be a positive odd number (1, 3, or 5); high availability requires at least 3. |
| instanceType | InstanceType? | Instance type for control plane nodes. |
| rootVolume | RootVolume? | Root volume configuration for control plane nodes. |
WorkerPoolSpec
| Field | Type | Description |
|---|---|---|
| replicas | u32 | Desired number of worker nodes in this pool. Ignored when autoscaling is enabled. |
| displayName | string? | Human-readable display name for the pool. |
| instanceType | InstanceType? | Instance type for nodes in this pool. |
| rootVolume | RootVolume? | Root volume configuration for nodes in this pool. |
| labels | map<string, string> | Labels to apply to nodes in this pool. |
| taints | []NodeTaint | Taints to apply to nodes in this pool. |
| min | u32? | Minimum replicas for cluster autoscaler. When both min and max are set, autoscaling is enabled. |
| max | u32? | Maximum replicas for cluster autoscaler. When both min and max are set, autoscaling is enabled. |
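To illustrate the autoscaling rule, the fragment below sketches a pool with both min and max set. The pool name "burst" and its display name are hypothetical, and the replica numbers are arbitrary.

```yaml
# Fragment of a LatticeCluster spec; "burst" is a hypothetical pool name.
spec:
  nodes:
    workerPools:
      burst:
        displayName: "Burst capacity"
        instanceType: m5.2xlarge
        replicas: 3   # ignored, because min and max are both set
        min: 2        # lower bound for the cluster autoscaler
        max: 12       # upper bound for the cluster autoscaler
```

Removing either min or max would, per the table, leave autoscaling disabled and make replicas the effective pool size.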
InstanceType
Either a named instance type given as a plain string (e.g., "m5.xlarge"), or an explicit resource specification object for providers without named types (e.g., Proxmox). The explicit object has the following fields:
| Field | Type | Description |
|---|---|---|
| cores | u32 | Number of CPU cores. |
| memoryGib | u32 | Memory in GiB. |
| diskGib | u32 | Disk size in GiB. |
| sockets | u32 | Number of CPU sockets. Default: 1. |
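The two forms can be contrasted in a short sketch. The fragment below uses the explicit object form for both the control plane and a worker pool; the sizes are arbitrary illustrative values, not recommendations.

```yaml
# Fragment of a LatticeCluster spec using explicit resource objects,
# as might be used with a provider that has no named instance types.
spec:
  nodes:
    controlPlane:
      replicas: 3
      instanceType:
        cores: 4
        memoryGib: 16
        diskGib: 100
        # sockets omitted: defaults to 1
    workerPools:
      general:
        replicas: 4
        instanceType:
          cores: 8
          memoryGib: 32
          diskGib: 200
          sockets: 2
```

With a provider that has named types, the same instanceType field would instead hold a plain string such as "m5.xlarge".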
RootVolume
| Field | Type | Description |
|---|---|---|
| sizeGb | u32 | Volume size in GB. |
| type | string? | Volume type (provider-interpreted, e.g., "gp3", "io1"). |
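A root volume can be attached to the control plane and to individual pools. The fragment below is a minimal sketch with arbitrary sizes; it assumes an AWS-backed cluster, since "gp3" is one of the volume types named in the table.

```yaml
# Fragment of a LatticeCluster spec; sizes are illustrative.
spec:
  nodes:
    controlPlane:
      replicas: 3
      instanceType: m5.xlarge
      rootVolume:
        sizeGb: 100
        type: gp3       # interpreted by the provider
    workerPools:
      general:
        replicas: 10
        instanceType: m5.2xlarge
        rootVolume:
          sizeGb: 200   # type is optional and omitted here
```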
NodeTaint
| Field | Type | Description |
|---|---|---|
| key | string | Taint key. |
| value | string? | Taint value. |
| effect | string | NoSchedule, PreferNoSchedule, or NoExecute. |
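The sketch below shows a pool with one taint that carries a value and one that does not. The pool name and taint keys are hypothetical and chosen only to exercise the three fields.

```yaml
# Fragment of a LatticeCluster spec; "batch", "dedicated", and
# "maintenance" are hypothetical names.
spec:
  nodes:
    workerPools:
      batch:
        replicas: 4
        instanceType: m5.2xlarge
        labels:
          workload: batch
        taints:
          - key: dedicated
            value: batch
            effect: NoSchedule
          - key: maintenance        # value is optional and omitted
            effect: PreferNoSchedule
```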
EndpointsSpec
When parentConfig is present, the cluster acts as a parent that can provision and manage child clusters.
| Field | Type | Description |
|---|---|---|
| host | string | IP or hostname for child clusters to connect to. |
| grpcPort | u16 | gRPC agent communication port. Default: 50051. |
| bootstrapPort | u16 | Bootstrap configuration port. Default: 8443. |
| proxyPort | u16 | Kubectl proxy port. Default: 8081. |
| service.type | string | Kubernetes Service type: LoadBalancer, NodePort, or ClusterIP. |
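Because the three ports have defaults, a parent cluster can presumably be declared with only a host and a Service type. The fragment below is a sketch under that assumption; the hostname is hypothetical.

```yaml
# Fragment of a LatticeCluster spec; the hostname is hypothetical.
spec:
  parentConfig:
    host: parent.example.internal
    # grpcPort, bootstrapPort, and proxyPort are omitted and fall back
    # to their documented defaults: 50051, 8443, and 8081.
    service:
      type: NodePort
```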
Status
| Field | Type | Description |
|---|---|---|
| observedGeneration | i64? | Last spec generation processed by the controller. Compare to metadata.generation to check if changes are reconciled. |
| phase | ClusterPhase | Current lifecycle phase of the cluster. |
| message | string? | Human-readable status message. |
| conditions | []Condition | Standard Kubernetes conditions. |
| readyControlPlane | u32? | Number of ready control plane nodes. |
| readyWorkers | u32? | Number of ready worker nodes (sum across all pools). |
| workerPools | map<string, WorkerPoolStatus> | Per-pool status with replica counts and autoscaling state. |
| endpoint | string? | Kubernetes API server endpoint. |
| pivotComplete | bool | Whether CAPI resources have been pivoted into the cluster. |
| bootstrapComplete | bool | Whether initial bootstrap has completed. |
| unpivotImportComplete | bool | Whether CAPI import is complete for unpivot (crash-safe marker). |
| bootstrapToken | string? | Token for authenticating with the parent cell during bootstrap. |
| childrenHealth | []ChildClusterHealth | Health status of child clusters (parent only). |
| lastHeartbeat | string? | ISO 8601 timestamp of last heartbeat to parent. |
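For orientation, a reconciled cluster matching the example manifest might report a status shaped roughly like the one below. All values are invented for illustration and were not captured from a real cluster; only the field names come from the tables in this section.

```yaml
# Illustrative values only, not output from a real cluster.
metadata:
  name: production
  generation: 4
status:
  observedGeneration: 4   # equals metadata.generation, so the latest spec has been processed
  phase: Ready
  pivotComplete: true
  bootstrapComplete: true
  readyControlPlane: 3
  readyWorkers: 12        # sum across pools: 10 (general) + 2 (gpu)
  workerPools:
    general:
      desiredReplicas: 10
      currentReplicas: 10
      readyReplicas: 10
      autoscalingEnabled: false
    gpu:
      desiredReplicas: 2
      currentReplicas: 2
      readyReplicas: 2
      autoscalingEnabled: false
```

If observedGeneration were lower than metadata.generation, the controller would not yet have processed the most recent spec change.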
WorkerPoolStatus
| Field | Type | Description |
|---|---|---|
| desiredReplicas | u32 | Desired number of replicas (from MachineDeployment). |
| currentReplicas | u32 | Current number of replicas. |
| readyReplicas | u32 | Number of ready nodes in this pool. |
| autoscalingEnabled | bool | Whether cluster autoscaler manages this pool. |
| message | string? | Human-readable message about pool state. |
ChildClusterHealth
| Field | Type | Description |
|---|---|---|
| name | string | Child cluster name. |
| readyNodes | u32 | Number of ready nodes. |
| totalNodes | u32 | Total number of nodes. |
| readyControlPlane | u32 | Number of ready control plane nodes. |
| totalControlPlane | u32 | Total number of control plane nodes. |
| agentState | string | Current agent state (e.g., "Ready", "Provisioning"). |
| lastHeartbeat | string? | Last heartbeat timestamp (ISO 8601). |
ClusterPhase
| Phase | Description |
|---|---|
| Pending | Waiting to be provisioned. |
| Provisioning | Infrastructure being created via Cluster API. |
| Pivoting | CAPI resources being moved into the cluster. |
| Pivoted | Pivot complete (from parent's perspective). |
| Ready | Fully operational and self-managing. |
| Deleting | Infrastructure teardown in progress. |
| Unpivoting | CAPI resources being moved back to parent for deletion. |
| Failed | Has encountered an error. |