# Compute

## Karpenter
We use Karpenter for cluster autoscaling.
Karpenter dynamically adds and removes nodes in the EKS cluster based on:
- pending pod requirements
- node utilization
- available AWS instance types
- current AWS pricing
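The `workload-tier` label referenced throughout this page is typically applied by the Karpenter NodePools (and by the Terraform-managed node group for the baseline tier). As an illustration only, a Spot NodePool could look roughly like the sketch below; the API version, NodePool name, and `EC2NodeClass` name are assumptions, not the actual cluster configuration.

```yaml
# Illustrative sketch only, assuming Karpenter's v1 API and an existing
# EC2NodeClass named "default"; not the actual cluster configuration.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: spot
spec:
  template:
    metadata:
      labels:
        workload-tier: spot            # label that workloads select on
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default                  # assumed name
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]             # "on-demand" for the on-demand tier
```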
## Node types
Generally, C11N has 3 types of nodes:
### 1. EKS Managed Node Group (workload-tier: baseline)
The EKS managed node group (MNG) is created by Terraform. MNG nodes are also tainted, which prevents non-critical workloads from being scheduled on them.
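Cluster-critical workloads that should run on these nodes therefore need a matching toleration in addition to the baseline node selector. A rough sketch follows; the actual taint key, value, and effect come from the Terraform node group definition, and the ones below are placeholders.

```yaml
# Placeholder taint key/value; check the Terraform MNG definition for the real ones.
spec:
  template:
    spec:
      nodeSelector:
        workload-tier: baseline
      tolerations:
        - key: workload-tier
          operator: Equal
          value: baseline
          effect: NoSchedule
```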
### 2. On-Demand Instances (workload-tier: on-demand)
Regular EC2 instances billed per-second/hour with no commitment. They provide guaranteed capacity (within AWS service limits) and are not interrupted.
### 3. Spot Instances (workload-tier: spot)
Spot instances use spare AWS capacity at a large discount (typically 60–90% cheaper), but AWS may interrupt and reclaim the instance with two minutes' notice.
> **Tip:** Using Spot instances significantly reduces compute cost for Constellation clusters.
## Constellation's Compute Strategy
Our strategy is as follows:

- Cluster-critical workloads (ArgoCD, CoreDNS, Cilium, Karpenter, Traefik) -> `workload-tier: baseline`
- Business-critical workloads (Passport, ...) -> `workload-tier: on-demand`
- All other applications -> `workload-tier: spot`
This strategy reduces compute costs while keeping workloads relatively stable and predictable.
## Workload pinning

### Using `spec.template.spec.nodeSelector`
Add a `spec.template.spec.nodeSelector` to your Deployments, e.g. with `workload-tier: on-demand` for On-Demand nodes:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deployment
spec:
  replicas: 1
  template:
    spec:
      nodeSelector:
        workload-tier: on-demand # or: spot / baseline
```
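If no node with the requested label currently has capacity, the pod stays Pending and Karpenter provisions a matching node based on the pending pod's requirements.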
### kustomize
Or in a kustomize patch:
```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
patches:
  - target:
      kind: Deployment # applies to all Deployments
    patch: |-
      - op: add
        path: /spec/template/spec/nodeSelector
        value:
          workload-tier: on-demand
```
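Running `kustomize build` (or `kubectl kustomize`) on the overlay lets you confirm that the `nodeSelector` was injected into every Deployment before committing the change.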
## PodDisruptionBudgets
PodDisruptionBudgets (PDBs) ensure Kubernetes does not evict too many pods at once during events such as:
- node scale-down
- Karpenter consolidation
- Spot interruptions
- planned maintenance
Example: for a Deployment labeled `app: my-service` running 2 or more replicas:
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-service-pdb
spec:
  selector:
    matchLabels:
      app: my-service
  minAvailable: 1
```
- This ensures at least 1 pod remains available at all times
- Voluntary disruptions (Karpenter draining, node upgrades, etc.) are allowed as long as that condition holds
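To confirm the budget is in effect, `kubectl get pdb my-service-pdb` shows how many voluntary disruptions are currently allowed.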