Compute¶
Karpenter¶
We use Karpenter for Cluster Autoscaling.
Karpenter dynamically adds and removes nodes in the EKS cluster based on:
- Pending pod requirements
- Node utilization
- Available AWS instance types
- Current AWS pricing
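As an illustration, a Karpenter NodePool for the spot tier might look like the following. This is a minimal sketch, not the cluster's actual configuration: the NodePool name, the `workload-tier` label wiring, and the `EC2NodeClass` reference named `default` are all assumptions.

```yaml
# Sketch of a Karpenter NodePool for the spot tier (names/values illustrative).
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: spot
spec:
  template:
    metadata:
      labels:
        workload-tier: spot  # pods select this tier via nodeSelector
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]   # restrict this pool to Spot capacity
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default        # assumed EC2NodeClass name
```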
Node Types¶
Generally, Constellation has three types of nodes:
1. EKS Managed Node Group (workload-tier: baseline)¶
These are EKS-managed NodeGroups created by Terraform. EKS MNG Nodes are tainted, preventing non-critical workloads from being scheduled on them.
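A cluster-critical workload scheduled onto these nodes must tolerate the taint. The exact taint key and effect are cluster-specific; the snippet below is a sketch that assumes a `workload-tier=baseline:NoSchedule` taint.

```yaml
# Sketch only: the real taint key/value/effect depends on the Terraform config.
tolerations:
  - key: workload-tier
    operator: Equal
    value: baseline
    effect: NoSchedule
```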
2. On-Demand Instances (workload-tier: on-demand)¶
Regular EC2 instances billed per-second/hour with no commitment. They provide guaranteed capacity (within AWS service limits) and are not interrupted.
3. Spot Instances (workload-tier: spot)¶
Spot instances use spare AWS capacity at a large discount (typically 60–90% cheaper), but AWS may interrupt and reclaim the instance with two minutes' notice.
Tip
Using Spot instances significantly reduces compute costs for Constellation clusters.
Constellation Compute Strategy¶
Our workload placement strategy is as follows:
- Cluster Critical Workloads (ArgoCD, CoreDNS, Cilium, Karpenter, Traefik) -> workload-tier: baseline
- Business Critical Workloads (Passport, etc.) -> workload-tier: on-demand
- All Other Applications -> workload-tier: spot
This strategy enables us to reduce compute costs while maintaining stability and predictability for critical services.
Workload Pinning¶
Using spec.template.spec.nodeSelector¶
Add a spec.template.spec.nodeSelector to your deployments to specify the workload-tier:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deployment
spec:
  replicas: 1
  template:
    spec:
      nodeSelector:
        workload-tier: spot # or on-demand, baseline
```
Using kustomize¶
Alternatively, apply it via a kustomize patch:
```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
patches:
  - target:
      kind: Deployment # applies to all Deployments
    patch: |-
      - op: add
        path: /spec/template/spec/nodeSelector
        value:
          workload-tier: on-demand
```
PodDisruptionBudgets (PDBs)¶
PodDisruptionBudgets ensure Kubernetes does not evict too many pods at once during events such as:
- Node scale-down
- Karpenter consolidation
- Spot interruptions
- Planned maintenance
Example: For a Deployment with app=my-service and 2+ replicas:
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-service-pdb
spec:
  selector:
    matchLabels:
      app: my-service
  minAvailable: 1
```
- This ensures at least 1 pod is always running.
- It allows voluntary disruptions (Karpenter draining, node upgrades, etc.) as long as that condition stays true.
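For workloads with many replicas, Kubernetes also supports expressing the budget as `maxUnavailable` instead of `minAvailable`, which scales with replica count. A sketch using the same hypothetical `my-service` labels:

```yaml
# Alternative budget: tolerate at most one voluntarily disrupted pod at a time.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-service-pdb
spec:
  selector:
    matchLabels:
      app: my-service
  maxUnavailable: 1
```

Note that a PDB may set `minAvailable` or `maxUnavailable`, but not both.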