Lab K203 - Advanced Pod Scheduling

In the Kubernetes bootcamp training, we saw how to create a pod and apply some basic configurations to it. This chapter explains some advanced topics related to pod scheduling.

According to the API reference for version 1.11, the following pod spec fields are relevant from a scheduling perspective.

  • nodeSelector
  • nodeName
  • affinity
  • schedulerName
  • tolerations
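
Of these, nodeSelector is the simplest: the pod is scheduled only on nodes carrying all of the listed labels. Below is a minimal, illustrative pod spec that pins a pod to nodes labeled zone=aaa (a label we set up in the next section; the pod name and image are placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: web
spec:
  containers:
  - name: web
    image: nginx
  nodeSelector:
    zone: aaa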

Labeling your nodes

kubectl get nodes --show-labels

kubectl label nodes <node-name> zone=aaa

kubectl get nodes --show-labels

e.g.

kubectl label nodes node1 zone=bbb
kubectl label nodes node2 zone=bbb
kubectl label nodes node3 zone=aaa
kubectl get nodes --show-labels

Replace node1 through node3 with the actual node names in your cluster.
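
You can also filter nodes by label to verify the labeling:

kubectl get nodes -l zone=bbb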

Defining affinity and anti-affinity

We have discussed scheduling a pod on a particular node using nodeSelector, but nodeSelector is a hard condition: if it is not met, the pod cannot be scheduled. Node/pod affinity and anti-affinity solve this by supporting both hard and soft conditions.

An affinity rule name is built by combining a requirement type,

  • required
  • preferred

with a lifecycle phase,

  • DuringScheduling
  • DuringExecution

giving rule names such as requiredDuringSchedulingIgnoredDuringExecution, which you will see in the manifests below.

Operators

  • In
  • NotIn
  • Exists
  • DoesNotExist
  • Gt
  • Lt
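
For illustration, a matchExpressions block combining two of these operators might look as follows (the zone and cpu-count label keys are examples; Gt and Lt compare the label value as an integer):

- matchExpressions:
  - key: zone
    operator: In
    values:
    - aaa
    - bbb
  - key: cpu-count
    operator: Gt
    values:
    - "4"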

Adding Node Affinity

Examine the current pod distribution

kubectl get pods -o wide --selector="role=vote"

and node labels

kubectl get nodes --show-labels

Let's define the node affinity criteria as follows:

  • Pods for the vote app must not run on master nodes
  • Pods for the vote app should preferably run on a node in zone bbb

The first is a hard (required) rule; the second is a soft (preferred) rule.

file: vote-deploy-nodeaffinity.yaml

....
  template:
....
    spec:
      containers:
        - name: app
          image: schoolofdevops/vote:v1
          ports:
            - containerPort: 80
              protocol: TCP

      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: node-role.kubernetes.io/master
                operator: DoesNotExist
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1
            preference:
              matchExpressions:
              - key: zone
                operator: In
                values:
                - bbb

apply

kubectl apply -f vote-deploy-nodeaffinity.yaml

kubectl get pods -o wide
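
To confirm the rules took effect, compare the nodes hosting the vote pods with the nodes labeled zone=bbb:

kubectl get pods -o wide --selector="role=vote"
kubectl get nodes -l zone=bbb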

Configuring Pod Affinity

Let's define the pod affinity criteria as follows:

  • Pods for vote and redis should be co-located as much as possible (preferred)
  • No two redis pods should run on the same node (required)

Check the current distribution:

kubectl get pods -o wide --selector="role in (vote,redis)"

file: vote-deploy-podaffinity.yaml

...
  template:
...
    spec:
      containers:
        - name: app
          image: schoolofdevops/vote:v1
          ports:
            - containerPort: 80
              protocol: TCP

      affinity:
...

        podAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 1
              podAffinityTerm:
                labelSelector:
                  matchExpressions:
                  - key: role
                    operator: In
                    values:
                    - redis
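                # topologyKey sets the co-location domain; kubernetes.io/hostname means "same node"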
                topologyKey: kubernetes.io/hostname

file: redis-deploy-podaffinity.yaml

....
  template:
...
    spec:
      containers:
      - image: schoolofdevops/redis:latest
        imagePullPolicy: Always
        name: redis
        ports:
        - containerPort: 6379
          protocol: TCP
      restartPolicy: Always

      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: role
                operator: In
                values:
                - redis
            topologyKey: "kubernetes.io/hostname"

apply

kubectl apply -f redis-deploy-podaffinity.yaml
kubectl apply -f vote-deploy-podaffinity.yaml


Check the pod distribution:

kubectl get pods -o wide --selector="role in (vote,redis)"

Observations from the above output:

  • Since redis has a hard constraint not to share a node, the redis pods run on different nodes (node2 and node4)
  • Since the vote app has only a soft constraint, some of its pods run on node4 (the node running redis) while others continue to run on node3

If you delete the pods on node3, the scheduler will satisfy all the affinity rules when placing the new ones.

Now try scaling up the redis instances:

kubectl scale deploy/redis --replicas=4
kubectl get pods -o wide

  • Are all redis pods running? Why?
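
To find out why a pod is stuck in Pending, inspect its events and look for FailedScheduling messages (substitute a pod name from your own output):

kubectl describe pod <redis-pod-name>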

Adding Taints and tolerations

  • Affinity is defined for pods
  • Taints are defined for nodes

You can taint a node with a key, a value, and an effect. The fields, and the possible effects, are listed below.

Taint Specs:

  • effect
    • NoSchedule
    • PreferNoSchedule
    • NoExecute
  • key
  • value
  • timeAdded (only written for NoExecute taints)
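
The general syntax for adding a taint is:

kubectl taint node <node-name> <key>=<value>:<effect>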

Observe the pods distribution

kubectl get pods -o wide

Let's taint a node.

kubectl taint node node2 dedicated=worker:NoExecute

kubectl describe node node2

After tainting the node, list the pods again.

kubectl get pods -o wide

All pods that were running on node2 have been evicted, since the NoExecute effect evicts any pod that does not tolerate the taint.

Add a toleration for this taint to the worker Deployment.

File: worker-deploy.yml

apiVersion: apps/v1
.....
  template:
....
    spec:
      containers:
        - name: app
          image: schoolofdevops/vote-worker:latest

      tolerations:
        - key: "dedicated"
          operator: "Equal"
          value: "worker"
          effect: "NoExecute"

apply

kubectl apply -f worker-deploy.yml

Observe the pod distribution now.

$ kubectl get pods -o wide
NAME                      READY     STATUS    RESTARTS   AGE       IP             NODE
db-66496667c9-qggzd       1/1       Running   0          4h        10.233.74.74   node4
redis-5bf748dbcf-ckn65    1/1       Running   0          3m        10.233.71.26   node3
redis-5bf748dbcf-vxppx    1/1       Running   0          31m       10.233.74.79   node4
result-5c7569bcb7-4fptr   1/1       Running   0          4h        10.233.71.18   node3
result-5c7569bcb7-s4rdx   1/1       Running   0          4h        10.233.74.75   node4
vote-56bf599b9c-22lpw     1/1       Running   0          30m       10.233.74.80   node4
vote-56bf599b9c-4l6bc     1/1       Running   0          12m       10.233.74.83   node4
vote-56bf599b9c-bqsrq     1/1       Running   0          12m       10.233.74.82   node4
vote-56bf599b9c-xw7zc     1/1       Running   0          12m       10.233.74.81   node4
worker-6cc8dbd4f8-6bkfg   1/1       Running   0          1m        10.233.75.15   node2

You should see the worker pod scheduled on node2, since it is the only workload that tolerates the taint.

To remove the taint created above:

kubectl taint node node2 dedicated:NoExecute-

Exercise

  • The master node is unschedulable because of a taint. Find the taint on the master node and remove it. Then check whether new pods get scheduled on it.
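
Hint: you can list a node's taints with kubectl describe, and remove one using the <key>:<effect>- form (replace the node name with your master's actual name):

kubectl describe node <master-node-name> | grep -i taint
kubectl taint node <master-node-name> <key>:<effect>-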