
Kubernetes/K8S Basic Usage Summary [14]: The Scheduler

I. Overview

The scheduler picks a node for each Pod in roughly three phases: Predicate (filtering out unsuitable nodes), Priority (scoring the remaining nodes), and Select (choosing the best-scoring node).

1. Predicate policies

  • CheckNodeCondition: checks whether the node itself is in a healthy condition;
  • GeneralPredicates: a bundle of common predicates, including HostName (if the Pod pins itself to a node via pod.spec.nodeName, the node's name must match), PodFitsHostPorts (the host ports the Pod requests via pods.spec.containers.ports.hostPort must still be free on the node), MatchNodeSelector (pods.spec.nodeSelector must match the node's labels), and PodFitsResources (the node must be able to satisfy the Pod's resource requests); the Pod sketch after this list shows where these fields live;
  • NoDiskConflict: checks whether the node can satisfy the Pod's storage volume requirements;
  • PodToleratesNodeTaints: checks whether the Pod's spec.tolerations cover all of the taints on the node;
  • PodToleratesNodeNoExecuteTaints: if the Pod's tolerations or the node's taints change so that the Pod no longer tolerates a NoExecute taint, the Pod is evicted from the node;
  • CheckNodeLabelPresence: checks for the presence (or absence) of specific labels on the node;
  • CheckServiceAffinity: checks Service affinity;
  • MaxEBSVolumeCount: checks that attaching the Pod's volumes will not exceed the node's AWS EBS volume limit (default maximum 39);
  • MaxGCEPDVolumeCount: the same check for GCE persistent disks;
  • MaxAzureDiskVolumeCount: the same check for Azure disks (maximum 16);
  • CheckVolumeBinding: checks whether the node can satisfy the Pod's PVC bindings;
  • NoVolumeZoneConflict: checks whether the zone the node belongs to satisfies the zone restrictions of the volumes the Pod requests;
  • CheckNodeMemoryPressure: checks whether the node is under memory pressure;
  • CheckNodePIDPressure: checks whether the node has too many processes (PID pressure);
  • CheckNodeDiskPressure: checks whether the node is under disk I/O pressure;
  • MatchInterPodAffinity: checks inter-Pod affinity rules;
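
The fields that GeneralPredicates inspects all live in the Pod spec. A minimal, hypothetical sketch (the name, image, port, and resource figures are only illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: predicate-demo            # hypothetical name, for illustration only
spec:
  nodeSelector:                   # evaluated by MatchNodeSelector
    disktype: ssd
  containers:
  - name: myapp
    image: ikubernetes/myapp:v1
    ports:
    - containerPort: 80
      hostPort: 8080              # evaluated by PodFitsHostPorts: port 8080 must be free on the node
    resources:
      requests:
        cpu: 200m                 # evaluated by PodFitsResources against the node's allocatable CPU
        memory: 256Mi             # evaluated by PodFitsResources against the node's allocatable memory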

2. Priority functions

  • least_requested: scores a node by the proportion of its resources that would remain free; the larger the remaining proportion, the higher the score;
  • balancedResourceAllocation: scores how close the node's CPU and memory utilization ratios are to each other; the closer they are, the higher the score;
  • NodePreferAvoidPods: judged from the node annotation scheduler.alpha.kubernetes.io/preferAvoidPods; if it is present, the scheduler prefers not to place Pods on that node;
  • TaintToleration: scores how well the Pod's spec.tolerations match the entries in the node's taint list;
  • SelectorSpreading: prefers to spread Pods matched by the same selector across different nodes; the more such Pods a node already runs, the lower its score;
  • nodeAffinity: scores node affinity;
  • interpod_affinity: scores inter-Pod affinity; the higher the score, the more likely the node is chosen;
  • most_requested: the opposite of least_requested; disabled by default (it could be enabled or re-weighted through the legacy policy file sketched after this list);
  • node_label: scores nodes by the presence or absence of particular labels; disabled by default;
  • image_locality: prefers nodes that already hold the images the Pod needs; disabled by default;
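
In the scheduler versions this list describes, the enabled predicates and priority functions (with their weights) could be customized through a policy file passed to kube-scheduler with --policy-config-file, an API that has since been deprecated and removed. A minimal sketch, assuming that legacy Policy format; the particular entries and weights below are only an example, not the default set:

{
  "kind": "Policy",
  "apiVersion": "v1",
  "predicates": [
    {"name": "PodFitsHostPorts"},
    {"name": "PodFitsResources"},
    {"name": "MatchNodeSelector"},
    {"name": "PodToleratesNodeTaints"},
    {"name": "NoDiskConflict"}
  ],
  "priorities": [
    {"name": "LeastRequestedPriority", "weight": 1},
    {"name": "BalancedResourceAllocation", "weight": 1},
    {"name": "NodeAffinityPriority", "weight": 1},
    {"name": "TaintTolerationPriority", "weight": 1}
  ]
}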

II. Advanced scheduling

1. Node affinity examples

1.1 nodeSelector

Example:

apiVersion: v1
kind: Pod
metadata:
  name: pod-schedule
  labels:
    app: myapp
    tier: frontend
spec:
  containers:
  - name: myapp
    image: ikubernetes/myapp:v1
  nodeSelector:
    disktype: ssd

After the Pod is created it stays in the Pending state, because none of the nodes has a matching disktype label:

[root@master1 ~]# kubectl get pods
NAME           READY   STATUS    RESTARTS   AGE
pod-schedule   0/1     Pending   0          2m6s

The Pod's events (from kubectl describe) show the reason:

Events:
  Type     Reason            Age        From               Message
  ----     ------            ----       ----               -------
  Warning  FailedScheduling  <unknown>  default-scheduler  0/3 nodes are available: 3 node(s) didn't match node selector.
  Warning  FailedScheduling  <unknown>  default-scheduler  0/3 nodes are available: 3 node(s) didn't match node selector.

Now, if we add this label to node2, the Pod is scheduled and starts running on node2, as shown below:

[root@master1 schedule]# kubectl label nodes node2 disktype=ssd
[root@master1 ~]# kubectl get pods -o wide
NAME           READY   STATUS    RESTARTS   AGE     IP           NODE    NOMINATED NODE   READINESS GATES
pod-schedule   1/1     Running   0          6m45s   10.244.2.4   node2   <none>           <none>
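
To verify which nodes carry the label (useful whenever a nodeSelector Pod stays Pending), the labels can be listed directly; for example:

[root@master1 schedule]# kubectl get nodes -l disktype=ssd
[root@master1 schedule]# kubectl get nodes --show-labels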

1.2 nodeAffinity

Built-in help:

[root@master1 schedule]# kubectl explain pods.spec.affinity.nodeAffinity
KIND:     Pod
VERSION:  v1

RESOURCE: nodeAffinity <Object>

DESCRIPTION:
     Describes node affinity scheduling rules for the pod.

     Node affinity is a group of node affinity scheduling rules.

FIELDS:
   preferredDuringSchedulingIgnoredDuringExecution	<[]Object>
     The scheduler will prefer to schedule pods to nodes that satisfy the
     affinity expressions specified by this field, but it may choose a node that
     violates one or more of the expressions. The node that is most preferred is
     the one with the greatest sum of weights, i.e. for each node that meets all
     of the scheduling requirements (resource request, requiredDuringScheduling
     affinity expressions, etc.), compute a sum by iterating through the
     elements of this field and adding "weight" to the sum if the node matches
     the corresponding matchExpressions; the node(s) with the highest sum are
     the most preferred.

   requiredDuringSchedulingIgnoredDuringExecution	<Object>
     If the affinity requirements specified by this field are not met at
     scheduling time, the pod will not be scheduled onto the node. If the
     affinity requirements specified by this field cease to be met at some point
     during pod execution (e.g. due to an update), the system may or may not try
     to eventually evict the pod from its node.

Example:

apiVersion: v1
kind: Pod
metadata:
  name: pod-schedule-nodeffinity
  labels:
    app: myapp
    tier: frontend
spec:
  containers:
  - name: myapp
    image: ikubernetes/myapp:v1
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: zone 
            operator: In
            values:
            - foo
            - bar

Because requiredDuringSchedulingIgnoredDuringExecution is a hard affinity requirement and no node carries a zone label with value foo or bar, the Pod stays in the Pending state.
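
One way to make the hard requirement satisfiable is to label a node with a matching zone value, in the same way the disktype label was added earlier (node2 here is just an example):

[root@master1 schedule]# kubectl label nodes node2 zone=foo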

Alternatively, after changing it to the soft affinity preferredDuringSchedulingIgnoredDuringExecution and recreating the Pod, it runs, as shown below:

apiVersion: v1
kind: Pod
metadata:
  name: pod-schedule-nodeffinity
  labels:
    app: myapp
    tier: frontend
spec:
  containers:
  - name: myapp
    image: ikubernetes/myapp:v1
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - preference:
          matchExpressions:
          - key: zone 
            operator: In
            values:
            - foo
            - bar
        weight: 60

2. Pod affinity examples

2.1 podAffinity

Example:

apiVersion: v1
kind: Pod
metadata:
  name: pod-first
  labels:
    app: myapp
    tier: frontend
spec:
  containers:
  - name: myapp
    image: ikubernetes/myapp:v1
---
apiVersion: v1
kind: Pod
metadata:
  name: pod-second
  labels:
    app: backend
    tier: db
spec:
  containers:
  - name: busybox
    image: busybox:latest
    imagePullPolicy: IfNotPresent
    command: 
    - "sh"
    - "-c"
    - "sleep 3600"      
  affinity:
    podAffinity: 
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - myapp
        topologyKey: kubernetes.io/hostname

Checking the result, both Pods end up on the same node (node2):

[root@master1 schedule]# kubectl apply -f pod-required-affinity.yaml 
pod/pod-first created
pod/pod-second created
[root@master1 schedule]# kubectl get pods -o wide
NAME         READY   STATUS    RESTARTS   AGE   IP            NODE    NOMINATED NODE   READINESS GATES
pod-first    1/1     Running   0          9s    10.244.2.12   node2   <none>           <none>
pod-second   1/1     Running   0          9s    10.244.2.11   node2   <none>           <none>

2.2 podAntiAffinity

podAntiAffinity is the opposite of podAffinity: simply change podAffinity to podAntiAffinity in the manifest above (the changed stanza is sketched below), and the two Pods are forced onto different nodes:
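
Only the affinity stanza of pod-second changes; everything else stays as in the podAffinity manifest above:

  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - myapp
        topologyKey: kubernetes.io/hostname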

[root@master1 schedule]# kubectl apply -f pod-required-antiAffinity.yaml 
pod/pod-first created
pod/pod-second created
[root@master1 schedule]# kubectl get pods -o wide
NAME         READY   STATUS    RESTARTS   AGE   IP            NODE    NOMINATED NODE   READINESS GATES
pod-first    1/1     Running   0          11s   10.244.2.13   node2   <none>           <none>
pod-second   1/1     Running   0          11s   10.244.1.9    node1   <none>           <none>

III. Taint-based scheduling

Taints are normally defined on nodes, and a Pod defines tolerations to declare whether it can tolerate those taints. A taint's effect determines how Pods that cannot tolerate it are treated:

  • NoSchedule: only affects scheduling; Pods already running on the node are not affected;
  • NoExecute: affects both scheduling and already-running Pods; Pods that cannot tolerate the taint are evicted from the node;
  • PreferNoSchedule: the scheduler tries to avoid the node, but may still schedule Pods onto it.

Syntax for adding a taint: kubectl taint NODE NAME KEY_1=VAL_1:TAINT_EFFECT_1 … KEY_N=VAL_N:TAINT_EFFECT_N [options]
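
For example, to taint node2 so that only Pods tolerating the taint may be scheduled there, and to remove the taint again later (the key/value node-type=production is an arbitrary choice for illustration):

[root@master1 schedule]# kubectl taint nodes node2 node-type=production:NoSchedule
[root@master1 schedule]# kubectl taint nodes node2 node-type:NoSchedule-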

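The toleration-deploy-pod.yaml applied next is not reproduced in this excerpt; the manifest below is only a rough sketch of what such a file could look like, assuming the Pods tolerate a node-type=production:NoSchedule taint like the hypothetical one above (the Deployment name, image, and replica count are inferred from the output that follows):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-deploy
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp
        image: ikubernetes/myapp:v1
      tolerations:                    # assumption: tolerate the taint added in the example above
      - key: "node-type"
        operator: "Equal"
        value: "production"
        effect: "NoSchedule"

Applying the Deployment and checking where its Pods land: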
[root@master1 yaml]# kubectl apply -f toleration-deploy-pod.yaml 
deployment.apps/myapp-deploy created
[root@master1 yaml]# kubectl get pods -o wide
NAME                            READY   STATUS    RESTARTS   AGE   IP            NODE    NOMINATED NODE   READINESS GATES
myapp-deploy-5f86f6ffdd-knmvk   1/1     Running   0          10s   10.244.1.10   node1   <none>           <none>
myapp-deploy-5f86f6ffdd-pftbz   1/1     Running   0          10s   10.244.1.12   node1   <none>           <none>
myapp-deploy-5f86f6ffdd-xnbqk   1/1     Running   0          10s   10.244.1.11   node1   <none>           <none>