Kubernetes Autoscaling Best Practices

张开发
2026/4/13 10:17:54 15 min read

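1. Autoscaling Overview

Autoscaling is one of Kubernetes' core capabilities. It adjusts the number of Pods or their resource allocation according to application load, keeping applications stable and efficient as load changes.

1.1 Autoscaling Types

| Type | Description | Typical use |
| --- | --- | --- |
| Horizontal Pod Autoscaler (HPA) | Horizontal scaling: adjusts the number of Pods | Fluctuating traffic, e.g. web applications |
| Vertical Pod Autoscaler (VPA) | Vertical scaling: adjusts Pod resources | Resource-intensive applications, e.g. databases |
| Cluster Autoscaler (CA) | Cluster scaling: adjusts the number of nodes | Adding nodes automatically when the cluster runs short of resources |

1.2 Scaling Metrics

- CPU utilization: the most commonly used scaling metric
- Memory utilization: suited to memory-intensive applications
- Custom metrics: for example request rate or queue length
- Multiple metrics combined: scaling decisions based on several metrics together

2. Horizontal Pod Autoscaler (HPA)

2.1 Basic Configuration

CPU-based HPA

Utilization targets are a percentage of the Pods' resource requests, so the target containers must declare CPU (or memory) requests.

    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: app-hpa
      namespace: default
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: app
      minReplicas: 2
      maxReplicas: 10
      metrics:
        - type: Resource
          resource:
            name: cpu
            target:
              type: Utilization
              averageUtilization: 70

Memory-based HPA

    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: app-hpa-memory
      namespace: default
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: app
      minReplicas: 2
      maxReplicas: 10
      metrics:
        - type: Resource
          resource:
            name: memory
            target:
              type: Utilization
              averageUtilization: 80

2.2 HPA Based on Custom Metrics

Install Metrics Server

Metrics Server supplies the resource metrics (CPU and memory) used in 2.1. Pods-type and Object-type metrics such as requests-per-second are served through the custom metrics API, which needs a separate adapter (for example Prometheus Adapter); Metrics Server alone does not provide them.

    # Install Metrics Server
    kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

    # Verify the installation
    kubectl get pods -n kube-system | grep metrics-server

Custom-metric HPA

    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: app-hpa-custom
      namespace: default
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: app
      minReplicas: 2
      maxReplicas: 10
      metrics:
        - type: Pods
          pods:
            metric:
              name: requests-per-second
            target:
              type: AverageValue
              # 100 requests/s per Pod; note that "100m" would mean 0.1
              averageValue: "100"

2.3 Advanced HPA Configuration

Multi-metric HPA

When several metrics are listed, the HPA computes a desired replica count for each metric and uses the largest one.

    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: app-hpa-multiple
      namespace: default
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: app
      minReplicas: 2
      maxReplicas: 10
      metrics:
        - type: Resource
          resource:
            name: cpu
            target:
              type: Utilization
              averageUtilization: 70
        - type: Resource
          resource:
            name: memory
            target:
              type: Utilization
              averageUtilization: 80
        - type: Pods
          pods:
            metric:
              name: requests-per-second
            target:
              type: AverageValue
              averageValue: "100"

Behavior configuration

    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: app-hpa-behavior
      namespace: default
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: app
      minReplicas: 2
      maxReplicas: 10
      metrics:
        - type: Resource
          resource:
            name: cpu
            target:
              type: Utilization
              averageUtilization: 70
      behavior:
        scaleUp:
          stabilizationWindowSeconds: 30
          policies:
            - type: Pods
              value: 2
              periodSeconds: 60
        scaleDown:
          stabilizationWindowSeconds: 60
          policies:
            - type: Pods
              value: 1
              periodSeconds: 60

3. Vertical Pod Autoscaler (VPA)

3.1 VPA Configuration

Install VPA

    # Install VPA
    git clone https://github.com/kubernetes/autoscaler.git
    cd autoscaler/vertical-pod-autoscaler
    ./hack/vpa-up.sh

    # Verify the installation
    kubectl get pods -n kube-system | grep vpa

VPA configuration

    apiVersion: autoscaling.k8s.io/v1
    kind: VerticalPodAutoscaler
    metadata:
      name: app-vpa
      namespace: default
    spec:
      targetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: app
      updatePolicy:
        updateMode: "Auto"
      resourcePolicy:
        containerPolicies:
          # The wildcard must be quoted; a bare * is a YAML alias marker
          - containerName: "*"
            minAllowed:
              cpu: 100m
              memory: 256Mi
            maxAllowed:
              cpu: 2
              memory: 4Gi
            controlledResources: ["cpu", "memory"]

3.2 VPA Update Modes

| Mode | Description | Typical use |
| --- | --- | --- |
| Off | Only produces recommendations; nothing is applied automatically | Test environments, manual tuning |
| Initial | Applies recommendations only when a Pod is created | Workloads where Pod restarts must be avoided |
| Recreate | Applies recommendations automatically by evicting and recreating Pods | Non-critical applications |
| Auto | Applies recommendations automatically; in VPA releases without in-place resizing this evicts and recreates Pods just like Recreate | Workloads that tolerate Pod restarts |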
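To make the role of the `minAllowed` and `maxAllowed` bounds in the `app-vpa` resource policy more concrete, here is a minimal sketch that clamps a raw recommendation into the allowed range. The names `Resources` and `clamp_recommendation` and the sample inputs are hypothetical, values are plain integers (CPU in millicores, memory in MiB) rather than Kubernetes quantity strings, and the real VPA recommender does considerably more than this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """A simplified resource pair: CPU in millicores, memory in MiB."""
    cpu_millicores: int
    memory_mib: int


def clamp(value: int, low: int, high: int) -> int:
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(value, high))


def clamp_recommendation(raw: Resources, min_allowed: Resources,
                         max_allowed: Resources) -> Resources:
    """Clamp each resource of a raw recommendation into the allowed range."""
    return Resources(
        cpu_millicores=clamp(raw.cpu_millicores,
                             min_allowed.cpu_millicores,
                             max_allowed.cpu_millicores),
        memory_mib=clamp(raw.memory_mib,
                         min_allowed.memory_mib,
                         max_allowed.memory_mib),
    )


# Bounds taken from the app-vpa example: 100m to 2 CPU, 256Mi to 4Gi memory.
MIN_ALLOWED = Resources(cpu_millicores=100, memory_mib=256)
MAX_ALLOWED = Resources(cpu_millicores=2000, memory_mib=4096)

if __name__ == "__main__":
    # Hypothetical raw recommendations, for illustration only.
    print(clamp_recommendation(Resources(50, 128), MIN_ALLOWED, MAX_ALLOWED))
    print(clamp_recommendation(Resources(3500, 1024), MIN_ALLOWED, MAX_ALLOWED))
```

A recommendation below the floor is raised to `minAllowed`, one above the ceiling is lowered to `maxAllowed`, and anything in between passes through unchanged, which is why overly tight bounds can hide what the workload actually needs.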
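4. Cluster Autoscaler (CA)

4.1 CA Configuration

Install Cluster Autoscaler

    # Install Cluster Autoscaler
    helm repo add autoscaler https://kubernetes.github.io/autoscaler
    helm repo update
    helm install cluster-autoscaler autoscaler/cluster-autoscaler \
      --namespace kube-system \
      --set autoDiscovery.clusterName=my-cluster \
      --set replicaCount=2

    # Verify the installation
    kubectl get pods -n kube-system | grep cluster-autoscaler

CA configuration

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: cluster-autoscaler
      namespace: kube-system
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: cluster-autoscaler
      template:
        metadata:
          labels:
            app: cluster-autoscaler
        spec:
          containers:
            - name: cluster-autoscaler
              # Use the release that matches the cluster's Kubernetes minor version.
              # registry.k8s.io replaces the frozen k8s.gcr.io registry.
              image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.23.0
              command:
                - ./cluster-autoscaler
                - --v=4
                - --cluster-name=my-cluster
                # Also set --cloud-provider for your platform (for example aws)
                - --nodes=3:10:node-group-1
                - --nodes=2:5:node-group-2
                - --expander=least-waste
                - --balance-similar-node-groups=true
                - --skip-nodes-with-local-storage=false
                - --skip-nodes-with-system-pods=true

4.2 Node Group Configuration

AWS node group configuration

An AWS Auto Scaling group is an AWS resource, not a Kubernetes object, so the listing below is a schematic description of the group's settings rather than a manifest that kubectl can apply; create the group with AWS tooling. For auto-discovery, Cluster Autoscaler looks for the tags k8s.io/cluster-autoscaler/enabled and k8s.io/cluster-autoscaler/<cluster-name>.

    # Schematic only: not a Kubernetes manifest
    apiVersion: autoscaling/v1
    kind: AutoScalingGroup
    metadata:
      name: node-group-1
    spec:
      minSize: 3
      maxSize: 10
      desiredCapacity: 3
      launchConfiguration:
        name: node-group-1-lc
      tags:
        - key: kubernetes.io/cluster/my-cluster
          value: owned
        - key: k8s.io/cluster-autoscaler/enabled
          value: "true"
        - key: k8s.io/cluster-autoscaler/my-cluster
          value: "true"

5. Autoscaling Best Practices

5.1 HPA Best Practices

- Set a sensible scaling range: choose minimum and maximum replica counts that match the application's needs
- Choose suitable metrics: pick CPU, memory, or custom metrics according to the application's characteristics
- Configure scaling behavior: set reasonable scaling policies and stabilization windows
- Monitor scaling events: watch the HPA's scaling decisions and events
- Test scaling: simulate load changes and check how scaling responds
- Avoid frequent scaling: use a stabilization window long enough to prevent flapping
- Use multiple metrics: base scaling decisions on several metrics together
- Account for startup time: give the application a proper readiness probe so that scaling does not interrupt service

5.2 VPA Best Practices

- Start in Off mode: observe the recommendations first, then adjust step by step
- Set resource bounds: give containers minimum and maximum allowed resources
- Choose a suitable update mode: pick the mode according to how critical the application is
- Monitor resource usage: compare VPA recommendations with actual usage
- Combine with HPA carefully: let VPA adjust resources and HPA adjust replica counts, but do not let both act on the same CPU or memory metric; drive the HPA from custom metrics in that case
- Avoid waste: adjust resource settings to actual demand
- Consider the impact of restarts: modes that evict Pods (Recreate, and Auto where it behaves the same way) restart Pods and can affect service

5.3 Cluster Autoscaler Best Practices

- Set sensible node group sizes: choose minimum and maximum node counts that match the application's needs
- Choose a suitable scaling strategy: pick the strategy (for example the expander) according to actual needs
- Configure node group tags: tag node groups so that CA can discover them
- Monitor cluster resources: watch cluster resource usage and adjust node group settings in time
- Account for node startup time: nodes take time to become ready, so plan scaling policies accordingly
- Avoid frequent node changes: set thresholds that prevent nodes from being added and removed repeatedly
- Combine with HPA: HPA adjusts the number of Pods, CA adjusts the number of nodes
- Test cluster scaling: simulate load changes and check how the cluster responds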
6. Performance Optimization

6.1 HPA Performance Optimization

Tune scaling parameters

When several policies are listed for one direction, the default selectPolicy (Max) applies the policy that permits the largest change.

    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: app-hpa-optimized
      namespace: default
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: app
      minReplicas: 2
      maxReplicas: 10
      metrics:
        - type: Resource
          resource:
            name: cpu
            target:
              type: Utilization
              averageUtilization: 70
      behavior:
        scaleUp:
          stabilizationWindowSeconds: 30
          policies:
            - type: Pods
              value: 2
              periodSeconds: 60
            - type: Percent
              value: 100
              periodSeconds: 60
        scaleDown:
          stabilizationWindowSeconds: 60
          policies:
            - type: Pods
              value: 1
              periodSeconds: 60
            - type: Percent
              value: 50
              periodSeconds: 60

Use custom metrics

    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: app-hpa-custom-metrics
      namespace: default
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: app
      minReplicas: 2
      maxReplicas: 10
      metrics:
        - type: Object
          object:
            metric:
              name: requests-per-second
            describedObject:
              apiVersion: networking.k8s.io/v1
              kind: Ingress
              name: app-ingress
            target:
              type: Value
              value: "1000"

6.2 Resource Configuration Optimization

Set sensible resource requests and limits

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: app
      namespace: default
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: app
      template:
        metadata:
          labels:
            app: app
        spec:
          containers:
            - name: app
              image: nginx:1.21-alpine
              resources:
                requests:
                  memory: 256Mi
                  cpu: 200m
                limits:
                  memory: 512Mi
                  cpu: 500m

Use resource quotas

    apiVersion: v1
    kind: ResourceQuota
    metadata:
      name: app-quota
      namespace: default
    spec:
      hard:
        requests.cpu: "4"
        requests.memory: 8Gi
        limits.cpu: "8"
        limits.memory: 16Gi
        pods: "20"
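The scaling parameters in 6.1 are easier to tune with the HPA's replica calculation in mind. The sketch below follows the formula given in the Kubernetes documentation, desired = ceil(current × currentMetric / targetMetric), skips changes that fall inside the default 10% tolerance, takes the largest proposal across metrics, and clamps the result to the `minReplicas`/`maxReplicas` range. The function names and sample readings are hypothetical, and the rate limits from the `behavior` block, missing metrics, and not-yet-ready Pods are deliberately left out.

```python
import math
from typing import List, Tuple


def desired_replicas(current_replicas: int, current_value: float,
                     target_value: float, tolerance: float = 0.1) -> int:
    """Replica count proposed by a single metric."""
    ratio = current_value / target_value
    # Within the tolerance band the HPA leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


def recommend(current_replicas: int, metrics: List[Tuple[float, float]],
              min_replicas: int, max_replicas: int) -> int:
    """Combine (current, target) metric pairs: largest proposal, then clamp."""
    proposals = [desired_replicas(current_replicas, cur, tgt)
                 for cur, tgt in metrics]
    return max(min_replicas, min(max(proposals), max_replicas))


if __name__ == "__main__":
    # Hypothetical readings: CPU at 90% against a 70% target,
    # memory at 60% against an 80% target, 4 replicas running.
    print(recommend(4, [(90, 70), (60, 80)], min_replicas=2, max_replicas=10))
```

With these inputs CPU proposes 6 replicas and memory proposes 3, so the result is 6: one hot metric is enough to scale up, while scaling down requires every metric to agree.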
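7. Monitoring and Troubleshooting

7.1 Monitoring Autoscaling

Prometheus monitoring

The ServiceMonitor below scrapes Metrics Server, whose Service in the upstream manifest carries the label k8s-app: metrics-server. The kube_* series used in the dashboard are exported by kube-state-metrics and the node_* series by node_exporter, so those components must be scraped as well.

    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: hpa-monitor
      namespace: monitoring
    spec:
      selector:
        matchLabels:
          k8s-app: metrics-server
      namespaceSelector:
        matchNames:
          - kube-system
      endpoints:
        - port: https
          interval: 15s
          scheme: https
          tlsConfig:
            insecureSkipVerify: true

Grafana dashboard

In kube-state-metrics v2 the HPA series are named kube_horizontalpodautoscaler_* (earlier releases used kube_hpa_*). CPU utilization is computed from the rate of the CPU counters rather than from their raw cumulative values.

    {
      "dashboard": {
        "id": null,
        "title": "Auto Scaling Metrics",
        "panels": [
          {
            "title": "HPA Replicas",
            "type": "graph",
            "targets": [
              { "expr": "kube_horizontalpodautoscaler_status_current_replicas{namespace=\"default\"}" }
            ]
          },
          {
            "title": "CPU Utilization",
            "type": "graph",
            "targets": [
              { "expr": "(sum by (instance) (rate(node_cpu_seconds_total{mode!=\"idle\"}[5m])) / sum by (instance) (rate(node_cpu_seconds_total[5m]))) * 100" }
            ]
          },
          {
            "title": "Memory Utilization",
            "type": "graph",
            "targets": [
              { "expr": "(node_memory_MemTotal_bytes - node_memory_MemFree_bytes - node_memory_Buffers_bytes - node_memory_Cached_bytes) / node_memory_MemTotal_bytes * 100" }
            ]
          },
          {
            "title": "Cluster Nodes",
            "type": "graph",
            "targets": [
              { "expr": "count(kube_node_info)" }
            ]
          }
        ]
      }
    }

7.2 Troubleshooting

Common autoscaling problems

| Problem | Cause | Solution |
| --- | --- | --- |
| HPA does not scale | Metrics collection is failing | Check that Metrics Server is running normally |
| HPA scales too frequently | Metrics fluctuate heavily | Lengthen the stabilization window and adjust the scaling policies |
| VPA does not update resources | Unsuitable update mode | Check the VPA updateMode setting |
| CA does not add nodes | Node group is misconfigured | Check node group tags and configuration |
| CA does not remove nodes | Pods cannot be moved | Check Pod node affinity and persistent volumes |

Diagnostic commands

    # Check HPA status
    kubectl get hpa
    kubectl describe hpa app-hpa

    # Check VPA status
    kubectl get vpa
    kubectl describe vpa app-vpa

    # Check CA status
    kubectl get pods -n kube-system | grep cluster-autoscaler
    kubectl logs -f deployment/cluster-autoscaler -n kube-system

    # Check node status
    kubectl get nodes
    kubectl describe node <node-name>

    # Check Pod status
    kubectl get pods
    kubectl describe pod <pod-name>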
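Much of what `kubectl describe hpa` prints comes from the status conditions on the HPA object (`AbleToScale`, `ScalingActive`, `ScalingLimited`). As a rough sketch of how those conditions map onto the problems in the table above, the function below inspects the JSON form of an HPA, as returned by `kubectl get hpa <name> -o json`. The function name, the wording of the findings, and both sample objects are hypothetical.

```python
from typing import Dict, List


def diagnose_hpa(hpa: dict) -> List[str]:
    """Return human-readable findings for one autoscaling/v2 HPA object."""
    spec = hpa.get("spec", {})
    status = hpa.get("status", {})
    conditions: Dict[str, dict] = {
        c["type"]: c for c in status.get("conditions", [])
    }
    findings: List[str] = []

    # ScalingActive=False usually means the metrics could not be fetched.
    active = conditions.get("ScalingActive")
    if active and active.get("status") == "False":
        findings.append("metrics unavailable: " + active.get("reason", "unknown"))

    # AbleToScale=False means the scale target could not be read or updated.
    able = conditions.get("AbleToScale")
    if able and able.get("status") == "False":
        findings.append("cannot scale target: " + able.get("reason", "unknown"))

    # ScalingLimited=True means minReplicas or maxReplicas capped the result.
    limited = conditions.get("ScalingLimited")
    if limited and limited.get("status") == "True":
        findings.append("replica count capped: " + limited.get("reason", "unknown"))

    current = status.get("currentReplicas")
    if current is not None and current == spec.get("maxReplicas"):
        findings.append("running at maxReplicas")
    return findings


# Hypothetical sample objects, trimmed to the fields the function reads.
SAMPLE_CAPPED = {
    "spec": {"minReplicas": 2, "maxReplicas": 10},
    "status": {
        "currentReplicas": 10,
        "conditions": [
            {"type": "AbleToScale", "status": "True", "reason": "ReadyForNewScale"},
            {"type": "ScalingActive", "status": "True", "reason": "ValidMetricFound"},
            {"type": "ScalingLimited", "status": "True", "reason": "TooManyReplicas"},
        ],
    },
}
SAMPLE_NO_METRICS = {
    "spec": {"minReplicas": 2, "maxReplicas": 10},
    "status": {
        "currentReplicas": 2,
        "conditions": [
            {"type": "ScalingActive", "status": "False",
             "reason": "FailedGetResourceMetric"},
        ],
    },
}

if __name__ == "__main__":
    for name, sample in (("capped", SAMPLE_CAPPED), ("no-metrics", SAMPLE_NO_METRICS)):
        print(name, diagnose_hpa(sample))
```

An HPA that reports `ScalingLimited` while sitting at `maxReplicas` is a hint to revisit the replica range, whereas `ScalingActive=False` points back to the metrics pipeline.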
8. Multi-Cluster Autoscaling

8.1 Cross-Cluster Scaling

Using Cluster API

The manifest route is shown here; the Cluster API documentation describes clusterctl init as the standard installation path.

    # Install Cluster API
    kubectl apply -f https://github.com/kubernetes-sigs/cluster-api/releases/latest/download/cluster-api-components.yaml

    # Create a cluster
    kubectl apply -f cluster.yaml

Multi-cluster HPA configuration

    apiVersion: policy.karmada.io/v1alpha1
    kind: PropagationPolicy
    metadata:
      name: hpa-propagation
      namespace: default
    spec:
      resourceSelectors:
        - apiVersion: autoscaling/v2
          kind: HorizontalPodAutoscaler
          name: app-hpa
      placement:
        clusterAffinity:
          clusterNames:
            - cluster1
            - cluster2

8.2 Federated Cluster Scaling

Using Karmada

    # Install Karmada
    kubectl apply -f https://github.com/karmada-io/karmada/releases/latest/download/karmada-operator.yaml

    # Join a member cluster
    kubectl apply -f member-cluster.yaml

Karmada scaling configuration

PropagationPolicy belongs to the policy.karmada.io API group. The policy below divides the Deployment's replicas between the two clusters by static weight, 60 to 40.

    apiVersion: policy.karmada.io/v1alpha1
    kind: PropagationPolicy
    metadata:
      name: deployment-propagation
      namespace: default
    spec:
      resourceSelectors:
        - apiVersion: apps/v1
          kind: Deployment
          name: app
      placement:
        replicaScheduling:
          replicaDivisionPreference: Weighted
          replicaSchedulingType: Divided
          weightPreference:
            staticWeightList:
              - targetCluster:
                  clusterNames:
                    - cluster1
                weight: 60
              - targetCluster:
                  clusterNames:
                    - cluster2
                weight: 40
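To give a feel for what a 60/40 static weight split means for concrete replica counts, the sketch below divides a total proportionally and hands any leftover replicas to the clusters with the largest fractional share. The function name is hypothetical, and the remainder rule is an assumption made for illustration; Karmada's scheduler may round differently.

```python
from typing import Dict


def divide_replicas(total: int, weights: Dict[str, int]) -> Dict[str, int]:
    """Split total replicas across clusters in proportion to static weights."""
    weight_sum = sum(weights.values())
    if weight_sum <= 0:
        raise ValueError("weights must sum to a positive number")

    # Whole replicas each cluster is certainly entitled to.
    shares = {name: total * w // weight_sum for name, w in weights.items()}
    leftover = total - sum(shares.values())

    # Assumption: leftovers go to the largest fractional remainders,
    # with ties broken by cluster name for determinism.
    order = sorted(weights, key=lambda n: (-(total * weights[n] % weight_sum), n))
    for name in order[:leftover]:
        shares[name] += 1
    return shares


if __name__ == "__main__":
    weights = {"cluster1": 60, "cluster2": 40}
    for total in (10, 3, 1):
        print(total, divide_replicas(total, weights))
```

With ten replicas the split is exactly 6 and 4; with small totals the rounding rule matters more than the weights, which is worth keeping in mind when an HPA scales a divided workload down to a handful of Pods.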
9. Case Studies

9.1 High-Traffic Web Application

Scaling configuration

    # Deployment
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: web-app
      namespace: default
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: web-app
      template:
        metadata:
          labels:
            app: web-app
        spec:
          containers:
            - name: web-app
              image: nginx:1.21-alpine
              resources:
                requests:
                  memory: 256Mi
                  cpu: 200m
                limits:
                  memory: 512Mi
                  cpu: 500m
              ports:
                - containerPort: 80
    ---
    # HPA
    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: web-app-hpa
      namespace: default
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: web-app
      minReplicas: 3
      maxReplicas: 20
      metrics:
        - type: Resource
          resource:
            name: cpu
            target:
              type: Utilization
              averageUtilization: 60
        - type: Resource
          resource:
            name: memory
            target:
              type: Utilization
              averageUtilization: 70
      behavior:
        scaleUp:
          stabilizationWindowSeconds: 30
          policies:
            - type: Pods
              value: 3
              periodSeconds: 60
        scaleDown:
          stabilizationWindowSeconds: 120
          policies:
            - type: Pods
              value: 1
              periodSeconds: 60

9.2 Database Application

Scaling configuration

    # StatefulSet
    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: postgres
      namespace: database
    spec:
      serviceName: postgres
      replicas: 1
      selector:
        matchLabels:
          app: postgres
      template:
        metadata:
          labels:
            app: postgres
        spec:
          containers:
            - name: postgres
              image: postgres:13
              resources:
                requests:
                  memory: 1Gi
                  cpu: 500m
                limits:
                  memory: 2Gi
                  cpu: 1
              ports:
                - containerPort: 5432
    ---
    # VPA
    apiVersion: autoscaling.k8s.io/v1
    kind: VerticalPodAutoscaler
    metadata:
      name: postgres-vpa
      namespace: database
    spec:
      targetRef:
        apiVersion: apps/v1
        kind: StatefulSet
        name: postgres
      updatePolicy:
        updateMode: "Initial"
      resourcePolicy:
        containerPolicies:
          - containerName: postgres
            minAllowed:
              cpu: 500m
              memory: 1Gi
            maxAllowed:
              cpu: 4
              memory: 8Gi
            controlledResources: ["cpu", "memory"]
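The `behavior` block of `web-app-hpa` in 9.1 caps how quickly the replica count can move, which is worth working through before relying on it for traffic spikes. The sketch below estimates a lower bound on the time needed to move between two replica counts under a single `Pods` policy. The function names are hypothetical, and the model ignores Pod startup time, metric lag, and any additional policies.

```python
def periods_needed(start: int, goal: int, pods_per_period: int) -> int:
    """Number of policy periods needed to move from start to goal replicas."""
    if pods_per_period <= 0:
        raise ValueError("pods_per_period must be positive")
    distance = abs(goal - start)
    # Ceiling division without floating point.
    return -(-distance // pods_per_period)


def min_seconds(start: int, goal: int, pods_per_period: int,
                period_seconds: int, stabilization_seconds: int = 0) -> int:
    """Lower bound on elapsed time: stabilization window plus rate-limited steps."""
    steps = periods_needed(start, goal, pods_per_period)
    return stabilization_seconds + steps * period_seconds


if __name__ == "__main__":
    # Values from web-app-hpa: 3 to 20 replicas, +3 Pods per 60 s on the way up,
    # -1 Pod per 60 s after a 120 s window on the way down.
    print("scale up  :", min_seconds(3, 20, 3, 60, stabilization_seconds=30), "s")
    print("scale down:", min_seconds(20, 3, 1, 60, stabilization_seconds=120), "s")
```

Under these assumptions, going from 3 to 20 replicas takes at least six periods, and returning to 3 takes seventeen. If a sudden spike needs full capacity faster than that, the scale-up policy value is the setting to revisit.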
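10. Summary

Applying autoscaling best practices in Kubernetes means attending to the following:

- Choose the right autoscaler: pick HPA, VPA, or CA according to the application's characteristics
- Configure sensible scaling parameters: set appropriate metrics, thresholds, and behavior
- Monitor scaling: watch scaling events and resource usage
- Optimize resource configuration: set sensible resource requests and limits
- Avoid flapping: use suitable stabilization windows and scaling policies
- Test scaling response: simulate load changes and check the result
- Multi-cluster scaling: manage scaling consistently across clusters
- Troubleshooting: establish an effective process for diagnosing scaling problems
- Continuous tuning: adjust the scaling configuration based on how it behaves in production
- Established practice: follow industry best practices to keep scaling effective and reliable

Together, these practices yield an efficient, automated scaling setup that keeps applications stable and performant under varying load while making better use of cluster resources and lowering operating cost.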
