healthcheck Configuration Explained

The complete set of healthcheck parameters

services:
  app:
    image: myapp
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
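      interval: 30s          # time between health checks
      timeout: 10s           # maximum time a single check may take
      retries: 3             # consecutive failures before the container is marked unhealthy
      start_period: 40s      # startup grace period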

Parameter details

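test – the check command

# Three syntax forms

# 1. CMD (runs the command directly in exec form, without a shell)
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost"]

# 2. CMD-SHELL (runs the command through the container's default shell)
healthcheck:
  test: ["CMD-SHELL", "pg_isready -U postgres"]

# 3. NONE (disables a healthcheck inherited from the image)
healthcheck:
  test: ["NONE"]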
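
The practical difference between the CMD and CMD-SHELL forms is whether a shell interprets the command. A minimal sketch, assuming a postgres service whose user is supplied through the POSTGRES_USER environment variable (the service name and the user and password values are placeholders for illustration): variable expansion and operators such as || only work under CMD-SHELL, and $$ is how Compose passes a literal $ into the container instead of interpolating it on the host.

```yaml
services:
  db:                                # hypothetical service name
    image: postgres:15
    environment:
      POSTGRES_USER: appuser         # placeholder value
      POSTGRES_PASSWORD: example     # placeholder value
    healthcheck:
      # CMD-SHELL hands the string to the container's shell, so the
      # variable is expanded inside the container each time the check runs.
      # "$$" escapes Compose's own interpolation and yields a literal "$".
      test: ["CMD-SHELL", "pg_isready -U $${POSTGRES_USER}"]
      # In exec form the same text would reach pg_isready verbatim,
      # with no expansion:
      # test: ["CMD", "pg_isready", "-U", "$${POSTGRES_USER}"]
```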

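interval – time between checks

# Default: 30s
# Suggested values by service type (set one value per service)

healthcheck:
  # Critical database: check more often
  # interval: 10s

  # Typical web service: 30s is enough
  interval: 30s

  # Non-critical service: 1 minute
  # interval: 60s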

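timeout – per-check timeout

# Default: 30s
# Suggested value: set it according to how complex the check command is

healthcheck:
  # Ping-style commands: short timeout
  # timeout: 3s

  # HTTP checks: 5-10s
  timeout: 10s

  # Complex check scripts: 30s
  # timeout: 30s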

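retries – number of failed attempts tolerated

# Default: 3
# Meaning: the container is marked unhealthy after N consecutive failures

healthcheck:
  # Flaky services: allow more retries
  retries: 5

  # Stable services: fewer retries
  # retries: 2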

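start_period – startup grace period

# Default: 0s (no grace period)
# Meaning: failed checks during start_period do not count toward retries;
# the first successful check ends the grace period early

healthcheck:
  # Databases with a long initialization
  start_period: 60s

  # Applications that need a warm-up
  # start_period: 30s

  # Services that start quickly
  # start_period: 0s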
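
How the four timing parameters combine is easiest to see on a timeline. The following is a worked sketch, not a guarantee. It assumes the values from the full example at the top, a check that fails every time, checks spaced by interval throughout (no start_interval in play), and negligible check duration. The service name is a placeholder, and real timings drift by however long each check takes.

```yaml
services:
  app:                      # hypothetical service
    image: myapp
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
# Approximate timeline if every check fails (status begins as "starting"):
#   t = 30s   first check fails, inside start_period, not counted
#   t = 60s   check fails, counted (1 of 3)
#   t = 90s   check fails, counted (2 of 3)
#   t = 120s  check fails, counted (3 of 3), status becomes "unhealthy"
# After the container has become healthy, an outage is detected after about
#   interval x retries = 30s x 3 = 90s
```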

Recommended configurations by service

Databases

# PostgreSQL
postgres:
  image: postgres:15
  healthcheck:
    test: ["CMD-SHELL", "pg_isready -U postgres"]
    interval: 10s
    timeout: 5s
    retries: 5
    start_period: 30s

# MySQL
mysql:
  image: mysql:8.0
  healthcheck:
    test: ["CMD", "mysqladmin", "ping", "-h", "localhost"]
    interval: 10s
    timeout: 5s
    retries: 5
    start_period: 30s

# MongoDB
mongo:
  image: mongo:6
  healthcheck:
    test: ["CMD", "mongosh", "--eval", "db.adminCommand('ping')"]
    interval: 10s
    timeout: 5s
    retries: 5
    start_period: 30s

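# Elasticsearch
# Note: 8.x enables security (TLS and authentication) by default, so this
# plain-HTTP check assumes security is disabled or credentials are added.
elasticsearch:
  image: elasticsearch:8.0
  healthcheck:
    # The cluster status (green/yellow/red) is reported by /_cluster/health
    test: ["CMD-SHELL", "curl -s http://localhost:9200/_cluster/health | grep -q 'green\\|yellow'"]
    interval: 30s
    timeout: 10s
    retries: 5
    start_period: 60s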

Middleware

# Redis
redis:
  image: redis:7-alpine
  healthcheck:
    test: ["CMD", "redis-cli", "ping"]
    interval: 10s
    timeout: 3s
    retries: 5

# RabbitMQ
rabbitmq:
  image: rabbitmq:3-management
  healthcheck:
    test: ["CMD-SHELL", "rabbitmq-diagnostics check_running"]
    interval: 30s
    timeout: 10s
    retries: 5
    start_period: 30s

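# Nginx
# "nginx -t" only validates the configuration file, so request a page
# instead to confirm the server is actually answering.
nginx:
  image: nginx:alpine
  healthcheck:
    test: ["CMD-SHELL", "wget -q --spider http://127.0.0.1/ || exit 1"]
    interval: 30s
    timeout: 10s
    retries: 3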

Applications

# Node.js application
node-app:
  image: my-node-app
  healthcheck:
    test: ["CMD", "curl", "-f", "http://localhost:3000/api/health"]
    interval: 30s
    timeout: 10s
    retries: 3
    start_period: 40s

# Python application
python-app:
  image: my-python-app
  healthcheck:
    test: ["CMD-SHELL", "python -c \"import urllib.request; urllib.request.urlopen('http://localhost:8000/health', timeout=5)\""]
    interval: 30s
    timeout: 10s
    retries: 3
    start_period: 30s

# Java application
java-app:
  image: my-java-app
  healthcheck:
    test: ["CMD-SHELL", "curl -f http://localhost:8080/actuator/health || exit 1"]
    interval: 30s
    timeout: 10s
    retries: 5
    start_period: 60s  # Java applications start slowly
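
A health status is most useful when another service waits on it. A minimal sketch, assuming a hypothetical api service that should start only after the PostgreSQL container reports healthy. The service names, image name and password are placeholders; the long form of depends_on with condition: service_healthy is the Compose feature that reads the healthcheck result.

```yaml
services:
  postgres:
    image: postgres:15
    environment:
      POSTGRES_PASSWORD: example   # placeholder; the image requires a password
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 30s

  api:                             # hypothetical dependent service
    image: my-node-app
    depends_on:
      postgres:
        condition: service_healthy # start only once the check above passes
```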

Custom check scripts

Simple case

services:
  web:
    image: myweb
    healthcheck:
      test: ["CMD-SHELL", "curl -sf http://localhost/health && curl -sf http://localhost:8080/actuator/health"]
      interval: 30s

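Complex case

An external script is more flexible:

services:
  app:
    image: myapp
    healthcheck:
      # The script must exist inside the container, and the image must include bash
      test: ["CMD", "bash", "-c", "source /health/hcheck.sh"]
      interval: 30s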
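
For a script-based check to work, the script has to be present inside the container. One way to get it there is a read-only bind mount. This sketch assumes that hcheck.sh sits in a ./health directory next to the Compose file (a hypothetical layout) and that the image ships bash.

```yaml
services:
  app:
    image: myapp
    volumes:
      # Hypothetical host path; mounts the script directory read-only
      - ./health:/health:ro
    healthcheck:
      # Running the script directly makes its exit status the check result:
      # 0 means healthy, 1 means unhealthy
      test: ["CMD", "bash", "/health/hcheck.sh"]
      interval: 30s
      timeout: 10s
```

Baking the script into the image with a COPY instruction is the alternative when the check should travel with the image rather than with the Compose project.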

Defining a healthcheck in a custom Dockerfile

FROM node:18-alpine
WORKDIR /app
COPY . .
RUN npm install

# Build the healthcheck into the image
HEALTHCHECK --interval=30s --timeout=5s --start-period=40s --retries=3 \
  CMD wget --no-verbose --tries=1 --spider http://localhost:3000/health || exit 1

CMD ["npm", "start"]

If the Dockerfile defines a HEALTHCHECK, it can be overridden in Compose:

services:
  app:
    image: myapp-internal-healthcheck
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      # Overrides the HEALTHCHECK defined in the Dockerfile

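Troubleshooting common errors

# View the health check log
docker inspect --format='{{json .State.Health}}' container

# Run the same check command manually
docker exec container curl -f http://localhost:3000/health

# Check that the required tool exists inside the container
docker exec container which curl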

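Interview points

Q: How do HEALTHCHECK in a Dockerfile and healthcheck in Compose affect each other?

A: The Compose configuration overrides the HEALTHCHECK from the Dockerfile. If the image defines a health check and the Compose file also sets test for that service, the Compose value takes precedence and is the one that runs.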

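Q: How do start_period and interval relate in time?

A: start_period is a grace period at startup; failures during it do not count toward retries. interval stays the same for the container's whole lifetime, including the start period. Checks still run on schedule there, and only the failure counting is suspended.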
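
Recent releases add one refinement to this picture: an optional start_interval that sets a shorter gap between checks while the container is still inside start_period. A sketch, assuming a Docker Engine and Compose recent enough to support the key (Engine 25.0 or later, as far as the Docker documentation indicates; older versions may reject or ignore it):

```yaml
services:
  app:
    image: myapp
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 30s        # spacing used once the start period is over
      start_period: 40s
      start_interval: 5s   # spacing used only during start_period
```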

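Q: When should healthcheck be disabled?

A: Health checks do not suit one-off tasks, that is, containers that run to completion right after starting and then exit (for example with exit 0). Disable the check with test: ["NONE"].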
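
Compose also accepts disable: true as a more readable way to say the same thing. A minimal sketch, assuming a hypothetical one-off migration job (the service name and command are placeholders):

```yaml
services:
  migrate:                               # hypothetical one-off job
    image: myapp
    command: ["npm", "run", "migrate"]   # placeholder command
    healthcheck:
      disable: true                      # equivalent to test: ["NONE"]
```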
