healthcheck Configuration in Detail
Full healthcheck parameters
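services:
  app:
    image: myapp
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 30s       # time between health checks
      timeout: 10s        # maximum duration of a single check
      retries: 3          # consecutive failures before the container is marked unhealthy
      start_period: 40s   # startup grace period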
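Parameter reference
test – the check command
# Three syntax forms
# 1. CMD (runs the command directly, exec form)
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost"]
# 2. CMD-SHELL (runs the command through the container's default shell)
healthcheck:
  test: ["CMD-SHELL", "pg_isready -U postgres"]
# 3. NONE (disables a healthcheck inherited from the image)
healthcheck:
  test: ["NONE"]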
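interval – time between checks
# Default: 30s
# Suggested values by service type (pick one; a key may appear only once)
healthcheck:
  interval: 10s     # critical database: check more often
  # interval: 30s   # ordinary web service: 30s is enough
  # interval: 60s   # non-critical service: once a minute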
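timeout – limit for a single check
# Default: 30s
# Suggested values: set according to how complex the check command is (pick one)
healthcheck:
  timeout: 3s       # ping-style commands: short timeout
  # timeout: 10s    # HTTP checks: 5-10s
  # timeout: 30s    # complex check scripts: 30s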
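retries – failure threshold
# Default: 3
# Meaning: the container is marked unhealthy after N consecutive failures
healthcheck:
  retries: 5        # flaky services: allow more retries
  # retries: 2      # stable services: fewer retries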
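start_period – startup grace period
# Default: 0s (no grace period)
# Meaning: failed checks during start_period do not count toward retries.
# Once a check succeeds, the container is considered started and later failures count.
healthcheck:
  start_period: 60s     # database with a long initialization
  # start_period: 30s   # application that needs warm-up
  # start_period: 0s    # service that starts quickly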
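Taken together, the four timing parameters give a rough bound on how long a container that never becomes healthy takes to be marked unhealthy. The sketch below assumes a simplified model in which every counted check runs to its full timeout and fails; the function name worst_case_unhealthy_seconds is hypothetical, and the result is an estimate rather than a value Docker reports.

```shell
#!/usr/bin/env bash
# Rough upper bound, in seconds, before a never-healthy container is marked
# unhealthy. Simplified model (an assumption, not Docker's exact scheduler):
#   - failures during start_period are not counted
#   - each counted attempt costs one interval plus one full timeout
worst_case_unhealthy_seconds() {
  local start_period="$1" interval="$2" timeout="$3" retries="$4"
  echo $(( start_period + retries * (interval + timeout) ))
}

# Values from the first example: start_period 40s, interval 30s, timeout 10s, retries 3
worst_case_unhealthy_seconds 40 30 10 3   # prints 160
```

An estimate like this is mainly useful for judging whether retries and interval are tight enough for the failure-detection time a service needs.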
Recommended configurations by service
Databases
# PostgreSQL
postgres:
  image: postgres:15
  healthcheck:
    test: ["CMD-SHELL", "pg_isready -U postgres"]
    interval: 10s
    timeout: 5s
    retries: 5
    start_period: 30s
# MySQL
mysql:
  image: mysql:8.0
  healthcheck:
    test: ["CMD", "mysqladmin", "ping", "-h", "localhost"]
    interval: 10s
    timeout: 5s
    retries: 5
    start_period: 30s
# MongoDB
mongo:
  image: mongo:6
  healthcheck:
    test: ["CMD", "mongosh", "--eval", "db.adminCommand('ping')"]
    interval: 10s
    timeout: 5s
    retries: 5
    start_period: 30s
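# Elasticsearch
# The cluster status (green/yellow/red) is returned by /_cluster/health, not by the root endpoint.
# Assumes security is disabled (xpack.security.enabled=false); with the 8.x defaults
# the endpoint requires HTTPS and credentials.
elasticsearch:
  image: elasticsearch:8.0
  healthcheck:
    test: ["CMD-SHELL", "curl -fs http://localhost:9200/_cluster/health | grep -Eq 'green|yellow'"]
    interval: 30s
    timeout: 10s
    retries: 5
    start_period: 60s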
Middleware
# Redis
redis:
  image: redis:7-alpine
  healthcheck:
    test: ["CMD", "redis-cli", "ping"]
    interval: 10s
    timeout: 3s
    retries: 5
# RabbitMQ
rabbitmq:
  image: rabbitmq:3-management
  healthcheck:
    test: ["CMD-SHELL", "rabbitmq-diagnostics check_running"]
    interval: 30s
    timeout: 10s
    retries: 5
    start_period: 30s
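# Nginx
# "nginx -t" only validates the configuration file and passes even when the server
# is not responding, so this check requests a page instead.
nginx:
  image: nginx:alpine
  healthcheck:
    test: ["CMD-SHELL", "wget -q --spider http://127.0.0.1/ || exit 1"]
    interval: 30s
    timeout: 10s
    retries: 3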
Applications
# Node.js application
node-app:
  image: my-node-app
  healthcheck:
    test: ["CMD", "curl", "-f", "http://localhost:3000/api/health"]
    interval: 30s
    timeout: 10s
    retries: 3
    start_period: 40s
# Python application
python-app:
  image: my-python-app
  healthcheck:
    test: ["CMD-SHELL", "python -c \"import urllib.request; urllib.request.urlopen('http://localhost:8000/health', timeout=5)\""]
    interval: 30s
    timeout: 10s
    retries: 3
    start_period: 30s
# Java application
java-app:
  image: my-java-app
  healthcheck:
    test: ["CMD-SHELL", "curl -f http://localhost:8080/actuator/health || exit 1"]
    interval: 30s
    timeout: 10s
    retries: 5
    start_period: 60s   # Java starts slowly
Custom check scripts
Simple case
services:
  web:
    image: myweb
    healthcheck:
      test: ["CMD-SHELL", "curl -sf http://localhost/health && curl -sf http://localhost:8080/actuator/health"]
      interval: 30s
Complex case
An external script is more flexible:
services:
  app:
    image: myapp
    healthcheck:
      test: ["CMD", "bash", "-c", "source /health/hcheck.sh"]
      interval: 30s
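The contents of /health/hcheck.sh are not shown here, so the following is only a sketch of what such a script might contain. The function names, the HEALTH_URL and READY_FILE variables, and their default values are all hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical /health/hcheck.sh: several independent checks, first failure wins.

HEALTH_URL="${HEALTH_URL:-http://localhost:3000/health}"   # assumed endpoint
READY_FILE="${READY_FILE:-/tmp/app.ready}"                 # assumed marker file

# The application is assumed to create READY_FILE once initialization finishes.
check_ready_file() {
  [ -f "$READY_FILE" ]
}

# Fails on connection errors, on HTTP status 400 and above, or after 5 seconds.
check_http() {
  curl -sf --max-time 5 "$HEALTH_URL" > /dev/null
}

# Run each named check in order; report and stop at the first failure.
run_checks() {
  local check
  for check in "$@"; do
    if ! "$check"; then
      echo "health check failed: $check" >&2
      return 1
    fi
  done
  return 0
}

# Entry point: exit status 0 means healthy, 1 means unhealthy.
hcheck_main() {
  run_checks check_ready_file check_http
}
```

With this layout the bash -c argument would need to call the entry point after loading the file, for example "source /health/hcheck.sh && hcheck_main". Keeping each check in its own function makes it easy to see in the health log which one failed.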
Defining a healthcheck in a custom Dockerfile
FROM node:18-alpine
WORKDIR /app
COPY . .
RUN npm install
# Build the healthcheck into the image
HEALTHCHECK --interval=30s --timeout=5s --start-period=40s --retries=3 \
  CMD wget --no-verbose --tries=1 --spider http://localhost:3000/health || exit 1
CMD ["npm", "start"]
If the Dockerfile defines a HEALTHCHECK, it can be overridden in Compose:
services:
  app:
    image: myapp-internal-healthcheck
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      # This overrides the HEALTHCHECK from the Dockerfile
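To confirm which definition is actually in effect, the stored healthcheck can be read back from both the image and the running container. This is a sketch: the helper names are hypothetical, and the container name in the usage comment is a placeholder.

```shell
#!/usr/bin/env bash
# Print the healthcheck recorded for an image or a container as JSON.
# Prints "null" when none is defined.
effective_healthcheck() {
  docker inspect --format '{{json .Config.Healthcheck}}' "$1"
}

# Docker reports Interval, Timeout and StartPeriod in nanoseconds;
# convert such a value to whole seconds for easier reading.
ns_to_seconds() {
  echo $(( $1 / 1000000000 ))
}

# Usage (names are placeholders):
#   effective_healthcheck myapp-internal-healthcheck   # as built into the image
#   effective_healthcheck my_container                 # after the Compose override
#   ns_to_seconds 30000000000                          # prints 30
```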
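Troubleshooting common errors
# View the health check log
docker inspect --format='{{json .State.Health}}' container
# Run the same check command manually
docker exec container curl -f http://localhost:3000/health
# Check that the required tools exist inside the container
docker exec container which curl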
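When debugging startup problems, it can help to poll the health status until it settles instead of running docker inspect by hand. The function below is a sketch; wait_for_healthy and its arguments are hypothetical names, and it relies only on docker inspect and the .State.Health fields.

```shell
#!/usr/bin/env bash
# Poll a container's health status.
# Returns 0 when healthy, 1 when unhealthy, 2 if no verdict within max_attempts.
wait_for_healthy() {
  local container="$1" max_attempts="${2:-30}" delay="${3:-2}"
  local attempt status
  for (( attempt = 1; attempt <= max_attempts; attempt++ )); do
    # The template fails for containers without a healthcheck; treat that as unknown.
    status="$(docker inspect --format '{{.State.Health.Status}}' "$container" 2>/dev/null)" \
      || status="unknown"
    case "$status" in
      healthy)
        return 0
        ;;
      unhealthy)
        # Show the exit code and output of the recorded checks to explain the failure.
        docker inspect \
          --format '{{range .State.Health.Log}}exit={{.ExitCode}} {{.Output}}{{end}}' \
          "$container" >&2
        return 1
        ;;
    esac
    sleep "$delay"
  done
  return 2
}

# Usage (container name is a placeholder):
#   wait_for_healthy my_container 30 2 && echo "ready"
```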
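Interview points
Q: How do HEALTHCHECK in a Dockerfile and healthcheck in Compose affect each other?
A: The Compose configuration overrides the HEALTHCHECK in the Dockerfile. If the image defines a health check and the Compose file also sets test, the Compose version is the one that runs.
Q: How do start_period and interval relate in time?
A: start_period is a grace period at startup; failures during it do not count toward retries. interval stays the same for the whole container lifetime, including during start_period; the failures in that window are simply not counted.
Q: When should healthcheck be disabled?
A: For one-shot tasks (a container that starts, finishes its work, and exits with code 0), a healthcheck does not apply. It can be disabled with test: ["NONE"] or with disable: true.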