Health Checks

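HAProxy has a built-in health-check mechanism with two modes: active checks, where HAProxy sends its own probes, and passive checks, where it judges server health from live traffic.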

Active Health Checks

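TCP-Layer Check (Default)

With only the check keyword on a server line, HAProxy opens a TCP connection to decide whether the server is alive. option tcp-check extends this with a scripted send/expect sequence: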

haproxy

backend web_pool
    option tcp-check
    tcp-check connect port 8080
    # Redis-style exchange: send PING and expect +PONG in the reply
    tcp-check send PING\r\n
    tcp-check expect string +PONG
    server web1 192.168.1.101:8080 check
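
To make the send/expect idea concrete, here is a minimal Python sketch of the same kind of probe: connect, send a payload, and look for an expected substring in the reply. The function name `tcp_probe` and its parameters are illustrative assumptions, not part of HAProxy.

```python
import socket


def tcp_probe(host: str, port: int, payload: bytes, expected: bytes,
              timeout: float = 2.0) -> bool:
    """Connect, send payload, and report whether the reply contains expected."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(payload)
            reply = sock.recv(1024)  # one read is enough for a short reply
    except OSError:
        # Refused connections, timeouts, and resets all count as a failed check.
        return False
    return expected in reply
```

Calling `tcp_probe("192.168.1.101", 8080, b"PING\r\n", expected)` with the reply your service actually sends is a quick way to test a service by hand before wiring the check into HAProxy.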

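HTTP-Layer Check (Most Common)

Sends an HTTP request, which tells more precisely whether the service is ready. With no arguments, option httpchk sends an OPTIONS / request: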

haproxy

backend web_pool
    option httpchk
    http-check expect status 200               # expect a 200 response
    # http-check expect ! status 500           # or expect anything but 500
    # http-check expect string "healthy"        # or check that the body contains a string

    server web1 192.168.1.101:8080 check inter 3000ms fall 3 rise 2
    server web2 192.168.1.102:8080 check inter 3000ms fall 3 rise 2

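inter: interval between checks (default 2000ms)
fall: consecutive failures before the server is marked down (default 3)
rise: consecutive successes before the server is marked up (default 2)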
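
The arithmetic behind inter and fall can be sketched in Python. This is a simplified model that assumes checks fail instantly; it ignores check timeouts and the fastinter/downinter settings, and the function names are illustrative.

```python
from typing import Tuple


def detection_window_ms(inter_ms: int, fall: int) -> Tuple[int, int]:
    """Return (best, worst) delay between a failure and the server being marked down."""
    if inter_ms <= 0 or fall <= 0:
        raise ValueError("inter_ms and fall must be positive")
    best = (fall - 1) * inter_ms   # server fails just before a check runs
    worst = fall * inter_ms        # server fails just after a check ran
    return best, worst


def recovery_window_ms(inter_ms: int, rise: int) -> Tuple[int, int]:
    """Same model for a recovered server being marked up again."""
    if inter_ms <= 0 or rise <= 0:
        raise ValueError("inter_ms and rise must be positive")
    return (rise - 1) * inter_ms, rise * inter_ms
```

Under this model, `inter 3000ms fall 3` means a dead server keeps receiving traffic for roughly 6 to 9 seconds, which is worth keeping in mind when choosing the values.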

Custom HTTP Check

haproxy

backend web_pool
    option httpchk
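    # http-check send requires HAProxy 2.2 or later; each header is a separate hdr <name> <value> pair
    http-check send meth GET uri /api/health ver HTTP/1.1 hdr Host myservice.example.com hdr Authorization "Bearer token123"
    # several accepted status codes are given as a comma-separated list
    http-check expect status 200,204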
    server web1 192.168.1.101:8080 check inter 5000ms fall 2 rise 3
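
On the backend side, the check only needs a cheap endpoint to hit. The following is a minimal Python sketch of one, using only the standard library. The handler and `make_server` names are assumptions for illustration, and the handler deliberately ignores the Authorization header that the check sends.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

HEALTH_PATH = "/api/health"  # same URI as in the check configuration


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == HEALTH_PATH:
            body = b"healthy\n"
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, fmt, *args):
        pass  # keep frequent probes out of stderr


def make_server(port: int = 8080, host: str = "0.0.0.0") -> HTTPServer:
    """Build (but do not start) the health endpoint server."""
    return HTTPServer((host, port), HealthHandler)
```

Running `make_server().serve_forever()` starts the endpoint on port 8080. A real service would usually expose this route from its existing web framework rather than a separate process.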

SSL/HTTPS Backend Check

haproxy

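backend web_pool
    option httpchk
    http-check expect status 200
    # Checks use TLS automatically because the server line has ssl.
    # verify none skips certificate validation; prefer verify required with a ca-file in production.
    server web1 192.168.1.101:443 ssl verify none check inter 10000ms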

Health-Check Logging

haproxy

backend web_pool
    log global                  # reuse the log target defined in the global section
    option log-health-checks    # log health-check status changes

View the log:

bash

tail -f /var/log/haproxy/haproxy.log | grep "Health check"

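Agent Checks (agent-check)

An agent check is an auxiliary active check rather than a passive one: HAProxy opens a TCP connection (not a Unix socket) to an agent port on the backend server, and the agent reports the server's state as a short line of ASCII text: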

haproxy

backend web_pool
    # agent-check is a server keyword; there is no "option agent-check"
    server web1 192.168.1.101:8080 check inter 3000ms agent-check agent-port 8888 agent-inter 3000ms

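The backend must run an agent on port 8888 that answers each connection with one line of text terminated by a newline. The reply uses keywords, not numeric status codes:

Bring up: up sets the health state to up; ready clears a previous drain or maint
Take down: down, failed, or stopped mark the server down; drain and maint change its administrative state
Adjust weight: a percentage such as 75%, relative to the configured weight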
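
HAProxy's agent check expects a short ASCII reply line such as `ready up 75%`. Below is a minimal Python sketch of such an agent. The load-to-weight mapping and the names `agent_reply` and `make_agent` are assumptions chosen for illustration; how load is measured is left to the caller.

```python
import socketserver
from typing import Callable


def agent_reply(load: float, draining: bool = False) -> str:
    """Build one agent reply line from a load figure in [0.0, 1.0]."""
    if draining:
        return "drain\n"
    # Lower the weight as load rises, but never below 1% of the configured weight.
    percent = max(1, min(100, round((1.0 - load) * 100)))
    # "ready" clears any earlier drain/maint set by the agent, "up" reports health.
    return f"ready up {percent}%\n"


def make_agent(port: int, load_fn: Callable[[], float],
               host: str = "0.0.0.0") -> socketserver.TCPServer:
    """Build (but do not start) a TCP agent that answers every connection."""

    class Handler(socketserver.BaseRequestHandler):
        def handle(self):
            # HAProxy connects and reads one line; nothing needs to be parsed.
            self.request.sendall(agent_reply(load_fn()).encode("ascii"))

    return socketserver.TCPServer((host, port), Handler)
```

For example, `make_agent(8888, lambda: 0.25).serve_forever()` answers every agent check with `ready up 75%`, so the server runs at three quarters of its configured weight.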

on-marked Actions

Trigger an action when a server changes state. Both keywords belong on the server (or default-server) line:

haproxy

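backend web_pool
    # When web1 is marked DOWN: close the sessions still attached to it
    # When web1 is marked UP: close the sessions that moved to backup servers
    server web1 192.168.1.101:8080 check on-marked-down shutdown-sessions on-marked-up shutdown-backup-sessions
    server web2 192.168.1.102:8080 check backup

    # There is no "logger" action for alerting; state changes are written
    # to the configured log, where an external tool can pick them up.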

on-marked on a Full Server Line

haproxy

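# HAProxy has no line continuation, so the whole server definition stays on one line.
# A server that passes its rise checks is re-enabled automatically; no keyword is needed for that.
server web1 192.168.1.101:8080 check inter 2000ms fall 3 rise 2 on-marked-down shutdown-sessions on-marked-up shutdown-backup-sessions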

Gradual Ramp-Up (Slowstart)

When a server comes back up, increase its traffic gradually to avoid a cold-start surge:

haproxy

backend web_pool
    server web1 192.168.1.101:8080 check slowstart 30s

A server that has just come back up ramps from near zero to full load over 30 seconds.
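
As a rough mental model, the ramp can be treated as linear in time. The Python sketch below encodes that idealized model only; HAProxy applies the ramp in discrete steps and keeps a small minimum, so real values will differ slightly. The function name is an assumption.

```python
def slowstart_fraction(elapsed_s: float, slowstart_s: float) -> float:
    """Fraction of full weight available elapsed_s seconds after coming back up."""
    if slowstart_s <= 0:
        return 1.0  # slowstart disabled: full weight immediately
    # Clamp to [0, 1] so times before 0 or after the window behave sensibly.
    return max(0.0, min(1.0, elapsed_s / slowstart_s))
```

With `slowstart 30s`, this model gives half of the normal share of traffic at 15 seconds and the full share from 30 seconds onward.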

Layered Checks (Combined)

haproxy

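backend web_pool
    # Active HTTP check
    option httpchk GET /health
    http-check expect status 200

    # Retry a failed connection on another server
    option redispatch
    # Drop queued requests whose client has already disconnected
    option abortonclose

    # Passive check: observe layer7 watches live responses, and 5 consecutive
    # errors (most 5xx responses) mark the server down.
    # The server definition must stay on one line.
    server web1 192.168.1.101:8080 check inter 3000ms fall 3 rise 2 observe layer7 error-limit 5 on-error mark-down slowstart 20s maxconn 3000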
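
The passive side, marking a server down after a run of consecutive errors in live traffic, can be sketched in Python as a small counter. This is an illustration of the idea, not HAProxy's implementation: the class name is hypothetical, and treating every status of 500 or above as an error is a simplification.

```python
class ErrorObserver:
    """Track consecutive error responses, as a passive check would."""

    def __init__(self, error_limit: int = 5):
        self.error_limit = error_limit
        self.consecutive_errors = 0
        self.up = True

    def record(self, status: int) -> bool:
        """Feed one response status; return whether the server is still up."""
        if status >= 500:
            self.consecutive_errors += 1
            if self.consecutive_errors >= self.error_limit:
                self.up = False
        else:
            self.consecutive_errors = 0  # any good response breaks the streak
        return self.up

    def mark_up(self) -> None:
        """Called when an active check brings the server back."""
        self.consecutive_errors = 0
        self.up = True
```

The streak resets on any successful response, so isolated errors do not take a server out of rotation; only an unbroken run of them does.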

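Health-Check Best Practices

Check frequency: keep inter neither too short (extra load on the servers) nor too long (slow failure detection). 2 to 5 seconds is typical.
Dedicated URI: the check URI should be lightweight (/health, /ping) and should not depend on a database or cache.
Tuning fall and rise: choose them as a pair to avoid flapping. A low fall removes a failing server quickly, and a rise at least as high makes it prove itself before returning (fast down, slow up), for example fall 2 rise 3.
Logging: enable option log-health-checks early on to make troubleshooting easier.
Combine layers: active checks remove failed servers automatically, passive observation of live traffic catches errors between probes, and option redispatch lets a failed connection be retried on another server.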
