Nginx Reverse Proxy doesn't detect upstream hosts as down

m2e · February 21, 2024, 09:06:46 AM

In reference to https://stackoverflow.com/questions/77522129/nginx-does-not-detect-an-upstream-server-as-down

I have an upstream (generated by the OPNsense Nginx plugin):

upstream upstream9dbd5491033b477e84564ebe3e516c0b {
        server aa.bb.cc.d1:443 weight=1 max_conns=10000 max_fails=3 fail_timeout=10;
        server aa.bb.cc.d2:443 weight=1 max_conns=10000 max_fails=3 fail_timeout=10;
        server aa.bb.cc.d3:443 weight=1 max_conns=10000 max_fails=3 fail_timeout=10;
}

Host aa.bb.cc.d3 is down, but Nginx does not detect it as down unless I add the `down` flag to it.
See the screenshot below: the red line shows a host that is shut down (powered off) but still counted as up by Nginx.

I expect Nginx to stop forwarding requests to that server. Unfortunately, it still does: performance changes significantly when I mark the server as `down` manually.

The statistics view in OPNsense also reports server aa.bb.cc.d3 as up.

The documentation [1] is quite clear, except for the following points:

Quote: What is considered an unsuccessful attempt is defined by the proxy_next_upstream, fastcgi_next_upstream, uwsgi_next_upstream, scgi_next_upstream, memcached_next_upstream, and grpc_next_upstream directives.

Well, I have not set proxy_next_upstream [2], and its default value is `error timeout`, where `error` means:

Quote: an error occurred while establishing a connection with the server, passing a request to it, or reading the response header

But the default of proxy_next_upstream_timeout is 0:

Quote: Limits the time during which a request can be passed to the next server. The 0 value turns off this limitation.

Do these default values disable the feature completely? If not, what else could explain why Nginx keeps treating a server as up when it is not reachable at all?

References:

[1] https://nginx.org/en/docs/http/ngx_http_upstream_module.html
[2] https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_next_upstream

m2e · February 22, 2024, 08:58:16 AM

It looks like the feature to "mark a host down automatically after n retries" is not a basic feature and may only be available in the commercial health check module: https://nginx.org/en/docs/http/ngx_http_upstream_hc_module.html

The only way to get this feature to work is to reduce `max_fails` and `fail_timeout` and let `proxy_next_upstream` do the job.
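
A rough sketch of that workaround, not the configuration the OPNsense plugin generates: the upstream name `backend_example`, the `listen 80` and `location /` blocks, and the 2-second connect timeout are assumptions chosen for illustration. `proxy_connect_timeout` defaults to 60 seconds, so an attempt against a powered-off host can hang for a long time before it counts as a failure. Shortening it is one way to let `proxy_next_upstream` move on sooner.

```nginx
# Goes inside the http {} context.
upstream backend_example {
    # Hypothetical name; the plugin generates its own identifier.
    # max_fails=1: a single failed attempt marks the server unavailable
    # for the duration given by fail_timeout (10 seconds here).
    server aa.bb.cc.d1:443 max_fails=1 fail_timeout=10s;
    server aa.bb.cc.d2:443 max_fails=1 fail_timeout=10s;
    server aa.bb.cc.d3:443 max_fails=1 fail_timeout=10s;
}

server {
    listen 80;

    location / {
        proxy_pass https://backend_example;

        # Assumed value: give up on an unresponsive host after 2 seconds
        # instead of the 60-second default.
        proxy_connect_timeout 2s;

        # Same as the documented default; written out to make explicit
        # which conditions count as an unsuccessful attempt.
        proxy_next_upstream error timeout;
    }
}
```

After `fail_timeout` expires, Nginx tries the server again, so a host that stays down will still receive an occasional probe request.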

Fright · February 24, 2024, 06:01:23 PM

Quote: The only way to get this feature to work is to reduce `max_fails` and `fail_timeout` and let `proxy_next_upstream` do the job.

Quote: max_fails=3 fail_timeout=10;

Hm, what if you just increase the `fail_timeout` value in this case?
Say, `max_fails=1 fail_timeout=60;`
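
Applied to the upstream from the first post, that suggestion would look roughly like the block below. The comments follow the documented meaning of `max_fails` and `fail_timeout`. Whether the OPNsense plugin writes the values out exactly this way is an assumption, not something confirmed in this thread.

```nginx
upstream upstream9dbd5491033b477e84564ebe3e516c0b {
    # One failed attempt takes a server out of rotation for 60 seconds,
    # so a host that stays down is retried about once a minute.
    server aa.bb.cc.d1:443 weight=1 max_conns=10000 max_fails=1 fail_timeout=60;
    server aa.bb.cc.d2:443 weight=1 max_conns=10000 max_fails=1 fail_timeout=60;
    server aa.bb.cc.d3:443 weight=1 max_conns=10000 max_fails=1 fail_timeout=60;
}
```

The trade-off is that a server which recovers quickly also stays out of rotation until the 60 seconds have passed.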
