I have been running HAProxy on my OPNsense 22.10 business edition for a while now. Sadly, I have to conclude that this doesn't increase availability: after a few days HAProxy stops passing traffic on port 587, and this has now happened 3 times in one week. HAProxy just becomes a black hole when that happens. Stopping and starting HAProxy solves that, so just to be sure I have now created a cron job to restart the router once a day (which is ugly).
Is there anyone who recognises this and knows what to do about it? Or how to find out what goes wrong?
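One way to narrow down what "black hole" means here is to probe the port from a client machine and record how it fails. The following is a minimal sketch, not a tested monitoring tool: the function name `probe_smtp` and the status labels are made up for illustration, and it assumes the mail server sends a plain-text SMTP greeting starting with "220" on port 587, as submission servers using STARTTLS normally do.

```python
import socket

def probe_smtp(host, port, timeout=10.0):
    """Classify how a TCP endpoint responds (labels are illustrative)."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            try:
                banner = conn.recv(512)
            except socket.timeout:
                # Connection accepted, but no greeting arrived in time.
                return "silent"
    except ConnectionRefusedError:
        return "refused"      # nothing is listening on the port
    except socket.timeout:
        return "timeout"      # the connection attempt itself got no answer
    except OSError:
        return "error"        # e.g. network unreachable
    return "ok" if banner.startswith(b"220") else "unexpected"

# Example call (address taken from the frontend bind line in the post):
# print(probe_smtp("192.168.2.2", 587))
```

Run from cron every few minutes and logged with a timestamp, the result would show when the port stopped answering and whether the failure is at connect time or after the connection is accepted.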
My internal postfix/dovecot servers listen haproxy-aware on 991 (postfix/postscreen), 990 (postfix/submission) and 994 (dovecot/imaps), and they listen non-haproxy-aware on the official ports (25, 587, 993).
haproxy.conf:
#
# Automatically generated configuration.
# Do not edit this file manually.
#

global
    uid 80
    gid 80
    chroot /var/haproxy
    daemon
    stats socket /var/run/haproxy.socket group proxy mode 775 level admin
    nbproc 1
    nbthread 1
    hard-stop-after 60s
    no strict-limits
    tune.ssl.default-dh-param 2048
    spread-checks 2
    tune.bufsize 16384
    tune.lua.maxmem 0
    log /var/run/log local0 info
    lua-prepend-path /tmp/haproxy/lua/?.lua

defaults
    log global
    option redispatch -1
    timeout client 30s
    timeout connect 30s
    timeout check 10s
    timeout server 30s
    retries 3
    default-server init-addr last,libc

# autogenerated entries for ACLs
# autogenerated entries for config in backends/frontends
# autogenerated entries for stats

# Frontend: smtpd-loadbalancing (Port 25 Load Balancing)
frontend smtpd-loadbalancing
    bind 192.168.2.2:25 name 192.168.2.2:25
    mode tcp
    default_backend mail.rna.nl.991
    # tuning options
    timeout client 30s
    # logging options

# Frontend: submission-loadbalancing (Port 587 Load Balancing)
frontend submission-loadbalancing
    bind 192.168.2.2:587 name 192.168.2.2:587
    mode tcp
    default_backend mail.rna.nl.991
    # tuning options
    timeout client 30s
    # logging options

# Frontend: imaps-loadbalancing (Port 993 Load Balancing)
frontend imaps-loadbalancing
    bind 192.168.2.2:993 name 192.168.2.2:993
    mode tcp
    default_backend mail.rna.nl.994
    # tuning options
    timeout client 30s
    # logging options

# Backend: mail.rna.nl.991 (postfix haproxy postscreen pool)
backend mail.rna.nl.991
    option log-health-checks
    # health check: port991-health-monitor
    mode tcp
    balance roundrobin
    # tuning options
    timeout connect 30s
    timeout check 10s
    timeout server 30s
    server albus-991 192.168.2.66:991 check inter 300s port 991 send-proxy
    server snape-991 192.168.2.125:991 check inter 300s port 991 send-proxy

# Backend: mail.rna.nl.990 (postfix haproxy submssion pool)
backend mail.rna.nl.990
    option log-health-checks
    # health check: port991-health-monitor
    mode tcp
    balance roundrobin
    # tuning options
    timeout connect 30s
    timeout check 10s
    timeout server 30s
    server albus-990 192.168.2.66:990 check inter 300s port 991 send-proxy
    server snape-990 192.168.2.125:990 check inter 300s port 991 send-proxy

# Backend: mail.rna.nl.994 (postfix haproxy imaps pool)
backend mail.rna.nl.994
    option log-health-checks
    # health check: port991-health-monitor
    mode tcp
    balance roundrobin
    # tuning options
    timeout connect 30s
    timeout check 10s
    timeout server 30s
    server albus-994 192.168.2.66:994 check inter 300s port 991 send-proxy
    server snape-994 192.168.2.125:994 check inter 300s port 991 send-proxy
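A generated config like this is easier to review as a summary of which backend each frontend uses and which port each server is health-checked on. The sketch below is a rough helper under stated assumptions: `summarize` is a hypothetical name, it only understands the handful of keywords that appear in this post, and it treats the `port` parameter on a server line as the health-check port, which is how HAProxy documents it.

```python
def summarize(config_text):
    """Return ({frontend: backend}, {backend: [(server, address, check_port)]})."""
    routes, servers = {}, {}
    section, name = None, None
    for raw in config_text.splitlines():
        words = raw.split("#", 1)[0].split()   # drop comments, then tokenize
        if not words:
            continue
        if words[0] in ("global", "defaults", "frontend", "backend", "listen"):
            section = words[0]
            name = words[1] if len(words) > 1 else None
        elif section == "frontend" and words[0] == "default_backend" and len(words) > 1:
            routes[name] = words[1]
        elif section == "backend" and words[0] == "server" and len(words) > 2:
            address = words[2]
            # Without an explicit "port N", checks go to the server's own port.
            check_port = address.rsplit(":", 1)[-1]
            options = words[3:]
            if "port" in options[:-1]:
                check_port = options[options.index("port") + 1]
            servers.setdefault(name, []).append((words[1], address, check_port))
    return routes, servers
```

Applied to the config as posted, this would report that submission-loadbalancing (port 587) points at mail.rna.nl.991, that no frontend references mail.rna.nl.990, and that every server is health-checked on port 991 regardless of its own port. Whether that is intentional is not clear from the thread, but it may be worth checking in the plugin settings.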
It sounds a bit like an upstream issue with the HAProxy software itself.
But: do you need HAProxy? The business version also has a proxy plugin based on Apache if it fits your use cases:
https://docs.opnsense.org/vendor/deciso/opnwaf.html
Cheers,
Franco
I would start by enabling Detailed Logging and looking in the logs, perhaps :)
It would also be interesting to understand the network configuration (are frontends and backends both in 192.168.2?)
Quote from: franco on January 27, 2023, 09:25:40 AM
It sounds a bit like an upstream issue with the HAProxy software itself.
But: do you need HAProxy? The business version also has a proxy plugin based on Apache if it fits your use cases:
https://docs.opnsense.org/vendor/deciso/opnwaf.html
Cheers,
Franco
I would guess that for load balancing SMTP etc, a WAF would not be the right choice. I would suspect Apache to be http(s) oriented. But I haven't looked.
Quote from: Fright on January 27, 2023, 03:43:42 PM
I would start by enabling Detailed Logging and looking in the logs, perhaps :)
It would also be interesting to understand the network configuration (are frontends and backends both in 192.168.2?)
What is noticeable in the logs is what is not there. I should see 3 health checks every 5 minutes or so, but this is what I see:
2023-01-27T05:11:12 Notice haproxy Health check for server mail.rna.nl.991/albus-991 failed, reason: Layer4 connection problem, info: "General socket error (Network is down)", check duration: 0ms, status: 0/2 DOWN.
2023-01-26T22:38:19 Notice haproxy Health check for server mail.rna.nl.994/snape-994 succeeded, reason: Layer4 check passed, check duration: 0ms, status: 3/3 UP.
2023-01-26T22:37:29 Notice haproxy Health check for server mail.rna.nl.994/albus-994 succeeded, reason: Layer4 check passed, check duration: 0ms, status: 3/3 UP.
2023-01-26T22:36:39 Notice haproxy Health check for server mail.rna.nl.990/snape-990 succeeded, reason: Layer4 check passed, check duration: 0ms, status: 3/3 UP.
2023-01-26T22:35:49 Notice haproxy Health check for server mail.rna.nl.990/albus-990 succeeded, reason: Layer4 check passed, check duration: 0ms, status: 3/3 UP.
2023-01-26T22:34:59 Notice haproxy Health check for server mail.rna.nl.991/snape-991 succeeded, reason: Layer4 check passed, check duration: 0ms, status: 3/3 UP.
2023-01-26T22:34:09 Notice haproxy Health check for server mail.rna.nl.991/albus-991 succeeded, reason: Layer4 check passed, check duration: 2ms, status: 3/3 UP.
2023-01-25T00:57:00 Notice haproxy Health check for server mail.rna.nl.994/snape-994 succeeded, reason: Layer4 check passed, check duration: 0ms, status: 3/3 UP.
2023-01-25T00:56:10 Notice haproxy Health check for server mail.rna.nl.994/albus-994 succeeded, reason: Layer4 check passed, check duration: 0ms, status: 3/3 UP.
2023-01-25T00:55:20 Notice haproxy Health check for server mail.rna.nl.990/snape-990 succeeded, reason: Layer4 check passed, check duration: 0ms, status: 3/3 UP.
2023-01-25T00:54:30 Notice haproxy Health check for server mail.rna.nl.990/albus-990 succeeded, reason: Layer4 check passed, check duration: 0ms, status: 3/3 UP.
2023-01-25T00:53:40 Notice haproxy Health check for server mail.rna.nl.991/snape-991 succeeded, reason: Layer4 check passed, check duration: 0ms, status: 3/3 UP.
2023-01-25T00:52:50 Notice haproxy Health check for server mail.rna.nl.991/albus-991 succeeded, reason: Layer4 check passed, check duration: 2ms, status: 3/3 UP.
This is weird. For one, I do not see health checks after Jan 25 00:57 but I only became aware of a user not being able to contact port 587 around 20:15 on Jan 26. And I am positive that in the meantime port 587 has worked (I sent mail myself).
(The issue at 5:11 is because I have now programmed my router to reboot every night at 5:10.)
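The observation about missing entries can be checked mechanically by scanning an exported log for long silences between health-check lines. This is a small sketch assuming the timestamp format shown above (ISO date and time at the start of each line); `find_gaps` and the 10-minute threshold are illustrative choices, not anything HAProxy or OPNsense provides. One caveat: as far as the HAProxy documentation describes it, `option log-health-checks` logs changes in check status rather than every individual check, so a long silence may simply mean nothing changed.

```python
import re
from datetime import datetime, timedelta

# Matches the leading timestamp of lines like
# "2023-01-25T00:57:00 Notice haproxy Health check for server ..."
STAMP = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\s")

def find_gaps(lines, max_gap=timedelta(minutes=10)):
    """Return (last_seen, next_seen) pairs where health-check lines are further apart than max_gap."""
    stamps = []
    for line in lines:
        match = STAMP.match(line)
        if match and "Health check for server" in line:
            stamps.append(datetime.fromisoformat(match.group(1)))
    stamps.sort()  # the log view lists newest first, so order explicitly
    return [(a, b) for a, b in zip(stamps, stamps[1:]) if b - a > max_gap]
```

Feeding it the lines quoted above would flag the silence between 2023-01-25T00:57:00 and 2023-01-26T22:34:09, which can then be compared against the postfix and dovecot logs for the same window.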
I guess I do not understand the question about the network. What do you need to know other than what is in my original post?
Quote
I only became aware of a user not being able to contact port 587 around 20:15 on Jan 26. And I am positive that in the meantime port 587 has worked (I sent mail myself)
If this is somehow related to syslog-ng communication, then these checks should probably be visible in the postfix logs; I would look there.
But I'm also talking about raising the logging level in the frontend settings ("option log-separate-errors", "option tcplog": Edit Public Service -> advanced mode -> Raise Log Level and Detailed Logging checkboxes).
Maybe this will help you understand what's going on.
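Besides the logs, the admin stats socket declared in the global section of the posted config (/var/run/haproxy.socket) can be asked directly for the current state of each server with the runtime command "show stat", which replies in CSV. The sketch below is an assumption-laden example rather than a supported tool: the helper names are invented, it must run on the firewall itself with permission to open the socket, and whether the socket still answers while HAProxy is in its "black hole" state is unknown.

```python
import csv
import io
import socket

def parse_show_stat(text):
    """Turn the CSV reply of 'show stat' into {(proxy, server): status}."""
    # The header line starts with "# ", which is stripped before parsing.
    rows = csv.DictReader(io.StringIO(text.lstrip("# ")))
    return {(row["pxname"], row["svname"]): row["status"] for row in rows}

def query(sock_path, command="show stat"):
    """Send one command to the HAProxy stats socket and return the full reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as conn:
        conn.connect(sock_path)
        conn.sendall(command.encode() + b"\n")
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:          # HAProxy closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode()

# Example call on the firewall (path taken from the posted config):
# print(parse_show_stat(query("/var/run/haproxy.socket")))
```

Capturing this output at the moment port 587 stops working, before restarting anything, would show whether HAProxy itself considers the backend servers UP or DOWN.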
About network settings: if you are sure that routes cannot be involved, then do not bother :)