OPNsense Forum
Archive => 22.7 Legacy Series => Topic started by: blacksteel1288 on November 03, 2022, 07:58:22 pm
-
Hi,
I just upgraded to 27.7.7 and now haproxy fails to start after a reboot. I see in the release notes that the haproxy plug-in was updated in this release to v3.12.
A related message in the log looks like this:
[1ff53fdb-8812-4a5b-bd04-04cddac2fa89] Script action failed with Command 'configctl template reload OPNsense/HAProxy 2 > /dev/null; /usr/local/opnsense/scripts/OPNsense/HAProxy/syncCerts.py sync --output json ' returned non-zero exit status 1. at Traceback (most recent call last): File "/usr/local/opnsense/service/modules/processhandler.py", line 482, in execute subprocess.check_call(script_command, env=self.config_environment, shell=True, File "/usr/local/lib/python3.9/subprocess.py", line 373, in check_call raise CalledProcessError(retcode, cmd) subprocess.CalledProcessError: Command 'configctl template reload OPNsense/HAProxy 2 > /dev/null; /usr/local/opnsense/scripts/OPNsense/HAProxy/syncCerts.py sync --output json ' returned non-zero exit status 1.
I am able to start haproxy manually from the Dashboard.
Additionally, rolling back to os-haproxy v3.11 by doing a ---
opnsense-revert -r 22.7.6 os-haproxy
-- did not correct the problem. haproxy now no longer starts on reboot in either version.
Please advise --- thanks!
-
In the upgrade logs (22.7.6 -> 22.7.7) I see these error/warning messages related to certs and haproxy. Could they be related to the problem at startup?
[certs error]
https://forum.opnsense.org/index.php?topic=29557.0
[haproxy deprecation warning]
https://forum.opnsense.org/index.php?topic=29538.0
And, also this one:
Message from py39-urllib3-1.26.10,1:
--
Since version 1.25 HTTPS connections are now verified by default which is done
via "cert_reqs = 'CERT_REQUIRED'". While certificate verification can be
disabled via "cert_reqs = 'CERT_NONE'", it's highly recommended to leave it on.
Various consumers of net/py-urllib3 already have implemented routines that
either explicitly enable or disable HTTPS certificate verification (e.g. via
configuration settings, CLI arguments, etc.).
Yet it may happen that there are still some consumers which don't explicitly
enable/disable certificate verification for HTTPS connections which could then
lead to errors (as is often the case with self-signed certificates).
In case of an error one should try first to temporarily disable certificate
verification of the problematic urllib3 consumer to see if that approach will
remedy the issue.
You may need to manually remove /usr/local/etc/ssl/cert.pem if it is no longer needed.
You may need to manually remove /usr/local/openssl/cert.pem if it is no longer needed.
-
A related message in the log looks like this:
[1ff53fdb-8812-4a5b-bd04-04cddac2fa89] Script action failed with Command
'configctl template reload OPNsense/HAProxy 2 > /dev/null;
/usr/local/opnsense/scripts/OPNsense/HAProxy/syncCerts.py sync --output json '
returned non-zero exit status 1.
This message is unrelated to HAProxy startup problems. It is the "cert_sync_bulk" action, which is either triggered by a cron job or the related GUI button.
Having that said, we need to get more information to find out why HAProxy is not starting. I'd suggest to run the following command on the OPNsense shell as user root:
service haproxy start
The output should contain hints about the startup issue.
Ciao
- Frank
-
Hi @fraenki,
Nothing interesting from the command line start process.
root@kbl:~ # service haproxy start
haproxy already running? (pid=88652).
root@kbl:~ # service haproxy stop
Waiting for PIDS: 88652.
root@kbl:~ # service haproxy start
Starting haproxy.
The issue I see is only with the startup of haproxy at reboot. I don't see the errors or problem when I start it manually.
Is there another logfile from the startup process?
-
Is there another logfile from the startup process?
AFAIK, the console messages from the "service" command are not recorded anywhere.
You may have to watch the console while during the boot process. Maybe make a video in order to not miss the HAProxy error message.
Ciao
- Frank
-
Do you have any hostnames instead of IPs in your HA config?
I've had this since forever when I replaced all my IPs with FQDNs so I could swap machines without needing to reconfigure HA, it seems no matter what I do HA fails to look them up and thus fails to start, even with the option to ignore this (DNS resolve order: 'last','libc','none').
-
Hi @Taomyn,
I do have hostnames in my config, but it has been working fine for a couple years until now with that setup. I can't imagine that the upgrade from 27.7.6 to 27.7.7 would uncover that problem, but who knows.
@fraenki, my router is headless. Is there any other way of finding the failed on startup message logs?
-
@fraenki, my router is headless. Is there any other way of finding the failed on startup message logs?
AFAICT, no. If you have a serial console, you would still be able to see all boot messages...
-
Hi!
Script action failed with Command 'configctl template reload OPNsense/HAProxy 2 > /dev/null; /usr/local/opnsense/scripts/OPNsense/HAProxy/syncCerts.py sync --output json ' returned non-zero exit status 1.
the message should not be directly related to the problem, but may give some clues..
any errors if you run this
configctl template reload OPNsense/HAProxy 2 > /dev/null ; /usr/local/opnsense/scripts/OPNsense/HAProxy/syncCerts.py sync --output json
in shell?
(there will be an error about "Inline action failed with not all arguments .." in backend log but that's not it)
-
In the shell, I get this:
root@kbl:~ # configctl template reload OPNsense/HAProxy 2 > /dev/null ; /usr/local/opnsense/scripts/OPNsense/HAProxy/syncCerts.py sync --output json
{"modified": [], "deleted": [], "add_count": 0, "remove_count": 0, "update_count": 0, "del_count": 0}
And, as you expected, in the backend log I see this:
[8c855732-8a22-4ffd-9648-d93ea83bf186] Inline action failed with not all arguments converted during string formatting at Traceback (most recent call last): File "/usr/local/opnsense/service/modules/processhandler.py", line 506, in execute inline_act_parameters = self.parameters % tuple(parameters) TypeError: not all arguments converted during string formatting
Any clues from that?
-
no, sorry
that is another issue (no new info from this output):
configctl template reload OPNsense/HAProxy 2 > /dev/null
has more parameters than the controller is ready to accept, which causes the mentioned error and prevents the templates from reloading. but only when performing two actions that are not related to (re)boot
you can try
opnsense-patch -c plugins -a kulikov-a ce7711c
and restart configd daemon. that should fix this error. but I would be very surprised if it turns out that this is related to the main problem
if it's possible to share a (sanitized) haproxy config, maybe i could try to reproduce the issue
-
Hi @Fright,
I applied the patch, and then did a reboot, but the problem is still there. haproxy does not start after the reboot. And, when I start haproxy manually from the Dashboard, it works, everything is ok.
The prior error is gone, and the only related log message I could find now in the "General" log is this:
/usr/local/etc/rc.d/haproxy: WARNING: failed precmd routine for haproxy
I'm attaching the sanitized haproxy.conf export, as mentioned. Thank you!
-
Hi!
WARNING: failed precmd routine for haproxy
and now this is something! )
but precmd routine for HAProxy start is just a config check afaiu
looking at the config i think that probable places for config check alerts is certificates (if not created at setup phase) and error pages (if haproxy is not happy with the format) but I think that in these cases the Syntax Check in GUI should not pass either...so the only line that seems to be potentially problematic for me is
log host1.localdomain:9901 len 8096 local0 info
in frontend "Option pass-through". if "host1.localdomain" could not be resolved on start then config check may fail.
can you try with ip in this string or test without it?
-
OK, I tried changing the log host, host1.localdomain to its IP address, but same error message appears on the reboot and haproxy does not start manually.
BTW, when I do a "Test Syntax" in haproxy, I do get:
[WARNING] (59447) : parsing [/usr/local/etc/haproxy.conf.staging:107] : a 'http-request' rule placed after a 'use_backend' rule will still be processed before.
[WARNING] (59447) : parsing [/usr/local/etc/haproxy.conf.staging:109] : a 'http-request' rule placed after a 'use_backend' rule will still be processed before.
Warnings were found.
Configuration file is valid
But, it has been like that for a long time and has worked without issue.
Could it be that there was some update that if there's a config file "warning" that startup is prevented (on purpose)?
-
Could it be that there was some update that if there's a config file "warning" that startup is prevented (on purpose)?
don't think so. works fine with warnings for me
can you try to edit /usr/local/etc/rc.d/haproxy file to make the line #97 look like
start_precmd="$command -c -f $haproxy_config > /var/log/haproxybootconfcheck.log 2>&1"
(remove quiet "-q" switch and redirect output to file), reboot and check the /var/log/haproxybootconfcheck.log?
-
Ok, that now shows the problem, and not resolving a hostname is causing an alert, which then I guess, fails the startup.
Here's the output of the logfile:
[WARNING] (23218) : parsing [/usr/local/etc/haproxy.conf:114] : a 'http-request' rule placed after a 'use_backend' rule will still be processed before.
[WARNING] (23218) : parsing [/usr/local/etc/haproxy.conf:116] : a 'http-request' rule placed after a 'use_backend' rule will still be processed before.
[NOTICE] (23218) : haproxy version is 2.4.19-ec55434
[NOTICE] (23218) : path to executable is /usr/local/sbin/haproxy
[ALERT] (23218) : parsing [/usr/local/etc/haproxy.conf:183] : 'server plex' : could not resolve address 'host1.localdomain'.
[ALERT] (23218) : Failed to initialize server(s) addr.
I tried rebooting a few times and either 1 or 2 different hosts would show up as alerts, so I assume if I correct this one by changing to an IP address, it will just trigger the next host, and so on.
I'm using unbound for DNS, and I'm wondering if there's some issue with waiting for unbound to start and begin resolving addresses at the time of haproxy startup? Or, DHCP possibly, since these hosts are in DHCPv4 settings and maybe are not getting registered as hosts until too late in the startup process.
Again, odd that this only started in 22.7.7 -- other than changing all hosts to IP addresses, is there any other solution?
-
oh, unbound...who knows why it might load a second faster or a minute later..config check can take a significant amount of time (I haven't dug for reasons yet. but it's not just the presence of a large blacklist. if I understand correctly, it's being prepared a change in the work of the blacklists and this can greatly speed up work with them).
various documentation states that adding "none" in DNS Resolve Order option at Services: HAProxy: Settings: Defaults Parameters should help with loading HAProxy in this case (a quick test confirms this. HAProxy loads fine with a known unresolvable backend name and ALERTs switches to WARNINGs at syntax check).
worth a try ;)
https://www.haproxy.com/documentation/hapee/latest/onepage/#5.2-init-addr
-
ok, yes, that worked.
Adding "none" to that config, converted the ALERT into a WARNING and the config check passes on startup, so the haproxy service will startup automatically on reboots.
I did also have to change the log host to it's IP address because that 'none' setting is only relevant for backend servers, not log hosts. But, that's fine.
Really appreciate all your help! And, I hope this thread helps someone else with the same issue.
Final thought -- it'd be nice if there was a way to control the ordering of rules in the haproxy configs to eliminate the "[WARNING] (55248) : parsing [/usr/local/etc/haproxy.conf:112] : a 'http-request' rule placed after a 'use_backend' rule will still be processed before."
warnings completely.
Feature request?
-
Glad it works and thanks for the feedbacks )
it'd be nice if there was a way to control the ordering of rules in the haproxy configs
Feature request?
think so. the rules order matters
-
Indeed it matters.
blacksteel, that is not just a noisy warning. The rule is not going to be processed as your config has it but.
Not a defect but the way haproxy works.
-
Do you have any hostnames instead of IPs in your HA config?
I've had this since forever when I replaced all my IPs with FQDNs so I could swap machines without needing to reconfigure HA, it seems no matter what I do HA fails to look them up and thus fails to start, even with the option to ignore this (DNS resolve order: 'last','libc','none').
Thank you thank you thank you! This has plagued me for ages. Looks solved with your tip!