22.7.7 haproxy fails to start on reboot

Started by blacksteel1288, November 03, 2022, 07:58:22 PM

Previous topic - Next topic
November 03, 2022, 07:58:22 PM Last Edit: November 03, 2022, 11:59:42 PM by blacksteel1288
Hi,

I just upgraded to 27.7.7 and now haproxy fails to start after a reboot.  I see in the release notes that the haproxy plug-in was updated in this release to v3.12.

A related message in the log looks like this:


[1ff53fdb-8812-4a5b-bd04-04cddac2fa89] Script action failed with Command 'configctl template reload OPNsense/HAProxy 2 > /dev/null; /usr/local/opnsense/scripts/OPNsense/HAProxy/syncCerts.py sync --output json ' returned non-zero exit status 1. at Traceback (most recent call last): File "/usr/local/opnsense/service/modules/processhandler.py", line 482, in execute subprocess.check_call(script_command, env=self.config_environment, shell=True, File "/usr/local/lib/python3.9/subprocess.py", line 373, in check_call raise CalledProcessError(retcode, cmd) subprocess.CalledProcessError: Command 'configctl template reload OPNsense/HAProxy 2 > /dev/null; /usr/local/opnsense/scripts/OPNsense/HAProxy/syncCerts.py sync --output json ' returned non-zero exit status 1.


I am able to start haproxy manually from the Dashboard.

Additionally, rolling back to os-haproxy v3.11 by doing a ---

opnsense-revert -r 22.7.6 os-haproxy

-- did not correct the problem.  haproxy now no longer starts on reboot in either version.

Please advise --- thanks!

In the upgrade logs (22.7.6 -> 22.7.7) I see these error/warning messages related to certs and haproxy.  Could they be related to the problem at startup?

[certs error]
https://forum.opnsense.org/index.php?topic=29557.0

[haproxy deprecation warning]
https://forum.opnsense.org/index.php?topic=29538.0

And, also this one:

Message from py39-urllib3-1.26.10,1:

--
Since version 1.25 HTTPS connections are now verified by default which is done
via "cert_reqs = 'CERT_REQUIRED'". While certificate verification can be
disabled via "cert_reqs = 'CERT_NONE'", it's highly recommended to leave it on.

Various consumers of net/py-urllib3 already have implemented routines that
either explicitly enable or disable HTTPS certificate verification (e.g. via
configuration settings, CLI arguments, etc.).

Yet it may happen that there are still some consumers which don't explicitly
enable/disable certificate verification for HTTPS connections which could then
lead to errors (as is often the case with self-signed certificates).

In case of an error one should try first to temporarily disable certificate
verification of the problematic urllib3 consumer to see if that approach will
remedy the issue.
You may need to manually remove /usr/local/etc/ssl/cert.pem if it is no longer needed.
You may need to manually remove /usr/local/openssl/cert.pem if it is no longer needed.


Quote from: blacksteel1288 on November 03, 2022, 07:58:22 PM
A related message in the log looks like this:


[1ff53fdb-8812-4a5b-bd04-04cddac2fa89] Script action failed with Command
  'configctl template reload OPNsense/HAProxy 2 > /dev/null;
  /usr/local/opnsense/scripts/OPNsense/HAProxy/syncCerts.py sync --output json '
returned non-zero exit status 1.


This message is unrelated to HAProxy startup problems. It is the "cert_sync_bulk" action, which is either triggered by a cron job or the related GUI button.

Having that said, we need to get more information to find out why HAProxy is not starting. I'd suggest to run the following command on the OPNsense shell as user root:


service haproxy start


The output should contain hints about the startup issue.


Ciao
- Frank

Hi @fraenki,

Nothing interesting from the command line start process.


root@kbl:~ # service haproxy start
haproxy already running?  (pid=88652).
root@kbl:~ # service haproxy stop
Waiting for PIDS: 88652.
root@kbl:~ # service haproxy start
Starting haproxy.


The issue I see is only with the startup of haproxy at reboot.  I don't see the errors or problem when I start it manually.

Is there another logfile from the startup process?


Quote from: blacksteel1288 on November 06, 2022, 03:12:06 PM
Is there another logfile from the startup process?

AFAIK, the console messages from the "service" command are not recorded anywhere.

You may have to watch the console while during the boot process. Maybe make a video in order to not miss the HAProxy error message.


Ciao
- Frank

Do you have any hostnames instead of IPs in your HA config?


I've had this since forever when I replaced all my IPs with FQDNs so I could swap machines without needing to reconfigure HA, it seems no matter what I do HA fails to look them up and thus fails to start, even with the option to ignore this (DNS resolve order: 'last','libc','none').

Hi @Taomyn,

I do have hostnames in my config, but it has been working fine for a couple years until now with that setup.   I can't imagine that the upgrade from 27.7.6 to 27.7.7 would uncover that problem, but who knows.

@fraenki, my router is headless.  Is there any other way of finding the failed on startup message logs?

Quote from: blacksteel1288 on November 07, 2022, 02:26:03 PM
@fraenki, my router is headless.  Is there any other way of finding the failed on startup message logs?

AFAICT, no. If you have a serial console, you would still be able to see all boot messages...

Hi!
QuoteScript action failed with Command 'configctl template reload OPNsense/HAProxy 2 > /dev/null;  /usr/local/opnsense/scripts/OPNsense/HAProxy/syncCerts.py sync --output json ' returned non-zero exit status 1.
the message should not be directly related to the problem, but may give some clues..
any errors if you run this
configctl template reload OPNsense/HAProxy 2 > /dev/null ; /usr/local/opnsense/scripts/OPNsense/HAProxy/syncCerts.py sync --output json

in shell?
(there will be an error about "Inline action failed with not all arguments .." in backend log but that's not it)

In the shell, I get this:


root@kbl:~ # configctl template reload OPNsense/HAProxy 2 > /dev/null ; /usr/local/opnsense/scripts/OPNsense/HAProxy/syncCerts.py sync --output json
{"modified": [], "deleted": [], "add_count": 0, "remove_count": 0, "update_count": 0, "del_count": 0}


And, as you expected, in the backend log I see this:


[8c855732-8a22-4ffd-9648-d93ea83bf186] Inline action failed with not all arguments converted during string formatting at Traceback (most recent call last): File "/usr/local/opnsense/service/modules/processhandler.py", line 506, in execute inline_act_parameters = self.parameters % tuple(parameters) TypeError: not all arguments converted during string formatting


Any clues from that?

no, sorry
that is another issue (no new info from this output):
configctl template reload OPNsense/HAProxy 2 > /dev/null
has more parameters than the controller is ready to accept, which causes the mentioned error and prevents the templates from reloading. but only when performing two actions that are not related to (re)boot
you can try

opnsense-patch -c plugins -a kulikov-a ce7711c

and restart configd daemon. that should fix this error. but I would be very surprised if it turns out that this is related to the main problem

if it's possible to share a (sanitized) haproxy config, maybe i could try to reproduce the issue

Hi @Fright,

I applied the patch, and then did a reboot, but the problem is still there.  haproxy does not start after the reboot.  And, when I start haproxy manually from the Dashboard, it works, everything is ok.

The prior error is gone, and the only related log message I could find now in the "General" log is this:


/usr/local/etc/rc.d/haproxy: WARNING: failed precmd routine for haproxy


I'm attaching the sanitized haproxy.conf export, as mentioned.  Thank you!

Hi!
QuoteWARNING:  failed precmd routine for haproxy
and now this is something! )
but precmd routine for HAProxy start is just a config check afaiu
looking at the config i think that probable places for config check alerts is certificates (if not created at setup phase) and error pages (if haproxy is not happy with the format) but I think that in these cases the Syntax Check in GUI should not pass either...so the only line that seems to be potentially problematic for me is
log host1.localdomain:9901 len 8096 local0 info
in frontend "Option pass-through". if "host1.localdomain" could not be resolved on start then config check may fail.
can you try with ip in this string or test without it?

OK, I tried changing the log host, host1.localdomain to its IP address, but same error message appears on the reboot and haproxy does not start manually.

BTW, when I do a "Test Syntax" in haproxy, I do get:


[WARNING] (59447) : parsing [/usr/local/etc/haproxy.conf.staging:107] : a 'http-request' rule placed after a 'use_backend' rule will still be processed before.
[WARNING] (59447) : parsing [/usr/local/etc/haproxy.conf.staging:109] : a 'http-request' rule placed after a 'use_backend' rule will still be processed before.
Warnings were found.
Configuration file is valid


But, it has been like that for a long time and has worked without issue.

Could it be that there was some update that if there's a config file "warning" that startup is prevented (on purpose)?

QuoteCould it be that there was some update that if there's a config file "warning" that startup is prevented (on purpose)?
don't think so. works fine with warnings for me

can you try to edit /usr/local/etc/rc.d/haproxy file to make the line #97 look like
start_precmd="$command -c -f $haproxy_config > /var/log/haproxybootconfcheck.log 2>&1"
(remove quiet "-q" switch and redirect output to file), reboot and check the /var/log/haproxybootconfcheck.log?