IPv6 is broken after Update to 23.1.4 + 503 Service unavailable error

Started by PhoenixRider, March 21, 2023, 03:14:03 PM

Quote from: franco on March 22, 2023, 08:16:52 AM
Looks like the persistence to get this minor issue fixed made the problem worse on 23.1.4. I've written a new patch that should sidestep the underlying issue:

https://github.com/opnsense/core/commit/33ad50456

# opnsense-patch 33ad50456

Please report back if it works or not...


Cheers,
Franco

I have the Web GUI set to listen on two interfaces: Mgmt (management VLAN) and OPT1 (a dedicated interface for my one PC).
System -> Settings -> Administration -> Web GUI -> Listen Interfaces

Reason: Lowers the attack surface area. The other VLANs and networks do not need to, and never should, access the Web GUI, so it is better if it is not listening on them at all.
SSHd is set up the same way.

OPT1 also has IPv6 enabled, and that has been working great with the fix in 23.1.4_1. The DHCPv6 and radvd errors are gone.

With the above 33ad50456 patch applied, I was able to reproduce the 503 Service Unavailable error on the subsequent reboot. Before, I didn't know when it would happen; the next time I tried to open the Web GUI, it would just show the Service Unavailable error.
Running "configctl webgui restart" via SSH would get it working again for a while.

My hunch, from what I am reading in this thread, is that the interface going up/down triggers the issue, and it will likely happen after one or more restarts of the PC connected to OPT1.

I also do not see any PHP errors:

cat /tmp/PHP_errors.log
cat: /tmp/PHP_errors.log: No such file or directory


Web GUI log file, with what happened between the entries:


action: reboot after applying opnsense-patch 33ad50456
2023-03-25T14:20:06-06:00 Error lighttpd (/usr/obj/usr/ports/www/lighttpd/work/lighttpd-1.4.69/src/server.c.1704) server started (lighttpd/1.4.69)

action: connection timeout when attempting to log in to the Web GUI after the reboot.
action: "configctl webgui restart" issued via SSH
2023-03-25T14:20:23-06:00 Error lighttpd (/usr/obj/usr/ports/www/lighttpd/work/lighttpd-1.4.69/src/server.c.1057) [note] graceful shutdown started
2023-03-25T14:22:12-06:00 Error lighttpd (/usr/obj/usr/ports/www/lighttpd/work/lighttpd-1.4.69/src/server.c.1704) server started (lighttpd/1.4.69)

action: 503 error from the Web GUI after logging in and just viewing the Web GUI log file. This was new, as I hadn't been actively using the Web GUI when it threw the 503 error before.

2023-03-25T14:25:49-06:00 Error lighttpd (/usr/obj/usr/ports/www/lighttpd/work/lighttpd-1.4.69/src/mod_openssl.c.3438) SSL (error): 5 -1: Operation timed out
2023-03-25T14:25:49-06:00 Error lighttpd (/usr/obj/usr/ports/www/lighttpd/work/lighttpd-1.4.69/src/server.c.2078) server stopped by UID = 0 PID = 57711
2023-03-25T14:26:00-06:00 Error lighttpd (/usr/obj/usr/ports/www/lighttpd/work/lighttpd-1.4.69/src/gw_backend.c.274) establishing connection failed: socket: unix:/tmp/php-fastcgi.socket-1: No such file or directory
2023-03-25T14:26:00-06:00 Error lighttpd (/usr/obj/usr/ports/www/lighttpd/work/lighttpd-1.4.69/src/gw_backend.c.274) establishing connection failed: socket: unix:/tmp/php-fastcgi.socket-0: No such file or directory
2023-03-25T14:26:00-06:00 Error lighttpd (/usr/obj/usr/ports/www/lighttpd/work/lighttpd-1.4.69/src/gw_backend.c.960) all handlers for /ui/index.php? on .php are down.
2023-03-25T14:26:03-06:00 Error lighttpd (/usr/obj/usr/ports/www/lighttpd/work/lighttpd-1.4.69/src/gw_backend.c.351) gw-server re-enabled: unix:/tmp/php-fastcgi.socket-1 0 /tmp/php-fastcgi.socket
2023-03-25T14:26:03-06:00 Error lighttpd (/usr/obj/usr/ports/www/lighttpd/work/lighttpd-1.4.69/src/gw_backend.c.351) gw-server re-enabled: unix:/tmp/php-fastcgi.socket-0 0 /tmp/php-fastcgi.socket

action: "configctl webgui restart" issued via SSH
2023-03-25T14:26:09-06:00 Error lighttpd (/usr/obj/usr/ports/www/lighttpd/work/lighttpd-1.4.69/src/server.c.1057) [note] graceful shutdown started
2023-03-25T14:27:30-06:00 Error lighttpd (/usr/obj/usr/ports/www/lighttpd/work/lighttpd-1.4.69/src/server.c.1704) server started (lighttpd/1.4.69)


While I was typing this up, the following appeared in the "General" log. To my knowledge, no commands were issued that would cause this error:

2023-03-25T14:31:12-06:00 Error opnsense /usr/local/etc/rc.restart_webgui: The command '/usr/local/bin/flock -ne /var/run/lighty-webConfigurator.pid /usr/local/sbin/lighttpd -f /var/etc/lighty-webConfigurator.conf' returned exit code '1', the output was ''

Then in the Web GUI log:
2023-03-25T14:31:12-06:00 Error lighttpd (/usr/obj/usr/ports/www/lighttpd/work/lighttpd-1.4.69/src/mod_openssl.c.3438) SSL (error): 5 -1: Operation timed out
2023-03-25T14:31:12-06:00 Error lighttpd (/usr/obj/usr/ports/www/lighttpd/work/lighttpd-1.4.69/src/server.c.2078) server stopped by UID = 0 PID = 54531
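For context on the rc.restart_webgui error: the web GUI is started as "flock -ne <pidfile> lighttpd -f ...". Assuming /usr/local/bin/flock behaves like util-linux flock(1), -n makes it give up immediately (exit code 1 by default) when another process already holds an exclusive lock on the pid file. Otherwise it runs lighttpd and returns lighttpd's own exit status, so exit code 1 on its own does not say which of the two happened. The Python sketch below only illustrates the locking half with fcntl.flock(); the function name and the temporary pid file are hypothetical stand-ins, not OPNsense code.

```python
import fcntl
import os
import tempfile
from typing import Optional, Tuple


def try_exclusive_lock(path: str) -> Tuple[bool, Optional[int]]:
    """Non-blocking exclusive flock() on `path`, similar in spirit to `flock -ne`.

    Returns (True, fd) if the lock was taken; the lock is held until fd is closed.
    Returns (False, None) if another open file description already holds it.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        # Lock is busy: this is the case where `flock -n` exits instead of waiting.
        os.close(fd)
        return False, None
    except OSError:
        # Any other failure: do not leak the descriptor.
        os.close(fd)
        raise
    return True, fd


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        pidfile = os.path.join(d, "lighty-webConfigurator.pid")  # stand-in path
        ok1, fd1 = try_exclusive_lock(pidfile)  # "running" instance holds the lock
        ok2, _ = try_exclusive_lock(pidfile)    # second start attempt is refused
        print("first:", ok1, "second:", ok2)    # first: True second: False
        os.close(fd1)                           # running instance exits
        ok3, fd3 = try_exclusive_lock(pidfile)
        print("after release:", ok3)            # after release: True
        os.close(fd3)
```

Locks taken with flock() belong to the open file description, so even two open() calls in the same process conflict, which is what makes the single-process demo above behave like two competing lighttpd starts.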



The Web GUI is still working via the OPT1 interface, without a 503 or connection timeout... for now?
The current lighttpd PID according to top:
87005 root          1  20    0    18M  7932K kqread   1   0:00   0.00% lighttpd

Hopefully this helps troubleshoot the issue.

> Reason: Lowers the attack surface area

It doesn't, unless you screw up your firewall rules. But then again, screwing up access by ignoring the GUI warning is same same but different? All in the name of security, of course. ;)


Cheers,
Franco

INADDR_ANY is special.
INADDR_ANY is special.
INADDR_ANY is special.
...
Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)

I have had the same issue since updating to the latest version yesterday.

I just saw the 503 Service Unavailable error and restarted the webgui service. It is working for now, but the errors are still popping up in the logs.


OK, I removed "LAN" from the listen interfaces and went back to the default, "All (recommended)". It makes no sense to me why the GUI would need to listen on the WAN side.

To be frank, I don't expect everyone to understand, but the fact is: if, for example, your LAN is tracking an IPv6 WAN, this is how it has to be. Without NAT you do not have a static address, so this reload *must* happen. We are not trying to make arbitrary rules here...
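To make the tracking case concrete: with a delegated prefix (DHCPv6-PD), the LAN /64 is carved out of whatever prefix the ISP hands out, so the LAN's global address changes whenever the delegation does. The Python sketch below models that with the standard ipaddress module. The function name, the prefix id, the ::1 host suffix and the 2001:db8:: documentation prefixes are all hypothetical; a real tracked interface may derive its host part differently (for example from the MAC address).

```python
import ipaddress


def tracked_lan_address(delegated: str, prefix_id: int, host_suffix: int = 1) -> ipaddress.IPv6Address:
    """Hypothetical model of a LAN address on an interface tracking a delegated prefix.

    The LAN /64 is the prefix_id-th /64 inside the delegated prefix, and the
    address is that /64 plus host_suffix (::1 by default, for illustration only).
    """
    net = ipaddress.IPv6Network(delegated)
    if net.prefixlen > 64:
        raise ValueError("delegated prefix must be /64 or shorter")
    if not 0 <= prefix_id < 2 ** (64 - net.prefixlen):
        raise ValueError("prefix_id does not fit inside the delegated prefix")
    if not 0 <= host_suffix < 2 ** 64:
        raise ValueError("host_suffix must fit in the 64-bit interface identifier")
    lan_base = int(net.network_address) + (prefix_id << 64)
    return ipaddress.IPv6Address(lan_base + host_suffix)


if __name__ == "__main__":
    before = tracked_lan_address("2001:db8:1234:ab00::/56", prefix_id=0)
    after = tracked_lan_address("2001:db8:5678:cd00::/56", prefix_id=0)  # ISP hands out a new prefix
    print(before, "->", after)  # 2001:db8:1234:ab00::1 -> 2001:db8:5678:cd00::1
```

A listener bound to the old LAN address keeps pointing at an address that no longer exists after the change, which is why the web server has to be reloaded when it listens on specific interfaces.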

And this wouldn't happen if people would use a real management interface to access the web GUI in the first place. A LAN is not a management interface.


Cheers,
Franco

@Gromheim "All (recommended)" does not mean

- listen on LAN
- listen on WAN
- listen on OPT1
- ...

It means

- listen on the special address 0.0.0.0 also called INADDR_ANY
- listen on the special address :: also called IN6ADDR_ANY

These addresses work regardless of interfaces coming and going, addresses changing etc. It's a fundamental property of the socket API.

That's why it is the recommended setting and why changing this leads to all sorts of problems if your network configuration is not 100% static.
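The difference described above can be seen directly with the socket API: a bind to the wildcard address succeeds no matter which addresses the host currently has, while a bind to a specific address only works while that address is configured. A minimal Python sketch; the helper name is hypothetical, and 192.0.2.1 (a documentation address) is assumed not to be configured on the machine running it.

```python
import errno
import socket


def can_bind(host: str, family: int = socket.AF_INET) -> bool:
    """Try to bind a TCP socket to (host, ephemeral port) and report whether it worked.

    Returns False only for EADDRNOTAVAIL ("address not configured on this host");
    any other error is re-raised.
    """
    with socket.socket(family, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, 0))  # port 0: let the kernel pick a free port
        except OSError as e:
            if e.errno == errno.EADDRNOTAVAIL:
                return False
            raise
    return True


if __name__ == "__main__":
    print("0.0.0.0 (INADDR_ANY):", can_bind("0.0.0.0"))
    # 192.0.2.1 (TEST-NET-1) is assumed not to be configured on this machine.
    print("192.0.2.1:", can_bind("192.0.2.1"))
    try:
        print(":: (IN6ADDR_ANY):", can_bind("::", socket.AF_INET6))
    except OSError as e:
        print(":: (IN6ADDR_ANY): IPv6 unavailable here:", e)
```

In the failing case the bind raises OSError with errno EADDRNOTAVAIL ("Can't assign requested address"). A server bound to one interface address runs into this when the address is missing at startup, and after a renumber it stays bound to the stale address until it is restarted; the wildcard listener has neither problem.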

@Patrick M. Hausen, @franco - many thanks for the explanation! Indeed, I expected I was _wrong_ and was just looking for this piece of information. Maybe discussions like these will help make the GUI or docs more self-explanatory at some point. Of course, nothing helps against ignorant users (I hope I am not one of them).