webgui need manual restart after upgrade to 24.7

Started by Cilahn, August 03, 2024, 07:39:34 PM

Previous topic - Next topic
August 03, 2024, 07:39:34 PM Last Edit: August 03, 2024, 07:48:15 PM by Cilahn
Hello fellow opnsensers,

my box used to hum right away running 24.1. After upgrading to 24.7 the webgui is unable to start at boot. To "fix" this issue a login via ssh is needed. Restarting the webgui via 'configctl webgui restart' make the webgui accessable again.

There are hints in the boot.log:

2024-08-03T18:38:20 webgui_configure_do[299] failed.
2024-08-03T18:38:20 webgui_configure_do[299] Starting web GUI...


digging a bit deeper i found this in the general.log:

2024-08-03T18:38:20 Notice kernel <118>Starting web GUI...failed.
2024-08-03T18:38:20 Notice kernel <118>Configuring OpenSSH...done.
2024-08-03T18:38:20 Notice opnsense /usr/local/etc/rc.bootup: plugins_configure dhcp (execute task : dhcpd_dhcp_configure(1))
2024-08-03T18:38:20 Notice opnsense /usr/local/etc/rc.bootup: plugins_configure dhcp (1)
2024-08-03T18:38:20 Notice opnsense /usr/local/etc/rc.bootup: ROUTING: setting inet default route to 192.168.2.1
2024-08-03T18:38:20 Notice opnsense /usr/local/etc/rc.bootup: ROUTING: configuring inet default gateway on wan
2024-08-03T18:38:20 Notice opnsense /usr/local/etc/rc.bootup: ROUTING: entering configure using defaults
2024-08-03T18:38:20 Error opnsense /usr/local/etc/rc.bootup: The command '/usr/local/bin/flock -ne /var/run/lighty-webConfigurator.pid /usr/local/sbin/lighttpd -f /var/etc/lighty-webConfigurator.conf' returned exit code '255', the output was '2024-08-03 18:38:20: (/usr/obj/usr/ports/www/lighttpd/work/lighttpd-1.4.76/src/network.c.604) bind() [fe80::20c:29ff:fe34:fc02]:4444: Can't assign requested address'
2024-08-03T18:38:20 Notice opnsense /usr/local/etc/rc.bootup: plugins_configure early (execute task : webgui_configure_do(1))
2024-08-03T18:38:20 Notice opnsense /usr/local/etc/rc.bootup: plugins_configure early (execute task : unbound_cache_flush(1))


also the restart seems to find an 'old' instance:

2024-08-03T18:41:50 Error opnsense /usr/local/etc/rc.restart_webgui: The command '/usr/sbin/daemon -f -p '/var/run/updaterrd.pid' '/var/db/rrd/updaterrd.sh'' returned exit code '3', the output was 'daemon: process already running, pid: 97477'


bit i couldn't find anymore hints for reason the webgui is not starting. Is this a known bug or does anybody know where to look for debug info?

Cheers,
Cilahn

Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)

Hi Patrick,

thanks for pointing this out. I double checked the Interfache Settings. The WebGUI ist listening to port 4444 on Interface LAN and WLAN. Both are set to IPV4 only (IPV6 'NONE') and have static IPs.

If there where a conflict a configctl webgui restart should fail too, but it doesn't. In the log is an bind error belonging to an ipv6 address which make no sense, because IPV6 is set to none on all interfaces.

Cheers,
Cilahn

Change to "All (recommended)" and test, please.
Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)

i did, still needed a manual restart after reboot.

general.log is not happy:

2024-08-03T20:29:30 Error opnsense /usr/local/etc/rc.bootup: The command '/usr/local/bin/flock -ne /var/run/lighty-webConfigurator.pid /usr/local/sbin/lighttpd -f /var/etc/lighty-webConfigurator.conf' returned exit code '255', the output was '2024-08-03 20:29:30: (/usr/obj/usr/ports/www/lighttpd/work/lighttpd-1.4.76/src/network.c.604) bind() [fe80::20c:29ff:fe34:fc02]:4444: Can't assign requested address'

August 03, 2024, 08:57:51 PM #5 Last Edit: August 03, 2024, 09:17:11 PM by Cilahn
it is strange, kinda found a workaround ..all interfaces are configured with IPV6 Configuration "None". Anyway 'ifconfig show' still shows ipv6 addresses on all interfaces. vmx0 is the interface which is unable to bind the IPV6 listening port stated in the general.log.

Also tried Interface->Settings->Allow IPv6 'off', which kinda fixed it ...

before IPv6 off ....


vmx0: flags=1008843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu 1500
options=4800028<VLAN_MTU,JUMBO_MTU,HWSTATS,MEXTPG>
ether 00:0c:29:34:fc:02
inet6 fe80::20c:29ff:fe34:fc02%vmx0 prefixlen 64 scopeid 0x1
media: Ethernet autoselect
status: active
nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>
vmx1: flags=1008843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu 1500
description: WAN (wan)
options=4800028<VLAN_MTU,JUMBO_MTU,HWSTATS,MEXTPG>
ether 00:0c:29:34:fc:0c
inet 192.168.2.2 netmask 0xffffff00 broadcast 192.168.2.255
inet6 fe80::20c:29ff:fe34:fc0c%vmx1 prefixlen 64 scopeid 0x2
media: Ethernet autoselect
status: active
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
lo0: flags=1008049<UP,LOOPBACK,RUNNING,MULTICAST,LOWER_UP> metric 0 mtu 16384
options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
inet 127.0.0.1 netmask 0xff000000
inet6 ::1 prefixlen 128
inet6 fe80::1%lo0 prefixlen 64 scopeid 0x3
groups: lo
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
enc0: flags=0 metric 0 mtu 1536
options=0
groups: enc
nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>
pfsync0: flags=0 metric 0 mtu 1500
options=0
maxupd: 128 defer: off version: 1400
syncok: 1
groups: pfsync
pflog0: flags=20100<PROMISC,PPROMISC> metric 0 mtu 33152
options=0
groups: pflog
vmx0_vlan100: flags=1008843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu 1500
description: LAN (lan)
options=4000000<MEXTPG>
ether 00:0c:29:34:fc:02
inet 192.168.80.250 netmask 0xffffff00 broadcast 192.168.80.255
inet6 fe80::20c:29ff:fe34:fc02%vmx0_vlan100 prefixlen 64 scopeid 0x7
groups: vlan
vlan: 100 vlanproto: 802.1q vlanpcp: 0 parent interface: vmx0
media: Ethernet autoselect
status: active
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
vmx0_vlan300: flags=1008843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu 1500
description: DMZ (opt1)
options=4000000<MEXTPG>
ether 00:0c:29:34:fc:02
inet 172.23.23.1 netmask 0xffffff00 broadcast 172.23.23.255
inet6 fe80::20c:29ff:fe34:fc02%vmx0_vlan300 prefixlen 64 scopeid 0x8
groups: vlan
vlan: 300 vlanproto: 802.1q vlanpcp: 0 parent interface: vmx0
media: Ethernet autoselect
status: active
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
vmx0_vlan400: flags=1008843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu 1500
description: WLAN (opt2)
options=4000000<MEXTPG>
ether 00:0c:29:34:fc:02
inet 10.10.10.1 netmask 0xffffff00 broadcast 10.10.10.255
inet6 fe80::20c:29ff:fe34:fc02%vmx0_vlan400 prefixlen 64 scopeid 0x9
groups: vlan
vlan: 400 vlanproto: 802.1q vlanpcp: 0 parent interface: vmx0
media: Ethernet autoselect
status: active
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>



.... reboot  ...vmx0 is still IPv6 enabled, but the vlans are now IPv6 free and lighthttpd seems to bind without trouble after the reboot ...


vmx0: flags=1008843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu 1500
options=4800028<VLAN_MTU,JUMBO_MTU,HWSTATS,MEXTPG>
ether 00:0c:29:34:fc:02
inet6 fe80::20c:29ff:fe34:fc02%vmx0 prefixlen 64 scopeid 0x1
media: Ethernet autoselect
status: active
nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>

vmx0_vlan100: flags=1008843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu 1500
description: LAN (lan)
options=4000000<MEXTPG>
ether 00:0c:29:34:fc:02
inet 192.168.80.250 netmask 0xffffff00 broadcast 192.168.80.255
groups: vlan
vlan: 100 vlanproto: 802.1q vlanpcp: 0 parent interface: vmx0
media: Ethernet autoselect
status: active
nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>


sockstat -4ln
0        lighttpd     914 7   tcp4   127.0.0.1:4444        *:*
0        lighttpd     914 10  tcp4   192.168.80.250:4444   *:*
0        lighttpd     914 11  tcp4   10.10.10.1:4444       *:*
0        lighttpd     914 12  tcp4   127.0.0.1:80          *:*
0        lighttpd     914 15  tcp4   192.168.80.250:80     *:*
0        lighttpd     914 16  tcp4   10.10.10.1:80         *:*
0        sshd       95203 8   tcp4   *:22                  *:*



sorry, but this feels like a bug in the config daemon.

Quote from: Cilahn on August 03, 2024, 08:57:51 PM
all interface are configures with IPV6 Configureation "None". Anyway ifconfig show still ipv6
addresses on all interface.

This is not how it works, not configuring IPv6 does not disable the link-local IPs - and never has.

August 03, 2024, 11:04:47 PM #7 Last Edit: August 03, 2024, 11:07:41 PM by Cilahn
Quote from: doktornotor on August 03, 2024, 09:17:33 PM
Quote from: Cilahn on August 03, 2024, 08:57:51 PM
all interface are configures with IPV6 Configureation "None". Anyway ifconfig show still ipv6
addresses on all interface.

This is not how it works, not configuring IPv6 does not disable the link-local IPs - and never has.

This is surely true, i'm not to deep into ipv6, but this shouldn't prevent the start of webgui anyway. Something broke here between 24.1 and 24.7

Thinking a bit about it, so why are there no link-local adresses on the vlan interfaces?

i'm did a deeper dig today and found some clues i not really can glue together.

First I saw that all VLans on on the vmx0 1q-Trunk Interfaces share the same ipv6 link-local addresses, but starting the webgui via configctl just binds theses addresses fine. Even when the shared link-local IP was mentioned in the lighty startup general.log error.

Could it be, that under ESXi the VLans are somewhat setup late and i observe a race condition?
BTW the VLan Trunk is setup correctly with ID 4095 and has being working for years now.

So to dig deeper i added an evil hack to /usr/local/etc/inc/plugins.inc.d/webgui.inc just to see how the setup is reacting to a little delay of 30 seconds. Guess what. It works after this fix.

/* start lighttpd */
    /* .. some evil hack.. */
    sleep(30);
    if (!count($listeners) || mwexec('/usr/local/bin/flock -ne /var/run/lighty-webConfigurator.pid /usr/local/sbin/lighttpd -f /var/etc/lighty-webConfigurator.conf')) {
        service_log("failed.\n", $verbose);
    } else {
        service_log("done.\n", $verbose);
    }


Any Ideas what could be the reason behind all this?

If the share the MAC address the share the link-local address. Nothing wrong with that. Link-local means exactly that. Unique per link.

Did you try changing the listen interfaces to "All"?
Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)


Weird. Only thing I can say for now is that the link-local addresses are perfectly normal.
Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)

I think the link-local similarity is misleading because all interfaces are perfectly mapped when the webgui is delayed or manually started.


sockstat -ln | grep lighttpd
0        lighttpd   84760 7  tcp4   127.0.0.1:4444        *:*
0        lighttpd   84760 8  tcp6   ::1:4444              *:*
0        lighttpd   84760 9  tcp6   fe80::1%lo0:4444      *:*
0        lighttpd   84760 10 tcp4   192.168.80.250:4444   *:*
0        lighttpd   84760 11 tcp6   fe80::20c:29ff:fe25:92df%vmx0_vlan100:4444 *:*
0        lighttpd   84760 12 tcp4   10.10.10.1:4444       *:*
0        lighttpd   84760 13 tcp6   fe80::20c:29ff:fe25:92df%vmx0_vlan400:4444 *:*
0        lighttpd   84760 14 tcp4   127.0.0.1:80          *:*
0        lighttpd   84760 15 tcp6   ::1:80                *:*
0        lighttpd   84760 16 tcp6   fe80::1%lo0:80        *:*
0        lighttpd   84760 17 tcp4   192.168.80.250:80     *:*
0        lighttpd   84760 18 tcp6   fe80::20c:29ff:fe25:92df%vmx0_vlan100:80 *:*
0        lighttpd   84760 19 tcp4   10.10.10.1:80         *:*
0        lighttpd   84760 20 tcp6   fe80::20c:29ff:fe25:92df%vmx0_vlan400:80 *:*

some minor findings. The timing of the interface being initalized and bindung the lighttpd daemon to them is right on the same time stamp in the logs. May be this is a hint?


Timing

system/latest.log
<11>1 2024-08-04T21:48:52+02:00 firewall.mittelerde.mot opnsense 278 - [meta sequenceId="321"] /usr/local/etc/rc.bootup: The command '/usr/local/bin/flock -ne /var/run/lighty-webConfigurator.pid /usr/local/sbin/
lighttpd -f /var/etc/lighty-webConfigurator.conf' returned exit code '255', the output was '2024-08-04 21:48:52: (/usr/obj/usr/ports/www/lighttpd/work/lighttpd-1.4.76/src/network.c.604) bind() [fe80::20c:29ff:fe
25:92df]:4444: Can't assign requested address'
<13>1 2024-08-04T21:48:52+02:00 firewall.mittelerde.mot opnsense 278 - [meta sequenceId="322"] /usr/local/etc/rc.bootup: ROUTING: entering configure using defaults
<13>1 2024-08-04T21:48:52+02:00 firewall.mittelerde.mot kernel - - [meta sequenceId="304"] <6>vlan0: changing name to 'vmx0_vlan100'
<13>1 2024-08-04T21:48:52+02:00 firewall.mittelerde.mot kernel - - [meta sequenceId="305"] <6>vlan1: changing name to 'vmx0_vlan300'
<13>1 2024-08-04T21:48:52+02:00 firewall.mittelerde.mot kernel - - [meta sequenceId="306"] <6>vlan2: changing name to 'vmx0_vlan400'
<13>1 2024-08-04T21:48:52+02:00 firewall.mittelerde.mot kernel - - [meta sequenceId="307"] <118>done.
<13>1 2024-08-04T21:48:52+02:00 firewall.mittelerde.mot kernel - - [meta sequenceId="308"] <118>Configuring DMZ interface...done.
<13>1 2024-08-04T21:48:52+02:00 firewall.mittelerde.mot kernel - - [meta sequenceId="309"] <118>Configuring LAN interface...done.
<13>1 2024-08-04T21:48:52+02:00 firewall.mittelerde.mot kernel - - [meta sequenceId="310"] <118>Configuring WAN interface...
<13>1 2024-08-04T21:48:52+02:00 firewall.mittelerde.mot kernel - - [meta sequenceId="311"] <6>vmx1: link state changed to UP
<13>1 2024-08-04T21:48:52+02:00 firewall.mittelerde.mot kernel - - [meta sequenceId="312"] <118>done.
<13>1 2024-08-04T21:48:52+02:00 firewall.mittelerde.mot kernel - - [meta sequenceId="313"] <118>Configuring WLAN interface...done.


audit/latest.log
<38>1 2024-08-04T21:48:52+02:00 firewall.mittelerde.mot configd.py 238 - [meta sequenceId="3"] action allowed interface.linkup.start for user root

Quote from: Cilahn on August 04, 2024, 10:06:27 PM
yes i did. Listen to all didn't help :(

Just to be sure because this looks like a waste of debugging time that has already been done over the years:

"Listen to all" here means DESELECT ALL interfaces, not selecting all interfaces.


Cheers,
Franco