Upgraded to 23.1.r2: no LAN ip after reboot

Started by Alessandro Del Prete, January 23, 2023, 05:16:26 AM

Hi,

I upgraded to 23.1.r2, everything seemed to go fine as usual. After reboot, no errors, but I noticed in the console menu where the interfaces with their IPs are listed that the LAN interface had no IP assigned.

I reloaded the services from the menu, and the IP was assigned. Everything was fine. So I rebooted to check whether it was a one-off, but got the same behaviour: no IP assigned to LAN on boot. :(

The LAN interface is a LAGG with a static IP configuration. It has been working for 2 years without an issue.

Reloading services works every time, but obviously that's not ideal if I'm not home and the firewall reboots (power loss, etc.).

Any hint of what to look for to further debug the issue?

Thanks for any help.

Are you saying a static IP is missing after boot? That's highly unlikely, but perhaps the best step is to look for ifconfig errors first:

# opnsense-log | grep ifconfig


Cheers,
Franco

Hi Franco,

I've tested 3 reboots. As soon as it reboots and services start coming up, the interfaces go UP and I can actually ping 10.1.10.1. After 3-4 seconds, while other services are still coming up, the IP stops answering pings. When the reboot completes and the menu appears, none of the interfaces have an IP; two of them have a static IP configured and it shows as empty. Reloading all the services fixes everything.

Here's the log you requested:

<171>1 2023-01-23T04:22:41+01:00 OPNsense.axel.dom php-cgi 18472 - [meta sequenceId="169"] /interfaces_lagg_edit.php: The command `/sbin/ifconfig 'igb0' mtu '9000'' failed to execute
<171>1 2023-01-23T04:22:41+01:00 OPNsense.axel.dom php-cgi 18472 - [meta sequenceId="170"] /interfaces_lagg_edit.php: The command `/sbin/ifconfig 'igb1' mtu '9000'' failed to execute




Perhaps you should share your full setup. It's probably hard to reproduce otherwise.


Cheers,
Franco

What do you need exactly? Is there a command to share the complete config? Never needed to ask for support before, OPNsense always worked great. :)

Perhaps for now you could just post the member configuration of the LAGG in question, the "ifconfig lagg0" output and the IP configuration in the interface settings (the one you would expect on it).

At least the ifconfig will show us if it's there. If it is, it might be a firewall rule or otherwise...


Cheers,
Franco

Thanks for the help, Franco. I didn't touch any rules before upgrading from 22.7.11. And if it's a rule, it's strange that just reloading services fixes the IP binding.

Here's the info requested:

lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 9000
        description: LAN (lan)
        options=4802028<VLAN_MTU,JUMBO_MTU,WOL_MAGIC,NOMAP>
        ether 02:d4:32:81:89:00
        inet 10.1.10.1 netmask 0xfffffe00 broadcast 10.1.11.255
        laggproto lacp lagghash l2,l3,l4
        laggport: igb0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
        laggport: igb1 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
        groups: lagg
        media: Ethernet autoselect
        status: active
        nd6 options=9<PERFORMNUD,IFDISABLED>




So this looks as expected actually:

inet 10.1.10.1 netmask 0xfffffe00 broadcast 10.1.11.255

(perhaps you posted this in the working state, not sure)
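
As a quick sanity check on the line quoted above, the hex netmask can be converted to a prefix length. This is only a minimal sketch: `hexmask_to_prefix` is a hypothetical helper name (not an OPNsense or FreeBSD tool), and it simply counts set bits, so it assumes a normal contiguous netmask.

```shell
#!/bin/sh
# Hypothetical helper: convert a hex netmask as printed by ifconfig
# (e.g. 0xfffffe00) into a CIDR prefix length by counting set bits.
# Assumes the mask is contiguous, as ordinary netmasks are.
hexmask_to_prefix() {
    mask=$(( $1 ))      # shell arithmetic accepts the 0x... form
    bits=0
    while [ "$mask" -ne 0 ]; do
        bits=$(( bits + (mask & 1) ))
        mask=$(( mask >> 1 ))
    done
    echo "$bits"
}
```

For the mask in this thread, `hexmask_to_prefix 0xfffffe00` gives 23, which matches the 10.1.11.255 broadcast address for a network starting at 10.1.10.0.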

What service(s) are you restarting (or which button or command do you run) and then which part of the system do you access? Is this about the GUI in particular?


Cheers,
Franco

Yes, I'm giving you the config in the working state, from remote. Once I get home, I can reproduce the issue by rebooting and access the shell via the local console.

To fix the issue after a reboot, I use the "Reload all services" option in the right column of the console menu. Once I run that, the binding is OK.

It's all a bit strange. Reload all services sort of acts like a dry reboot. It does the same things as the boot would minus a few bits.


Cheers,
Franco

January 23, 2023, 04:53:09 PM #10 Last Edit: January 23, 2023, 05:16:54 PM by alexdelprete
Even stranger: during reboot, for some seconds the IP is bound, I can ping OPNsense from the LAN, then after a couple of seconds, while completing reboot, the IP goes away.

It's like one of the scripts unbinds it in some way. I don't know if it's something related to the LAGG, but I don't think so, because it happens also to the ONT_LAN interface, that is a simple interface.

Unfortunately, I don't know exactly what to do from the console to give you as much information as possible. In the non-working state OPNsense is isolated, so I can't send anything from it; I will take some screenshots with my phone. :)

What commands should I run to debug the non-working state?
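
One way to capture the non-working state from the local console, sketched here under assumptions: save the `ifconfig -a` output to a file so it survives until the network is back, and list which interfaces carry no IPv4 address. `ifaces_without_inet` is a hypothetical helper, not an OPNsense command; it only parses text on stdin.

```shell
#!/bin/sh
# Hypothetical helper: read `ifconfig -a` output on stdin and print the
# name of every interface block that contains no "inet" (IPv4) line.
ifaces_without_inet() {
    awk '
        # A new interface block starts with "name: flags=..."
        /^[A-Za-z0-9_.]+: flags=/ {
            if (name != "" && !has_inet) print name
            name = $1
            sub(/:$/, "", name)
            has_inet = 0
            next
        }
        # "inet6" lines do not count; only an exact "inet" keyword does
        $1 == "inet" { has_inet = 1 }
        END { if (name != "" && !has_inet) print name }
    '
}

# Example use on the console (the file path is an assumption):
#   ifconfig -a > /root/ifconfig-broken.txt
#   ifaces_without_inet < /root/ifconfig-broken.txt
```

LAGG member ports and pseudo-interfaces such as pflog0 normally have no address of their own, so the interesting entries are the ones expected to hold a static IP, like lagg0.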

January 23, 2023, 05:26:04 PM #11 Last Edit: January 23, 2023, 05:28:37 PM by alexdelprete
Just came home, did a quick test (my family gets angry when OPNsense is down..:):

In non-working state, I ran the opnsense-log command and also ifconfig, here's the output:


OPNsense-log (from timestamps they seem old entries, not related to this reboot):

<171>1 2023-01-23T04:22:41+01:00 OPNsense.axel.dom php-cgi 18472 - [meta sequenceId="169"] /interfaces_lagg_edit.php: The command `/sbin/ifconfig 'igb0' mtu '9000'' failed to execute
<171>1 2023-01-23T04:22:41+01:00 OPNsense.axel.dom php-cgi 18472 - [meta sequenceId="170"] /interfaces_lagg_edit.php: The command `/sbin/ifconfig 'igb1' mtu '9000'' failed to execute

LOG ifconfig:

igb0: flags=8863<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 9000
        options=4802028<VLAN_MTU,JUMBO_MTU,WOL_MAGIC,NOMAP>
        ether 02:d4:32:81:89:00
        hwaddr 40:62:31:0c:0e:e2
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        nd6 options=9<PERFORMNUD,IFDISABLED>
igb1: flags=8863<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 9000
        options=4802028<VLAN_MTU,JUMBO_MTU,WOL_MAGIC,NOMAP>
        ether 02:d4:32:81:89:00
        hwaddr 40:62:31:0c:0e:e3
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        nd6 options=9<PERFORMNUD,IFDISABLED>
igb2: flags=8822<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=4802028<VLAN_MTU,JUMBO_MTU,WOL_MAGIC,NOMAP>
        ether 40:62:31:0c:0e:e4
        media: Ethernet autoselect
        status: no carrier
        nd6 options=9<PERFORMNUD,IFDISABLED>
igb3: flags=8863<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=4802028<VLAN_MTU,JUMBO_MTU,WOL_MAGIC,NOMAP>
        ether 40:62:31:0c:0e:e5
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        nd6 options=9<PERFORMNUD,IFDISABLED>
igb4: flags=8863<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=4802028<VLAN_MTU,JUMBO_MTU,WOL_MAGIC,NOMAP>
        ether 40:62:31:0c:0e:e5
        hwaddr 40:62:31:0c:0e:e6
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        nd6 options=9<PERFORMNUD,IFDISABLED>
igb5: flags=8863<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=4802028<VLAN_MTU,JUMBO_MTU,WOL_MAGIC,NOMAP>
        ether 40:62:31:0c:0e:e5
        hwaddr 40:62:31:0c:0e:e7
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        nd6 options=9<PERFORMNUD,IFDISABLED>
enc0: flags=41<UP,RUNNING> metric 0 mtu 1536
        groups: enc
        nd6 options=9<PERFORMNUD,IFDISABLED>
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
        options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x8
        inet 127.0.0.1 netmask 0xff000000
        groups: lo
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
pflog0: flags=20100<PROMISC,PPROMISC> metric 0 mtu 33160
        groups: pflog
pfsync0: flags=0<> metric 0 mtu 1500
        syncpeer: 0.0.0.0 maxupd: 128 defer: off
        syncok: 1
        groups: pfsync
lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 9000
        description: LAN (lan)
        options=4802028<VLAN_MTU,JUMBO_MTU,WOL_MAGIC,NOMAP>
        ether 02:d4:32:81:89:00
        laggproto lacp lagghash l2,l3,l4
        laggport: igb0 flags=0<>
        laggport: igb1 flags=0<>
        groups: lagg
        media: Ethernet autoselect
        status: active
        nd6 options=9<PERFORMNUD,IFDISABLED>
lagg1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        description: ONT_LAN (opt2)
        options=4802028<VLAN_MTU,JUMBO_MTU,WOL_MAGIC,NOMAP>
        ether 40:62:31:0c:0e:e5
        laggproto lacp lagghash l2,l3,l4
        laggport: igb3 flags=0<>
        laggport: igb4 flags=0<>
        laggport: igb5 flags=0<>
        groups: lagg
        media: Ethernet autoselect
        status: active
        nd6 options=9<PERFORMNUD,IFDISABLED>
vlan01: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        description: ONT_WAN (opt3)
        options=4000000<NOMAP>
        ether 40:62:31:0c:0e:e5
        groups: vlan
        vlan: 835 vlanproto: 802.1q vlanpcp: 7 parent interface: lagg1
        media: Ethernet autoselect
        status: active
        nd6 options=9<PERFORMNUD,IFDISABLED>
pppoe0: flags=8890<POINTOPOINT,NOARP,SIMPLEX,MULTICAST> metric 0 mtu 1500
        description: WAN_FTTH (wan)
        groups: WAN_GROUP
        nd6 options=9<PERFORMNUD,IFDISABLED>


This is the end of the reboot, you can see no IPs are bound:

[screenshot]

Then I choose option 11 (Reload all services) and after it ends, everything's ok, the IPs are bound:

[screenshot]


It seems like on reboot, some services don't complete and when triggered manually they do...but I'm just speculating...

Let me know if you need other info, and thanks again.
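
Since the log entries quoted above carry timestamps older than the reboot being tested, it may help to drop everything before a chosen time. A minimal sketch, assuming the timestamp is the second whitespace-separated field (as in the lines above) and that all entries use the same UTC offset, so plain string comparison orders them correctly. `log_since` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: keep only log lines whose ISO 8601 timestamp
# (field 2) is at or after the given cutoff, e.g. 2023-01-23T17:00:00.
# Appending "" forces awk to compare the values as strings.
log_since() {
    awk -v since="$1" '($2 "") >= (since "")'
}

# Example use (the cutoff value is only an illustration):
#   opnsense-log | grep ifconfig | log_since 2023-01-23T17:00:00
```

If nothing is printed for the time window of the reboot, the ifconfig failures in the log are leftovers from an earlier configuration change rather than from this boot.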

Funky... does running the following script break it again and require a reload?

# /usr/local/etc/rc.syshook.d/start/90-sysctl

If yes, what sysctl values do you set? The same would potentially happen from the tunables GUI.


Cheers,
Franco

January 23, 2023, 10:51:12 PM #13 Last Edit: January 23, 2023, 10:56:01 PM by alexdelprete
You're the man Franco! :)

I quickly checked Tunables, and I had 3-4 entries that I used when trying to solve the PPPoE performance issues. Now I deleted all custom entries, rebooted, and everything's good again.

Questions:

  • Why does this problem only occur with v23? With v22 I had no issues, even with those tunables.
  • Is there a way to reset the tunables table to default with a command? I'm not sure everything's at default now.

Thanks again...now I'll make a bectl snapshot, just in case.

January 23, 2023, 11:09:45 PM #14 Last Edit: January 23, 2023, 11:11:21 PM by franco
The new kernel patching in 23.1 may react differently to the tunables being used. It's halfway to FreeBSD 13.2 now (not released yet), versus the more officially aligned FreeBSD 13.1 reference used in 22.7.

However, in 22.7 tunables (sysctls) were not being reinvoked after the boot sequence completed. That's where the educated guess came from actually.

If you can reproduce the issue on 22.7.11 with your previous tunables using the following...

# /usr/local/sbin/pluginctl -s sysctl restart

...then it's the tunables themselves that cause this (it would be good to know which ones). If it's not reproducible, it's likely the 23.1 kernel patching. Yet even in that case it's triggered by the "bad" tunables, and it would be interesting to narrow this down.


Cheers,
Franco
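
Narrowing down which tunable is responsible could be sketched roughly as follows: apply the old entries one at a time and check after each whether the LAN address is still there. Everything here is an assumption for illustration: the function names, the `name=value` input file, the `lagg0` interface name taken from the output earlier in the thread, and the settle delay. It is not an OPNsense tool.

```shell
#!/bin/sh
# Hypothetical helpers; redefine them for a dry run.
apply_tunable() { sysctl "$1" >/dev/null; }           # $1 is "name=value"
lan_has_ip()    { ifconfig lagg0 | grep -q 'inet '; } # "inet6 " does not match

# Apply each "name=value" line from the file in $1 and print the first
# one after which the LAN interface no longer has an IPv4 address.
find_bad_tunable() {
    while IFS= read -r pair; do
        case "$pair" in ''|'#'*) continue ;; esac     # skip blanks and comments
        apply_tunable "$pair" || echo "could not apply: $pair" >&2
        sleep "${SETTLE_SECONDS:-5}"                  # give the change time to show
        if ! lan_has_ip; then
            echo "$pair"
            return 0
        fi
    done < "$1"
    return 1
}

# Example use from the local console (the file path is an assumption):
#   find_bad_tunable /root/old-tunables.txt
```

This is best run from the local console, since the LAN may drop as soon as the offending entry is applied; "Reload all services" was what restored the address earlier in the thread.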