21.7 Boot hang at “Configuring VLAN interfaces...” with imported 21.1 config

Started by MacLemon, July 09, 2021, 05:17:04 PM

I found the time to retry with
- Export on 21.1.9_1 on the existing firewall
- Import into fresh 21.7 release install (with ZFS) on the new hardware

Same procedure as already described, same result: it just hangs at configuring VLANs.

Thanks to everyone else who chimed in to help resolve this!

@franco
Would it still be of any value to send you the whole (redacted) config that I'm using? Or is it already certain that the underlying issue is elsewhere?

Unsure what more I could contribute at this point.

This sounds concerning. I also have a 3558-based system with a LAGG using a Chelsio 10G SFP+ NIC and two "backup" igb interfaces.

Is the take-home message from Franco here to wait? Is this caused by the new kernel?

Those who are experiencing this problem:

Any chance that you can disable MSI-X on the igb interfaces and see if that helps?

root@fw_i5:~ # sysctl -a | grep dev.igb | grep msix
dev.igb.3.iflib.disable_msix: 1
dev.igb.2.iflib.disable_msix: 1
dev.igb.1.iflib.disable_msix: 1
dev.igb.0.iflib.disable_msix: 1
root@fw_i5:~ #


You can set these from System -> Settings -> Tunables; a reboot is needed for them to take effect.
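Since the tunable is per device, one entry is needed for each igb port. A minimal sketch for listing the names to enter, assuming the ports are numbered consecutively from 0 (the helper name gen_msix_tunables and the name=value output format are just for illustration; the values themselves are still entered on the Tunables page):

```shell
#!/bin/sh
# Hypothetical helper: print one disable_msix tunable per igb port.
# Assumes ports are numbered 0..(count-1), as in the sysctl output above.
# Usage: gen_msix_tunables PORT_COUNT
gen_msix_tunables() {
    count="$1"
    i=0
    while [ "$i" -lt "$count" ]; do
        printf 'dev.igb.%d.iflib.disable_msix=1\n' "$i"
        i=$((i + 1))
    done
}

# Example: a 4-port box like the one shown above.
gen_msix_tunables 4
```

Each printed line maps to one tunable (name on the left of the "=", value on the right).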

I can give it a shot.  If I set them in 21.1.9_1, will the tunables survive a reboot, or do I need to set them in the boot menu option 3 after the upgrade?  And I'm assuming there's a # for each interface, so on my 6-port Qotom, there'd be dev.igb.[0-5].iflib.disable_msix: 1?

Answered my own question.  Currently the setting on one of my firewalls:
root@inner-fw2:~ # sysctl -a | grep dev.igb | grep msix
dev.igb.5.iflib.disable_msix: 0
dev.igb.4.iflib.disable_msix: 0
dev.igb.3.iflib.disable_msix: 0
dev.igb.2.iflib.disable_msix: 0
dev.igb.1.iflib.disable_msix: 0
dev.igb.0.iflib.disable_msix: 0
Dual Virtual OPNsense on PVE with HA via CARP
Node 1: OPNsense 24.7.3_1 - Protectli Vault FW6E (i7)
Node 2: OPNsense 24.7.3_1 - Qotom-Q555G6-S05 (i5)

If you set these from System -> Settings -> Tunables, they should survive a reboot.

Yep, I knew they survived a reboot (I wrote that wrong in my previous post); I don't know (yet) if they survive an upgrade.

So, I set the tunables:
root@inner-fw2:~ # sysctl -a | grep dev.igb | grep msix
dev.igb.5.iflib.disable_msix: 1
dev.igb.4.iflib.disable_msix: 1
dev.igb.3.iflib.disable_msix: 1
dev.igb.2.iflib.disable_msix: 1
dev.igb.1.iflib.disable_msix: 1
dev.igb.0.iflib.disable_msix: 1


Rebooted, and tested a sync from inner-fw1, everything looked good.
Tried to update and the "update repository" step was just hanging (gave up after 5 minutes, rebooted again)
Tried to update again.

And now, I have an upgraded 21.7 system that is not hanging at the configuring VLANs step!!! (Yay! :) )

SSH Login banner:
Last login: Fri Jul 30 02:26:46 2021
----------------------------------------------
|      Hello, this is OPNsense 21.7          |         @@@@@@@@@@@@@@@
|                                            |        @@@@         @@@@


The tunables did indeed survive the upgrade:
root@inner-fw2:~ # sysctl -a | grep dev.igb | grep msix
dev.igb.5.iflib.disable_msix: 1
dev.igb.4.iflib.disable_msix: 1
dev.igb.3.iflib.disable_msix: 1
dev.igb.2.iflib.disable_msix: 1
dev.igb.1.iflib.disable_msix: 1
dev.igb.0.iflib.disable_msix: 1
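
A check like the one above can also be scripted. This is only a sketch, assuming the "name: value" output format shown in this thread; the function name check_msix_disabled is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: read "dev.igb.N.iflib.disable_msix: V" lines on
# stdin and list every device whose value is not 1.
# Exit status: 0 = all set to 1, 1 = at least one is not, 2 = no entries seen.
check_msix_disabled() {
    awk -F': ' '
        /^dev\.igb\.[0-9]+\.iflib\.disable_msix: / {
            seen++
            if ($2 != "1") { print "MSI-X still enabled: " $1; bad++ }
        }
        END {
            if (seen == 0) { print "no igb disable_msix entries found"; exit 2 }
            exit (bad > 0 ? 1 : 0)
        }'
}

# On the firewall:
#   sysctl -a | grep dev.igb | grep msix | check_msix_disabled
```

It prints nothing and returns 0 when every igb device has the tunable set.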


And this:
root@inner-fw2:~ # opnsense-version
OPNsense 21.7 (amd64/OpenSSL)



I tested an HA Sync (Successful) and a failover (Successful) (still running on the secondary as I write this).

So, this is a driver issue for the igb devices?  I'm for sure going to set these on the primary HA firewall, probably in a day or so; gonna let this cook first.  I have a third, non-HA firewall that has no VLAN settings (not that it's a VLAN issue, but that seemed to have been the trigger). Should I set dev.igb.#.iflib.disable_msix on that firewall as well?

I tried finding info on that tunable and wasn't able to; can you explain what it is?  Found it:  Disables MSI-X interrupts for the device.


And, finally, THANK YOU to you, franco and anyone else who helped on this!



Here is a test kernel that takes out a FreeBSD patch as per Murat's suggestion:

# opnsense-update -zkr 21.7.r_6
# opnsense-shell reboot

If this one works without the tunables set that would be the kernel we can push out today without much delay.


Thanks,
Franco

Alrighty ;)

On the same system I've been using all along, standby firewall in HA setup:
Updated the kernel: opnsense-update -zkr 21.7.r_6
Rebooted: opnsense-shell reboot (tunables still in place).
Removed the tunables.
Rebooted via GUI.

System booted successfully!  Not sure how to check the running kernel, opnsense-version still returns 21.7.




# uname -v
FreeBSD 12.1-RELEASE-p19-HBSD  79ea2ec061b(master) SMP

Alternatively

# opnsense-version kernel
21.7.r_6

But opnsense-version can only say which package is installed, not if the kernel is booted.


Cheers,
Franco
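
To confirm that the test kernel is the one actually booted, the `uname -v` string can be matched against the build hash. A rough sketch, assuming the hash 79ea2ec061b shown above identifies the test build (the function name running_kernel_matches is hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: succeed if the `uname -v` string contains the
# expected build hash.
#   $1: output of `uname -v`
#   $2: build hash to look for
running_kernel_matches() {
    case "$1" in
        *"$2"*) return 0 ;;
        *)      return 1 ;;
    esac
}

# On the firewall:
#   running_kernel_matches "$(uname -v)" 79ea2ec061b && echo "test kernel is running"
```

This looks only at the running kernel, so it is not affected by which kernel package is installed.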

Not sure if it's the right way, but the GUI Firmware -> Updates panel now shows this:

Current Version: 21.7.r_6.
New Version: 21.7

And it's wanting to upgrade :)




That's correct, but you don't want the stock 21.7 kernel at the moment. ;)

uname -v is the best bet to verify.


yea, I figured it might not go well ;)

root@inner-fw2:~ # opnsense-version kernel
21.7.r_6
root@inner-fw2:~ # uname -v
FreeBSD 12.1-RELEASE-p19-HBSD  79ea2ec061b(master) SMP

Looks good. Did you also have the hang at VLAN with LAGG? It's been a bit confusing...


Cheers,
Franco


Yes, without the tunables mb posted, or the kernel downgrade/upgrade you have given, this machine hangs at the configuring VLANs boot message.

Ok, perfect. Thanks a lot!  :)

More feedback from others welcome.  Now we have to wait and see if stability issues reported around igb are also addressed by this change.


Cheers,
Franco