Lost access to everything after upgrade.

Started by Red Squirrel, July 31, 2024, 05:23:30 AM

I'm in the process of upgrading my current firewall to OPNsense from an older version of pfSense. I have it installed on a Sophos box and have a Dell switch with all the VLANs set up so I can have a test environment before I deploy it live. For now the WAN is connected to my internal network, so I had allowed the admin interface to be accessible there. That way I can play around with it from my regular PC instead of standing at it directly with the laptop plugged into the switch.

I had the basic stuff working and decided it would be a good time to run an upgrade to bring me to the latest version. (although I now see that this version is considered legacy?  So that's odd. Maybe the upgrade never took)

After the upgrade finished and the box rebooted, I lost access through every port and can no longer get to the admin interface. It's basically blocking all traffic, even internally. I had it set up so the admin interface is available on all interfaces as a temporary measure. The laptop is plugged into a port set up as VLAN 2, LAN is set up as a trunk port to the switch, and I get an IP from the firewall, which indicates to me that my setup is correct network-wise. However, I can no longer access the admin interface either internally via the laptop or externally via my PC. I cannot ping it either. In the console I went to the firewall log and, sure enough, I can see that the packets are being blocked.

Is there a way I can gain access to the admin interface without having to do a factory reset?  Like some kind of way to turn off the firewall portion completely?  I'm guessing with the upgrade a default rule may have changed or something and the admin interface is now being blocked.  I had not explicitly allowed it, as it was working already. 

Since this box has 4 interfaces, I decided to configure a 3rd interface from the console to see if I could access the admin interface that way. It worked... but I noticed that all my VLAN assignments are GONE! What happened, and how do I prevent this in the future? This is only a lab environment for now, but this would be a huge disaster if it had happened in production and would completely cripple my entire infrastructure. All the firewall rules and everything are gone.

At this point I think I will just reinstall from scratch using the proper version but really wondering what even happened and how I can prevent this in the future. 

System: Configuration: History... check when the VLANs were allegedly removed and see which subsystem/event caused it.


Cheers,
Franco

I ended up clean installing so I can put the right version, but I am worried the same thing happens again...

Is this some kind of known issue? 

Also vlan naming is kinda odd as I have different vlans on different interfaces but it doesn't actually name it in a way where it's easy to tell which is which.

So I made my own convention where I call it vlan02.1 where 02 is the vlan number and 1 is the interface number.  Does the naming convention actually matter for anything within how the system works?  It seems fairly strict so I'm wondering if the way I'm naming them somehow messed something up.  If I leave it default it just names it vlan02 vlan03 and it doesn't even match the vlan tag number so it's very confusing.

Network device names in FreeBSD have a strict limit of 15 readable characters. We've run into size issues with stacked VLAN names before, which are impossible to fix, so we had to change the way VLANs are referenced a while ago.
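
As a rough illustration of the 15-character limit mentioned above, here is a minimal Python sketch that checks candidate device names against it. The helper name `fits_ifname_limit` is made up for this example, and `lagg0_vlan10_vlan20` is a hypothetical stacked name used only to show how quickly the old style runs out of room; it is not taken from an actual OPNsense configuration.

```python
# Limit quoted in the post above: 15 readable characters per device name.
IFNAME_MAX = 15


def fits_ifname_limit(name: str) -> bool:
    """Return True if the device name fits within the 15-character limit."""
    return len(name) <= IFNAME_MAX


names = [
    "vlan0.10",             # new-style name mentioned later in this thread
    "lagg0_vlan10",         # old-style name mentioned later in this thread
    "lagg0_vlan10_vlan20",  # hypothetical stacked name, for illustration only
]

results = {name: fits_ifname_limit(name) for name in names}

for name, ok in results.items():
    print(f"{name}: {len(name)} chars, fits={ok}")
```

The old-style single VLAN name still fits, but stacking a second tag onto it pushes the hypothetical name past the limit, which is the kind of size issue described above.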

The most likely culprit is an assignment of a volatile network device, such as Tailscale or a USB-based NIC, which may not present itself after bootup. That prompts the interface system to reset the interfaces because it thinks a hardware device went missing or the hardware was swapped. In such cases it is beneficial to use the "lock" feature in the interface settings to ensure the system doesn't reset this interface if it is not found.

That's just pure speculation without a config diff though.


Cheers,
Franco

Quote from: franco on July 31, 2024, 08:55:38 AM
Network device names in FreeBSD have a strict limit of 15 readable characters. We've run into size issues with stacked VLAN names before, which are impossible to fix, so we had to change the way VLANs are referenced a while ago.

Sorry for hijacking the thread - one question about that. Perfectly fine with that decision. I renamed all the VLANs from e.g. lagg0_vlan10 to vlan0.10 on the two units I upgraded last weekend. Question: why does it have to start with a 0, though? Why can't I name the vlan interface with tag 10 "vlan10" like I do everywhere else where I run FreeBSD but not OPNsense?

Kind regards,
Patrick
Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)

Yeah, I made sure to lock all the interfaces after previously having them all wiped out. I found out that I need to set that option after googling and finding that this was an issue. No USB devices here; there are 4 built-in interfaces (it's a Sophos XG115 device).

Either way I just did an update now and it didn't happen again, although I did not configure all the vlans yet just the main one that's used for LAN. I think once I put this in prod I'll just keep my old firewall in the rack for a while until I can confirm this won't be an issue again.

@Patrick

The prefix "0" is to ensure the VLANs created by OPNsense never clash with VLANs created by FreeBSD/ifconfig, but still adhere to the "[a-z]+[0-9]+" regex for matching device names.
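
A small Python sketch of the naming argument above, under stated assumptions: the regex is the one quoted in this post, the `vlan0.<tag>` form is the one Patrick mentions, and applying the regex as a prefix match is my assumption about how it is used, not something confirmed in this thread. The helper `device_prefix` is hypothetical.

```python
import re

# Regex quoted in the post above for matching device names.
DEVICE_RE = re.compile(r"[a-z]+[0-9]+")


def device_prefix(name):
    """Return the leading part of the name matched by the regex, or None."""
    match = DEVICE_RE.match(name)  # prefix match: an assumption for this sketch
    return match.group(0) if match else None


tags = (10, 20, 30)

# OPNsense-style names as described in this thread: "vlan0." followed by the tag.
opnsense_names = {f"vlan0.{tag}" for tag in tags}

# Names that plain FreeBSD/ifconfig usage could produce: vlan<N>.
freebsd_names = {f"vlan{tag}" for tag in tags} | {"vlan0", "vlan1"}

clashes = opnsense_names & freebsd_names

for name in sorted(opnsense_names):
    print(name, "-> regex prefix:", device_prefix(name))
print("clashing names:", clashes)
```

Under these assumptions every OPNsense-style name still begins with something the regex accepts (`vlan0`), while the full name never equals a plain `vlan<N>` name, so the two sets cannot collide.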

Something that happened a lot with tun in the past was devices being created by other means, leading to overlapping expectations of who owns tun0, tun1, etc. (and one reason why "tun" devices are not showing up in the GUI).


Cheers,
Franco