Multiple retries before acquiring WAN IP (if ever)

Started by mrElement, March 10, 2024, 11:52:03 PM

March 10, 2024, 11:52:03 PM Last Edit: March 10, 2024, 11:55:01 PM by mrElement
Hello,
I am very new to OPNsense and a networking novice, to say the least. There is an issue I would really like to resolve if possible, but I come into this fully aware that my own lack of knowledge and know-how might be at fault, rather than OPNsense or any of the other components in play. Any help or insights would be greatly appreciated.

My setup (hardware)
Before diving into the issue itself, I think it would be useful to present my setup, as it plays a role in the problem I am facing.

  • Huawei GPON ONT ISP box in bridge mode connected to...
  • Unmanaged TP-Link 5-port switch connected to...
  • 2x Physical Computers acting as Proxmox servers (OPNsense runs on the first one as a VM)

    • each "server" has a single onboard NIC on its motherboard and an additional 4-port PCI Intel I340-T4 network card
  • Managed TP-Link 8-port switch connected to the server nodes in the following configuration

    • each server's motherboard NIC serves as the Proxmox node's LAN NIC
    • the 1st NIC of the Intel card serves as OPNsense LAN network NIC
    • the 2nd NIC of the Intel card serves as OPNsense WAN network NIC

My setup (software)
As mentioned already, OPNsense (OPNsense 24.1.3_1-amd64) runs as a VM on one of the Proxmox servers.

  • VM has 4 cores and 8GB of RAM along with 30GB of disk space
  • BIOS is SeaBIOS, VM has direct access to the host CPU
  • VM has two network devices attached, both Linux bridges using the VirtIO (paravirtualized) model

    • vmbr1 which corresponds to the 1st physical NIC of the Intel card (LAN)
    • vmbr2 which corresponds to the 2nd physical NIC of the Intel card (WAN)
  • my ISP provides connectivity through DHCP with MAC address binding; I have noted the bound MAC and always spoof it at the OPNsense level (in the WAN interface settings)
  • I am using qemu-agent plugin and the relevant option is enabled on Proxmox level
  • I have turned off IPv6 on both LAN/WAN interfaces

The issue
Essentially, whenever the Proxmox server shuts down (or maybe restarts; I need to retest that scenario), OPNsense has a great deal of difficulty acquiring an IP from my ISP via DHCP on the WAN interface. It looks like the WAN interface is flapping, but I could be wrong and it could be the DHCP service of OPNsense. OPNsense reliably fails to get an IP on boot, and then a substantial amount of time (roughly 12-15 minutes) passes without it acquiring a stable lease. As the minutes pass I often see (in the GUI) gateways appearing and disappearing, and IPs being assigned to the WAN interface and then removed, until much later it stabilizes. As hard as I've tried, I haven't been able to pinpoint what makes it stabilize. Nothing seems to work straight away, but my go-to moves are:

  • Deactivating and reactivating the WAN interface (via GUI)
  • Restarting the DHCP service
  • Restarting the routing service
  • Restarting the VM (only - not the Proxmox server)
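
To put a number on the flapping described above, rather than watching the GUI, one option is to sample the WAN interface periodically and count how often the address comes and goes. The following is only a minimal sketch: it assumes the samples are text captured from `ifconfig` on the WAN interface, and the function names `extract_ipv4` and `count_flaps` are hypothetical helpers made up for illustration, not part of OPNsense.

```python
import re

# Matches the IPv4 line of ifconfig output, e.g. "\tinet 192.0.2.10 netmask ...".
# "inet6" lines do not match because a space must follow "inet".
INET_RE = re.compile(r"^\s*inet (\d{1,3}(?:\.\d{1,3}){3})\b", re.MULTILINE)


def extract_ipv4(ifconfig_output):
    """Return the first IPv4 address found in the text, or None if there is none."""
    match = INET_RE.search(ifconfig_output)
    return match.group(1) if match else None


def count_flaps(samples):
    """Count transitions between 'has an address' and 'has no address'.

    `samples` is a sequence of values as returned by extract_ipv4,
    ordered by the time they were taken.
    """
    flaps = 0
    previous = None
    for has_ip in (ip is not None for ip in samples):
        if previous is not None and has_ip != previous:
            flaps += 1
        previous = has_ip
    return flaps
```

Feeding it one sample every few seconds after a host reboot would give a flap count and a rough time-to-stable-lease that can be compared between the VirtIO, E1000 and passthrough setups.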

Complications/Workarounds

  • This happens only on a Proxmox server restart, not on an OPNsense VM restart.
  • This also happens in previous OPNsense versions; it's not exclusive to 24.1.
  • If I use the E1000 (emulated) model for the interface bridges (vmbr1, vmbr2), the problem seems to go away; however, I lose about 20Mbit of throughput from my ISP connection.
  • If I use direct passthrough of the NICs as PCI devices (instead of Linux bridges), the problem definitely goes away; however, I lose the ability to migrate the VM from one Proxmox node to the other, as it is "bound" to physical hardware.

    • This also kind of verifies that there is nothing wrong with physical hardware in play (such as switches, ethernet cables etc.)

I would really appreciate anyone's time and help on the matter: things I could try or change, either in OPNsense or in my Proxmox/VM setup, and things I might have missed.

I have included a System -> General log and redacted whatever seemed necessary. I apologize if I've forgotten something.

I know that this sounds more of a Proxmox issue than an OPNsense issue, and quite possibly it might be, however I would like to ensure there isn't some configuration or parameter or bug that I'm missing on the OPNsense side before throwing myself into a paravirtualization battle over at the Proxmox forums.

Thanks,
George

Small update: After frantically searching around Google, I had the idea of disabling the Proxmox firewall for the OPNsense VM, specifically the 'Firewall' option that is enabled by default when you attach the vmbr interfaces.

It seemed to fix the issue initially (I ran my usual test of powering everything down: server, switches, ISP modem), but the fix did not hold the next day.

My next experiment was to play around with the NICs. I swapped the ports so that all LAN interfaces (Proxmox node and OPNsense VM) are on the Intel card NICs (I'm reading about a lot of WAN flapping and issues with Intel cards), and I put the OPNsense WAN on the single NIC embedded on the motherboard.

Doing that and performing my usual "reboot all" test, initially, I didn't get an IP straight away. According to the log it failed once.

2024-03-11T20:48:23 Error opnsense /usr/local/etc/rc.bootup: The command '/sbin/umount '/var/unbound/lib'' returned exit code '1', the output was 'umount: /var/unbound/lib: not a file system root directory'
2024-03-11T20:19:08 Error opnsense /usr/local/etc/rc.bootup: The command '/sbin/umount '/var/unbound/lib'' returned exit code '1', the output was 'umount: /var/unbound/lib: not a file system root directory'
2024-03-11T20:19:07 Error opnsense /usr/local/etc/rc.bootup: ROUTING: not a valid wan interface gateway address: 'missing'
2024-03-11T19:56:13 Error opnsense /usr/local/etc/rc.bootup: The command '/sbin/umount '/var/unbound/lib'' returned exit code '1', the output was 'umount: /var/unbound/lib: not a file system root directory'
2024-03-11T19:56:13 Error opnsense /usr/local/etc/rc.bootup: ROUTING: not a valid wan interface gateway address: 'missing'
2024-03-11T19:52:39 Error opnsense /interfaces.php: ROUTING: not a valid wan interface gateway address: 'missing'
2024-03-11T19:52:38 Error opnsense /interfaces.php: ROUTING: not a valid wan interface gateway address: 'missing'
2024-03-11T19:51:21 Critical dhclient exiting.
2024-03-11T19:51:21 Error dhclient connection closed
2024-03-11T19:49:00 Error opnsense /usr/local/etc/rc.routing_configure: ROUTING: not a valid wan interface gateway address: 'missing'
2024-03-11T19:48:32 Warning opnsense /usr/local/sbin/pluginctl: The required WAN_GW IPv4 interface address could not be found, skipping.
2024-03-11T19:48:32 Warning opnsense /usr/local/sbin/pluginctl: Skipping gateway WAN_GW due to empty 'gateway' property.
2024-03-11T19:48:32 Warning opnsense /usr/local/sbin/pluginctl: Skipping gateway WAN_GW due to empty 'monitor' property.
2024-03-11T19:47:56 Warning opnsense /usr/local/etc/rc.routing_configure: The required WAN_GW IPv4 interface address could not be found, skipping.
2024-03-11T19:47:56 Warning opnsense /usr/local/etc/rc.routing_configure: Skipping gateway WAN_GW due to empty 'gateway' property.
2024-03-11T19:47:56 Warning opnsense /usr/local/etc/rc.routing_configure: Skipping gateway WAN_GW due to empty 'monitor' property.
2024-03-11T19:47:56 Error opnsense /usr/local/etc/rc.routing_configure: ROUTING: not a valid wan interface gateway address: 'missing'
2024-03-11T19:40:30 Error opnsense /usr/local/etc/rc.bootup: The command '/sbin/umount '/var/unbound/lib'' returned exit code '1', the output was 'umount: /var/unbound/lib: not a file system root directory'
2024-03-11T19:40:30 Error opnsense /usr/local/etc/rc.bootup: ROUTING: not a valid wan interface gateway address: 'missing'
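
A pasted log like the one above is easier to reason about once it is summarized. The following is a minimal sketch that assumes every line has the shape `timestamp severity process message`, as in the excerpt; `parse_line` and `summarize` are hypothetical helper names, and `SAMPLE` reuses three of the lines shown above.

```python
from collections import Counter
from datetime import datetime

SAMPLE = [
    "2024-03-11T19:56:13 Error opnsense /usr/local/etc/rc.bootup: ROUTING: not a valid wan interface gateway address: 'missing'",
    "2024-03-11T19:51:21 Critical dhclient exiting.",
    "2024-03-11T19:40:30 Error opnsense /usr/local/etc/rc.bootup: ROUTING: not a valid wan interface gateway address: 'missing'",
]


def parse_line(line):
    """Split one log line into (timestamp, severity, process, message)."""
    stamp, severity, process, message = line.strip().split(" ", 3)
    return datetime.fromisoformat(stamp), severity, process, message


def summarize(lines):
    """Return per-(severity, process) counts and the sorted times of 'missing' gateway errors."""
    entries = [parse_line(line) for line in lines if line.strip()]
    counts = Counter((severity, process) for _, severity, process, _ in entries)
    missing_gateway = sorted(
        stamp for stamp, _, _, message in entries
        if "gateway address: 'missing'" in message
    )
    return counts, missing_gateway


counts, missing_gateway = summarize(SAMPLE)
print(counts)
if missing_gateway:
    # Time between the first and last boot/routing run that found no WAN gateway.
    print(missing_gateway[-1] - missing_gateway[0])
```

The gap between the first and last "missing" gateway error gives a rough lower bound on how long the WAN stayed without a usable lease.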


But then, after a short while (roughly 40-60 seconds), it got an IP. I'll try it again after more than 24 hours, as it seems my "reboot all" test is not reliable when executed back-to-back after configuration changes, for some reason.

Question for whoever might be reading this
YouTube videos suggest that once you set up OPNsense, and provided all is OK with your ISP, it will "just work". And it does, tbh, as mentioned above, when using passthrough.

But I see that it detects and creates a NAT Gateway each time. Being the networking dummy that I am, I would like to ask: should I perhaps define that Gateway so that it is permanently there, instead of allowing OPNsense to detect it and set it up automatically?

Thanks,
George