Hundreds of VLAN interfaces cause degraded performance

Started by shaun90, January 21, 2025, 12:37:03 PM

I have been experimenting with a test network setup that has over 250 VLANs in an HA setup. I am currently running the setup inside QEMU/KVM (Proxmox), with the intent of moving it to dedicated hardware without virtualisation once testing is complete.

I added the VLANs by using a script to modify the conf XML and re-importing the XML via the GUI.
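For anyone wanting to reproduce this, a minimal sketch of such a script might look like the following. It is not the script used here. The element names (`vlans`, `vlan`, `if`, `tag`, `descr`, `vlanif`) and the `vlanif` naming pattern are assumptions about the config layout and should be checked against an exported config from your own version. The sketch also does not create the matching interface assignments or CARP entries.

```python
import xml.etree.ElementTree as ET


def add_vlans(root, parent_if, tags):
    """Append one <vlan> entry per tag under <vlans>; return how many were added.

    ASSUMPTION: the element names and the vlanif naming scheme below are
    illustrative guesses at the config schema, not taken from the post.
    """
    vlans = root.find("vlans")
    if vlans is None:
        vlans = ET.SubElement(root, "vlans")

    # Remember existing (parent, tag) pairs so re-running adds no duplicates.
    existing = {(v.findtext("if"), v.findtext("tag")) for v in vlans.findall("vlan")}

    added = 0
    for tag in tags:
        if not 1 <= tag <= 4094:
            raise ValueError(f"invalid VLAN tag: {tag}")
        if (parent_if, str(tag)) in existing:
            continue
        vlan = ET.SubElement(vlans, "vlan")
        ET.SubElement(vlan, "if").text = parent_if
        ET.SubElement(vlan, "tag").text = str(tag)
        ET.SubElement(vlan, "descr").text = f"VLAN {tag}"
        ET.SubElement(vlan, "vlanif").text = f"{parent_if}_vlan{tag}"  # hypothetical name
        added += 1
    return added


# Usage (file names are placeholders):
#   tree = ET.parse("config-export.xml")
#   add_vlans(tree.getroot(), "vtnet1", range(10, 260))
#   tree.write("config-import.xml", encoding="utf-8", xml_declaration=True)
```

Working on an exported copy and re-importing it, as described above, keeps the running configuration untouched until the result has been checked.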

98% of the VLANs do not have an IP address associated with the interface, but do have a CARP Virtual IP (they mainly use the same VHID).

The problems I am currently facing with this setup are:

- Boot time of the system is upwards of 15 minutes. Most of this is spent configuring each VLAN interface.

- Disabling CARP for maintenance takes ~5 minutes, with 100% CPU usage the whole time.

- The GUI becomes very slow on pages that show metrics. The slowness feels like it is on the browser/client side, but some server requests can take a while to complete too.

I went digging into some of the problems. The first and second problems mainly seem to be related to the OPNsense hook scripts, which run whenever the CARP state of an interface changes.

It seems that the following process is spawned for each interface when disabling CARP: "/usr/local/opnsense/scripts/openvpn/ovpn_service_control.php -a configure", which I think is triggered by an OpenVPN hook script in "/usr/local/etc/rc.syshook.d/carp/". The OpenVPN script calls "ifconfig -a -m". With a lot of interfaces, the ifconfig command appears to take several seconds to complete and uses 100% of a CPU core. Several of these OpenVPN scripts appear to run in parallel, which maxes out a multicore system.
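To check whether the same bottleneck applies on another system, the cost of a single call can be measured directly. The helper below is a small sketch (the function name `time_command` is made up for illustration) that runs a command a few times and records the wall-clock time of each run.

```python
import subprocess
import time


def time_command(argv, runs=3):
    """Run argv `runs` times and return the wall-clock duration of each run in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # Discard the output; only the elapsed time is of interest here.
        subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return timings


# Usage on the firewall itself:
#   for elapsed in time_command(["ifconfig", "-a", "-m"]):
#       print(f"{elapsed:.2f} s")
```

Multiplying a single run's duration by the number of interfaces gives a rough idea of the total CPU time spent when the hook fires once per interface.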

After removing the ovpn hook script, disabling CARP takes less than 1 minute and the start-up time halved. OpenVPN is not in use on my test system so I guess the script isn't doing anything useful in my case.

There are other scripts using ifconfig too, so I experimented with wrapping the ifconfig binary to cache its output per set of arguments. Wrapping the binary brought some performance benefit as well.
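The caching idea can be sketched roughly as below. This is not the wrapper used here, only an illustration of the technique: output is stored per argument list and reused while it is younger than a short time-to-live. The cache directory and the 5-second TTL are arbitrary assumptions, and a real wrapper would also need to skip caching for ifconfig calls that change state.

```python
import hashlib
import os
import subprocess
import tempfile
import time

CACHE_DIR = os.path.join(tempfile.gettempdir(), "cmd_cache")  # hypothetical location
TTL_SECONDS = 5.0  # hypothetical freshness window


def cached_output(argv, ttl=TTL_SECONDS, cache_dir=CACHE_DIR):
    """Return stdout of argv as bytes, reusing a cached copy younger than ttl seconds."""
    os.makedirs(cache_dir, exist_ok=True)
    # One cache file per distinct argument list.
    key = hashlib.sha256("\0".join(argv).encode()).hexdigest()
    path = os.path.join(cache_dir, key)

    try:
        if time.time() - os.path.getmtime(path) < ttl:
            with open(path, "rb") as fh:
                return fh.read()
    except OSError:
        pass  # no cache entry yet, or it vanished; fall through and run the command

    out = subprocess.run(argv, check=True, stdout=subprocess.PIPE).stdout

    # Write to a temporary file and rename it into place, so that parallel
    # callers never read a half-written cache entry.
    fd, tmp = tempfile.mkstemp(dir=cache_dir)
    with os.fdopen(fd, "wb") as fh:
        fh.write(out)
    os.replace(tmp, path)
    return out


# Usage: text = cached_output(["ifconfig", "-a", "-m"]).decode()
```

The trade-off is staleness: anything that reads interface state right after a change may see the old output for up to the TTL, so the window should stay short.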

I haven't looked much into the GUI performance, but it feels like something in the browser is causing it to lag (client-side JavaScript?).

Has anybody else had experiences like this with hundreds of interfaces?

Thanks :)

What about running OPNsense as a vSphere virtual machine and trunking the VLANs to ESXi as portgroups which you can present to the VM as vNICs?