[HOWTO] OpnSense under virtualisation (Proxmox et.al.)

Started by meyergru, November 21, 2024, 10:43:58 AM

I think it may be worth sticking something in about CPU affinity/CPU units.

I'm moving all my setup around at the moment, but I noticed that the RTTs on my gateways have recently shot up, making my networking feel slow. They've effectively doubled.

I'm keeping an eye on this, but putting the OPNsense VM's CPU units up to 10,000 (and Adsense to 8,000) brought them straight back down.

I also wonder if there is some way for Proxmox to prioritise bridging.
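For reference, CPU units can also be set from the Proxmox CLI, not just the GUI. A quick sketch, with VM IDs 100 and 101 as placeholders:

qm set 100 --cpuunits 10000   # highest CPU weight for the firewall VM (default is 100 on a cgroup-v2 host, 10000 is the max)
qm set 101 --cpuunits 8000    # slightly lower weight for the next most important VM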


Greetings all, I'm in the process of going through this myself, and I'm pondering the question of HA.

My setup is just home/SOHO with DHCP on the WAN and a wireless WAN (USB dongle) as backup - the Proxmox host can handle the 'USBness' and present a standard network interface the VMs can use.

My query is around HA... I have it in my head that running a pair of OPNsense VMs on the same hardware would allow for failover between the two virtualised devices. That obviously doesn't protect against hardware failures, but it would allow upgrades/maintenance/etc. without interruption. I've seen a few threads around on CARP + DHCP on the WAN interface (which I'd need to address), but I'm wondering if overall I'm vastly overcomplicating things... The wireless backup does masquerading itself and talks on 192.168/16, so that's happy to just live on a Linux bridge with the VMs, and I can live with the double NAT for that backup scenario. The primary wired WAN, though, is a standard ISP DHCP connection (single lease available), so as I understand it CARP there would be a problem - though I've seen there are scripts around to handle that failover by shutting down the WAN on the backup, which uses a duplicated MAC to carry the lease and ARP over.

As I said, I kinda feel like I might be overcomplicating things... I'm also aware that splitting the host's 16 GB of RAM into 2x 8 GB VMs may be a limitation if I start dropping in additional features like Suricata/Zenarmor/etc.

Is there any recommendation/advice around on whether there's a "smart" way to do this, or am I just being stupid?

Isn't HA primarily supposed to help against hardware failures? By putting both VMs on the same host, you won't gain much.
Intel N100, 4 x I226-V, 16 GByte, 256 GByte NVME, ZTE F6005

1100 down / 800 up, Bufferbloat A+

Quote from: meyergru on May 20, 2025, 08:53:11 AM
Isn't HA primarily supposed to help against hardware failures? By putting both VMs on the same host, you won't gain much.

That's certainly the main benefit... though I'm thinking the "nice to haves" like seamless restarts for patches/upgrades/etc. would be a useful addition. It's just quite a bit more complicated a configuration.

I might just 'start it simple' to get a single node up and running, then if I'm still feeling keen I can replicate it and work on getting a second going.  My concern there was that I might need to put some more serious thought into how the host's interfaces are set up and forwarded to the VM(s), but the more I look at it the more it seems like a simple Linux bridge interface would do fine.
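For what it's worth, the plain Linux bridge side of this is small. A minimal sketch of what an extra WAN bridge in the Proxmox host's /etc/network/interfaces might look like, with enp2s0 as a placeholder for the physical WAN NIC:

auto vmbr1
iface vmbr1 inet manual
        bridge-ports enp2s0
        bridge-stp off
        bridge-fd 0
# the OPNsense VM(s) then get a virtio NIC on vmbr1 for WAN and one on the
# existing LAN bridge (e.g. vmbr0), so a second VM could share the same bridges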

With VMs, you can just clone the machine, detach it from the network, upgrade it and then swap it for the original one. Also, a restart is really fast on VMs.
Using HA, you might introduce complexity that actually causes more harm than it prevents - that said, I don't know, since I have not tried it before.
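A rough sketch of that clone-and-swap flow with the Proxmox CLI, with VM IDs 100/101 as placeholders:

qm clone 100 101 --name opnsense-upgrade --full   # full clone of the running firewall
# keep the clone off the wire until it's ready, e.g. by setting link_down=1 on its
# NICs (Hardware -> Network Device in the GUI, or qm set 101 --net0 ...,link_down=1)
qm start 101      # boot the clone, run the upgrade on it, verify
# then shut down 100, re-enable the links on 101 and let it take over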
Intel N100, 4 x I226-V, 16 GByte, 256 GByte NVME, ZTE F6005

1100 down / 800 up, Bufferbloat A+

Quote from: MicN on May 20, 2025, 08:29:28 AM
Greetings all, I'm in the process of going through this myself, and I'm pondering the question of HA.

My setup is just home/SOHO with DHCP on the WAN and a wireless WAN (USB dongle) as backup - the Proxmox host can handle the 'USBness' and present a standard network interface the VMs can use.

My query is around HA... I have it in my head that running a pair of OPNsense VMs on the same hardware would allow for failover between the two virtualised devices. That obviously doesn't protect against hardware failures, but it would allow upgrades/maintenance/etc. without interruption. I've seen a few threads around on CARP + DHCP on the WAN interface (which I'd need to address), but I'm wondering if overall I'm vastly overcomplicating things... The wireless backup does masquerading itself and talks on 192.168/16, so that's happy to just live on a Linux bridge with the VMs, and I can live with the double NAT for that backup scenario. The primary wired WAN, though, is a standard ISP DHCP connection (single lease available), so as I understand it CARP there would be a problem - though I've seen there are scripts around to handle that failover by shutting down the WAN on the backup, which uses a duplicated MAC to carry the lease and ARP over.

As I said, I kinda feel like I might be overcomplicating things... I'm also aware that splitting the host's 16 GB of RAM into 2x 8 GB VMs may be a limitation if I start dropping in additional features like Suricata/Zenarmor/etc.

Is there any recommendation/advice around on whether there's a "smart" way to do this, or am I just being stupid?
I think you have thought it all through pretty well for getting the right setup on the same host. You already have a view of the complexity and challenges against the benefits, and there's not much that can be added to help you make your mind up. You're being smart to think about it.
I also run it virtualised on Proxmox without a fallback. I've thought about the HA element but haven't come to a strategy yet; I have other hardware, but it's not even close to the VM's specs, so the best I could do is put it in series (a firewall behind a firewall). I do have another Proxmox node (not three for a cluster; my only third machine is an ESXi host), so I thought about a second VM, BUT the complexity has pushed it to later. I have other niggles I want to address first.
So my current emergency strategy, if the host were to die, is to have backups: I take an irregular Clonezilla image of the (single) disk plus backups of the VMs to separate storage. Definitely not a fallback, but this is home, not a business, so I can _just_ get away with it.
If it were taking too long to get us back up and running, I would probably plug the ISP's ethernet cable into the main eero (currently in bridge mode) and switch it to router mode.
Or connect the WiFi clients to the other WAN coming into the house.
The short of it is, I think like you at the moment: HA on the same virtualisation host is too complicated for the benefit it gives.

I intended to set up HA at some point, but with VMs on 2 separate hosts, mostly because I use commodity hardware.
It was one of the reasons why I switched from PCIe passthrough to bridging (getting identical interface assignments).
I gave up because of CARP, since I only get a single IPv4 (2 are allowed in theory, to handle router swaps, but they are not always in the same subnet).
OTOH, I just stumbled on this reddit post (https://www.reddit.com/r/ZiplyFiber/comments/1311jz5/ziply_10g_install_pics/) and the OP claims he can use RFC1918 CARP VIPs with a GW pointing to his static public IP!
That was on pfSense but the same feature seems to exist on OPN as well. Is that a valid configuration?

In the meantime, I switched to bare metal and rely on backups (entire /conf directory for now, might extend to config of some plug-ins as well).
Backup HW is a VM...

On a single host, I relied on VM snapshots to handle potentially bad updates.
The downtime associated with updates is so small that I would not have dealt with HA just because of downtime!
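On Proxmox, that kind of pre-update snapshot is a one-liner (VM ID 100 is a placeholder):

qm snapshot 100 pre-update      # take a snapshot before updating OPNsense
qm rollback 100 pre-update      # roll back if the update goes wrong
qm delsnapshot 100 pre-update   # or clean it up once the update has proven itself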

Well, maybe wrongly, but I assumed that HA on OPN was possible for a single WAN. The docs (https://docs.opnsense.org/manual/how-tos/carp.html) show all the IPs used for the HA setup as non-routable, and show a single WAN link at the front of the router/switch.
So clearly it needs a router in front to route from WAN to the LAN(s)/VIPs, but I admit that, having revisited it now, I'm unclear.
This is a theoretical exercise for me though; I have only one managed switch and, as said before, my hardware is nowhere near ready to even contemplate it :)
Quote from: EricPerl on May 20, 2025, 07:55:41 PM
In the meantime, I switched to bare metal and rely on backups (entire /conf directory for now, might extend to config of some plug-ins as well).
Backup HW is a VM...

On a single host, I relied on VM snapshots to handle potentially bad updates.
The downtime associated with updates is so small that I would not have dealt with HA just because of downtime!
Yes, exactly. The only addition I have is the Clonezilla image of the disk, in case the host's hard drive dies - easy to replace it with a spare and reapply the image. I have to do it this way because my host for this VM is a commodity box (brilliant little mini machine) that only has space for one hard drive inside, so no mirror is possible. It lives in the living room, so a full PC won't do; a server has even less of a chance, or I'll get divorce papers served.

Quote from: cookiemonster on May 21, 2025, 12:50:50 AM
Well, maybe wrongly, but I assumed that HA on OPN was possible for a single WAN. The docs (https://docs.opnsense.org/manual/how-tos/carp.html) show all the IPs used for the HA setup as non-routable, and show a single WAN link at the front of the router/switch.
So clearly it needs a router in front to route from WAN to the LAN(s)/VIPs, but I admit that, having revisited it now, I'm unclear.

There's a thread here that has a bunch of folks trying/working on it (and a git repo with a few forks), using scripts to overcome the need for CARP on the WAN interface when only a single DHCP IP is available there (such as a home internet connection) - essentially you copy the WAN MAC of the primary to the secondary and leave that interface shut down. When a CARP failover is triggered, the interface is brought up and the same DHCP lease is still valid. There'd still need to be a (gratuitous) ARP on the broadcast domain to update the forwarding tables in the local switch (/bridge in the case of a VM) for the new port, but the impact should still be minimal.
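Very roughly, the scripts in those threads boil down to a CARP event hook on each node. The sketch below is only an illustration of the idea, not the actual gist: OPNsense runs scripts placed under /usr/local/etc/rc.syshook.d/carp/ with the CARP subsystem as $1 and the new state as $2, and igc0 is a placeholder for the physical WAN NIC.

#!/bin/sh
# illustrative sketch only - see the linked thread/gists for the real, OPNsense-aware versions
case "$2" in
  MASTER)
    ifconfig igc0 up      # bring the shared-MAC WAN port up on the new master
    dhclient igc0         # pick up / renew the existing ISP lease
    ;;
  BACKUP)
    ifconfig igc0 down    # keep the standby node's WAN interface dark
    ;;
esac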

Quote from: MicN on May 21, 2025, 01:54:20 AM
Quote from: cookiemonster on May 21, 2025, 12:50:50 AM
Well, maybe wrongly, but I assumed that HA on OPN was possible for a single WAN. The docs (https://docs.opnsense.org/manual/how-tos/carp.html) show all the IPs used for the HA setup as non-routable, and show a single WAN link at the front of the router/switch.
So clearly it needs a router in front to route from WAN to the LAN(s)/VIPs, but I admit that, having revisited it now, I'm unclear.

There's a thread here that has a bunch of folks trying/working on it (and a git repo with a few forks), using scripts to overcome the need for CARP on the WAN interface when only a single DHCP IP is available there (such as a home internet connection) - essentially you copy the WAN MAC of the primary to the secondary and leave that interface shut down. When a CARP failover is triggered, the interface is brought up and the same DHCP lease is still valid. There'd still need to be a (gratuitous) ARP on the broadcast domain to update the forwarding tables in the local switch (/bridge in the case of a VM) for the new port, but the impact should still be minimal.
Thanks for that. I'm glad to hear I've not gone mad just yet.

May 21, 2025, 07:34:04 PM #41 Last Edit: May 21, 2025, 08:10:12 PM by EricPerl Reason: After further research...
That thread looks like a hack to me. I'm not doing that.

The how-to on CARP leaves me with one big question: what's the WAN public IP range?
I suspect the article uses 172.18.0.0/24 as a placeholder.
In more realistic scenarios, you'd use a very small IP range (but a /30 is too small).
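For context: a /30 only leaves two usable addresses, while a two-node CARP WAN needs at least three (one per node plus the VIP), so a /29 is about the smallest realistic range. A made-up example using documentation space:

198.51.100.1/29   ISP gateway
198.51.100.2/29   fw1 WAN interface address
198.51.100.3/29   fw2 WAN interface address
198.51.100.4/29   shared CARP VIP (what traffic gets routed/NATed to)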

In the reddit thread I mentioned, the author uses an advanced setting of VIPs (the gateway).
He uses a CARP VIP in 192.0.2.0/24 (Edit: reserved per https://www.rfc-editor.org/rfc/rfc5737) and the GW is set to his static IPv4 WAN IP.

If that's a valid setup, all I'm missing is a static IP.
Using my current DHCP IP as static would be hacky (even if in practice it has never changed unless I swap gear).

> The how-to on CARP leaves me with one big question: what's the WAN public IP range?
I think just a single public IP.

From reading both myself, it appears to me that the 172.x.y.z/24 is indeed just a placeholder, but for a non-routable range. So the "problem" is still there for this use case, i.e. it can be another range but it must be RFC1918. The docs are now clear to me: just another range, but RFC1918.
And if I read the gists and the reddit thread correctly, it seems no static IP is needed. There are at least two scripts: an older version that stops/starts services in a more FreeBSD style (shell exec), and another which seems more OPNsense-aware, using pluginctl.
They handle requesting/renewing the single existing DHCP lease from the ISP - the whole purpose of the exercise.

Definitely hacky, as it's clubbing bits together to "make it happen", and not stateful either.

I'm not doing it either, just saving it in my "useful to check out" list for if/when I decide to give it a go. Potentially.

Hi everyone, thanks for this howto. I am having some trouble on the following hardware:
- Intel N5105
- Intel I226-V rev. 4

I have installed Proxmox (latest version) and virtualised OPNsense (latest version as well).
I am using the VirtIO method, so both WAN and LAN are on Linux bridges, not passed through.
My ISP gives me a 2.5G/1G down/up plan, but this is the result of my speedtest:

From pve host itself:
Retrieving speedtest.net configuration...
Testing from Telecom Italia (79.17.151.204)...
Retrieving speedtest.net server list...
Selecting best server based on ping...
Hosted by Cloudfire Srl (Milano) [0.90 km]: 19.152 ms
Testing download speed................................................................................
Download: 281.67 Mbit/s
Testing upload speed...
Upload: 205.35 Mbit/s

From OPNSense:
root@opnsense:~ # speedtest

   Speedtest by Ookla

      Server: Sky Wifi - Milano (id: 50954)
         ISP: TIM
Idle Latency:     5.46 ms   (jitter: 0.17ms, low: 5.31ms, high: 5.55ms)
    Download:   539.83 Mbps (data used: 915.9 MB)                                                   
                 46.37 ms   (jitter: 73.23ms, low: 3.21ms, high: 433.53ms)
      Upload:   623.16 Mbps (data used: 1.0 GB)                                                   
                 48.06 ms   (jitter: 35.79ms, low: 3.95ms, high: 465.33ms)
 Packet Loss:     0.0%

I also tried an iperf3 test between an LXC container on the same LAN bridge and OPNsense:
Accepted connection from 192.168.2.8, port 40784
[  5] local 192.168.2.1 port 5201 connected to 192.168.2.8 port 40786
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  65.8 MBytes   549 Mbits/sec                 
[  5]   1.00-2.00   sec  65.6 MBytes   552 Mbits/sec                 
[  5]   2.00-3.00   sec  75.4 MBytes   632 Mbits/sec                 
[  5]   3.00-4.01   sec  68.5 MBytes   568 Mbits/sec                 
[  5]   4.01-5.00   sec  72.8 MBytes   618 Mbits/sec                 
[  5]   5.00-6.01   sec  68.6 MBytes   571 Mbits/sec                 
[  5]   6.01-7.00   sec  67.1 MBytes   567 Mbits/sec                 
[  5]   7.00-8.00   sec  76.1 MBytes   639 Mbits/sec                 
[  5]   8.00-9.00   sec  71.4 MBytes   599 Mbits/sec                 
[  5]   9.00-10.00  sec  77.0 MBytes   647 Mbits/sec                 
[  5]  10.00-10.01  sec  1.12 MBytes   759 Mbits/sec                 
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.01  sec   709 MBytes   594 Mbits/sec                  receiver

Does anyone know what could be the issue?

Tunables: https://imgur.com/a/tdpPeWr
Offloading: https://imgur.com/2fMhQQW

Thanks in advance.

Did you enable multiqueue on the VM NIC interfaces in Proxmox? The throughput you are getting suggests you did not.
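For reference, multiqueue is set per virtio NIC, either in the GUI (Hardware -> Network Device -> Multiqueue) or via the CLI. A sketch assuming VM ID 100 with 4 vCPUs, keeping your existing MAC/bridge values:

qm set 100 --net0 virtio=AA:BB:CC:DD:EE:FF,bridge=vmbr0,queues=4   # queues = number of vCPUs
qm set 100 --net1 virtio=AA:BB:CC:DD:EE:F0,bridge=vmbr1,queues=4
# the VM needs to pick the change up afterwards - a reboot is the simplest way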
Intel N100, 4 x I226-V, 16 GByte, 256 GByte NVME, ZTE F6005

1100 down / 800 up, Bufferbloat A+