24.7 Production Series / How may I improve NAT performance?
« on: August 29, 2024, 01:35:04 pm »
I'm trying to set up an OPNsense virtual appliance, but I'm having a hard time getting good performance, especially when it comes to NAT; that's where it really shines, the issue, that is.
Environment
The VM is on vSphere (a type-1 hypervisor) with plenty of memory and CPU cores to throw around compared to what it will replace. Disk is on very fast vSAN storage; towards the end I switched to a RAM disk to rule storage out as a potential bottleneck. I had already moved away from FreeBSD-based firewalls, but policy routing is a nightmare on Linux, so here I am.
I found an article somewhere alleging that the issue was FreeBSD drivers in a Linux hypervisor, not Scalar vs. Vector Packet Processing as I had thought it might be. This restarted my efforts to get back on FreeBSD.
QEMU/KVM confirmation
I've read a lot about macvtap being the second coming of passthrough interfaces in the absence of SR-IOV. I tried it, and it kind of showed: throughput was better than before. The last time I ran these tests, the [recommended] paravirtual interface driver, VMXNET3, hardly broke past ½ Gbit/s.
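For anyone wanting to reproduce the macvtap setup: a device in passthru mode can be created by hand roughly like this. The interface names are placeholders for my environment, not anything OPNsense- or libvirt-specific:

```shell
# Create a macvtap device on top of the physical uplink (eno1 is a placeholder).
# "passthru" dedicates the NIC and its MAC address to a single guest;
# "bridge" mode would let several guests share the parent interface.
ip link add link eno1 name macvtap0 type macvtap mode passthru
ip link set macvtap0 up
```

Under libvirt you normally don't create the device manually; the same thing is expressed in the domain XML as `<interface type='direct'>` with `<source dev='eno1' mode='passthru'/>`.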
╭──────────────────────────────────────────────────────────────────────────────╮
RECAP/PROGRESS SUMMARY
├──────────────────────────────────────────────────────────────────────────────┤
CHR/OpenWRT/VyOS baseline ☑︎ Excellent.2
macvtap routing ☑︎ Good. >900Mbit/s
macvtap NAT ☐ (untested)
SR-IOV VF NIC routing ☐ (not there yet)
SR-IOV VF NIC NAT ☐ (not there yet)
PCIe NIC routing ☐ (not there yet)
PCIe NIC NAT ☐ (not there yet)
╰──────────────────────────────────────────────────────────────────────────────╯
vSphere
It seemed like there was some truth to that article, but I had mostly been guessing my way through things on KVM, so I moved back to vSphere. I tried again with SR-IOV [Virtual Function] NICs: passed the 900 Mbit/s mark in Internet speed tests and iperf3. My uplink¹ is only 1 Gbit/s; really a bit more than that, because my ISP factors in protocol overheads in order to deliver the advertised speed and avoid complaints, I assume.
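For reference, on ESXi the virtual functions are typically enabled per NIC driver module before the VF adapters show up in vCenter. The module and parameter names below are assumptions for Intel ixgbe-family hardware; check your own driver's documentation:

```shell
# On the ESXi host: ask the NIC driver to expose 8 virtual functions.
# Module name (ixgben) and parameter name (max_vfs) vary by physical NIC.
esxcli system module parameters set -m ixgben -p max_vfs=8
# Reboot the host, then attach a VF to the VM as an SR-IOV passthrough adapter.
```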
Almost almost there
The thing is that I can get the full bandwidth and sustain it with a modest 2-core [AMD64] Linux firewall with just a little over a gig of RAM.
>900 Mbit/s is pretty good if you turn a blind eye to the underlying issues that are clearly keeping the link from full throughput, and unfortunately, I was more than willing to do that. This has not exactly been a quick and painless journey.
At this point another router was handling NAT and the PPPoE session; it was time to hand them over to OPNsense. I just shut down a port on a switch, and just as fast OPNsense was in control of a public IP address. When I ran the tests, though, I got the worst results I've ever seen: download throughput didn't reach 300 Mbit/s, while upload was well over 300 Mbit/s, well over the max upload speed I was getting from my ISP last time I checked. That last part isn't too surprising, because they keep upgrading the service without warning (though without price hikes either).
NAT rules
My NAT setup is always the same regardless of platform: only outbound NAT (SNAT) on the public interface, plus a handful of port forwards to the reverse proxy, where the heavy lifting and internal NAT happen. The forwards weren't set up yet. For the outbound NAT, I undo the rules created by the initial setup wizard, if any, and in their place add two rules:
╭─┬───┬─────┬─────┬─────┬─────────────────┬───────────────┬────────────────────╮
│#│NAT│if │stack│proto│src:[port/type] │dst:[port/type]│NAT-to:[port/type] │
├─┼───┼─────┼─────┼─────┼─────────────────┼───────────────┼────────────────────┤
│1│neg│<wan>│IPv4 │any │<This Firewall>:*│any:* │-:- │
│2│yes│<wan>│IPv4 │any │any:* │any:* │masquerade-if:static│
╰─┴───┴─────┴─────┴─────┴─────────────────┴───────────────┴────────────────────╯
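In pf terms (the packet filter OPNsense builds on), those two rules amount to roughly the following. This is a hand-written sketch with em0 as a placeholder WAN interface, not the ruleset OPNsense itself generates:

```
# Rule 1 ("neg"/no NAT): traffic sourced by the firewall itself is left untranslated
no nat on em0 inet from (em0) to any
# Rule 2: translate everything else leaving the WAN to the interface address,
# keeping source ports unchanged (static port)
nat on em0 inet from any to any -> (em0) static-port
```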
╭──────────────────────────────────────────────────────────────────────────────╮
RECAP/PROGRESS SUMMARY
├──────────────────────────────────────────────────────────────────────────────┤
CHR/OpenWRT/VyOS baseline ☑︎ Excellent.2
macvtap routing ☑︎ Good. >900Mbit/s
macvtap NAT ☐ (untested)
SR-IOV VF NIC routing ☑︎ Good. Slightly better than macvtap's.
SR-IOV VF NIC NAT ☒ Bad. ≈300Mbit/s↓ >300Mbit/s↑
PCIe NIC routing ☐ (not there yet)
PCIe NIC NAT ☐ (not there yet)
╰──────────────────────────────────────────────────────────────────────────────╯
Bare-NIC testing (i.e. like "-metal", but just the NIC)
The next thing I tried: I took the same NIC that I know can handle this traffic, because it has already done so, easily, in Linux firewalls, and passed it through to the VM at the PCIe level. Full control. I had to shut down everything to do this; you practically need a career, a psych eval, and a security clearance to shut down even a [small] vSAN cluster, I swear.
I ran the tests again: (1) with NAT and PPPoE offloaded to the other router, it improved minimally; (2) performance tanked again while NATting.
╭──────────────────────────────────────────────────────────────────────────────╮
RECAP/PROGRESS SUMMARY
├──────────────────────────────────────────────────────────────────────────────┤
CHR/OpenWRT/VyOS baseline ☑︎ Excellent.2
macvtap routing ☑︎ Good. >900Mbit/s
macvtap NAT ☐ (untested)
SR-IOV VF NIC routing ☑︎ Good. Slightly better than macvtap's.
SR-IOV VF NIC NAT ☒ Bad. ≈300Mbit/s↓ >300Mbit/s↑
PCIe NIC routing ☑︎ Good. No different than SR-IOV's.
PCIe NIC NAT ☒ Bad. No different than SR-IOV's.
╰──────────────────────────────────────────────────────────────────────────────╯
Overall, I learned OPNsense can get pretty close to the full gig of my uplink, which is also the speed of the slowest link in my wired network, so it's good enough; it just must not use [para]virtualized NICs. Resource- or hardware-wise it should be able to NAT at that speed too: other systems are already NATting much faster than that, case in point the reverse proxy, which hosts a bunch of virtual IP addresses and NATs whatever can't be proxied, such as TCP port 22, and which might be conflicting in some way.
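If anyone wants to dig further into the NAT slowdown, the tunables most commonly suggested for pf throughput on FreeBSD are the netisr ones below. I'm listing them as pointers to experiment with, not as values I've validated here:

```
# /boot/loader.conf.local (or System > Settings > Tunables in the GUI)
net.isr.maxthreads="-1"      # one netisr worker thread per CPU core
net.isr.bindthreads="1"      # pin the workers to their cores
net.isr.dispatch="deferred"  # queue packets instead of processing them inline
```

Disabling hardware offloads (TSO, LRO, checksum offload) on passed-through NICs is the other commonly advised step.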
Notes
1. i.e. the connection, not the actual bit rate
2. Briefly bursting past 1 Gbit/s (1.1) with the test well underway; no dips below 1 G. Results are consistent across the iperf3 topologies tested:
↳.1 server ←←← <this-router> ←←← client
↳.2 <server/this-router> ←←← client
↳.3 <server/this-router> →→→ client(-R)
↳.4 server ←←← <client/this-router>
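The four topologies above map to plain iperf3 invocations like these (addresses are placeholders; all runs were default TCP tests, with -R where the direction is reversed):

```shell
# .1  router only forwards; the endpoints sit on either side of it
iperf3 -s                    # on the server behind the router
iperf3 -c <server-ip>        # on the client

# .2  router itself is the server
iperf3 -s                    # on the router
iperf3 -c <router-ip>        # on the client

# .3  same, but reversed so the router is the sender
iperf3 -c <router-ip> -R     # on the client

# .4  router acts as the client toward an external server
iperf3 -c <server-ip>        # on the router
```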

I tried setting IP addresses, routing info, VLANs... Only VLANs work. Thankfully this works great with pfSense's FreeRADIUS (where, ironically, LDAP, secure or not, isn't much of a success), and I can keep that just for my MAC-based auth, which is much nicer to manage in either of the two firewalls than in AD Users and Computers, AD Administrative Center, or Windows Admin Center.
