Low Download Bandwidth with OPNsense 22.1.8_1 on Proxmox

Started by weekend_warrior1999, June 24, 2022, 05:53:23 AM

June 24, 2022, 05:53:23 AM Last Edit: June 25, 2022, 12:33:46 AM by weekend_warrior1999
Proxmox VE 7.2-4
Bare metal specs:
16c/32t (Xeon E5-2680 @ 2.7GHz) x2
128GB RAM
1Gbit Ethernet
10Gbit SFP adapter
Host machine utilization: under 10% CPU and under 30% RAM

VM Specs: "Overkill, I know, but I was trying to get this thing to max the connection out"
16 CPU cores
32GB RAM
CPU setting: Host+AES
OPNsense 22.1.8_1
Max VM CPU utilization seen was about 15% on any single core; typically only 1 or 2 cores reach this, while the others sit 98-99% idle.
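
For reference, the Proxmox VM config behind those specs would look roughly like the sketch below (VM ID and MAC addresses are placeholders; disk and other lines omitted, so not my literal config):

# /etc/pve/qemu-server/<vmid>.conf (sketch)
cores: 16
memory: 32768
cpu: host                                      # "host" CPU type already exposes AES-NI
net0: virtio=XX:XX:XX:XX:XX:01,bridge=vmbr0    # LAN side, vtnet0 inside OPNsense
net1: virtio=XX:XX:XX:XX:XX:02,bridge=vmbr1    # WAN side, vtnet1 inside OPNsense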

Logical setup

1Gbit/50Mbit WAN Cat6e <-> Prox. Host "eno1" <-> vmbr1 <-> VM Nic "vtnet1" DHCP "Non-PPPoe" <-> OPNsense <-> VM Nic "vtnet0" <-> vmbr0 <-> Prox. Host "ens3f1" <-> 10Gbe LAN
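
On the Proxmox host, the two bridges in that chain are plain Linux bridges defined in /etc/network/interfaces, roughly like this (a sketch; management IP and other interfaces omitted):

auto vmbr1
iface vmbr1 inet manual
        bridge-ports eno1          # 1Gbit WAN uplink
        bridge-stp off
        bridge-fd 0

auto vmbr0
iface vmbr0 inet manual
        bridge-ports ens3f1        # 10Gbit SFP LAN port
        bridge-stp off
        bridge-fd 0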

All hardware offloading off in the GUI
Suricata disabled
1 firewall rule set up for port forwarding to another server
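
For what it's worth, the offload state can be double-checked from an OPNsense shell; TSO4, LRO and the checksum flags should no longer appear in the options=... line of:

ifconfig vtnet0
ifconfig vtnet1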
Also added the following (found these recommendations elsewhere):

loader.conf.local
net.link.ifqmaxlen="8192"               #default 50
hw.vtnet.rx_process_limit="8192"        #default 1024
net.isr.defaultqlimit="2048"            #default 256
net.isr.maxqlimit="40960"               #default 10240
hw.vtnet.mq_disable="1"                 #default 0
hw.vtnet.lro_disable="1"                #default 0
hw.vtnet.tso_disable="1"                #default 0
hw.vtnet.csum_disable="1"               #default 0
net.pf.states_hashsize="16777216"       #default 131072
hw.pci.honor_msi_blacklist="0"          #default 1
net.inet.rss.bits="4"                   #default 4, supposed to be 4 x num of cores present
hw.ibrs_disable="1"                     #default 0
net.inet.tcp.hostcache.enable="0"       #disable the TCP host cache
net.inet.tcp.hostcache.cachelimit="0"   #used with enable=0 to fully disable the host cache
net.inet.tcp.soreceive_stream="1"       #use the optimized TCP stream receive path
net.pf.source_nodes_hashsize="1048576"  # (default 32768)
net.isr.maxthreads="-1"  # (default 1, single threaded)
net.isr.bindthreads="1"  #was 1 (default 0, runs randomly on any one cpu core)
machdep.hyperthreading_allowed="1"  # (default 1, allow Hyper Threading (HT))


sysctl.conf
net.inet.tcp.minmss=536  # (default 216)
net.inet.tcp.abc_l_var=44   # (default 2) if net.inet.tcp.mssdflt = 1460
net.inet.tcp.initcwnd_segments=44            # (default 10 for FreeBSD 11.2) if net.inet.tcp.mssdflt = 1460
net.inet.tcp.cc.abe=1  # (default 0, disabled)
net.inet.tcp.rfc6675_pipe=1  # (default 0)
net.inet.tcp.syncache.rexmtlimit=0  # (default 3)
net.inet.ip.maxfragpackets=0     # (default 63474)
net.inet.ip.maxfragsperpacket=0  # (default 16)
net.inet6.ip6.maxfragpackets=0   # (default 507715)
net.inet6.ip6.maxfrags=0         # (default 507715)
net.inet.tcp.syncookies=0  # (default 1)
net.inet.tcp.isn_reseed_interval=4500  # (default 0, disabled)
net.inet.tcp.tso=0  # (default 1)
kern.random.fortuna.minpoolsize=128  # (default 64)
kern.random.harvest.mask=33119   # (default 33247, FreeBSD 13 with Intel Secure Key RNG)

#General Dos/security related
hw.kbd.keymap_restrict_change=4    # disallow keymap changes for non-privileged users (default 0)
kern.elf32.allow_wx=0              # disallow pages to be mapped writable and executable, enforce W^X memory mapping policy for 32 bit user processes (default 1, enabled/allow needed for chrome, libreoffice and go apps)
kern.elf64.allow_wx=0              # disallow pages to be mapped writable and executable, enforce W^X memory mapping policy for 64 bit user processes (default 1, enabled/allow needed for chrome, libreoffice and go apps)
kern.ipc.shm_use_phys=1            # lock shared memory into RAM and prevent it from being paged out to swap (default 0, disabled)
kern.msgbuf_show_timestamp=1       # display timestamp in msgbuf (default 0)
kern.randompid=1                   # calculate PIDs by the modulus of an integer, set to one(1) to auto random (default 0)
net.bpf.optimize_writers=1         # bpf is write-only unless program explicitly specifies the read filter (default 0)
net.inet.icmp.drop_redirect=1      # no redirected ICMP packets (default 0)
net.inet.ip.check_interface=1      # verify packet arrives on correct interface (default 0)
net.inet.ip.portrange.first=32768  # use ports 32768 to portrange.last for outgoing connections (default 10000)
net.inet.ip.portrange.randomcps=9999 # use random port allocation if less than this many ports per second are allocated (default 10)
net.inet.ip.portrange.randomtime=1 # seconds to use sequential port allocation before switching back to random (default 45 secs)
net.inet.ip.random_id=1            # assign a random IP id to each packet leaving the system (default 0)
net.inet.ip.redirect=0             # do not send IP redirects (default 1)
net.inet6.ip6.redirect=0           # do not send IPv6 redirects (default 1)
net.inet.tcp.blackhole=2           # drop tcp packets destined for closed ports (default 0)
net.inet.tcp.drop_synfin=1         # SYN/FIN packets get dropped on initial connection (default 0)
net.inet.tcp.fast_finwait2_recycle=1 # recycle FIN/WAIT states quickly, helps against DoS, but may cause false RST (default 0)
net.inet.tcp.fastopen.client_enable=0 # disable TCP Fast Open client side, enforce three way TCP handshake (default 1, enabled)
net.inet.tcp.fastopen.server_enable=0 # disable TCP Fast Open server side, enforce three way TCP handshake (default 0)
net.inet.tcp.finwait2_timeout=1000 # TCP FIN_WAIT_2 timeout waiting for client FIN packet before state close (default 60000, 60 sec)
net.inet.tcp.icmp_may_rst=0        # icmp may not send RST to avoid spoofed icmp/udp floods (default 1)
net.inet.tcp.keepcnt=2             # amount of tcp keep alive probe failures before socket is forced closed (default 8)
net.inet.tcp.keepidle=62000        # time before starting tcp keep alive probes on an idle, TCP connection (default 7200000, 7200 secs)
net.inet.tcp.keepinit=5000         # tcp keep alive client reply timeout (default 75000, 75 secs)
net.inet.tcp.msl=2500              # Maximum Segment Lifetime, time the connection spends in TIME_WAIT state (default 30000, 2*MSL = 60 sec)
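
For quick experiments, any of these runtime sysctls can also be applied and read back from an OPNsense shell before committing them to the file, e.g.:

sysctl net.inet.tcp.blackhole=2    # apply one runtime sysctl immediately
sysctl net.inet.tcp.blackhole      # read it back to confirm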



iperf3, proof.ovh.net, and Speakeasy results are all very similar: approximately 300Mbps/49Mbps on Proxmox.
To be honest, none of the settings I have changed or experimented with has made any noticeable difference either way.


I can shut down this VM, clone it on the same host (I have a clustered environment), set up Ubuntu as the OS with an identical logical setup and VM specs, and easily max the connection out on both iperf3 and proof.ovh.net at 940Mbps/49Mbps. So I know it's not something on the host side; it's specifically some setting I'm missing within OPNsense itself.
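
For reference, a typical way to run this kind of test from a LAN client (server address is a placeholder):

iperf3 -c <iperf3-server> -P 4 -R    # download direction (server sends)
iperf3 -c <iperf3-server> -P 4       # upload direction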

June 24, 2022, 04:46:55 PM #1 Last Edit: June 24, 2022, 04:59:35 PM by weekend_warrior1999
After upgrading to 22.1.9 my bandwidth has increased to 550-600Mbps with IPS off, so it has improved... But I would love to be able to get the remaining ~300Mbps out of this connection before enabling IPS.

Also noticed that after the update, while my speeds have increased, in "htop" I'm now seeing 1 vCore being heavily utilized during the download test. It hits about 85-90% utilization.

The loader variables seem to disable parts that may help vtnet. I'd roll back everything and just run these on 22.1.9:
hw.pci.honor_msi_blacklist="0"
hw.ibrs_disable="1"
net.inet.rss.enabled="1"
vm.pmap.pti="0"
net.isr.maxthreads="-1"
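
Those are mostly boot-time tunables; after a reboot you can confirm they actually took effect from a shell:

sysctl hw.pci.honor_msi_blacklist hw.ibrs_disable net.inet.rss.enabled vm.pmap.pti net.isr.maxthreads
# net.isr.maxthreads set to -1 should read back as the resolved thread count (number of CPUs)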

I'd also take the VM down to 4 vCPU cores, just to rule out some weird NUMA balancing issue with a full 16 cores assigned. Even if you had a NUMA split between the two CPU sockets, they should still easily be able to max a 1Gbit connection, but just to rule out that variable, drop to 4 vCPUs.
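
If I understand Proxmox right, that's a one-liner (VM ID is a placeholder; it takes effect at the next VM start):

qm set <vmid> --cores 4 --sockets 1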

Finally, set those values above in System > Settings > Tunables, and remove all of your custom loader.conf.local and sysctl.conf variables. If you need to, do a full reset on OPNsense and then set just the Tunables above. I'd run speed tests with both vtnet NICs and also E1000 NICs, see what happens, and report back?
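
Switching the NIC model for that comparison should be a per-interface change, something like (VM ID, MAC and bridge are placeholders):

qm set <vmid> --net0 e1000=XX:XX:XX:XX:XX:01,bridge=vmbr0     # test with the emulated Intel NIC
qm set <vmid> --net0 virtio=XX:XX:XX:XX:XX:01,bridge=vmbr0    # switch back to VirtIO

Keep in mind that the E1000 shows up inside OPNsense as em0 rather than vtnet0, so the interface assignments need re-checking after switching.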

Lastly, it may also help to pick more modern virtual hardware for the OPNsense VM. I don't use Proxmox, but I know a lot of hypervisors emulate something like an old i440FX chipset with a PCI bus. You should have the option for a more modern chipset (ICH9 or something) that may also help with scheduling and multi-queuing.
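
If I'm reading the Proxmox docs right, that's the machine type: switching the VM from the default i440FX to Q35 (which emulates an ICH9-era platform with a PCI Express bus) should be a single setting, e.g.:

qm set <vmid> --machine q35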

Yes! That's loads better! I'm consistently getting about 900Mbps and the load is spread among all the cores rather than just one core!