Poor routing performance on DEC3840

Started by Berzerker, May 23, 2022, 01:22:25 AM

May 23, 2022, 01:22:25 AM Last Edit: May 23, 2022, 01:50:48 AM by Berzerker
While doing some cross-VLAN testing, I noticed I'm only able to get about 800-900 Mbps, or around 3 Gbps of fully-threaded routing performance.

My switching hardware is mostly UniFi, with DACs between the firewall and the switch.

Any settings/tunables to check for any obvious performance issues?

EDIT: The test is being run between 2 VMs on the same Proxmox box in different subnets.

If I run iperf3 from the OPNsense CLI, from one interface address to a VM on a different network, I actually get the full 10 Gbps. The performance issues only show up when traffic comes into the OPNsense router first (from a computer/workstation, VM, etc.) and is then routed back out.

If you run iperf3 from an OPNsense interface to a client, your only limiting factor is the single-core performance of your CPU(s).

If you run iperf3 from client 1 to client 2 with OPNsense in the middle, it has to do a lot of work routing the packets with pf(4), which uses a lot of CPU time.

AFAIK iperf3 usually only creates one TCP stream, which isn't really a real-world load on a firewall.
You could try to run multiple parallel streams with the -P flag:
Quote
-P, --parallel n
              number of parallel client streams to run. Note that iperf3 is single threaded, so if you are CPU bound, this will not yield higher throughput.
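If it helps to compare runs, here is a rough Python sketch that builds an iperf3 client command with parallel streams and pulls the totals out of its JSON output. The function names are made up for illustration, and the JSON keys (`end.sum_received`, `end.streams[].receiver`) are what I'd expect from an iperf3 3.x TCP test, so check them against your own output before relying on it.

```python
import json
import subprocess


def iperf3_command(server, streams=8, seconds=30):
    """Build an iperf3 client command: -P parallel streams, -t duration, -J JSON output."""
    return ["iperf3", "-c", server, "-P", str(streams), "-t", str(seconds), "-J"]


def summarize(report):
    """Return (total Gbit/s, list of per-stream Gbit/s) from a parsed iperf3 JSON report.

    Assumes the TCP report layout with end.sum_received and end.streams[].receiver.
    """
    end = report["end"]
    total = end["sum_received"]["bits_per_second"] / 1e9
    per_stream = [s["receiver"]["bits_per_second"] / 1e9 for s in end["streams"]]
    return total, per_stream


def run(server, streams=8, seconds=30):
    """Run iperf3 against `server` (must already be running `iperf3 -s`) and summarize."""
    result = subprocess.run(
        iperf3_command(server, streams, seconds),
        capture_output=True, text=True, check=True,
    )
    return summarize(json.loads(result.stdout))
```

Something like `run("192.0.2.10", streams=8)` (placeholder address) would then give the aggregate and the per-stream numbers, which makes it easier to see whether adding streams actually scales or whether every stream just gets a smaller share.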
OPNsense: Intel Core i5-6500, 16 GB RAM, 2x 120GB SSD ZFS-mirror, 4x Intel i350-T4

Quote from: _Alchemist_ on May 28, 2022, 07:22:58 PM

I mentioned in my post that these results were "fully-threaded", as in running 4 or 8 parallel streams to take advantage of the multi-core performance. The numbers posted by Deciso were tested using IMIX, which should give you *worse* performance than iperf3, so either something is off with my setup or those numbers are not correct.
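
As a rough sanity check on the IMIX point, here is a small Python sketch of the packet-rate arithmetic. It assumes the common "simple IMIX" mix of 7:4:1 packets at 40/576/1500 bytes; I don't know which IMIX profile Deciso actually used, so treat the mix as an assumption. The helper names are just for illustration.

```python
# Assumed "simple IMIX" profile: (IP packet size in bytes, relative weight).
SIMPLE_IMIX = [(40, 7), (576, 4), (1500, 1)]


def average_packet_bytes(mix):
    """Weighted average packet size of a traffic mix, in bytes."""
    total_weight = sum(weight for _, weight in mix)
    return sum(size * weight for size, weight in mix) / total_weight


def packets_per_second(bits_per_second, packet_bytes):
    """Packets per second needed to carry a given bit rate at a given packet size."""
    return bits_per_second / (packet_bytes * 8)


if __name__ == "__main__":
    avg = average_packet_bytes(SIMPLE_IMIX)
    print(f"average IMIX packet: {avg:.1f} bytes")
    print(f"pps at 3 Gbps, 1500-byte packets: {packets_per_second(3e9, 1500):,.0f}")
    print(f"pps at 3 Gbps, IMIX average:      {packets_per_second(3e9, avg):,.0f}")
```

With that mix the average packet is roughly 340 bytes, so the firewall has to handle about 4.4 times as many packets per second as it would with full 1500-byte packets at the same bit rate. Since the routing cost is mostly per packet, that is why I'd expect IMIX numbers to be lower than iperf3 numbers, not higher.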