Poor Throughput (Even On Same Network Segment)

Started by hax0rwax0r, August 25, 2020, 08:31:25 PM

I am curious whether I am seeing this kernel problem on my bare-metal install. I have a passively cooled mini PC with 4 Intel NICs, a J1900 CPU at 2.00 GHz, and 4 GB of RAM. I know this CPU is fairly old, but the hardware sizing guide says I should be able to do 350-750 Mbit/s of throughput. With no firewall rules enabled and the default IPS settings I get about 370-380 Mbit/s of my 400 Mbit/s inbound speed. If I enable firewall rules to set up fq_codel, throughput drops to 320-340 Mbit/s. In both scenarios I see the CPU going to 90%+ on one thread. I understand that throughput will go down with options like IPS and shaping rules enabled, but I would think that, with nothing else running, this hardware should be able to do better than 380 Mbit/s tops.
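A quick way to tell whether a single core really is the limit is to watch per-CPU and per-thread load while pushing traffic through the box. This is only a rough sketch and assumes iperf3 is installed on hosts on both sides of the firewall (10.0.0.10 below is a made-up server address); top ships with FreeBSD/OPNsense:

    top -P      # per-CPU summary while the test runs
    top -SH     # per-thread view; shows if one kernel thread is pegged

    iperf3 -s                          # on the server behind the firewall
    iperf3 -c 10.0.0.10 -t 30 -P 4     # on the client, 4 parallel streams
    iperf3 -c 10.0.0.10 -t 30 -P 4 -R  # same test in the reverse direction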

Quote from: DiHydro on February 11, 2021, 09:40:20 PM
Using FQ_Codel or IPS is somewhat secondary to the overall discussion here. Both consume a large amount of CPU cycles and won't illustrate the true throughput capability of the firewall, because of their inherent overhead.

I run a J3455 with a quad-port Intel I340 NIC and can easily push 1 gigabit with the stock ruleset, with plenty of CPU headroom remaining. This unit can also enable FQ_Codel on WAN and still push 1 gigabit, although CPU usage does increase by around 20% at gigabit speeds.

I don't personally run any of the IPS components, so I don't have any direct feedback on that. It's worth noting that both of these tests were done on a traditional DHCP WAN connection. If you're using PPPoE, that is single-thread bound and will limit your throughput to the maximum speed of a single core.

What most of the transfer speed tests here are illustrating is that FreeBSD seems to scale very poorly when forwarding packets over 10 Gbit virtualized NICs. This isn't an OPNsense-induced issue; it's more an issue that OPNsense inherits from poor upstream support in FreeBSD. For the vast majority of users on 1 gigabit or slower connections, this won't be a cause for concern in the near future.
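For what it's worth, the knobs people usually experiment with for multi-core forwarding are the generic FreeBSD netisr tunables. This is only a sketch of the usual suggestions, not an OPNsense-specific fix, and whether they help at all depends on the driver and workload (set them under System > Settings > Tunables and reboot):

    net.isr.maxthreads="-1"      # one netisr thread per core
    net.isr.bindthreads="1"      # pin those threads to cores
    net.isr.dispatch="deferred"  # queue packets instead of processing them inline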

Quote from: opnfwb on February 11, 2021, 10:27:08 PM

It sounds like I may need to reset to the stock configuration and try this again. I thought that in some of my testing I had disabled all options and was running the device as a pure router and still saw the single-core limitation. Maybe I was mistaken and still had some option enabled that used significant CPU. My cable modem gives a DHCP lease to my OPNsense box, so I am not running PPPoE. When directly connected to the modem I get 390-430 Mbit/s. That is what led me to look at the firewall itself as the throttle point.
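One crude way to test "pure router" mode without rebuilding the config is to disable the packet filter for the duration of a test between two locally routed segments. A sketch, with the obvious caveat that NAT and all rules stop working while pf is off, so it only makes sense for a LAN-to-LAN test, not on a production WAN path:

    pfctl -d    # disable pf (rules and NAT are gone until re-enabled)
    # ...run the LAN-to-LAN throughput test...
    pfctl -e    # re-enable pf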

Quote from: DiHydro on February 11, 2021, 09:40:20 PM

I wonder what throughput you would get with a Linux-based firewall, just to see what the hardware is capable of. My experience with the current OPNsense 21.1 release is that it gives me only ~50% throughput in a virtualized environment, even after performance tuning. A quick test with a virtualized OpenWrt gave me full gigabit wire speed without any optimization. I know that's comparing apples and oranges, but it's difficult to say what a hardware platform is capable of if you don't try different things.
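If someone wants to run that comparison, a throwaway Debian box only needs forwarding and a masquerade rule to act as the reference router. A minimal sketch; eth0/eth1 as WAN/LAN are placeholders for whatever the interfaces are actually called:

    sysctl -w net.ipv4.ip_forward=1
    iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
    iptables -A FORWARD -i eth1 -o eth0 -j ACCEPT
    iptables -A FORWARD -i eth0 -o eth1 -m state --state ESTABLISHED,RELATED -j ACCEPT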

Quote from: spi39492 on February 12, 2021, 04:19:49 PM

I am going to try this in a day or two. IPFire is my choice right now, unless someone has a different suggestion. I will probably come back to OPNsense either way, as I like this community and the project.

Quote from: DiHydro on February 12, 2021, 10:49:11 PM

Yeah, I like OPNsense as well. That's why it is so painful that the throughput is so limited in my setup. I did the tests with Debian and iptables on the one hand and with OpenWrt on the other, as it's available for many platforms and pretty simple to install on bare metal and in virtual environments.

So I put OPNsense on a PC that has a four-port Intel PRO/1000 NIC and an i7-2600, and with a default install I get my 450 Mbit/s. Once I put a firewall rule in to enable fq_codel, it drops to 360-380 Mbit/s. I don't believe that an i7 at 3.4 GHz with an Intel NIC cannot handle these rules at full speed. What is wrong, what can I look at, and how can I help make this better?
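Before blaming the CPU it may be worth confirming what the shaper actually programmed into dummynet, since a pipe whose bandwidth is set below the line rate will throttle regardless of how fast the box is. These are the stock ipfw commands the OPNsense shaper sits on top of; the exact output depends on how the pipes and queues were set up:

    ipfw pipe show     # bandwidth configured on each pipe
    ipfw sched show    # scheduler type (should list FQ_CODEL if it took effect)
    ipfw queue show    # queues attached to the pipes/schedulers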

Quote from: DiHydro on February 16, 2021, 01:05:39 AM

You can try some of the performance tuning tips laid out here: https://forum.opnsense.org/index.php?topic=9264.msg93315#msg93315

I have exactly the same problem. Apparently there are problems with the vmxnet3 vNIC here. It's sad, but I can't get higher than 1.4 Gbps. And please don't tell me to switch to physical hardware. Sorry folks, it's 2021; 10 Gbps is what every firewall should be able to do by default. OPNsense is a wonderful product, but I think you are betting on a dead horse. Why not use Linux as the OS? FreeBSD slept through the virtualization era (see the s... vmxnet3 support and bugs). OK, I've vented my frustration; back to work :).

Quote from: mm-5221 on February 21, 2021, 06:58:42 PM

So there's always the option of using IPFire for this use case? :)

No, I switched from Sophos UTM to OPNsense some time ago, and I do not want another migration now. Apart from the missing WAF and the fact that firewall aliases are not tied into DHCP, I find OPNsense a great product.
I have now solved my performance problem with the tunable hw.pci.honor_msi_blacklist=0. With iperf3 and -P 10 (parallel streams) I get 8-9 Gbps without IPS. With IPS it is unfortunately only 1.7 Gbps (with the CPU only 30% utilized). What I am still missing is performance tuning of the IPS parameters in the UI. I think I could get 5-6 Gbps with about 8 cores, and 8-9 Gbps with 12 cores. Currently IPS/Suricata seems to be artificially throttled somewhere in the configuration.
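For anyone who wants to reproduce this: hw.pci.honor_msi_blacklist is a loader tunable, so it has to be set under System > Settings > Tunables (or in /boot/loader.conf) and needs a reboot to take effect. A sketch of the setting and the iperf3 invocation mentioned above (the server address is a placeholder):

    hw.pci.honor_msi_blacklist="0"       # allow MSI/MSI-X even on blacklisted chipsets

    iperf3 -c 192.168.1.10 -P 10 -t 30   # ten parallel streams, after the reboot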

Do we have any solution here?

I have an R620 (Intel Xeon E5-2670 @ 2.60 GHz, 8 cores) under ESXi 7, and I get 700 Mbps between an OPNsense VM and an Ubuntu VM on the same host, while two Ubuntu VMs can do 7 Gbps between each other, ten times faster.

Is there any news on this topic? Throughput is still slow on OPNsense 21.1.5 :'(

I've found a similar issue regarding slow transfers with iflib in TrueNAS, which has been solved.

Maybe we're facing the same issue here in OPNsense.

Please have a look at the following links/commits:

Maybe there is an expert here who can review the code snippets.
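Until someone who knows the iflib code weighs in, the per-device iflib tunables are at least something that can be experimented with. These are generic iflib loader tunables, not a confirmed fix for this issue; the values below are only examples, vmx0 is assumed, and a reboot is required:

    dev.vmx.0.iflib.override_ntxqs="4"     # number of TX queues
    dev.vmx.0.iflib.override_nrxqs="4"     # number of RX queues
    dev.vmx.0.iflib.override_ntxds="2048"  # TX descriptors per ring
    dev.vmx.0.iflib.override_nrxds="2048"  # RX descriptors per ring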

I really hope this issue can be solved soon.

Related GitHub ticket: #119