Possible PF Software Bug Causing Slowness [Solved, Lower the ARP Expiration]

Started by eneerge, January 02, 2023, 01:15:42 AM

January 05, 2023, 10:20:29 PM #15 Last Edit: January 05, 2023, 10:22:46 PM by schnipp
I can imagine. OPNsense has several performance issues. On my OPNsense box, routing traffic between different LAN segments (1 Gbit/s each) is slow and drops to 40-60 MB/s (especially for SMB transfers).

I can't remember exactly anymore, but if I am right the issues began with version 18.x or 19.x, maybe during the migration to "iflib". They are still not solved and I have started coming to terms with them. I don't know whether these performance issues are specific to OPNsense or also occur in pfSense or vanilla FreeBSD.

I noticed that performance increases when IPsec is disabled (even though it's not involved in routing between the above-mentioned networks). Furthermore, "netflow" has a non-negligible negative impact on performance.

See also:

Edit: My Opnsense runs on bare metal (Supermicro A2SDi-4C-HLN4F)
OPNsense 24.7.11_2-amd64

I have a datacenter OPNsense on 22.1.10, Xeon E on Supermicro with an X710, running 10G in both directions. It can't be a generic problem (and this device exports netflow data to an external host).

Tried using the tunables Kirk recommended in that second post. Didn't make any difference, unfortunately. :(



January 08, 2023, 01:44:00 PM #20 Last Edit: January 08, 2023, 06:17:44 PM by schnipp
Quote from: mimugmail on January 07, 2023, 02:17:02 PM
https://forum.opnsense.org/index.php?topic=31753.msg153441#msg153441

960Mbit upload ...

Testing throughput on the WAN side involves many more factors that can degrade performance than testing in the LAN. So I recommend starting with performance tests in the LAN. I don't want to hijack this thread with my own problems, but maybe they are related. Following up on my previous post (#15), I ran some tests again:

Scenario: Server <-> Opnsense <-> Client:


  • Client downloaded a 10GB file from the server via SMB in the LAN
  • All LAN links are 1 Gbit/s
  • Server CPU load was around 10-15% (all cases)
  • Client CPU load was around 10-15% (all cases)
  • Opnsense CPU load was around 25% (all cases)


  • Result (regular configuration): Download speed was around 40MB/s
  • Result (regular configuration and service "samplicate" stopped): Download speed was around 60MB/s
  • Result (regular configuration and services "samplicate", "strongswan" stopped): Download speed was around 90MB/s

Disabling both of these services increases LAN performance a lot. But dropping IPsec is not an option for me. Unfortunately, this slow throughput has existed for a very long time with no prospect of improvement.

Edit:
Playing with some tunables from here (#15) increases the performance for the third case from around 90MB/s to 100-105MB/s.
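
For scale, the three results can be converted to link utilisation. This is only a minimal sketch: it assumes decimal units (1 MB = 10^6 bytes), 1 Gbit/s links, and ignores Ethernet/IP/TCP/SMB overhead. The function names are made up for illustration.

```python
LINK_MBIT_S = 1000  # assumption: 1 Gbit/s links, decimal units


def mbyte_to_mbit(mbyte_per_s):
    """Convert MB/s to Mbit/s (8 bits per byte)."""
    return mbyte_per_s * 8


def link_share(mbyte_per_s, link_mbit_s=LINK_MBIT_S):
    """Fraction of the raw link rate used by a given transfer speed."""
    return mbyte_to_mbit(mbyte_per_s) / link_mbit_s


# Approximate download speeds reported in the tests above (MB/s)
results = {
    "regular configuration": 40,
    "samplicate stopped": 60,
    "samplicate and strongswan stopped": 90,
}

for label, speed in results.items():
    print(f"{label}: {speed} MB/s = {mbyte_to_mbit(speed)} Mbit/s "
          f"({link_share(speed):.0%} of the raw link rate)")
```

Under these assumptions the raw ceiling is 125 MB/s, so the three cases use about 32%, 48% and 72% of the link.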

Why is your OPNsense involved in LAN to LAN traffic? Do you use a LAN bridge instead of a switch? You should definitely get a dedicated switch, then.
Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)

Quote from: pmhausen on January 08, 2023, 02:58:26 PM
Why is your OPNsense involved in LAN to LAN traffic? Do you use a LAN bridge instead of a switch? You should definitely get a dedicated switch, then.

Please don't be confused by the term "LAN"; I use it in a broader sense (everything behind the firewall, as opposed to the WAN). Server and client are in different LAN segments (VLANs), and OPNsense routes the traffic between them based on pf rules. Additionally, the VLANs are on different physical interfaces, so they don't share bandwidth.

Just installed the Intel x540-t2. Exact same issue.

Quote from: eneerge on January 11, 2023, 07:14:38 AM
Just installed the Intel x540-t2. Exact same issue.

@eneerge: Have you also tested the performance and possible impact when stopping the services "samplicate" and "strongswan" on the Dashboard?

So, I just stumbled upon this post on reddit that mentioned the EXACT issue I described (speed dropping to the exact same rate). https://www.reddit.com/r/HomeNetworking/comments/p63zbo/calix_ont_to_3rd_party_router_not_working/

Last post mentions that it's caused by the ARP timeout. Maybe this needs to be statically assigned to resolve. I will test at some point.

July 12, 2023, 07:16:43 AM #26 Last Edit: July 12, 2023, 07:27:31 AM by eneerge
I do not want to "jinx" myself, but holy f-ing s, it seems to be fixed after changing the ARP expiration timeout.

When using OpenWrt, I never experienced any slowdowns. Linux by default has an ARP expiration of 60 seconds. pfSense/OPNsense has a default expiration of 20 minutes. The slowdowns started exactly 600 seconds after the ARP response. With pfSense, I was able to manually remove the individual cached entry for the gateway. Removing the ARP entry for my gateway instantly restored my speeds (and also caused a new ARP entry to be created instantly). So now I can apply the same to OPNsense.

I just added this to the tunables:

  • net.link.ether.inet.max_age = 540
This should set the ARP cache to expire every 9 minutes instead of 20.
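
The value can be sanity-checked with a small script. This is only a sketch: it assumes a FreeBSD-based system where `sysctl -n net.link.ether.inet.max_age` prints the value in seconds, and the helper names are hypothetical. The 600-second onset is the figure observed in this post, not a documented constant.

```python
import shutil
import subprocess
from typing import Optional

SYSCTL_NAME = "net.link.ether.inet.max_age"
OBSERVED_ONSET_S = 600  # slowdown onset reported in this post, not a documented value


def parse_sysctl_value(output: str) -> int:
    """Parse sysctl output: either 'name: value' or, with -n, just 'value'."""
    return int(output.strip().split(":")[-1])


def expires_before_onset(max_age_s: int, onset_s: int = OBSERVED_ONSET_S) -> bool:
    """True if the ARP entry times out before the observed slowdown onset."""
    return max_age_s < onset_s


def read_arp_max_age() -> Optional[int]:
    """Return the current ARP max age in seconds, or None if it cannot be read."""
    if shutil.which("sysctl") is None:
        return None
    try:
        result = subprocess.run(
            ["sysctl", "-n", SYSCTL_NAME],
            capture_output=True, text=True, check=True,
        )
        return parse_sysctl_value(result.stdout)
    except (subprocess.CalledProcessError, ValueError, OSError):
        return None


if __name__ == "__main__":
    current = read_arp_max_age()
    if current is None:
        print(f"{SYSCTL_NAME} is not readable on this system")
    else:
        print(f"{SYSCTL_NAME} = {current} s, "
              f"expires before onset: {expires_before_onset(current)}")
```

With the default of 1200 seconds the check is false; with 540 it is true, which matches the idea of refreshing the entry before the 600-second mark.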

I don't understand why this works. The MAC address of my gateway is the exact same even after the expiration and renew. Anyone have any idea why an old ARP cache entry (which is actually still valid) would cause this issue?

For reference, I have a Calix Gigapoint 803g ONT that my fiber runs into.
Calix 803g -> Opnsense -> Switch -> Devices

Anyone that could enlighten me as to why this fixed the issue, please feel free to do so. This just seems odd that deleting a cached entry and creating the exact same entry every 540 seconds instead of every 1200 seconds fixes the issue.