OPNsense Forum

Archive => 22.7 Legacy Series => Topic started by: eneerge on January 02, 2023, 01:15:42 am

Title: Possible PF Software Bug Causing Slowness [Solved, Lower the ARP Expiration]
Post by: eneerge on January 02, 2023, 01:15:42 am
Hello All,

After a couple days of testing, I think there may be a software issue with PF. I've tried various configurations over a couple different hardware setups and keep experiencing the exact same issue. Full details below.

I currently have a fiber link of 1gbit up/1gbit down. Connecting directly to the fiber modem without going through the firewall, my speed tests run 950mbit up and 950mbit down. I have run the tests multiple times over a period of an hour to verify there's no deviation in that speed when connected directly to the modem.

Now, I have built a couple firewalls during my tests:
1) Intel Xeon E3-1220 v3 @ 3.10GHz, 250GB Samsung SSD, 8GB ram, 2x Intel CT Desktop NICs (Intel EXPI9301CTBLK)
2) Virtual Machine, AMD Ryzen 5950x. Assigned 8 threads to the HyperV VM. 128gb HD on 4x raid 10 SSDs. 8gb ram. Intel NICs

On both machines, I have the exact same experience:
1) I power up the firewall
2) I do a speedtest. The speedtests are 950mbit up/down like they are when directly plugged in
3) I watch some YouTube videos for about 5 minutes.
4) I do another speedtest. The speedtests are 600-700mbit down and only 60mbit up.
5) I reboot
6) Speedtest returns back to normal
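
A rough way to pin down when the drop happens is to note a throughput reading every minute or so and look for the first one that falls well below the initial value. A minimal sketch, assuming hand-collected (elapsed seconds, upload Mbit/s) readings; the function name, the 50% threshold, and the sample values are illustrative and are not measurements from this thread:

```python
def first_slowdown(samples, fraction=0.5):
    """Return the elapsed time of the first reading below fraction * baseline.

    samples: list of (elapsed_seconds, upload_mbit) tuples in time order.
    The first reading is taken as the baseline. Returns None if no reading
    drops below the threshold (or if there are no readings at all).
    """
    if not samples:
        return None
    baseline = samples[0][1]
    for elapsed, mbit in samples:
        if mbit < baseline * fraction:
            return elapsed
    return None


# Hypothetical readings, for illustration only.
samples = [(0, 950), (60, 948), (120, 951), (300, 60)]
print(first_slowdown(samples))
```

Knowing the onset time to within a minute makes it easier to match the slowdown against timers on the firewall (DHCP leases, ARP entries, state timeouts).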

I have performed the speedtests using a Windows 10 machine and also a Windows Server 2019 machine. I have plugged the Ethernet cables from the test machines directly into the firewall LAN port (no switches in between).

Additionally, I have also tried installing pfsense on the same machines to see if it was something to do with opnsense. I experienced the exact same issue. Speed drops after the firewall has been online for a few minutes.

The performance seems to deviate. Occasionally it will come back up to the 950mbit, but the majority of the time the speed is slower. The upload rate is the primary issue. It is always below 100mbit for some reason.

I have tried enabling RSS in the tunables. That did not help. I tried disabling the Spectre and Meltdown mitigations. For some reason, disabling the Meltdown mitigations makes it run slower on the Xeon processor: download never goes above 600mbit, but upload seems to be a little faster than 60mbit when it goes into slow down mode.

I've tried enabling and disabling the "Hardware CRC", "Hardware TSO", "Hardware LRO" in the interface settings. I tried enabling/disabling interface scrubbing.

When performing speedtests, I watch the opnsense interface statistics to make sure the speeds match what the speedtests show. They are very close to each other. This shows that there's no background activity occurring other than the speedtest.

Since I have this issue with opnsense and pfsense, the only thing that makes sense to me is an issue with PF. Anyone have a similar issue?
Title: Re: Possible PF Software Bug Causing Slowness
Post by: eneerge on January 02, 2023, 01:33:50 am
When it enters "slow down" mode, the CPU doesn't even go over 1%. I.e., upload is 60mbit and CPU is at 1%. Prior to going into slow down mode, the CPU hits 25% when downloading and uploading. It's like there's a bottleneck somewhere.
Title: Re: Possible PF Software Bug Causing Slowness
Post by: Patrick M. Hausen on January 02, 2023, 07:16:30 am
Have you tried disabling powerd? Also disable all sleep states in the BIOS if that option is available.
Title: Re: Possible PF Software Bug Causing Slowness
Post by: eneerge on January 03, 2023, 04:33:47 am
I just swapped to 2 completely different nics in the 5950x box. I'm still experiencing the same slowness.

When I make a configuration change on the WAN interface (any change that reloads it), the speed returns back to normal.

PowerD is not enabled. I have tried enabling it and setting to conservative and max and that didn't change anything.

I am not using Suricata. I basically have a vanilla install, but configured 10.0.1.0/24 for the LAN.

I just bought an intel x540 dual lan nic, but I doubt this will help at this point since I've been through 4 NICs now that did not have any issue in the past.
Title: Re: Possible PF Software Bug Causing Slowness
Post by: eneerge on January 03, 2023, 05:38:22 am
When I made LAN interface config changes, it reloaded the LAN interface, but the speed was still 60mbit upload.

I do the same with the WAN interface, speed goes back up to 1gbit/second.

Any config change on the WAN interface allows the speed to return to normal. That's the only thing I know for sure at this point. It holds across 4 NICs (2 different Intel NICs, a Realtek, and the Hyper-V virtual NICs): make any WAN config change and click apply, and the speed returns.
Title: Re: Possible PF Software Bug Causing Slowness
Post by: eneerge on January 03, 2023, 10:46:53 pm
I feel absolutely dingy now. I just set up a netfilter based distro (IPFire) on the same hardware. I do not have any performance slow downs now. Everything is 900mbit+ on up/down.

I really prefer OPNSense, but the slow downs are making it unusable at the moment.
Title: Re: Possible PF Software Bug Causing Slowness
Post by: mimugmail on January 04, 2023, 06:43:51 am
You say you use a VM? What happens when you install opnsense on the hardware itself?
Title: Re: Possible PF Software Bug Causing Slowness
Post by: eneerge on January 04, 2023, 11:44:35 am
Exactly the same thing. I've tried it on bare metal and got the exact same issue. Since changing to netfilter, it's been consistently 900mbps+.
Title: Re: Possible PF Software Bug Causing Slowness
Post by: mimugmail on January 04, 2023, 02:44:50 pm
Can you try setting MSS to 1300 in Interface : LAN?
Title: Re: Possible PF Software Bug Causing Slowness
Post by: eneerge on January 05, 2023, 04:25:07 am
Could this be related? https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=268490
Title: Re: Possible PF Software Bug Causing Slowness
Post by: eneerge on January 05, 2023, 05:19:47 am
I tested the MSS settings. They had no effect. Tried on WAN and LAN.
Title: Re: Possible PF Software Bug Causing Slowness
Post by: eneerge on January 05, 2023, 05:33:27 am
I will say that changing the MSS settings on the LAN reduced max throughput to about 700mbit up/down, but the throughput eventually dropped to 150mbit down and 60mbit up.
Title: Re: Possible PF Software Bug Causing Slowness
Post by: dinguz on January 05, 2023, 06:26:12 am
Could you elaborate a bit more on the network topology? Are there things like PPPoE involved?
Title: Re: Possible PF Software Bug Causing Slowness
Post by: eneerge on January 05, 2023, 07:34:28 am
I am using a very basic configuration at the moment while troubleshooting.

Fiber Modem -> Firewall WAN. Fiber modem gives OPNSense a WAN IP using DHCP. No PPPoE. Just basic DHCP.
Firewall LAN -> Network Device. Firewall gives device an IP via DHCP (10.0.1.0/24)

I have tried the following NICs:

For every NIC, network device gets 900mbit+ for about 5-10 minutes before dropping to about 600mbps down and 60mbps up.

I have also tried different network cables (CAT8).

At the moment, I am running this in a HyperV virtual machine with 8 cores assigned with 4GB RAM. Before virtualizing the setup, I was running bare metal on a Xeon E3-1220 v3 @ 3.10GHz Quadcore with 128GB SSD and 8gb RAM. The exact same experience occurred on that hardware. However, I was using the Intel Gigabit CT Desktop NICs.

Now that I have everything virtualized, I've setup new VMs with the following:
Title: Re: Possible PF Software Bug Causing Slowness
Post by: mimugmail on January 05, 2023, 07:53:33 am
I can't imagine this happens on bare metal too  :o
Title: Re: Possible PF Software Bug Causing Slowness
Post by: schnipp on January 05, 2023, 10:20:29 pm
I can imagine. Opnsense has several performance issues. Regarding my Opnsense, routing traffic between different LAN segments (1 GBit/s each) is slow and drops to 40-60 MB/s (especially SMB transfers).

I can't exactly remember anymore. But, if I am right the issues began with version 18.x or 19.x, maybe while migrating to "iflib". The issues are still not solved and I started coming to terms with it. I don't know whether these performance issues are directly related to Opnsense or are the same in PFsense or vanilla BSD.

I noticed that performance increases when disabling IPsec (even if it's not related to the routing between the above mentioned networks). Furthermore, "netflow" has a non-negligible negative impact on performance.

See also:

Edit: My Opnsense runs on bare metal (Supermicro A2SDi-4C-HLN4F)
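
Since this thread mixes Mbit/s (speedtest results) and MB/s (file transfer rates), a quick conversion helps when comparing the figures. A minimal sketch, assuming decimal units and ignoring protocol overhead; the function names are made up for illustration:

```python
def mbyte_to_mbit(mbyte_per_s: float) -> float:
    """Convert megabytes per second to megabits per second (1 byte = 8 bits)."""
    return mbyte_per_s * 8


def mbit_to_mbyte(mbit_per_s: float) -> float:
    """Convert megabits per second to megabytes per second."""
    return mbit_per_s / 8


# The 40-60 MB/s range quoted above, expressed in Mbit/s.
for mb in (40, 60):
    print(f"{mb} MB/s = {mbyte_to_mbit(mb):.0f} Mbit/s")

# Raw line rate of a 1 Gbit/s link, expressed in MB/s.
print(f"1000 Mbit/s = {mbit_to_mbyte(1000):.0f} MB/s")
```

So 40-60 MB/s corresponds to roughly 320-480 Mbit/s, against a raw ceiling of 125 MB/s on a 1 Gbit/s link.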
Title: Re: Possible PF Software Bug Causing Slowness
Post by: mimugmail on January 06, 2023, 04:47:45 pm
I have a datacenter OPN on 22.1.10, Xeon E on Supermicro with X710 running 10G in both directions. It can't be a generic problem (and this device exports netflows to external)
Title: Re: Possible PF Software Bug Causing Slowness
Post by: eneerge on January 07, 2023, 12:43:19 am
Tried using the tunables Kirk recommended in that second post. Didn't make any difference, unfortunately. :(
Title: Re: Possible PF Software Bug Causing Slowness
Post by: _nwb on January 07, 2023, 02:08:40 am
I wonder if this is related to my issue?: https://forum.opnsense.org/index.php?topic=31748.0

and another guy's issue on reddit?: https://www.reddit.com/r/opnsense/comments/1055v4l/abysmally_low_upload_speed/
Title: Re: Possible PF Software Bug Causing Slowness
Post by: mimugmail on January 07, 2023, 02:17:02 pm
https://forum.opnsense.org/index.php?topic=31753.msg153441#msg153441

960Mbit upload ...
Title: Re: Possible PF Software Bug Causing Slowness
Post by: schnipp on January 08, 2023, 01:44:00 pm
Throughput tests across the WAN are exposed to many more sources of degraded performance than tests inside the LAN. So, I recommend starting with performance tests in the LAN. I don't want to hijack this thread with my observed problems. But maybe they are related. Regarding my previous post (#15 (https://forum.opnsense.org/index.php?topic=31680.msg153331#msg153331)) I did some tests again:

Scenario: Server <-> Opnsense <-> Client:



Disabling both of these services increases the performance in the LAN a lot. But for me it is not an option to drop IPsec. Unfortunately, this situation of slow data throughput has existed for a very long time with no prospect of improvement.

Edit:
Playing with some tunables from here (#15) (https://forum.opnsense.org/index.php?PHPSESSID=87qpqtobv6gjs8nknrripi90bh&topic=25844.msg126362#msg126362) increases the performance for the third case from around 90MB/s to 100-105MB/s.
Title: Re: Possible PF Software Bug Causing Slowness
Post by: Patrick M. Hausen on January 08, 2023, 02:58:26 pm
Why is your OPNsense involved in LAN to LAN traffic? Do you use a LAN bridge instead of a switch? You should definitely get a dedicated switch, then.
Title: Re: Possible PF Software Bug Causing Slowness
Post by: schnipp on January 08, 2023, 05:05:35 pm
Please don't be confused by the term "LAN", I use this in a broader sense (everything behind the firewall compared to WAN). Server and client are both in different LAN segments (VLANs) whereas the Opnsense routes the traffic between them based on pf rules. Additionally, the VLANs are on different physical interfaces, so no sharing of bandwidth is involved.
Title: Re: Possible PF Software Bug Causing Slowness
Post by: eneerge on January 11, 2023, 07:14:38 am
Just installed the Intel x540-t2. Exact same issue.
Title: Re: Possible PF Software Bug Causing Slowness
Post by: schnipp on January 11, 2023, 05:10:20 pm
@eneerge: Have you also tested the performance and possible impact when stopping the services "samplicate" and "strongswan" on the Dashboard?
Title: Re: Possible PF Software Bug Causing Slowness
Post by: eneerge on July 12, 2023, 02:54:38 am
So, I just stumbled upon this post on reddit that mentioned the EXACT issue I described (speed dropping to the exact same rate). https://www.reddit.com/r/HomeNetworking/comments/p63zbo/calix_ont_to_3rd_party_router_not_working/

Last post mentions that it's caused by the ARP timeout. Maybe this needs to be statically assigned to resolve. I will test at some point.
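
One way to check the ARP theory is to see how old the gateway's ARP entry is when the slowdown starts. A minimal sketch that parses one line of FreeBSD `arp -an` output; the line format in the comment is an assumption about what that command prints, the IP and MAC are placeholders, and `entry_age` is a made-up helper name. The computed age is only approximate, since it assumes the timer started at the full `max_age` (1200 seconds is FreeBSD's default for `net.link.ether.inet.max_age`):

```python
import re

# Assumed line format of FreeBSD `arp -an` for a dynamic entry, e.g.:
#   ? (192.0.2.1) at 00:11:22:33:44:55 on em0 expires in 600 seconds [ethernet]
ARP_LINE = re.compile(
    r"\((?P<ip>[\d.]+)\) at (?P<mac>[0-9a-f:]{17}) on (?P<iface>\S+)"
    r" expires in (?P<expires>\d+) seconds"
)


def entry_age(line: str, max_age: int = 1200):
    """Return (ip, mac, approx_age_seconds), or None if the line does not match.

    Lines without an "expires in N seconds" part (for example permanent
    entries) do not match the pattern and yield None.
    """
    m = ARP_LINE.search(line)
    if m is None:
        return None
    return m["ip"], m["mac"], max_age - int(m["expires"])


# Placeholder line, not captured from a real system.
sample = "? (192.0.2.1) at 00:11:22:33:44:55 on em0 expires in 600 seconds [ethernet]"
print(entry_age(sample))
```

If the slowdown always begins at about the same entry age, that points at the ARP timer rather than at load or hardware.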
Title: Re: Possible PF Software Bug Causing Slowness
Post by: eneerge on July 12, 2023, 07:16:43 am
I do not want to "jinx" myself, but holy f-ing s, it seems to be fixed after changing the ARP expiration timeout.

When using OpenWrt, I never experienced any slow downs. Linux by default has an ARP expiration of 60 seconds. pfSense/OPNsense has a default expiration of 20 minutes. The slow downs started exactly 600 seconds after the gateway's ARP entry was created. With pfSense, I was able to manually remove the individual cached entry for the gateway. Removing that entry instantly restored my speeds (and a new ARP entry was created right away). So now I can apply the same to OPNsense.

I just added this to tunables:

This should set the ARP cache to expire every 9 minutes instead of 20.
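
The tunable itself did not survive in this archive. On FreeBSD the ARP cache lifetime is controlled by the `net.link.ether.inet.max_age` sysctl (in seconds, default 1200), so the setting was presumably along the lines of the one generated below. This is an assumption, not the poster's verbatim entry, and the helper names are made up. The sketch also shows why 9 minutes was a sensible pick: it makes the entry expire before the 600 second mark where the slowdowns began.

```python
DEFAULT_MAX_AGE = 1200  # seconds; the 20 minute default mentioned in the post
SLOWDOWN_ONSET = 600    # seconds; when the poster saw the slowdown begin


def tunable_line(minutes: int) -> str:
    """Build a sysctl-style line for an ARP lifetime given in minutes."""
    return f"net.link.ether.inet.max_age={minutes * 60}"


def expires_before_onset(max_age: int, onset: int = SLOWDOWN_ONSET) -> bool:
    """True if the ARP entry is renewed before the slowdown would start."""
    return max_age < onset


print(tunable_line(9))
print(expires_before_onset(540), expires_before_onset(DEFAULT_MAX_AGE))
```

Any value under 600 seconds would satisfy the same check; 540 is simply the value mentioned later in the thread.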

I don't understand why this works. The MAC address of my gateway is the exact same even after the expiration and renew. Anyone have any idea why an old ARP cache entry (which is actually still valid) would cause this issue?
Title: Re: Possible PF Software Bug Causing Slowness [Solved, Lower the ARP Expiration]
Post by: eneerge on July 12, 2023, 07:25:52 am
For reference, I have a Calix Gigapoint 803g ONT that my fiber runs into.
Calix 803g -> Opnsense -> Switch -> Devices

Anyone that could enlighten me as to why this fixed the issue, please feel free to do so. This just seems odd that deleting a cached entry and creating the exact same entry every 540 seconds instead of every 1200 seconds fixes the issue.