OPNSense randomly slows down to 10Mbps download, upload is fine

Started by bl8demast3r, December 03, 2023, 09:47:16 PM

Hi everyone, I've started to notice an annoying issue with my OPNsense (23.7.9) box: download speed will randomly drop to 10 Mbps max. After a reboot everything is fine again.

I've confirmed all interfaces are auto negotiating to 1000baseT <full-duplex>
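For anyone wanting to repeat this check from the console, here is a minimal sketch that pulls the negotiated media line out of `ifconfig` output. The helper name `media_of` and the device name `em0` are assumptions for illustration and do not come from the original post.

```shell
#!/bin/sh
# media_of: read ifconfig output on stdin and print only the media line,
# e.g. "Ethernet autoselect (1000baseT <full-duplex>)".
# The function name is made up for this sketch.
media_of() {
    awk '/^[ \t]*media:/ { sub(/^[ \t]*media:[ \t]*/, ""); print }'
}

# Usage on the firewall (em0 is an assumed device name):
#   ifconfig em0 | media_of
```

Running it against both the WAN and LAN devices shows whether either side has silently fallen back to a lower link speed.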

My internet speed is roughly 1000/38 Mbps (down/up).

Has anyone else run into this?

Intel I219-LM for WAN and I225-V (V3) for LAN

i5-10500

All offloading disabled

HW Info is below

https://bsd-hardware.info/?probe=262c6ee87c#Host

Edit: Here's some more information from Dec 6th

Had it reoccur today after I swapped out the patch cables for LAN and WAN with brand new cables. I ran a few more speed tests and noticed that it's not capped at exactly 10 Mbps: fast.com showed about 20 Mbps and Google's speed test about 85 Mbps, though speedtest.net still sticks to about 10 Mbps.

Checked ports physically and both LAN and WAN have the same indicator LEDs

An iperf test on the LAN interface gives gigabit speed, about 950 Mbps.

I disabled and re-enabled the WAN interface, and speeds are still slow.

I checked all the logs and I'm not able to find anything screaming.

Anything suspicious when you run the following in the console while this hiccup appears?

top -m io -o total

Anybody know an alternative to the Linux irqtop?
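On FreeBSD, `vmstat -i` lists interrupt sources with a total count and an average rate, which is probably the closest built-in starting point. The sketch below sorts that output by the rate column. The helper name `top_irqs` is an assumption, and the rate shown by `vmstat -i` is an average since boot rather than a live figure; for a live view, `systat -vmstat 1` is interactive and worth trying.

```shell
#!/bin/sh
# top_irqs [N]: read `vmstat -i` output on stdin and print the N interrupt
# sources with the highest rate (last column). Defaults to 5.
# The function name is made up for this sketch.
top_irqs() {
    # Prefix each line with its rate so sort can order on it, then strip it.
    awk 'NR > 1 && $1 != "Total" { print $NF "\t" $0 }' |
        sort -rn | head -n "${1:-5}" | cut -f2-
}

# Usage on the firewall:
#   vmstat -i | top_irqs 10
```

If one NIC queue dominates the list while the slowdown is happening, that points at interrupt load rather than the link itself.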

Here's what I got: just java at the top, due to Zenarmor. Even after a reboot it sits at 90%+.

Today before I noticed the issue was happening I experienced a 5 second internet outage.

Regarding Zenarmor:

The Zenarmor engine currently uses a single core, so to route at speeds above 1G you need a CPU clocked above 3.3 GHz. I spoke with Zenarmor support and they told me multicore support will be delivered in Q1, in March.

Elasticsearch should be able to use multiple cores; I think they enabled that in 1.16. Checking through the OPNsense GUI, I can see it's using several cores.

However, Elasticsearch is only maxed out under heavy throughput (1G, for example). Are you saying Elasticsearch maxes out a core even when idle?

When the issue appears again, can you:
1. Put the Zenarmor engine into bypass and test
2. Stop Elasticsearch and test

After doing this, are the downloads back to expected values?

Regards,
S.
Networking is love. You may hate it, but in the end, you always come back to it.

OPNSense HW
APU2D2 - deceased
N5105 - i226-V | Patriot 2x8G 3200 DDR4 | L 790 512G - VM HA(SOON)
N100   - i226-V | Crucial 16G  4800 DDR5 | S 980 500G - PROD

That seems to be an issue that can happen with the i225 cards.  Take a look at this thread and the linked PDF.

https://forum.opnsense.org/index.php?topic=38055.0

I'm curious which revision you have.

Quote from: Seimus on January 16, 2024, 10:25:54 AM
When the issue appears again, can you:
1. Put the Zenarmor engine into bypass and test
2. Stop Elasticsearch and test

After doing this, are the downloads back to expected values?

After a while with no issues it finally came back. I put the Zenarmor engine in bypass and tested again but had the same speed issues, then I stopped the Elasticsearch database and tested again, but still the same. :(

Quote from: CJ on January 17, 2024, 05:14:53 PM
I'm curious which revision you have.

I verified when I got the FlexIO board that I had the v3 revision of the i225

This is my system information while running a speedtest

Download is 1% CPU usage

Upload is 2% CPU usage and spiked to 9% during the test

Also, I noticed that my upload speed seems unaffected (Edit: I forgot I already had put that in the title, lol)

Yeah,

then this does look like the well-known i225 problem that @CJ mentioned.

Regards,
S.

But how could that be? My WAN connection is on the Intel i219, and if I run iperf from my LAN to the OPNsense box while the issue is happening, I get full throughput.

I just double checked my WAN negotiation speed and it's 1000baseT full-duplex

So I finally found something in OPNsense that's off: I have a huge number of input errors on the WAN interface. I believe the throughput issue started when I had a blip in service; I think the modem might have restarted as part of ISP maintenance.
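To watch that counter from the console rather than the GUI, something like the following could work. It is a sketch that assumes the FreeBSD `netstat -ibn` layout, with a header row containing an `Ierrs` column and one `<Link#N>` row per device. The helper name `ierrs_for` and the device name `em0` are made up for illustration.

```shell
#!/bin/sh
# ierrs_for IFACE: read `netstat -ibn` output on stdin and print the input
# error count from the link-level row of IFACE.
# The column is located by counting from the end of the header row, so a
# row with an empty Address column still lines up.
ierrs_for() {
    awk -v ifc="$1" '
        NR == 1 {
            for (i = 1; i <= NF; i++) if ($i == "Ierrs") off = NF - i
            next
        }
        $1 == ifc && $3 ~ /^<Link#/ && off != "" { print $(NF - off); exit }
    '
}

# Usage on the firewall (em0 is an assumed WAN device name):
#   netstat -ibn | ierrs_for em0
# Running it twice a few seconds apart shows whether errors are still growing.
```

A count that keeps climbing during a slow period would suggest the problem sits on the WAN link (cable, modem port or negotiation) rather than in the firewall's packet processing.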

My modem is an EN2251, which suspiciously has a 2.5Gb NIC, and it's got me thinking that maybe the problem is reversed. I know the modem has an Intel Puma 7 chipset, so what if it also has a problematic i225 NIC on board? I did some googling but could not confirm whether it does.

I have an old SB8200 I can switch to, but it has issues updating its firmware from Xfinity to Spectrum service, which would cause a reboot every night at 12am lol. I might try calling Spectrum to see if they have anything newer I can switch to, but I doubt it.

Another idea I had was to swap the LAN and WAN interface ports, so WAN would be on the 2.5Gb NIC in OPNsense; maybe going 2.5Gb to 2.5Gb would resolve the issue. How much configuration would this take? If I swap the assignments in OPNsense, would it carry over all my config, or would I have to redo it manually?

Update: I have swapped the LAN and WAN interface ports by exporting the config, changing the LAN and WAN ports in the XML, and reimporting it. I will report back if the issue reoccurs.
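For anyone wanting to repeat the swap, here is a rough sketch of the XML edit done with awk instead of by hand. It assumes the exported config has an `<interfaces>` section with `<wan>` and `<lan>` blocks, each holding one `<if>device</if>` line, and that every tag sits on its own line as in a pretty-printed export. The function name `swap_wan_lan` and the file names are made up for illustration; this is not necessarily how the edit above was done.

```shell
#!/bin/sh
# swap_wan_lan FILE: print FILE with the <if> device names of the <wan> and
# <lan> blocks inside <interfaces> exchanged. FILE itself is not modified.
swap_wan_lan() {
    awk '
        # Track position: only <wan>/<lan> directly inside <interfaces> count.
        /^[ \t]*<interfaces>[ \t]*$/    { in_ifs = 1 }
        /^[ \t]*<\/interfaces>[ \t]*$/  { in_ifs = 0; sec = "" }
        in_ifs && /^[ \t]*<wan>[ \t]*$/ { sec = "wan" }
        in_ifs && /^[ \t]*<lan>[ \t]*$/ { sec = "lan" }
        /^[ \t]*<\/(wan|lan)>[ \t]*$/   { sec = "" }

        sec != "" && match($0, /<if>[^<]*<\/if>/) {
            if (NR == FNR) {
                # Pass 1: remember the device assigned to this section.
                ifname[sec] = substr($0, RSTART + 4, RLENGTH - 9)
            } else if (ifname["wan"] != "" && ifname["lan"] != "") {
                # Pass 2: write the other section device name instead.
                other = (sec == "wan") ? "lan" : "wan"
                sub(/<if>[^<]*<\/if>/, "<if>" ifname[other] "</if>")
            }
        }
        NR != FNR { print }   # only the second pass produces output
    ' "$1" "$1"
}

# Usage (file names are placeholders); keep the original export as a backup:
#   swap_wan_lan config-export.xml > config-swapped.xml
```

The file is read twice so that both device names are known before any line is rewritten. Everything else in the config, such as firewall rules that refer to `wan` or `lan` by name, is left untouched, so those settings follow the logical interface rather than the physical port.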