OPNsense Forum

Archive => 22.1 Legacy Series => Topic started by: Andreas_ on February 18, 2022, 06:30:19 pm

Title: Interface errors after Upgrade
Post by: Andreas_ on February 18, 2022, 06:30:19 pm
After upgrading from 21.7 to 22.1.1, my firewall shows an error rate of 0.05/s avg on the WAN interface (SuperMicro A2SDI on-board NIC connected to a Juniper switch afaik), which used to be zero before (netstat -i Ierrs). The machine has 4 more connections, all are still at zero errors.

What may be the reason?
Title: Re: Interface errors after Upgrade
Post by: j_s on February 19, 2022, 03:28:47 am
Can you post the entire output of "netstat -i Ierrs"?

I'm reading what you wrote as 0.05 errors per second (or roughly 1 error every 20 seconds).  But netstat -i doesn't give any timeframe.  So I'm really confused about your units.

The general standard is to look at the number of Ierrs vs Ipkts.  If I have 500 billion Ipkts, but 100 Ierrs, I'm not worried about that error rate (which is 0.00000002%).  Now, when that error rate is 5%, we have a big problem.
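
As a minimal sketch of that ratio (the function name error_percent is made up for illustration, it is not part of netstat or any tool):

```python
def error_percent(ierrs: int, ipkts: int) -> float:
    """Return input errors as a percentage of input packets."""
    if ipkts <= 0:
        raise ValueError("ipkts must be positive")
    return 100.0 * ierrs / ipkts


# The example from the post: 100 Ierrs out of 500 billion Ipkts.
print(f"{error_percent(100, 500_000_000_000):.8f}%")
```

Feeding it the Ierrs and Ipkts columns from netstat -i gives a figure that can be compared across interfaces regardless of how much traffic each one carries.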

For me, I generally completely ignore any kind of Ierrs or Oerrs unless it's statistically significant or the interface simply isn't passing traffic.  I've seen 20-50 errors show up at boot on some systems, and when there are only about 100 packets a day on that interface, the ratio is obviously going to look statistically significant for a very long time.  In those cases I simply don't worry about it, because I know not to worry unless the Ierrs are actually going up day after day.
Title: Re: Interface errors after Upgrade
Post by: Andreas_ on February 19, 2022, 12:39:48 pm
netstat -i output, redacted after 42h uptime:

Name    Mtu Network   Address     Ipkts Ierrs Idrop    Opkts Oerrs Coll
ix2    1500 <Link#5>  xx:xx:xx 40644829  6335     0 59986652     0    0

The firewall is monitored by checkmk, which calculates the error rate. Attached is the current graph (it was flat at zero for the last 400 days until recently).
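
For reference, a rough sketch of how the netstat line above maps to a per-second rate. It simply averages the counter over the 42h of uptime; how checkmk actually computes its graph is not stated here, so treat this as an assumption. The helper name parse_netstat_row is hypothetical.

```python
HEADER = "Name Mtu Network Address Ipkts Ierrs Idrop Opkts Oerrs Coll".split()


def parse_netstat_row(row: str) -> dict:
    """Split one whitespace-separated `netstat -i` row into named fields."""
    fields = row.split()
    if len(fields) != len(HEADER):
        raise ValueError(f"expected {len(HEADER)} fields, got {len(fields)}")
    return dict(zip(HEADER, fields))


row = parse_netstat_row("ix2 1500 <Link#5> xx:xx:xx 40644829 6335 0 59986652 0 0")
ierrs = int(row["Ierrs"])
ipkts = int(row["Ipkts"])
uptime_seconds = 42 * 3600  # 42h uptime, as stated in the post

errors_per_second = ierrs / uptime_seconds  # average over the whole uptime
errors_vs_packets = 100.0 * ierrs / ipkts   # Ierrs as a percentage of Ipkts

print(f"{errors_per_second:.4f} errors/s, {errors_vs_packets:.4f}% of Ipkts")
```

With these numbers the average comes out a little over 0.04 errors per second, in the same range as the 0.05/s from the graph, and the Ierrs are on the order of 0.016% of Ipkts.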

I agree this isn't a horribly high error rate, but the firewall was quiet for years until the update. So something must have changed. Maybe some driver issue?
Title: Re: Interface errors after Upgrade
Post by: j_s on February 19, 2022, 12:54:45 pm
Okay, so that translates to an error rate of about 0.016% (6335 Ierrs out of 40644829 Ipkts).  For an interface that sat at zero for years, that's nothing to sneeze at, and it definitely warrants investigation.

So a driver problem is entirely possible.  I do think that is pretty unlikely in the big picture because Intel "generally" does a pretty good job with maintaining their drivers, and I'd be surprised if this got through their testing.  Also nobody else is having this problem.

What I would do is look at the switch side and see what the Juniper switch says for errors on the incoming side of that port.  If OPNsense is reporting incoming errors, and the Juniper is also reporting incoming errors, that may indicate something like a cable.  I'm not a Juniper guy, so I can't provide exact commands to check for that on your switch.

However, if that network cable can be replaced easily (even temporarily for testing) I might try that first, because it's a cheap and quick test that can rule out the cable.  I know I've seen people troubleshoot the heck out of network problems, when if they'd simply tried a cable swap first, it would have saved them dozens of hours of troubleshooting.  If the route for the temporary cable is different from the current cable's, that can also help rule out EMI problems due to stuff like fluorescent lights, etc.

I'm a big proponent of trying cheap and easy first before spending lots of time troubleshooting.
Title: Re: Interface errors after Upgrade
Post by: Andreas_ on February 22, 2022, 03:32:57 pm
The internet provider guy says there are no errors on the switch side.

sysctl -A shows
  dev.ix.2.mac_stats.checksum_errs: 15381
  dev.ix.2.mac_stats.rx_errs: 15381
which corresponds to the 15381 Ierrs from netstat.
Since the error rate seems to be constant, I captured about 5 minutes of traffic and filtered it in Wireshark for errors:

eth.fcs.status=="Bad" || ip.checksum.status=="Bad" || tcp.checksum.status=="Bad" || udp.checksum.status=="Bad"

But the result was zero.
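
One possible reason for the empty result, offered as an assumption rather than a diagnosis: Wireshark only reports a checksum status of "Bad" when checksum validation is enabled for that protocol, and as far as I know it is off by default for IP, TCP and UDP. Many NICs also strip the Ethernet FCS before the capture sees the frame. The sketch below builds a tshark command line that applies the same filter with validation switched on. The file name wan.pcap is hypothetical, and the preference names (ip.check_checksum, tcp.check_checksum, udp.check_checksum) should be checked against the installed Wireshark version.

```python
import shlex

# Same display filter as in the post.
DISPLAY_FILTER = (
    'eth.fcs.status=="Bad" || ip.checksum.status=="Bad" || '
    'tcp.checksum.status=="Bad" || udp.checksum.status=="Bad"'
)


def build_tshark_argv(pcap_path: str) -> list:
    """Build a tshark command that reads a capture with checksum validation on."""
    argv = ["tshark", "-r", pcap_path]
    for proto in ("ip", "tcp", "udp"):
        # -o overrides a preference for this run only (names are an assumption).
        argv += ["-o", f"{proto}.check_checksum:TRUE"]
    argv += ["-Y", DISPLAY_FILTER]
    return argv


# Print the command instead of running it, so it can be reviewed first.
print(shlex.join(build_tshark_argv("wan.pcap")))
```

If the filter still matches nothing with validation enabled, the frames counted by the NIC probably never reach the capture in a form Wireshark would flag.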
Title: Re: Interface errors after Upgrade
Post by: j_s on February 22, 2022, 08:37:02 pm
Okay, so since your incoming is showing the errors, you're left with 3 very likely possibilities.

1.  The switch output is occasionally garbage.
2.  The cabling has excessive noise or other problem causing checksum errors.
3.  Your network card is having problems.

Yes, drivers and other things are certainly still a possible option.  But this sounds MUCH more likely a hardware problem than anything else.

I would still try replacing the cable first since that is easy to do.  Is this 1Gb or 10Gb?  If 10Gb, is it fiber, DAC, or CAT6 cabling?

Title: Re: Interface errors after Upgrade
Post by: Andreas_ on February 23, 2022, 10:10:22 am
Some days before the errors started, the switch was replaced (cabling and switch are in the provider's realm), but the error counters still stayed at zero afterwards. The errors began right after the upgrade, so I wonder if the driver can now detect some error situation that it couldn't before.

Actually, I have TWO firewalls in a CARP configuration. The regular error rate is on the CARP master, but there are also some errors on the backup.
Title: Re: Interface errors after Upgrade
Post by: opns_neuling on February 24, 2022, 12:14:34 pm
Hi !
I have 2 cases with the same problem after the update to 22.1:

 sysctl -A | grep -i "dev.ix.[0-1].mac_stats" | grep err
dev.ix.1.mac_stats.checksum_errs: 103470
dev.ix.1.mac_stats.rec_len_errs: 0
dev.ix.1.mac_stats.byte_errs: 0
dev.ix.1.mac_stats.ill_errs: 0
dev.ix.1.mac_stats.crc_errs: 0
dev.ix.1.mac_stats.rx_errs: 103470
dev.ix.0.mac_stats.checksum_errs: 1257491
dev.ix.0.mac_stats.rec_len_errs: 0
dev.ix.0.mac_stats.byte_errs: 0
dev.ix.0.mac_stats.ill_errs: 0
dev.ix.0.mac_stats.crc_errs: 0
dev.ix.0.mac_stats.rx_errs: 7283759

dev.ix.1.%desc: Intel(R) X520 82599ES (SFI/SFP+)
dev.ix.0.%desc: Intel(R) X520 82599ES (SFI/SFP+)
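
A small sketch for comparing the mac_stats counters above per interface. It only parses the text that sysctl printed (three of the counters are copied in as sample input) and makes no claim about how the ix driver adds things up into rx_errs; the helper name parse_mac_stats is hypothetical.

```python
import re
from collections import defaultdict

# Sample input copied from the sysctl output above.
SYSCTL_OUTPUT = """\
dev.ix.1.mac_stats.checksum_errs: 103470
dev.ix.1.mac_stats.crc_errs: 0
dev.ix.1.mac_stats.rx_errs: 103470
dev.ix.0.mac_stats.checksum_errs: 1257491
dev.ix.0.mac_stats.crc_errs: 0
dev.ix.0.mac_stats.rx_errs: 7283759
"""

LINE_RE = re.compile(r"^dev\.ix\.(\d+)\.mac_stats\.(\w+):\s*(\d+)$")


def parse_mac_stats(text: str) -> dict:
    """Return {unit: {counter_name: value}} for dev.ix.N.mac_stats lines."""
    stats = defaultdict(dict)
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            unit, name, value = match.groups()
            stats[int(unit)][name] = int(value)
    return dict(stats)


stats = parse_mac_stats(SYSCTL_OUTPUT)
for unit in sorted(stats):
    counters = stats[unit]
    # Part of rx_errs that the checksum_errs counter does not account for.
    remainder = counters["rx_errs"] - counters["checksum_errs"]
    print(f"ix{unit}: rx_errs={counters['rx_errs']} "
          f"checksum_errs={counters['checksum_errs']} remainder={remainder}")
```

On ix1 the two counters match exactly, while on ix0 rx_errs is much larger than checksum_errs even though the other error counters listed are zero.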

X520-DA2 & X520-SR2

Same problem with 10GbE DAC cabling or the original 10GbE GBIC adapters.

HPE switches with LACP. Tested without LACP too, no change ...

Cheers,


Title: Re: Interface errors after Upgrade
Post by: Glow on March 08, 2022, 01:50:11 pm
Hi,

I have the same problem, but I only use one interface (ix.0). I'm currently running 22.1.2_1, and the errors just keep counting up :-)

dev.ix.1.mac_stats.checksum_errs: 0
dev.ix.1.mac_stats.rec_len_errs: 0
dev.ix.1.mac_stats.byte_errs: 0
dev.ix.1.mac_stats.ill_errs: 0
dev.ix.1.mac_stats.crc_errs: 0
dev.ix.1.mac_stats.rx_errs: 0
dev.ix.0.mac_stats.checksum_errs: 704322
dev.ix.0.mac_stats.rec_len_errs: 0
dev.ix.0.mac_stats.byte_errs: 0
dev.ix.0.mac_stats.ill_errs: 0
dev.ix.0.mac_stats.crc_errs: 0
dev.ix.0.mac_stats.rx_errs: 704322

dev.ix.0.%desc: Intel(R) X520 82599ES (SFI/SFP+)

/Regards

Title: Re: Interface errors after Upgrade
Post by: PackerFan on March 08, 2022, 05:09:47 pm
I had the same problem with the X520 chipset. I ended up having to put a 1Gb RJ45 Intel NIC in, and all the interface errors stopped for me.
Title: Re: Interface errors after Upgrade
Post by: Glow on March 23, 2022, 11:53:16 pm
Sad to say, but that is not really an option for me, since I have 10Gb on the WAN interface.