How to investigate "errors out"?

tessus · January 02, 2023, 01:45:15 AM

I can see in the interface statistics for one of my VLANs that it has 1783 errors out (of 4,699,213,468 packets out).

I doubt it is a problem, but I would like to find out what triggered those errors. What is the best way to investigate this? Does OPNsense log these types of errors? (I couldn't find anything, unless I looked in the wrong places/logs.)

P.S.: I certainly can't run a tcpdump for 22 days, i.e. those errors happened at some point within the last 22 days (current uptime) and I have no idea when exactly.

P.P.S.: Happy New Year!

franco · January 09, 2023, 02:23:52 PM

Happy new year!

This is a good question. Having had to deal with this internally a bit, I think the answer is... it's complicated.

The aggregate value may or may not come from individual counters, which in turn pin down the subsystem but not necessarily the discard reason.

The most likely culprit, however, is a temporary memory allocation (mbuf) failure, which is harmless in the grand scheme of things, like many of these other error counters.

Cheers,
Franco
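
Since a 22-day tcpdump is not practical, one lightweight option is to sample the interface counters periodically and note when "errors out" increases, which at least narrows down the time window. The following is only a rough sketch, not an OPNsense feature: it assumes the FreeBSD `netstat -ibn` layout where each interface has a `<Link#N>` row ending in the columns `Opkts Oerrs Obytes Coll`, and the function names (`parse_oerrs`, `changed`, `watch`) are made up for illustration.

```python
import subprocess
import time


def parse_oerrs(netstat_text):
    """Return {interface: Oerrs} from `netstat -ibn`-style text.

    Assumption: the per-interface totals live on the <Link#N> row and
    the last four columns are Opkts, Oerrs, Obytes, Coll.
    """
    counters = {}
    for line in netstat_text.splitlines():
        if "<Link#" not in line:
            continue  # skip the header and per-address rows
        fields = line.split()
        if len(fields) < 9:
            continue
        try:
            # Count from the right, because the Address column may be empty.
            counters[fields[0]] = int(fields[-3])
        except ValueError:
            continue
    return counters


def changed(previous, current):
    """Return {interface: increase} for counters that differ between samples."""
    return {
        name: current[name] - previous[name]
        for name in current
        if name in previous and current[name] != previous[name]
    }


def watch(interval_seconds=60):
    """Poll netstat forever and print a timestamped line on every change."""
    previous = None
    while True:
        output = subprocess.run(
            ["netstat", "-ibn"], capture_output=True, text=True, check=True
        ).stdout
        current = parse_oerrs(output)
        if previous is not None:
            for name, delta in changed(previous, current).items():
                stamp = time.strftime("%Y-%m-%d %H:%M:%S")
                print(f"{stamp} {name}: Oerrs changed by {delta}")
        previous = current
        time.sleep(interval_seconds)
```

Calling `watch(60)` from a shell session or a small service would print a line each time a counter moves. This only tells you when the errors occurred, not why; the timestamps could then be compared against `netstat -m` output or other logs from the same period.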

tessus · January 10, 2023, 07:59:08 AM

Thanks a bunch for the reply.

Would it make sense to log these errors then? That is, there should be a way to log the discard reason and the subsystem it came from, unless this is too complex to tackle.

While I am sure these errors are not going to disrupt the proper operational status quo, I am a curious person, being a Chaos Engineer and SRE, and having worked in o11y for the past few years. It's just that I really want to know what's going on in a system. The more info, the better (with an option to filter out irrelevant data). But that's just me, others might call this a bit too OCD.
