How to investigate "errors out"?

Started by tessus, January 02, 2023, 01:45:15 AM

I can see in the interface statistics for one of my VLANs that it has 1783 errors out (of 4,699,213,468 packets out).

I doubt it is a problem, but I would like to find out what triggered those errors. What is the best way to investigate this? Does OPNsense log this type of error? (I couldn't find anything, unless I looked in the wrong places/logs.)

P.S.: I certainly can't run a tcpdump for 22 days. Those errors happened at some point within the last 22 days (current uptime) and I have no idea when exactly.

P.P.S.: Happy New Year!

Happy new year!

This is a good question. Having had to deal with this internally a bit, I think the answer is... it's complicated.

The aggregate value may or may not come from individual counters, which in turn pin down the subsystem but not necessarily the discard reason.

The most likely culprit, however, is a temporary memory allocation (mbuf) failure, which is harmless in the grand scheme of things, like many of these other error counters.


Cheers,
Franco

Thanks a bunch for the reply.

Would it make sense to log these errors then? That is, there should be a way to log the discard reason and which subsystem it came from, unless this is too complex to tackle.

While I am sure these errors are not going to disrupt the proper operational status quo, I am a curious person, being a Chaos Engineer and SRE, and having worked in o11y for the past few years. It's just that I really want to know what's going on in a system. The more info, the better (with an option to filter out irrelevant data). But that's just me, others might call this a bit too OCD.
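
In the meantime, a rough way to at least narrow down when the counter moves, without a 22-day tcpdump, might be to poll it and log only the increases. The sketch below is not an OPNsense feature. It assumes `netstat -ibn` prints a header row containing `Name` and `Oerrs` followed by one row per interface, and the function names and the interface name `vlan01` are made up for illustration. It records when "errors out" increased, not the discard reason.

```python
import subprocess
import time
from datetime import datetime
from typing import Optional


def parse_oerrs(netstat_output: str, iface: str) -> int:
    """Return the Oerrs counter for iface from `netstat -ibn`-style text.

    Assumption: the first non-empty line is a header containing "Oerrs".
    The column is located by its distance from the end of the row, so a
    row with an empty Address field is still read correctly.
    """
    rows = [ln.split() for ln in netstat_output.splitlines() if ln.strip()]
    header = rows[0]
    offset = header.index("Oerrs") - len(header)  # negative index from row end
    for fields in rows[1:]:
        # Per-address rows show "-" instead of a number, so skip those.
        if fields[0] == iface and len(fields) >= -offset and fields[offset].isdigit():
            return int(fields[offset])
    raise ValueError(f"no numeric Oerrs row found for interface {iface!r}")


def describe_increase(previous: int, current: int, when: datetime) -> Optional[str]:
    """Return a log line if the counter grew since the last sample, else None."""
    if current > previous:
        return f"{when.isoformat()} Oerrs +{current - previous} (total {current})"
    return None


def watch(iface: str, interval_s: int = 60) -> None:
    """Poll the counter forever and print a line whenever it increases."""
    previous = None
    while True:
        out = subprocess.run(
            ["netstat", "-ibn"], capture_output=True, text=True, check=True
        ).stdout
        current = parse_oerrs(out, iface)
        if previous is not None:
            line = describe_increase(previous, current, datetime.now())
            if line:
                print(line, flush=True)
        previous = current
        time.sleep(interval_s)
```

Calling `watch("vlan01")` from a shell session or a small service would give timestamps accurate to the polling interval, which could then be lined up with other logs or a much shorter, targeted capture.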