Monitoring and analysing packet loss on OPNsense

Started by binaryanomaly, January 31, 2022, 11:50:27 AM

Hi,

For quite a while I have been experiencing occasional connection interruptions and can observe packet loss on OPNsense (not just since 22.1). I suspect my ISP, but I do not have enough evidence to approach them yet.

I have already activated gateway monitoring. Interestingly, packet loss is displayed as 0.0% in System -> Gateways -> Single.
However, Reporting -> Health -> Quality does display packet loss for the gateway.
Which one is correct?

How can I investigate this further in OPNsense?

Thanks

Not in OPNsense, but I run a Smokeping instance to keep an eye on ISP issues. It's hard to argue with a widely accepted graphical measurement.

Bart...

Thanks a lot! Set it up.
Would you mind sharing your config?

This is the abridged content of my targets file /etc/smokeping/config.d/Targets

*** Targets ***

probe = FPing

menu = Top
title = Network Latency Grapher

+ UK

menu = UK
title = Britain

++ BBC

menu = BBC
title = BBC
host = www.bbc.co.uk

+ US

menu = US
title = United States

++ RedHat

menu = Red Hat
title = Red Hat
host = www.redhat.com

Thanks, so pretty standard config.


+ Remote

menu = Remote
title = Remote check

++ cloudflare

menu = Cloudflare
title = 1.1.1.1 check

#probe = FPingNormal
host = 1dot1dot1dot1.Cloudflare-dns.com


My graph looks like the one below. I'm not sure why I'm getting "u" as the unit for the y-axis.
Anything of concern?

fping -s 1.1.1.1
on the command line returns an average of 2.43 ms, which I do not recognize in the graph.
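
To compare a command-line measurement with the graph, it can help to collect loss and average round-trip time in a repeatable way. The following is only a sketch, assuming fping is run in quiet count mode (for example `fping -q -c 10 1.1.1.1`), which prints one summary line per target on stderr. The function name `parse_fping_summary` and the sample values in the comment are made up for illustration.

```python
import re

# Summary line as printed by `fping -q -c N <host>` on stderr, for example
# (illustrative values):
#   1.1.1.1 : xmt/rcv/%loss = 10/10/0%, min/avg/max = 2.10/2.43/3.01
# The min/avg/max part is missing when no reply was received at all.
SUMMARY_RE = re.compile(
    r"^(?P<host>\S+)\s*:\s*xmt/rcv/%loss = "
    r"(?P<xmt>\d+)/(?P<rcv>\d+)/(?P<loss>[\d.]+)%"
    r"(?:, min/avg/max = (?P<min>[\d.]+)/(?P<avg>[\d.]+)/(?P<max>[\d.]+))?"
)


def parse_fping_summary(line):
    """Return host, sent/received counts, loss in percent and average RTT in ms.

    Returns None if the line is not an fping summary line.
    """
    match = SUMMARY_RE.match(line.strip())
    if match is None:
        return None
    return {
        "host": match["host"],
        "sent": int(match["xmt"]),
        "received": int(match["rcv"]),
        "loss_percent": float(match["loss"]),
        # No RTT statistics are available when every probe was lost.
        "avg_ms": float(match["avg"]) if match["avg"] else None,
    }
```

The summary text could be obtained with `subprocess.run(["fping", "-q", "-c", "10", "1.1.1.1"], capture_output=True, text=True).stderr` and fed to the parser line by line; logging the result at intervals gives a second data source next to the Smokeping graph.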

Those are SI prefixes (u = micro = 1/1000000). Either you live in the Cloudflare building or you have some DNS issue that resolves an external hostname to a local host.
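
As a rough illustration of how such an axis label comes about: graphing tools built on RRDtool typically plot latency in seconds and scale the axis with SI prefixes. The helper `si_format` below is hypothetical and only mimics that scaling. It is not how Smokeping itself is implemented.

```python
# SI prefixes for values up to one second, as (factor, symbol) pairs.
# "u" stands in for the micro sign, as on the graph axis.
PREFIXES = [(1.0, ""), (1e-3, "m"), (1e-6, "u"), (1e-9, "n")]


def si_format(seconds):
    """Render a duration given in seconds using the largest fitting SI prefix."""
    for factor, symbol in PREFIXES:
        if abs(seconds) >= factor:
            return f"{seconds / factor:.2f} {symbol}s"
    # Zero or anything below one nanosecond is reported in nanoseconds.
    return f"{seconds / 1e-9:.2f} ns"


print(si_format(0.00243))  # the 2.43 ms average reported by fping
print(si_format(0.00005))  # a sub-millisecond reply, as from a local host
```

A y-axis labelled in "u" therefore means the measured round-trip times were in the microsecond range, far below the 2.43 ms seen on the command line, which is why a reply from a local host is the likely explanation.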

Try something not behind a CDN like bbc.co.uk.

Bart...

Thanks, indeed there was something wrong with the host resolution, using the IP now.

Also it seems to show some packet loss, I'll have to investigate further.
Thanks for your help so far 👍🏻


The sweet spot is jitter around 15-30 ms, since that's well within the domain of your ISP. Your internal latency will be around 2-3 ms and more affected by the quality (or lack thereof) of your infrastructure.

I get a solid 15 ms to sites behind CDN and I don't get worried until that doubles.

The results are pretty good. Constantly 2-3ms e2e.

After digging deeper I now found a rx_no_dma_resources issue on the NIC.
It looks as if this could be the root cause of the intermittent issues I am experiencing. I have no idea why this suddenly appeared; it may be related to a kernel upgrade or something similar on the VM host itself.

I'm still confused, though, that OPNsense reports packet loss in the Reporting/Health/Quality section but not in System/Gateway/Single.

@Franco: This might be a bug or I am not getting how this is intended to work.

Quote from: binaryanomaly on February 01, 2022, 10:17:34 PM

After digging deeper I now found a rx_no_dma_resources issue on the NIC.
It looks as if this could be the root cause of the intermittent issues I am experiencing.


Where did you find this error?

If I recall correctly this was an error message in dmesg of the vmhost.
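
For anyone chasing a similar counter: besides dmesg, some Linux NIC drivers expose a statistic with this name through `ethtool -S <interface>`. Whether that applies to the NIC in this thread is an assumption. The sketch below parses that kind of "name: value" output and compares two snapshots, which shows whether the counter is still increasing. The helper names are hypothetical.

```python
def parse_nic_stats(text):
    """Parse `ethtool -S <iface>` style output ("name: value" lines) into a dict."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.strip().rpartition(":")
        if not sep:
            continue  # line without a colon
        try:
            stats[name.strip()] = int(value)
        except ValueError:
            # Header such as "NIC statistics:" or a non-numeric value.
            continue
    return stats


def counter_delta(before, after, name="rx_no_dma_resources"):
    """Return how much a counter grew between two snapshots (0 if absent)."""
    return after.get(name, 0) - before.get(name, 0)
```

Taking one snapshot, waiting a few minutes and taking another makes it possible to tell a counter that is actively growing from one that merely holds an old value since boot.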

The root cause of all of this packet loss was a bad Ethernet cable. It took me weeks to identify, though, since there were no clear error messages or indications and the problems only occurred intermittently.