Packet loss when pinging opnsense

Started by dza, August 24, 2024, 05:40:48 PM

August 24, 2024, 05:40:48 PM Last Edit: August 24, 2024, 05:53:59 PM by dezzadk
For some time I've been chasing an issue where the OPNsense router drops ICMP packets when it is pinged continuously.

What's weird is that my access point on the same subnet (and other clients), also connected to the same LAN interface, do not show any packet loss at all when pinged continuously for several days.

If I run `ping -t 10.0.0.1` for several hours (sometimes as little as 30 minutes, sometimes 8 hours, sometimes 24), it almost always loses somewhere between 2 and 20+ packets.
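For scale, here is a rough calculation of what those numbers mean as a loss rate. It assumes `ping -t` sends about one echo request per second (the Windows default); `loss_fraction` is just a helper name made up for this sketch.

```python
def loss_fraction(lost: int, hours: float, interval_s: float = 1.0) -> float:
    """Fraction of echo requests lost, assuming one request every interval_s seconds."""
    sent = hours * 3600 / interval_s
    return lost / sent


# Mildest and harshest combinations of the figures above:
# 2 packets lost in 24 hours versus 20 packets lost in 30 minutes.
for lost, hours in [(2, 24), (20, 0.5)]:
    print(f"{lost} lost in {hours} h -> {loss_fraction(lost, hours):.4%}")
```

Depending on which end of that range applies, the loss is either tiny or clearly noticeable, so it is worth noting the run time together with the lost count.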

I have tried raising the ICMP rate limit with `net.inet.icmp.icmplim=1000`, with no change.

Initially I had `hw.acpi.cpu.cx_lowest=c3`, but it's now at `hw.acpi.cpu.cx_lowest=c1` for the sake of testing. `dev.hwpstate_intel.0.epp` through `dev.hwpstate_intel.3.epp` are also set to 0.

Since no address except the gateway shows any packet loss, I can only conclude that the OPNsense gateway itself is dropping these packets from time to time, because it is the only address on the LAN that behaves this way.

So even though the OPNsense router is the central point, other clients can ping each other without packet loss. Only the gateway (OPNsense) loses packets from time to time.

Does anyone know why this could happen?

August 24, 2024, 05:59:04 PM #1 Last Edit: August 24, 2024, 06:10:01 PM by meyergru
To quote RFC 777 and RFC 792:

Quote
   The Internet Protocol is not designed to be absolutely reliable.  The
   purpose of these control messages is to provide feedback about
   problems in the communication environment, not to make IP reliable.
   There are still no guarantees that a datagram will be delivered or a
   control message will be returned.  Some datagrams may still be
   undelivered without any report of their loss.  The higher level
   protocols that use IP must implement their own reliability procedures
   if reliable communication is required.

   The ICMP messages typically report errors in the processing of
   datagrams, to avoid the infinite regress of messages about messages
   etc., no ICMP messages are sent about ICMP messages.

There is no guarantee that ICMP packets will be replied to, even less so if they are sent excessively. Many backbone routers reply to only a few packets, then stop.

You have already found that you can limit the number of ICMP packets per second in FreeBSD. Without exactly knowing, I would think that ICMP has less priority than other network packets, so if there is anything else going on over your router, ICMP packets may get dropped in favor of other IP traffic.

Also, since many devices on your network may want to use the default route, which presumably passes through a LAN port of your OPNsense, they share that port's bandwidth. Thus, the switch may drop packets before your OPNsense even becomes aware of them.
Intel N100, 4 x I226-V, 16 GByte, 256 GByte NVME, ZTE F6005

1100 down / 800 up, Bufferbloat A+

August 24, 2024, 07:01:32 PM #2 Last Edit: August 24, 2024, 07:15:12 PM by sliman
Hey,

I think it's because ICMP is normally not treated as critical and gets dropped by the driver or kernel if there's a temporary micro-overflow in the buffers. This can happen at every hop the ICMP packet passes.

This is why a scripted connectivity check over ICMP should never be based on only a few packets, but rather on a measurement over time.
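As a minimal sketch of "measurement over time", the class below keeps the results of the last N probes and only reports a problem when the loss rate over the whole window exceeds a threshold. The name `LossWindow`, the window size, and the 5% threshold are arbitrary choices for illustration, not anything OPNsense provides.

```python
from collections import deque


class LossWindow:
    """Tracks the loss rate over the most recent `size` probes."""

    def __init__(self, size: int = 600) -> None:
        # True means a reply was received, False means the probe was lost.
        self.results = deque(maxlen=size)

    def record(self, ok: bool) -> None:
        self.results.append(ok)

    def loss_rate(self) -> float:
        if not self.results:
            return 0.0
        return self.results.count(False) / len(self.results)

    def is_down(self, threshold: float = 0.05) -> bool:
        # Only judge once the window is full, so a single early drop
        # cannot trigger an alarm on its own.
        full = len(self.results) == self.results.maxlen
        return full and self.loss_rate() > threshold
```

Feed it one `record(...)` call per ping result; a handful of isolated drops in a long run then stays well below the threshold.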

I don't know exactly how ICMP is implemented in your cards or in OPNsense, but if it's critical, maybe try sending ICMP with a higher QoS priority and compare the results. I've never done it myself, but it could be a nice experiment.
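A rough sketch of how such an experiment could be done from a Linux or BSD client: build an ICMP echo request by hand and set the DSCP bits through the `IP_TOS` socket option. Everything here is an assumption for illustration (the helper names, DSCP 46 as the "high priority" value, the payload), it needs root for the raw socket, and it only makes a difference if something in the path actually prioritizes by DSCP.

```python
import os
import socket
import struct


def checksum(data: bytes) -> int:
    """Internet checksum (RFC 1071) over the given bytes."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def echo_request(ident: int, seq: int, payload: bytes = b"qos-test") -> bytes:
    """Build an ICMP echo request (type 8, code 0)."""
    header = struct.pack("!BBHHH", 8, 0, 0, ident, seq)
    csum = checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, csum, ident, seq) + payload


def send_marked_ping(host: str, dscp: int = 46, timeout: float = 1.0) -> bool:
    """Send one echo request marked with `dscp`; True if a matching reply arrives.

    Needs root privileges because it opens a raw socket.
    """
    ident = os.getpid() & 0xFFFF
    with socket.socket(socket.AF_INET, socket.SOCK_RAW, socket.IPPROTO_ICMP) as s:
        # DSCP occupies the upper six bits of the old TOS byte.
        s.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp << 2)
        s.settimeout(timeout)
        s.sendto(echo_request(ident, 1), (host, 0))
        try:
            while True:
                # The timeout applies to each receive, not to the whole loop.
                data, _addr = s.recvfrom(1024)
                ihl = (data[0] & 0x0F) * 4  # skip the IPv4 header
                icmp_type, _code, _csum, r_id, _seq = struct.unpack(
                    "!BBHHH", data[ihl:ihl + 8]
                )
                if icmp_type == 0 and r_id == ident:
                    return True
        except socket.timeout:
            return False
```

Running `send_marked_ping("10.0.0.1", dscp=46)` and `send_marked_ping("10.0.0.1", dscp=0)` in alternation for the same period would give two loss counts to compare.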

EDIT:
A completely different direction of thinking:
It's possible the CPU has interrupts or micro-spikes.




Quote from: sliman on August 24, 2024, 07:01:32 PM
I don't know exactly how ICMP is implemented in your cards or in OPNsense, but if it's critical, maybe try sending ICMP with a higher QoS priority and compare the results. I've never done it myself, but it could be a nice experiment.
Do you have an example, or could you elaborate on how this could be done?

Quote from: meyergru on August 24, 2024, 05:59:04 PM
You have already found that you can limit the number of ICMP packets per second in FreeBSD. Without exactly knowing, I would think that ICMP has less priority than other network packets, so if there is anything else going on over your router, ICMP packets may get dropped in favor of other IP traffic.

Also, since many devices on your network may want to use the default route, which presumably passes through a LAN port of your OPNsense, they share that port's bandwidth. Thus, the switch may drop packets before your OPNsense even becomes aware of them.
Do you have a better reliability test that could be used for intermittent fallouts?

If you want to monitor the stability of your internet connection, you can run a connection test over TCP against a well-known, stable service. Of course, it depends on what you want to show or test.
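A minimal sketch of such a TCP test, assuming you pick the host and port yourself (none is suggested here): it measures how long a TCP connect takes and logs a timestamp whenever the connect fails. The function names `tcp_probe` and `monitor` are made up for this example.

```python
import socket
import time
from typing import Optional


def tcp_probe(host: str, port: int, timeout: float = 2.0) -> Optional[float]:
    """Return the TCP connect time in seconds, or None if the connect failed."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return None


def monitor(host: str, port: int, interval: float = 1.0, count: int = 60) -> float:
    """Probe `count` times and return the fraction of failed connects."""
    failures = 0
    for _ in range(count):
        if tcp_probe(host, port) is None:
            failures += 1
            print(time.strftime("%Y-%m-%d %H:%M:%S"), "connect failed")
        time.sleep(interval)
    return failures / count
```

Because TCP retransmits a lost SYN, a single dropped packet usually shows up as a slow connect rather than a failure, so the returned connect time is worth logging as well.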

August 26, 2024, 01:05:41 AM #5 Last Edit: August 26, 2024, 01:32:03 AM by dezzadk
I wanted to test the reliability of the interfaces on both the client (RTL8125) and the router (Hunsn N100, 4x 2.5GbE I226-V), since both were controversial when they were introduced because of various issues.

So I thought a continuous ping might be the best way to test for intermittent problems.

For the I226, I only know of ASPM issues that manifest in a permanent loss of connection if present (at least under FreeBSD).

The RTL8125 is problematic with the non-OEM driver, but only under FreeBSD. I would not worry too much.

Can anyone suggest a better reliability test than ping/ICMP for intermittent loss?