High packet loss - caused by OPNSense?

Started by beneix, December 29, 2023, 02:35:43 PM

December 29, 2023, 02:35:43 PM Last Edit: December 30, 2023, 06:42:55 PM by beneix
I need help understanding an issue I seem to have with packet loss on my OPNSense router or my ISP fibre gateway.

I have a home network that connects to an OPNSense APU router, which in turn is connected to my ISP's fibre router (see diagram). The ISP router and the APU2E4 both have Gigabit NICs (in the case of the OPNSense APU, Intel I210-AT) and are connected by a brand new CAT6 Ethernet cable.



I was noticing that OPNSense was telling me about packet loss on the gateway, sometimes quite high (30%).



I decided to check the Quality/gateway chart and was shocked to see loss peaks of 50%. Comparing these in time to the traffic on the router, I can tell that the loss percentage goes up when there is little traffic and down when traffic is high, which I can see makes sense.



I am obviously not happy with having loss of this magnitude in the first place. I therefore installed smokeping on a Raspberry Pi and hooked it up to my network. First, I had it connected directly to my ISP's fibre router; then it was not showing any loss at all on Google, BBC, the ISP's DNS server, etc. Only one site showed some loss. Then I moved it and connected it behind the OPNSense router. There, I set smokeping to track some internet servers, the second and first hop of my ISP, the fibre router and the OPNSense router. Below are the results.









There is no indication of a problem behind the OPNSense router, but some internet servers are showing high loss numbers. The gateway (my ISP's fibre router) shows a very minor max loss. Strangely, there is a significant difference in max loss between the first and second hop (as determined by tracert) of my ISP - 62% and 8%, respectively.
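To put the "max loss" figures in context, here is a minimal Python sketch that turns per-round ping counts into loss percentages and reports the maximum next to the mean. The function names and the sample numbers are made up for illustration; they are not taken from the Smokeping graphs in this thread.

```python
from statistics import mean


def loss_percent(sent: int, received: int) -> float:
    """Share of probes in one measurement round that got no reply."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    return 100.0 * (sent - received) / sent


def summarize(rounds):
    """rounds: list of (sent, received) tuples, one per measurement round."""
    losses = [loss_percent(s, r) for s, r in rounds]
    return {
        "max": max(losses),
        "mean": mean(losses),
        "rounds_with_loss": sum(1 for x in losses if x > 0),
    }


# Hypothetical data: four rounds of 20 pings, one bad round.
print(summarize([(20, 20), (20, 20), (20, 8), (20, 20)]))
```

A single bad round is enough to produce a high maximum while the mean stays much lower, so it may help to compare both numbers per target before drawing conclusions.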

At this point, my evidence is ambiguous - there seems to be something with my OPNSense router causing the packet loss, but how can I track this down? I am running AdGuard Home on the router, but it seems odd that this would generate packet loss as measured against the gateway by OPNSense. Also, I intentionally included both a web address version and an IP version of some servers for smokeping to test; for one server, the IP version got slightly less loss, for the other, slightly more. This suggests that AdGuard has no consistent detrimental impact on packet loss. I also run traffic shaping, set up very simply with an upload queue and a download queue, each set to the normal bandwidth delivered by my ISP (as measured by nightly speedtests from OPNSense). The shaping uses FlowQueue-CoDel with ECN.

Any suggestions on how to further diagnose this would be very welcome.
OPNsense 24.7.7-amd64 on APU2E4 using ZFS

Update: I deactivated the traffic shaping in the OPNSense firewall and made sure that the Smokeping RPi was using external DNS servers (my ISP's and Quad), thereby eliminating AdGuard from the equation. The result (still with Smokeping hooked up behind OPNSense) was that, over 30 hours of measurements, some internet servers showed lower maximum loss and others showed higher maximum loss. This tells me the problem is neither with the traffic shaping nor with AdGuard Home.

Any ideas what could be causing the packet loss? As mentioned, when the Smokeping RPi is connected directly to my ISP's fibre router, bypassing OPNSense, there is zero loss on all but one external server.
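For a side-by-side comparison of the two placements (directly on the fibre router versus behind OPNSense), a small Python sketch like the following could pull the loss figure out of saved `ping` output. It assumes the summary line looks like the usual Linux or FreeBSD `ping` summary; the function name and the sample lines are illustrative only.

```python
import re

# Matches e.g. "20 packets transmitted, 19 received, 5% packet loss, time 19027ms"
# and      "20 packets transmitted, 20 packets received, 0.0% packet loss"
_SUMMARY = re.compile(
    r"(\d+) packets transmitted, (\d+) (?:packets )?received.*?([\d.]+)% packet loss"
)


def parse_ping_summary(text: str):
    """Return (sent, received, loss_percent), or None if no summary line is found."""
    m = _SUMMARY.search(text)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2)), float(m.group(3))


# Illustrative input, not real measurements from this thread.
direct = "20 packets transmitted, 20 received, 0% packet loss, time 19027ms"
behind = "20 packets transmitted, 19 received, 5% packet loss, time 19031ms"
print(parse_ping_summary(direct), parse_ping_summary(behind))
```

Running the same target list from both positions at roughly the same time of day should make the comparison less sensitive to load on the remote servers.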

Maybe you should just quit trying to DoS things with ICMP. Additionally, setting the GW monitoring payload to something other than the default 0 might help to get answers in a more normal way.

Are you experiencing some real issues (as opposed to smoking things with ping)?

Quote from: doktornotor on December 31, 2023, 07:25:57 PM
Maybe you should just quit trying to DoS things with ICMP.
I am afraid I am not knowledgeable enough to understand what you mean. Can you please help me understand? Are you suggesting that the ICMP pings that the Smokeping server sends each minute would be too much for the OPNSense router?

Quote
Additionally, setting the GW monitoring payload to something other than the default 0 might help to get answers in a more normal way.
How do I set this?

Quote
Are you experiencing some real issues (as opposed to smoking things with ping)?
I am seeing OPNSense show a 15-50% packet loss indication on the Dashboard (this was before I had the Smokeping server installed) and I was wondering if this could explain issues I occasionally have with videoconferencing.

Quote
Are you suggesting that the pings via ICMP that the Smokeping server sends each minute would be too much for the OPNSense router?

Nah. I mean that the servers you are trying to monitor with smokeping have other, much higher priorities than responding to perpetual pings. They are not ping servers.

Quote
I am seeing OPNSense show a 15-50% packet loss indication on the Dashboard

I mean real packet loss issues. Not the GW monitoring.

System - Gateways - Single - Edit - Advanced - Data Length. Set this to something else than 0. If 1 does not help, try 32 or 56.
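As a rough illustration of what the Data Length values change, this Python sketch computes the size of an ICMP echo probe for each setting. It assumes that Data Length is the ICMP payload in bytes, that the probe is IPv4 without IP options, and that the link is standard Ethernet; the function name is hypothetical.

```python
ICMP_ECHO_HEADER = 8    # bytes: type, code, checksum, identifier, sequence
IPV4_HEADER = 20        # bytes, assuming no IP options
ETH_MIN_PAYLOAD = 46    # shorter payloads are padded on Ethernet


def probe_sizes(data_length: int):
    """Return (ip_packet_bytes, ethernet_payload_bytes) for one echo request."""
    ip_packet = IPV4_HEADER + ICMP_ECHO_HEADER + data_length
    eth_payload = max(ip_packet, ETH_MIN_PAYLOAD)
    return ip_packet, eth_payload


for n in (0, 1, 32, 56):
    print(n, probe_sizes(n))
```

Under these assumptions, 0 and 1 both end up padded to the same Ethernet payload size, while 56 gives the 84-byte IP packet that a default `ping` typically sends.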

Quote from: doktornotor on December 31, 2023, 07:55:22 PM
System - Gateways - Single - Edit - Advanced - Data Length. Set this to something else than 0. If 1 does not help, try 32 or 56.
Thanks. I tested changing the data length to 1, then to 32, then to 56. This is the gateway quality graph:


The data length change does not seem to have made any difference. Any other ideas?

If you are experiencing packet loss between OPNsense and the router directly connected to WAN, check for a duplex mismatch of the interfaces involved and check the cabling.
Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)

Quote from: Patrick M. Hausen on January 01, 2024, 01:00:22 PM
If you are experiencing packet loss between OPNsense and the router directly connected to WAN, check for a duplex mismatch of the interfaces involved and check the cabling.
Thanks. On the fibre router, it is showing the interface as 1000 Mbps and full duplex, on the OPNSense the interface is shown as 1000baseT <full-duplex>. Is that what you were referring to?
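The duplex check can also be scripted. This is a minimal Python sketch that reads the negotiated media type and options from the `media:` line of FreeBSD-style `ifconfig` output, assuming the line has the form `media: Ethernet autoselect (1000baseT <full-duplex>)`. The function names are hypothetical and the sample string is written by hand, not copied from the router.

```python
import re

_MEDIA = re.compile(r"media:\s.*?\(([^\s<)]+)(?:\s*<([^>]*)>)?\)")


def media_status(ifconfig_output: str):
    """Return (media_type, set_of_options), or None if no active media is shown."""
    m = _MEDIA.search(ifconfig_output)
    if m is None:
        return None
    options = set(m.group(2).split(",")) if m.group(2) else set()
    return m.group(1), options


def is_full_duplex(ifconfig_output: str) -> bool:
    status = media_status(ifconfig_output)
    return status is not None and "full-duplex" in status[1]


sample = "\tmedia: Ethernet autoselect (1000baseT <full-duplex>)"
print(media_status(sample), is_full_duplex(sample))
```

Both ends need to report the same speed and duplex. A mismatch would show up as full duplex on one side and half duplex on the other.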

The cable between the OPNSense and the fibre router is a 25cm CAT6 cable. I have tested changing it for another cable without any effect, and I have also tested the same cable to connect the Smokeping RPi to the fibre router and the result was no packet loss with the same cable that is now connecting the OPNSense to my fibre router.