OPNsense Forum

Archive => 16.1 Legacy Series => Topic started by: jerrac on May 13, 2016, 04:17:11 pm

Title: How can I figure out what is causing packet loss and dns issues?
Post by: jerrac on May 13, 2016, 04:17:11 pm
I haven't spent much time working with the network layer. So, I'm not really sure where to start figuring issues out... Any pointers would be appreciated.

The issue is that my OPNsense firewall is causing packet loss. It shows up as me having to repeatedly refresh pages before the browser gets a correct DNS response. When I restart the system, the issues go away for a while.

I have tested this by removing everything but Ethernet cables between my computer and the firewall, as well as by connecting directly to the cable modem. Packet loss only occurs when the firewall is involved.

It also appears to only occur after the firewall has been on for a while. This last time was 6 days or so. Times before that varied, but I wasn't keeping track of what exactly happened.

I know it's packet loss because the dashboard on my firewall shows anywhere from 15-30+% loss when the issues occur. I also see packet loss when pinging 8.8.8.8.

I've googled some, but I didn't find anything that helped. Maybe because I'm not sure what the best search terms are...

My hardware should not be an issue. It's a 5 or so year old repurposed gateway tower with an i3 CPU, 16 GB of RAM, and an SSD. I'm using two Intel Gigabit CT PCI-E Network Adapter EXPI9301CTBLK NICs for my LAN and WAN ports.

Any suggestions on what I can try?

Thanks!
Title: Re: How can I figure out what is causing packet loss and dns issues?
Post by: fabian on May 13, 2016, 04:54:08 pm
Try to disable IPS as some drivers cause packet loss when the IPS is used.
Title: Re: How can I figure out what is causing packet loss and dns issues?
Post by: jerrac on May 13, 2016, 05:40:28 pm
So, this? https://docs.opnsense.org/manual/ips.html ?

The Services -> Intrusion Detection -> Enabled checkbox is not checked, nor have I ever checked it.

I would think that an issue caused by drivers would be constant, not something that happens after some time. Am I wrong?

Title: Re: How can I figure out what is causing packet loss and dns issues?
Post by: jerrac on May 18, 2016, 11:51:55 pm
Well, after 5 days of uptime, my packet loss issue reappeared.

I still have no idea how to troubleshoot this...
Title: Re: How can I figure out what is causing packet loss and dns issues?
Post by: franco on May 19, 2016, 08:31:05 am
Hi jerrac,

I'll make the answer a bit more broad than what you already narrowed down, just because this question comes up from time to time and it's difficult to trace. So here it goes.

I'd first check the counters for the interface stats... Interfaces: Overview has stats about in/out errors and collisions. It's important to find the actual interface or wire where the drop occurs (if any). It could be a bad NIC, cable, or switch. Physical errors are the worst in terms of traceability, but can often be solved by swapping a single cable.
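
As a rough console counterpart to the Interfaces: Overview counters, a sketch like the following could flag interfaces with non-zero error or collision counts. It assumes the FreeBSD `netstat -i` layout, where the last six columns are Ipkts, Ierrs, Idrop, Opkts, Oerrs and Coll; `show_if_errors` is a hypothetical helper name, not an OPNsense tool.

```shell
# Hypothetical helper: read `netstat -i` style output on stdin and print
# each interface whose input errors, output errors, or collisions are non-zero.
# Assumption: the last six columns are Ipkts Ierrs Idrop Opkts Oerrs Coll.
show_if_errors() {
    awk 'NR > 1 && NF >= 6 {
        ierrs = $(NF-4); oerrs = $(NF-1); coll = $NF
        # Rows that show "-" in these columns evaluate to 0 and are skipped.
        if (ierrs + oerrs + coll > 0 && !seen[$1]++)
            printf "%s ierrs=%s oerrs=%s coll=%s\n", $1, ierrs, oerrs, coll
    }'
}

# Usage on the firewall console:
#   netstat -i | show_if_errors
```

Running it once while the box is healthy and again while the loss is happening makes it easier to see whether the counters grow on one particular interface.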

If the drops occur on the WAN side, it may be out of the immediate reach of OPNsense. You can try to adjust the optimisation setting in System: Settings: Firewall to "conservative" and see if that makes a difference. More often than not, the modem or the router/modem combo can be suboptimal. Many stories about people replacing modems for that reason...

You could also measure packet loss with tools such as iperf. You can install from the console:

# pkg install iperf3

Measure from the box to the destination (WAN or LAN). If there is no apparent loss, measure indirectly from WAN to LAN just outside the OPNsense box. Afterwards, maybe widen the search to the ISP/Internet to see if something is wrong there.

It would also be good to know whether you're exhausting bandwidth or system resources; the system info widget on the dashboard gives some insight into this. If the box is not capable of coping with the network's throughput, these drops can occur too.

ARP flooding issues can also be hard to trace as they don't show up on IP-based statistics but can be seen as high-interrupt CPU-intense periods when the actual traffic pushed through the box is negligible.
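
One way to look for an ARP flood from the console is to capture ARP traffic and count requests per requesting host. This is only a sketch: `count_arp_requesters` is a hypothetical helper, `em0` is an assumed LAN interface name, and the parsing assumes the usual `tcpdump -n` text format for ARP requests ("... who-has X tell Y, length N").

```shell
# Hypothetical helper: count ARP requests per requesting host from
# `tcpdump -n arp` text output on stdin, busiest requester first.
count_arp_requesters() {
    awk '/ARP, Request/ {
        for (i = 1; i < NF; i++)
            if ($i == "tell") { host = $(i+1); sub(/,$/, "", host); n[host]++ }
    }
    END { for (h in n) print n[h], h }' | sort -rn
}

# Usage (em0 is an assumed interface name; stops after 1000 ARP packets):
#   tcpdump -n -c 1000 -i em0 arp | count_arp_requesters
# Per-device interrupt rates can be checked alongside with:
#   vmstat -i
```

A single host responsible for most of the requests, together with a high interrupt rate on the NIC, would fit the pattern described above.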

This could also be an MTU issue, or a PPP setting that needs to be tweaked. In that case it would help to ask other cable modem users for advice.
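
A quick way to probe for an MTU problem is to ping with the Don't Fragment bit set and a payload sized to fill the MTU exactly. The sketch below assumes plain IPv4 without header options (20 bytes of IP header plus 8 bytes of ICMP header); `icmp_payload_for_mtu` is a hypothetical helper name.

```shell
# Largest ICMP echo payload that fits in one unfragmented IPv4 packet:
# MTU minus 20 bytes of IPv4 header (no options) minus 8 bytes of ICMP header.
icmp_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# Usage on FreeBSD/OPNsense, where ping -D sets the Don't Fragment bit:
#   ping -D -c 3 -s "$(icmp_payload_for_mtu 1500)" 8.8.8.8
# If the full-size probe fails but a smaller payload gets through,
# the usable MTU somewhere on the path is below 1500.
```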

Hope this helps.


Cheers,
Franco
Title: Re: How can I figure out what is causing packet loss and dns issues?
Post by: jerrac on June 11, 2016, 04:33:39 am
Thanks for the broad reply. I'm happy to learn as much as I can about this. :)

After a nice 23 or so days, the issue has reared its ugly head again. Fortunately, it's a Friday evening, and I can troubleshoot. :)

I ran some iperf3 tests. I set up the server on my Ubuntu laptop. The results: http://pastebin.com/ksaXeHdE

Unfortunately, I could not find any explanation of how to interpret the iperf3 output. Google was not my friend this evening. Could you take a look at my pastebin link and tell me if you spot anything?

In case it's useful: I have 2 TP-LINK TL-SG108 unmanaged switches. Those are what my pastebin is referring to. So, LAN goes to the right switch, a cable from that one goes to the left switch, and another cable goes to my NetGear wireless router with DD-WRT on it. All my devices are plugged into one of the switches.

For testing, I used a USB 3 Ethernet adapter connected to my laptop. Then I just moved the other end of the cable between the switches and the LAN NIC.

I did not test the WAN with iperf3. I'm not entirely sure how I'd go about it... Sorry if it's obvious...  :-\ I thought about sticking it on my VPS, but there's a lot more than just my modem between me and it...

On the Dashboard, CPU and RAM usage seems pretty low to me. I did see the CPU spike to 19% once, but it's mostly sitting at less than 5%. At least while I've been watching... Reporting -> doesn't have a CPU load graph, does it? Or is it just named differently? 'Cause I don't see one.

If any of my fellow cable modem users have any insight, I'm using an ARRIS SURFboard SB6141 DOCSIS 3.0 Cable Modem purchased new (not refurbished) from Amazon.

One thought, wouldn't the fact that this is "fixed" by restarting my firewall box mean that the modem should be fine? Power cycling the modem has no effect.

Anyway, now I'll upgrade OPNsense and reboot. I'll be back when it pops up again. (Or if there's something I can reply to without the issue actually occurring...)

Edit:
Just a quick note that the issue appeared again on 7/13/2016. Sticking it here since this is where I've been keeping my notes... :\ Don't have time to troubleshoot right now, so a quick reboot is all I'll do.

Hmm... Always happens when I can't take time to troubleshoot... *sigh* 8/3/2016
Title: Re: How can I figure out what is causing packet loss and dns issues?
Post by: franco on June 11, 2016, 09:19:41 am
Hi Jerrac,

Thanks for checking back!

It seems TCP tests in iperf won't measure packet loss; you need to run UDP tests instead:

https://iperf.fr/iperf-doc.php
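
As a sketch of what such a UDP run could look like: the server address, bandwidth and duration below are placeholders, and `udp_loss_summary` is a hypothetical helper that assumes the iperf3 UDP summary line ends in a "lost/total (percent%)" field.

```shell
# Hypothetical helper: from iperf3 UDP client output on stdin, print
# "lost total percent" taken from the last line with a Lost/Total field.
udp_loss_summary() {
    sed -n 's|.* \([0-9][0-9]*\)/\([0-9][0-9]*\) (\([0-9.e+-]*\)%).*|\1 \2 \3|p' | tail -n 1
}

# Usage (SERVER_IP is a placeholder for the machine running the server):
#   on the laptop:    iperf3 -s
#   on the firewall:  iperf3 -c SERVER_IP -u -b 100M -t 30 | udp_loss_summary
# -u selects UDP, -b sets the target bandwidth, -t the duration in seconds.
```

Repeating the run at a few different `-b` values helps separate loss that only appears near line rate from loss that is present even at low rates.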

FWIW, your TCP tests indicate that there are longer periods where the LAN port won't respond to such high traffic and won't recover for a while, which is very odd. At this point I'd question the hardware itself more than anything. Can you check "dmesg" output after such a test to see if a card driver drops out? Are there hardware errors on the LAN interface after such tests where there are longer periods with "0" bytes sent/received?
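
To narrow the "dmesg" output down to one card, a small filter along these lines could help. `nic_events` is a hypothetical helper, `em0` is an assumed interface name, and the keywords are only a guess at typical driver messages (link state changes, watchdog timeouts), so adjust them to whatever your driver actually logs.

```shell
# Hypothetical helper: keep only dmesg lines for one interface that mention
# link state changes, watchdog events, or timeouts (keywords are a guess).
nic_events() {
    grep -i -E "^${1}: .*(watchdog|link state|timeout)"
}

# Usage right after an iperf run (em0 is an assumed interface name):
#   dmesg | nic_events em0
```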

Under Reporting: Health: System you find "Processor", which has interrupt count, load values and number of processes (the latter probably doesn't matter).

I'm not sure what to make of a restart fixing it. If there is an issue that takes 19 days to come back, it further points to the WAN side, which could also be caused by your provider's network. Maintenance there can come unexpectedly. Maybe these are two separate issues.

If the problem disappears as quickly as it came and then doesn't happen for a very long time, we're not dealing with an OPNsense-specific software problem, only OPNsense reacting to something in your path. That means we still need to find the root cause in order to reliably reproduce this and then see if/how it can be fixed in OPNsense.
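
Since the problem only shows up after days or weeks, a timestamped loss log may help pin down when it starts. The following is a sketch only: `ping_loss_percent` is a hypothetical helper, the log path is a placeholder, and the parsing assumes the usual "N% packet loss" summary line printed by ping.

```shell
# Hypothetical helper: extract the loss percentage from ping's summary line.
ping_loss_percent() {
    sed -n 's/.* \([0-9.][0-9.]*\)% packet loss.*/\1/p'
}

# Usage: log one sample per minute (the log path is a placeholder):
#   while :; do
#       printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" \
#           "$(ping -c 10 8.8.8.8 | ping_loss_percent)"
#       sleep 60
#   done >> /var/log/ping_loss.log
```

Comparing the time the loss begins against the interface counters and the Reporting graphs gives a better chance of correlating it with something concrete.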


Cheers,
Franco