OPNsense Forum

Archive => 20.7 Legacy Series => Topic started by: devilkin on November 30, 2020, 03:14:50 PM

Title: arp timeouts on VLAN cause connection interruptions
Post by: devilkin on November 30, 2020, 03:14:50 PM
Hi,

I've been fighting with an intermittent problem: at intervals, my laptop (wifi) is no longer able to access anything that requires my OPNsense box for routing. Traffic inside the same subnet keeps working, and traffic from a server on the same vlan out over the OPNsense box also keeps working.

At the same time, an mp3 stream which is playing on my chromecast on another VLAN keeps playing without issues.

Tests have concluded that:

So, it seems this has to be *something* on the opnsense box. Since I can connect via backhaul (ssh to a server in the same vlan, and from there I can connect wired to the opnsense box), I decided to check some things, and I found out that when my laptop's ARP entry expires, my connectivity drops.


thor.home.lan (192.168.xx.xx) at (incomplete) on igb1_vlan134 expired [vlan]


It then takes between 10 seconds and several minutes before it starts working again... and the ARP entry is once again filled. The countdown timer is 1200 seconds, and the problem is reproducible.
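
One way to confirm the correlation is to poll the ARP table on the OPNsense box and note when an entry goes incomplete. This is a minimal sketch, assuming the `arp -an` output format shown above; the function name `incomplete_arp` is made up for illustration.

```shell
# Print the IP address of every ARP entry that is currently "(incomplete)".
# Reads `arp -an` style output on stdin.
incomplete_arp() {
    awk '$3 == "at" && $4 == "(incomplete)" {
        ip = $2
        gsub(/[()]/, "", ip)   # strip the parentheses around the address
        print ip
    }'
}

# Usage on the firewall (not run here): check every 5 seconds.
#   while sleep 5; do arp -an | incomplete_arp; done
```

On FreeBSD-based systems the 1200 second lifetime should correspond to the `net.link.ether.inet.max_age` sysctl, which can be read with `sysctl net.link.ether.inet.max_age`.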

The host itself has a static dhcp entry in dhcpv4, but I don't think that matters much.

Anybody any idea how I could solve this?
Title: Re: arp timeouts on VLAN cause connection interruptions
Post by: devilkin on November 30, 2020, 09:21:06 PM
I can reproduce this on multiple wireless clients. Really odd.

I've been playing with the hardware TSO/LRO/CRC and VLAN support, switching that off, but nothing changed.
Title: Re: arp timeouts on VLAN cause connection interruptions
Post by: schnipp on December 01, 2020, 05:44:25 PM
Please describe the full communication path and the components (including os version) involved. Normally, the timeout of an arp table entry is reset when it is used.
Title: Re: arp timeouts on VLAN cause connection interruptions
Post by: devilkin on December 01, 2020, 06:44:48 PM
In this case:

* Linux laptop(s) running debian unstable/testing (kernel 5.9.0)
* switches are unifi switch 8's, firmware 4.3.21.11325
* access points are unifi ap-ac-pro's, firmware 4.3.21.11325
* OPNsense is 20.7.5

Flow is linux laptop <-> AP <-> switch <-> opnsense <-> internet (or other vlan).
Title: Re: arp timeouts on VLAN cause connection interruptions
Post by: schnipp on December 01, 2020, 07:17:55 PM
So, in this scenario your laptop must have an arp entry for the opnsense box to communicate across the subnet. The entry should never expire as long as you continuously exchange ip packets with the opnsense.

Title: Re: arp timeouts on VLAN cause connection interruptions
Post by: devilkin on December 02, 2020, 05:14:49 PM
Static ARP: problem solved. Not really an acceptable 'fix' though... Strange thing is that I used to have a Unifi USG before, and that was replaced with the OPNsense box.

I've also been playing with some Unifi configuration, and now the problem has 'disappeared'... so I'm going to have to wait and see what happens.

I came across
https://community.ui.com/questions/LAN-to-WLAN-ARP-Issue-with-UAP-AC-LR/9b1b3060-2950-4bb8-b4ec-eaf4442d75bb?page=1
https://forum.netgate.com/topic/157090/periodic-drops

which seem like the thing I was seeing - and I have the same hardware in play :/
Title: Re: arp timeouts on VLAN cause connection interruptions
Post by: schnipp on December 02, 2020, 07:56:54 PM
If there are no malformed arp responses from the opnsense, it seems to be a firmware issue of the AP.
Title: Re: arp timeouts on VLAN cause connection interruptions
Post by: devilkin on December 06, 2020, 10:29:30 AM
Is there an easy way to pick out malformed arp reqs/replies?
Title: Re: arp timeouts on VLAN cause connection interruptions
Post by: schnipp on December 06, 2020, 09:09:59 PM
I don't think so. Have you observed such malformed packets (e.g. in a packet capture)?
Title: Re: arp timeouts on VLAN cause connection interruptions
Post by: devilkin on December 11, 2020, 02:02:41 PM
Not that I can see. I just don't see the traffic *at all* coming into opnsense when I have timeouts.
Title: Re: arp timeouts on VLAN cause connection interruptions
Post by: schnipp on December 12, 2020, 12:41:08 PM
Did you check arp communication with packet dump and wireshark?
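
A capture on the OPNsense side can be taken with tcpdump and then either opened in Wireshark or scanned as text. The sketch below assumes the text format that `tcpdump -n -e` prints for ARP replies (source MAC in the second field, `is-at <mac>` in the payload). The function name `arp_reply_mismatch` is made up for illustration, and the interface name is taken from the first post. It flags replies whose Ethernet source MAC differs from the MAC claimed in the ARP payload, which is only one crude sign of a mangled reply and not a full validity check.

```shell
# Capture first (on the firewall, not run here):
#   tcpdump -n -i igb1_vlan134 -w /tmp/arp.pcap arp
# Then scan the capture as text:
#   tcpdump -n -e -r /tmp/arp.pcap arp | arp_reply_mismatch

# Print ARP replies whose Ethernet source MAC differs from the "is-at" MAC.
arp_reply_mismatch() {
    awk '{
        for (i = 1; i < NF; i++) {
            if ($i == "is-at") {
                src = tolower($2)            # Ethernet source address
                claimed = tolower($(i + 1))  # MAC announced in the ARP reply
                sub(/,$/, "", claimed)       # drop the trailing comma
                if (src != claimed) print
            }
        }
    }'
}
```

If the requests from opnsense show up in the capture but no replies ever arrive, that points at something between the firewall and the client dropping them rather than at malformed packets.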
Title: Re: arp timeouts on VLAN cause connection interruptions
Post by: allebone on March 02, 2021, 07:12:51 PM
I think I am having the same issue. I will try adding a static arp entry on my ubuntu machine connected via wifi and report back.
Mo
Title: Re: arp timeouts on VLAN cause connection interruptions
Post by: devilkin on March 02, 2021, 07:37:26 PM
In the end it was the access point causing corruption.

Since I'm running the latest beta 5.53 on my APs, the problems seem to have vanished...

Title: Re: arp timeouts on VLAN cause connection interruptions
Post by: allebone on March 02, 2021, 07:59:28 PM
My AP is not unifi but will check if there are any fw updates.
Title: Re: arp timeouts on VLAN cause connection interruptions
Post by: allebone on March 02, 2021, 08:46:46 PM
I had this happen again, and I had already set a static arp entry on my client side. This did not fix it. However I logged into opnsense via another machine and observed the following:

? (192.168.2.13) at (incomplete) on em1 expired [ethernet]
? (192.168.2.66) at 28:a0:2b:3c:f3:8c on em1 expires in 1166 seconds [ethernet]
? (192.168.2.2) at 00:e0:67:21:e6:07 on em1 permanent [ethernet]
? (192.168.2.192) at c0:f8:da:21:e2:ac on em1 expires in 1189 seconds [ethernet]
? (192.168.2.160) at 74:ac:b9:e0:05:7a on em1 expires in 1150 seconds [ethernet]
? (192.168.2.6) at 52:54:00:36:76:f0 on em1 expires in 1193 seconds [ethernet]
? (192.168.2.4) at 00:d8:61:03:45:cd on em1 expires in 870 seconds [ethernet]
? (192.168.2.58) at 74:81:14:b5:32:86 on em1 expires in 1171 seconds [ethernet]
? (192.168.2.56) at 04:69:f8:31:eb:e3 on em1 expires in 1178 seconds [ethernet]
? (192.168.2.185) at ec:8e:b5:04:dd:8e on em1 expires in 649 seconds [ethernet]
? (192.168.2.63) at 52:54:00:e8:36:5d on em1 expires in 1180 seconds [ethernet]
? (192.168.2.50) at c0:9a:d0:c7:5c:22 on em1 expires in 1118 seconds [ethernet]
? (192.168.2.82) at 58:d3:49:2c:f0:cf on em1 expires in 1150 seconds [ethernet]
? (192.168.2.83) at 58:d3:49:02:31:33 on em1 expires in 1180 seconds [ethernet]
? (192.168.2.16) at 52:54:00:a8:0d:05 on em1 expires in 1028 seconds [ethernet]
? (192.168.2.80) at 58:d3:49:23:0e:01 on em1 expires in 1180 seconds [ethernet]
? (192.168.2.81) at 58:d3:49:22:25:74 on em1 expires in 1151 seconds [ethernet]
? (192.168.2.54) at 64:0b:d7:ee:0e:51 on em1 expires in 1173 seconds [ethernet]
? (192.168.2.22) at 52:54:00:17:48:d0 on em1 expires in 1172 seconds [ethernet]
? (192.168.2.183) at 9c:8e:cd:26:be:56 on em1 expires in 1186 seconds [ethernet]
? (192.168.2.55) at 64:0b:d7:eb:fc:e6 on em1 expires in 1197 seconds [ethernet]
? (192.168.2.21) at 9c:b6:54:be:3e:60 on em1 expires in 1199 seconds [ethernet]
? (192.168.2.181) at 74:ee:2a:5f:c0:60 on em1 expires in 1118 seconds [ethernet]
root@OPNsense:~ # arp -s 192.168.2.13 98:83:89:8A:4F:83
root@OPNsense:~ # arp -a -n
? (192.168.2.13) at 98:83:89:8a:4f:83 on em1 permanent [ethernet]
? (192.168.2.66) at 28:a0:2b:3c:f3:8c on em1 expires in 1176 seconds [ethernet]
? (192.168.2.2) at 00:e0:67:21:e6:07 on em1 permanent [ethernet]
? (192.168.2.192) at c0:f8:da:21:e2:ac on em1 expires in 1150 seconds [ethernet]
? (192.168.2.160) at 74:ac:b9:e0:05:7a on em1 expires in 1173 seconds [ethernet]
? (192.168.2.6) at 52:54:00:36:76:f0 on em1 expires in 1154 seconds [ethernet]
? (192.168.2.4) at 00:d8:61:03:45:cd on em1 expires in 831 seconds [ethernet]
? (192.168.2.58) at 74:81:14:b5:32:86 on em1 expires in 1200 seconds [ethernet]
? (192.168.2.56) at 04:69:f8:31:eb:e3 on em1 expires in 1199 seconds [ethernet]
? (192.168.2.185) at ec:8e:b5:04:dd:8e on em1 expires in 610 seconds [ethernet]
? (192.168.2.63) at 52:54:00:e8:36:5d on em1 expires in 1141 seconds [ethernet]
? (192.168.2.50) at c0:9a:d0:c7:5c:22 on em1 expires in 1169 seconds [ethernet]
? (192.168.2.82) at 58:d3:49:2c:f0:cf on em1 expires in 1111 seconds [ethernet]
? (192.168.2.83) at 58:d3:49:02:31:33 on em1 expires in 1141 seconds [ethernet]
? (192.168.2.16) at 52:54:00:a8:0d:05 on em1 expires in 989 seconds [ethernet]
? (192.168.2.80) at 58:d3:49:23:0e:01 on em1 expires in 1141 seconds [ethernet]
? (192.168.2.81) at 58:d3:49:22:25:74 on em1 expires in 1112 seconds [ethernet]
? (192.168.2.54) at 64:0b:d7:ee:0e:51 on em1 expires in 1164 seconds [ethernet]
? (192.168.2.22) at 52:54:00:17:48:d0 on em1 expires in 1184 seconds [ethernet]
? (192.168.2.183) at 9c:8e:cd:26:be:56 on em1 expires in 1147 seconds [ethernet]
? (192.168.2.55) at 64:0b:d7:eb:fc:e6 on em1 expires in 1158 seconds [ethernet]
? (192.168.2.21) at 9c:b6:54:be:3e:60 on em1 expires in 1199 seconds [ethernet]
? (192.168.2.181) at 74:ee:2a:5f:c0:60 on em1 expires in 1167 seconds [ethernet]
root@OPNsense:~ #


The moment this line was entered: root@OPNsense:~ # arp -s 192.168.2.13 98:83:89:8A:4F:83 everything immediately fixed itself on the client side, so it seems that opnsense is somehow losing/not getting the arp entry from the client.

I am unsure how to troubleshoot this. Adding a static entry on opnsense has fixed the problem and it has not reappeared so far. I don't think it will, since running this command fixed the problem so immediately and obviously.

What should I do? Just always add a static arp entry for affected clients or is there a better way?

Pete
Title: Re: arp timeouts on VLAN cause connection interruptions
Post by: allebone on March 02, 2021, 09:38:09 PM
One thing I forgot to mention: on the client where I notice the issue (it can't ping opnsense because opnsense loses the arp entry), an RDP connection I have open to another PC on the network does not drop. Both the opnsense box and the RDS server I connect to are plugged into the same switch. So the RDS server is able to keep its arp entry, while opnsense cannot, even though it is plugged in exactly the same way.
Title: Re: arp timeouts on VLAN cause connection interruptions
Post by: allebone on March 02, 2021, 10:59:59 PM
Since I don't know how to fix this correctly, I have added the troublesome clients to /etc/rc.conf so there is a static arp entry on the opnsense box after a reboot. This fixes the problem. I am unsure how to resolve the root cause though. If opnsense never forgets the arp entry, there is no dropped traffic, so this is how I will leave it until someone has a better idea.
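
The post does not show the exact lines that were added. For reference, FreeBSD's rc.conf has a documented mechanism for boot-time static ARP entries (`static_arp_pairs`, see rc.conf(5)). A sketch of such a fragment follows, using the address and MAC from the `arp -s` command earlier in the thread. The label `client13` is made up, and whether OPNsense keeps manual edits to /etc/rc.conf across upgrades is an assumption worth verifying.

```shell
# /etc/rc.conf fragment (FreeBSD rc.conf(5) syntax).
# Each name listed in static_arp_pairs needs a matching
# static_arp_<name> variable holding "IP MAC".
static_arp_pairs="client13"
static_arp_client13="192.168.2.13 98:83:89:8a:4f:83"
```

More clients can be added by listing further labels, separated by spaces, in `static_arp_pairs` and giving each its own `static_arp_<name>` line.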
Title: Re: arp timeouts on VLAN cause connection interruptions
Post by: devilkin on April 13, 2021, 08:27:49 AM
If you're using Ubiquiti gear, check out this post:

https://www.reddit.com/r/Ubiquiti/comments/mlit54/problems_with_broadcastmulticast_traffic_on_uap/
Title: Re: arp timeouts on VLAN cause connection interruptions
Post by: allebone on April 13, 2021, 10:33:33 PM
Thanks, although it did not actually apply in my case. Good find though :)