[CALL FOR TESTING] Netmap generic mode queue stall fixes

Started by franco, January 27, 2023, 11:38:45 AM

To be frank, my post includes a dmesg/grep combo to diagnose this up front, and it would be nice to have that output.

It would also be nice to have the error messages you are seeing. They do point to something, but to something different, as the queue stalls were silent.

Lastly, a queue stall requires killing Suricata or Zenarmor or reboot to get connectivity back. Since I'm not seeing this clear wording here I'm still sceptical.


Cheers,
Franco

I did post this before, but I wasn't posting all the errors.

I will have to wait until the weekend to do more testing/logging as it's very disruptive to others in the house. Will post back more soon. There were other log entries. I clear out dmesg frequently to help track the changes. I also have the Intel I225-V adapters that may be more unstable. Eventually connectivity does come back without my restarting anything, but it's what I would call very similar to a flapping interface type issue.

Yes, I still see queue stalls with this kernel.
I have even gone to some effort to pass a hardware interface through to OPNsense to move away from vtnet onto an igb driver interface, and I am no longer using netmap generic mode (at least I no longer see it in dmesg). I still see the queue stalls, though: traffic stops flowing and I need to stop eastpect to get it working again.

It's a regular occurrence here, I pretty much see it every 2-3 days. So if you want me to test/debug something, I can probably do it within that timeframe.

Can also switch back to generic mode easily if required for further testing.

I do think the queue stall question only applies when generic netmap mode is running and the issue is reproducible on both Zenarmor and Suricata. Otherwise we have too many unrelated things happening without a way to trace them reliably.

In an upstream Suricata ticket the same pattern seems to emerge that generic netmap mode is prone to stalling although the user base affected seems almost too small to sample properly.


Cheers,
Franco

I can't get netmap to work without issues for more than a day. Native netmap doesn't even work beyond 10 minutes or so, but it does make sense as the netmap documentation doesn't state that it supports the igc driver. My previous box with igb drivers (supported) didn't have the same issues, but it was underpowered.

Emulated mode works up to a day or so, but it doesn't take much to cause the interfaces being protected by Zenarmor (or, separately, Suricata when in IPS mode) to flap. Just changing a setting in the profile such as adding or removing blocking of ad tracking in the web content filter can cause issues.

I eliminated vlans, as I didn't need them (yet), but that didn't make a difference.

I'm going back to running in passive mode.

The problem I have in trying to keep this focused is that I have to point out that down/up hiccups are not part of the scope here. They are either an issue with the driver, or with the switch in front of the device doing not-so-great speed negotiation or packet flooding that makes the driver drop out. That could be the same network issue as before, with the old driver simply being more resilient in these situations.


Cheers,
Franco

I agree, and this is my last follow-up as my issue is different. My issue is almost certainly due to the hardware I have, specifically: Intel I225-V 2.5G interfaces with igc drivers. And it's definitely netmap related as the same issues pop up whether I use Suricata in IPS mode or Zenarmor in L3 mode, with native or emulated netmap driver (using either version of the emulated netmap driver). All other OPNsense plug-ins and policies I'm using work without issue.

I will be getting a different wireless access point with a 2.5Gbps WAN port to connect to this OPNsense box, and will try this again then to see if that makes a difference.

When in netmap emulated mode, I get lots and lots of the below in dmesg but when in native mode, it's very different with complete interface drops happening very frequently.

igc2 is the interface to the wireless access point (1Gbps)
igc0 is the interface to my unmanaged 1Gbps LAN switches


549.798697 [ 295] generic_netmap_unregister Emulated adapter for igc2 deactivated
549.806464 [1039] generic_netmap_dtor       Native netmap adapter for igc2 restored
549.814383 [1047] generic_netmap_dtor       Emulated netmap adapter for igc2 destroyed
563.897088 [1142] generic_netmap_attach     Emulated adapter for igc2 created (prev was igc2)
563.905859 [1039] generic_netmap_dtor       Native netmap adapter for igc2 restored
563.913758 [1047] generic_netmap_dtor       Emulated netmap adapter for igc2 destroyed
563.922093 [1142] generic_netmap_attach     Emulated adapter for igc2 created (prev was igc2)
563.930994 [ 320] generic_netmap_register   Emulated adapter for igc2 activated
672.439061 [ 295] generic_netmap_unregister Emulated adapter for igc0 deactivated
672.446788 [1039] generic_netmap_dtor       Native netmap adapter for igc0 restored
672.454630 [1047] generic_netmap_dtor       Emulated netmap adapter for igc0 destroyed
672.519959 [ 295] generic_netmap_unregister Emulated adapter for igc2 deactivated
672.530783 [1039] generic_netmap_dtor       Native netmap adapter for igc2 restored
672.538635 [1047] generic_netmap_dtor       Emulated netmap adapter for igc2 destroyed
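
For anyone who wants to compare excerpts like the one above without eyeballing them, here is a minimal sketch that tallies the generic netmap messages per interface. It assumes the lines look exactly like the ones pasted in this thread; the regular expression and the function name `count_netmap_events` are only illustrative and not part of any OPNsense or netmap tooling.

```python
import re
from collections import Counter

# Assumed line shape, taken from the excerpts in this thread, e.g.
# "549.798697 [ 295] generic_netmap_unregister Emulated adapter for igc2 deactivated"
LINE_RE = re.compile(
    r"\[\s*\d+\]\s+(generic_netmap_\w+)\s+.*?\bfor\s+(\S+)\s+(\w+)"
)


def count_netmap_events(dmesg_text):
    """Return a Counter keyed by (interface, kernel function, state word)."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        match = LINE_RE.search(line)
        if match:
            func, iface, state = match.groups()
            counts[(iface, func, state)] += 1
    return counts
```

Feeding it the saved text of `dmesg` gives one counter entry per interface and message type, so a high "deactivated"/"activated" count on one interface stands out quickly. It only counts messages; it does not tell a silent queue stall apart from a link flap.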

Eventually I get an error message that the interface (igc2 usually) went down. I don't have any of those errors handy at the moment (as they're older than dmesg.yesterday in my log files).

Thanks for all your work on this! I will give this another try next time there are updates to netmap.

So far so good. I've been running on the 23.1.1 build with no netmap issues. Even when netmap worked for me before, dmesg would show lots of dtor, attach, register, unregister entries. None of that in the last 9 hours or so for me, and no interface flaps.

Looking promising...

Also...the native netmap driver is working great too. It's like a night and day difference. Having no issues thus far. :-)

What's the rollback command for this?

I see that VLANs are all emulated types... didn't know that. :)

Unfortunately I am also still having interface crashes (VLANs) with the new netmap. I thought I was OK after applying the fix, but it popped up again today. So are we saying I should disable Zenarmor?

If Zenarmor is the issue I guess I should ask for a refund since I cannot use the services I am paying for?

February 19, 2023, 10:01:26 PM #41 Last Edit: February 19, 2023, 10:03:28 PM by SpinningRust
Quote from: SpinningRust on February 18, 2023, 03:33:59 PM
Also...the native netmap driver is working great too. It's like a night and day difference. Having no issues thus far. :-)

Unfortunately I was wrong. User error. My custom policy in Zenarmor showed it was enabled and actively running, but it wasn't, so Netmap hasn't been running at all since the upgrade 2 nights ago. Must have been something with the 23.1.1 update that had the setting toggled incorrectly.

I noticed something was off when the reports didn't show anything blocked, or even flagged as a threat, in my custom policy. A blocklist item I created for a test domain also wasn't blocked; the traffic only passed through the default policy. Once I toggled my custom Zenarmor policy off and then on again, it began to work as desired...and all my netmap problems returned. Bummer!

Back to passive mode I go.

Here is what I see in dmesg with native netmap:
99.862767 [ 851] iflib_netmap_config       txr 4 rxr 4 txd 1024 rxd 1024 rbufsz 2048
499.870971 [ 851] iflib_netmap_config       txr 4 rxr 4 txd 1024 rxd 1024 rbufsz 2048
500.600314 [ 851] iflib_netmap_config       txr 4 rxr 4 txd 1024 rxd 1024 rbufsz 2048
igc0: link state changed to DOWN
500.636894 [ 851] iflib_netmap_config       txr 4 rxr 4 txd 1024 rxd 1024 rbufsz 2048
igc2: link state changed to DOWN
igc0: link state changed to UP
igc2: link state changed to UP
igc0: link state changed to DOWN
igc0: link state changed to UP
367.771681 [ 851] iflib_netmap_config       txr 4 rxr 4 txd 1024 rxd 1024 rbufsz 2048
367.779815 [ 851] iflib_netmap_config       txr 4 rxr 4 txd 1024 rxd 1024 rbufsz 2048
igc0: link state changed to DOWN
igc0: link state changed to UP
igc2: link state changed to DOWN
igc2: link state changed to UP
480.959262 [ 851] iflib_netmap_config       txr 4 rxr 4 txd 1024 rxd 1024 rbufsz 2048
480.967393 [ 851] iflib_netmap_config       txr 4 rxr 4 txd 1024 rxd 1024 rbufsz 2048
igc2: link state changed to DOWN
igc2: link state changed to UP
igc2: link state changed to DOWN
igc2: link state changed to UP
523.403350 [ 851] iflib_netmap_config       txr 4 rxr 4 txd 1024 rxd 1024 rbufsz 2048
523.411486 [ 851] iflib_netmap_config       txr 4 rxr 4 txd 1024 rxd 1024 rbufsz 2048
igc2: link state changed to DOWN
igc2: link state changed to UP
igc2: link state changed to DOWN
igc2: link state changed to UP
arp: 192.168.200.22 moved from f2:0f:ab:ac:aa:a9 to 1c:53:f9:aa:b5:65 on igc2
621.436105 [ 851] iflib_netmap_config       txr 4 rxr 4 txd 1024 rxd 1024 rbufsz 2048
621.444361 [ 851] iflib_netmap_config       txr 4 rxr 4 txd 1024 rxd 1024 rbufsz 2048
igc2: link state changed to DOWN
igc2: link state changed to UP
igc0: link state changed to DOWN
igc0: link state changed to UP
666.783413 [ 851] iflib_netmap_config       txr 4 rxr 4 txd 1024 rxd 1024 rbufsz 2048
666.791593 [ 851] iflib_netmap_config       txr 4 rxr 4 txd 1024 rxd 1024 rbufsz 2048
igc0: link state changed to DOWN
igc0: link state changed to UP
igc0: link state changed to DOWN
igc0: link state changed to UP
684.987780 [ 851] iflib_netmap_config       txr 4 rxr 4 txd 1024 rxd 1024 rbufsz 2048
684.996044 [ 851] iflib_netmap_config       txr 4 rxr 4 txd 1024 rxd 1024 rbufsz 2048
igc0: link state changed to DOWN
igc0: link state changed to UP
igc0: link state changed to DOWN
igc0: link state changed to UP
789.998143 [ 851] iflib_netmap_config       txr 4 rxr 4 txd 1024 rxd 1024 rbufsz 2048
790.006289 [ 851] iflib_netmap_config       txr 4 rxr 4 txd 1024 rxd 1024 rbufsz 2048
igc0: link state changed to DOWN
igc0: link state changed to UP
arp: 192.168.200.22 moved from 1c:53:f9:aa:b5:65 to f2:0f:ab:ac:aa:a9 on igc2
igc2: link state changed to DOWN
igc2: link state changed to UP
arp: 192.168.200.22 moved from 1c:53:f9:aa:b5:65 to f2:0f:ab:ac:aa:a9 on igc2
989.861585 [ 851] iflib_netmap_config       txr 4 rxr 4 txd 1024 rxd 1024 rbufsz 2048
989.869836 [ 851] iflib_netmap_config       txr 4 rxr 4 txd 1024 rxd 1024 rbufsz 2048
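
To put a number on the flapping in the native-mode excerpt above, a small sketch like the following counts link state changes per interface. It assumes the kernel messages read exactly `<iface>: link state changed to UP` or `DOWN`, as in the paste; `count_link_flaps` is a made-up helper name for illustration.

```python
import re
from collections import defaultdict

# Assumed message shape, copied from the excerpt above:
# "igc0: link state changed to DOWN"
LINK_RE = re.compile(r"^(\w+): link state changed to (UP|DOWN)$")


def count_link_flaps(dmesg_text):
    """Return {interface: {"UP": n, "DOWN": m}} for all link state messages."""
    flaps = defaultdict(lambda: {"UP": 0, "DOWN": 0})
    for line in dmesg_text.splitlines():
        match = LINK_RE.match(line.strip())
        if match:
            iface, state = match.groups()
            flaps[iface][state] += 1
    # Convert to plain dicts so the result prints and compares cleanly.
    return {iface: dict(states) for iface, states in flaps.items()}
```

The DOWN count per interface is a rough measure of how often the link dropped in the captured window. As noted elsewhere in this thread, link up/down events are a different symptom from the silent queue stalls, so this is only useful for separating the two kinds of reports.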

And again for emulated (goes on and on until it eventually flaps wireless off):
012.906533 [1137] generic_netmap_attach     Emulated adapter for igc2 created (prev was igc2)
012.915326 [1034] generic_netmap_dtor       Native netmap adapter for igc2 restored
012.923231 [1042] generic_netmap_dtor       Emulated netmap adapter for igc2 destroyed
012.931527 [1137] generic_netmap_attach     Emulated adapter for igc2 created (prev was igc2)
012.940339 [ 320] generic_netmap_register   Emulated adapter for igc2 activated
012.963658 [ 295] generic_netmap_unregister Emulated adapter for igc2 deactivated
012.974662 [1034] generic_netmap_dtor       Native netmap adapter for igc2 restored
012.982483 [1042] generic_netmap_dtor       Emulated netmap adapter for igc2 destroyed
021.006448 [1137] generic_netmap_attach     Emulated adapter for igc0 created (prev was igc0)
021.015312 [1034] generic_netmap_dtor       Native netmap adapter for igc0 restored
021.023196 [1042] generic_netmap_dtor       Emulated netmap adapter for igc0 destroyed
021.031676 [1137] generic_netmap_attach     Emulated adapter for igc0 created (prev was igc0)
021.040581 [ 320] generic_netmap_register   Emulated adapter for igc0 activated
021.083370 [1137] generic_netmap_attach     Emulated adapter for igc2 created (prev was igc2)
021.092225 [1034] generic_netmap_dtor       Native netmap adapter for igc2 restored
021.100132 [1042] generic_netmap_dtor       Emulated netmap adapter for igc2 destroyed
021.108489 [1137] generic_netmap_attach     Emulated adapter for igc2 created (prev was igc2)
021.117381 [ 320] generic_netmap_register   Emulated adapter for igc2 activated
igc0: link state changed to UP
024.274716 [ 295] generic_netmap_unregister Emulated adapter for igc2 deactivated
024.282425 [1034] generic_netmap_dtor       Native netmap adapter for igc2 restored
024.290272 [1042] generic_netmap_dtor       Emulated netmap adapter for igc2 destroyed
038.032110 [1137] generic_netmap_attach     Emulated adapter for igc2 created (prev was igc2)
038.040849 [1034] generic_netmap_dtor       Native netmap adapter for igc2 restored
038.048749 [1042] generic_netmap_dtor       Emulated netmap adapter for igc2 destroyed
038.057062 [1137] generic_netmap_attach     Emulated adapter for igc2 created (prev was igc2)
038.065931 [ 320] generic_netmap_register   Emulated adapter for igc2 activated
039.777730 [ 295] generic_netmap_unregister Emulated adapter for igc2 deactivated
039.785589 [1034] generic_netmap_dtor       Native netmap adapter for igc2 restored
039.793627 [1042] generic_netmap_dtor       Emulated netmap adapter for igc2 destroyed
053.197044 [1137] generic_netmap_attach     Emulated adapter for igc2 created (prev was igc2)
053.205799 [1034] generic_netmap_dtor       Native netmap adapter for igc2 restored
053.213707 [1042] generic_netmap_dtor       Emulated netmap adapter for igc2 destroyed
053.222033 [1137] generic_netmap_attach     Emulated adapter for igc2 created (prev was igc2)
053.230856 [ 320] generic_netmap_register   Emulated adapter for igc2 activated
068.242712 [ 295] generic_netmap_unregister Emulated adapter for igc2 deactivated
068.250438 [1034] generic_netmap_dtor       Native netmap adapter for igc2 restored
068.258245 [1042] generic_netmap_dtor       Emulated netmap adapter for igc2 destroyed
081.388358 [1137] generic_netmap_attach     Emulated adapter for igc2 created (prev was igc2)
081.397402 [1034] generic_netmap_dtor       Native netmap adapter for igc2 restored
081.405610 [1042] generic_netmap_dtor       Emulated netmap adapter for igc2 destroyed
081.414240 [1137] generic_netmap_attach     Emulated adapter for igc2 created (prev was igc2)
081.423429 [ 320] generic_netmap_register   Emulated adapter for igc2 activated
088.474600 [ 295] generic_netmap_unregister Emulated adapter for igc2 deactivated
088.482314 [1034] generic_netmap_dtor       Native netmap adapter for igc2 restored
088.490121 [1042] generic_netmap_dtor       Emulated netmap adapter for igc2 destroyed
102.524921 [1137] generic_netmap_attach     Emulated adapter for igc2 created (prev was igc2)
102.533710 [1034] generic_netmap_dtor       Native netmap adapter for igc2 restored
102.541601 [1042] generic_netmap_dtor       Emulated netmap adapter for igc2 destroyed
102.549912 [1137] generic_netmap_attach     Emulated adapter for igc2 created (prev was igc2)
102.558763 [ 320] generic_netmap_register   Emulated adapter for igc2 activated
142.291684 [ 295] generic_netmap_unregister Emulated adapter for igc2 deactivated
142.301275 [1034] generic_netmap_dtor       Native netmap adapter for igc2 restored
142.309289 [1042] generic_netmap_dtor       Emulated netmap adapter for igc2 destroyed
155.734586 [1137] generic_netmap_attach     Emulated adapter for igc2 created (prev was igc2)
155.743387 [1034] generic_netmap_dtor       Native netmap adapter for igc2 restored
155.751244 [1042] generic_netmap_dtor       Emulated netmap adapter for igc2 destroyed
155.759608 [1137] generic_netmap_attach     Emulated adapter for igc2 created (prev was igc2)
155.768493 [ 320] generic_netmap_register   Emulated adapter for igc2 activated
327.084679 [ 295] generic_netmap_unregister Emulated adapter for igc2 deactivated
327.092373 [1034] generic_netmap_dtor       Native netmap adapter for igc2 restored
327.100229 [1042] generic_netmap_dtor       Emulated netmap adapter for igc2 destroyed
341.229386 [1137] generic_netmap_attach     Emulated adapter for igc2 created (prev was igc2)
341.239156 [1034] generic_netmap_dtor       Native netmap adapter for igc2 restored
341.247428 [1042] generic_netmap_dtor       Emulated netmap adapter for igc2 destroyed
341.256522 [1137] generic_netmap_attach     Emulated adapter for igc2 created (prev was igc2)
341.266272 [ 320] generic_netmap_register   Emulated adapter for igc2 activated
365.673498 [ 295] generic_netmap_unregister Emulated adapter for igc2 deactivated
365.681226 [1034] generic_netmap_dtor       Native netmap adapter for igc2 restored
365.689054 [1042] generic_netmap_dtor       Emulated netmap adapter for igc2 destroyed
379.343039 [1137] generic_netmap_attach     Emulated adapter for igc2 created (prev was igc2)
379.351809 [1034] generic_netmap_dtor       Native netmap adapter for igc2 restored
379.359675 [1042] generic_netmap_dtor       Emulated netmap adapter for igc2 destroyed
379.367996 [1137] generic_netmap_attach     Emulated adapter for igc2 created (prev was igc2)
379.376820 [ 320] generic_netmap_register   Emulated adapter for igc2 activated
667.362913 [ 295] generic_netmap_unregister Emulated adapter for igc0 deactivated
667.370615 [1034] generic_netmap_dtor       Native netmap adapter for igc0 restored
667.378426 [1042] generic_netmap_dtor       Emulated netmap adapter for igc0 destroyed
667.464266 [ 295] generic_netmap_unregister Emulated adapter for igc2 deactivated
667.475151 [1034] generic_netmap_dtor       Native netmap adapter for igc2 restored
667.482963 [1042] generic_netmap_dtor       Emulated netmap adapter for igc2 destroyed

It's really hard for anyone involved to follow if reports are being brought up multiple times that are outside of the test scope. Yes, I acknowledge that igc(4) is doing emulation on netmap, but the driver doing up/down dances is not a problem that the published patch can possibly address. The driver is relatively new and likely not maintained by Intel in FreeBSD, which can also lead to this problematic situation of being sub-par.


Cheers,
Franco

Franco,

I've done a lot of testing and this fix works for me. I was restarting the previous version of OPNsense every 1-2 days.

Just moved a lot of data (300 GB of large files) and had no issues. I'll keep an eye on it over the next week. The thread getting hijacked doesn't help.

Hi Franco,

Before I dig into the issue, I want to thank you for the hard work you have been doing for the community.

My setup was working without issues until I added a new VLAN (VLAN18) with new routes to my OPNsense setup. I have pasted the information below for troubleshooting, as at the moment I am unable to fire up my Zenarmor instance unless it is in passive mode.

If you can provide some tips or ideas on what I can try next, that would be great. Side note: I tried both native and emulated L3 netmap mode and both failed. I also tried monitoring only the igb1 interface and not the VLAN interfaces, and that failed as well.

My NIC is the quad Intel i350, FYI.

Below are the commands I ran, for documentation purposes:

# opnsense-update -zkr 23.1.1-netmap
# opnsense-shell reboot

# dmesg | grep generic_netmap_register
864.793366 [ 320] generic_netmap_register   Emulated adapter for igb1_vlan10 activated
865.767213 [ 320] generic_netmap_register   Emulated adapter for igb1_vlan16 activated
867.483308 [ 320] generic_netmap_register   Emulated adapter for igb1_vlan18 activated
023.080385 [ 320] generic_netmap_register   Emulated adapter for igb1_vlan10 activated
023.109638 [ 320] generic_netmap_register   Emulated adapter for igb1_vlan16 activated
024.318984 [ 320] generic_netmap_register   Emulated adapter for igb1_vlan18 activated
024.496539 [ 320] generic_netmap_register   Emulated adapter for igb1_vlan4 activated
274.365244 [ 320] generic_netmap_register   Emulated adapter for igb1_vlan10 activated
274.377485 [ 320] generic_netmap_register   Emulated adapter for igb1_vlan16 activated
275.318691 [ 320] generic_netmap_register   Emulated adapter for igb1_vlan4 activated
474.486927 [ 320] generic_netmap_register   Emulated adapter for igb1_vlan10 activated
474.542792 [ 320] generic_netmap_register   Emulated adapter for igb1_vlan16 activated
475.248696 [ 320] generic_netmap_register   Emulated adapter for igb1_vlan4 activated
475.458719 [ 320] generic_netmap_register   Emulated adapter for igb1_vlan18 activated
268.825997 [ 320] generic_netmap_register   Emulated adapter for igb1_vlan16 activated
270.486516 [ 320] generic_netmap_register   Emulated adapter for igb1_vlan4 activated
270.798396 [ 320] generic_netmap_register   Emulated adapter for igb1_vlan18 activated
306.520468 [ 320] generic_netmap_register   Emulated adapter for igb1_vlan16 activated
307.940215 [ 320] generic_netmap_register   Emulated adapter for igb1_vlan4 activated
308.567320 [ 320] generic_netmap_register   Emulated adapter for igb1_vlan18 activated
887.830041 [ 320] generic_netmap_register   Emulated adapter for igb1_vlan10 activated
888.686489 [ 320] generic_netmap_register   Emulated adapter for igb1_vlan18 activated
889.171163 [ 320] generic_netmap_register   Emulated adapter for igb1_vlan4 activated
930.579711 [ 320] generic_netmap_register   Emulated adapter for igb1_vlan16 activated



# uname -a
FreeBSD myhostname 13.1-RELEASE-p6 FreeBSD 13.1-RELEASE-p6 netmap-n250399-995512c8607 SMP amd64
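
One way to read the `generic_netmap_register` output above is to group the activations into rounds and check which VLAN interfaces are missing from each round. The sketch below does that under loud assumptions: the timestamps in the paste look truncated, so a new round is started whenever the timestamp jumps backwards or more than `max_gap` seconds pass (10 seconds is an arbitrary guess), and the expected interface list has to be supplied by hand. The function names are hypothetical.

```python
import re

# Assumed line shape, copied from the grep output above:
# "864.793366 [ 320] generic_netmap_register   Emulated adapter for igb1_vlan10 activated"
ACTIVATED_RE = re.compile(
    r"^\s*(\d+\.\d+)\s+\[\s*\d+\]\s+generic_netmap_register\s+"
    r"Emulated adapter for (\S+) activated"
)


def activation_rounds(grep_output, max_gap=10.0):
    """Split activations into rounds; returns a list of interface-name lists."""
    rounds = []
    last_ts = None
    for line in grep_output.splitlines():
        match = ACTIVATED_RE.match(line)
        if not match:
            continue
        ts, iface = float(match.group(1)), match.group(2)
        # Heuristic (an assumption): a backwards jump or a long pause
        # means the engine was restarted and a new round begins.
        if last_ts is None or ts < last_ts or ts - last_ts > max_gap:
            rounds.append([])
        rounds[-1].append(iface)
        last_ts = ts
    return rounds


def missing_per_round(rounds, expected):
    """For each round, list the expected interfaces that never activated."""
    return [sorted(set(expected) - set(seen)) for seen in rounds]
```

Called with the four interface names that appear in the output (`igb1_vlan4`, `igb1_vlan10`, `igb1_vlan16`, `igb1_vlan18`) as `expected`, it shows per round which of them did not come up. Whether a missing activation is related to the Zenarmor start failure is not something this sketch can establish.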