OPNsense Forum

Archive => 23.1 Legacy Series => Topic started by: franco on January 27, 2023, 11:38:45 am

Title: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: franco on January 27, 2023, 11:38:45 am
Hi!

Zenarmor and OPNsense have been working with Klara to bring netmap improvements to FreeBSD, some of which have already landed in the development branch for upcoming FreeBSD 14.

One of the goals in the project was to find and remove bugs from netmap. One of those bugs is network traffic stalling in generic mode, which is used when the driver itself doesn't support netmap and netmap instead wraps around it with an emulated adapter...

It's easy to check whether generic mode is in use on your system, e.g.:

# dmesg | grep generic_netmap_register
442.167865 [ 320] generic_netmap_register   Emulated adapter for gif1 activated

If you see log messages here, you might be affected and may have seen this behaviour before: Suricata/Zenarmor needs to be restarted for packet flow to resume.
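As a minimal sketch, the same dmesg check can be reduced to a list of the interfaces netmap has wrapped in emulated (generic) mode. It assumes the log format shown above; the helper name `emulated_ifaces` is hypothetical and not an OPNsense tool.

```shell
# emulated_ifaces: read dmesg-style text on stdin and print each interface
# named in a "generic_netmap_register ... Emulated adapter for X activated"
# message, once per interface.
emulated_ifaces() {
  sed -n 's/.*generic_netmap_register.*Emulated adapter for \([^ ]*\) activated.*/\1/p' | sort -u
}

# Usage on the firewall:
# dmesg | emulated_ifaces
```

Empty output suggests that no interface is currently running in generic mode.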

The change in question is: https://github.com/opnsense/src/commit/0c47d02eefec

And the kernel can be installed on 23.1 easily:

# opnsense-update -zkr 23.1.2-netmap
# opnsense-shell reboot
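After the reboot, one way to confirm the test kernel is actually running is to inspect `uname -a`. This sketch assumes the build tag contains `netmap-n`, as in the `uname -a` output posted further down this thread (`netmap-n250377-0c47d02eefe`); later test builds may be tagged differently, and the helper name is hypothetical.

```shell
# is_netmap_test_kernel STRING: succeed if STRING (normally the output of
# uname -a) contains a "netmap-n<number>" build tag, fail otherwise.
is_netmap_test_kernel() {
  case "$1" in
    *netmap-n[0-9]*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage after "opnsense-shell reboot":
# is_netmap_test_kernel "$(uname -a)" && echo "netmap test kernel is running"
```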

We hope some of you can try this out and see whether the problems disappear (or whether it causes another kind of dropout, like the one we already solved internally with an earlier version of the patch).

The patch does have implications for reliability in generic mode (which was, and always will be, less reliable than native netmap mode), but we will explain these at a later time.


Cheers,
Franco
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: ofu12345 on January 27, 2023, 11:47:54 am
awesome!

Works for me, now zenarmor reports can be seen again (using zenarmor 1.12.4 and App/RulesDB 1.12.22122618).

Great work, thank you!
 Oliver
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: franco on January 27, 2023, 11:49:32 am
Hi Oliver,

huh, I'm missing some context here. It's not supposed to fix a previously unbroken Zenarmor. Perhaps the reboot did it for you? ;)


Cheers,
Franco
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: cgone on January 27, 2023, 12:54:22 pm
I installed the new kernel and it works!

The problems with the Fritzbox registering the telephone and with the rest of the family browsing are gone.

Unfortunately, both interfaces use VLANs and therefore the generic netmap driver...
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: BNaCl on January 31, 2023, 02:36:18 pm
Not quite sure if this applies to my situation - looking for clarification.

I have been troubleshooting an issue with Sensei/ZA which I have documented here:

https://forum.opnsense.org/index.php?topic=31544.0

Sunny Valley support has indicated the problem is netmap and asked me to give this a try, which I did yesterday. The result is that it "works", but I still have the interface flapping so it didn't resolve my particular issue. I have a feeling this doesn't apply to me due to the fact that I have OPNs configured as a transparent filtering bridge and using the ZA bridge deployment mode. It doesn't stall, it just doesn't work at all which seems different.

If I understand correctly (big assumption), their "bridge mode" currently uses netmap and bypasses the OS, but the problem is that ZA won't pass traffic at all unless the bridge is also configured in OPNs (resulting in the flapping). Therefore, the solution is either to fix netmap or to add support for if_bridge(4). It should be noted that this config did work previously (with or without the OPNs bridge), so I'm not sure where a change was made that broke it.

Am I on the right path here? Apologies if I am off target, I am a bit out of my comfort zone on this one.   
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: franco on January 31, 2023, 04:16:48 pm
> Not quite sure if this applies to my situation - looking for clarification.

It's easy to check whether generic mode is in use on your system, e.g.:

# dmesg | grep generic_netmap_register
442.167865 [ 320] generic_netmap_register   Emulated adapter for gif1 activated

If you see log messages here, you might be affected and may have seen this behaviour before: Suricata/Zenarmor needs to be restarted for packet flow to resume.
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: BNaCl on January 31, 2023, 04:34:34 pm
I actually did that, but I’m not super well versed in the shell over SSH. I was pretty sure blank output means it doesn’t apply, but wanted to check. What threw me off was that Sunny Valley wanted me to try this. I even asked whether this applied because it seemed like it didn’t. Thanks Franco.
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: franco on January 31, 2023, 04:44:01 pm
Ok, so you are not using the generic netmap mode in that case. The patch is not for you, but we do have an if_bridge patch coming up shortly (iterating through QA at the moment).

However, moving all to bridge will just try to work around the issue of a hardware interface going down/up. The actual issue might persist. The down/up is actually a failsafe for removing hardware filter option settings from the device which needs a hard reset, but in theory the reset is not needed if the hardware bits are all set correctly already. A patch is not planned at this point in the project, but was discussed.


Cheers,
Franco
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: BNaCl on January 31, 2023, 05:19:54 pm
Appreciate the clarification, figured that was the case. It seems they don't have a working solution for a transparent bridge config at this time. Like I mentioned, it was previously working without the bridge configured in OPNs, but something changed along the way. Also, good to know the longer term bridge fix for this scenario isn't forthcoming. I have a call with them today and hope to get their reporting only mode functioning which if I understand correctly uses pcap.

Take care, love OPNs and the work you guys are doing here.
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: franco on February 01, 2023, 09:11:17 am
Could be that it was working either due to older FreeBSD state or old code paths that have subsequently been rewritten. For both things there is a problem:

1. FreeBSD state does sometimes deteriorate due to surrounding networking changes. Netmap has its limits both in technical and organisational sense. It's being worked on but the main consumers seem to be OPNsense/pfSense and research projects (where this originally came from). That's also why we involved Klara to look at a few shortcomings and problems encountered over the years.

2. The rework of code paths is always done to simplify and to take side effects out of the configuration paths as they are reported. There is no intention to break a particular setup (and none was implied here, but I feel I should state it explicitly). Beyond that, we do seem to trigger other side effects from these reworks that are more in the area of the kernel than in our code, which could have the adverse effect described as well.

I think at least starting to see kernel issues for what they are is a good step, all things considered. Some work is being done, really slowly in the grand scheme of things, but gradually, so as to take one step at a time. :)


Cheers,
Franco
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: cgone on February 01, 2023, 07:59:46 pm
...

And the kernel can be installed on 23.1 easily:

# opnsense-update -zkr 23.1-netmap
# opnsense-shell reboot

...
I accidentally reinstalled the original kernel by installing 23.1_6, and my problems reappeared.
So I am looking forward to this fix becoming permanent.

@Franco: Will this fix be included in 23.2 or earlier?
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: franco on February 02, 2023, 09:37:31 am
The "results" (or rather a bit of lack thereof) seem promising. We've heard of no crashes, no regressions and no problem on the reliability front with more dropped packets vs. before.

The review is https://reviews.freebsd.org/D38065 but it's currently on hold because the netmap developer has a different view on the subject. I'm unsure how quickly this will be resolved.


Cheers,
Franco
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: Phiolin on February 03, 2023, 11:25:00 am
I'm affected by the netmap/Zenarmor issue and will install the patch today to test. Thanks for bringing this forward! :)
Will report back in 2-3 days as it usually took a while for Zenarmor to get stuck on the old kernel.
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: jbhorner on February 03, 2023, 08:15:41 pm
I just posted a comment in the Zenarmor forum noting that I think netmap is causing more of an issue than just the generic mode problems (I do not use generic mode). I am getting regular kernel panics that seem to point to netmap:

--- trap 0xc, rip = 0xffffffff81226810, rsp = 0xfffffe00dafcb758, rbp = 0xfffffe00dafcb830 ---
lapic_handle_timer() at lapic_handle_timer/frame 0xfffffe00dafcb830
virtqueue_notify() at virtqueue_notify+0x87/frame 0xfffffe00dafcb860
vtnet_txq_mq_start_locked() at vtnet_txq_mq_start_locked+0xa2/frame 0xfffffe00dafcb8b0
vtnet_txq_mq_start() at vtnet_txq_mq_start+0x61/frame 0xfffffe00dafcb8e0
vlan_transmit() at vlan_transmit+0xf3/frame 0xfffffe00dafcb930
nm_os_generic_xmit_frame() at nm_os_generic_xmit_frame+0x6d/frame 0xfffffe00dafcb950
generic_netmap_txsync() at generic_netmap_txsync+0x2eb/frame 0xfffffe00dafcba40
netmap_ioctl() at netmap_ioctl+0x1a4/frame 0xfffffe00dafcbb10
freebsd_netmap_ioctl() at freebsd_netmap_ioctl+0x74/frame 0xfffffe00dafcbb50

I'll implement the potential solution noted here to see if it makes a difference.
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: mb on February 03, 2023, 08:31:43 pm
@jbhorner, thanks. Quick question: are you using vlans on a vtnet interface?
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: jbhorner on February 04, 2023, 02:30:49 am
Yes, I do.  I don't use pass-throughs on my VM. They cause problems with snapshots. (Or at least they have for me in the past.)

After my last reply here, it had another kernel panic (post patch). Not sure what's going on here so will just have to stick with a prior release snapshot, or pfSense. I'm sure it will be sorted soon...I just won't have time to deal with the crashes.

I might bring it up later so that I can forward the crash log. But for now, it's peacefully sleeping...

Cheers!
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: almodovaris on February 04, 2023, 02:41:11 am
The test kernel works.

# dmesg | grep generic_netmap_register gives me nothing.

Can I test Zenarmor multicore? I.e. eastpect multicore.
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: mb on February 04, 2023, 07:04:41 pm
Yes, I do.  I don't use pass-throughs on my VM. They cause problems with snapshots. (Or at least they have for me in the past.)

After my last reply here, it had another kernel panic (post patch).

Thanks for more information @jbhorner. Since you're using vlan(4), you're actually using the netmap emulated driver. We'll take a look.

For the time being, for the sake of clarity, can you please confirm whether these crashes happen when you're using the netmap beta kernel?
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: djr92 on February 05, 2023, 12:17:57 am
Just reporting my findings.

Performance is much better overall, but large file transfers (like Win11.iso) between VLANs still result in a complete lockup of the router when Zenarmor is active. All VLANs are on the same parent interface (ix0), which is monitored by Zenarmor.

I am also getting an A+ rating every time on the Waveform bufferbloat test with the Shaper configured. I used to see an A rating with a moderate increase in bufferbloat. The increase is absolutely zero now, with a consistent A+. Even without Zenarmor, I’ll be running this kernel for a while. Thank you all for your efforts to improve the platform.
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: mb on February 05, 2023, 04:58:58 pm
@djr92, thanks, very helpful. Glad to hear that you've seen improvements in bufferbloat tests.

WRT the stalls, can you try the same test with Zenarmor in bypass mode? I want to make sure it's not ZA-related. In bypass mode, ZA acts as a dummy bridge switching packets back and forth.
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: djr92 on February 05, 2023, 11:17:36 pm
Hello.

For clarity, I had this same issue on two different machines before testing the new Kernel. This issue only appears when Netmap ZA is in Routed Mode. In bypass or passive mode I have no issue.

The issue is specifically a router crash. Router becomes unresponsive and all connectivity fails for a period of time. Often requires a power reset to recover. 

The issue only appears when ZA is in routed mode and it only happens when transferring a single large file between vlans. Something like a 5GB .iso file. I can transfer a 5GB folder full of smaller files with no issue.

I have tried with the vlans on the same parent interface and I’ve also tried with the vlans on different physical interfaces. Same issue.

I’m currently using a Netgate 6100 with the 10G (ix) uplinks. I also had the same issue on my previous Dell SFF PC using X550-T2 NIC (also ix).

It happened on the stock kernel and it also happens with this new test kernel.
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: SpinningRust on February 06, 2023, 02:23:15 am
I still get errors, but this patch is enabling ZA to actually work in L3 routed mode. The following combination seems to work best for me using Intel I225-V 2.5G interfaces (Protectli VP2420):
- Disable flow control in tunables (dev.igc.0.fc, dev.igc.1.fc, dev.igc.2.fc, dev.igc.3.fc all set to 0)
- Install this 23.1-netmap kernel
- Set ZA to run in L3 Reporting and Blocking with emulated driver.
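The flow-control values in the first step can be generated with a small loop. This is a sketch assuming four igc ports numbered 0 to 3 (as on the VP2420 described here) and assuming `dev.igc.N.fc` is writable at runtime; to survive a reboot the values still belong under System > Settings > Tunables. The helper name `igc_fc_tunables` is hypothetical.

```shell
# igc_fc_tunables COUNT: print "dev.igc.N.fc=0" for N = 0 .. COUNT-1,
# i.e. the sysctl assignments that disable flow control on each igc port.
igc_fc_tunables() {
  i=0
  while [ "$i" -lt "$1" ]; do
    printf 'dev.igc.%d.fc=0\n' "$i"
    i=$((i + 1))
  done
}

# Usage as root on the firewall (runtime only, not persistent):
# igc_fc_tunables 4 | xargs sysctl
```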

If I change any one of the above settings, I get flapping interfaces and other issues, especially with wireless. Wired and wireless clients connect to different interfaces on the firewall, with different subnets and firewall rules.

Most of the errors occur on the wireless interface (igc2)
424.125647 [ 320] generic_netmap_register   Emulated adapter for igc2 activated
438.207994 [ 320] generic_netmap_register   Emulated adapter for igc2 activated
452.313472 [ 320] generic_netmap_register   Emulated adapter for igc2 activated
484.519552 [ 320] generic_netmap_register   Emulated adapter for igc2 activated
498.622187 [ 320] generic_netmap_register   Emulated adapter for igc2 activated
514.752345 [ 320] generic_netmap_register   Emulated adapter for igc2 activated
544.042637 [ 320] generic_netmap_register   Emulated adapter for igc2 activated
558.191049 [ 320] generic_netmap_register   Emulated adapter for igc2 activated
572.323451 [ 320] generic_netmap_register   Emulated adapter for igc2 activated
599.501288 [ 320] generic_netmap_register   Emulated adapter for igc2 activated
614.628354 [ 320] generic_netmap_register   Emulated adapter for igc2 activated
632.829857 [ 320] generic_netmap_register   Emulated adapter for igc2 activated
763.054897 [ 320] generic_netmap_register   Emulated adapter for igc2 activated

With an occasional error on the wired interface:
325.237102 [ 320] generic_netmap_register   Emulated adapter for igc0 activated
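Several of the igc2 activations above are only 14 to 18 seconds apart. A minimal sketch for measuring those gaps per interface from dmesg text, assuming the leading timestamp is a seconds counter that wraps at 1000 (the jump from 989.x to 012.x in a later log in this thread suggests it does). The helper name is hypothetical.

```shell
# register_gaps IFACE: read dmesg-style text on stdin and print the seconds
# between consecutive "Emulated adapter for IFACE activated" messages.
# A negative delta is treated as a wrap and corrected by adding 1000,
# so gaps longer than 1000 s cannot be detected this way.
register_gaps() {
  awk -v ifc="$1" '
    $0 ~ ("Emulated adapter for " ifc " activated") {
      t = $1 + 0
      if (seen) {
        d = t - prev
        if (d < 0) d += 1000
        printf "%.1f\n", d
      }
      prev = t
      seen = 1
    }'
}

# Usage on the firewall:
# dmesg | register_gaps igc2
```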
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: markj on February 06, 2023, 04:26:43 pm
I just posted a comment in the Zenarmor noting that I think Netmap is causing a bit more of an issue than issues with generic mode (which I do not use). I am getting regular kernel panics that seem to point to Netmap:

Could you please show the full stack trace and panic message? The snippet you pasted just shows a thread taking a timer interrupt while pushing packets out of netmap, so it's hard to draw any conclusions.
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: Phiolin on February 07, 2023, 08:14:46 am
Unfortunately I just had Zenarmor pass out again on the netmap test kernel. I use VLANs on a vtnet interface, so I'm the classic case for this issue I guess.
Had to restart Zenarmor and then everything came back.
Let me know if you need any more specific information!

% uname -a
FreeBSD redacted.local 13.1-RELEASE-p5 FreeBSD 13.1-RELEASE-p5 netmap-n250377-0c47d02eefe SMP amd64

Actually, why did this start happening at all? I don't think I had these issues before November 2022 or so.
I guess there have been kernel changes in this area that are now causing the issue, so it's good that it is being looked into, but I wonder if it wouldn't be easier to just roll back whatever change introduced the problem in the first place?
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: mb on February 08, 2023, 08:48:29 pm
Adding some context here...

@markj is from the FreeBSD Project/Klara Systems. We're currently collaborating with Klara and Mark to sort out outstanding netmap issues.

In this regard, any help you can provide here would be much appreciated by the community, since it'll help ship a reliable netmap kernel not just for OPNsense, but for the whole BSD ecosystem which is relying on FreeBSD, since these improvements will be upstreamed.

Thanks in advance for all your attention and help!
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: SpinningRust on February 08, 2023, 09:21:22 pm
After a few more days of testing, like many others I'm still having issues with netmap, especially during bandwidth-intensive traffic. I've resorted to putting Zenarmor into passive mode, so it uses just pcap and doesn't have these issues. I try to monitor the reports regularly and look for threats to block in the firewall rules, in addition to the other measures already in place (DNSBLs, Geo-IP, URL table subscriptions, CrowdSec, etc.).

Here's hoping netmap fixes come soon...
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: mb on February 08, 2023, 10:59:02 pm
@SpinningRust, thanks for the feedback.

Which Ethernet NIC were you using for the ZA-protected interface? Were there any VLANs involved?
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: SpinningRust on February 09, 2023, 05:22:25 am
I have it set on the interface to my LAN/wired network (igc0) and my access point (igc2). I had originally setup vlans for each of these with the intention to eventually logically separate in a downlink switch for IoT, etc. with additional vlans. Or for multiple SSIDs to the wireless, but I haven't done that yet since I don't have managed switches yet or an AP that supports vlan trunks. So, the vlans are pointless right now and were only associated with the parent interfaces but have never been assigned as an interface for firewall policies, etc.

I've deleted the vlans, but they do still show up in Zenarmor as an assignable interface, though I've never used them. Not sure how to clear them out of Zenarmor since they no longer exist.
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: franco on February 09, 2023, 09:26:16 am
> I'm still having issues with netmap, especially when significant bandwidth intensive traffic is taking place

Let's be a bit more clear about this: we are fixing queue stalls. If you had queue stalls and still see queue stalls we would like to know before moving the goalpost to performance and further reliability.

Does anyone see queue stalls with the kernel published here? No ping going through at all? Single connections being stuck must be excluded and I'm not even sure if this is something that Zenarmor could cause as well given the nature of flow tracking in the user application.
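One way to tell a full stall (no ping at all until Suricata/Zenarmor is restarted) from an outage that recovers on its own is to log only the moments reachability changes. A minimal sketch: the helper name is hypothetical, the probe is whatever command you pass in, and the address in the usage line is a placeholder for a host behind the protected interface.

```shell
# watch_reach COUNT PAUSE CMD...: run CMD COUNT times, PAUSE seconds apart,
# and print a timestamped UP/DOWN line only when the result changes.
# A stall shows a DOWN with no following UP until something is restarted;
# a flap shows DOWN followed by UP without intervention.
watch_reach() {
  count=$1
  pause=$2
  shift 2
  state=unknown
  n=0
  while [ "$n" -lt "$count" ]; do
    if "$@" >/dev/null 2>&1; then new=UP; else new=DOWN; fi
    if [ "$new" != "$state" ]; then
      echo "$(date '+%H:%M:%S') $new"
      state=$new
    fi
    n=$((n + 1))
    if [ "$n" -lt "$count" ]; then sleep "$pause"; fi
  done
}

# Usage (placeholder address):
# watch_reach 3600 5 ping -c 1 192.168.1.1
```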


Cheers,
Franco
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: SpinningRust on February 09, 2023, 01:20:49 pm
Does anyone see queue stalls with the kernel published here? No ping going through at all?

Yes, I believe so, but I'm very new to this and may be experiencing a different issue. However, dmesg fills up with errors. My comment previously was that the best way to cause the errors is a large upload/download or something bandwidth intensive. It was not in regards to performance.

Also, I can replicate the issues, to a lesser degree, in some testing with the IPS feature, which I believe also uses netmap, but I'm not sure whether it uses the emulated netmap driver. I have extensively tested the emulated driver for Zenarmor. While it works longer than the native netmap driver, it eventually fails, causing Wi-Fi or other LAN traffic to drop completely for periods of time before recovering.

So, for now, I'm using both IDS and Zenarmor in passive mode with no issues at all since netmap isn't used.
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: franco on February 09, 2023, 02:37:32 pm
To be frank, my first post has a dmesg/grep combination to diagnose this up front, and it would be nice to see its output.

It would also be nice to see the error messages you are getting. They do point to something, but to something different, since the queue stalls were silent.

Lastly, a queue stall requires killing Suricata or Zenarmor, or rebooting, to get connectivity back. Since I'm not seeing that stated clearly here, I'm still sceptical.


Cheers,
Franco
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: SpinningRust on February 09, 2023, 03:38:33 pm
I did post this (https://forum.opnsense.org/index.php?topic=32114.msg156280#msg156280) before, but I wasn't posting all the errors.

I will have to wait until the weekend to do more testing/logging as it's very disruptive to others in the house. Will post back more soon. There were other log entries. I clear out dmesg frequently to help track the changes. I also have the Intel I225-V adapters that may be more unstable. Eventually connectivity does come back without my restarting anything, but it's what I would call very similar to a flapping interface type issue.
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: Phiolin on February 12, 2023, 08:03:30 am
Yes, I still see queue stalls with this kernel.
I have even gone through some effort to pass a hardware interface through to OPNsense, moving from vtnet to an igb-driven interface, so I am no longer using netmap generic mode (at least I no longer see it in dmesg). But I still see the queue stalls where traffic stops flowing and I need to stop eastpect to get it working again.

It’s a regular occurrence here, I pretty much see it every 2-3 days. So if you want me to test/debug something, I can probably do it within that timeframe.

Can also switch back to generic mode easily if required for further testing.
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: franco on February 13, 2023, 07:35:19 am
I think the queue stall question only counts when generic netmap mode is running and the issue is reproducible with both Zenarmor and Suricata. Otherwise we have too many unrelated things happening without a way to trace them reliably.

In an upstream Suricata ticket the same pattern seems to emerge that generic netmap mode is prone to stalling although the user base affected seems almost too small to sample properly.


Cheers,
Franco
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: SpinningRust on February 15, 2023, 03:36:45 am
I can't get netmap to work without issues for more than a day. Native netmap doesn't even work beyond 10 minutes or so, but it does make sense as the netmap documentation doesn't state that it supports the igc driver. My previous box with igb drivers (supported) didn't have the same issues, but it was underpowered.

Emulated mode works up to a day or so, but it doesn't take much to cause the interfaces being protected by Zenarmor (or, separately, Suricata when in IPS mode) to flap. Just changing a setting in the profile such as adding or removing blocking of ad tracking in the web content filter can cause issues.

I eliminated vlans, as I didn't need them (yet), but that didn't make a difference.

I'm going back to running in passive mode.
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: franco on February 15, 2023, 08:56:38 am
To keep this focused, I have to note that down/up hiccups are out of scope here. They are either an issue with the driver or with the switch in front of the device, e.g. not-so-great speed negotiation or packet flooding that makes the driver drop out. That could be the same network issue as before, with the old driver simply being more resilient in these situations.


Cheers,
Franco
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: SpinningRust on February 15, 2023, 02:52:52 pm
I agree, and this is my last follow-up as my issue is different. My issue is almost certainly due to the hardware I have, specifically Intel I225-V 2.5G interfaces with igc drivers. And it's definitely netmap related, as the same issues pop up whether I use Suricata in IPS mode or Zenarmor in L3 mode, with the native or emulated netmap driver (using either version of the emulated netmap driver). All other OPNsense plug-ins and policies I'm using work without issue.

I will be getting a different wireless access point in the future with a 2.5Gbps WAN port to connect to this OPNsense box and will try this again in the future to see if that makes a difference.

When in netmap emulated mode, I get lots and lots of the below in dmesg but when in native mode, it's very different with complete interface drops happening very frequently.

igc2 is the interface to the wireless access point (1Gbps)
igc0 is the interface to my unmanaged 1Gbps LAN switches


549.798697 [ 295] generic_netmap_unregister Emulated adapter for igc2 deactivated
549.806464 [1039] generic_netmap_dtor       Native netmap adapter for igc2 restored
549.814383 [1047] generic_netmap_dtor       Emulated netmap adapter for igc2 destroyed
563.897088 [1142] generic_netmap_attach     Emulated adapter for igc2 created (prev was igc2)
563.905859 [1039] generic_netmap_dtor       Native netmap adapter for igc2 restored
563.913758 [1047] generic_netmap_dtor       Emulated netmap adapter for igc2 destroyed
563.922093 [1142] generic_netmap_attach     Emulated adapter for igc2 created (prev was igc2)
563.930994 [ 320] generic_netmap_register   Emulated adapter for igc2 activated
672.439061 [ 295] generic_netmap_unregister Emulated adapter for igc0 deactivated
672.446788 [1039] generic_netmap_dtor       Native netmap adapter for igc0 restored
672.454630 [1047] generic_netmap_dtor       Emulated netmap adapter for igc0 destroyed
672.519959 [ 295] generic_netmap_unregister Emulated adapter for igc2 deactivated
672.530783 [1039] generic_netmap_dtor       Native netmap adapter for igc2 restored
672.538635 [1047] generic_netmap_dtor       Emulated netmap adapter for igc2 destroyed
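When the attach/register/unregister/dtor cycle repeats like this, a per-interface tally makes the churn easier to compare between runs or kernels. A minimal sketch over dmesg-style text on stdin, assuming the layout shown above (the netmap function name in one field, the interface right after the word `for`); the helper name is hypothetical.

```shell
# netmap_churn: count generic_netmap_* messages per (function, interface)
# and print "function interface count" lines, sorted.
netmap_churn() {
  awk '{
    fn = ""
    ifc = ""
    for (i = 1; i <= NF; i++) {
      if ($i ~ /^generic_netmap_/) fn = $i
      if ($i == "for" && i < NF) { ifc = $(i + 1); break }
    }
    if (fn != "" && ifc != "") count[fn " " ifc]++
  }
  END { for (k in count) print k, count[k] }' | sort
}

# Usage on the firewall:
# dmesg | netmap_churn
```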

Eventually I get an error message that the interface (igc2 usually) went down. I don't have any of those errors handy at the moment (as they're older than dmesg.yesterday in my log files).

Thanks for all your work on this! I will give this another try next time there are updates to netmap.
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: SpinningRust on February 17, 2023, 01:04:31 am
So far so good. I've been running the 23.1.1 build with no netmap issues. Even when netmap worked for me before, dmesg would show lots of dtor, attach, register and unregister entries. None of that in the last 9 hours or so, and no interface flaps.

Looking promising...
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: SpinningRust on February 18, 2023, 03:33:59 pm
Also...the native netmap driver is working great too. It's like a night and day difference. Having no issues thus far. :-)
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: lilsense on February 18, 2023, 06:11:24 pm
What's the rollback command for this?

I see that VLANs are all emulated types... didn't know that. :)
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: DoBoY on February 19, 2023, 12:24:43 am
Unfortunately I am also still having interface crashes (VLANs) with the new netmap kernel. I thought I was OK after applying the fix, but it happened again today. So are we saying I should disable Zenarmor?

If Zenarmor is the issue I guess I should ask for a refund since I cannot use the services I am paying for?
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: SpinningRust on February 19, 2023, 10:01:26 pm
Also...the native netmap driver is working great too. It's like a night and day difference. Having no issues thus far. :-)

Unfortunately I was wrong. User error. My custom policy in Zenarmor showed it was enabled and actively running, but it wasn't, so Netmap hasn't been running at all since the upgrade 2 nights ago. Must have been something with the 23.1.1 update that had the setting toggled incorrectly.

I noticed something was off when the reports didn't show anything blocked or even flagged as a threat in my custom policy. A blocklist entry for a test domain also wasn't blocked; traffic only passed through the default policy. Once I toggled my custom Zenarmor policy off and on again, it began to work as desired...and all my netmap problems returned. Bummer!

Back to passive mode I go.

Here is what I see in dmesg with native netmap:
99.862767 [ 851] iflib_netmap_config       txr 4 rxr 4 txd 1024 rxd 1024 rbufsz 2048
499.870971 [ 851] iflib_netmap_config       txr 4 rxr 4 txd 1024 rxd 1024 rbufsz 2048
500.600314 [ 851] iflib_netmap_config       txr 4 rxr 4 txd 1024 rxd 1024 rbufsz 2048
igc0: link state changed to DOWN
500.636894 [ 851] iflib_netmap_config       txr 4 rxr 4 txd 1024 rxd 1024 rbufsz 2048
igc2: link state changed to DOWN
igc0: link state changed to UP
igc2: link state changed to UP
igc0: link state changed to DOWN
igc0: link state changed to UP
367.771681 [ 851] iflib_netmap_config       txr 4 rxr 4 txd 1024 rxd 1024 rbufsz 2048
367.779815 [ 851] iflib_netmap_config       txr 4 rxr 4 txd 1024 rxd 1024 rbufsz 2048
igc0: link state changed to DOWN
igc0: link state changed to UP
igc2: link state changed to DOWN
igc2: link state changed to UP
480.959262 [ 851] iflib_netmap_config       txr 4 rxr 4 txd 1024 rxd 1024 rbufsz 2048
480.967393 [ 851] iflib_netmap_config       txr 4 rxr 4 txd 1024 rxd 1024 rbufsz 2048
igc2: link state changed to DOWN
igc2: link state changed to UP
igc2: link state changed to DOWN
igc2: link state changed to UP
523.403350 [ 851] iflib_netmap_config       txr 4 rxr 4 txd 1024 rxd 1024 rbufsz 2048
523.411486 [ 851] iflib_netmap_config       txr 4 rxr 4 txd 1024 rxd 1024 rbufsz 2048
igc2: link state changed to DOWN
igc2: link state changed to UP
igc2: link state changed to DOWN
igc2: link state changed to UP
arp: 192.168.200.22 moved from f2:0f:ab:ac:aa:a9 to 1c:53:f9:aa:b5:65 on igc2
621.436105 [ 851] iflib_netmap_config       txr 4 rxr 4 txd 1024 rxd 1024 rbufsz 2048
621.444361 [ 851] iflib_netmap_config       txr 4 rxr 4 txd 1024 rxd 1024 rbufsz 2048
igc2: link state changed to DOWN
igc2: link state changed to UP
igc0: link state changed to DOWN
igc0: link state changed to UP
666.783413 [ 851] iflib_netmap_config       txr 4 rxr 4 txd 1024 rxd 1024 rbufsz 2048
666.791593 [ 851] iflib_netmap_config       txr 4 rxr 4 txd 1024 rxd 1024 rbufsz 2048
igc0: link state changed to DOWN
igc0: link state changed to UP
igc0: link state changed to DOWN
igc0: link state changed to UP
684.987780 [ 851] iflib_netmap_config       txr 4 rxr 4 txd 1024 rxd 1024 rbufsz 2048
684.996044 [ 851] iflib_netmap_config       txr 4 rxr 4 txd 1024 rxd 1024 rbufsz 2048
igc0: link state changed to DOWN
igc0: link state changed to UP
igc0: link state changed to DOWN
igc0: link state changed to UP
789.998143 [ 851] iflib_netmap_config       txr 4 rxr 4 txd 1024 rxd 1024 rbufsz 2048
790.006289 [ 851] iflib_netmap_config       txr 4 rxr 4 txd 1024 rxd 1024 rbufsz 2048
igc0: link state changed to DOWN
igc0: link state changed to UP
arp: 192.168.200.22 moved from 1c:53:f9:aa:b5:65 to f2:0f:ab:ac:aa:a9 on igc2
igc2: link state changed to DOWN
igc2: link state changed to UP
arp: 192.168.200.22 moved from 1c:53:f9:aa:b5:65 to f2:0f:ab:ac:aa:a9 on igc2
989.861585 [ 851] iflib_netmap_config       txr 4 rxr 4 txd 1024 rxd 1024 rbufsz 2048
989.869836 [ 851] iflib_netmap_config       txr 4 rxr 4 txd 1024 rxd 1024 rbufsz 2048
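Most of the native-mode log above is link flapping on igc0 and igc2. A minimal sketch for quantifying that from a saved dmesg dump, assuming the messages look exactly like the `igcN: link state changed to DOWN/UP` lines above (the function name is made up for illustration), could count DOWN transitions per interface:

```python
import re
from collections import Counter

# Matches kernel messages such as "igc0: link state changed to DOWN".
LINK_RE = re.compile(r"^(\w+): link state changed to (UP|DOWN)$")


def count_link_flaps(dmesg_text: str) -> Counter:
    """Count DOWN transitions per interface (one DOWN = one flap)."""
    flaps: Counter = Counter()
    for line in dmesg_text.splitlines():
        m = LINK_RE.match(line.strip())
        if m and m.group(2) == "DOWN":
            flaps[m.group(1)] += 1
    return flaps


if __name__ == "__main__":
    # A few lines copied from the native-mode log above.
    sample = """\
igc0: link state changed to DOWN
igc0: link state changed to UP
igc2: link state changed to DOWN
igc2: link state changed to UP
igc2: link state changed to DOWN
igc2: link state changed to UP"""
    print(dict(count_link_flaps(sample)))  # {'igc0': 1, 'igc2': 2}
```

Lines that don't match the pattern (the iflib and arp messages) are simply ignored.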

And again for emulated mode (this goes on and on until it eventually flaps the wireless off):
012.906533 [1137] generic_netmap_attach     Emulated adapter for igc2 created (prev was igc2)
012.915326 [1034] generic_netmap_dtor       Native netmap adapter for igc2 restored
012.923231 [1042] generic_netmap_dtor       Emulated netmap adapter for igc2 destroyed
012.931527 [1137] generic_netmap_attach     Emulated adapter for igc2 created (prev was igc2)
012.940339 [ 320] generic_netmap_register   Emulated adapter for igc2 activated
012.963658 [ 295] generic_netmap_unregister Emulated adapter for igc2 deactivated
012.974662 [1034] generic_netmap_dtor       Native netmap adapter for igc2 restored
012.982483 [1042] generic_netmap_dtor       Emulated netmap adapter for igc2 destroyed
021.006448 [1137] generic_netmap_attach     Emulated adapter for igc0 created (prev was igc0)
021.015312 [1034] generic_netmap_dtor       Native netmap adapter for igc0 restored
021.023196 [1042] generic_netmap_dtor       Emulated netmap adapter for igc0 destroyed
021.031676 [1137] generic_netmap_attach     Emulated adapter for igc0 created (prev was igc0)
021.040581 [ 320] generic_netmap_register   Emulated adapter for igc0 activated
021.083370 [1137] generic_netmap_attach     Emulated adapter for igc2 created (prev was igc2)
021.092225 [1034] generic_netmap_dtor       Native netmap adapter for igc2 restored
021.100132 [1042] generic_netmap_dtor       Emulated netmap adapter for igc2 destroyed
021.108489 [1137] generic_netmap_attach     Emulated adapter for igc2 created (prev was igc2)
021.117381 [ 320] generic_netmap_register   Emulated adapter for igc2 activated
igc0: link state changed to UP
024.274716 [ 295] generic_netmap_unregister Emulated adapter for igc2 deactivated
024.282425 [1034] generic_netmap_dtor       Native netmap adapter for igc2 restored
024.290272 [1042] generic_netmap_dtor       Emulated netmap adapter for igc2 destroyed
038.032110 [1137] generic_netmap_attach     Emulated adapter for igc2 created (prev was igc2)
038.040849 [1034] generic_netmap_dtor       Native netmap adapter for igc2 restored
038.048749 [1042] generic_netmap_dtor       Emulated netmap adapter for igc2 destroyed
038.057062 [1137] generic_netmap_attach     Emulated adapter for igc2 created (prev was igc2)
038.065931 [ 320] generic_netmap_register   Emulated adapter for igc2 activated
039.777730 [ 295] generic_netmap_unregister Emulated adapter for igc2 deactivated
039.785589 [1034] generic_netmap_dtor       Native netmap adapter for igc2 restored
039.793627 [1042] generic_netmap_dtor       Emulated netmap adapter for igc2 destroyed
053.197044 [1137] generic_netmap_attach     Emulated adapter for igc2 created (prev was igc2)
053.205799 [1034] generic_netmap_dtor       Native netmap adapter for igc2 restored
053.213707 [1042] generic_netmap_dtor       Emulated netmap adapter for igc2 destroyed
053.222033 [1137] generic_netmap_attach     Emulated adapter for igc2 created (prev was igc2)
053.230856 [ 320] generic_netmap_register   Emulated adapter for igc2 activated
068.242712 [ 295] generic_netmap_unregister Emulated adapter for igc2 deactivated
068.250438 [1034] generic_netmap_dtor       Native netmap adapter for igc2 restored
068.258245 [1042] generic_netmap_dtor       Emulated netmap adapter for igc2 destroyed
081.388358 [1137] generic_netmap_attach     Emulated adapter for igc2 created (prev was igc2)
081.397402 [1034] generic_netmap_dtor       Native netmap adapter for igc2 restored
081.405610 [1042] generic_netmap_dtor       Emulated netmap adapter for igc2 destroyed
081.414240 [1137] generic_netmap_attach     Emulated adapter for igc2 created (prev was igc2)
081.423429 [ 320] generic_netmap_register   Emulated adapter for igc2 activated
088.474600 [ 295] generic_netmap_unregister Emulated adapter for igc2 deactivated
088.482314 [1034] generic_netmap_dtor       Native netmap adapter for igc2 restored
088.490121 [1042] generic_netmap_dtor       Emulated netmap adapter for igc2 destroyed
102.524921 [1137] generic_netmap_attach     Emulated adapter for igc2 created (prev was igc2)
102.533710 [1034] generic_netmap_dtor       Native netmap adapter for igc2 restored
102.541601 [1042] generic_netmap_dtor       Emulated netmap adapter for igc2 destroyed
102.549912 [1137] generic_netmap_attach     Emulated adapter for igc2 created (prev was igc2)
102.558763 [ 320] generic_netmap_register   Emulated adapter for igc2 activated
142.291684 [ 295] generic_netmap_unregister Emulated adapter for igc2 deactivated
142.301275 [1034] generic_netmap_dtor       Native netmap adapter for igc2 restored
142.309289 [1042] generic_netmap_dtor       Emulated netmap adapter for igc2 destroyed
155.734586 [1137] generic_netmap_attach     Emulated adapter for igc2 created (prev was igc2)
155.743387 [1034] generic_netmap_dtor       Native netmap adapter for igc2 restored
155.751244 [1042] generic_netmap_dtor       Emulated netmap adapter for igc2 destroyed
155.759608 [1137] generic_netmap_attach     Emulated adapter for igc2 created (prev was igc2)
155.768493 [ 320] generic_netmap_register   Emulated adapter for igc2 activated
327.084679 [ 295] generic_netmap_unregister Emulated adapter for igc2 deactivated
327.092373 [1034] generic_netmap_dtor       Native netmap adapter for igc2 restored
327.100229 [1042] generic_netmap_dtor       Emulated netmap adapter for igc2 destroyed
341.229386 [1137] generic_netmap_attach     Emulated adapter for igc2 created (prev was igc2)
341.239156 [1034] generic_netmap_dtor       Native netmap adapter for igc2 restored
341.247428 [1042] generic_netmap_dtor       Emulated netmap adapter for igc2 destroyed
341.256522 [1137] generic_netmap_attach     Emulated adapter for igc2 created (prev was igc2)
341.266272 [ 320] generic_netmap_register   Emulated adapter for igc2 activated
365.673498 [ 295] generic_netmap_unregister Emulated adapter for igc2 deactivated
365.681226 [1034] generic_netmap_dtor       Native netmap adapter for igc2 restored
365.689054 [1042] generic_netmap_dtor       Emulated netmap adapter for igc2 destroyed
379.343039 [1137] generic_netmap_attach     Emulated adapter for igc2 created (prev was igc2)
379.351809 [1034] generic_netmap_dtor       Native netmap adapter for igc2 restored
379.359675 [1042] generic_netmap_dtor       Emulated netmap adapter for igc2 destroyed
379.367996 [1137] generic_netmap_attach     Emulated adapter for igc2 created (prev was igc2)
379.376820 [ 320] generic_netmap_register   Emulated adapter for igc2 activated
667.362913 [ 295] generic_netmap_unregister Emulated adapter for igc0 deactivated
667.370615 [1034] generic_netmap_dtor       Native netmap adapter for igc0 restored
667.378426 [1042] generic_netmap_dtor       Emulated netmap adapter for igc0 destroyed
667.464266 [ 295] generic_netmap_unregister Emulated adapter for igc2 deactivated
667.475151 [1034] generic_netmap_dtor       Native netmap adapter for igc2 restored
667.482963 [1042] generic_netmap_dtor       Emulated netmap adapter for igc2 destroyed
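To see how often each emulated adapter is brought up and torn down, a rough Python sketch can tally the `generic_netmap_register` / `generic_netmap_unregister` lines. It assumes the message text matches the log above exactly; the function name and the "still active" interpretation (more registrations than unregistrations) are illustrative assumptions, not anything netmap itself reports:

```python
import re
from collections import defaultdict

# Matches netmap's emulated-adapter messages, e.g.
# "012.940339 [ 320] generic_netmap_register   Emulated adapter for igc2 activated"
EVENT_RE = re.compile(
    r"generic_netmap_(register|unregister)\s+"
    r"Emulated adapter for (\S+) (?:activated|deactivated)"
)


def adapter_churn(dmesg_text: str) -> dict:
    """Map interface -> (activations, deactivations, still_active)."""
    counts = defaultdict(lambda: [0, 0])
    for line in dmesg_text.splitlines():
        m = EVENT_RE.search(line)
        if m is None:
            continue  # attach/dtor and other lines are ignored
        slot = 0 if m.group(1) == "register" else 1
        counts[m.group(2)][slot] += 1
    return {ifname: (on, off, on > off) for ifname, (on, off) in counts.items()}


if __name__ == "__main__":
    sample = """\
021.040581 [ 320] generic_netmap_register   Emulated adapter for igc0 activated
021.117381 [ 320] generic_netmap_register   Emulated adapter for igc2 activated
024.274716 [ 295] generic_netmap_unregister Emulated adapter for igc2 deactivated
038.065931 [ 320] generic_netmap_register   Emulated adapter for igc2 activated"""
    print(adapter_churn(sample))
    # {'igc0': (1, 0, True), 'igc2': (2, 1, True)}
```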
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: franco on February 20, 2023, 12:54:16 pm
It's really hard for anyone involved to follow along when reports outside of the test scope are brought up multiple times. Yes, I acknowledge that igc(4) is using netmap in emulated mode, but the driver doing up/down dances is not a problem that the published patch can possibly address. The driver is relatively new and likely not maintained by Intel in FreeBSD, which may also explain why it is sub-par.


Cheers,
Franco
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: optimus_prime on February 21, 2023, 07:00:19 am
Franco,

I've done a lot of testing and this fix works for me. I was restarting the previous version of OPNsense every 1-2 days.

I just moved a lot of data (300 GB of large files) and had no issues. I'll keep an eye on it over the next week. The thread getting hijacked doesn't help.
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: kintaroju on February 23, 2023, 04:31:04 am
Hi Franco,

Before I dig into the issue, I want to thank you for the hard work you have been doing for the community.

My setup was working without issues until I added a new VLAN (VLAN18) with new routes to my OPNsense setup. I have pasted the information below for troubleshooting, as I am currently unable to start my Zenarmor instance unless it is in passive mode.

If you can provide some tips or ideas on what I can try next, that would be great. Side note: I tried both native and emulated netmap in L3 mode and both failed. I also tried monitoring only the igb1 interface and not the VLAN interfaces, and that failed too.

FYI, my NIC is the quad-port Intel i350.

Below are the commands I ran, for documentation purposes:

# opnsense-update -zkr 23.1.1-netmap
# opnsense-shell reboot

# dmesg | grep generic_netmap_register
864.793366 [ 320] generic_netmap_register   Emulated adapter for igb1_vlan10 activated
865.767213 [ 320] generic_netmap_register   Emulated adapter for igb1_vlan16 activated
867.483308 [ 320] generic_netmap_register   Emulated adapter for igb1_vlan18 activated
023.080385 [ 320] generic_netmap_register   Emulated adapter for igb1_vlan10 activated
023.109638 [ 320] generic_netmap_register   Emulated adapter for igb1_vlan16 activated
024.318984 [ 320] generic_netmap_register   Emulated adapter for igb1_vlan18 activated
024.496539 [ 320] generic_netmap_register   Emulated adapter for igb1_vlan4 activated
274.365244 [ 320] generic_netmap_register   Emulated adapter for igb1_vlan10 activated
274.377485 [ 320] generic_netmap_register   Emulated adapter for igb1_vlan16 activated
275.318691 [ 320] generic_netmap_register   Emulated adapter for igb1_vlan4 activated
474.486927 [ 320] generic_netmap_register   Emulated adapter for igb1_vlan10 activated
474.542792 [ 320] generic_netmap_register   Emulated adapter for igb1_vlan16 activated
475.248696 [ 320] generic_netmap_register   Emulated adapter for igb1_vlan4 activated
475.458719 [ 320] generic_netmap_register   Emulated adapter for igb1_vlan18 activated
268.825997 [ 320] generic_netmap_register   Emulated adapter for igb1_vlan16 activated
270.486516 [ 320] generic_netmap_register   Emulated adapter for igb1_vlan4 activated
270.798396 [ 320] generic_netmap_register   Emulated adapter for igb1_vlan18 activated
306.520468 [ 320] generic_netmap_register   Emulated adapter for igb1_vlan16 activated
307.940215 [ 320] generic_netmap_register   Emulated adapter for igb1_vlan4 activated
308.567320 [ 320] generic_netmap_register   Emulated adapter for igb1_vlan18 activated
887.830041 [ 320] generic_netmap_register   Emulated adapter for igb1_vlan10 activated
888.686489 [ 320] generic_netmap_register   Emulated adapter for igb1_vlan18 activated
889.171163 [ 320] generic_netmap_register   Emulated adapter for igb1_vlan4 activated
930.579711 [ 320] generic_netmap_register   Emulated adapter for igb1_vlan16 activated



# uname -a
FreeBSD myhostname 13.1-RELEASE-p6 FreeBSD 13.1-RELEASE-p6 netmap-n250399-995512c8607 SMP amd64
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: beki on February 23, 2023, 01:46:52 pm
Hi kintaroju,

Did you disable Hardware VLAN Filtering?
https://www.zenarmor.com/docs/guides/disabling-hardware-offloading#disabling-hardware-offloading-on-opnsense

You may also send a bug report to the zenarmor team for further investigation of your issue.
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: DoBoY on February 23, 2023, 02:39:45 pm
I have the exact same issue, and hardware offloading has been disabled on my system from the start. Even with the new netmap kernel, my VLANs stop responding after a few days and require a reboot. Zenarmor only works in passive mode for me as well.

I also have Intel i226-V NICs.
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: kintaroju on February 23, 2023, 06:16:13 pm
Today I decided to check whether there was a firmware upgrade for my NIC. There wasn't, but oddly enough, after I let the system fully power off and back on, it now works. The output is below:

 # dmesg | grep generic_netmap_register
204.167012 [ 320] generic_netmap_register   Emulated adapter for igb1_vlan10 activated
204.367396 [ 320] generic_netmap_register   Emulated adapter for igb1_vlan16 activated
206.994495 [ 320] generic_netmap_register   Emulated adapter for igb1_vlan18 activated
207.424808 [ 320] generic_netmap_register   Emulated adapter for igb1_vlan4 activated


So that little exercise reduced the number of emulated adapters from 20+ to just the 4. Not sure what the cause is here, lol.
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: SpinningRust on February 23, 2023, 07:50:29 pm
Today I decided to see if there was a firmware upgrade for my NIC, which there wasn't, but on the odd note, I did let the system fully turn off, and turn on and now it works.

I know that when I removed all VLANs from my config to see if they were causing my netmap issues, it took a full reboot for the VLAN interfaces to disappear from Zenarmor. Perhaps it works the same way with additions. They were removed from the OPNsense config right away, but Zenarmor continued to see the VLAN interfaces until I rebooted.
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: franco on February 28, 2023, 03:09:39 pm
The latest kernel fixes another queue stall problem:

# opnsense-update -zkr 23.1.1-netmap2 && opnsense-shell reboot


Cheers,
Franco
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: kintaroju on March 01, 2023, 12:52:42 am
Hi Franco, thanks for your persistent work on this issue. I just upgraded my test router, and most things work, but in the UI I get this alert:

There were error(s) loading the rules: /tmp/rules.debug:63: cannot define table bogonsv6: Cannot allocate memory - The line in question reads [63]: table <bogonsv6> persist file "/usr/local/etc/bogonsv6"

Not sure if it is related to the kernel upgrade, but I don't recall seeing this error message before.

Also thanks again for your hard work!!
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: franco on March 01, 2023, 07:49:45 am
Unrelated issue. You can check the indicator in the upper right corner of Firewall: Aliases; it should be full.

If that's the case go to Firewall: Settings: Advanced and increase "Firewall Maximum Table Entries" until all your alias-generated entries fit into the memory.
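As a back-of-the-envelope aid for picking that value, here is a small Python sketch. It assumes the limit applies to the sum of all alias-generated table entries, and the 25% headroom and rounding are arbitrary choices. Per-table counts could be gathered with something like `pfctl -t bogonsv6 -T show | wc -l`; the numbers in the example are hypothetical, not measured:

```python
import math


def suggest_table_limit(alias_sizes: dict, headroom: float = 0.25,
                        round_to: int = 100_000) -> tuple:
    """Return (total_entries, suggested_limit) for the given alias sizes.

    alias_sizes maps an alias/table name to the number of entries it expands
    to. The limit adds `headroom` (25% by default, an arbitrary choice) and is
    rounded up to a multiple of `round_to` so the value is easy to type in.
    """
    total = sum(alias_sizes.values())
    suggested = math.ceil(total * (1 + headroom) / round_to) * round_to
    return total, suggested


if __name__ == "__main__":
    # Hypothetical entry counts, not measured values.
    sizes = {"bogons": 1_000, "bogonsv6": 150_000, "my_blocklist": 40_000}
    print(suggest_table_limit(sizes))  # (191000, 300000)
```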


Cheers,
Franco
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: kintaroju on March 01, 2023, 05:57:21 pm
Seems to be the case, as it only indicates 2% of the entries are used lol, thanks again
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: franco on March 02, 2023, 08:20:08 pm
That might be because it couldn't load the large batch as it would be over 100% then ;)


Cheers,
Franco
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: Phiolin on March 06, 2023, 06:22:04 am
I updated to the new kernel yesterday and switched Zenarmor to the emulated driver mode.
Unfortunately, not even 24 hours later my Protectli VP2410 running OPNsense was completely unreachable via the network: not only the Zenarmor-protected interfaces, but also my separate interface on a management VLAN. I had to do a hard reboot to get it back online, as I currently don't have serial console access at the location where it is installed.
For what it's worth, I was still able to get an IP via DHCP on the management interface, but couldn't access any services (web GUI, SSH, etc.).
So that is possibly a hint that mainly TCP connections were affected.

At least in native driver mode, the Zenarmor worker just crashes every 2-3 days and restarts automatically, so I only get a connection drop lasting a couple of seconds.
In emulated mode with the new kernel, it doesn't work for longer than a few hours for me.

This is on a Protectli VP2410 with igb network interfaces, no virtualization, OPNsense installed directly on the hardware.
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: kintaroju on March 07, 2023, 05:19:49 pm
Hi Franco, I noticed that 23.1.2 was just released and I upgraded. Unfortunately, Zenarmor isn't starting again :(. Do you by chance have a new kernel that includes the updated netmap changes?
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: franco on March 07, 2023, 08:13:38 pm
@kintaroju: You can use an older kernel without any issue, but I'll prep a new one tomorrow. The bridge support for netmap was updated so I need to adjust the branch this is built on.

@Phiolin: thanks for the update! The generic patch still seems to be in flux, and I'm expecting a new version this week, but I'm not entirely sure that will happen given how challenging the stalls are at the moment.


Cheers,
Franco
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: kintaroju on March 07, 2023, 08:18:37 pm
@Franco, thanks for the quick update on this, appreciate it. I'll keep a watchful eye for your new netmap kernel :D.
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: andre2000 on March 09, 2023, 06:21:18 pm
I am trying to update the kernel, but get the following error message:

opnsense-update -zkr 23.1.2-netmap2 && opnsense-shell reboot
Fetching kernel-23.1.2-netmap2-amd64.txz: .......[fetch: https://mirror.dns-root.de/opnsense/FreeBSD:13:amd64/snapshots/sets/kernel-23.1.2-netmap2-amd64.txz.sig: No address record] failed, no signature found

OPNsense is on 23.1.2. I switched the mirror, rebooted but no change.

EDIT: The error above isn't the one I got before; name resolution works fine. The actual error is this (no signature found):

Fetching kernel-23.1.2-netmap2-amd64.txz: ..[fetch: https://mirror.dns-root.de/opnsense/FreeBSD:13:amd64/snapshots/sets/kernel-23.1.2-netmap2-amd64.txz.sig: Not Found] failed, no signature found

Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: andre2000 on March 09, 2023, 06:27:59 pm
Okay, interesting. On the first mirror the file was kernel-23.1.2-netmap2-amd64.txz, while on the second there seems to be an older version: kernel-23.1.2-netmap-amd64.txz
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: kintaroju on March 09, 2023, 06:34:17 pm
@kintaroju: You can use an older kernel without any issue, but I'll prep a new one tomorrow. The bridge support for netmap was updated so I need to adjust the branch this is built on.

@Phiolin: thanks for the update! the generic patch is still in flux it seems and I'm expecting a new version this week, but not entirely sure this will happen depending on the challenge of the stalls given at the moment.


Cheers,
Franco

Hi Franco,

I was going to downgrade the kernel today, but I noticed the old kernel with your netmap changes was missing. I did see the new 23.1.2-netmap version and tried it, but unfortunately it produced different results: I am missing a netmap interface:

024.325300 [ 321] generic_netmap_register   Emulated adapter for igb1_vlan16 activated
024.645563 [ 321] generic_netmap_register   Emulated adapter for igb1_vlan10 activated
025.677618 [ 321] generic_netmap_register   Emulated adapter for igb1_vlan4 activated

There should also be one for vlan18. When I tried to exclude vlan18 from the Zenarmor-protected interfaces, it still didn't start up.

So if you need anything on my end to help with the debugging let me know.
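For a report like this, a quick way to check which configured interfaces never logged an emulated-adapter activation is a small Python sketch like the one below. The expected list has to come from your own Zenarmor interface selection; the function name is hypothetical and the match relies on the log wording shown above:

```python
import re

# "(\S+) activated" needs a space right before "activated", so
# "... for igc2 deactivated" lines are not counted.
ACTIVATED_RE = re.compile(r"Emulated adapter for (\S+) activated")


def missing_adapters(dmesg_text: str, expected: list) -> list:
    """Return the expected interfaces that never logged an activation."""
    seen = set(ACTIVATED_RE.findall(dmesg_text))
    return sorted(set(expected) - seen)


if __name__ == "__main__":
    log = """\
024.325300 [ 321] generic_netmap_register   Emulated adapter for igb1_vlan16 activated
024.645563 [ 321] generic_netmap_register   Emulated adapter for igb1_vlan10 activated
025.677618 [ 321] generic_netmap_register   Emulated adapter for igb1_vlan4 activated"""
    wanted = ["igb1_vlan4", "igb1_vlan10", "igb1_vlan16", "igb1_vlan18"]
    print(missing_adapters(log, wanted))  # ['igb1_vlan18']
```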
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: franco on March 09, 2023, 07:39:28 pm
I got heavily side-tracked since 23.1.2 came out... the current build with the latest FreeBSD review state is:

# opnsense-update -zkr 23.1.2-netmap
# opnsense-shell reboot

Notes:

1. kernel-23.1.2-netmap2-amd64.txz never existed. You mean kernel-23.1.1-netmap2-amd64.txz perhaps, which is obviously older than the current kernel-23.1.2-netmap-amd64.txz one.
2. The patch does nothing for which interfaces land in netmap mode. That is solely GUI configuration.
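The ordering in note 1 (release version first, then the netmap build suffix) can be sketched in Python. The naming pattern is inferred only from the file names quoted in this thread, and treating a bare `-netmap` as build 1 is an assumption:

```python
import re

# Naming pattern assumed from the file names quoted in this thread only,
# e.g. "kernel-23.1.2-netmap-amd64.txz" or "kernel-23.1.1-netmap2-amd64.txz".
SET_RE = re.compile(r"kernel-(\d+(?:\.\d+)*)-netmap(\d*)-amd64\.txz")


def kernel_key(name: str) -> tuple:
    """Sort key: release version first, then the netmap build suffix."""
    m = SET_RE.fullmatch(name)
    if m is None:
        raise ValueError(f"unrecognised kernel set name: {name}")
    version = tuple(int(part) for part in m.group(1).split("."))
    build = int(m.group(2) or 1)  # a bare "-netmap" is treated as build 1
    return version, build


if __name__ == "__main__":
    names = ["kernel-23.1.1-netmap2-amd64.txz", "kernel-23.1.2-netmap-amd64.txz"]
    print(max(names, key=kernel_key))  # kernel-23.1.2-netmap-amd64.txz
```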


Cheers,
Franco
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: kintaroju on March 09, 2023, 07:57:46 pm
After disabling VLAN hardware acceleration and switching from protecting individual VLANs to protecting just the main igb0 interface, Zenarmor is working again.

The only thing is that I now have lots of registered netmap devices:

024.325300 [ 321] generic_netmap_register   Emulated adapter for igb1_vlan16 activated
024.645563 [ 321] generic_netmap_register   Emulated adapter for igb1_vlan10 activated
025.677618 [ 321] generic_netmap_register   Emulated adapter for igb1_vlan4 activated
087.352077 [ 321] generic_netmap_register   Emulated adapter for igb1_vlan4 activated
542.812852 [ 321] generic_netmap_register   Emulated adapter for igb1_vlan16 activated
544.525316 [ 321] generic_netmap_register   Emulated adapter for igb1_vlan4 activated
620.148165 [ 321] generic_netmap_register   Emulated adapter for igb1_vlan10 activated

What would be the way to clean up the above entries?
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: franco on March 09, 2023, 08:39:00 pm
I believe this requires a reboot for Zenarmor to cope.


Cheers,
Franco
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: kintaroju on March 09, 2023, 10:26:29 pm
Hi Franco,

I've tried power cycling the system a few times and that didn't seem to help, any other ideas?
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: franco on March 09, 2023, 10:29:18 pm
Contact their support? I don't have much to go on.


Cheers,
Franco
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: kintaroju on March 09, 2023, 11:14:09 pm
Sounds good, I'll do that. Thanks again for the hard work; at least my Zenarmor/Suricata setup is basically functional again.
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: kintaroju on March 23, 2023, 04:59:25 pm
@franco, quick question: when do you think the netmap kernel fix will be included in OPNsense 23.x? Just curious, thanks!
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: franco on March 24, 2023, 09:11:31 am
It has been accepted and should be included in the FreeBSD tree by the end of last week I hope.

Once it's there we will also release it in 23.1.x.


Cheers,
Franco
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: kintaroju on April 04, 2023, 12:42:26 am
Hi Franco,

Just installed the latest 23.1.5, and it seems to be awesome, no more weird netmap issues so far, thanks for the awesome work!
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: Phiolin on April 05, 2023, 08:27:59 am
Not sure whether the Netmap-kernel is now part of 23.1.5 as it is not mentioned in the Changelog, so I'd assume that's still in the queue?

For me, all my Zenarmor issues remain.
In native mode with the igb Intel driver, Zenarmor will either stall or crash after 1-2 days, breaking all my inter-VLAN connections until Zenarmor is restarted.
In emulated mode, with or without the new netmap kernel, I'm seeing ever increasing MBUF usage until MBUF is topped out at 100% and practically everything just stops working.
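For anyone wanting to track this kind of creeping MBUF usage between reboots, a rough Python sketch could parse the clusters line of FreeBSD's `netstat -m` and flag a series of samples that never goes down. The exact line format is assumed from typical `netstat -m` output, whether this matches how the dashboard computes its MBUF percentage is an assumption, and the leak heuristic is deliberately crude:

```python
import re

# Assumed shape of the clusters line in FreeBSD's `netstat -m` output:
# "<current>/<cache>/<total>/<max> mbuf clusters in use (current/cache/total/max)"
CLUSTER_RE = re.compile(r"(\d+)/(\d+)/(\d+)/(\d+) mbuf clusters in use")


def cluster_usage_pct(netstat_m_output: str) -> float:
    """Percentage of the mbuf cluster limit that is currently allocated."""
    m = CLUSTER_RE.search(netstat_m_output)
    if m is None:
        raise ValueError("no 'mbuf clusters in use' line found")
    _current, _cache, total, maximum = map(int, m.groups())
    return 100.0 * total / maximum


def looks_like_leak(samples: list, min_samples: int = 5) -> bool:
    """Crude heuristic: usage never drops and ends higher than it started."""
    if len(samples) < min_samples:
        return False
    never_drops = all(b >= a for a, b in zip(samples, samples[1:]))
    return never_drops and samples[-1] > samples[0]


if __name__ == "__main__":
    line = "250/250/500/1000 mbuf clusters in use (current/cache/total/max)"
    print(cluster_usage_pct(line))  # 50.0
    print(looks_like_leak([10.0, 12.5, 12.5, 18.0, 25.0]))  # True
```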

So Zenarmor is currently more or less unusable. It used to be rock solid for me a year ago, not sure what has changed that led to it becoming the major issue of my network connectivity. Of course I have long-running support cases open with them, but it's not really moving forward either direction.
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: franco on April 06, 2023, 09:25:43 pm
The patch (including another mbuf leak fix) went into FreeBSD source tree now. I've built a version based on this patch:

# opnsense-update -zkr 23.1.5-netmap

It hasn't been released yet but will be released in either 23.1.6 or 23.1.7 depending on which will do a base/kernel patch round.


Cheers,
Franco
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: Phiolin on April 08, 2023, 09:40:23 pm
Thanks Franco. :)
I'll give that a try in emulated mode tomorrow to see if that fixes my mbuf issue.
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: Phiolin on April 10, 2023, 09:24:30 pm
I no longer see the mbuf leak with this version. Will keep running this one to see if there's any further issues.
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: franco on April 11, 2023, 10:10:03 am
Nice to hear. The current state via FreeBSD source tree has been merged to the stable branch. Almost there. :)


Cheers,
Franco
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: almodovaris on April 11, 2023, 07:14:12 pm
Now it gives me stable/23.1-n250429-c163ff33fa8.
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: franco on April 11, 2023, 08:56:58 pm
That would be 23.1.5-netmap2 I uploaded this morning for internal testing ;)

But that is the final state on the stable branch, so it's best tested by anyone who can:

# opnsense-update -zkr 23.1.5-netmap2 && opnsense-shell reboot
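To confirm after the reboot which build is actually running, a small Python sketch can pull the build ident out of `uname` output. It assumes the ident looks like the two quoted in this thread (`netmap-n250399-995512c8607`, `stable/23.1-n250429-c163ff33fa8`); the function name is hypothetical:

```python
import os
import re

# Build idents as they appear in this thread, e.g.
# "netmap-n250399-995512c8607" or "stable/23.1-n250429-c163ff33fa8".
IDENT_RE = re.compile(r"(\S+)-n(\d+)-([0-9a-f]+)")


def kernel_ident(uname_text: str):
    """Return (branch, commit_count, short_hash), or None if no ident is found."""
    m = IDENT_RE.search(uname_text)
    if m is None:
        return None
    return m.group(1), int(m.group(2)), m.group(3)


if __name__ == "__main__":
    # On FreeBSD, os.uname().version holds the same text `uname -v` prints.
    print(kernel_ident(os.uname().version))
```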


Cheers,
Franco
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: dfw3xam1n3r on April 13, 2023, 05:30:41 pm
Installed, working for now. I'll have to wait 2-3 days to see if it drops again or not.
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: donatom3 on April 14, 2023, 05:01:28 pm
I'll have to restart later today, but all my VLAN interfaces are having problems with this update, using both emulated and native mode in Zenarmor, with IPv4. IPv6 seems to be fine.

Restarting individual services didn't help; only stopping Zenarmor did. So I will restart the whole router later tonight when I can.
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: donatom3 on April 16, 2023, 06:15:01 am
So VLANs are working now, but I've had two complete losses of internet today since switching to Zenarmor emulated mode with the new 23.1.5-netmap2. I wasn't home, so I couldn't tell whether Zenarmor or Suricata was the issue.
Before the netmap upgrade, I would only lose internet on the inside, and I could use Zenarmor's cloud portal to restart it on my router and bring internet back up, or connect via WireGuard and get into my router to restart it. Now with the new netmap I'm losing remote control entirely: I can't use WireGuard or Zenarmor's page to restart it.
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: dfw3xam1n3r on April 17, 2023, 05:50:21 pm
I'm up four days now after applying the patch (correctly, don't think I did the first time) and using netmap emulator for ZenArmor config. This is the longest I've been running since upgrading to 23.1. Things have been, dare I say it, stable? Fingers crossed. :)
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: franco on April 17, 2023, 06:00:05 pm
We are looking for internal approval between participating parties on the last published state for 23.1.6. Overall it looks like we are better off with the patches than without and we likely won't get broader feedback otherwise. If not I expect 23.1.7 to have it in a few weeks.


Cheers,
Franco
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: dfw3xam1n3r on April 18, 2023, 05:23:35 pm
Thanks Franco.

Hate to report though that with all of the right things in place, I still dropped early this morning and had to restart ZenArmor to resolve it. It was a longer uptime duration this time, but it still wound up dropping packets on LAN. :/
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: franco on April 19, 2023, 03:23:44 pm
Not much to be sorry about. The current state of the project is better than what we had before so either we ditch it or move forward. We are going to do the latter. ;)


Cheers,
Franco
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: franco on April 20, 2023, 12:17:59 pm
I've unstickied this and removed the remaining test kernels since 23.1.6 has all of it. Feel free to respond here with enough information to continue discussion (log files, setup, previous experience).


Cheers,
Franco
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: dfw3xam1n3r on April 23, 2023, 04:29:39 pm
[EDIT: Franco: Just realized you said it will be in 23.1.7, not .6. Nevermind!]

My connection still dropped. I don't know why. I was away when it happened and was able to bring it back up remotely. Restarting ZenArmor didn't help though, only a reboot. Currently have ZenArmor set to monitor only for now.

Just for my confirmation, I've upgraded to 23.1.6, and re: ZenArmor (when I take it out of monitoring only) I'm supposed to be using the emulated netmap driver not the native correct? Do I need to do anything with IDS/IPS/Suricata since I'm running that as well?
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: beki on April 27, 2023, 09:48:42 am
Hi @dfw3xam1n3r
Did you test Zenarmor in Routed mode (L3 Mode, Reporting and Blocking available) with the emulated netmap driver on OPNsense 23.1.6, and did you have any issues? Some users reported that their problems were resolved with this configuration.
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: edsai on May 03, 2023, 06:30:12 pm
I tested with emulated mode, and within a few days the same thing happened. Interestingly, the failure mode was the same, but with native mode a restart of all services usually brings everything back up. With emulated mode, one of my VLANs didn't come back, and the console showed errors about the emulated netmap adapter interface being unavailable. A reboot brought everything back up.
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: dfw3xam1n3r on May 03, 2023, 07:49:40 pm
Quote from: beki on April 27, 2023, 09:48:42 am
Hi @dfw3xam1n3r
Did you test Zenarmor with Routed (L3 Mode, Reporting and Blocking available) with emulated netmap driver on OPNsense 23.1.6 and have any issues? Some users reported that their problems are resolved with this configuration.

Yeah, I did, and the same thing happened, so I'm staying in monitoring mode until 23.1.7 comes out.
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: franco on May 04, 2023, 01:50:25 pm
As said elsewhere, whatever you expect for 23.1.7 is not in 23.1.7 because "it" does not exist.


Cheers,
Franco
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: dfw3xam1n3r on May 04, 2023, 09:03:23 pm
Quote from: franco on April 17, 2023, 06:00:05 pm
We are looking for internal approval between participating parties on the last published state for 23.1.6. Overall it looks like we are better off with the patches than without and we likely won't get broader feedback otherwise. If not I expect 23.1.7 to have it in a few weeks.

Based on this comment, I was thinking the patches for netmap issues were going to be a part of the 23.1.7 release. Guess that's not the case.
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: franco on May 04, 2023, 09:33:21 pm
Because they were part of 23.1.6 ;)


Cheers,
Franco
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: dfw3xam1n3r on May 04, 2023, 09:39:42 pm
Ohhh. Geez man, I'm slow. Hmm, well I'm wondering then why I'm still getting these drop issues when I switch ZenArmor out of monitoring-only mode. :\ Oh well, thanks for the help.
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: franco on May 05, 2023, 09:30:25 am
No worries. 23.1.6 seems to work well enough as a base for future improvements. It may be easier to pinpoint cases such as yours now. We are currently collecting feedback to see if another netmap improvement round is viable...


Cheers,
Franco
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: dfw3xam1n3r on May 08, 2023, 03:33:30 pm
Understood.

Question on ZenArmor config: Do I need to use the Emulated driver for this or can I use Native?
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: franco on May 08, 2023, 09:28:22 pm
This fix only applies to emulated mode. Native mode wasn't part of this work.


Cheers,
Franco
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: Arien on May 14, 2023, 11:45:18 am
Hi Franco.

Just a question related to netmap and its improvement.
Is anybody working on making netmap work with PPPoE, and on fixing Suricata IPS (netmap mode), which doesn't work there at all?

The way you told us "Zenarmor and OPNsense have been working with Klara to bring netmap improvements to FreeBSD" made me think it's possible that someone is working on it.

https://forum.opnsense.org/index.php?topic=19740.msg92114#msg92114

Thanks for your time.
Cheers.
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: almodovaris on May 18, 2023, 11:05:47 pm
Some kernel work is still going on, but I don't know precisely what.
Title: Re: [CALL FOR TESTING] Netmap generic mode queue stall fixes
Post by: franco on May 19, 2023, 09:42:29 pm
The PPPoE/mpd5/Netgraph implementation that we currently use is likely never going to support Netmap, for a number of technical reasons.

We previously worked on tun(4), which would have enabled the native PPPoE software to benefit from Netmap. Here too, however, the way tun(4) uses pseudo headers instead of link-layer headers poses a fundamental issue for Netmap integration: these packets cannot be shared as pure Ethernet packets, because they never are Ethernet packets. The tun(4) patch approach was therefore abandoned during the project.
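To illustrate the header mismatch, here is a rough C sketch. It is not code from netmap or from the abandoned tun(4) patch. It assumes tun(4) runs in its "multi-af" mode (TUNSIFHEAD), where each packet is preceded by a four-byte address family in network byte order, while a consumer expecting Ethernet wants a 14-byte header with the EtherType at bytes 12-13. All helper names are hypothetical, and AF_INET/AF_INET6 values differ between operating systems, so the sketch uses the host's macros. Presenting such a packet as Ethernet means inventing a header, including MAC addresses that never existed:

```c
#include <stdint.h>
#include <string.h>     /* memcpy, memset, size_t */
#include <sys/socket.h> /* AF_INET, AF_INET6 (values are platform-specific) */

#define ETH_FRAME_HDR_LEN 14 /* dst MAC (6) + src MAC (6) + EtherType (2) */
#define TUN_AF_PREFIX_LEN 4  /* assumed tun(4) multi-af prefix length */
#define ETYPE_IPV4 0x0800
#define ETYPE_IPV6 0x86DD

/* Decode the assumed 4-byte, network-byte-order address family prefix. */
uint32_t tun_prefix_af(const uint8_t *pkt) {
    return ((uint32_t)pkt[0] << 24) | ((uint32_t)pkt[1] << 16) |
           ((uint32_t)pkt[2] << 8) | (uint32_t)pkt[3];
}

/* Read the EtherType field of an Ethernet frame (bytes 12-13). */
uint16_t frame_ethertype(const uint8_t *frame) {
    return (uint16_t)(((unsigned)frame[12] << 8) | frame[13]);
}

/* Hypothetical helper: wrap a tun packet in a synthetic Ethernet header.
 * Returns the frame length, or 0 if the packet is too short, the address
 * family is not IPv4/IPv6, or the output buffer is too small. */
size_t tun_to_synthetic_frame(const uint8_t *pkt, size_t len,
                              uint8_t *out, size_t outlen) {
    if (len < TUN_AF_PREFIX_LEN)
        return 0;
    uint32_t af = tun_prefix_af(pkt);
    uint16_t etype;
    if (af == (uint32_t)AF_INET)
        etype = ETYPE_IPV4;
    else if (af == (uint32_t)AF_INET6)
        etype = ETYPE_IPV6;
    else
        return 0;
    size_t payload = len - TUN_AF_PREFIX_LEN;
    if (outlen < ETH_FRAME_HDR_LEN + payload)
        return 0;
    /* A tun interface has no MAC addresses, so both are made up (zeroed). */
    memset(out, 0, 12);
    out[12] = (uint8_t)(etype >> 8);
    out[13] = (uint8_t)(etype & 0xff);
    memcpy(out + ETH_FRAME_HDR_LEN, pkt + TUN_AF_PREFIX_LEN, payload);
    return ETH_FRAME_HDR_LEN + payload;
}
```

This only shows the shape of the problem: the link-layer fields a netmap consumer would see are fabricated. The actual reasons the patch was dropped are the ones given above.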


Cheers,
Franco