[CALL FOR TESTING] Netmap generic mode queue stall fixes

Started by franco, January 27, 2023, 11:38:45 AM

Hi kintaroju,

Did you disable Hardware VLAN Filtering?
https://www.zenarmor.com/docs/guides/disabling-hardware-offloading#disabling-hardware-offloading-on-opnsense

You may also send a bug report to the Zenarmor team for further investigation of your issue.

I have the exact same issue, and hardware offloading has been disabled on my system from the start. Even with the new netmap kernel, my VLANs stop responding after a few days and require a reboot. Zenarmor only works in passive mode for me as well.

I also have Intel I226-V NICs.

Quote from: beki on February 23, 2023, 01:46:52 PM
Hi kintaroju,

Did you disable Hardware VLAN Filtering?
https://www.zenarmor.com/docs/guides/disabling-hardware-offloading#disabling-hardware-offloading-on-opnsense

You may also send a bug report to the zenarmor team for further investigation of your issue.

Today I decided to see if there was a firmware upgrade for my NIC. There wasn't, but on an odd note, I let the system fully power off and back on, and now it works. The output is below:

# dmesg | grep generic_netmap_register
204.167012 [ 320] generic_netmap_register   Emulated adapter for igb1_vlan10 activated
204.367396 [ 320] generic_netmap_register   Emulated adapter for igb1_vlan16 activated
206.994495 [ 320] generic_netmap_register   Emulated adapter for igb1_vlan18 activated
207.424808 [ 320] generic_netmap_register   Emulated adapter for igb1_vlan4 activated


So that little exercise reduced the number of emulated adapters from 20+ to just these 4. Not sure what the cause is here, lol.
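One way to see which interfaces netmap attached an emulated adapter to is to pull the interface names out of the `generic_netmap_register` lines shown above. A minimal sketch, assuming the message format matches those lines (interface name second to last, before "activated"); `list_emulated_adapters` is a hypothetical helper name. If the kernel message buffer still holds lines from an earlier boot, the list may include interfaces from that boot as well.

```shell
#!/bin/sh
# list_emulated_adapters (hypothetical helper): read kernel messages on stdin
# and print each interface that received a netmap emulated (generic) adapter,
# sorted with duplicates removed. The interface name is taken from the field
# just before the trailing word "activated".
list_emulated_adapters() {
    grep 'generic_netmap_register' |
        grep 'Emulated adapter for' |
        awk '{ print $(NF-1) }' |
        LC_ALL=C sort -u
}

# Usage on the firewall:
#   dmesg | list_emulated_adapters           # one interface per line
#   dmesg | list_emulated_adapters | wc -l   # number of distinct interfaces
```

Fed the four lines quoted above, this prints igb1_vlan10, igb1_vlan16, igb1_vlan18 and igb1_vlan4.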

Quote from: kintaroju on February 23, 2023, 06:16:13 PM
Today I decided to see if there was a firmware upgrade for my NIC. There wasn't, but on an odd note, I let the system fully power off and back on, and now it works.

When I removed all VLANs from my config to see whether they were causing my netmap issues, it took a full reboot for the VLAN interfaces to disappear from Zenarmor. Perhaps additions behave the same way. The VLANs were removed from the OPNsense config right away, but Zenarmor kept seeing the VLAN interfaces until I rebooted.

The latest kernel fixes another queue stall problem:

# opnsense-update -zkr 23.1.1-netmap2 && opnsense-shell reboot


Cheers,
Franco

Hi Franco, thanks for your persistent work on this issue. I just upgraded my test router, and things mostly work, but the UI shows this alert:

There were error(s) loading the rules: /tmp/rules.debug:63: cannot define table bogonsv6: Cannot allocate memory - The line in question reads [63]: table <bogonsv6> persist file "/usr/local/etc/bogonsv6"

Not sure if it is related to the kernel upgrade, but I don't recall seeing this error message before.

Also thanks again for your hard work!!

Unrelated issue. On Firewall: Aliases, check the indicator in the upper right corner; it is probably full.

If that's the case, go to Firewall: Settings: Advanced and increase "Firewall Maximum Table Entries" until all your alias-generated entries fit into memory.
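As a rough check of that arithmetic, the number of entries an alias table needs can be compared with pf's configured limit. A minimal sketch, assuming the alias source file lists one address or prefix per line, with `#` comments and blank lines ignored; `count_table_entries` is a hypothetical helper name. `pfctl -s memory` prints pf's hard limits, including `table-entries`, which applies to all tables combined.

```shell
#!/bin/sh
# count_table_entries (hypothetical helper): count the addresses/prefixes in
# an alias source file, assuming one entry per line and skipping blank lines
# and '#' comment lines.
count_table_entries() {
    grep -Ecv '^[[:space:]]*(#|$)' "$1"
}

# Usage on the firewall:
#   pfctl -s memory | grep table-entries          # configured hard limit
#   count_table_entries /usr/local/etc/bogonsv6   # entries this table needs
#   pfctl -t bogonsv6 -T show | wc -l             # entries actually loaded
```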


Cheers,
Franco

Quote from: franco on March 01, 2023, 07:49:45 AM
Unrelated issue, you can check upper right corner for Firewall: Aliases... the indicator should be full.

If that's the case go to Firewall: Settings: Advanced and increase "Firewall Maximum Table Entries" until all your alias-generated entries fit into the memory.


Cheers,
Franco

Seems to be the case, as it only indicates that 2% of the entries are used, lol. Thanks again.

That might be because it couldn't load the large batch, as it would be over 100% then ;)


Cheers,
Franco

Updated to the new kernel yesterday and switched to Zenarmor emulated driver mode.
Unfortunately, less than 24 hours later my Protectli VP2410 running OPNsense became completely unreachable over the network: not only the Zenarmor-protected interfaces, but also my separate interface on a management VLAN. I had to do a hard reboot to get it back online, as I currently don't have serial console access at the location where it is installed.
For what it's worth, I was still able to get an IP via DHCP on the management interface, but couldn't access any services (web GUI, SSH, etc.).
Since DHCP runs over UDP while the web GUI and SSH use TCP, that is possibly a hint that mainly TCP connections were affected.
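To test the TCP theory the next time the box drops off the network, one option is to probe the TCP service ports from another host on the management VLAN while DHCP still answers. A minimal sketch, assuming a netcat (`nc`) that supports `-z` (connect without sending data) and `-w` (timeout in seconds), as the FreeBSD and OpenBSD versions do; the helper name `probe_tcp` and the address 192.0.2.1 are placeholders.

```shell
#!/bin/sh
# probe_tcp (hypothetical helper): try a TCP connection to each given port
# on a host and report whether it was accepted within 3 seconds.
probe_tcp() {
    host="$1"
    shift
    for port in "$@"; do
        if nc -z -w 3 "$host" "$port" >/dev/null 2>&1; then
            echo "tcp/$port open"
        else
            echo "tcp/$port failed"
        fi
    done
}

# Usage from another machine (192.0.2.1 stands in for the management IP;
# 22 is SSH, 443 is the HTTPS web GUI):
#   probe_tcp 192.0.2.1 22 443
```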

At least in native driver mode, the Zenarmor worker just crashes every 2-3 days and restarts automatically, so I only get a connection drop lasting a couple of seconds.
In emulated mode with the new kernel, it doesn't last more than a few hours for me.

This is on a Protectli VP2410 with igb network interfaces, no virtualization, OPNsense installed directly on the hardware.

Hi Franco, I just noticed that 23.1.2 was released and upgraded. Unfortunately, Zenarmor isn't starting again now :(. Do you by chance have a new kernel that includes the updated netmap changes?

@kintaroju: You can use an older kernel without any issue, but I'll prep a new one tomorrow. The bridge support for netmap was updated so I need to adjust the branch this is built on.

@Phiolin: thanks for the update! The generic patch still seems to be in flux, and I'm expecting a new version this week, but I'm not entirely sure that will happen, given how challenging the stalls are at the moment.


Cheers,
Franco

Quote from: franco on March 07, 2023, 08:13:38 PM
@kintaroju: You can use an older kernel without any issue, but I'll prep a new one tomorrow. The bridge support for netmap was updated so I need to adjust the branch this is built on.

@Phiolin: thanks for the update! the generic patch is still in flux it seems and I'm expecting a new version this week, but not entirely sure this will happen depending on the challenge of the stalls given at the moment.


Cheers,
Franco

@Franco, thanks for the quick update on this, appreciate it. I'll keep a watchful eye out for your new netmap kernel :D.

March 09, 2023, 06:21:18 PM by andre2000 (last edited March 09, 2023, 06:25:52 PM)
I am trying to update the kernel, but get the following error message:

opnsense-update -zkr 23.1.2-netmap2 && opnsense-shell reboot
Fetching kernel-23.1.2-netmap2-amd64.txz: .......[fetch: https://mirror.dns-root.de/opnsense/FreeBSD:13:amd64/snapshots/sets/kernel-23.1.2-netmap2-amd64.txz.sig: No address record] failed, no signature found


OPNsense is on 23.1.2. I switched the mirror and rebooted, but nothing changed.

EDIT: The error above isn't the one I got before; name resolution works fine. The actual error is this (no signature found):

Fetching kernel-23.1.2-netmap2-amd64.txz: ..[fetch: https://mirror.dns-root.de/opnsense/FreeBSD:13:amd64/snapshots/sets/kernel-23.1.2-netmap2-amd64.txz.sig: Not Found] failed, no signature found


Okay, interesting. On the first mirror the file was kernel-23.1.2-netmap2-amd64.txz, while the second seems to have an older version: kernel-23.1.2-netmap-amd64.txz.
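Since the two mirrors carried different builds, it may help to confirm that both the kernel set and its `.sig` file exist on the selected mirror before running `opnsense-update`. A minimal sketch using FreeBSD's `fetch -s`, which asks for the remote file size without downloading the file; `check_kernel_set` is a hypothetical helper, and it assumes `fetch` exits non-zero when the server answers Not Found.

```shell
#!/bin/sh
# check_kernel_set (hypothetical helper): report whether a kernel set and its
# detached signature are both present on a mirror, without downloading them.
# Returns 0 only if both files were found.
check_kernel_set() {
    base="$1"      # mirror "sets" directory URL
    set_file="$2"  # kernel set file name
    status=0
    for f in "$set_file" "$set_file.sig"; do
        if fetch -s "$base/$f" >/dev/null 2>&1; then
            echo "present: $f"
        else
            echo "missing: $f"
            status=1
        fi
    done
    return $status
}

# Usage on the firewall (mirror path taken from the error message above):
#   check_kernel_set https://mirror.dns-root.de/opnsense/FreeBSD:13:amd64/snapshots/sets \
#       kernel-23.1.2-netmap2-amd64.txz
```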