netmap_transmit error

Started by awptechnologies, February 23, 2025, 03:39:16 AM

February 23, 2025, 03:39:16 AM Last Edit: February 23, 2025, 03:42:37 AM by awptechnologies
I use intrusion detection (both IDS and IPS) on my LAN interface, bge0.

Under heavy load I get the error: netmap_transmit bge0 full hwcur 358 hwtail 24 qlen 333.

The three numbers change, and the errors usually appear in pairs.


Is this a bad thing or normal? Also, are there certain tunables I can adjust to fix these errors?
I have already tried dev.netmap.admode with all of its options (0, 1, 2); none seem to have any effect, other than 1 preventing intrusion detection from starting.
I also raised dev.netmap.buf_size from 2048 to 8192 and still get the error.

This is an 8-core system running in a VM on Proxmox. I use CPU affinity to dedicate 8 cores to OPNsense, and I also have vm.numa.disabled set to 0 so it can see the NUMA nodes, since cores 0-7 span two NUMA nodes on the host. The network card is passed through; it is a Broadcom NetXtreme.

I just want to know what tunables people are running to fix this issue and allow maximum throughput in OPNsense.

I also set:
net.isr.maxthreads=8
net.isr.bindthreads=1
net.inet.rss.enabled=1
dev.bge.1.msi=1
dev.bge.0.msi=1
kern.ipc.soacceptqueue=256 (up from the default of 128)
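For anyone wanting to persist these: several of them (net.isr.*, net.inet.rss.enabled, the per-device MSI knobs) are read at boot, not at runtime, so a sketch of the equivalent loader.conf fragment (using the values above; in OPNsense you would normally add these under System > Settings > Tunables rather than editing the file by hand):

```
# /boot/loader.conf fragment (sketch, the poster's values, not recommendations)
# net.isr.* and net.inet.rss.enabled are boot-time tunables on FreeBSD;
# kern.ipc.soacceptqueue can also be changed at runtime with sysctl(8).
net.isr.maxthreads=8
net.isr.bindthreads=1
net.inet.rss.enabled=1
dev.bge.0.msi=1
dev.bge.1.msi=1
kern.ipc.soacceptqueue=256
```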

I'm having the same issue.

I tried these; the errors seem less frequent, but the problem is not resolved.

Original values:
dev.netmap.buf_num: 163840
dev.netmap.ring_num: 200
dev.netmap.buf_size: 2048

New values:
sysctl dev.netmap.buf_num=200000
sysctl dev.netmap.ring_num=256
sysctl dev.netmap.buf_size=4096
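In case it helps anyone testing the same change, the current values can be listed in one command, and (my assumption, since netmap sizes its pools when a port is opened) the IDS engine has to be restarted before the new buffer settings take effect:

```
# List the current netmap tunables in one go.
sysctl dev.netmap.buf_num dev.netmap.ring_num dev.netmap.buf_size

# Restart the IDS service so netmap reopens the interface with the
# new settings (assumption: pools are sized at port-open time).
configctl ids restart
```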

Are you using hyperscan in intrusion detection?

Also, are these packets bypassing intrusion detection when the buffer is full? What is the actual reason these errors happen? Slow hardware? Bad settings?

Quote from: awptechnologies on February 24, 2025, 01:29:19 AM
Are you using hyperscan in intrusion detection?

Also, are these packets bypassing intrusion detection when the buffer is full? What is the actual reason these errors happen? Slow hardware? Bad settings?

I started to experience this on the latest update, or at least it's noticeably worse, causing my LAN interface to hang.

My hardware:
CPU: (4 cores, 1.50GHz)
RAM: 16GB (16947675136 bytes)
Cores: 4 (no Hyper-Threading)
NICs: Realtek Gigabit (re0 for WAN, re1 for LAN)
Current CPU Frequency: 1500MHz
Available Free Memory Pages: 2,356,511

I've tried these tweaks, incrementally increasing them and rebooting to test. Any high load with IPS/IDS enabled, whether using hyperscan, Aho-Corasick, or Aho-Corasick (Ken Steele variant), results in the LAN interface hanging.


THEN!!! I realised because I'm a dumb***.... when I re-imaged my FW, I forgot to reinstall the Realtek driver plugin :D

Not sure if the OP might be having the same or a similar issue with a missing NIC plugin?

I use a Broadcom NIC because it is built into my Dell R630. As far as I can tell, there is no plugin for the driver I have, which is bge; I think it must be included in FreeBSD by default.

I am also seeing this, but it only happens for me when I limit the CPU core boost speed for power savings. When I set it to full power this doesn't happen, so it seems to be related to insufficient CPU frequency. I have 16 cores dedicated and they are all running at 1 GHz.
ix1 full hwcur

March 03, 2025, 08:14:35 AM #6 Last Edit: March 03, 2025, 04:07:46 PM by franco
Yeah, it basically means the ring buffer fills up quickly because too many packets are coming in versus going out.


Cheers,
Franco

This is happening as well with ZA (no surprise).

It's indeed as Franco mentioned.
If too many packets arrive in a given time interval and the CPU is not able to empty the queue fast enough (by default the NIC's queue, usually 1024 entries), you will see this error. It is more of a notification telling you the queue is getting full; once a queue is full, tail drop happens.
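The tail-drop behaviour is easy to illustrate with a toy sketch (plain sh, nothing netmap-specific; the numbers are made up): a fixed-size queue receives 10 packets per tick while the consumer drains only 6, so once the queue fills, every further arrival in that tick is dropped at the tail.

```shell
# Toy tail-drop illustration: producer outpaces consumer on a bounded queue.
QUEUE_MAX=16; qlen=0; dropped=0
for tick in 1 2 3 4 5; do
    arriving=10
    while [ "$arriving" -gt 0 ]; do
        if [ "$qlen" -lt "$QUEUE_MAX" ]; then
            qlen=$((qlen + 1))          # slot free: enqueue the packet
        else
            dropped=$((dropped + 1))    # queue full: tail drop
        fi
        arriving=$((arriving - 1))
    done
    drained=6                           # the "CPU" empties 6 slots per tick
    [ "$qlen" -lt "$drained" ] && drained=$qlen
    qlen=$((qlen - drained))
    echo "tick $tick: qlen=$qlen dropped=$dropped"
done
```

Drops start only once the backlog reaches QUEUE_MAX, and from then on the queue hovers near full; that steady near-full state is exactly what the "full hwcur ... hwtail ... qlen ..." notification is reporting.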

What is interesting is that this started happening after the upgrade to 25.1; prior to that upgrade it was not happening.
I am not sure if netmap had any changes.
The only thing that changed was the FreeBSD version, but I'm not sure whether that's related.

Regards,
S.
Networking is love. You may hate it, but in the end, you always come back to it.

OPNSense HW
APU2D2 - deceased
N5105 - i226-V | Patriot 2x8G 3200 DDR4 | L 790 512G - VM HA(SOON)
N100   - i226-V | Crucial 16G  4800 DDR5 | S 980 500G - PROD

The answer is relatively simple: we no longer carry this patch https://github.com/opnsense/src/commit/36fb07bfef7d38906403a28fb2c613712eb6baa4 because it's not in FreeBSD. Functionally it's the same as before, with or without the message.

Quote: Also mutes a spammy message. Bravely going where no man has gone before. :)

hahaha this made my day


Personally, I like to see that message, because now I have an exact timestamp when I notice a performance hit on the network. I was always aware of the potential limitation when using ZA + netmap, but now, when I see the message with a timestamp during an issue, I am 100% sure what caused it.

For me this is a QoL improvement ;)

Regards,
S.

In the early days I think this message wasn't even rate-limited, but I could be wrong. It was pretty annoying in the beginning.


Cheers,
Franco