mbufs in use continues to grow until router stops passing traffic

Started by trinitrotoluene, October 28, 2022, 07:38:44 PM

We have been experiencing an issue where the number of mbufs in use keeps growing until a console error starts being displayed and the router stops passing traffic.

Console error:
[zone: mbuf] kern.ipc.nmbufs limit reached

Version:
OPNsense 22.7.6-amd64
FreeBSD 13.1-RELEASE-p2
OpenSSL 1.1.1q 5 Jul 2022

Hardware:
VMware vSphere VM - Hardware version 18. vmxnet3 adapters.

I have seen the forum posts discussing mbuf exhaustion, as well as the pfSense documentation about kernel tuning to increase the mbuf cluster maximum (kern.ipc.nmbclusters).
https://docs.netgate.com/pfsense/en/latest/hardware/tune.html#mbuf-exhaustion

However, it seems we are not seeing mbuf cluster exhaustion as described in those docs, nor are we using any of the hardware described in the mbuf tuning posts we could find. Mbuf cluster usage stays relatively low, while the mbufs-in-use counter continues to grow.
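
For reference, a quick way to line the two limits up against current usage (assuming the stock FreeBSD sysctl names; kern.ipc.nmbufs is the limit from the console error above, kern.ipc.nmbclusters is the one the pfSense doc tunes):

# sysctl kern.ipc.nmbufs kern.ipc.nmbclusters
# netstat -m | head -2

The first command prints the hard limits for the two zones, and the second prints the current/cache/total counters for each.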

I can reboot the VM and the mbuf count starts much lower, but throughout the day the current and total amounts consistently grow. Depending on the RAM assigned to the VM, we may get a few days before it falls over and another reboot is needed to clear it up.

# netstat -m
1263244/3206/1266450 mbufs in use (current/cache/total)
2091/4005/6096/250999 mbuf clusters in use (current/cache/total/max)
2/1522 mbuf+clusters out of packet secondary zone in use (current/cache)
12391/3103/15494/125499 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/37185 9k jumbo clusters in use (current/cache/total/max)
0/0/0/20916 16k jumbo clusters in use (current/cache/total/max)
369564K/21223K/390787K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0 sendfile syscalls
0 sendfile syscalls completed without I/O request
0 requests for I/O initiated by sendfile
0 pages read by sendfile as part of a request
0 pages were valid at time of a sendfile request
0 pages were valid and substituted to bogus page
0 pages were requested for read ahead by applications
0 pages were read ahead by sendfile
0 times sendfile encountered an already busy page
0 requests for sfbufs denied
0 requests for sfbufs delayed
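
To see which UMA zone is actually the one growing, the per-zone stats give a bit more detail than netstat -m (assuming the stock vmstat output; zone names can differ slightly between releases):

# vmstat -z | grep -Ei 'item|mbuf'

That keeps the header row plus the mbuf, mbuf_cluster and mbuf_jumbo_* zones, including their fail counters.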


As you can see in the screenshot below, mbuf clusters are chugging along happily here, but as soon as we see requests for mbufs delayed, it's toast.


Upon further research, I've come across a FreeBSD bug report and a FreeBSD forum post describing what looks to be an identical issue.

https://forums.freebsd.org/threads/mbufs-leaks.78480/
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=254453

What I have been unable to determine so far is whether OPNsense builds its kernel with the following options:

makeoptions WITH_EXTRA_TCP_STACKS=1
options TCPHPTS
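
One way to check this from a running system, without digging through the kernel config, is to list the TCP stacks the kernel actually exposes (assuming the TCP function-block sysctls present in FreeBSD 13; RACK/BBR only show up here if those options were compiled in or the corresponding modules are loaded):

# sysctl net.inet.tcp.functions_available net.inet.tcp.functions_default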


What can I do to resolve this issue? I've only seen kernel tunables that address the mbuf clusters, not the mbufs themselves. Even if I can tune it, this looks like an issue that would eventually crop up again, or consume a great deal of resources until the system itself becomes unusable.
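
As a stopgap, the limit named in the console error can presumably be raised the same way the cluster limit is, via System > Settings > Tunables or a loader.conf entry (the value below is only an illustration; if mbufs are genuinely leaking this just delays the exhaustion):

# boot-time tunable, takes effect after a reboot
kern.ipc.nmbufs="4000000"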

We have set the tunables as mentioned here:
https://docs.netgate.com/pfsense/en/latest/hardware/tune.html#vmware-vmx-4-interfaces

However, this only seems to have slowed, not eliminated, the mbuf growth.


# netstat -m | grep "mbufs in use" && date
66334/3521/69855 mbufs in use (current/cache/total)
Fri Oct 28 21:23:56 MDT 2022
# netstat -m | grep "mbufs in use" && date
122059/5711/127770 mbufs in use (current/cache/total)
Sat Oct 29 07:03:14 MDT 2022
# netstat -m | grep "mbufs in use" && date
146075/3025/149100 mbufs in use (current/cache/total)
Sat Oct 29 14:44:08 MDT 2022
# netstat -m | grep "mbufs in use" && date
146063/3037/149100 mbufs in use (current/cache/total)
Sat Oct 29 14:48:59 MDT 2022
# netstat -m | grep "mbufs in use" && date
314413/3602/318015 mbufs in use (current/cache/total)
Mon Oct 31 07:58:36 MDT 2022
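
Rather than sampling by hand, a crontab entry along these lines could log the counter on a schedule so the growth can be lined up against traffic patterns (a sketch; the interval and log path are arbitrary):

# /etc/crontab - log the mbuf counter every 15 minutes
*/15    *    *    *    *    root    (date; netstat -m | head -1) >> /var/log/mbuf-usage.log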

I'm also seeing a spew of these errors in the console.

I suspect there is some config on these firewalls causing this issue. I've rebuilt one of the six firewalls, and while I've seen the mbufs grow after a reboot, they seem to stabilize at a far lower level than on the problem firewalls.
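
One way to narrow down what differs would be to dump the relevant sysctl branches on the rebuilt box and on one of the problem boxes, then diff the two files (a rough sketch; the branches and file names are just placeholders):

# sysctl kern.ipc net.inet.tcp > /tmp/tunables-$(hostname -s).txt
# diff /tmp/tunables-fw1.txt /tmp/tunables-fw2.txt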

I still don't have a good explanation for this issue unfortunately. If you've found this post sometime in the future, dear reader, I'm sorry.

We do have reports of atypical weirdness with memory allocation on NUMA hardware. I'm not sure about VMware/vmxnet specifically, but it might be worth trying a different NIC driver.


Cheers,
Franco

Even after rebuilding I was seeing the same mbuf growth as before.

That's interesting @franco. I am very hesitant to replace my vmxnet3 adapters with E1000 or another type, as a paravirtualized NIC will always outperform an emulated one.

However, I did have 4 vCPUs assigned, which with my configuration resulted in 2 vNUMA nodes being created. I adjusted down to 2 vCPUs with 2 cores per socket, resulting in a single vNUMA node being presented. This has changed the number of mbufs in use directly after a reboot; I'll keep an eye on it and see whether this eliminates the mbuf growth.
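
For what it's worth, the NUMA layout the guest actually ends up with can also be checked from inside FreeBSD (assuming the stock sysctl; a value of 1 means a single memory domain):

# sysctl vm.ndomains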

November 10, 2022, 08:14:43 PM #5 Last Edit: November 10, 2022, 08:16:58 PM by trinitrotoluene
Another interesting tidbit: when using only two NICs (WAN/LAN) I see very low mbuf usage and no sign of mbuf growth. However, on my devices with three NICs, I immediately see roughly 10x the mbuf usage after a fresh reboot, and I also see the mbuf growth problem described above.

Quote from: trinitrotoluene on November 10, 2022, 06:50:55 PM
However, I did have 4 vCPUs assigned, which with my configuration resulted in 2 vNUMA nodes being created. I adjusted down to 2 vCPUs with 2 cores per socket, resulting in a single vNUMA node being presented. This has changed the number of mbufs in use directly after a reboot; I'll keep an eye on it and see whether this eliminates the mbuf growth.
No difference, and possibly worse than before. At this point I'm out of ideas. I'll try a pfSense build and see if it behaves differently. As mentioned, I didn't try other adapters - if this won't work with vmxnet3 adapters, the consequences of switching to an emulated adapter would likely be worse for us.

I'm all ears for trying something else with OPNsense though; I really don't want to switch to pfSense.