MBUF grows and grows and grows ...

Started by BSAfH42, March 14, 2023, 02:36:18 PM

Hi,

since upgrading to 23.1.x (currently 23.1.3), the MBUF usage grows and grows and grows ...
... until there is no connectivity on the LAN interface anymore (ping returns "No buffer space available") and I have to reboot the firewall.

This happens 1-3 times a day, which is rather bad when it hits during, e.g., a video conference call.

I have a 6-port Intel card; 3 of the interfaces are in use.
I have already set
kern.ipc.nmbclusters="1000000"
according to https://docs.netgate.com/pfsense/en/latest/hardware/tune.html#intel-igb-4-and-em-4-cards
but this only increases the time until the next reboot; it does not solve the issue.
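
Why a higher limit only postpones the outage can be seen with a rough estimate: if cluster usage climbs at a steady rate, the time left is the remaining headroom divided by that rate. The sketch below is a minimal illustration of that arithmetic. The function name and the sample readings are hypothetical; they are not measurements from this firewall.

```python
def seconds_until_exhaustion(samples, max_clusters):
    """Estimate the seconds left until max_clusters is reached.

    samples: list of (timestamp_seconds, clusters_in_use), oldest first.
    Uses the slope between the first and last sample (a crude linear model).
    Returns None when usage is flat or shrinking.
    """
    (t0, c0), (t1, c1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("need at least two samples with increasing timestamps")
    rate = (c1 - c0) / (t1 - t0)  # clusters per second
    if rate <= 0:
        return None
    return (max_clusters - c1) / rate


# Hypothetical readings: 10000 clusters at t=0, 46000 clusters one hour later.
HYPOTHETICAL_SAMPLES = [(0, 10000), (3600, 46000)]

if __name__ == "__main__":
    # Compare a made-up smaller limit with the 1000000 set in this post.
    for limit in (500000, 1000000):
        left = seconds_until_exhaustion(HYPOTHETICAL_SAMPLES, limit)
        print(f"limit {limit}: about {left / 3600:.1f} h until exhaustion")
```

With a constant leak, doubling the limit roughly doubles the time to failure, and the growth itself is unchanged.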



root@OPNsense:~ # netstat -m
390898/8267/399165 mbufs in use (current/cache/total)
169898/2908/172806/1000000 mbuf clusters in use (current/cache/total/max)
312/2823 mbuf+clusters out of packet secondary zone in use (current/cache)
12990/1172/14162/504409 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/149454 9k jumbo clusters in use (current/cache/total/max)
0/0/0/84068 16k jumbo clusters in use (current/cache/total/max)
489528K/12570K/502099K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0 sendfile syscalls
0 sendfile syscalls completed without I/O request
0 requests for I/O initiated by sendfile
0 pages read by sendfile as part of a request
0 pages were valid at time of a sendfile request
0 pages were valid and substituted to bogus page
0 pages were requested for read ahead by applications
0 pages were read ahead by sendfile
0 times sendfile encountered an already busy page
0 requests for sfbufs denied
0 requests for sfbufs delayed
root@OPNsense:~ #
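
To follow the growth over time, the "mbuf clusters in use" line can be extracted from the `netstat -m` output and turned into a percentage of the configured maximum. This is a minimal sketch, assuming the line format shown above. The function name is hypothetical, and the percentage is computed from the "current" field, which may differ from what the dashboard widget uses.

```python
import re


def parse_mbuf_clusters(netstat_output):
    """Extract current/cache/total/max from the 'mbuf clusters in use' line."""
    pattern = re.compile(
        r"^(\d+)/(\d+)/(\d+)/(\d+) mbuf clusters in use", re.MULTILINE
    )
    match = pattern.search(netstat_output)
    if match is None:
        raise ValueError("no 'mbuf clusters in use' line found")
    current, cache, total, maximum = (int(g) for g in match.groups())
    return {
        "current": current,
        "cache": cache,
        "total": total,
        "max": maximum,
        "percent_used": 100.0 * current / maximum,
    }


# Two lines copied from the netstat -m output above.
SAMPLE = """\
390898/8267/399165 mbufs in use (current/cache/total)
169898/2908/172806/1000000 mbuf clusters in use (current/cache/total/max)
"""

if __name__ == "__main__":
    stats = parse_mbuf_clusters(SAMPLE)
    print(f"{stats['current']}/{stats['max']} clusters in use "
          f"({stats['percent_used']:.1f} %)")
```

Logging this value every few minutes would show whether the usage levels off or keeps climbing.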


hw-probe says:

root@OPNsense:~ # less /root/HW_PROBE/LATEST/hw.info/
devices  host     logs/
root@OPNsense:~ # less /root/HW_PROBE/LATEST/hw.info/devices
pci:8086-0284-8086-7270;06-01-00;detected;bridge;isab;Intel Corporation;Comet Lake PCH-LP LPC Premium Controller/eSPI Controller
pci:8086-02a3-8086-7270;0c-05-00;detected;smbus;ichsmb;Intel Corporation;Comet Lake PCH-LP SMBus Host Controller
pci:8086-02a4-8086-7270;0c-80-00;detected;serial bus controller;;Intel Corporation;Comet Lake SPI (flash) Controller
pci:8086-02b0-8086-7270;06-04-00;detected;bridge;pcib;Intel Corporation;Comet Lake PCI Express Root Port
pci:8086-02b1-8086-7270;06-04-00;works;bridge;pcib;Intel Corporation;Comet Lake PCI Express Root Port
pci:8086-02bc-8086-7270;06-04-00;detected;bridge;pcib;Intel Corporation;Comet Lake PCI Express Root Port
pci:8086-02bd-8086-7270;06-04-00;detected;bridge;pcib;Intel Corporation;
pci:8086-02be-8086-7270;06-04-00;detected;bridge;pcib;Intel Corporation;
pci:8086-02bf-8086-7270;06-04-00;detected;bridge;pcib;Intel Corporation;
pci:8086-02c8-8086-7270;04-03-00;detected;sound;hdac;Intel Corporation;Comet Lake PCH-LP cAVS
pci:8086-02d3-8086-7270;01-06-01;works;storage;ahci;Intel Corporation;Comet Lake SATA AHCI Controller
pci:8086-02e0-8086-7270;07-80-00;failed;communication controller;;Intel Corporation;Comet Lake Management Engine Interface
pci:8086-02ed-8086-7270;0c-03-30;detected;usb controller;xhci;Intel Corporation;Comet Lake PCH-LP USB 3.1 xHCI Host Controller
pci:8086-02ef-8086-7270;05-00-00;detected;ram memory;;Intel Corporation;Comet Lake PCH-LP Shared SRAM
pci:8086-02f9-8086-7270;11-80-00;detected;signal processing;pchtherm;Intel Corporation;Comet Lake Thermal Subsytem
pci:8086-1533;02-00-00;works;network;igb;Intel Corporation;I210 Gigabit Network Connection
pci:8086-1533;02-00-00;works;network;igb;Intel Corporation;I210 Gigabit Network Connection
pci:8086-1533;02-00-00;works;network;igb;Intel Corporation;I210 Gigabit Network Connection
pci:8086-1533;02-00-00;works;network;igb;Intel Corporation;I210 Gigabit Network Connection
pci:8086-1533;02-00-00;works;network;igb;Intel Corporation;I210 Gigabit Network Connection
pci:8086-1533;02-00-00;works;network;igb;Intel Corporation;I210 Gigabit Network Connection
pci:8086-1911-8086-7270;08-80-00;detected;system peripheral;;Intel Corporation;Xeon E3-1200 v5/v6 / E3-1500 v5 / 6th/7th/8th Gen Core Processor Gaussian Mixture Model
pci:8086-9b41-8086-2212;03-00-00;detected;graphics card;vgapci;Intel Corporation;CometLake-U GT2 [UHD Graphics]
pci:8086-9b61-8086-7270;06-00-00;detected;bridge;hostb;Intel Corporation;Comet Lake-U v1 4c Host Bridge/DRAM Controller
usb:13ba-0018;03-01-01;detected;keyboard;ukbd;;Barcode Reader
usb:13d3-3273;ff-ff-ff;detected;network;run;Ralink;802.11 n WLAN 1.0
usb:1d6b-0003;09-00-00;detected;hub;uhub;BSD;XHCI root HUB
usb:214b-7250;09-00-00;detected;hub;uhub;;USB2.0 HUB
0;;detected;;;;
bios:american-megatrends-5-17-11-02-2020;;works;bios;;American Megatrends Inc.;BIOS 5.17 11/02/2020
cpu:intel-6-142-12-core-i7-10510u;;works;cpu;;Intel;Core i7-10510U CPU @ 1.80GHz
cpu:intel-6-142-12-core-i7-10510u;;works;cpu;;Intel;Core i7-10510U CPU @ 1.80GHz
cpu:intel-6-142-12-core-i7-10510u;;works;cpu;;Intel;Core i7-10510U CPU @ 1.80GHz
cpu:intel-6-142-12-core-i7-10510u;;works;cpu;;Intel;Core i7-10510U CPU @ 1.80GHz
cpu:intel-6-142-12-core-i7-10510u;;works;cpu;;Intel;Core i7-10510U CPU @ 1.80GHz
cpu:intel-6-142-12-core-i7-10510u;;works;cpu;;Intel;Core i7-10510U CPU @ 1.80GHz
cpu:intel-6-142-12-core-i7-10510u;;works;cpu;;Intel;Core i7-10510U CPU @ 1.80GHz
cpu:intel-6-142-12-core-i7-10510u;;works;cpu;;Intel;Core i7-10510U CPU @ 1.80GHz
ide:shiji-ssd-256gb-serial-07b8eec309032d7077329907f90c7c8a;;works;disk;ada, ahcich;ShiJi;SSD 256GB
mem:samsung-m471a2k43cb1-ctd-sodimm-serial-824723b6567166c3aa17e40bd3c574f9;;works;memory;;Samsung;RAM M471A2K43CB1-CTD 16GB SODIMM DDR4 2667MT/s
ps/2:keyboard;;detected;keyboard;atkbdc;;AT Keyboard
root@OPNsense:~ #


I tried switching off
ntopng
suricata
zenarmor
but that didn't change anything.

Any ideas?


March 14, 2023, 03:45:25 PM #1 Last Edit: March 14, 2023, 08:45:17 PM by wstemb
I had a similar event only once, on 22.7.?, a few days before 23.1 was published. Zenarmor was configured in Routed Mode (L3 Mode, Reporting + Blocking) with the emulated netmap driver.

As first aid, I reconfigured Zenarmor to passive mode until I had reassigned the server interfaces, so that all interfaces under Zenarmor's control were supported ones (igb).

After that, I reconfigured Zenarmor to Routed Mode (L3 Mode, Reporting + Blocking) with the native netmap driver.

During the tests, I also disabled flow control on all NICs, in System: Settings: Tunables.

No problems since these changes.

I know I made an admin mistake by doing two changes at once, but I couldn't afford to have the system fail intermittently.

Walter

March 15, 2023, 11:09:37 AM #2 Last Edit: March 15, 2023, 11:15:44 AM by BSAfH42
I'll try that.

I had it running in L3 mode, blocking, with the "emulated driver", because Zenarmor was enabled on a run0_wlan1 interface and one OpenVPN interface.

The native driver can't be used on those, I presume.

Those interfaces will have to live under suricata then.

For some unknown reason, Zenarmor offers only igb0 and igb2 (two internal networks) in the select dialogue, not igb1 (WAN).
So I'll enable Zenarmor only on those two interfaces (igb0 and igb2).

Which settings specifically did you disable in the tunables?
There is nothing labeled anything like flow control.

At first glance,

dev.igb.X.fc = 0 seems to fix the MBUF issue.

As a side note:

dev.igb.X.eee_control = 0

stops the interface from working at all; it has to be set to 1, at least with my hardware.

Yes, for igb it is:

dev.igb.x.fc = 0, with x=0, 1... for every igb you have.

The other option I am not using. 

Other NIC types have different settings.
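
For a multi-port card, the per-interface entries can be generated instead of typed by hand. This is a minimal sketch that only builds the tunable names and values to enter under System: Settings: Tunables. The function name is hypothetical, the port count of 6 is taken from the card described in the first post, and 0 is the value used in this thread to switch flow control off.

```python
FLOW_CONTROL_OFF = 0  # value used in this thread to disable flow control


def igb_fc_tunables(interface_count, value=FLOW_CONTROL_OFF):
    """Return {tunable_name: value} for dev.igb.0.fc .. dev.igb.<n-1>.fc."""
    if interface_count < 0:
        raise ValueError("interface_count must not be negative")
    return {f"dev.igb.{unit}.fc": str(value) for unit in range(interface_count)}


if __name__ == "__main__":
    # Six igb ports, as on the card described in the first post.
    for name, value in igb_fc_tunables(6).items():
        print(f"{name} = {value}")
```

The script changes nothing on the system; it just prints one line per interface.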

I reassigned the interfaces because the last config change I had made on the previously stable firewall, before the mbuf problem appeared, was to Zenarmor.

I am not using eee_control, no need for now, I hope :-)



March 15, 2023, 07:25:54 PM #5 Last Edit: March 16, 2023, 09:01:18 AM by BSAfH42
eee_control is on by default, at least in my hardware

setting dev.igb.X.fc on any of the 6 available interfaces results in automagically setting it on all interfaces (all on one card)

But I spoke too soon.

Half an hour after my previous post, with eee_control=1 (default) and fc=0 on all interfaces, all interfaces suddenly became unresponsive.

After resetting them to fc=3 using

sysctl dev.igb.0.fc=3 and cycling through a few ifconfig down; ifconfig up, they started working again.

So basically I'm back to where I started: eee_control=1 and fc=3.

But, at least for the time being, MBUF usage stays low and does not grow any more.

Moreover, swap usage is suddenly down (it had been very high for a very long time).


Versions: OPNsense 23.1.3-amd64
          FreeBSD 13.1-RELEASE-p7
          OpenSSL 1.1.1t 7 Feb 2023
CPU type: Intel(R) Core(TM) i7-10510U CPU @ 1.80GHz (4 cores, 8 threads)
Load average: 3.17, 2.34, 2.02
Uptime: 00:57:17
Current date/time: Wed Mar 15 19:18:51 CET 2023
Last config change: Wed Mar 15 19:00:53 CET 2023
CPU usage: 55 %
State table size: 13 % ( 225460/1622000 )
MBUF usage: 1 % ( 14804/1000000 )
Memory usage: 69 % ( 11272/16220 MB )
SWAP usage: 5 % ( 433/8191 MB )
            21 % ( 438/2048 MB )
            1 % ( 438/32768 MB )
Disk usage



I have no idea whether this will come up again after reboot - it's been a long day and I'll just keep it running for the time being


[cbadmin@OPNsense ~]$ netstat -m
23772/10518/34290 mbufs in use (current/cache/total)
8280/4744/13024/1000000 mbuf clusters in use (current/cache/total/max)
89/2959 mbuf+clusters out of packet secondary zone in use (current/cache)
2635/2793/5428/504409 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/149454 9k jumbo clusters in use (current/cache/total/max)
0/0/0/84068 16k jumbo clusters in use (current/cache/total/max)
33047K/23289K/56337K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0 sendfile syscalls
0 sendfile syscalls completed without I/O request
0 requests for I/O initiated by sendfile
0 pages read by sendfile as part of a request
0 pages were valid at time of a sendfile request
0 pages were valid and substituted to bogus page
0 pages were requested for read ahead by applications
0 pages were read ahead by sendfile
0 times sendfile encountered an already busy page
0 requests for sfbufs denied
0 requests for sfbufs delayed
[cbadmin@OPNsense ~]$



[cbadmin@OPNsense ~]$ sysctl -a | grep igb.0.fc
dev.igb.0.fc_low_water: 32752
dev.igb.0.fc_high_water: 32768
dev.igb.0.fc: 3
[cbadmin@OPNsense ~]$ sysctl -a | grep igb.1.fc
dev.igb.1.fc_low_water: 32752
dev.igb.1.fc_high_water: 32768
dev.igb.1.fc: 3
[cbadmin@OPNsense ~]$
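
Note that `grep igb.0.fc` also matches fc_low_water and fc_high_water. To compare just the flow control mode across interfaces, the output can be filtered on the exact leaf name. This is a minimal sketch, assuming the `name: value` line format shown above; the function name is hypothetical.

```python
import re


def exact_sysctl_values(sysctl_output, leaf):
    """Return {unit: value} for lines of the form 'dev.igb.<unit>.<leaf>: <value>'.

    Only exact leaf names match, so leaf='fc' skips fc_low_water and
    fc_high_water.
    """
    pattern = re.compile(
        r"^dev\.igb\.(\d+)\." + re.escape(leaf) + r":[ \t]*(\S+)[ \t]*$",
        re.MULTILINE,
    )
    return {int(unit): value for unit, value in pattern.findall(sysctl_output)}


# Lines copied from the sysctl output above.
SAMPLE = """\
dev.igb.0.fc_low_water: 32752
dev.igb.0.fc_high_water: 32768
dev.igb.0.fc: 3
dev.igb.1.fc_low_water: 32752
dev.igb.1.fc_high_water: 32768
dev.igb.1.fc: 3
"""

if __name__ == "__main__":
    modes = exact_sysctl_values(SAMPLE, "fc")
    print(modes)
    if len(set(modes.values())) > 1:
        print("interfaces disagree on the flow control mode")
```

On the shell, asking for the single OID directly (`sysctl dev.igb.0.fc`) avoids the grep altogether.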



[cbadmin@OPNsense ~]$ sysctl -a | grep igb.1.eee_control
dev.igb.1.eee_control: 1
[cbadmin@OPNsense ~]$


it's a bit spooky, I have to admit :-(