I tried to enable jumbo frames on ax0 today. I used an MTU of 9000, which was accepted (10000 seems to be out of range, as ifconfig will tell you).
However, when I tried "ping -s 8972 -D xxxxx", the pings never went through. The highest payload size I could manage was 4054, which indicates an effective MTU of 4082. I tried two different targets, which can ping each other at jumbo sizes.
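For reference, the arithmetic behind those numbers (ICMP payload + 8 bytes ICMP header + 20 bytes IPv4 header = IP packet size):

8972 + 8 + 20 = 9000  (exactly fills an MTU of 9000)
4054 + 8 + 20 = 4082  (the ceiling I observed)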
Is this a hardware limitation or a kernel/driver bug? If it is a hardware limitation, why does ifconfig not complain when such a big MTU is applied (i.e. why does the axgbe driver not bork)?
The largest jumbo frame is 9216. Now, I am not sure if you configured the same on both ends. I have no issues doing jumbo.
As I wrote, I tried two counterparts I confirmed working with 9K among themselves, so I can rule those out.
Also, all devices are on the same switch, so I rule out the switch as well.
If you don't have problems with 9K, there are only two things that could be at fault: the DAC cable connecting my OPNsense to the switch, or my OPNsense itself.
As a matter of fact, that specific DAC cable is the only one of its kind here - I have other DAC cables and 10GbE transceivers for all the other devices.
So, I swapped DAC cables and guess what? No change. BTW: By doing this I also swapped ports on the switch, so it cannot be a defective switch port either.
I can still set a 9K MTU, but everything beyond 4K gets discarded when I actually use it. When I ping from my OPNsense, I can even see that OPNsense emits the packets and that they get replied to by the counterpart (using tcpdump on the counterpart). When pinging from the counterpart, I see outgoing packets but no answers. Once the size gets too big, there is nothing to be seen on OPNsense. Thus this seems to be a problem on OPNsense's receiving end.
4082 bytes is rather close to 4096, which may be one physical memory page (4082 plus a 14-byte Ethernet header is exactly 4096), but I am only theorizing here.
Maybe different settings, like RSS or hardware offloading? Are you really sure your 9K MTU works?
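The offloading settings I mean are the standard FreeBSD ifconfig capability flags; I have not checked which of them the axgbe driver actually honours, so treat this only as a sketch for testing:

ifconfig ax0 -txcsum -rxcsum -tso -lro    # temporarily disable checksum, TSO and LRO offloads
ifconfig ax0 txcsum rxcsum tso lro        # re-enable them afterwards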
How is this interface with jumbo configured? I mean, is it a trunk or a routed interface?
If it's a routed interface, are you testing the ping point-to-point?
We are talking about the LAN interface, connected to a switch, pinging another device on that same switch.
It is a trunk in that there are three VLAN sub-interfaces, or what are you referring to?
BTW: I have an indication that it is a driver limitation on this specific implementation, as sysctl shows:
dev.ax.0.iflib.rxq2.rxq_fl0.buf_size: 4096
dev.ax.0.iflib.rxq1.rxq_fl0.buf_size: 4096
dev.ax.0.iflib.rxq0.rxq_fl0.buf_size: 4096
I have always wondered why there are only 3 RX queues. When you google axgbe, you will often see this:
ax0: <AMD 10 Gigabit Ethernet Driver> mem 0xef7e0000-0xef7fffff,0xef7c0000-0xef7dffff,0xef80e000-0xef80ffff irq 40 at device 0.4 on pci6
ax0: Using 512 TX descriptors and 512 RX descriptors
ax0: Using 4 RX queues 4 TX queues
ax0: Using MSI-X interrupts with 8 vectors
ax0: xgbe_phy_reset: no phydev
On the DEC750, there are only 3 RX queues and I have found no way of changing that. The bufsizes above are read-only as well.
Do you have a DEC750 or something else? I tried disabling hardware offloading to no avail, but not disabling RSS yet. Do you have it enabled?
So that may be it... I have an A20 netboard and it looks like you have an A10, which would explain why your max MTU is 4K as opposed to 8K - which is still considered jumbo, since anything over 1500 counts as a jumbo frame.
What is your output when you issue 'sysctl -a | fgrep dev.ax.0.iflib' or 'netstat -m'?
Here's the datasheet for the I210 NIC on the A20; its spec states a 9.5K size.
https://www.mouser.com/datasheet/2/612/i210_ethernet_controller_datasheet-257785.pdf
On pg 12 Table 1-3
Size of jumbo frames supported 9.5 KB
I'll post it later as I am out.
I was talking about the axgbe driver via SFP+, not the 1 GbE igb. As I said in my opening post:
Quote from: meyergru on July 16, 2022, 12:22:49 PM
I tried to enable jumbo frames on ax0 today.
I know this is a pretty old topic, but I'd like to follow up on it. I have had a DEC2752 firewall since last year, and I was experiencing the same issues with enabling jumbo frames on the 10GBit AMD network ports. I did some investigation on the internet, but could not find a solution. Then I started looking into the source code of the driver, and I believe that's where the culprit is to be found:
https://cgit.freebsd.org/src/tree/sys/dev/axgbe/xgbe-drv.c?id=f341e9bad3eb7438fe3ac5cff2e58df50f1a28e4#n129
int
xgbe_calc_rx_buf_size(struct ifnet *netdev, unsigned int mtu)
{
        unsigned int rx_buf_size;

        if (mtu > XGMAC_JUMBO_PACKET_MTU)
                return (-EINVAL);

        rx_buf_size = mtu + ETH_HLEN + ETH_FCS_LEN + VLAN_HLEN;
        rx_buf_size = min(max(rx_buf_size, XGBE_RX_MIN_BUF_SIZE), PAGE_SIZE);
        rx_buf_size = (rx_buf_size + XGBE_RX_BUF_ALIGN - 1) &
            ~(XGBE_RX_BUF_ALIGN - 1);

        return (rx_buf_size);
}
From my understanding, this function is used to calculate the size of the receive buffer. The following line actually limits the receive buffer to the system's page size (which probably is 4K):
rx_buf_size = min(max(rx_buf_size, XGBE_RX_MIN_BUF_SIZE), PAGE_SIZE);
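Just to illustrate the effect, here is a small stand-alone sketch of the same calculation; the XGBE_* and page-size constant values are assumptions of mine for illustration, not copied from the FreeBSD headers:

#include <stdio.h>

/* Standard Ethernet header/trailer sizes. */
#define ETH_HLEN              14
#define ETH_FCS_LEN            4
#define VLAN_HLEN              4
/* Assumed values, for illustration only. */
#define XGBE_RX_MIN_BUF_SIZE 1522
#define XGBE_RX_BUF_ALIGN      64
#define PAGE_SIZE_ASSUMED    4096

static unsigned int
calc_rx_buf_size(unsigned int mtu)
{
        unsigned int sz = mtu + ETH_HLEN + ETH_FCS_LEN + VLAN_HLEN;

        if (sz < XGBE_RX_MIN_BUF_SIZE)
                sz = XGBE_RX_MIN_BUF_SIZE;
        if (sz > PAGE_SIZE_ASSUMED)        /* this is the cap in question */
                sz = PAGE_SIZE_ASSUMED;
        /* round up to the assumed buffer alignment */
        return ((sz + XGBE_RX_BUF_ALIGN - 1) & ~(XGBE_RX_BUF_ALIGN - 1));
}

int
main(void)
{
        printf("MTU 1500 -> rx_buf_size %u\n", calc_rx_buf_size(1500)); /* 1536 */
        printf("MTU 9000 -> rx_buf_size %u\n", calc_rx_buf_size(9000)); /* 4096: capped at one page */
        return (0);
}

With a 4K page, any jumbo MTU ends up with the same 4096-byte receive buffer, which lines up with the ~4K ceiling observed above.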
Now my question is: how to proceed from here? Should I open an issue with FreeBSD? Or would maybe an OPNsense developer feel knowledgeable enough to judge whether this can easily be fixed? I am not a driver developer, and I suspect that it is not enough to simply remove the min(..., PAGE_SIZE), since I imagine that a contiguous set of pages would need to be allocated for the hardware to transfer the data via DMA to the correct address.
It might be a driver glitch, but then again, it could just as well be a physical limitation of the hardware. Some NICs use DMA to the host's RAM, and maybe there is no queue management on the ax devices. I know of NICs that can actually follow a linked list of buffers in memory. If the ax can only do contiguous DMA or has limited physical buffer space, this might not be curable at all.
I did not try this any further, as I switched hardware.
You might get lucky if you find something on the internet where somebody has tried this under Linux; the limitation may not exist there if it is only a driver issue.
Thanks for the quick answer. I did some research, and it seems that the AMD Ryzen V1500B is used in a couple of NAS devices from Synology and QNAP, which support 10GBit interfaces with jumbo frames of at least 9000 bytes. I could not find an explicit test of the jumbo frame functionality (i.e. using a ping with appropriate packet sizes), but some people were using jumbo frames with 9000 bytes. At least with Synology, I trust that if one can configure 9000-byte jumbo frames, then this will also work.
So to me, it currently looks like a driver issue. You have a good point that chaining multiple buffers could be a solution if the hardware supports it. I will try to do some comparisons between the current Linux and FreeBSD code. If I understand it correctly, the FreeBSD code was ported over from Linux, so both versions should be comparable to some extent.
Not necessarily, because the kernel APIs are certainly quite different, and they have changed in Linux multiple times over the years. This is low-level stuff, so if the driver really has been carried over, maybe some shortcuts were taken, of which using the physical page size for buffers might be one logical candidate.
It now sure looks like it is not a hardware restriction, though, so you could file a bug report with FreeBSD...
Quote from: Kaya on May 07, 2025, 08:59:23 PM
Now my question is, how to proceed from here? Should I open an issue with FreeBSD?
Opening an issue with your findings on github opnsense/src would be a good first step.
I opened an issue with OPNsense (although the driver obviously is part of FreeBSD): https://github.com/opnsense/src/issues/251
With FreeBSD 14.3-BETA2 out, if this gets an actual fix I'd be surprised if it made it upstream before 14.4.
( With OPNsense that would most likely be included in the next dot release that has a kernel update and at least one test kernel before that )
Much less so if nobody opens an issue on FreeBSD, where it belongs.
Out of curiosity I restarted my DEC740 with VyOS (on a stick) and in that setup I was able to set the MTU to 9000 (eth3 below) and ping was successful. No hardware restriction, software only.
The setup was: DEC740 (eth3, 10.66.6.1) - MikroTik Switch CSS610-8P-2S+IN - QNAP QNA-UC5G1T (5Gbit USB NIC, 10.66.6.2)
vyos@vyos# run sh inter ethernet eth3
eth3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP group default qlen 1000
link/ether f4:90:ea:00:73:63 brd ff:ff:ff:ff:ff:ff
altname enp6s0f1
inet 10.66.6.1/24 brd 10.66.6.255 scope global eth3
valid_lft forever preferred_lft forever
inet6 fe80::f690:eaff:fe00:7363/64 scope link
valid_lft forever preferred_lft forever
RX: bytes packets errors dropped overrun mcast
226837 366 0 0 0 109
TX: bytes packets errors dropped carrier collisions
160576 88 0 0 0 0
vyos@vyos# ping -M do -s 8972 -c 4 10.66.6.2
PING 10.66.6.2 (10.66.6.2) 8972(9000) bytes of data.
8980 bytes from 10.66.6.2: icmp_seq=1 ttl=64 time=1.13 ms
8980 bytes from 10.66.6.2: icmp_seq=2 ttl=64 time=1.29 ms
8980 bytes from 10.66.6.2: icmp_seq=3 ttl=64 time=1.42 ms
8980 bytes from 10.66.6.2: icmp_seq=4 ttl=64 time=1.23 ms
Trying the same with FreeBSD 15-CURRENT 2025-05-08 showed the same issue as with OPNsense.
Btw: Linux also reports 3 RX and 3 TX queues; it seems to be implemented that way in hardware.
Thanks for your definitive validation of the hardware capabilities @patient0. Since I only have access to one device, which is in active use as a firewall, I can't do such experiments.
Quote from: Kaya on May 12, 2025, 05:57:15 PM
Thanks for your definitive validation of the hardware capabilities @patient0. Since I only have access to one device, which is in active use as a firewall, I can't do such experiments.
No worries, I do like trying stuff like that out.
And while I only have one DEC740 I do have multiple replacements since my home network is pretty simple and there's nobody to complain when I take it down :).
A test kernel has been provided on GitHub by Stephan:
opnsense-update -zkr 25.1.6-axgbe
Yes, that is great news. I will give the patched kernel a try in the next couple of days.
I am running OPNsense Business 25.4. I assume that the patched kernel version (which obviously refers to 25.1) should not create any hiccups because of different OPNsense versions, right?
No, it should be all good this time. In case of an issue, booting the older kernel will get you back to 25.4.
Snapshots are also an option but may be overkill for this particular issue.
For those who are interested but are not following the GitHub issue:
I installed the patched kernel, rebooted, changed the MTU to 9000 and performed a ping test again. Initially, results looked great: ping was working with a payload size of 8192 bytes. But after some more experiments, the OPNsense crashed and rebooted. Since the box was rock solid before, I assume that the crash is related to the large MTU.
Therefore, for the time being, I reverted the MTU to 1500 bytes.
For me it does crash right away if I transfer data from ax0 to ax1 with MTU 9000 (see attached minicom log).
My test setup:
- Hardware: DEC740
- OPNsense Version: 25.1-amd64 + 25.1.6-axgbe kernel
- Running off a USB stick (compiled with opnsense/tools, `make vm-raw,8G,never,nano`)
## Network Setup On DEC740 ##
WAN | igb0 : DHCP client | MTU 1500
LAN | igb2 : 192.168.1.1/24 | MTU 1500
CLIENT_LAN | ax0 : 10.199.198.1/24 | MTU 9000
STORAGE_LAN | ax1 : 172.31.30.1/24 | MTU 9000
CLIENT_LAN -> HP Elite 600 | MTU 9000
STORAGE_LAN -> TrueNAS | MTU 9000
On CLIENT_LAN direct attached is a HP Elite 600 with a Mellanox NIC, MTU set to 9000
On STORAGE_LAN direct attached is a TrueNAS Scale (QNAP TS-473A) with an Intel X520 NIC, MTU set to 9000
- Ping works from the HP to OPNsense -> OK
- Mounting a SMB share from the STORAGE_LAN TrueNAS on the CLIENT_LAN HP client -> OK
- Running `dd if=/dev/zero of=<SMB mount>/whatever.file bs=1M count=100` on the CLIENT_LAN HP client crashes OPNsense
- Running the above dd command with MTU set to 1500 on the HP client runs fine