Jumbo frames on axgbe / DEC7x0

Started by meyergru, July 16, 2022, 12:22:49 PM

July 16, 2022, 12:22:49 PM Last Edit: July 16, 2022, 12:49:43 PM by meyergru
I tried to enable jumbo frames on ax0 today. I used 9000 as the MTU, which was accepted (10000 seems to be out of range, as ifconfig confirms).

However, when I tried "ping -s 8972 -D xxxxx", the pings never went through. The highest payload I could manage was 4054 bytes, which indicates a real MTU of 4082. I tried two different targets, which can ping each other just fine.
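For reference, the arithmetic behind these numbers: with -D (do not fragment), the -s value is only the ICMP payload, and the resulting IPv4 packet is payload + 8 bytes of ICMP header + 20 bytes of IPv4 header. So 8972 + 28 = 9000 would need the full 9000-byte MTU, while the largest working payload gives 4054 + 28 = 4082.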

Is this a hardware limitation or a kernel/driver bug? If it is a hardware limitation, why does ifconfig not complain when such a big MTU is applied (i.e. why does the axgbe driver not bork)?

The largest jumbo frame is 9216. Now, I am not sure if you configured the same MTU on both ends. I have no issues doing jumbo frames.

July 17, 2022, 12:12:29 PM #2 Last Edit: July 17, 2022, 12:34:36 PM by meyergru
As I wrote, I tried two counterparts that I have confirmed work with 9K among themselves, so I can rule those out.
Also, all devices are on the same switch, so I rule out the switch as well.

If you don't have problems with 9K, there are only two things that could be at fault: the DAC cable connecting my OPNsense to the switch, or my OPNsense itself.
As a matter of fact, that specific DAC cable is a singleton - I have other DAC cables and 10 GbE transceivers for all the other devices.

So, I swapped DAC cables and guess what? No change. BTW: By doing this I also swapped ports on the switch, so it cannot be a defective switch port either.

I can still set a 9K MTU, but everything beyond 4K gets discarded when I really try it. When I ping from my OPNsense, I can even see that OPNsense emits the packets and that they get replied to by the counterpart (using tcpdump on the counterpart). When pinging from the counterpart, I see outgoing packets but no answers. Once the size gets too big, there is nothing to be seen on OPNsense. Thus this seems to be a problem on OPNsense's receiving end.

4082 bytes is rather close to 4096, which may be one physical memory page, but I am only theorizing here.
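(Indeed: a 4082-byte IP packet plus the 14-byte Ethernet header is exactly 4096 bytes, so an untagged frame at that MTU, with the FCS stripped off, fills exactly one 4 KiB page - which would fit a receive buffer of one page per packet.)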

Maybe different settings, like RSS or hardware offloading? Are you really sure your 9K MTU works?

How is this interface with jumbo frames configured? I mean, is it a trunk or a routed interface?

If it's a routed interface, are you testing the ping point-to-point?

July 17, 2022, 03:07:39 PM #4 Last Edit: July 17, 2022, 03:41:14 PM by meyergru
We are talking about the LAN interface, which is connected to a switch, pinging another device on the same switch.

It is a trunk in that there are three VLAN sub-interfaces, or what are you referring to?

BTW: I have an indication that it is a driver limitation on this specific implementation, as sysctl shows:

dev.ax.0.iflib.rxq2.rxq_fl0.buf_size: 4096
dev.ax.0.iflib.rxq1.rxq_fl0.buf_size: 4096
dev.ax.0.iflib.rxq0.rxq_fl0.buf_size: 4096

I have always wondered why there are only 3 RX queues. When you google axgbe, you often see this:

ax0: <AMD 10 Gigabit Ethernet Driver> mem 0xef7e0000-0xef7fffff,0xef7c0000-0xef7dffff,0xef80e000-0xef80ffff irq 40 at device 0.4 on pci6
ax0: Using 512 TX descriptors and 512 RX descriptors
ax0: Using 4 RX queues 4 TX queues
ax0: Using MSI-X interrupts with 8 vectors
ax0: xgbe_phy_reset: no phydev


On the DEC750, there are only 3 RX queues and I have found no way of changing that. The buf_size values above are read-only as well.

Do you have a DEC750 or something else? I tried disabling hardware offloading to no avail, but have not tried disabling RSS yet. Do you have it enabled?

So that may be it... I have an A20 netboard and it looks like you have an A10, which would explain why your max MTU is 4K as opposed to 8K. That is still considered jumbo, since anything over 1500 counts as a jumbo frame.

What is your output when you issue 'sysctl -a | fgrep dev.ax.0.iflib' or 'netstat -m'?

Here's the datasheet for the I210 NIC in the A20 spec, which states a 9.5K jumbo frame size.

https://www.mouser.com/datasheet/2/612/i210_ethernet_controller_datasheet-257785.pdf

On page 12, Table 1-3:

Size of jumbo frames supported: 9.5 KB

I'll post it later as I am out.

I was talking about the axgbe driver via SFP+, not the 1 GbE igb. As I said in my opening post:

Quote from: meyergru on July 16, 2022, 12:22:49 PM
I tried to enable jumbo frames on ax0 today.

I know this is a pretty old topic, but I'd like to follow up on it. I have had a DEC2752 firewall since last year, and I was experiencing the same issues with enabling jumbo frames on the 10 GBit AMD network ports. I did some investigation on the internet, but could not find a solution. Then I started looking into the source code of the driver, and I believe that's where the culprit is to be found:

https://cgit.freebsd.org/src/tree/sys/dev/axgbe/xgbe-drv.c?id=f341e9bad3eb7438fe3ac5cff2e58df50f1a28e4#n129

int
xgbe_calc_rx_buf_size(struct ifnet *netdev, unsigned int mtu)
{
        unsigned int rx_buf_size;

        if (mtu > XGMAC_JUMBO_PACKET_MTU)
                return (-EINVAL);

        rx_buf_size = mtu + ETH_HLEN + ETH_FCS_LEN + VLAN_HLEN;
        rx_buf_size = min(max(rx_buf_size, XGBE_RX_MIN_BUF_SIZE), PAGE_SIZE);
        rx_buf_size = (rx_buf_size + XGBE_RX_BUF_ALIGN - 1) &
            ~(XGBE_RX_BUF_ALIGN - 1);

        return (rx_buf_size);
}

From my understanding, this function calculates the size of the receive buffer. The following line actually limits the receive buffer to the system's page size (which is probably 4K):

rx_buf_size = min(max(rx_buf_size, XGBE_RX_MIN_BUF_SIZE), PAGE_SIZE);
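To make that concrete, here is a quick standalone sketch that only mimics the clamp in xgbe_calc_rx_buf_size() - it is not the driver code, and the XGBE_* values below are assumptions on my part; only the Ethernet/FCS/VLAN header lengths and the 4 KiB page size are the usual constants:

/*
 * Standalone sketch, not driver code: it just mimics the clamp in
 * xgbe_calc_rx_buf_size(). The XGBE_* values are my assumptions;
 * ETH_HLEN/ETH_FCS_LEN/VLAN_HLEN and the 4 KiB page are the usual constants.
 */
#include <stdio.h>

#define ETH_HLEN              14u     /* Ethernet header */
#define ETH_FCS_LEN           4u      /* frame check sequence */
#define VLAN_HLEN             4u      /* 802.1Q tag */
#define MY_PAGE_SIZE          4096u   /* typical amd64 page size */
#define XGBE_RX_MIN_BUF_SIZE  1522u   /* assumed: standard frame + FCS + tag */
#define XGBE_RX_BUF_ALIGN     64u     /* assumed alignment */

static unsigned int
calc_rx_buf_size(unsigned int mtu)
{
        unsigned int sz = mtu + ETH_HLEN + ETH_FCS_LEN + VLAN_HLEN;

        if (sz < XGBE_RX_MIN_BUF_SIZE)
                sz = XGBE_RX_MIN_BUF_SIZE;
        if (sz > MY_PAGE_SIZE)          /* <-- the cap in question */
                sz = MY_PAGE_SIZE;
        return ((sz + XGBE_RX_BUF_ALIGN - 1) & ~(XGBE_RX_BUF_ALIGN - 1));
}

int
main(void)
{
        /* 1500 + 22 = 1522, rounded up to the next 64-byte boundary: 1536 */
        printf("MTU 1500 -> rx buffer %u\n", calc_rx_buf_size(1500));
        /* 9000 + 22 = 9022, but clamped to 4096; 4096 minus the 14-byte
         * Ethernet header is exactly the 4082-byte limit seen in this thread */
        printf("MTU 9000 -> rx buffer %u\n", calc_rx_buf_size(9000));
        return (0);
}

Run it, and a 9000-byte MTU still comes out as a 4096-byte receive buffer, which lines up with the dev.ax.0.iflib rxq buf_size values of 4096 posted earlier in the thread.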

Now my question is, how do I proceed from here? Should I open an issue with FreeBSD? Or would maybe an OPNsense developer feel knowledgeable enough to judge whether this can be fixed easily? I am not a driver developer, and I suspect that it is not enough to simply remove the min(..., PAGE_SIZE), since I imagine that a contiguous set of pages would need to be allocated for the hardware to DMA the data to the correct address.

It might be a driver glitch, but then again, it could just as well be a physical limitation of the hardware. Some NICs use DMA to the host's RAM, and maybe there is no queue management on the ax devices. I know of NICs that can actually follow a linked list of buffers in memory. If the ax can only do contiguous DMA or has limited physical buffer space, this might not be curable at all.

I did not try this any further, as I switched hardware.

You might get lucky if you find something on the internet where somebody has tried this under Linux; if it is only a driver issue, the limitation may not exist there.

Thanks for the quick answer. I did some research, and it seems that the AMD Ryzen V1500B is used in a couple of NAS devices from Synology and QNAP, which support 10 GBit interfaces with jumbo frames of at least 9000 bytes. I could not find an explicit test of the jumbo frame functionality (i.e. using a ping with appropriate packet sizes), but some people were using jumbo frames with 9000 bytes. At least with Synology, I trust that if one can configure 9000-byte jumbo frames, then they will also work.

So to me, it currently looks like a driver issue. You have a good point with the possibility that chaining multiple buffers could be a solution if the hardware supports it. I will try to do some comparisons between the current Linux and FreeBSD code. If I understand it correctly, the FreeBSD code was ported over from Linux, so both versions should be comparable to some extent.

Not necessarily, because the kernel APIs sure are way different. They have changed in Linux multiple times over the years. This is low-level stuff, so if the driver really has been carried over, some shortcuts may have been taken, of which using the physical page size for buffers might be one logical candidate.

It now sure looks like it is not a hardware restriction, though, so you could file a bug report with FreeBSD...

Quote from: Kaya on May 07, 2025, 08:59:23 PM
Now my question is, how to proceed from here? Should I open an issue with FreeBSD?

Opening an issue with your findings on github opnsense/src would be a good first step.

I opened an issue with OPNsense (although the driver obviously is part of FreeBSD): https://github.com/opnsense/src/issues/251