Messages - Crate2729

#1
I bought an Intel X710-DA2 to replace the X520-DA2 as a last resort, after tuning everything else I could.
And voilà, the issue is gone!  8)
All Android devices can now update all the previously stuck apps, and the YT videos that were unplayable before play again.
WOW, a massive headache gone!

Why the X710? It uses the i40e driver and is PCIe Gen3, while the X520 uses ixgbe and is PCIe Gen2, and I wanted something different in both hardware (the X710 is a lot newer) and driver. I don't know what the original issue was, but I wanted to solve it once and for all. Maybe it was an issue with the UDP-based QUIC protocol (used by HTTP/3) on the old hardware, with an unlucky combination of host and guest kernels? We'll never know, but it's finally solved.

If anybody else faces the issue, here are the details of the cards for reference:

X520-DA2

lspci

2d:00.0 Ethernet controller: Intel Corporation Ethernet 10G 2P X520 Adapter (rev 01)
        Subsystem: Intel Corporation 10GbE 2P X520 Adapter
        Kernel driver in use: ixgbe
        Kernel modules: ixgbe


lshw

       description: Ethernet interface
       product: Ethernet 10G 2P X520 Adapter
       vendor: Intel Corporation
       physical id: 0.1
       bus info: pci@0000:2d:00.1
       logical name: enp45s0f1
       version: 01
       size: 10Gbit/s
       capacity: 10Gbit/s
       width: 64 bits
       clock: 33MHz
       capabilities: pm msi msix pciexpress vpd bus_master cap_list rom ethernet physical fibre 10000bt-fd
       configuration: autonegotiation=off broadcast=yes driver=ixgbe driverversion=6.5.11-8-pve duplex=full firmware=0x8000042f latency=0 link=yes multicast=yes port=fibre speed=10Gbit/s
       resources: irq:204 memory:fbd80000-fbdfffff ioport:f000(size=32) memory:fbf00000-fbf03fff memory:fbd00000-fbd7ffff memory:c0200000-c02fffff memory:c0300000-c03fffff


X710-DA2

lspci

2d:00.0 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 02)
        Subsystem: Intel Corporation Ethernet 10G 2P X710 Adapter
        Kernel driver in use: i40e
        Kernel modules: i40e


lshw

       description: Ethernet interface
       product: Ethernet Controller X710 for 10GbE SFP+
       vendor: Intel Corporation
       physical id: 0.1
       bus info: pci@0000:2d:00.1
       logical name: enp45s0f1
       version: 02
       size: 10Gbit/s
       width: 64 bits
       clock: 33MHz
       capabilities: pm msi msix pciexpress vpd bus_master cap_list rom ethernet physical fibre autonegotiation
       configuration: autonegotiation=off broadcast=yes driver=i40e driverversion=6.5.11-8-pve duplex=full firmware=6.80 0x80003d72 18.8.9 latency=0 link=yes multicast=yes port=fibre speed=10Gbit/s
       resources: irq:76 memory:f8000000-f8ffffff memory:fa000000-fa007fff memory:fcd00000-fcd7ffff
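Comparing the two lshw outputs by eye is easy to get wrong, so here is a minimal Python sketch that parses the `configuration:` line into key/value pairs and prints the keys that differ. It assumes the line has been joined back into a single line (the forum wrapped it) and that every key looks like `name=`. `parse_lshw_config` and `diff_configs` are hypothetical helper names, not part of lshw. Values can contain spaces, as in the X710's `firmware=6.80 0x80003d72 18.8.9`, which is why it does not simply split on whitespace.

```python
import re

# A key is a run of word characters followed by "="; its value runs until the
# next " key=" or the end of the line, so values containing spaces survive.
_PAIR = re.compile(r"(\w+)=(.*?)(?=\s+\w+=|$)")


def parse_lshw_config(line: str) -> dict[str, str]:
    """Parse an lshw 'configuration:' line into a dict of settings."""
    body = line.split("configuration:", 1)[-1].strip()
    return dict(_PAIR.findall(body))


def diff_configs(a: dict, b: dict,
                 keys=("driver", "driverversion", "firmware", "speed")) -> dict:
    """Return {key: (a_value, b_value)} for the selected keys that differ."""
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Configuration lines copied from the lshw output above (joined onto one line).
X520 = ("configuration: autonegotiation=off broadcast=yes driver=ixgbe "
        "driverversion=6.5.11-8-pve duplex=full firmware=0x8000042f latency=0 "
        "link=yes multicast=yes port=fibre speed=10Gbit/s")
X710 = ("configuration: autonegotiation=off broadcast=yes driver=i40e "
        "driverversion=6.5.11-8-pve duplex=full firmware=6.80 0x80003d72 18.8.9 "
        "latency=0 link=yes multicast=yes port=fibre speed=10Gbit/s")

print(diff_configs(parse_lshw_config(X520), parse_lshw_config(X710)))
# {'driver': ('ixgbe', 'i40e'), 'firmware': ('0x8000042f', '6.80 0x80003d72 18.8.9')}
```

Here only the driver and firmware differ; `driverversion` reports the same Proxmox kernel (6.5.11-8-pve) for both cards, which fits the point above that the swap changed both the hardware and the driver.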
#2
I did another test with a 4G LTE USB modem I have as a backup WAN (ZTE MF79U), and, well, all Android Play Store updates work  ::)  The whole USB device is passed through from Proxmox to OPNsense and mapped to an interface/gateway as an Ethernet device. So when the WAN is on a third device other than my original two Intel NICs, the symptoms are gone. Interesting. However, this doesn't explain at all what the problem really is, how it started, or how I could eliminate it.
#3
Now I tried shuffling the NICs around, since I have 2x 1G I210 ports, 2x 10G X520-DA2 ports and a few SFP+ copper dongles that handle 1/2.5/5/10G fine:
- WAN on a different 1G port than in my original setup, LAN on 10G as originally - issue persists
- WAN and LAN both on 1G - issue persists
- WAN and LAN both on 10G - issue persists

I don't see any difference across ports on these NICs, so I don't think this is a HW problem for me :( Everything worked fine until mid-December and still works fine now, except most Play Store app updates and a few YouTube videos :o This is very annoying ::)
#4
Oh, wow, that's strange, indeed.

These are my NICs:

1G WAN from my mobo: https://www.asrockrack.com/general/productdetail.asp?Model=X570D4U#Specifications
  Device-1: Intel I210 Gigabit Network vendor: ASRock driver: igb v: kernel pcie: speed: 2.5 GT/s
    lanes: 1 port: e000 bus-ID: 26:00.0 chip-ID: 8086:1533 class-ID: 0200
  IF: enp38s0 state: up speed: 1000 Mbps duplex: full mac: ****


10G LAN from Intel X520-DA2 (2x SFP+): https://www.intel.com/content/dam/doc/product-brief/ethernet-x520-server-adapters-brief.pdf
  Device-4: Intel Ethernet 10G 2P X520 Adapter driver: ixgbe v: kernel pcie: speed: 5 GT/s
    lanes: 8 port: f000 bus-ID: 2d:00.1 chip-ID: 8086:154d class-ID: 0200
  IF: enp45s0f1 state: up speed: 10000 Mbps duplex: full mac: ****
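The inxi lines above also report each card's PCIe link: the I210 at 2.5 GT/s on 1 lane, the X520 at 5 GT/s on 8 lanes. As a back-of-the-envelope check, here is a small Python sketch that turns transfer rate and lane count into approximate usable bandwidth, using the PCIe line encodings (8b/10b at 2.5 and 5 GT/s, 128b/130b at 8 GT/s). It ignores packet and protocol overhead, so real throughput is somewhat lower. The X710-DA2 row assumes an 8 GT/s x8 link, which is not shown in any output here.

```python
# Line-encoding efficiency by per-lane transfer rate (GT/s), as inxi reports it.
# 2.5 and 5 GT/s (Gen1/Gen2) use 8b/10b; 8 GT/s (Gen3) uses 128b/130b.
ENCODING_EFFICIENCY = {
    2.5: 8 / 10,
    5.0: 8 / 10,
    8.0: 128 / 130,
}


def pcie_bandwidth_gbps(gt_per_s: float, lanes: int) -> float:
    """Approximate usable bandwidth per direction in Gbit/s.

    Accounts for line encoding only, not TLP/DLLP protocol overhead.
    """
    return gt_per_s * ENCODING_EFFICIENCY[gt_per_s] * lanes


# (label, GT/s, lanes, total port line rate in Gbit/s)
cards = [
    ("I210, 1x 1G (inxi: 2.5 GT/s, x1)", 2.5, 1, 1),
    ("X520-DA2, 2x 10G (inxi: 5 GT/s, x8)", 5.0, 8, 20),
    ("X710-DA2, 2x 10G (assumed 8 GT/s, x8)", 8.0, 8, 20),
]

for label, rate, lanes, ports_gbps in cards:
    bw = pcie_bandwidth_gbps(rate, lanes)
    print(f"{label}: ~{bw:.1f} Gbit/s per direction vs {ports_gbps} Gbit/s of ports")
```

By this arithmetic, each link has headroom over the line rate of its ports (roughly 2 vs 1, 32 vs 20, and 63 vs 20 Gbit/s), so raw PCIe bandwidth alone does not look like an obvious bottleneck. It says nothing about other differences between the cards or their drivers.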


Each NIC is attached to its own Linux bridge in Proxmox, and OPNsense has one VirtIO interface on each of these bridges.
I also tried switching them from VirtIO to Intel E1000 in Proxmox, but OPNsense didn't recognize them afterwards. I had to revert the settings and restore OPNsense from a backup, because the interface settings in the VM somehow got permanently broken: it couldn't match the virtual interfaces to its configuration anymore.

The LAN shouldn't affect anything IMHO, as the LTE WireGuard connection only went through the WAN NIC. In this case, no packets should go out of the LAN NIC to the switch and then to the wifi AP.

The mobo also has 1x Realtek RTL8211E for dedicated IPMI, but I'm not sure it could be used for anything else. I've also read that this interface stays up during shutdown while the I210 NICs don't, so I'd rather not experiment with remapping it as WAN.

My main question in all this is what happened in mid-December that started this "selective packet loss" or whatever it is. No HW changes were made, just the OPNsense update.
#5
I also have issues with Google Play Store and YouTube; my symptoms are a bit different, but still very annoying.

- YT: Some videos won't load at all, while most work fine. I'm using NewPipe, so I can see in its network error dump that the video's random CDN domain resolves to an IP (so it's not a DNS issue), but the video still won't load at all.

- Play Store: Browsing generally works fine, and installing apps is also fine, but some apps won't update, or only partially. The update gets stuck in pending forever, or stops at some percentage (anywhere between 1 and 99%) and never finishes. Just a few minutes ago I could update ChatGPT from OpenAI, but Firefox Focus and Google Calendar won't update.
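The Play Store symptom, where a download starts and then sits at a random percentage forever, looks different from a connection that fails outright. As a rough diagnostic idea (not something taken from NewPipe, the Play Store or OPNsense), the Python sketch below reads a response of known length from an already-connected socket and reports whether it completed, was closed early by the peer, or stalled with no data for `stall_timeout` seconds. `measure_transfer`, `stall_timeout` and knowing the expected length up front (for example from a `Content-Length` header) are assumptions for illustration; the demo uses a local socket pair instead of a real CDN.

```python
import socket


def measure_transfer(sock: socket.socket, expected: int,
                     stall_timeout: float = 5.0, chunk: int = 65536):
    """Read up to `expected` bytes from a connected socket.

    Returns (status, received), where status is:
      "complete" - all expected bytes arrived
      "closed"   - the peer closed the connection early
      "stalled"  - no data arrived for `stall_timeout` seconds
    """
    sock.settimeout(stall_timeout)  # applies to each recv() call separately
    received = 0
    while received < expected:
        try:
            data = sock.recv(min(chunk, expected - received))
        except socket.timeout:
            return "stalled", received
        if not data:  # an empty read means the peer closed its side
            return "closed", received
        received += len(data)
    return "complete", received


if __name__ == "__main__":
    # Local demo: the "server" end sends only a quarter of the promised data
    # and then goes silent, mimicking an update stuck at 25%.
    server, client = socket.socketpair()
    server.sendall(b"\0" * 25_000)
    status, got = measure_transfer(client, expected=100_000, stall_timeout=0.5)
    print(f"{status} at {100 * got // 100_000}%")  # stalled at 25%
    server.close()
    client.close()
```

A "stalled" result after some data has already arrived would mean DNS and connection setup both worked, in line with the NewPipe observation above; on its own it can't say where along the path the data is being lost.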

All three of my Android devices (2 phones and 1 tablet) show the same behavior, even on different subnets.
At first, I thought it was a UniFi AP issue, as both YT and Play Store worked on LTE (4G or 5G), just not on wifi.
But then, while connected to my home network over WireGuard VPN from a phone, YT and Play Store showed the same issue. WireGuard is hosted by OPNsense and has nothing to do with the UniFi AP. This is why I concluded that it's an OPNsense issue.

As I recall, the symptoms started around mid-December after an OPNsense upgrade, but I can't remember the from/to versions. Since then I've even updated to the latest version, but the problem didn't go away. No hardware changes happened in the meantime, and everything was fine before.

Now the only way to update some apps from the Play Store is to switch to LTE and use precious mobile data, and I have to skip some videos that are unwatchable on wifi. :(

I have Intel NICs BTW. OPNsense is virtualized in Proxmox with more than enough resources.