CARP IP not pingable from other SR-IOV virtual function on same host

Started by Simon42, February 18, 2024, 01:27:11 PM

Thanks for the info.
I did some quick testing today and enabled it with:
ethtool --set-priv-flags enp9s0 vf-true-promisc-support on

Unfortunately it does not seem to work :(. The ping still does not even reach the OPNsense packet capture.

I still have some more testing left to do (your comments don't give me much hope, but let's see...  ;)): I still need to do an NVM update, ...

PS: ethtool -k does not show these priv-flags:
root@pve:~# ethtool -k enp9s0
Features for enp9s0:
rx-checksumming: on
tx-checksumming: on
        tx-checksum-ipv4: off [fixed]
        tx-checksum-ip-generic: on
        tx-checksum-ipv6: off [fixed]
        tx-checksum-fcoe-crc: off [fixed]
        tx-checksum-sctp: on
scatter-gather: on
        tx-scatter-gather: on
        tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
        tx-tcp-segmentation: on
        tx-tcp-ecn-segmentation: on
        tx-tcp-mangleid-segmentation: off
        tx-tcp6-segmentation: on
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: on
receive-hashing: on
highdma: on
rx-vlan-filter: on [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: on
tx-gre-csum-segmentation: on
tx-ipxip4-segmentation: on
tx-ipxip6-segmentation: on
tx-udp_tnl-segmentation: on
tx-udp_tnl-csum-segmentation: on
tx-gso-partial: on
tx-tunnel-remcsum-segmentation: off [fixed]
tx-sctp-segmentation: off [fixed]
tx-esp-segmentation: off [fixed]
tx-udp-segmentation: on
tx-gso-list: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off
hw-tc-offload: off
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
rx-udp_tunnel-port-offload: on
tls-hw-tx-offload: off [fixed]
tls-hw-rx-offload: off [fixed]
rx-gro-hw: off [fixed]
tls-hw-record: off [fixed]
rx-gro-list: off
macsec-hw-offload: off [fixed]
rx-udp-gro-forwarding: off
hsr-tag-ins-offload: off [fixed]
hsr-tag-rm-offload: off [fixed]
hsr-fwd-offload: off [fixed]
hsr-dup-offload: off [fixed]


But I got them using:
root@pve:~# ethtool --show-priv-flags enp9s0
Private flags for enp9s0:
MFP                    : off
total-port-shutdown    : off
LinkPolling            : off
flow-director-atr      : on
veb-stats              : off
hw-atr-eviction        : off
link-down-on-close     : off
legacy-rx              : off
disable-source-pruning : off
disable-fw-lldp        : off
rs-fec                 : off
base-r-fec             : off
vf-vlan-pruning        : off
vf-true-promisc-support: on
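
For anyone scripting this check: `ethtool -k` only lists offload features, while the driver-specific private flags come from `--show-priv-flags`, which is why the flag is missing from the first output. A minimal sketch that pulls the state of one flag out of that output (the function name `priv_flag_state` is made up for this example, it is not part of ethtool):

```shell
# Print the state ("on"/"off") of one private flag.
# Reads the output of `ethtool --show-priv-flags <iface>` from stdin.
# Exits non-zero if the flag is not listed at all.
priv_flag_state() {
    awk -F: -v flag="$1" '
        { name = $1; gsub(/[ \t]+$/, "", name) }      # trim padding before ":"
        name == flag { val = $2; gsub(/[ \t]/, "", val); print val; found = 1 }
        END { exit found ? 0 : 1 }
    '
}
```

Usage on the host would be something like `ethtool --show-priv-flags enp9s0 | priv_flag_state vf-true-promisc-support`.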


Maybe you could post the output of both of these commands for your E810, so we can compare whether there are some more interesting (enabled by default) flags on the E810 :)

Updating the NVM of my X710 to the current version 9.4 (or something) also did not help in my tests.

Here is the output of my E810:

root@proxmox:~# ethtool -k enp7s0f0np0
Features for enp7s0f0np0:
rx-checksumming: on
tx-checksumming: on
        tx-checksum-ipv4: on
        tx-checksum-ip-generic: off [fixed]
        tx-checksum-ipv6: on
        tx-checksum-fcoe-crc: off [fixed]
        tx-checksum-sctp: on
scatter-gather: on
        tx-scatter-gather: on
        tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
        tx-tcp-segmentation: on
        tx-tcp-ecn-segmentation: on
        tx-tcp-mangleid-segmentation: off
        tx-tcp6-segmentation: on
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: on
receive-hashing: on
highdma: on
rx-vlan-filter: on
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: on
tx-gre-csum-segmentation: on
tx-ipxip4-segmentation: on
tx-ipxip6-segmentation: on
tx-udp_tnl-segmentation: on
tx-udp_tnl-csum-segmentation: on
tx-gso-partial: on
tx-tunnel-remcsum-segmentation: off [fixed]
tx-sctp-segmentation: off [fixed]
tx-esp-segmentation: off [fixed]
tx-udp-segmentation: on
tx-gso-list: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off
rx-fcs: off
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off
rx-vlan-stag-hw-parse: off
rx-vlan-stag-filter: on
l2-fwd-offload: off [fixed]
hw-tc-offload: off
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
rx-udp_tunnel-port-offload: on
tls-hw-tx-offload: off [fixed]
tls-hw-rx-offload: off [fixed]
rx-gro-hw: off [fixed]
tls-hw-record: off [fixed]
rx-gro-list: off
macsec-hw-offload: off [fixed]
rx-udp-gro-forwarding: off
hsr-tag-ins-offload: off [fixed]
hsr-tag-rm-offload: off [fixed]
hsr-fwd-offload: off [fixed]
hsr-dup-offload: off [fixed]


root@proxmox:~# ethtool --show-priv-flags enp7s0f0np0
Private flags for enp7s0f0np0:
link-down-on-close     : off
fw-lldp-agent          : off
vf-true-promisc-support: on
mdd-auto-reset-vf      : off
vf-vlan-pruning        : off
legacy-rx              : off


There are far fewer priv-flags???
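
To compare the two flag lists without eyeballing them, something like the following could work. It is only a sketch: it assumes each `ethtool --show-priv-flags` output was saved to a file first, and the function names are invented for this example.

```shell
# Print the sorted flag names from a saved `ethtool --show-priv-flags` output.
priv_flag_names() {
    awk -F: '
        /^Private flags for/ { next }                  # skip the header line
        NF >= 2 { gsub(/[ \t]+$/, "", $1); print $1 }
    ' "$1" | sort
}

# Print the flags that appear in file $1 but not in file $2.
flags_only_in_first() {
    a=$(mktemp)
    b=$(mktemp)
    priv_flag_names "$1" > "$a"
    priv_flag_names "$2" > "$b"
    comm -23 "$a" "$b"
    rm -f "$a" "$b"
}
```

With the outputs above saved as, say, `e810.txt` and `x710.txt`, `flags_only_in_first e810.txt x710.txt` would list the flags that only the E810 driver exposes.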

Sorry for taking so long, but just found time to test again today.

Unfortunately, no success even after updating to 9.40.

I found this older thread, which may be interesting: https://forum.proxmox.com/threads/issues-with-sriov-based-nic-passthrough-to-firewall.66392/
It talks about using the iavf driver (current OPNsense already does that by default as far as I can see).
And VLAN filters, which I don't use.
Unfortunately, no actual solution was posted there.

Does Proxmox kernel 6.8 also keep your hosts from starting?

This is what I get with my X710, with stock or latest Intel drivers, no luck...

Firmware has requested this device have a 1:1 IOMMU mapping, rejecting configuring the device without a 1:1 mapping. Contact your platform vendor.

Hi,

The issue here is not "does not start"... OpnSense on Proxmox works great, also with SR-IOV (I updated to Proxmox 8.2.2 last weekend and it runs great). If it does not start, you probably have to disable Secure Boot in the "Guest BIOS" => that was my issue when I installed OpnSense on Proxmox the first time ;D

Your error message "smells like" non-unique IOMMU groups...
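
To check that, the IOMMU groups can be listed from sysfs on the Proxmox host. A rough sketch (the function name is made up; the optional argument only exists so the function can be pointed at a different directory for testing):

```shell
# List every PCI device together with its IOMMU group number.
# $1 is optional and defaults to the usual sysfs location.
list_iommu_groups() {
    root=${1:-/sys/kernel/iommu_groups}
    for dev in "$root"/*/devices/*; do
        [ -e "$dev" ] || continue            # no groups: IOMMU probably disabled
        group=${dev%/devices/*}              # strip "/devices/<address>"
        group=${group##*/}                   # keep only the group number
        printf 'group %s: %s\n' "$group" "${dev##*/}"
    done
}
```

If the NIC or one of its VFs shares a group with unrelated devices, passthrough tends to get complicated; if the function prints nothing, the IOMMU is most likely not enabled at all.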

It's an issue with Intel virtual function network interfaces and high availability virtual IP addresses that use CARP. The issue is that CARP needs a second MAC address, and the packet flow inside the Intel driver has some "issues with this by design" on X710 NICs. That's why it is possible to ping the CARP IP from outside (from another client/PC), but not if the client runs "on the same physical NIC" with another virtual function network device on the same physical card.

As I figured out (and as this link also tells us: https://forum.proxmox.com/threads/issues-with-sriov-based-nic-passthrough-to-firewall.66392/), "vf-true-promisc-support on" needs to be set on the Proxmox host on the first NIC interface, and promisc needs to be set within the guest (in our case OpnSense / I think OpnSense enables promisc for CARP anyway?). With these settings and a newer Intel E810 card everything works... but it still doesn't work on older Intel X710 NICs.
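
Roughly, the host side of this could look like the sketch below. Only the `vf-true-promisc-support` line comes from this thread; the `trust on` and `spoofchk off` lines are assumptions (Intel's driver notes describe trusted VFs as a prerequisite for VF promiscuous mode, and CARP's second MAC is the kind of thing a spoof check would drop). The function only prints the commands so they can be reviewed first; its name, the interface and the VF index are examples.

```shell
# Print (not run) the host-side commands for one VF.
# $1 = PF interface on the Proxmox host, $2 = VF index.
vf_carp_host_cmds() {
    pf=$1
    vf=$2
    # From the thread: allow VFs on this PF to enter true promiscuous mode.
    printf 'ethtool --set-priv-flags %s vf-true-promisc-support on\n' "$pf"
    # Assumption: the VF has to be trusted before it may go promiscuous.
    printf 'ip link set dev %s vf %s trust on\n' "$pf" "$vf"
    # Assumption: CARP sends from a second (virtual) MAC, which the
    # VF spoof check would otherwise reject.
    printf 'ip link set dev %s vf %s spoofchk off\n' "$pf" "$vf"
}
```

`vf_carp_host_cmds enp9s0 0` shows the commands, and piping that into `sh` as root applies them. In the guest, promisc can be forced with `ifconfig iavf0 promisc` if CARP does not enable it on its own (`iavf0` is an assumed interface name).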

Regards

Quote from: subivoodoo on April 29, 2024, 04:24:47 PM

I have been running OPNsense and other VMs with SR-IOV for years now, no problems. It's only kernel 6.8 with the X710 interface preventing any of my VMs (Linux or OPNsense) from starting. It's a Supermicro EPYC board with full IOMMU support, no hacks required.
Older Intel 10G card works fine, too.
I have ordered an E810 adapter now; the fact that you are not having any issues with that one is a good starting point.

I have a consumer Intel H770 board with great IOMMU groups (every device and its functions are separate) and only an i3-13100. But I've never had starting issues on VF NICs with Win11, Ubuntu and OpnSense... and no need for ACS override or other hacks.

Note that I have no starting issues on either the X710 or the E810.

My issues are:
X710 = CARP does not work properly (only) with VMs on the same NIC
E810 = CARP works, but the C8 state is not reachable for low power consumption... the X710 can do this, the E810 has ASPM disabled
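
The ASPM state of a card can be read from the `LnkCtl` line in the `lspci -vv` output. A small sketch that extracts just that part (the function name is invented, and the exact wording of the line may differ between pciutils versions):

```shell
# Print the ASPM part of the LnkCtl line, e.g. "Disabled" or "L1 Enabled".
# Reads `lspci -vv` output for one device from stdin.
aspm_state() {
    sed -n 's/^[[:space:]]*LnkCtl:[[:space:]]*ASPM \([^;]*\);.*/\1/p'
}
```

For example `lspci -vv -s 07:00.0 | aspm_state`, run as root so the capabilities are readable. The address `07:00.0` is only guessed from the interface name `enp7s0f0np0`.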

Thinking about the current state:
So because of the X710 vs E810 case, we are sure it's a problem with the Intel drivers AND only on the X710.
Do you think contacting Intel support would get us anywhere?

Why not...

I myself use my E810 (from ebay) and live with the fact that my Proxmox/OpnSense SR-IOV firewall/HA-Cluster node with VLAN tagging in HW now requires 30 watts instead of 22 watts on average. For home usage not too bad, only a $20 higher electricity bill per year  >:(
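
As a quick sanity check of those numbers: 8 watts more around the clock is about 70 kWh per year, so $20 corresponds to a price of roughly $0.29 per kWh. A tiny sketch of the arithmetic (the function names and the price in the usage example are assumptions):

```shell
# Extra energy per year in kWh for a constant extra draw of $1 watts.
extra_kwh_per_year() {
    awk -v w="$1" 'BEGIN { printf "%.2f\n", w * 24 * 365 / 1000 }'
}

# Yearly cost of that energy at a price of $2 per kWh.
extra_cost_per_year() {
    awk -v w="$1" -v p="$2" 'BEGIN { printf "%.2f\n", w * 24 * 365 / 1000 * p }'
}
```

`extra_kwh_per_year 8` gives 70.08, and `extra_cost_per_year 8 0.30` gives 21.02 at an assumed 0.30 per kWh.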

Yeah, maybe I'll see if I can also find a cheap enough E810 on ebay, now that I know this one will work for sure.

It's been working perfectly for me for a few days now... if it doesn't, I hear it immediately from my wife or the kids  ;)

Home Assistant runs on the same Proxmox/NIC as the OpnSense cluster slave and it can still reach the separate IoT LAN when the master OpnSense is down for maintenance...
And I'm getting 4 Gbit/s in iperf between the two OpnSenses without any performance tuning and without any retransmits.

The only downside is the slightly higher power consumption, because the CPU only reaches the C3 state at most instead of C8 with the older X710.

One quirk? If I take a snapshot of the main OpnSense in Proxmox, the VM freezes for a few seconds and an HA failover happens during that time. But I don't know if this is related to SR-IOV or to the qemu guest of OpnSense not supporting all features.