CARP IP not pingable from another SR-IOV virtual function on the same host

Started by Simon42, February 18, 2024, 01:27:11 PM

Let me start with a little diagram to make this easier to understand:


I have two Proxmox hosts, each running an OPNsense VM for HA.
Both hosts have an Intel XL710 installed. On pve-router the whole card is passed through to the VM via PCI passthrough. On pve-main I created multiple SR-IOV virtual functions (VFs) on the host and passed through only one VF's PCI device. pve-main also runs some other VMs (on other VFs) that handle other services.

Normally pve-router (the master OPNsense) handles all the traffic and everything is fine. But when it fails, the main server (the backup OPNsense) should take over routing for the time being.

So when OPNsense2 becomes CARP master, here comes the problem:
The CARP IP is not pingable from other VMs/VFs on the same host. To be more specific:
vm1 (10.10.110.200) can't ping the CARP IP (10.10.110.1),
but vm1 can ping 10.10.110.3 directly.
Checking with a client outside (10.10.110.40), the client can ping both (.3 AND .1), so CARP should theoretically be set up correctly?
But something seems to go wrong when the traffic is heading to the CARP IP on the same host, which is using SR-IOV VFs. When OPNsense1 (which does not use a VF) is master, everything works.

Some more debugging I have already done:

  • Started an Interfaces: Diagnostics: Packet Capture on the interface (with the Promiscuous checkbox checked) and tried the VM -> CARP ping again. Unfortunately nothing: no traffic seems to reach the firewall at all (the capture itself works, because pinging .3 from the VM or .1 from the client shows traffic).
  • Checked the ARP entry for 10.10.110.1 on both vm1 and the client: both correctly point to the virtual CARP MAC (00:00:5e:00:01:0a).
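
To repeat the ARP check from a Linux VM in a script, a minimal sketch could derive the expected CARP MAC from the VHID and compare it with the neighbour entry. It assumes CARP's VRRP-style virtual MAC 00:00:5e:00:01:<VHID as two hex digits> (so 00:00:5e:00:01:0a above corresponds to VHID 10) and the output format of ip neigh show on the client; carp_mac and neigh_is_carp are made-up names for this example.

```shell
#!/bin/sh
# Sketch (assumption): CARP uses the VRRP-style virtual MAC
# 00:00:5e:00:01:<VHID as two hex digits>.

# Print the CARP virtual MAC for a VHID (valid VHIDs are 1-255).
carp_mac() {
    vhid=$1
    if [ "$vhid" -lt 1 ] || [ "$vhid" -gt 255 ]; then
        echo "VHID must be between 1 and 255, got $vhid" >&2
        return 1
    fi
    printf '00:00:5e:00:01:%02x\n' "$vhid"
}

# Read `ip neigh show <IP>` output on stdin and succeed if the first
# lladdr found equals the CARP MAC of the given VHID.
neigh_is_carp() {
    expected=$(carp_mac "$1") || return 1
    actual=$(awk '{ for (i = 1; i < NF; i++) if ($i == "lladdr") { print $(i + 1); exit } }')
    [ "$actual" = "$expected" ]
}
```

For example, on vm1: ip neigh show 10.10.110.1 | neigh_is_carp 10 && echo "ARP entry points at the CARP MAC"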

PS:

  • The VF has spoof checking off and trust on.
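
A sketch for confirming those VF settings on the Proxmox host, assuming iproute2's ip link show dev <PF> output, where each VF is listed on a line starting with "vf <n>" that contains "spoof checking on/off" and "trust on/off". The function name vf_settings_ok is hypothetical.

```shell
#!/bin/sh
# Read `ip link show dev <PF>` output on stdin and succeed only if the line
# for VF <n> reports both "spoof checking off" and "trust on".
vf_settings_ok() {
    awk -v vf="$1" '
        $1 == "vf" && $2 == vf {
            found = 1
            if (index($0, "spoof checking off") > 0 && index($0, "trust on") > 0)
                ok = 1
        }
        END { if (found && ok) exit 0; exit 1 }
    '
}
```

For example: ip link show dev <PF> | vf_settings_ok 0 && echo "VF 0: spoof checking off, trust on"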

Does anyone have an idea what could be going wrong here?


Unfortunately not (yet).
Do you have the same or a similar problem?
If yes, please keep me updated here if you find something.

Yes, I'm planning a similar setup and ran into the same issue during testing.

Proxmox host, newest OPNsense version running with a CARP IP on LAN... All clients on my network can reach this IP, and the Proxmox host can ping it too, but no other VM on this host that uses a virtual function NIC can ping the CARP IP.

The only thing that works from such a client is the ARP broadcast for the CARP IP, so the client learns the MAC address of the CARP IP, but after that no packets are received by OPNsense (traffic to the "real" IP of OPNsense has no issues!).

In my case, I have an Intel E810... I think it's an Intel iavf driver issue, as the 710 and 810 cards use the same VF driver.
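
To check which driver family a host PF belongs to, ethtool -i <PF> prints a "driver:" line: i40e for the 700 series (X710/XL710), ice for the 800 series (E810), and mlx5_core for ConnectX-4 and newer. Both Intel families hand their VFs to the iavf driver. A sketch that parses that line; nic_family is a hypothetical helper.

```shell
#!/bin/sh
# Read `ethtool -i <PF>` output on stdin and print the NIC family that the
# "driver:" line indicates.
nic_family() {
    drv=$(awk -F': ' '$1 == "driver" { print $2 }')
    case "$drv" in
        i40e)      echo "Intel 700 series (VFs use iavf)" ;;
        ice)       echo "Intel 800 series (VFs use iavf)" ;;
        mlx5_core) echo "Mellanox ConnectX-4 or newer (VFs use mlx5_core)" ;;
        *)         echo "unknown driver: ${drv:-none}" >&2; return 1 ;;
    esac
}
```

For example: ethtool -i <PF> | nic_family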

It would be interesting to find someone who got this working with a different NIC/driver and VFs.
That way we could be more certain it's actually a driver problem and not something else.
If so, we could contact Intel support, but getting them to understand this honestly quite complex problem, acknowledge it's a problem in their driver, and fix it is another story...

Maybe I can test it with a Mellanox ConnectX-4, but unfortunately certainly not before Easter...

Strange: a virtual IP of type "IP Alias" works fine from such clients  :-\
These, however, have the same MAC address as the real LAN NIC IP.

No hurry; it would be great if you could test it and post your findings.

Yeah, an IP Alias, as you said, has the same MAC.
I guess the VF only has one MAC assigned (and not the CARP MAC), so it does not receive CARP packets (although promiscuous mode is enabled). And for some reason this only applies to packets within the same NIC. (So maybe it's indeed a driver issue with forwarding packets between different VFs, as that traffic never leaves the NIC.)

It left me no peace... I did a "quick" test with a ConnectX-4.

But I have a different problem with this setup: I can't get a MASTER CARP state on a Mellanox VT interface. It looks as if, with Mellanox cards, the CARP advertisements are delivered back to the sender itself, so a "better" master always seems to be available. I am currently unable to get a MASTER CARP IP even with just one OPNsense instance running.

I'm getting the following log message:

<6>carp: 8@mce0: MASTER -> BACKUP (more frequent advertisement received)
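
The log line encodes the VHID and interface as <VHID>@<interface>, so this is VHID 8 on mce0 (mce is the interface name FreeBSD uses for ConnectX-4 and newer cards). A sketch that splits such lines, assuming the format shown above; parse_carp_log is a made-up name. Counting how often "more frequent advertisement received" shows up for a single running instance could help support the looped-back-advertisement theory.

```shell
#!/bin/sh
# Turn FreeBSD CARP state-change log lines such as
#   <6>carp: 8@mce0: MASTER -> BACKUP (more frequent advertisement received)
# into key=value pairs; lines in any other format are dropped.
parse_carp_log() {
    sed -n -E 's/.*carp: ([0-9]+)@([^:]+): ([A-Z]+) -> ([A-Z]+) \((.*)\)$/vhid=\1 if=\2 from=\3 to=\4 reason="\5"/p'
}
```

For example, feeding the line above prints vhid=8 if=mce0 from=MASTER to=BACKUP reason="more frequent advertisement received".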

So multiple MACs on the same SR-IOV virtual adapter don't work at all???

Interesting...
What driver/settings do you use with your ConnectX-4?
In my opinion at least, dealing with Mellanox drivers is quite annoying (even more than this "little" problem with the Intel ones here).
For context: I had ConnectX-3 FCBT cards before but just could not get anything to work with those... so I eventually gave up and bought Intel ones.

No special driver, just the default one that comes with the newest Proxmox (8.1.5).

I activated and configured the virtual adapter like this (set a fixed MAC, disabled spoof checking, and enabled trust):

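# create 8 VFs on the port whose RDMA device is mlx5_1
echo 8 > /sys/class/infiniband/mlx5_1/device/sriov_numvfs
# give VF 7 a fixed MAC
ip link set dev enp8s0f1np1 vf 7 mac xx:yy:zz:..
# let the VM send frames with other source MACs (e.g. the CARP MAC)
ip link set dev enp8s0f1np1 vf 7 spoofchk off
# mark VF 7 as trusted
ip link set dev enp8s0f1np1 vf 7 trust on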


I haven't had any issues setting up or using these virtual NICs within the OPNsense VM, or accessing the normal LAN IP to configure a test CARP IP...
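
Writing to sriov_numvfs has one catch worth scripting around: the kernel refuses to change a non-zero VF count directly, so it has to go back to 0 first, which removes all existing VFs (any VM using them must be stopped). A sketch, assuming the PF's sysfs device directory is passed in (for example /sys/class/net/<PF>/device; /sys/class/infiniband/mlx5_1/device should lead to the same PCI device); set_numvfs is a hypothetical helper.

```shell
#!/bin/sh
# Set the VF count of a PF through its sysfs device directory.
# A non-zero count cannot be changed directly, so reset to 0 first.
set_numvfs() {
    devdir=$1   # e.g. /sys/class/net/<PF>/device (placeholder PF name)
    want=$2
    total=$(cat "$devdir/sriov_totalvfs") || return 1
    if [ "$want" -gt "$total" ]; then
        echo "requested $want VFs, device supports at most $total" >&2
        return 1
    fi
    current=$(cat "$devdir/sriov_numvfs") || return 1
    [ "$current" -eq "$want" ] && return 0
    if [ "$current" -ne 0 ]; then
        # this destroys all existing VFs on the PF
        echo 0 > "$devdir/sriov_numvfs" || return 1
    fi
    echo "$want" > "$devdir/sriov_numvfs"
}
```

For example, as root: set_numvfs /sys/class/net/<PF>/device 8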


I think I got it! I managed to ping the CARP IP (Intel E810 NIC) of a test OPNsense VM firewall, and even open the management web GUI via that CARP IP, from within an Ubuntu and a Win11 VM running on the same Proxmox host, also using virtual adapters on the same NIC and the same PF... Do you want to know what I did? ;D I did some Google research and tried out:

ethtool --set-priv-flags enp8s0f1np1 vf-true-promisc-support on

on the Proxmox host before starting any VM (enp8s0f1np1 is the PF of all my test VMs' virtual adapters).

Can you test this on your setup too?

One downside is that this setting applies globally to all VTs on this NIC... but trust could be off on all other VTs and on only for the OPNsense VT.
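
A sketch of that idea: print one ip link command per VF, with trust on only for the VF passed to OPNsense. It assumes VF indices 0 to N-1 on a single PF; trust_only_cmds is a hypothetical name. Printing instead of executing lets you review the commands before running them as root.

```shell
#!/bin/sh
# Print `ip link` commands that enable trust only on the OPNsense VF
# and disable it on every other VF of the same PF.
# Usage: trust_only_cmds <PF> <number-of-VFs> <opnsense-VF-index>
trust_only_cmds() {
    pf=$1 count=$2 opn=$3
    vf=0
    while [ "$vf" -lt "$count" ]; do
        if [ "$vf" -eq "$opn" ]; then state=on; else state=off; fi
        echo "ip link set dev $pf vf $vf trust $state"
        vf=$((vf + 1))
    done
}
```

For example: trust_only_cmds enp7s0f0np0 8 0 | sh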

I also tested my findings on an Intel X710-DA2 NIC... and it did NOT work, even after updating the NVM to the newest version.

I applied the following commands on the X710 NIC:

echo 8 > /sys/class/net/enp7s0f0/device/sriov_numvfs
ethtool --set-priv-flags enp7s0f0 vf-true-promisc-support on
ip link set enp7s0f0 vf 0 mac 76:9e:17:83:00:00
ip link set dev enp7s0f0 vf 0 trust on
ip link set dev enp7s0f0 vf 0 spoofchk off
ip link set enp7s0f0v0 promisc on


DID NOT WORK!

I shut down the test rig, swapped back to the E810 and applied the same commands (search-and-replace enp7s0f0 with enp7s0f0np0):

echo 8 > /sys/class/net/enp7s0f0np0/device/sriov_numvfs
ethtool --set-priv-flags enp7s0f0np0 vf-true-promisc-support on
ip link set enp7s0f0np0 vf 0 mac 76:9e:17:83:00:00
ip link set dev enp7s0f0np0 vf 0 trust on
ip link set dev enp7s0f0np0 vf 0 spoofchk off
ip link set enp7s0f0v0 promisc on


I started OPNsense (LAN on VT 0) and the Win11 test VM (on VT 3)... and it all works!
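
Since the X710 and E810 runs used the same commands apart from the PF name, the sequence can be generated per PF. A sketch (vf_setup_cmds is a hypothetical helper); the VF netdev name, such as enp7s0f0v0, depends on udev naming on your host, so it is passed in explicitly rather than derived.

```shell
#!/bin/sh
# Print the host-side setup sequence used above for one PF and one VF.
# Usage: vf_setup_cmds <PF> <VF-netdev> <numvfs> <vf-index> <mac>
vf_setup_cmds() {
    pf=$1 vfdev=$2 numvfs=$3 vf=$4 mac=$5
    cat <<EOF
echo $numvfs > /sys/class/net/$pf/device/sriov_numvfs
ethtool --set-priv-flags $pf vf-true-promisc-support on
ip link set dev $pf vf $vf mac $mac
ip link set dev $pf vf $vf trust on
ip link set dev $pf vf $vf spoofchk off
ip link set dev $vfdev promisc on
EOF
}
```

For example, review the output of vf_setup_cmds enp7s0f0np0 enp7s0f0v0 8 0 76:9e:17:83:00:00 and then pipe it to sh as root, before starting any VM that uses these VFs.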

Thanks for reporting back your findings.

This vf-true-promisc flag was actually still on my to-do list of things to try when I had time to investigate this further, as I had found it in the NVM changelogs.
But so far I didn't really know where to apply it (in the OPNsense VM I had no ethtool).

Am I reading this correctly?
So I have to do this on the Proxmox host after creating the VFs (running the echo), but before starting any VMs that use the VFs? And it then applies to all VFs.

I hope to try this out in the next couple of days and report back.

PS: just to be sure: by "VT" you mean virtual functions? ip link also calls them "vf". What does "VT" stand for? Or am I missing something?

Yes, I mean virtual function... VT stands for "Virtualization Technology".

Yes, the vf-true-promisc flag must be set on the Proxmox host before the first VM is started (you will get an error if you try to set "vf-true-promisc-support on" while a VM that uses such a VF NIC is running). I think it doesn't matter whether you do it before or after the "echo". With ethtool --show-priv-flags IFNAME you can see the current private flags on your interface... and yes, as I understand it, these flags apply to all virtual function network adapters on this interface.
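
A sketch for checking the flag from a script. Private flags are listed by ethtool --show-priv-flags <PF>, one "name: on/off" line per flag after a header line; priv_flag_state is a hypothetical helper that prints the state of one flag and fails if the flag is not listed.

```shell
#!/bin/sh
# Read `ethtool --show-priv-flags <PF>` output on stdin and print the
# state (on/off) of one private flag; fail if the flag is not listed.
priv_flag_state() {
    awk -F: -v flag="$1" '
        { name = $1; gsub(/[ \t]+/, "", name) }
        name == flag { state = $2; gsub(/[ \t]+/, "", state); print state; found = 1 }
        END { if (!found) exit 1 }
    '
}
```

For example: ethtool --show-priv-flags enp7s0f0np0 | priv_flag_state vf-true-promisc-support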

I suspect it won't work on your X710 card either...  :(