Messages - Simon42

#1
Ok, thanks for the information.
#2
Hi,
I updated 3 opnsense vms today as well and experienced this issue on 1 of them.
I don't know if it's just a coincidence, but the one that broke is using zfs (other 2 still use ufs).
Fixed it using:
- opnsense-update -sn "24\.7\/latest"
- pkg update
- opnsense-update -pA 24.7

PS:
were at something like 24.1.6 before (don't remember the exact version)
update to 24.1.10 went through without issues
then proceeded with the update to 24.7

nothing reproducible, just sharing some thoughts to try and help track this down:
maybe something related to the recent zfs over ufs / zfs installer script changes?

@greaman do you use zfs or ufs?
#3
Yeah, maybe I'll see if I can also find a cheap enough E810 on ebay, now that I know this one will work for sure.
#4
Thinking about the current state:
So because of the X710 vs E810 comparison, we are sure it's a problem with the Intel drivers AND only on the X710.
Do you think contacting intel support would get us anywhere?
#5
Sorry for taking so long, but just found time to test again today.

Unfortunately, no success even after updating to 9.40.

Found this older thread. May be interesting: https://forum.proxmox.com/threads/issues-with-sriov-based-nic-passthrough-to-firewall.66392/
It talks about using iavf driver. (current OPNsense is already doing that by default as far as I can see)
And VLAN filters, which I don't use.
Unfortunately, no actual solution was posted there.
#6
thanks for the info.
did some quick testing today:
enabled it with ethtool --set-priv-flags enp9s0 vf-true-promisc-support on

but it does not seem to work unfortunately :(. The ping still does not even reach the opnsense packet capture.

but I still have some more testing left to do (your comments don't let me hope, but let's see... ;)): still need to do an nvm update, ...

PS:  ethtool -k does not show these priv-flags:
root@pve:~# ethtool -k enp9s0
Features for enp9s0:
rx-checksumming: on
tx-checksumming: on
        tx-checksum-ipv4: off [fixed]
        tx-checksum-ip-generic: on
        tx-checksum-ipv6: off [fixed]
        tx-checksum-fcoe-crc: off [fixed]
        tx-checksum-sctp: on
scatter-gather: on
        tx-scatter-gather: on
        tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
        tx-tcp-segmentation: on
        tx-tcp-ecn-segmentation: on
        tx-tcp-mangleid-segmentation: off
        tx-tcp6-segmentation: on
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: on
receive-hashing: on
highdma: on
rx-vlan-filter: on [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: on
tx-gre-csum-segmentation: on
tx-ipxip4-segmentation: on
tx-ipxip6-segmentation: on
tx-udp_tnl-segmentation: on
tx-udp_tnl-csum-segmentation: on
tx-gso-partial: on
tx-tunnel-remcsum-segmentation: off [fixed]
tx-sctp-segmentation: off [fixed]
tx-esp-segmentation: off [fixed]
tx-udp-segmentation: on
tx-gso-list: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off
hw-tc-offload: off
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
rx-udp_tunnel-port-offload: on
tls-hw-tx-offload: off [fixed]
tls-hw-rx-offload: off [fixed]
rx-gro-hw: off [fixed]
tls-hw-record: off [fixed]
rx-gro-list: off
macsec-hw-offload: off [fixed]
rx-udp-gro-forwarding: off
hsr-tag-ins-offload: off [fixed]
hsr-tag-rm-offload: off [fixed]
hsr-fwd-offload: off [fixed]
hsr-dup-offload: off [fixed]


but i got them using
root@pve:~# ethtool --show-priv-flags enp9s0
Private flags for enp9s0:
MFP                    : off
total-port-shutdown    : off
LinkPolling            : off
flow-director-atr      : on
veb-stats              : off
hw-atr-eviction        : off
link-down-on-close     : off
legacy-rx              : off
disable-source-pruning : off
disable-fw-lldp        : off
rs-fec                 : off
base-r-fec             : off
vf-vlan-pruning        : off
vf-true-promisc-support: on


maybe you could post the output of both of these commands for your E810, so we could compare whether there are some more interesting (enabled by default) flags on the E810 :)
#7
Thanks for reporting back your findings.

This vf-true-promisc flag was actually something I still had on my to-do list of things to try out once I got time to investigate this further, as I found it in the nvm changelogs.
But so far I didn't really know where I had to apply it (as in the opnsense vm I had no ethtool).

Am I reading this correctly?:
So I have to do this on the proxmox host after creating the vfs / running echo, but before starting any vms using the vfs? And this then applies to all vfs.
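
If that reading is right, the order on the host would look roughly like the sketch below. This is only my understanding, not a confirmed procedure: the interface name enp9s0 and the VF count of 4 are placeholders, I am assuming "echo" means writing to sriov_numvfs, and the helper names step and prepare_vfs are made up for illustration. By default it only prints the commands; set DRY_RUN=0 to actually run them as root.

```shell
#!/bin/sh
# PF and NUM_VFS are placeholders for illustration, not values confirmed in this thread.
PF="${PF:-enp9s0}"
NUM_VFS="${NUM_VFS:-4}"
DRY_RUN="${DRY_RUN:-1}"   # default: only print the commands

# step prints the command string in dry-run mode, otherwise executes it.
step() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$1"
    else
        sh -c "$1"
    fi
}

prepare_vfs() {
    # 1. Create the VFs on the physical function (assumed to be the "echo" step).
    step "echo $NUM_VFS > /sys/class/net/$PF/device/sriov_numvfs"
    # 2. Set the private flag on the PF before any VM uses a VF.
    step "ethtool --set-priv-flags $PF vf-true-promisc-support on"
    # 3. Check that the flag is shown as on before starting the VMs.
    step "ethtool --show-priv-flags $PF"
}

prepare_vfs
```

The VMs using the vfs would only be started after the last step.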

I hope to try this out in the next couple of days and will report back.

PS: just to be sure: with "VT" you are talking about virtual functions? ip link calls them also "vf". What does "VT" stand for? Or am I missing something?
#8
Interesting...
What driver / settings do you use with your connectx4?
As (in my opinion at least) dealing with mellanox drivers is quite annoying (even more than this "little" problem with the intel ones here).
For context: i had connectx 3 fcbt before but just could not get anything to work with those ... So I eventually gave up and bought intel ones.
#9
No hurry, would be great if you could test and post your findings.

Yeah IP Alias as you said has same MAC.
I guess the vf only has one MAC assigned (and not the CARP one). So it does not receive CARP packets (although promiscuous mode is enabled). And for some reason all this only applies to packets within the same nic. (So maybe it's indeed a driver issue with packet routing between different vfs, as the traffic should not leave the nic.)
#10
Would be interesting to know if we could find someone who got it working with a different nic / driver and vfs.
So we could be more certain it's actually a driver problem and not something else.
If so, maybe contact Intel support then, but getting them to understand this honestly quite complex problem, acknowledge it's a problem in their driver, and fix it is another story....
#11
Unfortunately not (yet).
Do you have the same / a similar problem also?
If yes, please keep me updated here if you find something.
#12
So let me start with a little diagram to hopefully make this easier to understand:


I have 2 proxmox hosts each running an opnsense vm for HA.
Both of these hosts have an Intel XL710 installed, but pve-router has the full card passed through via PCI passthrough, while on pve-main I created multiple SR-IOV virtual functions (VFs) on the host and passed through only the PCI device of one of the VFs. pve-main also has some other vms (on other VFs) handling other services.

Normally pve-router (master opnsense) handles all the traffic and everything is fine. But when this one fails, the main server (slave opnsense) should take over routing in the meantime.

So when OPNsense2 becomes CARP master, here comes the problem:
The CARP IP is not pingable from other vms/VFs on the same host. Or to be more specific:
vm1 (10.10.110.200) can't ping the CARP IP (10.10.110.1)
but vm1 can ping 10.10.110.3 directly
checking with a client outside (10.10.110.40), the client can ping both (.3 AND .1), so CARP should theoretically be set up fine?
But something seems to go wrong when the traffic is heading to the CARP IP on the same host (and this one is using SR-IOV VFs; when opnsense1, which is not using a VF, is master, everything works).

Some more debugging I already did:

  • Started a packet capture under Interfaces: Diagnostics: Packet Capture on the interface (with the Promiscuous checkbox checked) and tried the vm -> CARP ping again. Unfortunately nothing: no traffic seems to reach the firewall at all (the capture itself works, because when pinging .3 from the vm or .1 from the client it shows traffic)
  • Checked the arp table of 10.10.110.1 on both vm1 and client: both correctly point to the virtual carp mac (00:00:5e:00:01:0a)
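
As a side check on that MAC: CARP reuses the VRRP address range, so the last octet of the virtual MAC is the VHID in hex. A small shell sketch (the helper name carp_mac is made up for illustration) that derives the MAC to expect for a given VHID:

```shell
#!/bin/sh
# carp_mac is a hypothetical helper name, not part of OPNsense or any other tool.
# CARP virtual MACs have the form 00:00:5e:00:01:<VHID as two hex digits>.
carp_mac() {
    printf '00:00:5e:00:01:%02x\n' "$1"
}

# VHID 10 is inferred from the trailing 0a in the ARP entry, not read from the config.
carp_mac 10
```

For VHID 10 this prints 00:00:5e:00:01:0a, which matches the ARP entries on both vm1 and the client, so the ARP side looks consistent.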

PS:

  • The VF has spoof checking off, trust on
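
For reference, these two VF settings are applied on the proxmox host with ip link. A sketch, assuming the physical function is enp9s0 and the opnsense VF has index 0 (both are placeholders, and the helper names run and vf_trust_setup are made up for illustration). By default it only prints the commands; set DRY_RUN=0 to apply them as root.

```shell
#!/bin/sh
# PF and VF are assumptions for illustration; adjust them to the real setup.
PF="${PF:-enp9s0}"
VF="${VF:-0}"
DRY_RUN="${DRY_RUN:-1}"   # default: only print the commands

# run prints the command in dry-run mode, otherwise executes it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

vf_trust_setup() {
    # Allow the VF to send frames with a source MAC other than its own.
    run ip link set dev "$PF" vf "$VF" spoofchk off
    # Mark the VF as trusted so it may change MAC filters / request promiscuous mode.
    run ip link set dev "$PF" vf "$VF" trust on
    # Show the PF, including the per-VF spoof checking and trust state.
    run ip link show dev "$PF"
}

vf_trust_setup
```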

Anyone any idea what could go wrong here?