OPNsense Forum

Archive => 21.1 Legacy Series => Topic started by: mircolino on July 09, 2021, 06:16:05 am

Title: Lots of Hyper-V SR-IOV Mellanox boot errors after upgrading to 21.1.8
Post by: mircolino on July 09, 2021, 06:16:05 am
Running OPNsense in a Hyper-V VM with five SR-IOV enabled network interfaces.
mlx4en_load=YES is set in Tunables.
With 21.1.7 everything was working fine.
After upgrading to 21.1.8 I now see lots and lots of Mellanox Hyper-V errors and boot takes several minutes.

Code: [Select]
Copyright (c) 2013-2019 The HardenedBSD Project.
Copyright (c) 1992-2019 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 12.1-RELEASE-p19-HBSD #0  0c59842367b(stable/21.1)-dirty: Mon Jul  5 15:08:43 CEST 2021
    root@sensey:/usr/obj/usr/src/amd64.amd64/sys/SMP amd64
FreeBSD clang version 8.0.1 (tags/RELEASE_801/final 366581) (based on LLVM 8.0.1)
SRAT: Ignoring memory at addr 0x108000000
SRAT: Ignoring memory at addr 0x1000000000
SRAT: Ignoring memory at addr 0x10000000000
SRAT: Ignoring memory at addr 0x20000000000
SRAT: Ignoring memory at addr 0x40000000000
SRAT: Ignoring memory at addr 0x80000000000
VT(efifb): resolution 1024x768
Hyper-V Version: 10.0.17763 [SP2]
  Features=0x2e7f<VPRUNTIME,TMREFCNT,SYNIC,SYNTM,APIC,HYPERCALL,VPINDEX,REFTSC,IDLE,TMFREQ>
  PM Features=0x0 [C2]
  Features3=0xbed7b2<DEBUG,XMMHC,IDLE,NUMA,TMFREQ,SYNCMC,CRASH,NPIEP>
Timecounter "Hyper-V" frequency 10000000 Hz quality 2000
HardenedBSD: initialize and check features (__HardenedBSD_version 1200059 __FreeBSD_version 1201000).
CPU: AMD Ryzen 7 2700 Eight-Core Processor           (3194.10-MHz K8-class CPU)
  Origin="AuthenticAMD"  Id=0x800f82  Family=0x17  Model=0x8  Stepping=2
  Features=0x1783fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE,SSE2,HTT>
  Features2=0xfed83203<SSE3,PCLMULQDQ,SSSE3,FMA,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND,HV>
  AMD Features=0x2e500800<SYSCALL,NX,MMX+,FFXSR,Page1GB,RDTSCP,LM>
  AMD Features2=0x4003f3<LAHF,CMP,CR8,ABM,SSE4A,MAS,Prefetch,OSVW,Topology>
  Structured Extended Features=0x209c01a9<FSGSBASE,BMI1,AVX2,SMEP,BMI2,RDSEED,ADX,SMAP,CLFLUSHOPT,SHA>
  XSAVE Features=0xf<XSAVEOPT,XSAVEC,XINUSE,XSAVES>
  AMD Extended Feature Extensions ID EBX=0x2001004<XSaveErPtr>
Hypervisor: Origin = "Microsoft Hv"
real memory  = 4294967296 (4096 MB)
avail memory = 4118040576 (3927 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table: <VRTUAL MICROSFT>
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
FreeBSD/SMP: 1 package(s) x 2 core(s) x 2 hardware threads
random: unblocking device.
ioapic0 <Version 1.1> irqs 0-23 on motherboard
Launching APs: 1 2 3
random: entropy device external interface
wlan: mac acl policy registered
Timecounter "Hyper-V-TSC" frequency 10000000 Hz quality 3000
kbd0 at kbdmux0
module_register_init: MOD_LOAD (vesa, 0xffffffff8128e7f0, 0) error 19
random: registering fast source Intel Secure Key RNG
random: fast provider: "Intel Secure Key RNG"
000.000055 [4336] netmap_init               netmap: loaded module
[ath_hal] loaded
nexus0
efirtc0: <EFI Realtime Clock> on motherboard
efirtc0: registered as a time-of-day clock, resolution 1.000000s
cryptosoft0: <software crypto> on motherboard
acpi0: <VRTUAL MICROSFT> on motherboard
atrtc0: <AT realtime clock> port 0x70-0x71 irq 8 on acpi0
atrtc0: registered as a time-of-day clock, resolution 1.000000s
Event timer "RTC" frequency 32768 Hz quality 0
Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
acpi_timer0: <32-bit timer at 3.579545MHz> port 0x408-0x40b on acpi0
cpu0: <ACPI CPU> on acpi0
acpi_syscontainer0: <System Container> on acpi0
vmbus0: <Hyper-V Vmbus> on acpi_syscontainer0
vmbus_res0: <Hyper-V Vmbus Resource> irq 5 on acpi0
Timecounters tick every 10.000 msec
usb_needs_explore_all: no devclass
vmbus0: version 3.0
hvet0: <Hyper-V event timer> on vmbus0
Event timer "Hyper-V" frequency 10000000 Hz quality 1000
hvkbd0: <Hyper-V KBD> on vmbus0
hvheartbeat0: <Hyper-V Heartbeat> on vmbus0
hvkvp0: <Hyper-V KVP> on vmbus0
hvshutdown0: <Hyper-V Shutdown> on vmbus0
hvtimesync0: <Hyper-V Timesync> on vmbus0
hvtimesync0: RTT
hvvss0: <Hyper-V VSS> on vmbus0
storvsc0: <Hyper-V SCSI> on vmbus0
da0 at storvsc0 bus 0 scbus0 target 0 lun 0
da0: <Msft Virtual Disk 1.0> Fixed Direct Access SPC-3 SCSI device
da0: 300.000MB/s transfers
da0: Command Queueing enabled
da0: 65536MB (134217728 512 byte sectors)
hn0: <Hyper-V Network Interface> on vmbus0
cd0 at storvsc0 bus 0 scbus0 target 0 lun 1
cd0: <Msft Virtual DVD-ROM 1.0> Removable CD-ROM SPC-3 SCSI device
cd0: 300.000MB/s transfers
cd0: Attempt to query device size failed: NOT READY, Medium not present - tray closed
hn0: got notify, nvs type 128
hn0: Ethernet address: XX:XX:XX:bf:a3:0c
hn0: link state changed to UP
hn1: <Hyper-V Network Interface> on vmbus0
hn1: got notify, nvs type 128
hn1: Ethernet address: XX:XX:XX:bf:a3:0d
hn1: link state changed to UP
hn2: <Hyper-V Network Interface> on vmbus0
hn2: got notify, nvs type 128
hn2: Ethernet address: XX:XX:XX:bf:a3:0e
hn2: link state changed to UP
hn3: <Hyper-V Network Interface> on vmbus0
hn3: got notify, nvs type 128
hn3: Ethernet address: XX:XX:XX:bf:a3:0f
hn3: link state changed to UP
hn4: <Hyper-V Network Interface> on vmbus0
hn4: got notify, nvs type 128
hn4: Ethernet address: XX:XX:XX:bf:a3:10
hn4: link state changed to UP
pcib0: <Hyper-V PCI Express Pass Through> on vmbus0
Trying to mount root from ufs:/dev/gpt/rootfs [rw]...
pci0: <PCI bus> on pcib0
mlx4_core0: <mlx4_core> at device 2.0 on pci0
mlx4_core: Mellanox ConnectX core driver v3.5.1 (April 2019)
mlx4_core: Initializing mlx4_core
mlx4_core0: Detected virtual function - running in slave mode
mlx4_core0: Sending reset
mlx4_core0: Sending vhcr0
mlx4_core0: HCA minimum page size:512
mlx4_core0: Timestamping is not supported in slave mode
mlx4_en mlx4_core0: Activating port:1
mlxen0: Ethernet address: XX:XX:XX:bf:a3:0c
mlx4_en: mlx4_core0: Port 1: Using 4 TX rings
mlx4_en: mlx4_core0: Port 1: Using 4 RX rings
mlxen0: link state changed to DOWN
mlx4_en: mlxen0: Using 4 TX rings
hn0: link state changed to DOWN
mlx4_en: mlxen0: Using 4 RX rings
mlx4_en: mlxen0: Initializing port
pcib1: <Hyper-V PCI Express Pass Through> on vmbus0
pci1: <PCI bus> on pcib1
mlx4_core1: <mlx4_core> at device 2.0 on pci1
mlx4_core: Initializing mlx4_core
mlx4_core1: Detected virtual function - running in slave mode
mlx4_core1: Sending reset
mlx4_core1: Sending vhcr0
mlx4_core1: HCA minimum page size:512
mlx4_core1: Timestamping is not supported in slave mode
mlx4_en mlx4_core1: Activating port:1
mlxen2: Ethernet address: XX:XX:XX:bf:a3:0d
mlx4_en: mlx4_core1: Port 1: Using 4 TX rings
mlxen2: link state changed to DOWN
mlx4_en: mlx4_core1: Port 1: Using 4 RX rings
hn1: link state changed to DOWN
mlx4_en: mlxen2: Using 4 TX rings
mlx4_en: mlxen2: Using 4 RX rings
mlx4_en: mlxen2: Initializing port
pcib2: <Hyper-V PCI Express Pass Through> on vmbus0
pci2: <PCI bus> on pcib2
mlx4_core2: <mlx4_core> at device 2.0 on pci2
mlx4_core: Initializing mlx4_core
mlx4_core2: Detected virtual function - running in slave mode
mlx4_core2: Sending reset
mlx4_core2: Sending vhcr0
mlx4_core2: HCA minimum page size:512
mlx4_core2: Timestamping is not supported in slave mode
mlx4_en mlx4_core2: Activating port:1
mlxen4: Ethernet address: XX:XX:XX:bf:a3:0e
mlx4_en: mlx4_core2: Port 1: Using 4 TX rings
mlx4_en: mlx4_core2: Port 1: Using 4 RX rings
mlxen4: link state changed to DOWN
mlx4_en: mlxen4: Using 4 TX rings
hn2: link state changed to DOWN
mlx4_en: mlxen4: Using 4 RX rings
mlx4_en: mlxen4: Initializing port
pcib3: <Hyper-V PCI Express Pass Through> on vmbus0
pci3: <PCI bus> on pcib3
mlx4_core3: <mlx4_core> at device 2.0 on pci3
mlx4_core: Initializing mlx4_core
mlx4_core3: Detected virtual function - running in slave mode
mlx4_core3: Sending reset
mlx4_core3: Sending vhcr0
mlx4_core3: HCA minimum page size:512
mlx4_core3: Timestamping is not supported in slave mode
mlx4_en mlx4_core3: Activating port:1
mlxen6: Ethernet address: XX:XX:XX:bf:a3:0f
mlx4_en: mlx4_core3: Port 1: Using 4 TX rings
mlx4_en: mlx4_core3: Port 1: Using 4 RX rings
mlxen6: link state changed to DOWN
mlx4_en: mlxen6: Using 4 TX rings
hn3: link state changed to DOWN
mlx4_en: mlxen6: Using 4 RX rings
mlx4_en: mlxen6: Initializing port
pcib4: <Hyper-V PCI Express Pass Through> on vmbus0
pci4: <PCI bus> on pcib4
mlx4_core4: <mlx4_core> at device 2.0 on pci4
mlx4_core: Initializing mlx4_core
mlx4_core4: Detected virtual function - running in slave mode
mlx4_core4: Sending reset
mlx4_core4: Sending vhcr0
mlx4_core4: HCA minimum page size:512
mlx4_core4: Timestamping is not supported in slave mode
mlx4_en mlx4_core4: Activating port:1
mlxen8: Ethernet address: XX:XX:XX:bf:a3:10
mlx4_en: mlx4_core4: Port 1: Using 4 TX rings
mlx4_en: mlx4_core4: Port 1: Using 4 RX rings
mlx4_en: mlxen8: Using 4 TX rings
mlx4_en: mlxen8: Using 4 RX rings
mlxen8: link state changed to DOWN
hn4: link state changed to DOWN
mlx4_en: mlxen8: Initializing port
mlxen0: tso6 disabled due to -txcsum6.
mlxen0: enable txcsum6 first.
mlxen2: tso6 disabled due to -txcsum6.
mlxen2: enable txcsum6 first.
mlxen4: tso6 disabled due to -txcsum6.
mlxen4: enable txcsum6 first.
mlxen6: tso6 disabled due to -txcsum6.
mlxen6: enable txcsum6 first.
mlxen8: tso6 disabled due to -txcsum6.
mlxen8: enable txcsum6 first.
lo0: link state changed to UP
aesni0: <AES-CBC,AES-CCM,AES-GCM,AES-ICM,AES-XTS,SHA1,SHA256> on motherboard
hn2: disable IPV6 mbuf hash delivery
mlxen4: link state changed to UP
hn2: link state changed to UP
mlx4_core2: going promisc on 1
hn4: disable IPV6 mbuf hash delivery
mlxen8: link state changed to UP
hn4: link state changed to UP
mlx4_core4: going promisc on 1
hn3: disable IPV6 mbuf hash delivery
mlxen6: link state changed to UP
hn3: link state changed to UP
mlx4_core3: going promisc on 1
hn1: disable IPV6 mbuf hash delivery
mlxen2: link state changed to UP
hn1: link state changed to UP
mlx4_core1: going promisc on 1
hn0: disable IPV6 mbuf hash delivery
mlxen0: link state changed to UP
hn0: link state changed to UP
mlx4_core0: going promisc on 1
hn3: got notify, nvs type 129
mlx4_core1: Internal error detected on the communication channel
mlx4_core2: Internal error detected on the communication channel
mlx4_core3: Internal error detected on the communication channel
mlx4_core4: Internal error detected on the communication channel
mlx4_core3: device is going to be reset
mlx4_core3: VF reset is not needed
mlx4_core3: device was reset successfully
mlx4_en mlx4_core3: Internal error detected, restarting device
mlx4_core4: device is going to be reset
mlx4_core4: VF reset is not needed
mlx4_core4: device was reset successfully
mlx4_en mlx4_core4: Internal error detected, restarting device
mlx4_core1: device is going to be reset
mlx4_core1: VF reset is not needed
mlx4_core1: device was reset successfully
mlx4_en mlx4_core1: Internal error detected, restarting device
mlx4_core2: device is going to be reset
mlx4_core2: VF reset is not needed
mlx4_core2: device was reset successfully
mlx4_en mlx4_core2: Internal error detected, restarting device
mlx4_core2: command 0x5 failed: fw status = 0x1
hn3: got notify, nvs type 128
hn3: got notify, nvs type 128
hn2: got notify, nvs type 129
hn2: got notify, nvs type 128
hn2: got notify, nvs type 128
hn4: got notify, nvs type 129
hn4: got notify, nvs type 128
hn4: got notify, nvs type 128
hn1: got notify, nvs type 129
hn1: got notify, nvs type 128
hn1: got notify, nvs type 128
pflog0: promiscuous mode enabled
mlx4_en: mlxen2: Failed disabling multicast filter
mlx4_en: mlxen4: Failed disabling multicast filter
mlx4_core3: Failed to close slave function
mlx4_core3: Detected virtual function - running in slave mode
pflog0: promiscuous mode disabled
pflog0: promiscuous mode enabled
mlx4_core4: Failed to close slave function
mlx4_core4: Detected virtual function - running in slave mode
mlx4_core1: Failed to close slave function
mlx4_core1: Detected virtual function - running in slave mode
mlx4_core2: Failed to close slave function
mlx4_core2: Detected virtual function - running in slave mode
pflog0: promiscuous mode disabled
pflog0: promiscuous mode enabled
pflog0: promiscuous mode disabled
pflog0: promiscuous mode enabled
pflog0: promiscuous mode disabled
pflog0: promiscuous mode enabled
mlx4_core3: recovering from previously mis-behaved VM
mlx4_core4: recovering from previously mis-behaved VM
mlx4_core1: recovering from previously mis-behaved VM
mlx4_core2: recovering from previously mis-behaved VM
mlx4_core3: Communication channel is offline.
mlx4_core3: PF is not responsive, skipping initialization
mlx4_core3: Failed to initialize slave
mlx4_core3: mlx4_restart_one: ERROR: mlx4_load_one failed, pci_name=Mellanox driver (3.5.1), err=-5
mlx4_core3: mlx4_restart_one was ended, ret=-5
mlx4_core4: Communication channel is offline.
mlx4_core4: PF is not responsive, skipping initialization
mlx4_core4: Failed to initialize slave
mlx4_core4: mlx4_restart_one: ERROR: mlx4_load_one failed, pci_name=Mellanox driver (3.5.1), err=-5
mlx4_core4: mlx4_restart_one was ended, ret=-5
mlx4_core1: Communication channel is offline.
mlx4_core1: PF is not responsive, skipping initialization
mlx4_core1: Failed to initialize slave
mlx4_core1: mlx4_restart_one: ERROR: mlx4_load_one failed, pci_name=Mellanox driver (3.5.1), err=-5
mlx4_core1: mlx4_restart_one was ended, ret=-5
mlx4_core2: Communication channel is offline.
mlx4_core2: PF is not responsive, skipping initialization
mlx4_core2: Failed to initialize slave
mlx4_core2: mlx4_restart_one: ERROR: mlx4_load_one failed, pci_name=Mellanox driver (3.5.1), err=-5
mlx4_core2: mlx4_restart_one was ended, ret=-5
mlx4_core2: mlx4_remove_one: interface is down
mlx4_core2: detached
mlx4_core4: mlx4_remove_one: interface is down
mlx4_core4: detached
mlx4_core1: mlx4_remove_one: interface is down
mlx4_core1: detached
mlx4_core3: mlx4_remove_one: interface is down
mlx4_core3: detached
pci3: detached
pcib3: detached
pci4: detached
pcib4: detached
pci2: detached
pcib2: detached
pci1: detached
pcib1: detached
pcib1: <Hyper-V PCI Express Pass Through> on vmbus0
pci1: <PCI bus> on pcib1
mlx4_core1: <mlx4_core> at device 2.0 on pci1
mlx4_core: Initializing mlx4_core
mlx4_core1: Detected virtual function - running in slave mode
mlx4_core1: Sending reset
mlx4_core1: Sending vhcr0
mlx4_core1: HCA minimum page size:512
mlx4_core1: Timestamping is not supported in slave mode
mlx4_en mlx4_core1: Activating port:1
mlxen2: Ethernet address: XX:XX:XX:bf:a3:0f
mlx4_en: mlx4_core1: Port 1: Using 4 TX rings
mlx4_en: mlx4_core1: Port 1: Using 4 RX rings
mlxen2: link state changed to DOWN
mlx4_en: mlxen2: Using 4 TX rings
hn3: link state changed to DOWN
mlx4_en: mlxen2: Using 4 RX rings
mlx4_en: mlxen2: Initializing port
pcib2: <Hyper-V PCI Express Pass Through> on vmbus0
pci2: <PCI bus> on pcib2
mlx4_core2: <mlx4_core> at device 2.0 on pci2
mlx4_core: Initializing mlx4_core
mlx4_core2: Detected virtual function - running in slave mode
mlx4_core2: Sending reset
mlx4_core2: Sending vhcr0
mlx4_core2: HCA minimum page size:512
mlx4_core2: Timestamping is not supported in slave mode
mlx4_en mlx4_core2: Activating port:1
mlxen4: Ethernet address: XX:XX:XX:bf:a3:10
mlx4_en: mlx4_core2: Port 1: Using 4 TX rings
mlx4_en: mlx4_core2: Port 1: Using 4 RX rings
mlx4_en: mlxen4: Using 4 TX rings
mlx4_en: mlxen4: Using 4 RX rings
mlxen4: link state changed to DOWN
hn4: link state changed to DOWN
mlx4_en: mlxen4: Initializing port
pcib3: <Hyper-V PCI Express Pass Through> on vmbus0
pci3: <PCI bus> on pcib3
mlx4_core3: <mlx4_core> at device 2.0 on pci3
mlx4_core: Initializing mlx4_core
mlx4_core3: Detected virtual function - running in slave mode
mlx4_core3: Sending reset
mlx4_core3: Sending vhcr0
mlx4_core3: HCA minimum page size:512
mlx4_core3: Timestamping is not supported in slave mode
mlx4_en mlx4_core3: Activating port:1
mlxen6: Ethernet address: XX:XX:XX:bf:a3:0e
mlx4_en: mlx4_core3: Port 1: Using 4 TX rings
mlx4_en: mlx4_core3: Port 1: Using 4 RX rings
mlxen6: link state changed to DOWN
mlx4_en: mlxen6: Using 4 TX rings
hn2: link state changed to DOWN
mlx4_en: mlxen6: Using 4 RX rings
mlx4_en: mlxen6: Initializing port
pcib4: <Hyper-V PCI Express Pass Through> on vmbus0
pci4: <PCI bus> on pcib4
mlx4_core4: <mlx4_core> at device 2.0 on pci4
mlx4_core: Initializing mlx4_core
mlx4_core4: Detected virtual function - running in slave mode
mlx4_core4: Sending reset
mlx4_core4: Sending vhcr0
mlx4_core4: HCA minimum page size:512
mlx4_core4: Timestamping is not supported in slave mode
mlx4_en mlx4_core4: Activating port:1
mlxen8: Ethernet address: XX:XX:XX:bf:a3:0d
mlx4_en: mlx4_core4: Port 1: Using 4 TX rings
mlx4_en: mlx4_core4: Port 1: Using 4 RX rings
mlx4_en: mlxen8: Using 4 TX rings
mlx4_en: mlxen8: Using 4 RX rings
mlxen8: link state changed to DOWN
hn1: link state changed to DOWN
mlx4_en: mlxen8: Initializing port
pflog0: promiscuous mode disabled
pflog0: promiscuous mode enabled
hn3: disable IPV6 mbuf hash delivery
mlxen2: link state changed to UP
hn3: link state changed to UP
mlx4_core1: going promisc on 1
hn4: disable IPV6 mbuf hash delivery
mlxen4: link state changed to UP
hn4: link state changed to UP
mlx4_core2: going promisc on 1
hn2: disable IPV6 mbuf hash delivery
mlxen6: link state changed to UP
hn2: link state changed to UP
mlx4_core3: going promisc on 1
hn1: disable IPV6 mbuf hash delivery
mlxen8: link state changed to UP
hn1: link state changed to UP
mlx4_core4: going promisc on 1
pflog0: promiscuous mode disabled
pflog0: promiscuous mode enabled
pflog0: promiscuous mode disabled
pflog0: promiscuous mode enabled
pflog0: promiscuous mode disabled
pflog0: promiscuous mode enabled
pflog0: promiscuous mode disabled
pflog0: promiscuous mode enabled
pflog0: promiscuous mode disabled
pflog0: promiscuous mode enabled
pflog0: promiscuous mode disabled
pflog0: promiscuous mode enabled
pflog0: promiscuous mode disabled
pflog0: promiscuous mode enabled
pflog0: promiscuous mode disabled
pflog0: promiscuous mode enabled
Title: Re: Lots of Hyper-V SR-IOV Mellanox boot errors after upgrading to 21.1.8
Post by: mircolino on July 10, 2021, 06:00:00 am
This is pretty much a disaster. 😢
Any clean way to downgrade to 21.1.7? Is there an image I can download?
Title: Re: Lots of Hyper-V SR-IOV Mellanox boot errors after upgrading to 21.1.8
Post by: mimugmail on July 10, 2021, 06:42:34 am
Dont you have a snapshot of the VM?
Title: Re: Lots of Hyper-V SR-IOV Mellanox boot errors after upgrading to 21.1.8
Post by: mircolino on July 10, 2021, 07:01:44 am
Unfortunately I don't 😞
Title: Re: Lots of Hyper-V SR-IOV Mellanox boot errors after upgrading to 21.1.8
Post by: mircolino on July 13, 2021, 04:51:39 am
Anything?
What could be causing all this issues, given that the exact same configuration was working flawlessly in 21.1.7?

Also, if I recreate a VM using the 21.1 image from https://pkg.opnsense.org/releases/, will I then be able to upgrade to 21.1.7 and not 21.1.8?
Title: Re: Lots of Hyper-V SR-IOV Mellanox boot errors after upgrading to 21.1.8
Post by: mimugmail on July 13, 2021, 06:12:07 am
You can try to revert the kernel via opnsense-revert. Just have a look at the docs
Title: Re: Lots of Hyper-V SR-IOV Mellanox boot errors after upgrading to 21.1.8
Post by: franco on July 13, 2021, 08:51:49 am
I don't think any kernel changes in 21.x would cause this.


Cheers,
Franco
Title: Re: Lots of Hyper-V SR-IOV Mellanox boot errors after upgrading to 21.1.8
Post by: mircolino on July 13, 2021, 09:04:27 am
I know I made a mistake not to create a VM snapshot.
But the simple fact that a minor upgrade from .7 to .8 would render my firewall inoperable and there is no way to reinstall a complete 21.1.7 image is IMO nonsensical.

I guess if nobody else has a suggestion, this is the end of the opnsense road for me 😞
Title: Re: Lots of Hyper-V SR-IOV Mellanox boot errors after upgrading to 21.1.8
Post by: franco on July 13, 2021, 09:12:04 am
I'm unsure what you expect us to do. We could tell you to revert packges, we could tell you to reinstall 21.1 but at the end of the day it's likely that the VM state itself is stuck in limbo and you will not be able to get rid of the error.

I can tell you that this is not a particular guest issue. It looks like a host/container issue.


Cheers,
Franco
Title: Re: Lots of Hyper-V SR-IOV Mellanox boot errors after upgrading to 21.1.8
Post by: mircolino on July 13, 2021, 09:34:04 am
I can tell you that this is not a particular guest issue. It looks like a host/container issue.

AFAIK nothing has changed on the host, so the only way for me to know for sure would be to reinstall the complete and original 21.1.7 VM, like it was before the upgrade and verify that it's still working.

Can I install 21.1 and then upgrade to 21.1.7? If that's not possible, I'm not sure what else I can do to verify your statement.
Title: Re: Lots of Hyper-V SR-IOV Mellanox boot errors after upgrading to 21.1.8
Post by: franco on July 13, 2021, 10:42:52 am
Fine, just go back to older kernel:

# opnsense-update -kr 21.1.7
# opnsense-shell reboot

With 99% certainty this will not fix your error. Particularly because we do not change Hyper-V or Mellanox driver code...


Cheers,
Franco
Title: Re: Lots of Hyper-V SR-IOV Mellanox boot errors after upgrading to 21.1.8
Post by: random1104 on August 16, 2021, 04:42:14 pm
out of curiosity, how did it go?