Kernel panic after upgrade

Started by Hendre, July 28, 2024, 02:29:01 PM

July 28, 2024, 02:29:01 PM Last Edit: July 28, 2024, 02:31:23 PM by hendre
Hi all - after upgrading from 24.1, where the system was running perfectly well, I'm now running into kernel panics with IPv6.

DHCPv6 with PD enabled on PPPoE link.

cannot forward src fe80:5::a3ba:e66f:cc48:2823, dst 2a03:2880:f080:12:face:b00c:0:8e, nxt 6, rcvif ixv4, outif pppoe0
cannot forward src fe80:5::a3ba:e66f:cc48:2823, dst 2a03:2880:f111:81:face:b00c:0:38d9, nxt 6, rcvif ixv4, outif pppoe0
panic: vm_fault_lookup: fault on nofault entry, addr: 0xfffffe010a46d000
cpuid = 1
time = 1722172301
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe010a46c220
vpanic() at vpanic+0x131/frame 0xfffffe010a46c350
panic() at panic+0x43/frame 0xfffffe010a46c3b0
vm_fault() at vm_fault+0x15af/frame 0xfffffe010a46c4d0
vm_fault_trap() at vm_fault_trap+0x81/frame 0xfffffe010a46c520
trap_pfault() at trap_pfault+0x1be/frame 0xfffffe010a46c570
calltrap() at calltrap+0x8/frame 0xfffffe010a46c570
--- trap 0xc, rip = 0xffffffff806b6b58, rsp = 0xfffffe010a46c640, rbp = 0xfffffe010a46c640 ---
ixv_if_multi_set_cb() at ixv_if_multi_set_cb+0x18/frame 0xfffffe010a46c640
if_foreach_llmaddr() at if_foreach_llmaddr+0x5d/frame 0xfffffe010a46c690
ixv_if_multi_set() at ixv_if_multi_set+0x45/frame 0xfffffe010a46c9c0
iflib_if_ioctl() at iflib_if_ioctl+0x108/frame 0xfffffe010a46ca30
if_addmulti() at if_addmulti+0x41f/frame 0xfffffe010a46cad0
in6_joingroup_locked() at in6_joingroup_locked+0x1d8/frame 0xfffffe010a46cba0
ip6_setmoptions() at ip6_setmoptions+0xd66/frame 0xfffffe010a46cd30
sosetopt() at sosetopt+0x96/frame 0xfffffe010a46cd90
kern_setsockopt() at kern_setsockopt+0x9d/frame 0xfffffe010a46cde0
sys_setsockopt() at sys_setsockopt+0x24/frame 0xfffffe010a46ce00
amd64_syscall() at amd64_syscall+0x100/frame 0xfffffe010a46cf30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe010a46cf30
--- syscall (105, FreeBSD ELF64, setsockopt), rip = 0x82462e93a, rsp = 0x82147dea8, rbp = 0x82147df20 ---
KDB: enter: panic
---<<BOOT>>---
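
For readers skimming the backtrace: in a KDB trace, the frame printed right after the `--- trap` marker is the code that was running when the fault hit. The following is a minimal sketch of pulling that frame out of a pasted trace. The helper name `faulting_frame` and the regular expression are illustrative assumptions, not part of any FreeBSD or OPNsense tooling.

```python
import re

# Matches KDB frame lines such as:
#   vm_fault() at vm_fault+0x15af/frame 0xfffffe010a46c4d0
FRAME_RE = re.compile(
    r"^(?P<func>\w+)\(\) at (?P=func)\+(?P<off>0x[0-9a-f]+)/frame (?P<frame>0x[0-9a-f]+)$"
)


def faulting_frame(backtrace: str):
    """Return (function, offset) of the first frame after the '--- trap' marker, or None."""
    seen_trap = False
    for line in backtrace.splitlines():
        line = line.strip()
        if line.startswith("--- trap"):
            seen_trap = True
            continue
        match = FRAME_RE.match(line)
        if seen_trap and match:
            return match.group("func"), int(match.group("off"), 16)
    return None


# Excerpt copied from the panic above.
SAMPLE = """\
trap_pfault() at trap_pfault+0x1be/frame 0xfffffe010a46c570
calltrap() at calltrap+0x8/frame 0xfffffe010a46c570
--- trap 0xc, rip = 0xffffffff806b6b58, rsp = 0xfffffe010a46c640, rbp = 0xfffffe010a46c640 ---
ixv_if_multi_set_cb() at ixv_if_multi_set_cb+0x18/frame 0xfffffe010a46c640
if_foreach_llmaddr() at if_foreach_llmaddr+0x5d/frame 0xfffffe010a46c690
"""

print(faulting_frame(SAMPLE))
```

On the excerpt, this points at `ixv_if_multi_set_cb`, i.e. the multicast filter callback of the ixv driver.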

Maybe it has something to do with using Intel Virtual Function NICs.

ixv1: <Intel(R) X540 Virtual Function> mem 0xfe200000-0xfe203fff,0xfe204000-0xfe207fff at device 0.0 on pci5
ixv1: Using 2048 TX descriptors and 2048 RX descriptors
ixv1: Using 1 RX queues 1 TX queues
ixv1: Using MSI-X interrupts with 2 vectors
ixv1: allocated for 1 queues
ixv1: allocated for 1 rx queues
ixv1: Ethernet address: 00:0c:29:8e:79:9f
ixv1: netmap queues/slots: TX 1/2048, RX 1/2048
pcib12: <ACPI PCI-PCI bridge> at device 22.1 on pci0
pci6: <ACPI PCI bus> on pcib12
ixv2: <Intel(R) X540 Virtual Function> mem 0xfe100000-0xfe103fff,0xfe104000-0xfe107fff at device 0.0 on pci6
ixv2: Using 2048 TX descriptors and 2048 RX descriptors
ixv2: Using 1 RX queues 1 TX queues
ixv2: Using MSI-X interrupts with 2 vectors
ixv2: allocated for 1 queues
ixv2: allocated for 1 rx queues
ixv2: Ethernet address: 00:0c:29:8e:79:a3
ixv2: netmap queues/slots: TX 1/2048, RX 1/2048
pcib13: <ACPI PCI-PCI bridge> at device 22.2 on pci0
pcib14: <ACPI PCI-PCI bridge> at device 22.3 on pci0
pcib15: <ACPI PCI-PCI bridge> at device 22.4 on pci0
pcib16: <ACPI PCI-PCI bridge> at device 22.5 on pci0
pcib17: <ACPI PCI-PCI bridge> at device 22.6 on pci0
pcib18: <ACPI PCI-PCI bridge> at device 22.7 on pci0
pcib19: <ACPI PCI-PCI bridge> at device 23.0 on pci0
pci7: <ACPI PCI bus> on pcib19
ixv3: <Intel(R) X540 Virtual Function> mem 0xfda00000-0xfda03fff,0xfda04000-0xfda07fff at device 0.0 on pci7
ixv3: Using 2048 TX descriptors and 2048 RX descriptors
ixv3: Using 1 RX queues 1 TX queues
ixv3: Using MSI-X interrupts with 2 vectors
ixv3: allocated for 1 queues
ixv3: allocated for 1 rx queues
ixv3: Ethernet address: 00:0c:29:8e:79:a0
ixv3: netmap queues/slots: TX 1/2048, RX 1/2048
pcib20: <ACPI PCI-PCI bridge> at device 23.1 on pci0
pcib21: <ACPI PCI-PCI bridge> at device 23.2 on pci0
pcib22: <ACPI PCI-PCI bridge> at device 23.3 on pci0
pcib23: <ACPI PCI-PCI bridge> at device 23.4 on pci0
pcib24: <ACPI PCI-PCI bridge> at device 23.5 on pci0
pcib25: <ACPI PCI-PCI bridge> at device 23.6 on pci0
pcib26: <ACPI PCI-PCI bridge> at device 23.7 on pci0
pcib27: <ACPI PCI-PCI bridge> at device 24.0 on pci0
pci8: <ACPI PCI bus> on pcib27
ixv4: <Intel(R) X540 Virtual Function> mem 0xfd200000-0xfd203fff,0xfd204000-0xfd207fff at device 0.0 on pci8
ixv4: Using 2048 TX descriptors and 2048 RX descriptors
ixv4: Using 1 RX queues 1 TX queues
ixv4: Using MSI-X interrupts with 2 vectors
ixv4: allocated for 1 queues
ixv4: allocated for 1 rx queues
ixv4: Ethernet address: 00:0c:29:8e:79:a1
ixv4: netmap queues/slots: TX 1/2048, RX 1/2048
pcib28: <ACPI PCI-PCI bridge> at device 24.1 on pci0
pcib29: <ACPI PCI-PCI bridge> at device 24.2 on pci0
pcib30: <ACPI PCI-PCI bridge> at device 24.3 on pci0
pcib31: <ACPI PCI-PCI bridge> at device 24.4 on pci0
pcib32: <ACPI PCI-PCI bridge> at device 24.5 on pci0
pcib33: <ACPI PCI-PCI bridge> at device 24.6 on pci0
pcib34: <ACPI PCI-PCI bridge> at device 24.7 on pci0

Hi,

I checked this stack trace but didn't find any particular change that would address this directly. There's also nothing on bugs.freebsd.org. I'm unsure how to proceed.


Cheers,
Franco

Understood Franco, it's a nasty one. I'll revert to 24.1 in the meantime, as I've lost all IPv6 at the moment. Would it help if I shared my tunables? Maybe there's something in there.

I upgraded ESXi to the latest version, as well as the ixgben drivers, but the kernel panic persists. It seems to be a FreeBSD or OPNsense issue related to IPv6 rather than a hardware problem.

I ran into the exact same issue. I ended up grabbing a config backup, reinstalling OPNsense 24.7 from scratch, and then importing the config. Before that, it was crashing randomly after 2-20 minutes. It has been a couple of days since the change and it's still working.

Something got corrupted during the upgrade process, I suspect... or goblins.

The ixgbe driver, which ixv is a part of, appears to already match the latest code in the FreeBSD development version, so I can't find a patch there. I also noticed several reports of PPPoE being prone to crashing, in IPv6 environments in particular, now on FreeBSD 14.1. This might be one side effect of that.

Does the crash occur when IPv6 mode on WAN is disabled (best changed before upgrade)?


Cheers,
Franco

Disabling IPv6 on WAN does the trick, no more panics. I'll stick with it and am happy to test any patches when they become available. Thanks, Franco!

Thanks for trying that out. It's odd, but at least it's a start. There is one other panic that I'm looking at now. On the surface it doesn't have anything to do with the driver-related issue here, but as I said, it's also PPPoE and IPv6 causing it.


Cheers,
Franco

Just for fun I went digging and found something related. Can you try this kernel?

# opnsense-update -zkr 24.7_11


Cheers,
Franco

Unfortunately no luck.

User-Agent Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36 Edg/127.0.0.0
FreeBSD 14.1-RELEASE-p2 stable/24.7-n267769-2d516dec75e6 SMP amd64
OPNsense 24.7_9 0d38c7804
Plugins os-acme-client-4.4 os-crowdsec-1.0.8_1 os-ddclient-1.22 os-debug-1.5 os-haproxy-4.3_1 os-mdns-repeater-1.1_1 os-sensei-1.17.5 os-sensei-agent-1.17.5 os-sensei-updater-1.17 os-sunnyvalley-1.4_3 os-udpbroadcastrelay-1.0_4 os-vmware-1.5_1
Time Wed, 31 Jul 2024 13:18:45 +0200
OpenSSL 3.0.14
Python 3.11.9
PHP 8.2.20

KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0105a6d220
vpanic() at vpanic+0x131/frame 0xfffffe0105a6d350
panic() at panic+0x43/frame 0xfffffe0105a6d3b0
vm_fault() at vm_fault+0x15af/frame 0xfffffe0105a6d4d0
vm_fault_trap() at vm_fault_trap+0x81/frame 0xfffffe0105a6d520
trap_pfault() at trap_pfault+0x1be/frame 0xfffffe0105a6d570
calltrap() at calltrap+0x8/frame 0xfffffe0105a6d570
--- trap 0xc, rip = 0xffffffff806b6b58, rsp = 0xfffffe0105a6d640, rbp = 0xfffffe0105a6d640 ---
ixv_if_multi_set_cb() at ixv_if_multi_set_cb+0x18/frame 0xfffffe0105a6d640
if_foreach_llmaddr() at if_foreach_llmaddr+0x5d/frame 0xfffffe0105a6d690
ixv_if_multi_set() at ixv_if_multi_set+0x45/frame 0xfffffe0105a6d9c0
iflib_if_ioctl() at iflib_if_ioctl+0x108/frame 0xfffffe0105a6da30
if_addmulti() at if_addmulti+0x41f/frame 0xfffffe0105a6dad0
in6_joingroup_locked() at in6_joingroup_locked+0x1d8/frame 0xfffffe0105a6dba0
ip6_setmoptions() at ip6_setmoptions+0xd66/frame 0xfffffe0105a6dd30
sosetopt() at sosetopt+0x96/frame 0xfffffe0105a6dd90
kern_setsockopt() at kern_setsockopt+0x9d/frame 0xfffffe0105a6dde0
sys_setsockopt() at sys_setsockopt+0x24/frame 0xfffffe0105a6de00
amd64_syscall() at amd64_syscall+0x100/frame 0xfffffe0105a6df30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe0105a6df30
--- syscall (105, FreeBSD ELF64, setsockopt), rip = 0x823f8493a, rsp = 0x820ccd848, rbp = 0x820ccd8c0 ---
KDB: enter: panic
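
Both traces enter the kernel through `setsockopt` and end up in `ip6_setmoptions` and `in6_joingroup_locked`, which is the path taken when a process joins an IPv6 multicast group. The sketch below shows what such a userland call could look like. The trace does not say which process issued it, so the interface name and the group `ff02::1:2` (the DHCPv6 relay agents and servers group) are assumptions chosen for illustration, as are the helper names. Calling `join_group` on an affected machine may well trigger the panic, so treat it as a reproduction aid for a test box only.

```python
import socket
import struct


def build_ipv6_mreq(group: str, ifindex: int) -> bytes:
    """Pack a struct ipv6_mreq: 16-byte group address, then an unsigned int interface index."""
    return socket.inet_pton(socket.AF_INET6, group) + struct.pack("@I", ifindex)


def join_group(ifname: str, group: str = "ff02::1:2") -> socket.socket:
    """Join `group` on `ifname` (both hypothetical inputs) via IPV6_JOIN_GROUP.

    Per the backtrace, this kind of call is what leads to the driver's
    multicast filter being updated. The caller must keep the returned
    socket open for the membership to persist.
    """
    sock = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM)
    mreq = build_ipv6_mreq(group, socket.if_nametoindex(ifname))
    sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_JOIN_GROUP, mreq)
    return sock
```

For example, `join_group("ixv4")` would attempt the join on the LAN-side virtual function named in the logs above.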

Just a question, because I get the message below, too. Is it related to the problem?
cannot forward src fe80:5::a3ba:e66f:cc48:2823, dst 2a03:2880:f080:12:face:b00c:0:8e, nxt 6, rcvif ixv4, outif pppoe0

@hendre thanks, bummer but I'll keep digging :)

@Baender that just means that these Android devices try to send to the internet without having a GUA assigned (it's stupid, don't ask).

I am sorry to ask, but I still don't know whether this triggers a kernel panic. Fun fact: in my case the IPv6 address belongs to my computer, from which I access the firewall. It is a PC running Fedora.

This wouldn't trigger a panic, because that bad traffic is dropped pretty quickly since it cannot be forwarded (a link-local source cannot send to a GUA). All it indicates is that some client devices on the LAN fail to acquire IPv6 addresses.
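
To check a "cannot forward" line yourself, here is a minimal sketch using Python's standard `ipaddress` module. The helper name `classify` and the regular expression are illustrative assumptions. The `5` in `fe80:5::` is most likely the interface scope ID that FreeBSD embeds in link-local addresses, which does not change the classification.

```python
import ipaddress
import re

LINE_RE = re.compile(r"cannot forward src (?P<src>[0-9a-f:]+), dst (?P<dst>[0-9a-f:]+),")


def classify(line: str):
    """Return (source is link-local, destination is global) for a log line, or None."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    src = ipaddress.IPv6Address(match.group("src"))
    dst = ipaddress.IPv6Address(match.group("dst"))
    return src.is_link_local, dst.is_global


# Line copied from the log earlier in this thread.
LINE = (
    "cannot forward src fe80:5::a3ba:e66f:cc48:2823, "
    "dst 2a03:2880:f080:12:face:b00c:0:8e, nxt 6, rcvif ixv4, outif pppoe0"
)

print(classify(LINE))
```

A link-local source paired with a global destination is exactly the case that gets dropped instead of forwarded.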


Cheers,
Franco

Hey Franco, what do you suggest as next steps please?