OPNsense Forum

English Forums => 24.7, 24.10 Legacy Series => Topic started by: Hendre on July 28, 2024, 02:29:01 PM

Title: Kernel panic after upgrade
Post by: Hendre on July 28, 2024, 02:29:01 PM
Hi all - after upgrading from 24.1, where the system was running perfectly well, I'm now running into kernel panics with IPv6.

DHCPv6 with PD enabled on PPPoE link.

cannot forward src fe80:5::a3ba:e66f:cc48:2823, dst 2a03:2880:f080:12:face:b00c:0:8e, nxt 6, rcvif ixv4, outif pppoe0
cannot forward src fe80:5::a3ba:e66f:cc48:2823, dst 2a03:2880:f111:81:face:b00c:0:38d9, nxt 6, rcvif ixv4, outif pppoe0
panic: vm_fault_lookup: fault on nofault entry, addr: 0xfffffe010a46d000
cpuid = 1
time = 1722172301
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe010a46c220
vpanic() at vpanic+0x131/frame 0xfffffe010a46c350
panic() at panic+0x43/frame 0xfffffe010a46c3b0
vm_fault() at vm_fault+0x15af/frame 0xfffffe010a46c4d0
vm_fault_trap() at vm_fault_trap+0x81/frame 0xfffffe010a46c520
trap_pfault() at trap_pfault+0x1be/frame 0xfffffe010a46c570
calltrap() at calltrap+0x8/frame 0xfffffe010a46c570
--- trap 0xc, rip = 0xffffffff806b6b58, rsp = 0xfffffe010a46c640, rbp = 0xfffffe010a46c640 ---
ixv_if_multi_set_cb() at ixv_if_multi_set_cb+0x18/frame 0xfffffe010a46c640
if_foreach_llmaddr() at if_foreach_llmaddr+0x5d/frame 0xfffffe010a46c690
ixv_if_multi_set() at ixv_if_multi_set+0x45/frame 0xfffffe010a46c9c0
iflib_if_ioctl() at iflib_if_ioctl+0x108/frame 0xfffffe010a46ca30
if_addmulti() at if_addmulti+0x41f/frame 0xfffffe010a46cad0
in6_joingroup_locked() at in6_joingroup_locked+0x1d8/frame 0xfffffe010a46cba0
ip6_setmoptions() at ip6_setmoptions+0xd66/frame 0xfffffe010a46cd30
sosetopt() at sosetopt+0x96/frame 0xfffffe010a46cd90
kern_setsockopt() at kern_setsockopt+0x9d/frame 0xfffffe010a46cde0
sys_setsockopt() at sys_setsockopt+0x24/frame 0xfffffe010a46ce00
amd64_syscall() at amd64_syscall+0x100/frame 0xfffffe010a46cf30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe010a46cf30
--- syscall (105, FreeBSD ELF64, setsockopt), rip = 0x82462e93a, rsp = 0x82147dea8, rbp = 0x82147df20 ---
KDB: enter: panic
---<<BOOT>>---

Maybe it has something to do with using Intel Virtual Function NICs.

ixv1: <Intel(R) X540 Virtual Function> mem 0xfe200000-0xfe203fff,0xfe204000-0xfe207fff at device 0.0 on pci5
ixv1: Using 2048 TX descriptors and 2048 RX descriptors
ixv1: Using 1 RX queues 1 TX queues
ixv1: Using MSI-X interrupts with 2 vectors
ixv1: allocated for 1 queues
ixv1: allocated for 1 rx queues
ixv1: Ethernet address: 00:0c:29:8e:79:9f
ixv1: netmap queues/slots: TX 1/2048, RX 1/2048
pcib12: <ACPI PCI-PCI bridge> at device 22.1 on pci0
pci6: <ACPI PCI bus> on pcib12
ixv2: <Intel(R) X540 Virtual Function> mem 0xfe100000-0xfe103fff,0xfe104000-0xfe107fff at device 0.0 on pci6
ixv2: Using 2048 TX descriptors and 2048 RX descriptors
ixv2: Using 1 RX queues 1 TX queues
ixv2: Using MSI-X interrupts with 2 vectors
ixv2: allocated for 1 queues
ixv2: allocated for 1 rx queues
ixv2: Ethernet address: 00:0c:29:8e:79:a3
ixv2: netmap queues/slots: TX 1/2048, RX 1/2048
pcib13: <ACPI PCI-PCI bridge> at device 22.2 on pci0
pcib14: <ACPI PCI-PCI bridge> at device 22.3 on pci0
pcib15: <ACPI PCI-PCI bridge> at device 22.4 on pci0
pcib16: <ACPI PCI-PCI bridge> at device 22.5 on pci0
pcib17: <ACPI PCI-PCI bridge> at device 22.6 on pci0
pcib18: <ACPI PCI-PCI bridge> at device 22.7 on pci0
pcib19: <ACPI PCI-PCI bridge> at device 23.0 on pci0
pci7: <ACPI PCI bus> on pcib19
ixv3: <Intel(R) X540 Virtual Function> mem 0xfda00000-0xfda03fff,0xfda04000-0xfda07fff at device 0.0 on pci7
ixv3: Using 2048 TX descriptors and 2048 RX descriptors
ixv3: Using 1 RX queues 1 TX queues
ixv3: Using MSI-X interrupts with 2 vectors
ixv3: allocated for 1 queues
ixv3: allocated for 1 rx queues
ixv3: Ethernet address: 00:0c:29:8e:79:a0
ixv3: netmap queues/slots: TX 1/2048, RX 1/2048
pcib20: <ACPI PCI-PCI bridge> at device 23.1 on pci0
pcib21: <ACPI PCI-PCI bridge> at device 23.2 on pci0
pcib22: <ACPI PCI-PCI bridge> at device 23.3 on pci0
pcib23: <ACPI PCI-PCI bridge> at device 23.4 on pci0
pcib24: <ACPI PCI-PCI bridge> at device 23.5 on pci0
pcib25: <ACPI PCI-PCI bridge> at device 23.6 on pci0
pcib26: <ACPI PCI-PCI bridge> at device 23.7 on pci0
pcib27: <ACPI PCI-PCI bridge> at device 24.0 on pci0
pci8: <ACPI PCI bus> on pcib27
ixv4: <Intel(R) X540 Virtual Function> mem 0xfd200000-0xfd203fff,0xfd204000-0xfd207fff at device 0.0 on pci8
ixv4: Using 2048 TX descriptors and 2048 RX descriptors
ixv4: Using 1 RX queues 1 TX queues
ixv4: Using MSI-X interrupts with 2 vectors
ixv4: allocated for 1 queues
ixv4: allocated for 1 rx queues
ixv4: Ethernet address: 00:0c:29:8e:79:a1
ixv4: netmap queues/slots: TX 1/2048, RX 1/2048
pcib28: <ACPI PCI-PCI bridge> at device 24.1 on pci0
pcib29: <ACPI PCI-PCI bridge> at device 24.2 on pci0
pcib30: <ACPI PCI-PCI bridge> at device 24.3 on pci0
pcib31: <ACPI PCI-PCI bridge> at device 24.4 on pci0
pcib32: <ACPI PCI-PCI bridge> at device 24.5 on pci0
pcib33: <ACPI PCI-PCI bridge> at device 24.6 on pci0
pcib34: <ACPI PCI-PCI bridge> at device 24.7 on pci0
Title: Re: Kernel panic after upgrade
Post by: franco on July 29, 2024, 01:07:00 PM
Hi,

I checked this stack trace but didn't find any particular change that would address this directly. There is also nothing on bugs.freebsd.org. Unsure how to proceed.


Cheers,
Franco
Title: Re: Kernel panic after upgrade
Post by: Hendre on July 29, 2024, 07:43:00 PM
Understood Franco, it's a nasty one. I'll revert to 24.1 in the meantime as I've lost all IPv6 at the moment. Would it help if I shared my tunables? Maybe there is something in there.
Title: Re: Kernel panic after upgrade
Post by: Hendre on July 30, 2024, 10:21:06 PM
I upgraded ESXi to the latest version, as well as the ixgben drivers, but the kernel panic persists. It seems to be a FreeBSD or OPNsense issue related to IPv6 and not a hardware problem.
Title: Re: Kernel panic after upgrade
Post by: DocGonzo74 on July 31, 2024, 12:44:19 AM
I ran into the exact same issue. I ended up grabbing a config backup and reinstalling OPNsense 24.7 from scratch, then doing a config import. Before that I was crashing randomly after 2-20 minutes. It has been a couple of days since the change and it's still working.

Something got corrupted during the upgrade process, I suspect... or goblins.
Title: Re: Kernel panic after upgrade
Post by: franco on July 31, 2024, 08:02:46 AM
The ixgbe driver, which ixv is a part of, appears to already match the latest code in the FreeBSD development version, so I can't find a patch there. I also noticed several reports of PPPoE being prone to crashing in IPv6 environments, in particular now on FreeBSD 14.1. This might be one side effect of it.

Does the crash occur when IPv6 mode on WAN is disabled (best changed before upgrade)?


Cheers,
Franco
Title: Re: Kernel panic after upgrade
Post by: Hendre on July 31, 2024, 08:13:44 AM
Disabling IPv6 on WAN does the trick, no more panics. I'll stick to it and am happy to test any patches when they become available, thanks Franco!
Title: Re: Kernel panic after upgrade
Post by: franco on July 31, 2024, 08:20:58 AM
Thanks for trying that out. It's odd, but at least it's a start. There is one other panic that I'm looking at now. On the surface it doesn't have anything to do with the driver-related issue here, but as I said it's also PPPoE and IPv6 causing it.


Cheers,
Franco
Title: Re: Kernel panic after upgrade
Post by: franco on July 31, 2024, 11:14:16 AM
Just for fun I went digging and found something related. Can you try this kernel?

# opnsense-update -zkr 24.7_11


Cheers,
Franco
Title: Re: Kernel panic after upgrade
Post by: Hendre on July 31, 2024, 12:19:30 PM
Unfortunately no luck.

User-Agent Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36 Edg/127.0.0.0
FreeBSD 14.1-RELEASE-p2 stable/24.7-n267769-2d516dec75e6 SMP amd64
OPNsense 24.7_9 0d38c7804
Plugins os-acme-client-4.4 os-crowdsec-1.0.8_1 os-ddclient-1.22 os-debug-1.5 os-haproxy-4.3_1 os-mdns-repeater-1.1_1 os-sensei-1.17.5 os-sensei-agent-1.17.5 os-sensei-updater-1.17 os-sunnyvalley-1.4_3 os-udpbroadcastrelay-1.0_4 os-vmware-1.5_1
Time Wed, 31 Jul 2024 13:18:45 +0200
OpenSSL 3.0.14
Python 3.11.9
PHP 8.2.20

KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0105a6d220
vpanic() at vpanic+0x131/frame 0xfffffe0105a6d350
panic() at panic+0x43/frame 0xfffffe0105a6d3b0
vm_fault() at vm_fault+0x15af/frame 0xfffffe0105a6d4d0
vm_fault_trap() at vm_fault_trap+0x81/frame 0xfffffe0105a6d520
trap_pfault() at trap_pfault+0x1be/frame 0xfffffe0105a6d570
calltrap() at calltrap+0x8/frame 0xfffffe0105a6d570
--- trap 0xc, rip = 0xffffffff806b6b58, rsp = 0xfffffe0105a6d640, rbp = 0xfffffe0105a6d640 ---
ixv_if_multi_set_cb() at ixv_if_multi_set_cb+0x18/frame 0xfffffe0105a6d640
if_foreach_llmaddr() at if_foreach_llmaddr+0x5d/frame 0xfffffe0105a6d690
ixv_if_multi_set() at ixv_if_multi_set+0x45/frame 0xfffffe0105a6d9c0
iflib_if_ioctl() at iflib_if_ioctl+0x108/frame 0xfffffe0105a6da30
if_addmulti() at if_addmulti+0x41f/frame 0xfffffe0105a6dad0
in6_joingroup_locked() at in6_joingroup_locked+0x1d8/frame 0xfffffe0105a6dba0
ip6_setmoptions() at ip6_setmoptions+0xd66/frame 0xfffffe0105a6dd30
sosetopt() at sosetopt+0x96/frame 0xfffffe0105a6dd90
kern_setsockopt() at kern_setsockopt+0x9d/frame 0xfffffe0105a6dde0
sys_setsockopt() at sys_setsockopt+0x24/frame 0xfffffe0105a6de00
amd64_syscall() at amd64_syscall+0x100/frame 0xfffffe0105a6df30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe0105a6df30
--- syscall (105, FreeBSD ELF64, setsockopt), rip = 0x823f8493a, rsp = 0x820ccd848, rbp = 0x820ccd8c0 ---
KDB: enter: panic
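
A minimal sketch, not part of the original post, for checking whether two KDB backtraces like the ones in this thread follow the same code path. It assumes the trace has been saved as plain text in the format shown above; the helper names `frames` and `faulting_frame` are hypothetical. It extracts the function name and offset from each frame line and reports the first frame after the `--- trap` marker, which is where the fault was taken.

```python
import re
from typing import List, Optional, Tuple

# Matches frame lines such as "vpanic() at vpanic+0x131/frame 0xfffffe010a46c350".
FRAME_RE = re.compile(r"^(\w+)\(\) at \w+\+(0x[0-9a-f]+)/frame 0x[0-9a-f]+$")


def frames(trace: str) -> List[Tuple[str, str]]:
    """Return (function, offset) pairs for every frame line in a backtrace."""
    out = []
    for line in trace.splitlines():
        m = FRAME_RE.match(line.strip())
        if m:
            out.append((m.group(1), m.group(2)))
    return out


def faulting_frame(trace: str) -> Optional[str]:
    """Return the function on the first frame line after the '--- trap' marker."""
    seen_trap = False
    for line in trace.splitlines():
        line = line.strip()
        if line.startswith("--- trap"):
            seen_trap = True
            continue
        if seen_trap:
            m = FRAME_RE.match(line)
            if m:
                return m.group(1)
    return None


# Excerpts copied from the two traces posted in this thread.
# Only the frame addresses differ between them.
first = """\
calltrap() at calltrap+0x8/frame 0xfffffe010a46c570
--- trap 0xc, rip = 0xffffffff806b6b58, rsp = 0xfffffe010a46c640, rbp = 0xfffffe010a46c640 ---
ixv_if_multi_set_cb() at ixv_if_multi_set_cb+0x18/frame 0xfffffe010a46c640
if_foreach_llmaddr() at if_foreach_llmaddr+0x5d/frame 0xfffffe010a46c690
"""
second = """\
calltrap() at calltrap+0x8/frame 0xfffffe0105a6d570
--- trap 0xc, rip = 0xffffffff806b6b58, rsp = 0xfffffe0105a6d640, rbp = 0xfffffe0105a6d640 ---
ixv_if_multi_set_cb() at ixv_if_multi_set_cb+0x18/frame 0xfffffe0105a6d640
if_foreach_llmaddr() at if_foreach_llmaddr+0x5d/frame 0xfffffe0105a6d690
"""

print(faulting_frame(first))              # function where the trap was taken
print(frames(first) == frames(second))    # same functions and offsets in both?
```

Comparing function names and offsets rather than raw lines ignores the frame addresses, which change from boot to boot. The same comparison can help decide whether a panic posted by someone else is really the same issue.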
Title: Re: Kernel panic after upgrade
Post by: Baender on July 31, 2024, 12:21:45 PM
Just a question, because I get the message below, too. Is it related to the problem?
cannot forward src fe80:5::a3ba:e66f:cc48:2823, dst 2a03:2880:f080:12:face:b00c:0:8e, nxt 6, rcvif ixv4, outif pppoe0
Title: Re: Kernel panic after upgrade
Post by: franco on July 31, 2024, 12:38:39 PM
@hendre thanks, bummer but I'll keep digging :)

@Baender That just means that these Android devices try to send to the internet without having a GUA assigned (it's stupid, don't ask).
Title: Re: Kernel panic after upgrade
Post by: Baender on July 31, 2024, 01:17:00 PM
I am sorry to ask, but I still don't know if this triggers a kernel panic. Fun fact: in my case the IPv6 address belongs to my computer, from which I access the firewall. It is a PC running Fedora.
Title: Re: Kernel panic after upgrade
Post by: franco on July 31, 2024, 01:19:12 PM
This wouldn't trigger a panic: that bad traffic is dropped pretty quickly because it cannot be forwarded (a link-local source cannot send to a GUA). All it indicates is that some client devices on LAN fail to acquire IPv6 addresses.


Cheers,
Franco
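
A minimal sketch, not from the original post, of how the "cannot forward" lines quoted in this thread can be checked against that explanation. It assumes the message format shown above and uses Python's standard `ipaddress` module; the function name `explain` is hypothetical.

```python
import ipaddress
import re

# Matches the kernel message format quoted in this thread.
LOG_RE = re.compile(
    r"cannot forward src (\S+), dst (\S+), nxt (\d+), rcvif (\S+), outif (\S+)"
)


def explain(line: str) -> str:
    """Describe a 'cannot forward' line based on the scope of its addresses."""
    m = LOG_RE.search(line)
    if m is None:
        return "not a 'cannot forward' line"
    src = ipaddress.IPv6Address(m.group(1))
    dst = ipaddress.IPv6Address(m.group(2))
    rcvif = m.group(4)
    if src.is_link_local and dst.is_global:
        # A link-local source (fe80::/10) is only valid on its own link,
        # so the packet cannot be routed to a global destination.
        return f"link-local source on {rcvif} sent to a global destination; dropped"
    return "other reason"


line = (
    "cannot forward src fe80:5::a3ba:e66f:cc48:2823, "
    "dst 2a03:2880:f080:12:face:b00c:0:8e, nxt 6, rcvif ixv4, outif pppoe0"
)
print(explain(line))
```

A client producing these lines has only a link-local address on that interface, which points at address assignment on the LAN side rather than at the panic itself.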
Title: Re: Kernel panic after upgrade
Post by: Hendre on August 19, 2024, 12:47:23 PM
Hey Franco, what do you suggest as next steps please?
Title: Re: Kernel panic after upgrade
Post by: franco on August 19, 2024, 12:55:50 PM
Hey,

Try to gather the vmcore file using the 24.7.1 debug kernel:

# opnsense-update -zkr 24.7.1-dbg


Cheers,
Franco
Title: Re: Kernel panic after upgrade
Post by: Hendre on August 24, 2024, 01:30:54 PM
Unfortunately no luck installing the kernel. I tried 24.7.2 as well but same error.

sudo opnsense-update -zkr 24.7.1-dbg

Fetching kernel-24.7.1-dbg-amd64.txz: ..[fetch: https://pkg.opnsense.org/FreeBSD:14:amd64/snapshots/sets/kernel-24.7.1-dbg-amd64.txz.sig: Not Found] failed, no signature found
Title: Re: Kernel panic after upgrade
Post by: Hendre on August 24, 2024, 01:44:22 PM
Applying patch from this forum https://forum.opnsense.org/index.php?topic=42081.0 fixes the issue. Seeing if it holds over the next hours / days without panic.
Title: Re: Kernel panic after upgrade
Post by: furfix on August 24, 2024, 07:34:28 PM
I'm having same issue:

Fatal trap 9: general protection fault while in kernel mode
cpuid = 5; apic id = 11
instruction pointer = 0x20:0xffffffff80d7c723
stack pointer         = 0x28:0xfffffe00c7148b90
frame pointer         = 0x28:0xfffffe00c7148bf0
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 12 (swi1: netisr 0)
rdi: fffff80001a73740 rsi: 000000000300000a rdx: 35b04bd7a137a137
rcx: ffffffff83a15000  r8: 000000000000c544  r9: 0000000000000005
rax: ffffffffff32ed00 rbx: 000000000000f023 rbp: fffffe00c7148bf0
r10: 000000000000000a r11: fffffe00219d2c30 r12: 000000000000c544
r13: fffff80001a73740 r14: fffffe00219d8c38 r15: 00000000020013ac
trap number = 9
panic: general protection fault
cpuid = 5
time = 1724463849
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00c71488d0
vpanic() at vpanic+0x131/frame 0xfffffe00c7148a00
panic() at panic+0x43/frame 0xfffffe00c7148a60
trap_fatal() at trap_fatal+0x40b/frame 0xfffffe00c7148ac0
calltrap() at calltrap+0x8/frame 0xfffffe00c7148ac0
--- trap 0x9, rip = 0xffffffff80d7c723, rsp = 0xfffffe00c7148b90, rbp = 0xfffffe00c7148bf0 ---
in_pcblookup_hash_smr() at in_pcblookup_hash_smr+0x43/frame 0xfffffe00c7148bf0
in_pcblookup_mbuf() at in_pcblookup_mbuf+0x18/frame 0xfffffe00c7148c10
tcp_input_with_port() at tcp_input_with_port+0x4f6/frame 0xfffffe00c7148d80
tcp_input() at tcp_input+0xb/frame 0xfffffe00c7148d90
ip_input() at ip_input+0x268/frame 0xfffffe00c7148df0
swi_net() at swi_net+0x138/frame 0xfffffe00c7148e60
ithread_loop() at ithread_loop+0x257/frame 0xfffffe00c7148ef0
fork_exit() at fork_exit+0x7f/frame 0xfffffe00c7148f30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00c7148f30
--- trap 0x83480824, rip = 0x4816ebc033047500, rsp = 0xc4834800000001b8, rbp = 0x4c89481024548948 ---
KDB: enter: panic
panic.txt: general protection fault
version.txt: FreeBSD 14.1-RELEASE-p3 ixl_revert-n267779-6ca05616b9e9 SMP



A lot of:
<7>cannot forward src XXXXXXX, dst XXXXXXX, nxt 6, rcvif vlan0.20, outif pppoe0
<6>pid 34938 (php), jid 0, uid 0: exited on signal 10 (no core dump - bad address)


I can't test the kernel in the PPPoE thread because I already have a patched kernel :D but I will be happy to troubleshoot this one if needed.
Title: Re: Kernel panic after upgrade
Post by: doktornotor on August 24, 2024, 07:36:08 PM
Quote from: furfix on August 24, 2024, 07:34:28 PM
I can't test the kernel in the PPPoE thread because I already have a patched kernel :D but I will be happy to troubleshoot this one if needed.

Hmmm, the patches in that thread do not touch the kernel at all. ???
Title: Re: Kernel panic after upgrade
Post by: furfix on August 24, 2024, 08:14:23 PM
maybe the patch fixes something that is currently triggering the panic?
Title: Re: Kernel panic after upgrade
Post by: gillbot on August 24, 2024, 09:10:25 PM
Just updated my 4-port 2.5GbE mini PC with the latest update and it's bricked. If I use the CLI and revert to kernel ver 2 of 2 instead of 1, I can get into the GUI, but it won't let me do anything and won't connect to the internet. Seems DHCP is broken.
Title: Re: Kernel panic after upgrade
Post by: franco on August 24, 2024, 09:20:49 PM
People cross-posting all over with no information attached. I don't know how many times we've tried to say please do not. Make your own threads or find the exact match with the details you wanted to post.

@Hendre I followed up via mail.. I posted a garbled update command, but since your issue is gone I've given a few hints what it could have been.


Cheers,
Franco
Title: Re: Kernel panic after upgrade
Post by: doktornotor on August 24, 2024, 09:31:24 PM
Quote from: furfix on August 24, 2024, 08:14:23 PM
maybe the patch fixes something that is currently triggering the panic?

Well maybe - was my point. You can apply the PPPoE patches regardless of any patched kernel.
Title: Re: Kernel panic after upgrade
Post by: gillbot on August 24, 2024, 10:38:36 PM
Quote from: franco on August 24, 2024, 09:20:49 PM
People cross-posting all over with no information attached. I don't know how many times we've tried to say please do not. Make your own threads or find the exact match with the details you wanted to post.

@Hendre I followed up via mail.. I posted a garbled update command, but since your issue is gone I've given a few hints what it could have been.


Cheers,
Franco
I would love to provide more info, but I'm not as versed as many on here and this is my first issue with OPNsense. I've also tried to reinstall, and DNS just will not work, so nothing can be routed.
Title: Re: Kernel panic after upgrade
Post by: franco on August 24, 2024, 10:40:26 PM
Open a ticket, explain "bricked": does it boot, can you log in, do the firmware updates work, health audit ok, other things you want to say. Send me a PM with the link to the post so I will follow up there. Sometimes it's too busy to reply to all threads.


Cheers,
Franco
Title: Re: Kernel panic after upgrade
Post by: Hendre on August 30, 2024, 09:13:53 AM
I went back to my original 24.1 config on 24.7.3 and figured out udp broadcast relay plugin was somehow causing the panic. Disabled and installed plugin, now all is working perfectly on 24.7.3. This was tough but finally got there.

Thanks for the support Franco.
Title: Re: Kernel panic after upgrade
Post by: franco on August 30, 2024, 10:52:20 AM
udpbroadcastrelay is now at version 1.1 in 24.7.3, but that is just a side note.

I still suspect a kernel issue. I really need that vmcore file from the debug kernel.


Cheers,
Franco
Title: Re: Kernel panic after upgrade
Post by: Hendre on August 31, 2024, 02:05:42 PM
I'll find time for it, just happy I finally have a functional setup. Strangely udp broadcast relay plugin still shows 1.0 on 24.7.3 for me ...
Title: Re: Kernel panic after upgrade
Post by: franco on August 31, 2024, 02:12:00 PM
The plugin is not the same thing as the third-party package that contains the actual functional software, so the two have different versions.


Cheers,
Franco