Setup 2 Wireguard site2site tunnels - kernel panic

Started by mcouture, January 23, 2023, 01:02:59 PM

I set up 2 WireGuard site-to-site tunnels yesterday and everything works as advertised. I can see both sides of each tunnel. So far so good, right?

Within a couple of hours, OPNsense would kernel panic and reboot. Then within an hour it would panic again... then again... then again.

I'm running OPNsense in a VM under Proxmox, using kvm64 as the CPU type. My hardware is a TopCon 6-port all-in-one device. I have 3 more of these devices, all running OPNsense under Proxmox without issue.

I was thinking of trying the kmod version of WireGuard, but I'm unsure... thoughts?

Also, because of these kernel panics, I finally decided to disable WireGuard in the GUI.

After disabling Wireguard, the system has been stable...



Is it the same with kmod? Got a kernel stack trace? Tried disabling shared forwarding?


Cheers,
Franco

I haven't tried the kmod version yet. I have just found other threads that seem close to what I'm seeing, but I'm not sure yet.


kernel trap 12 with interrupts disabled


Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address   = 0x3b6
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80cdb671
stack pointer           = 0x28:0xfffffe000378fa70
frame pointer           = 0x28:0xfffffe000378fa90
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = resume, IOPL = 0
current process         = 12 (swi1: netisr 0)
trap number             = 12
timeout stopping cpus
panic: page fault
cpuid = 1
time = 1674353202
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe000378f830
vpanic() at vpanic+0x17f/frame 0xfffffe000378f880
panic() at panic+0x43/frame 0xfffffe000378f8e0
trap_fatal() at trap_fatal+0x385/frame 0xfffffe000378f940
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe000378f9a0
calltrap() at calltrap+0x8/frame 0xfffffe000378f9a0
--- trap 0xc, rip = 0xffffffff80cdb671, rsp = 0xfffffe000378fa70, rbp = 0xfffffe000378fa90 ---
tdq_notify() at tdq_notify+0x31/frame 0xfffffe000378fa90
sched_add() at sched_add+0x25c/frame 0xfffffe000378fad0
intr_event_schedule_thread() at intr_event_schedule_thread+0xb8/frame 0xfffffe000378fb00
swi_sched() at swi_sched+0x6b/frame 0xfffffe000378fb40
pfsync_update_state() at pfsync_update_state+0x29d/frame 0xfffffe000378fb90
pf_test() at pf_test+0xfbe/frame 0xfffffe000378fd00
pf_check_in() at pf_check_in+0x25/frame 0xfffffe000378fd20
pfil_run_hooks() at pfil_run_hooks+0x97/frame 0xfffffe000378fd60
ip_input() at ip_input+0x759/frame 0xfffffe000378fdf0
swi_net() at swi_net+0x13e/frame 0xfffffe000378fe60
ithread_loop() at ithread_loop+0x25a/frame 0xfffffe000378fef0
fork_exit() at fork_exit+0x7e/frame 0xfffffe000378ff30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe000378ff30
--- trap 0x80d09430, rip = 0xffffffff80c313af, rsp = 0, rbp = 0 ---
mi_startup() at mi_startup+0xdf
KDB: enter: panic

If close enough counts, the culprit seems to be pfsync: the trap fires in the scheduler right after pfsync_update_state() calls swi_sched(). There are patches in the pipeline from FreeBSD, but they haven't hit 23.1 yet, which is going to be released this week. 22.7 will definitely not receive any more kernel patches.


Cheers,
Franco
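For anyone comparing their own panic against the one posted above, here is a minimal Python sketch that pulls the function names out of a KDB backtrace and keeps only the frames between the two "--- trap" markers, i.e. what was running when the page fault hit. The helper name frames_after_trap and the regular expression are illustrative assumptions, not part of any OPNsense or FreeBSD tool, and the sample is a shortened copy of the trace from this thread.

```python
import re

# Matches KDB backtrace lines such as:
#   pf_test() at pf_test+0xfbe/frame 0xfffffe000378fd00
FRAME_RE = re.compile(r"^(\w+)\(\) at \w+\+0x[0-9a-f]+")


def frames_after_trap(trace: str) -> list:
    """Return the function names between the first and second
    '--- trap' markers, innermost frame first (hypothetical helper)."""
    names = []
    in_fault_path = False
    for line in trace.splitlines():
        line = line.strip()
        if line.startswith("--- trap"):
            if in_fault_path:
                break  # second marker: end of the faulting call chain
            in_fault_path = True
            continue
        if in_fault_path:
            match = FRAME_RE.match(line)
            if match:
                names.append(match.group(1))
    return names


# Shortened copy of the backtrace posted in this thread.
SAMPLE = """\
calltrap() at calltrap+0x8/frame 0xfffffe000378f9a0
--- trap 0xc, rip = 0xffffffff80cdb671, rsp = 0xfffffe000378fa70, rbp = 0xfffffe000378fa90 ---
tdq_notify() at tdq_notify+0x31/frame 0xfffffe000378fa90
sched_add() at sched_add+0x25c/frame 0xfffffe000378fad0
intr_event_schedule_thread() at intr_event_schedule_thread+0xb8/frame 0xfffffe000378fb00
swi_sched() at swi_sched+0x6b/frame 0xfffffe000378fb40
pfsync_update_state() at pfsync_update_state+0x29d/frame 0xfffffe000378fb90
pf_test() at pf_test+0xfbe/frame 0xfffffe000378fd00
--- trap 0x80d09430, rip = 0xffffffff80c313af, rsp = 0, rbp = 0 ---
mi_startup() at mi_startup+0xdf
"""

frames = frames_after_trap(SAMPLE)
for name in frames:
    print(name)
```

If pfsync_update_state shows up in that list a few frames below the faulting function, the panic is probably in the same family as the one discussed here. This only matches names, so treat it as a rough filter rather than a diagnosis.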

If that is what this is, what type of workaround do you suggest in the meantime?

I can try turning off "shared forwarding" and/or kmod if you think this would help.


Curious what type of nic you're running, Chelsio by any chance?
Wireguard does not like Chelsio for some reason but it is fixed in newer BSD versions.

Quote from: Demusman on January 23, 2023, 05:36:56 PM
Curious what type of nic you're running, Chelsio by any chance?
Wireguard does not like Chelsio for some reason but it is fixed in newer BSD versions.
I'm running Intel i226, 8 ports on this box that keeps panicking. It's running as a virtual NIC in Proxmox, however.

I have 2 other nodes that aren't panicking and still have the WireGuard module enabled (both have i225 and i226 NICs). There is just no VPN traffic running on them, as they are the site-to-site endpoints to the node that I am having trouble with.



FYI,

I re-enabled the WireGuard plug-in and turned off "Shared Forwarding".

I also realized that this Proxmox host was not running kernel 6.1 as my others are, so I upgraded the kernel.

I am currently at 8 hours of uptime without a panic. I will continue to monitor the situation...