CRASH in 19.1.8 when PPPOE refreshes

Started by cpw, May 23, 2019, 06:46:28 AM

We know the panic occurs while disconnecting the pppoe interface. Together with the stack trace and trap number 12 (page fault while in kernel mode), it looks to me like a race condition in the kernel that results in accessing an invalid pointer.

A similar, already fixed bug report mentioned the same cause (missing locks to synchronize SMP).
OPNsense 24.7.11_2-amd64

Digging through the kernel bug database, I see a couple of things that pop up.
1. I got a new more explicit panic today:
sbsndptr: sockbuf 0xfffff800bc34e878 and mbuf 0xfffff80034f66500 clashing
2. Looking at the kernel bug reports, I found a couple of interesting ones: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=148807 and https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=218270

The latter suggests setting hw.igb.num_queues=1, so I'm going to try that. It also seems that several people in the former bug are seeing problems related to ipv6. I wonder if the combo of ipv6 that does but also doesn't work and bouncing pppoe might be the magic sauce to make this crash.

For testing purposes it may help to disable SMP (symmetric multiprocessing). The idea is to reduce concurrency in kernel mode.


  • Disable smp: loader tunable kern.smp.disabled=1
  • Disable specific cpu: loader tunable hint.lapic.X.disabled=1, with "X" as the APIC ID of the cpu
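
As a rough sketch, the two tunables above could look like this in a loader configuration file. The file path and the APIC ID 1 are assumptions for illustration only; use the ID that the boot messages report for the CPU you want to disable.

```
# /boot/loader.conf.local  (assumed location; read at boot)

# Option 1: boot with a single CPU, i.e. disable SMP entirely
kern.smp.disabled="1"

# Option 2: disable one CPU only ("1" is a hypothetical APIC ID)
hint.lapic.1.disabled="1"
```

Both are boot-time settings, so a reboot is needed, and they are meant for narrowing down the bug rather than as a permanent fix.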

Further details, see here

@cpw: Did you make any progress in testing?

I'll be working on this the next day or so. I don't want to kill the network while others are using it ;)

So, a small, positive (maybe?) update on this. I set hw.igb.num_queues to 1 in the tunables section. It seems the box has remained up, over an extended period, including a full reset of the pppoe connection. I am not 100% confident yet, but this seems like a massive improvement relative to where I was previously.
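
A minimal sketch of that setting as a loader tunable, for anyone who wants to reproduce it. The file path is an assumption; entering the same name and value in the GUI tunables section should be equivalent.

```
# /boot/loader.conf.local  (assumed location; read at boot)

# Limit the igb driver to a single queue per port
hw.igb.num_queues="1"
```

Since this is read at boot, it needs a reboot to take effect. Running `sysctl hw.igb.num_queues` afterwards should show 1 if the value was picked up.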


June 30, 2019, 08:03:18 PM #37 Last Edit: June 30, 2019, 08:07:14 PM by JDtheHutt
I have been experiencing this issue for a while now, and I also use PPPoE.  It's been driving me crazy and I lack the technical experience to solve it myself.  I thought it was due to my use of the earlier Wireguard packages, or maybe my hardware was faulty.  However, I returned to OpenVPN and removed WG, and did a fresh install on top of that, and also tested my hardware and didn't see any faults occurring there, but the issue has kept occurring.

I have also set the tunable as detailed by cpw and I'll report back in a few days as to how it is going.  I usually see at least one kernel panic a day, sometimes multiple, so I should know quite quickly.

The box stayed up for 50 hours without failure; however, my PPPoE connection did not reset during that time. I then forced a reload of PPPoE, and OPNsense immediately died and required a reboot. So at least I know it is due to PPPoE, but that tunable has not fixed it.

Quote from: schnipp on June 21, 2019, 03:19:36 PM
For testing purposes it may help to disable SMP (symmetric multiprocessing). The idea is to reduce concurrency in kernel mode.


  • Disable smp: loader tunable kern.smp.disabled=1
  • Disable specific cpu: loader tunable hint.lapic.X.disabled=1, with "X" as the APIC ID of the cpu

Further details, see here

I'll try this next and report back in a few days

JDtheHutt, are you using Realtek NICs or Intel NICs? That tunable only affects Intel NICs.

I am using Intel NICs. I use a Supermicro X10SBA board and just went to their support page to confirm just in case I was mistaken.


Quote from: mimugmail on July 02, 2019, 07:18:43 PM
Intel I211?

Intel i210AT is what is listed for them. I hope you're not about to tell me that those are borked in BSD!

No, there's a system known to stop working because of the I211, but not the I210.