CRASH in 19.1.8 when PPPoE refreshes

Started by cpw, May 23, 2019, 06:46:28 AM

igb0@pci0:1:0:0: class=0x020000 card=0x00008086 chip=0x15398086 rev=0x03 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'I211 Gigabit Network Connection'
    class      = network
    subclass   = ethernet
igb1@pci0:2:0:0: class=0x020000 card=0x00008086 chip=0x15398086 rev=0x03 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'I211 Gigabit Network Connection'
    class      = network
    subclass   = ethernet
igb2@pci0:3:0:0: class=0x020000 card=0x00008086 chip=0x15398086 rev=0x03 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'I211 Gigabit Network Connection'
    class      = network
    subclass   = ethernet
igb3@pci0:4:0:0: class=0x020000 card=0x00008086 chip=0x15398086 rev=0x03 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'I211 Gigabit Network Connection'
    class      = network
    subclass   = ethernet


Great. What is the problem with I211?

Note: today I got a complete network lockup that had nothing to do with PPPoE at all. The network simply stopped working.

July 04, 2019, 03:27:06 PM #46 Last Edit: July 04, 2019, 03:34:24 PM by cpw
Update: overnight I had the same crash as before.


<5>igb3: link state changed to DOWN
<5>igb3: link state changed to UP
panic: sbsndptr: sockbuf 0xfffff801779641b8 and mbuf 0xfffff8000c7fc600 clashing
cpuid = 3
__HardenedBSD_version = 1100056 __FreeBSD_version = 1102000
version = FreeBSD 11.2-RELEASE-p10-HBSD  5e5adf26fc3(stable/19.1)
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe01ed2a0ed0
vpanic() at vpanic+0x17c/frame 0xfffffe01ed2a0f30
panic() at panic+0x43/frame 0xfffffe01ed2a0f90
sbsndptr() at sbsndptr+0xd5/frame 0xfffffe01ed2a0fa0
tcp_output() at tcp_output+0x1009/frame 0xfffffe01ed2a1140
tcp_do_segment() at tcp_do_segment+0x2af5/frame 0xfffffe01ed2a1240
tcp_input() at tcp_input+0xf5b/frame 0xfffffe01ed2a13a0
ip_input() at ip_input+0x141/frame 0xfffffe01ed2a1400
netisr_dispatch_src() at netisr_dispatch_src+0xa8/frame 0xfffffe01ed2a1450
ether_demux() at ether_demux+0x140/frame 0xfffffe01ed2a1480
ng_ether_rcv_upper() at ng_ether_rcv_upper+0x8e/frame 0xfffffe01ed2a14a0
ng_apply_item() at ng_apply_item+0x163/frame 0xfffffe01ed2a1530
ng_snd_item() at ng_snd_item+0x2e7/frame 0xfffffe01ed2a1570
ng_apply_item() at ng_apply_item+0x163/frame 0xfffffe01ed2a1600
ng_snd_item() at ng_snd_item+0x2e7/frame 0xfffffe01ed2a1640
ng_ether_input() at ng_ether_input+0x4c/frame 0xfffffe01ed2a1670
ether_nh_input() at ether_nh_input+0x289/frame 0xfffffe01ed2a16d0
netisr_dispatch_src() at netisr_dispatch_src+0xa8/frame 0xfffffe01ed2a1720
ether_input() at ether_input+0x26/frame 0xfffffe01ed2a1740
vlan_input() at vlan_input+0x215/frame 0xfffffe01ed2a17f0
ether_demux() at ether_demux+0x129/frame 0xfffffe01ed2a1820
ether_nh_input() at ether_nh_input+0x336/frame 0xfffffe01ed2a1880
netisr_dispatch_src() at netisr_dispatch_src+0xa8/frame 0xfffffe01ed2a18d0
ether_input() at ether_input+0x26/frame 0xfffffe01ed2a18f0
igb_rxeof() at igb_rxeof+0x721/frame 0xfffffe01ed2a1990
igb_msix_que() at igb_msix_que+0x117/frame 0xfffffe01ed2a19e0
intr_event_execute_handlers() at intr_event_execute_handlers+0xe9/frame 0xfffffe01ed2a1a20
ithread_loop() at ithread_loop+0xe7/frame 0xfffffe01ed2a1a70
fork_exit() at fork_exit+0x83/frame 0xfffffe01ed2a1ab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe01ed2a1ab0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic
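A KDB backtrace is easier to read in call order. KDB prints the innermost frame first, so reversing the frames shows the path the packet took: in this trace it arrives via igb_msix_que()/igb_rxeof(), passes through vlan_input() and the netgraph (ng_*) functions, and climbs through ip_input() and tcp_input() into tcp_output(), where sbsndptr() panics. The sketch below is a hypothetical helper (the name call_order and the regex are my own, not part of FreeBSD or OPNsense) that does this reversal on an excerpt of the trace above.

```python
import re

# Excerpt of the KDB backtrace above; KDB lists the innermost frame first.
TRACE = """\
panic() at panic+0x43/frame 0xfffffe01ed2a0f90
sbsndptr() at sbsndptr+0xd5/frame 0xfffffe01ed2a0fa0
tcp_output() at tcp_output+0x1009/frame 0xfffffe01ed2a1140
tcp_do_segment() at tcp_do_segment+0x2af5/frame 0xfffffe01ed2a1240
tcp_input() at tcp_input+0xf5b/frame 0xfffffe01ed2a13a0
ip_input() at ip_input+0x141/frame 0xfffffe01ed2a1400
ng_ether_rcv_upper() at ng_ether_rcv_upper+0x8e/frame 0xfffffe01ed2a14a0
ng_ether_input() at ng_ether_input+0x4c/frame 0xfffffe01ed2a1670
vlan_input() at vlan_input+0x215/frame 0xfffffe01ed2a17f0
igb_rxeof() at igb_rxeof+0x721/frame 0xfffffe01ed2a1990
igb_msix_que() at igb_msix_que+0x117/frame 0xfffffe01ed2a19e0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
"""

# Matches "name() at name+0xOFFSET/frame 0xADDRESS" and captures the function name.
FRAME_RE = re.compile(r"^(\w+)\(\) at \1\+0x[0-9a-f]+/frame 0x[0-9a-f]+$")


def call_order(trace: str) -> list[str]:
    """Return function names from the outermost caller down to the panicking frame."""
    names = []
    for line in trace.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:  # non-frame lines such as "--- trap ..." are skipped
            names.append(match.group(1))
    return names[::-1]  # reverse, because KDB prints the innermost frame first


if __name__ == "__main__":
    print(" -> ".join(call_order(TRACE)))
```

Running it prints `igb_msix_que -> igb_rxeof -> vlan_input -> ng_ether_input -> ng_ether_rcv_upper -> ip_input -> tcp_input -> tcp_do_segment -> tcp_output -> sbsndptr -> panic`, which makes it easy to see that the panic happens while processing a packet received on the igb interface.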


It looks like the queue workaround didn't work.

This bug report (https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=213257) explains why that "workaround" helps some people. I guess ALTQ isn't what's causing the problem here?

OK, I've tried something new. One thing the crashes had in common was that the DSL modem was "glitching" its network connection; in the earlier log traces, igb3 is connected directly to the modem. As soon as I disconnect the physical cable, or power-cycle the modem, I get the crash.

Anyway, I've put an old switch between the modem and the OPNsense box; let's see whether this is more resilient.

Be careful what kind of switch you put between the router NIC and the modem NIC. I used a cheap managed Cisco switch from eBay (2940/2950/2960 models, not the very latest ones), and for some reason the PPPoE frames did not go through this setup. I created a new VLAN between the router's WAN NIC and the modem and turned off Spanning Tree, CDP, VTP and all the other unnecessary junk; the interface config was basically empty apart from being set as a static L2 access port in that VLAN. Not even the PPPoE connection could be established. As soon as I unplugged the UTP cable from the switch and connected the router straight to the modem, the PPPoE session was established. None of the glorious Cisco CCIE experts among my colleagues could figure out why a static L2 access port does not pass PPPoE frames (or at least they did not try hard enough).
The router's LAN side, connected to the same switch in another VLAN, worked correctly for the rest of the LAN clients.

It's a very old Linksys jobbie. It's about as dumb as a dumb switch gets, and it seems to be working absolutely fine.

Well, in that case you are safe. My Cisco did not let PPPoE frames pass through, so the OPNsense router kept trying to connect, but the modem never received those packets at all.

I know it's totally out of scope for this community, but should I open a new thread for this tricky subject, or will everyone just ignore it here?

Interestingly, I get the same error (using PPPoE) when trying to move the WAN interface from em3 (an Intel 4-port card) to the built-in Realtek adapter with a PPPoE profile.

I will try the suggestions here first, but just wanted to let you know that you are not alone!

So sometimes it's just Realtek, and sometimes it's the number of queues on the NIC :/

Threads like this discourage me from putting my new Qotom box with OPNsense into service.

I have six I211s in my box, since I'd heard there are no problems with Intel NICs.

Well, I know people tend to talk about problems in forums, but are there any good stories about using PPPoE with OPNsense?

I cannot afford to be cut off from my mail server when I am travelling...

Quote from: donald24 on August 13, 2019, 08:24:58 PM

Well, I know people tend to talk about problems in forums, but are there any good stories about using PPPoE with OPNsense?

I cannot afford to be cut off from my mail server when I am travelling...

No stories are good stories, since everyone who posts a thread usually has a problem. I've never had issues with PPPoE in the last year.

This is an ongoing and severe issue for me. I've tried a bunch of potential fixes mentioned in this and other threads, but nothing has worked. The moment my PPPoE refreshes, my OPNsense self-immolates and requires a hard reboot. I've tried this on a fresh install and it is the same. I have yet to upgrade to 19.7; I'll likely try that shortly and see what happens. If it's still an issue, I'll have to jump ship to another option.

September 10, 2019, 02:41:04 AM #56 Last Edit: September 14, 2019, 06:02:48 AM by flames
I would like to add that with a virtualized OPNsense 19.7.3 I have the same issue (and have had it since 17.x).
Hardware:...
Proxmox 5.1.x to 6.0.6, Multi-WAN; tested with VirtIO and Intel E1000 virtual NICs (not tested with virtualized Realtek or VMXNET3).
Physical nodes tested: HP DL580 G7, DL380 G6/G7/G8, DL360 G8/G9.
Physical NICs: onboard (whatever those nodes have: Broadcom, NetXen, Intel) plus various Intel PCIe cards (the nodes currently run Intel X520; also tested with X540 and some other cards, including some silly ones like Chelsio, which have different issues altogether).
As soon as PPPoE reloads, it crashes (to reproduce: Interfaces / Overview / WAN / Reload or Disconnect).
Since my ISP doesn't force regular reconnects, it's not that critical, but still painful :)

Edit: I switched to an ISP that doesn't use PPPoE.

Quote from: JDtheHutt on August 17, 2019, 09:41:35 PM
This is an ongoing and severe issue for me. I've tried a bunch of potential fixes mentioned in this and other threads, but nothing has worked. The moment my PPPoE refreshes, my OPNsense self-immolates and requires a hard reboot. I've tried this on a fresh install and it is the same. I have yet to upgrade to 19.7; I'll likely try that shortly and see what happens. If it's still an issue, I'll have to jump ship to another option.

I upgraded to the latest 19.7 version and it is still occurring, so it's definitely not fixed there. I have tried some suggested fixes here and on other BSD-related forums, and nothing has worked so far.

The problem still persists on my machine  :'(

As already mentioned here, the panic is often triggered by the "disconnect/connect" buttons in the web GUI. The same happens when the cron job performs a "periodic interface reset". So I disabled the related cron job to check whether it's more stable.
OPNsense 24.7.11_2-amd64

This problem occurs here with 19.7.4_1 too!

The crash occurs when:

- The cable from the PPPoE modem is manually disconnected
- The modem is turned off
- The Interfaces/Overview Disconnect button is pressed

Cloudfence Open Source Team