OPNsense Forum

Archive => 19.1 Legacy Series => Topic started by: cpw on May 23, 2019, 06:46:28 am

Title: CRASH in 19.1.8 when PPPOE refreshes
Post by: cpw on May 23, 2019, 06:46:28 am
Hi

So, I've been struggling to get my system stable - I have jerry rigged some crazy cooling, to try and regulate temperature, and cannibalized an old laptop for memory sticks, but the system still crashes regularly.

Coincidentally, it always crashes right after the PPPOE interface is rebuilt (due to an ISP dropped line glitch perhaps?). The error is exactly as described in this forum thread: https://forum.opnsense.org/index.php?topic=5697.0 from a couple of years ago, and "closed with solution".

Is there a weird fundamental incompatibility with some hardware that is triggered by the PPPOE activity? That's pretty :o

Anyway, I don't know if I should file a bug, or where I should file such a bug. I've clicked the submit report button a few times to send you the full details.

One thing I noticed - it doesn't always crash the box. One time, the box got stuck in a weird state where it had no network connectivity at all, but didn't reboot or do anything. It just sat. Very odd.

I can share precise hardware details if you need them. This hardware is the ultimate botch job, but is a proof of concept before I invest real money in quality hardware.
Title: Re: CRASH in 19.1.8 when PPPOE refreshes
Post by: mimugmail on May 23, 2019, 11:56:18 am
What happens when you fix the cooling with a ventilator in front?
Title: Re: CRASH in 19.1.8 when PPPOE refreshes
Post by: cpw on May 23, 2019, 01:18:14 pm
No change. I have two big fans, according to cputemp stats it never runs more than 10C above ambient now. It's crashed twice since, both times when pppoe refreshed.
Title: Re: CRASH in 19.1.8 when PPPOE refreshes
Post by: cpw on June 04, 2019, 07:31:22 pm
Update: it crashed again today, as soon as the PPPOE connection reset. The correlation is exact and causal. It caused a 5 minute outage while OPNSense rebooted. This seems to be a fairly critical flaw - I would never expect a simple activity like bouncing a PPPOE interface to cause a complete fatal crash of the OS layer.

Sadly, it is not without resultant corruption as well. It seems that all the RRD reports (netflow, health, netdata) have lost all data as well. It also appears netflow hasn't properly restarted after the crash.
Title: Re: CRASH in 19.1.8 when PPPOE refreshes
Post by: cpw on June 04, 2019, 09:55:27 pm
Update update: another DSL connection wobble, another 5 minute reboot of the OPNsense firewall. Is there any idea how we could fix this? I'm happy to help diagnose, it seems that all I need do is disconnect the DSL modem temporarily. It is extremely frustrating to have repeated outages on something that's supposed to be extremely reliable, and was, until I chose OPNsense.
Title: Re: CRASH in 19.1.8 when PPPOE refreshes
Post by: mimugmail on June 04, 2019, 10:00:00 pm
What does the console say? Any stack traces? What hardware is this? I ran OPNsense on so many devices, never had any issue like this.
Title: Re: CRASH in 19.1.8 when PPPOE refreshes
Post by: cpw on June 04, 2019, 10:13:13 pm
Yup. There is always a core message, before the firewall completely restarts. I can't seem to find it in any of the log files anymore.

The hardware is "Intel(R) Atom(TM) CPU D2700 @ 2.13GHz (4 cores)". It has 8GB of memory (barely using 1G according to the dashboard).

It's an old zotac mini PC. I've equipped it with two additional ethernet devices via a mini PCIE card. As I say, it seems to work 100% reliably, except when the PPPOE disconnects/reconnects. I have a multiwan setup, with both DSL/PPPOE and non-DSL/DHCP services upstream (each on a physical NIC) as well as a segmented LAN with 6 VLANs.

Everything works fine, except when the PPPOE restarts - due to dropped connection upstream, I believe - it's very windy today and the phone line is a little weather vulnerable.

I'll get the error report next time it crashes or trigger one manually shortly, so I can give you the exact kernel core dump message.

Note: the error is the exact same kernel crash as identified in the thread I linked at the top. Nothing in the thread has helped, however.


Title: Re: CRASH in 19.1.8 when PPPOE refreshes
Post by: mimugmail on June 05, 2019, 06:51:25 am
Can you capture the screen and make a video? You need to find the reason. My guess is the PCI card causes BSD to crash. Then maybe we can search for tunings. If it doesn't work out at all I see 3 options:
Replace Hardware
Replace OPNsense
Let Modem do the dialin and OPNsense behind
Title: Re: CRASH in 19.1.8 when PPPOE refreshes
Post by: cpw on June 05, 2019, 02:01:00 pm
Here's the crashdump from the last crash. It looks like it happened, again, last night.

Code: [Select]
Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 02
fault virtual address = 0x188
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80d253ac
stack pointer         = 0x28:0xfffffe022f196940
frame pointer         = 0x28:0xfffffe022f196990
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 12 (swi5: fast taskq)

What would you like me to video? It crashing and rebooting? There's not much to show, it just flashes the error and then goes to the boot screen.

Why would a particular PCI card cause a complete hard crash? It seems it's specifically related to something inside the kernel, that only happens when the PPPOE restarts. The uptime was nearly 2 weeks, when the PPPOE recycled and took it out yesterday. There is a definite cause/effect relationship here, and it's not hardware, that I can tell?

Replacing hardware isn't an option right now, because $$$
Replacing software: I'm reluctantly looking for alternatives at present.
Modem handoff: I've looked into it. It looks like if I do that, I have to deal with 99 other flavours of awful that my modem "provides" as well.
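
A side note on reading the trap output above: a fault virtual address as small as 0x188 usually points at a NULL pointer being dereferenced at some struct offset, rather than at random memory corruption. Here is a minimal Python sketch of that check, assuming 4 KiB pages. The helper names `parse_trap` and `looks_like_null_deref` are made up for illustration and are not part of any OPNsense or FreeBSD tool.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, as on a typical amd64 kernel


def parse_trap(text):
    """Collect the 'key = value' lines of a fatal-trap message into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        # Skip lines without '=' and continuation lines that start with '='.
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


def looks_like_null_deref(fields):
    """True if the faulting address lies inside the first page of memory."""
    addr = int(fields["fault virtual address"], 16)
    return addr < PAGE_SIZE


# Lines copied from the crash dump posted in this thread.
TRAP = """Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 02
fault virtual address = 0x188
fault code = supervisor read data, page not present
current process = 12 (swi5: fast taskq)
"""

fields = parse_trap(TRAP)
print(fields["current process"], looks_like_null_deref(fields))
```

This only classifies the address. It says nothing about which code path produced the bad pointer, which still needs a stack trace.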
Title: Re: CRASH in 19.1.8 when PPPOE refreshes
Post by: mimugmail on June 05, 2019, 02:12:15 pm
No, it's a relation between (3 of 5):

- FreeBSD
- PPPoE software (mpd5)
- Hardware
- Maybe ISP sending strange packets

There are thousands of PPPoE installations out there without this problem.

Now it's time to exclude one after another to find the problematic combination.

The easiest one is installing pfSense, if this also crashes and it doesn't happen with e.g IPFire, it's FreeBSD
Next one is using different hardware, but OPN and your provider, perhaps you can borrow some piece of hardware and test. If it happens, it's not your hardware, if not, it has something to do with your hardware.

And so on ..
Title: Re: CRASH in 19.1.8 when PPPOE refreshes
Post by: keropiko on June 05, 2019, 02:13:22 pm
I run pppoe with opnsense. I wanted to change my public IP, clicked the "reload" button on the WAN interface overview page, and all of a sudden it crashed immediately. Running 19.1.8.
Title: Re: CRASH in 19.1.8 when PPPOE refreshes
Post by: mimugmail on June 05, 2019, 02:15:24 pm
I run pppoe with opnsense. I wanted to change my public IP, clicked the "reload" button on the WAN interface overview page, and all of a sudden it crashed immediately. Running 19.1.8.

Then you should open a new thread with the exact hardware, whether it happens on every reload, system.log / ppps.log, and the stack trace from the crash.
Title: Re: CRASH in 19.1.8 when PPPOE refreshes
Post by: cpw on June 05, 2019, 03:12:56 pm
No, it's a relation between (3 of 5):

- FreeBSD
- PPPoE software (mpd5)
- Hardware
- Maybe ISP sending strange packets

There are thousands of PPPoE installations out there without this problem.
Agree that this is unusual. I don't believe it's unique, however. There are clear reports of others with this issue, spanning several years.

Now it's time to exclude one after another to find the problematic combination.

The easiest one is installing pfSense, if this also crashes and it doesn't happen with e.g IPFire, it's FreeBSD

OK. I can probably pull that off fairly easily, assuming I can recover OPNsense configuration.

I do know that the previous (not same hardware) Linux setup handling PPPOE, never experienced this issue in several years of running.

I believe that rules out the ISP being weird?


Next one is using different hardware, but OPN and your provider, perhaps you can borrow some piece of hardware and test. If it happens, it's not your hardware, if not, it has something to do with your hardware.

And so on ..

Are hardware compatibility problems like this prevalent in the BSD community? I mean, I'm not a fan of spending many many days troubleshooting an issue to find that I have to invest hundreds (or thousands) of dollars in a hoped-for resolution. I'm a Linux guy, and I've not seen behaviour like this since the really early days of Linux (like 1995-8 or so).
Title: Re: CRASH in 19.1.8 when PPPOE refreshes
Post by: mimugmail on June 05, 2019, 05:05:35 pm
- When it spans over several years, what does that mean? Only a few reports in several years, or it still wasn't fixed by upstream? Or provider-related. Honestly, I'm unsure

- Just install pfSense, put in user/pw and you're good, there's no real config import, it costs time for sure. That linux on different hardware works doesn't rule out provider since it must be a combination of it. You can also give IPFire a shot, it's not that hard to set up.

- I'm also not a fan of testing things for days/ages .. but when all ppl think this way the problem will not be solved (and it seems no one was digging deeper successfully). So there's also no reason for posts like "it's now on 19.1.8 and still crashes". Only the ppl affected can help to investigate since it's too hard to guess from remote, sorry.
Title: Re: CRASH in 19.1.8 when PPPOE refreshes
Post by: cpw on June 05, 2019, 06:04:19 pm
- When it spans over several years, what does that mean? Only a few reports in several years, or it still wasn't fixed by upstream? Or provider-related. Honestly, I'm unsure

I doubt it's provider related. I can't see much in common between a Canadian ISP and a German ISP (the previously linked issue seemed to be German?)

- Just install pfSense, put in user/pw and you're good, there's no real config import, it costs time for sure. That linux on different hardware works doesn't rule out provider since it must be a combination of it. You can also give IPFire a shot, it's not that hard to set up.

I'll try running them from a usbstick on the same hardware. That way I can do the test without wiping out OPNsense, hopefully.

- I'm also not a fan of testing things for days/ages .. but when all ppl think this way the problem will not be solved (and it seems no one was digging deeper successfully). So there's also no reason for posts like "it's now on 19.1.8 and still crashes". Only the ppl affected can help to investigate since it's too hard to guess from remote, sorry.

That's why I'm still here. I'm not going to walk away immediately. But I don't have $ to invest in potential solutions right now, and I can't tolerate week long investigation outages. But point tests, they're quite doable.

I've seen various reports over the years of this issue happening, and people seem to have narrowed it down in the past to the 'interface rename' that occurs at the kernel level. I can see from the logs I have collected so far, that that seems to correspond to my issue as well. I suspect that "hardware" is a factor only insofar as speed of said hardware is a factor. This isn't a speedy computer. Hardware slowness often reveals hidden race conditions.
Title: Re: CRASH in 19.1.8 when PPPOE refreshes
Post by: franco on June 05, 2019, 09:42:49 pm
I'll chime in here as well. There's no developer that has a PPPoE connection, so when somebody tries to help it's just me doing remote debugging sessions and IRC talks and discussing on GitHub and auditing and improving interface code bottom-up.

It's a problem for PPPoE for sure. There are better options out there for sure.

What is needed is just one person to step up and fix this for everyone. :/


Cheers,
Franco
Title: Re: CRASH in 19.1.8 when PPPOE refreshes
Post by: cpw on June 05, 2019, 10:00:27 pm
I'm happy to help. Suggestions for how to diagnose would be greatly appreciated. I can supply whatever crash reports I get (I've been submitting them regularly through the reporter tool). Is there anything I can turn on to get a better picture of the situation, such as debugging flags?
Title: Re: CRASH in 19.1.8 when PPPOE refreshes
Post by: mimugmail on June 05, 2019, 10:18:49 pm
Is this a realtek NIC?
https://github.com/opnsense/core/issues/3227
Title: Re: CRASH in 19.1.8 when PPPOE refreshes
Post by: cpw on June 06, 2019, 01:25:15 am
Looks like it.

Code: [Select]
re0@pci0:1:0:0: class=0x020000 card=0x012310ec chip=0x816810ec rev=0x06 hdr=0x00
    vendor     = 'Realtek Semiconductor Co., Ltd.'
    device     = 'RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller'
    class      = network
    subclass   = ethernet
re1@pci0:6:0:0: class=0x020000 card=0x012310ec chip=0x816810ec rev=0x07 hdr=0x00
    vendor     = 'Realtek Semiconductor Co., Ltd.'
    device     = 'RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller'
    class      = network
    subclass   = ethernet
re2@pci0:7:0:0: class=0x020000 card=0x012310ec chip=0x816810ec rev=0x07 hdr=0x00
    vendor     = 'Realtek Semiconductor Co., Ltd.'
    device     = 'RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller'
    class      = network
    subclass   = ethernet

That's all 3 NICs in this box. I don't have the luxury of new hardware right now. Do I just put up with frequent outages?
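
For anyone comparing hardware across reports, the device listing above can be summarised mechanically. The following is a rough Python sketch, assuming the text has the same layout as the listing in this post (a `name@pci...` header line followed by indented `key = 'value'` lines). `parse_pciconf` is a hypothetical helper name, not an existing tool.

```python
import re

# Header line, e.g. "re0@pci0:1:0:0: class=0x020000 card=..."
HEADER = re.compile(r"^(?P<dev>[a-z]+)(?P<unit>\d+)@pci\S+\s")
# Indented detail line, e.g. "    vendor     = 'Realtek Semiconductor Co., Ltd.'"
FIELD = re.compile(r"^\s+(?P<key>\w+)\s*=\s*'?(?P<value>.*?)'?\s*$")


def parse_pciconf(text):
    """Return one dict per device with its name, driver prefix and detail fields."""
    devices = []
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            devices.append({
                "name": header["dev"] + header["unit"],
                "driver": header["dev"],
            })
            continue
        field = FIELD.match(line)
        if field and devices:
            devices[-1][field["key"]] = field["value"]
    return devices


# Two entries copied from the listing in this thread.
SAMPLE = """re0@pci0:1:0:0: class=0x020000 card=0x012310ec chip=0x816810ec rev=0x06 hdr=0x00
    vendor     = 'Realtek Semiconductor Co., Ltd.'
    device     = 'RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller'
    class      = network
    subclass   = ethernet
re1@pci0:6:0:0: class=0x020000 card=0x012310ec chip=0x816810ec rev=0x07 hdr=0x00
    vendor     = 'Realtek Semiconductor Co., Ltd.'
    device     = 'RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller'
    class      = network
    subclass   = ethernet
"""

for nic in parse_pciconf(SAMPLE):
    if nic.get("class") == "network":
        print(nic["name"], nic["driver"], nic["vendor"])
```

The driver prefix (`re`, `igb`, `em`) is what decides which `hw.<driver>.*` tunables can have any effect at all, so it is worth including in a report.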
Title: Re: CRASH in 19.1.8 when PPPOE refreshes
Post by: cpw on June 06, 2019, 03:14:03 am
I've added a couple of tunables that others have noted can have an impact:

Code: [Select]
hw.re.msi_disable 1
hw.re.msix_disable 1
They seem to have taken effect (no log message about MSIX anymore).

Let's see if the stability improves.
Title: Re: CRASH in 19.1.8 when PPPOE refreshes
Post by: cpw on June 06, 2019, 02:52:54 pm
I've added a couple of tunables that others have noted can have an impact:

Code: [Select]
hw.re.msi_disable 1
hw.re.msix_disable 1
They seem to have taken effect (no log message about MSIX anymore).

Let's see if the stability improves.

It did not.
Title: Re: CRASH in 19.1.8 when PPPOE refreshes
Post by: mimugmail on June 06, 2019, 03:23:21 pm
Do you run IDS/IPS or Shaper?

https://github.com/opnsense/core/issues/1481

Title: Re: CRASH in 19.1.8 when PPPOE refreshes
Post by: cpw on June 06, 2019, 04:27:42 pm
No, I am not.
Title: Re: CRASH in 19.1.8 when PPPOE refreshes
Post by: schnipp on June 06, 2019, 06:29:07 pm
Today, I have updated my opnsense from version 19.1.7 to 19.1.9. After trying to re-establish the pppoe connection (testing the pppoe reconnect bug patch (https://github.com/opnsense/core/issues/2267)) my system also crashed.

But this only occurred when using the "disconnect/connect" buttons in the webgui, and not when re-establishing the pppoe connection from the system console by sending the right signal to the mpd5 daemon process. It looks like a kernel bug when removing a lock from the file descriptor of a closing socket, or a bug somewhere in the call stack:

db:0:kdb.enter.default>  bt
Tracing pid 52650 tid 100123 td 0xfffff80012870000
sbcut_internal() at sbcut_internal+0x40/frame 0xfffffe023697a710
sbdestroy() at sbdestroy+0x28/frame 0xfffffe023697a730
sofree() at sofree+0x123/frame 0xfffffe023697a760
soclose() at soclose+0x35a/frame 0xfffffe023697a7b0
closef() at closef+0x251/frame 0xfffffe023697a840
closefp() at closefp+0x99/frame 0xfffffe023697a880
amd64_syscall() at amd64_syscall+0xa38/frame 0xfffffe023697a9b0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe023697a9b0


@cpw: what occurs when calling "kill -s USR2 <pid>" and subsequently "kill -s USR1 <pid>" on the console?
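
For repeated tests, the two `kill` calls above can be scripted. This is a small Python sketch, assuming (as the order in this post suggests) that USR2 takes the link down and USR1 brings it back up. The function name `bounce_link`, the default wait time, and the injectable `send`/`pause` parameters are illustrative choices, not anything mpd5 or OPNsense provides.

```python
import os
import signal
import time


def bounce_link(pid, send=os.kill, pause=time.sleep, wait_seconds=5.0):
    """Send SIGUSR2, wait, then send SIGUSR1 to the given process id.

    Returns the signals in the order they were sent. `send` and `pause`
    can be replaced with stand-ins for a dry run.
    """
    sent = []
    send(pid, signal.SIGUSR2)   # assumed: asks the daemon to close the link
    sent.append(signal.SIGUSR2)
    pause(wait_seconds)         # give the interface time to go down
    send(pid, signal.SIGUSR1)   # assumed: asks the daemon to reopen the link
    sent.append(signal.SIGUSR1)
    return sent


# Dry run that records the calls instead of signalling a real process.
log = []
bounce_link(
    4242,                                   # placeholder pid
    send=lambda pid, sig: log.append((pid, sig)),
    pause=lambda seconds: log.append(("sleep", seconds)),
    wait_seconds=2,
)
print(log)
```

On a real box the pid would have to be the actual mpd5 process, and since the goal is to provoke a panic, this is only sensible with console access and a crash dump configured.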
Title: Re: CRASH in 19.1.8 when PPPOE refreshes
Post by: cpw on June 07, 2019, 03:43:36 pm
My kernel crash has always been around the rename of the temporary pppoe interface to <pppoe0>.

It looks like the connection works fine, but the realtek driver just falls over when that happens, but only after the first time (otherwise, trivially, it'd never have worked at all).

It's clearly an invalid pointer somehow in the logic of the interface driver, but it could be the ppp daemon or the driver causing it.

I'll try 19.1.9 now, see what happens.
Title: Re: CRASH in 19.1.8 when PPPOE refreshes
Post by: cpw on June 20, 2019, 02:07:00 pm
Update: completely new hardware, using Intel nics. Fundamentally the same exact crash happened last night. Pppoe connection dropped, box crashed during reconnect attempt. Guess we can rule out realtek nics. I wonder if Amazon will refund me my $$$
Title: Re: CRASH in 19.1.8 when PPPOE refreshes
Post by: mimugmail on June 20, 2019, 02:15:12 pm
Link to Amazon product please
Title: Re: CRASH in 19.1.8 when PPPOE refreshes
Post by: cpw on June 20, 2019, 03:29:14 pm
https://www.amazon.ca/gp/product/B074PK8ZVG

Happened again about 10 minutes ago. PPPOE wobbled, box crashed.

Hypothesis - it seems that the problem happens because it's trying to route a packet to the now dead pppoe interface, and crashes with a kernel segfault? Is it possible that the routing system can't cope with a dead PPPOE interface?

Latest crash report attached from a few minutes ago. Same Trap 12 error. Note: PPPOE interface is atop igb2 now. igb0 is my LAN (with VLANs) and igb3 is the direct cable connection.
Title: Re: CRASH in 19.1.8 when PPPOE refreshes
Post by: mimugmail on June 20, 2019, 08:56:56 pm
This is really strange, this device is known to work perfectly with *sense. The next step would be to test against vanilla FreeBSD. But it may be worth exchanging it.
Title: Re: CRASH in 19.1.8 when PPPOE refreshes
Post by: cpw on June 20, 2019, 09:15:57 pm
Clearly it's not. It's crashed again this afternoon. I think the problem lies in the PPPoE somewhere. The crash is associated with the PPPoE interface resetting, due to external factors (noise on the DSL line probably).

I have no idea how I can troubleshoot, but it's really frustrating. OPNsense would be pretty much spot on, were it not for the very poor reliability (I'm running at about an average of 2 days uptime, though the outages cluster).

I'm curious what "vanilla BSD" would tell you? I mean, it wouldn't be a functional router in that state. But if you have a "livecd" I can run from USB stick, I'll happily give it a try, see if I can reproduce in that state (I'm pretty sure just pulling the network cable from my DSL modem will cause the problem).
Title: Re: CRASH in 19.1.8 when PPPOE refreshes
Post by: schnipp on June 20, 2019, 10:42:34 pm
We know the panic occurs while disconnecting the pppoe interface. Together with the stack trace and trap number 12 (page fault while in kernel mode), it looks to me like a race condition in the kernel which results in accessing an invalid pointer.

A similar reported but already fixed bug (https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=220076) mentioned the same (missing locks to synchronize smp).
Title: Re: CRASH in 19.1.8 when PPPOE refreshes
Post by: cpw on June 21, 2019, 04:04:53 am
Diving into the kernel bug db, I see a couple of things that pop up.
1. I got a new more explicit panic today:
Code: [Select]
sbsndptr: sockbuf 0xfffff800bc34e878 and mbuf 0xfffff80034f66500 clashing
2. Looking at the kernel bug reports, I found a couple of interesting ones: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=148807 and https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=218270

The latter suggests setting hw.igb.num_queues=1, so I'm going to try that. It also seems that several people in the former bug are seeing problems related to ipv6. I wonder if the combo of ipv6 that does but also doesn't work and bouncing pppoe might be the magic sauce to make this crash.
Title: Re: CRASH in 19.1.8 when PPPOE refreshes
Post by: schnipp on June 21, 2019, 03:19:36 pm
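For testing purposes it may help to disable SMP (symmetric multiprocessing). The idea behind it is to reduce concurrency within kernel mode.

  • Disable smp: loader tunable kern.smp.disabled=1
  • Disable specific cpu: loader tunable hint.lapic.X.disabled with "X" as the apic id of the cpu
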
Further details, see here (https://www.freebsd.org/cgi/man.cgi?smp(4))
Title: Re: CRASH in 19.1.8 when PPPOE refreshes
Post by: schnipp on June 24, 2019, 10:04:11 pm
@cpw: Did you make any progress in testing?
Title: Re: CRASH in 19.1.8 when PPPOE refreshes
Post by: cpw on June 25, 2019, 03:00:16 am
I'll be working on this the next day or so. I don't want to kill the network while others are using it ;)
Title: Re: CRASH in 19.1.8 when PPPOE refreshes
Post by: cpw on June 30, 2019, 02:56:08 pm
So, a small, positive (maybe?) update on this. I set hw.igb.num_queues to 1 in the tunables section. It seems the box has remained up, over an extended period, including a full reset of the pppoe connection. I am not 100% confident yet, but this seems like a massive improvement relative to where I was previously.
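
To keep track of which tunables have been tried in this thread, here is a small Python sketch that renders them in the `name="value"` form used by loader.conf(5). It is only a bookkeeping aid: on OPNsense the values are normally entered in the tunables section of the GUI as described above, the helper name `render_loader_conf` is made up, and I am assuming these are boot-time tunables that need a reboot to take effect.

```python
def render_loader_conf(tunables):
    """Format a mapping of tunable names to values as loader.conf style lines."""
    lines = [f'{name}="{value}"' for name, value in tunables.items()]
    return "\n".join(lines) + "\n"


# Tunables mentioned in this thread; none of them is a confirmed fix.
THREAD_TUNABLES = {
    "hw.igb.num_queues": 1,    # Intel igb NICs only
    "hw.re.msi_disable": 1,    # Realtek re NICs only
    "hw.re.msix_disable": 1,   # Realtek re NICs only
}

print(render_loader_conf(THREAD_TUNABLES), end="")
```

A driver-specific tunable does nothing for a NIC that uses a different driver, so the `hw.re.*` and `hw.igb.*` entries are never both relevant to the same interface.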
Title: Re: CRASH in 19.1.8 when PPPOE refreshes
Post by: mimugmail on June 30, 2019, 04:20:11 pm
Nice, good progress!
Title: Re: CRASH in 19.1.8 when PPPOE refreshes
Post by: JDtheHutt on June 30, 2019, 08:03:18 pm
I have been experiencing this issue for a while now, and I also use PPPoE.  It's been driving me crazy and I lack the technical experience to solve it myself.  I thought it was due to my use of the earlier Wireguard packages, or maybe my hardware was faulty.  However, I returned to OpenVPN and removed WG, and did a fresh install on top of that, and also tested my hardware and didn't see any faults occurring there, but the issue has kept occurring.

I have also set the tunable as detailed by cpw and I'll report back in a few days as to how it is going.  I usually see at least one kernel panic a day, sometimes multiple, so I should know quite quickly.
Title: Re: CRASH in 19.1.8 when PPPOE refreshes
Post by: JDtheHutt on July 02, 2019, 12:07:14 pm
I've stayed up for 50 hours without failure, however my PPPoE connection has not reset during that. I forced a reload of my PPPoE and OPNsense immediately died and required a reboot. So at least I know it is due to PPPoE, but that tunable has not fixed it.
Title: Re: CRASH in 19.1.8 when PPPOE refreshes
Post by: JDtheHutt on July 02, 2019, 12:08:58 pm
For testing purposes it may help to disable SMP (symmetric multiprocessing). The idea behind it is to reduce concurrency within kernel mode.

  • Disable smp: loader tunable kern.smp.disabled=1
  • Disable specific cpu: loader tunable hint.lapic.X.disabled with "X" as the apic id of the cpu

Further details, see here (https://www.freebsd.org/cgi/man.cgi?smp(4))

I'll try this next and report back in a few days
Title: Re: CRASH in 19.1.8 when PPPOE refreshes
Post by: cpw on July 02, 2019, 04:10:24 pm
JDtheHutt, are you using Realtek NICs or Intel NICs? That tunable only affects Intel NICs.
Title: Re: CRASH in 19.1.8 when PPPOE refreshes
Post by: JDtheHutt on July 02, 2019, 06:17:57 pm
I am using Intel NICs. I use a Supermicro X10SBA board and just went to their support page to confirm just in case I was mistaken.
Title: Re: CRASH in 19.1.8 when PPPOE refreshes
Post by: mimugmail on July 02, 2019, 07:18:43 pm
Intel I211?
Title: Re: CRASH in 19.1.8 when PPPOE refreshes
Post by: JDtheHutt on July 02, 2019, 09:16:32 pm
Intel I211?

Intel i210AT is what is listed for them. I hope you're not about to tell me that those are borked in BSD!
Title: Re: CRASH in 19.1.8 when PPPOE refreshes
Post by: mimugmail on July 02, 2019, 09:55:39 pm
No, there's a system known to stop service because of the I211, but not the I210.
Title: Re: CRASH in 19.1.8 when PPPOE refreshes
Post by: cpw on July 03, 2019, 05:35:07 pm
Code: [Select]
igb0@pci0:1:0:0: class=0x020000 card=0x00008086 chip=0x15398086 rev=0x03 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'I211 Gigabit Network Connection'
    class      = network
    subclass   = ethernet
igb1@pci0:2:0:0: class=0x020000 card=0x00008086 chip=0x15398086 rev=0x03 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'I211 Gigabit Network Connection'
    class      = network
    subclass   = ethernet
igb2@pci0:3:0:0: class=0x020000 card=0x00008086 chip=0x15398086 rev=0x03 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'I211 Gigabit Network Connection'
    class      = network
    subclass   = ethernet
igb3@pci0:4:0:0: class=0x020000 card=0x00008086 chip=0x15398086 rev=0x03 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'I211 Gigabit Network Connection'
    class      = network
    subclass   = ethernet

Great. What is the problem with I211?

Note: I got a complete lockup of network today, nothing to do with PPPOE at all. Network just completely stopped working.
Title: Re: CRASH in 19.1.8 when PPPOE refreshes
Post by: cpw on July 04, 2019, 03:27:06 pm
Update: I had the same crash as previously overnight.

Code: [Select]
<5>igb3: link state changed to DOWN
<5>igb3: link state changed to UP
panic: sbsndptr: sockbuf 0xfffff801779641b8 and mbuf 0xfffff8000c7fc600 clashing
cpuid = 3
__HardenedBSD_version = 1100056 __FreeBSD_version = 1102000
version = FreeBSD 11.2-RELEASE-p10-HBSD  5e5adf26fc3(stable/19.1)
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe01ed2a0ed0
vpanic() at vpanic+0x17c/frame 0xfffffe01ed2a0f30
panic() at panic+0x43/frame 0xfffffe01ed2a0f90
sbsndptr() at sbsndptr+0xd5/frame 0xfffffe01ed2a0fa0
tcp_output() at tcp_output+0x1009/frame 0xfffffe01ed2a1140
tcp_do_segment() at tcp_do_segment+0x2af5/frame 0xfffffe01ed2a1240
tcp_input() at tcp_input+0xf5b/frame 0xfffffe01ed2a13a0
ip_input() at ip_input+0x141/frame 0xfffffe01ed2a1400
netisr_dispatch_src() at netisr_dispatch_src+0xa8/frame 0xfffffe01ed2a1450
ether_demux() at ether_demux+0x140/frame 0xfffffe01ed2a1480
ng_ether_rcv_upper() at ng_ether_rcv_upper+0x8e/frame 0xfffffe01ed2a14a0
ng_apply_item() at ng_apply_item+0x163/frame 0xfffffe01ed2a1530
ng_snd_item() at ng_snd_item+0x2e7/frame 0xfffffe01ed2a1570
ng_apply_item() at ng_apply_item+0x163/frame 0xfffffe01ed2a1600
ng_snd_item() at ng_snd_item+0x2e7/frame 0xfffffe01ed2a1640
ng_ether_input() at ng_ether_input+0x4c/frame 0xfffffe01ed2a1670
ether_nh_input() at ether_nh_input+0x289/frame 0xfffffe01ed2a16d0
netisr_dispatch_src() at netisr_dispatch_src+0xa8/frame 0xfffffe01ed2a1720
ether_input() at ether_input+0x26/frame 0xfffffe01ed2a1740
vlan_input() at vlan_input+0x215/frame 0xfffffe01ed2a17f0
ether_demux() at ether_demux+0x129/frame 0xfffffe01ed2a1820
ether_nh_input() at ether_nh_input+0x336/frame 0xfffffe01ed2a1880
netisr_dispatch_src() at netisr_dispatch_src+0xa8/frame 0xfffffe01ed2a18d0
ether_input() at ether_input+0x26/frame 0xfffffe01ed2a18f0
igb_rxeof() at igb_rxeof+0x721/frame 0xfffffe01ed2a1990
igb_msix_que() at igb_msix_que+0x117/frame 0xfffffe01ed2a19e0
intr_event_execute_handlers() at intr_event_execute_handlers+0xe9/frame 0xfffffe01ed2a1a20
ithread_loop() at ithread_loop+0xe7/frame 0xfffffe01ed2a1a70
fork_exit() at fork_exit+0x83/frame 0xfffffe01ed2a1ab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe01ed2a1ab0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic

It looks like the queue workaround didn't work.

This bug: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=213257 details why that "workaround" works for some. I guess it's not ALTQ here that's causing the problem?
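
Since several different-looking panics have been posted here, it can help to reduce each backtrace to a short "signature" so that reports can be compared. Below is a minimal Python sketch, assuming frames are formatted like the KDB output above (`func() at func+0xNN/frame 0x...`). The function names `backtrace_functions` and `signature`, and the choice to skip the generic panic frames, are my own illustration.

```python
import re

# One KDB frame, e.g. "sbsndptr() at sbsndptr+0xd5/frame 0xfffffe01ed2a0fa0"
FRAME = re.compile(
    r"^(?P<func>\w+)\(\) at (?P=func)\+0x[0-9a-f]+/frame 0x[0-9a-f]+$"
)

# Frames that appear in every panic and carry no diagnostic value.
GENERIC = ("db_trace_self_wrapper", "vpanic", "panic")


def backtrace_functions(text):
    """Return the function names of all recognisable frames, innermost first."""
    funcs = []
    for line in text.splitlines():
        match = FRAME.match(line.strip())
        if match:
            funcs.append(match["func"])
    return funcs


def signature(funcs, depth=4):
    """Return the first `depth` non-generic frames as a comparable tuple."""
    return tuple(f for f in funcs if f not in GENERIC)[:depth]


# The top of the backtrace posted above.
TRACE = """KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe01ed2a0ed0
vpanic() at vpanic+0x17c/frame 0xfffffe01ed2a0f30
panic() at panic+0x43/frame 0xfffffe01ed2a0f90
sbsndptr() at sbsndptr+0xd5/frame 0xfffffe01ed2a0fa0
tcp_output() at tcp_output+0x1009/frame 0xfffffe01ed2a1140
tcp_do_segment() at tcp_do_segment+0x2af5/frame 0xfffffe01ed2a1240
tcp_input() at tcp_input+0xf5b/frame 0xfffffe01ed2a13a0
ip_input() at ip_input+0x141/frame 0xfffffe01ed2a1400
"""

print(signature(backtrace_functions(TRACE)))
```

Run against the trace schnipp posted earlier, the same helper would start at `sbcut_internal`, `sbdestroy`, `sofree`. Both signatures begin in socket-buffer (`sb*`) code, which may or may not be a coincidence.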
Title: Re: CRASH in 19.1.8 when PPPOE refreshes
Post by: cpw on July 05, 2019, 05:16:55 am
OK, I've tried something new. One of the commonalities was that the DSL modem was "glitching" its network connection - igb3 is directly connected to the modem in previous log traces. As soon as I disconnect the physical wire, or turn the modem off and on, I get the crash.

Anyway, I've put an old switch between the modem and the OPNsense box, let's see if this is more resilient now.
Title: Re: CRASH in 19.1.8 when PPPOE refreshes
Post by: Ricardo on July 05, 2019, 11:18:56 am
Be careful what kind of switch you put between the router NIC and the modem NIC. I used a cheap Cisco managed switch from ebay (2940/2950/2960 models, not the very latest models though), and for some reason the PPPoE frames did not go through this setup. I created a new VLAN between the router WAN NIC and the modem, and turned off Spanning tree, CDP, VTP etc., all the unnecessary junk. The interface config was basically empty apart from being set to a static L2 access port in that specific VLAN. Not even a PPPoE connection could be established. As soon as I removed the UTP cable from the switch and plugged it from the router straight into the modem, the PPPoE session was established. All the glorious Cisco CCIE expert colleagues around me couldn't figure out why a static L2 access port does not pass PPPoE frames (or at least they did not try hard enough).
The LAN side of the router, connected to the same switch in another VLAN, was working correctly for the rest of the LAN clients.
Title: Re: CRASH in 19.1.8 when PPPOE refreshes
Post by: cpw on July 05, 2019, 06:55:10 pm
It's a very old linksys jobbie. It's about as dumb as a dumb switch gets. It seems to be working absolutely fine.
Title: Re: CRASH in 19.1.8 when PPPOE refreshes
Post by: Ricardo on July 06, 2019, 10:38:41 am
Well, in that case you are safe. My Cisco did not allow PPPoE frames to pass through, so the opnsense router constantly tried to connect, but the modem never received those packets at all.

I know it's totally out of scope for this community, but should I open a new thread for this tricky subject, or will people just ignore it here?
Title: Re: CRASH in 19.1.8 when PPPOE refreshes
Post by: spants on July 11, 2019, 02:25:48 pm
Interestingly, I get the same error (using pppoe) when trying to move the WAN interface from em3 (Intel 4-port card) to the built-in Realtek adapter with a PPPoE profile.

I will try the suggestions here first, but just wanted to let you know that you are not alone!
Title: Re: CRASH in 19.1.8 when PPPOE refreshes
Post by: mimugmail on July 11, 2019, 05:00:10 pm
So sometimes it's just Realtek, sometimes it's the number of NIC queues :/
Title: Re: CRASH in 19.1.8 when PPPOE refreshes
Post by: donald24 on August 13, 2019, 08:24:58 pm
Threads like this discourage me from putting my new Qotom box with opnsense in place.

I have six i211 in my box, as I've heard there are no problems with Intel NICs.

Well I do know, people tend to talk about problems in forums, but are there any good stories about using pppoe with opnsense?

I cannot afford being cutoff from my mailserver when I am travelling....
Title: Re: CRASH in 19.1.8 when PPPOE refreshes
Post by: mimugmail on August 14, 2019, 06:33:04 am

Well I do know, people tend to talk about problems in forums, but are there any good stories about using pppoe with opnsense?

I cannot afford being cutoff from my mailserver when I am travelling....

No stories are good stories, since everyone who posts a thread usually has a problem. I've never had issues with pppoe in the last year ..
Title: Re: CRASH in 19.1.8 when PPPOE refreshes
Post by: JDtheHutt on August 17, 2019, 09:41:35 pm
This is an ongoing and severe issue for me.  I've tried a bunch of potential fixes mentioned in this and some other threads but nothing has worked.  The moment my PPPoE refreshes my OPNsense self-immolates and requires a hard reboot.  I've tried this on a fresh install and it is the same.  I have yet to upgrade to 19.7, I'll likely try this shortly and see what happens.  If it's still an issue I'll have to jump ship for another option.
Title: Re: CRASH in 19.1.8 when PPPOE refreshes
Post by: flames on September 10, 2019, 02:41:04 am
I would like to add that with virtualized OPNSense 19.7.3 I have the same issue (since 17.x).
Hardware:...
Proxmox 5.1.x - 6.0.6, MultiWAN, Virtio and Intel E1000 virtual NICs tested (not tested with virtualized Realtek or Vmxnet3)
The physical nodes tested HP DL580g7, DL380g6/g7/g8, DL360g8/g9
Physical NICs: onboard (whatever those nodes have: Broadcom, NetXen, Intel) + different Intel PCIe cards (currently the nodes are running with Intel X520; tested with X540 and some other cards, also with some silly cards like Chelsio, which have different issues altogether).
As soon as PPPoE reloads -> crash (to reproduce: interfaces / overview / wan / reload OR disconnect)
Since my ISP does not force regular reconnects, it's not that critical, but still painful :)

Edit: changed the ISP to one w/o pppoe.
Title: Re: CRASH in 19.1.8 when PPPOE refreshes
Post by: JDtheHutt on September 11, 2019, 08:34:48 pm
I upgraded to the latest 19.7 version and it is still occurring, so it is definitely not fixed there.  I have tried some suggested fixes here and on other BSD-related forums and nothing has worked so far.
Title: Re: CRASH in 19.1.8 when PPPOE refreshes
Post by: schnipp on October 23, 2019, 06:53:10 pm
The problem still persists on my machine  :'(

As already mentioned here (https://forum.opnsense.org/index.php?topic=12828.msg60265#msg60265), the panic is often triggered when using the "disconnect/connect" buttons in the webgui. Furthermore, the same occurs when the cron job performs a "periodic interface reset". So, I disabled the related cron job to check whether it's more stable.
Title: Re: CRASH in 19.1.8 when PPPOE refreshes
Post by: juliocbc on October 28, 2019, 05:46:04 pm
This problem is occurring here with the 19.7.4_1 too!

The crash occurs when:

-The cable from the PPPoE modem is manually disconnected
-The modem is turned off
-The Interfaces/Overview Disconnect button is pressed

Title: Re: CRASH in 19.1.8 when PPPOE refreshes
Post by: maikel on November 08, 2019, 02:04:28 pm
We have the same issue. schnipp, did disabling the cron job make any difference?
Title: Re: CRASH in 19.1.8 when PPPOE refreshes
Post by: JDtheHutt on November 23, 2019, 10:41:01 pm
As this is confirmed as a continuing fault in 19.7, should this thread be moved there, or a new one started and linked back to here, with the 19.7 discussion there?
Title: Re: CRASH in 19.1.8 when PPPOE refreshes
Post by: schnipp on December 11, 2019, 08:14:41 pm
Sorry for my late response. I made some progress with a hack, but this can only be a temporary solution. As I mentioned before (#23 (https://forum.opnsense.org/index.php?topic=12828.msg60265#msg60265), #30 (https://forum.opnsense.org/index.php?topic=12828.msg61008#msg61008)) it looks like a kernel bug.

I replaced the cron triggered periodic interface reset with a solution, which only sends a system signal to the "mpd5" process. This allows me to re-establish the PPPoE connection without disassembling the netgraph (which probably crashes the kernel).

For the dirty hack, please create a new file (/usr/local/opnsense/service/conf/actions.d/actions_reconnect_pppoe.conf) with the following content:

Code:
[reload]
command:/root/reconnect_pppoe_ipv6.sh
parameters:
type:script
message:Force PPPoE Reconnect with IPv6 cleanup
description: Force PPPoE Reconnect with IPv6 cleanup

Afterwards, create a file (/root/reconnect_pppoe_ipv6.sh) with the following content and give the file executable permissions:

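Code:
#!/bin/sh

# Ask mpd5 to close the PPPoE link (SIGUSR2) without tearing down the netgraph
kill -s USR2 $(pgrep mpd5)
sleep 3

# Delete every remaining IPv6 address on pppoe0 that is not link-local (no '%' scope)
while i="$(ifconfig pppoe0 | grep inet6 | grep -m 1 -v '%' | cut -f2 -d ' ' | tr -d '[:space:]')"; do
 if [ -n "$i" ]
  then
   #echo "IPv6 Address found"
   ifconfig pppoe0 inet6 "$i" delete
  else
   #echo "NO IPv6"
   break
 fi
done

sleep 3
# Ask mpd5 to open the PPPoE link again (SIGUSR1)
kill -s USR1 $(pgrep mpd5)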

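This script performs the following 3 steps:

1. Send USR2 to mpd5 to close the PPPoE connection.
2. Remove the remaining IPv6 addresses (everything except link-local) from pppoe0.
3. Send USR1 to mpd5 to open the PPPoE connection again.
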
My system has now been stable for more than one month :-)
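
The address filter used inside the loop can be tried on its own before touching a live interface. This is only a sketch: the function name first_global_inet6 and the sample ifconfig output (with documentation addresses) are made up for illustration; the pipeline itself is the one from the script above.

```shell
#!/bin/sh

# Print the first inet6 address without a '%scope' suffix (i.e. not link-local),
# reading ifconfig-style output from stdin. Prints nothing if there is none.
first_global_inet6() {
    grep inet6 | grep -m 1 -v '%' | cut -f2 -d ' ' | tr -d '[:space:]'
}

# Hypothetical sample of what "ifconfig pppoe0" might print (tab-indented lines).
sample_ifconfig() {
    printf 'pppoe0: flags=88d1 metric 0 mtu 1492\n'
    printf '\tinet6 %s prefixlen 64\n' 'fe80::1%pppoe0' '2001:db8::1'
}

addr=$(sample_ifconfig | first_global_inet6)
echo "would delete: ${addr:-nothing}"
```

On the firewall itself, "ifconfig pppoe0 | first_global_inet6" should show which address the loop would remove next, without deleting anything.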
Title: Re: CRASH in 19.1.8 when PPPOE refreshes
Post by: tokade on December 27, 2019, 09:37:36 am
Hi all,

It seems I have had the same problem since the last update to 19.7.8.

I can't force the crash with the "disconnect/connect" buttons, only with edit or add button under point-to-point device. The nightly freezes of my system might have the same reason. This morning I could connect to the VNC console where I found lots of messages: "xn_txeof: WARNING: response is -1!"

Where can I find / disable the cron job you mentioned?

I will implement the last workaround and see if it helps. What time do you start your script? Before or after the "freeze" - mine happens between 4 and 5 am.

Kind regards
Torsten
Title: Re: CRASH in 19.1.8 when PPPOE refreshes
Post by: schnipp on December 28, 2019, 07:59:39 pm
Because my ISP drops the internet connection every 24 hours to renew my IP address, I force the renewal at a fixed point in time. Under "System --> Settings --> Cron" you can periodically call a pre-programmed script "periodic interface reset", which drops and re-establishes the pppoe connection. Calling this script can crash the kernel due to modifications of the pppoe related netgraph. Instead of calling this script, I call my own script (see #62 (https://forum.opnsense.org/index.php?topic=12828.msg69564#msg69564)).
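
Wiring the custom action up from a shell could look roughly like the sketch below. It assumes the two files from #62 exist at the paths given there, that configd has to be restarted to notice a new action file, and that the action name is derived from the file name (actions_reconnect_pppoe.conf with section [reload] becoming "configctl reconnect_pppoe reload"). The helper names run and activate_workaround and the DRY_RUN switch are invented for this example; by default the commands are only printed.

```shell
#!/bin/sh

# Path taken from the description in #62; adjust if yours differs.
SCRIPT=/root/reconnect_pppoe_ipv6.sh

# Print each command; only execute it when DRY_RUN=0 is set.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; fi
}

activate_workaround() {
    run chmod +x "$SCRIPT"                # make the script executable
    run service configd restart           # let configd pick up the new action file
    run configctl reconnect_pppoe reload  # trigger the [reload] action once by hand
}

activate_workaround
```

Running it with DRY_RUN=0 on the firewall executes the three commands. Because the action file carries a description line, the action should then also be selectable under "System --> Settings --> Cron".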
Title: Re: CRASH in 19.1.8 when PPPOE refreshes
Post by: tokade on December 28, 2019, 10:54:02 pm
Hi,

the workaround doesn't prevent my system from freezing. My ISP doesn't drop the connection and I haven't got the 'periodic interface reset' in use. So there must be another reason for the nightly freeze.

By further testing I found another command that crashes my system: sysctl -a

Bugs with crashes caused by sysctl are reported for FreeBSD, and maybe something else uses sysctl during the night...

Kind regards
Torsten