[SOLVED][Fix included in 17.7.1] PPPOE Crash

Started by jwe, August 10, 2017, 02:13:23 AM

Hello,

Lookie who is here! I hope you are doing fine these days! :D

The upgrade path was moved to 17.7.1 for this particular reason. The 17.1 GUI can't know that, so it still shows the version it is pointing to, but the mirror has the symlink switched.


Cheers,
Franco

I moved from pfsense straight to opnsense 17.7.  I have to say that while I like it in general, this bug has been a complete nightmare.  It has been so unstable, with constant crashes and reboots, that I can barely get past logging into the GUI before it goes down.  I live in an area where I have 0 mobile signal, so without some means to access my wired internet I am a bit screwed.  After a lot of perseverance I managed to get it to update to 17.7.1 before it crashed again, and now it is running rock solid.  I think you need to consider removing 17.7 as the provided download and putting 17.7.1 up there instead, or plastering some kind of massive warning across the 17.7 download saying that it has a serious flaw and that users need to update to 17.7.1 immediately, by any means, before it crashes.

September 05, 2017, 06:35:29 PM #47 Last Edit: September 05, 2017, 06:37:47 PM by tillsense
Quote from: JDtheHutt on September 05, 2017, 05:27:13 PM
...I think you need to consider removing 17.7 as the provided download and put 17.7.1 up there instead...

@franco
100% ACK

cheers
till

Hi guys,

We agree. Yet we have more changes that we would like to see in images, so we are one or two 17.7.x releases away. It's a bet of sorts... And in the meantime 17.1 works too. :)

End of September is realistic.


Cheers,
Franco

Would it be best just to pull 17.7 as a downloadable or upgradeable option then?  Only allow 17.1 to be downloaded by default, and have any upgrades from within the system skip right past to 17.7.1?  Because 17.7 itself does not seem to merit being called a stable, production-ready system.  I know it's hard to roll back something as big as a release, but that's better than having users tearing their hair out because the system won't stay up even long enough to perform an upgrade.  You'll lose people otherwise.  With no mobile access and no way to get 17.7 to stay up, I was on the verge of just returning to pfsense and not looking back, which would be a shame, as I think opnsense is a great system apart from that fault.  Now I'm on 17.7.1 I am rock steady, and it has not dropped whatsoever since then.

@franco The fact that I don't post doesn't mean I'm not here, it means the product has been rock solid for my use cases :-). This is the first "serious" issue that I have seen, and it did not even apply to our entire fleet of appliances. Only appliances behind bridged modems were actually affected, because in that setup PPPoE is handled by opnsense rather than by the modem, as it would be if the modem were in routing mode.

Tried the 17.1.x to 17.7 upgrade, upgraded smoothly, although it did take a while to come back and gave me a slight scare there :-). It then found the 17.7.1 upgrade and sailed through that as well  ;D

@JDtheHutt all software has bugs, and as far as I understand it this bug wasn't in opnsense, but upstream. I've been running opnsense since day 0. I actually had to reconfigure one of our routers because it was upgraded from a 32bit machine to a 64bit machine and I wanted to "start fresh" with it. I had to skim through the old configuration; it had been *so* long since I last had to fiddle with it that I had forgotten how it was configured (subnets/interfaces). So far, since day 0, it has always been use>update>use, and I haven't noticed anything serious. By the time I updated, the serious issues had already been pulled from the mirrors (e.g. VLANs). And no, I'm not one of those "IT experts" who never update; I try to update everything (from servers down to clients' access points) every week. YMMV of course, because there is always that one person who will say "but I upgraded and now everything is broken! how did you miss that?"  :o

@Franco Unfortunately, this fault does not appear to be fixed in 17.7.1. After I installed it, it ran without any issue for 48 hours. Then it died again, with the same behaviour as before. If I boot with my WAN cable connected, it fails to boot. Without the WAN cable I can boot, but the second I plug it in, the whole system goes offline. Please let me know if there are any specific logs you want for this. However, with no working means to access the internet other than walking down the street to get mobile signal, trying 17.1 instead, or rolling back to pfsense entirely, it might take me a while.

It's unlikely to be the same issue, given multiple confirmations that it was solved.

Do you have a crash report?


Cheers,
Franco

I'll take a look when I get home from this shift. However, I did notice that each time it rebooted and I logged in, the crash report notification that I had seen at the top on previous occurrences was not present. Is there a specific place these are logged? I can manually grab them and send them over to you.

If nothing shows up it's not a proper crash. Does uptime reset a.k.a. forced reboot? Also check the System: Log File page for clues.

@Franco I had a bit of a look around.  To provide some context for my setup, I was using a fresh install of 17.7 which was immediately upgraded to 17.7.1.  I use IVPN and have configured the VPN service for that as per their current pfsense instructions, as they seem to match OPNsense and have been working fine.  Other than using some third party DNS and setting some static IPs for devices on my network, nothing else is changed.

The first oddity is that even with IPv6 disabled across all settings (telling it to use IPv4 even if IPv6 is available, putting block rules on IPv6 traffic, etc.), all my devices still show IPv6 addresses assigned on the client side.  On the OPNsense side I don't see any IPv6 addresses assigned, and the DHCP Server for IPv6 is disabled, yet all devices are still receiving addresses and I can see IPv6 traffic constantly being blocked by my firewall block rule for it.

The system as configured runs fine at first, but after a seemingly random period, though always within about 48 hours, everything drops: all devices fall off the network and show no IPv4 addresses, yet they still have IPv6 addresses.  I cannot ping anything, and my clients show that their connections are down.  I can still see that they are connected physically, as they detect a 1000Mbit/s link, but no actual connectivity is available.  When I go to the OPNsense box itself and check the output, it is just hung, with no response.

If I reboot with the WAN cable left in, the system fails to boot, reports config errors and just hangs on startup.  With the WAN cable removed, it boots and everything works perfectly with no further drops, though all devices still receive IPv6 addresses.  Plugging the WAN cable back in results in an immediate failure again, and everything drops off the network.

One thing I noticed is that OPNsense keeps switching my default gateway back to my WAN rather than my VPN gateway.  All outbound traffic is configured to use the VPN gateway anyway, but I have additionally set the VPN gateway as the default and set the system to use only the default gateway and not to fall back to any other gateway if the VPN gateway goes down.

Setting the default gateway back to the VPN gateway lets me plug the WAN cable back in with no drops.  However, DNS is completely broken and nothing resolves whatsoever.  Because of this, my VPN cannot be established, as the hostname cannot be resolved in order to bring it up.  I tried setting separate DNS options for every gateway on the system, with no success.  If I manually enter the IP for my VPN hostname, I can then establish the VPN and ping external addresses again, but DNS is still completely dead, whether I use my manually defined DNS servers or allow my ISP's DNS to override my locally defined servers.

I actually made a backup of my configuration when I first set it all up and it was working, but restoring that does not work.  Resetting to factory default and then restoring the configuration also does not work.  I have to do a complete disk wipe and reinstall from USB to get it running and it then goes through all of the above again.

I have had to decide to fall back fully to pfsense 2.4.0-RC, as I have an angry wife, two children and a mother-in-law here who have demanded I give them their internet back, so I am afraid I no longer have an OPNsense box available for further testing.  I can confirm that the pfsense install is working perfectly and has so far not experienced the same issues as OPNsense, though it has not yet been running for 48 hours.  I should also add that the same settings on pfsense result in no IPv6 activity at all, and none of my devices receive IPv6 addresses.  I am sorry I can't provide more information than that currently, but I hope it helps, and I would like to give OPNsense another go, maybe when they go to visit relatives for the week and I am left behind due to work!  Let me know if you need any other details which I may have forgotten and I will see if I can add them.  Thanks.