19.7 Legacy Series / kernel panics through multiple releases
« on: December 21, 2019, 01:56:18 am »
Hi all - I was pretty happy with opnsense as a concept. Coming from pfsense, I was a bit saddened by how that project has changed over the years, especially the move to require AES-NI CPU support in the future (which they seem to have backed off from). So opnsense looked like a good option, and the fact that you've already started the process of "cleaning house" on old code was a big deal to me.
That said, last week I moved back to pfsense. It became necessary because no matter what I did (replacing hardware, turning off "big" features like IDS/IPS, clean reinstalls, etc.) I was getting fairly regular kernel panics. The more I watched this, the more I realized that with UFS I was getting serious data corruption each time (as shown by the built-in 'health check'), and for a time I thought perhaps that was the root of my problem: some prior release panicked once, and subsequent panics were the result of corruption in some kernel module or something. I eventually moved to ZFS using the nice bootstrapping tool provided and still saw a few panics, the last of which left the system unbootable (panic during mountroot).
Here are a few threads where I brought up the panics but didn't really find any resolution (mostly me talking to myself at some point):
https://forum.opnsense.org/index.php?topic=14323.0 (configd)
https://forum.opnsense.org/index.php?topic=12267.msg68445#msg68445 (zfs install)
So I yanked the drive, put in an old drive (one that also had opnsense on it that I'd swapped out to test if the corruption was a drive failure), and installed pfsense w/the zfs install option. A week later it's still running without any panics (and thankfully aliases and DHCP static mappings are pretty easy to export/import across platforms). This is great, but I'm also on a platform that promises to obsolete my hardware with the next major release (which may not come given how much time their other Linux-based project is getting).
So what's my point in posting?
Just calling attention to the issue, giving people with similar hardware a chance to find this via Google, whatever. My gut feeling is that while HardenedBSD is great, it sees WAY less hardware than mainline FreeBSD and it's just not happy with my old Core2Duo (E7500, 2.93GHz) Dell. It reminds me of the early days of OpenBSD - secure, but as you add more protections, you end up with less stability because you're bailing out whenever you hit an unexpected condition. This is GOOD - it means your protections and your correctness in following spec are working. It's bad if you have users that hit the bugs and don't have the manpower to follow up. Anyhow, I've done the "submit a bug" thing after each of these panics for the last year or so, so there's a record for anyone wanting to look at it. And I have plenty of spare drives around and a copy of my last config, so if anyone ever wants to troubleshoot with me, I have no problem flipping over to opnsense again for testing.
From my end though, I've hit a dead end - the built-in Dell diagnostics all pass, memtest86 passes, SMART passes on all drives I've tried (after a "long" self-test), pegging the CPU with benchmarks doesn't trigger the bug, and the CPU fan is fine, so I'm not sure what else I could do.