Messages - lattera

#46
19.1 Legacy Series / Re: Kernel panic after upgrade
March 08, 2019, 09:37:34 PM
Quote from: franco on March 08, 2019, 07:47:36 PM
I hope this episode shows that we don't always get it 100% right but with a reasonable amount of patience and a level head we can move past almost anything together. :)

We're humans, we make mistakes occasionally. This also shows the importance of crowd testing beta and release candidates. :)
#47
19.1 Legacy Series / Re: Kernel panic after upgrade
March 08, 2019, 05:32:16 PM
Quote from: RGijsen on March 08, 2019, 10:25:16 AM
I totally agree on that. But I still feel some of the devs could be more professional in their communication as well.

If I've made a mistake in my community interactions, please let me know. I'd like to learn from the experience in order to serve the community better. Life is a journey, mistakes are made, and hopefully learned from. :)
#48
Getting closer! I'm still getting a kernel panic, but this time in a different place. There are probably more commits to MFC. I've posted the ISO in case anyone wants to test: https://hardenedbsd.org/~shawn/opnsense/2019-03-07_hbsd_11-stable_disc1-02.iso

For those not running Hyper-V (bare metal systems especially) and getting the fpuinit crash, please test this new ISO and report back.

Screenshot attached.
#49
Yup. Kernel panic with that ISO is to be expected. There are only two modifications from vanilla HardenedBSD 11-STABLE in that build:

1. Debug mode for "ALL THE THINGS!" (CFLAGS+="-g -O0", removal of -O2 CFLAG, STRIP="")
2. Enable WITNESS

I'm building a new version of 11-STABLE with the proper commits from upstream MFC'd. I hope to have this new build tested on my systems before I snore loudly in my wife's right ear tonight. If all goes well, I'll push the commits up to GitHub and publish the new ISO for wider testing.
#50
Quote from: bunchofreeds on March 07, 2019, 08:38:48 PM
Stunning work lattera!

I have downloaded the ISO, but will need guidance on what to do to help from here.

I also understand what you have said and see that it will require some input from other members to correctly resolve it.

Again... brilliant work!

Will wait and check back in for any progress or if I can test any fixes.

If you have Hyper-V with Generation 2 support available, or have an affected physical system, test booting the ISO. If you're booting a physical system, go ahead and burn that ISO to a DVD and boot it. If you can, try booting in both BIOS and UEFI mode. There's no need to complete an installation, simply booting the ISO will suffice.

Funny enough, pfSense's Jim Thompson hinted to me that we might be missing a couple or a few commits in our 11.2-RELEASE branch: https://reviews.freebsd.org/D14768

I'll MFC those commits, rebuild a new ISO, and retest on my end. I can share the ISO just like I did in my previous post.
#51
19.1 Legacy Series / Re: Kernel panic after upgrade
March 07, 2019, 07:02:53 PM
Quote from: AdSchellevis on March 07, 2019, 06:34:02 PM

fpuinit_bsp1 () at /usr/src/sys/amd64/amd64/fpu.c:241
fpuinit () at /usr/src/sys/amd64/amd64/fpu.c:277
0xffffffff810adb3b in hammer_time (modulep=<optimized out>, physfree=<optimized out>) at /usr/src/sys/amd64/amd64/machdep.c:1801
0xffffffff80316024 in btext () at /usr/src/sys/amd64/amd64/locore.S:79


I would like to thank Franco, Shawn and anybody involved in actually pinning this issue down.

A kernel with debug options enabled is available on our website [2], but if Franco has some time available he can probably move it to a better spot, maybe build some iso with kernel.


Best regards,

Ad


Hey Ad,

I've been working on this for the past few days. Put in around 20 hours so far tracking down the issue. :)

We effectively have two forum topics for the same problem. I've documented the issue here: https://forum.opnsense.org/index.php?topic=11403.msg54432#msg54432

So, I've figured out the root cause. I need to do more research in order to write a patch. I'm hoping to have a patch ready within the next week or two.
#52
So, I figured out what's wrong. Now to figure out how to fix it.

What's happening is that the FPU (Floating Point Unit) boot-time initialization code is trying to patch another part of executable kernel code. However, this executable code was not marked as writable. HardenedBSD fails to boot because it enforces the lack of the write permission, whereas FreeBSD does not.
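
To make the failure mode concrete, here's a minimal userland sketch in C. It's an analogy under stated assumptions, not the kernel code: the function name `try_patch_with_prot`, the placeholder byte, and the use of `mmap`/`mprotect` on an anonymous page are all hypothetical and chosen for illustration. It maps a page, applies the requested protection, and then attempts a read-modify-write in a child process so that a fault doesn't take down the caller.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc; harmless elsewhere */
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Try to OR `mask` into the first byte of a fresh anonymous page whose
 * protection has been set to `prot`. PROT_EXEC is deliberately not needed
 * here; only the presence or absence of PROT_WRITE matters for the demo.
 *
 * Returns 0 if the write succeeded, the terminating signal number if the
 * child was killed by a fault, or -1 if the setup itself failed.
 */
int try_patch_with_prot(int prot, unsigned char mask)
{
    assert(mask != 0 && "the demo expects a mask that modifies the byte");

    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    unsigned char *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    p[0] = 0x20; /* hypothetical placeholder byte standing in for code */

    if (mprotect(p, (size_t)page, prot) != 0) {
        munmap(p, (size_t)page);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        munmap(p, (size_t)page);
        return -1;
    }
    if (pid == 0) {
        /* Child: make sure a fault kills us instead of being handled. */
        signal(SIGSEGV, SIG_DFL);
        signal(SIGBUS, SIG_DFL);
        volatile unsigned char *vp = p;
        vp[0] |= mask; /* faults when the page lacks PROT_WRITE */
        _exit(0);
    }

    int status = 0;
    pid_t waited = waitpid(pid, &status, 0);
    munmap(p, (size_t)page);
    if (waited < 0)
        return -1;
    if (WIFSIGNALED(status))
        return WTERMSIG(status);
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

Calling `try_patch_with_prot(PROT_READ | PROT_WRITE, 0x10)` should return 0, while `try_patch_with_prot(PROT_READ, 0x10)` should return a fault signal (SIGSEGV on most systems, SIGBUS on some). That second case is the userland counterpart of what the kernel runs into here.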

I'll talk with Oliver Pinter, my cofounder with HardenedBSD, who wrote our W^X implementation. I may also bring in some FreeBSD developers, the ones who work on the UEFI loader.

I believe there's a discrepancy in how the BIOS loader and the UEFI loader load the kernel into memory. The BIOS loader may very well set the writable permission for the kernel whereas the UEFI loader may not.

Here's how I reached this conclusion (links valid as of 07 Mar 2019 11:33 EST):

The failure happens here: https://github.com/HardenedBSD/hardenedBSD/blob/hardened/11-stable/master/sys/amd64/amd64/fpu.c#L241

The `ctx_switch_xsave` symbol is a function, defined here:

https://github.com/HardenedBSD/hardenedBSD/blob/hardened/11-stable/master/sys/amd64/amd64/cpu_switch.S#L141

So the kernel is trying to modify the byte at index 3 (the fourth byte) of the `ctx_switch_xsave` function, doing a bitwise OR with the value 0x10 (16 decimal).
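
As a rough sketch of what that kind of patch amounts to, here's a small C helper. The name `patch_byte_or` is hypothetical, and any bytes and offset you pass in are assumptions for illustration, not values copied from the kernel binary. It only shows the mechanics of OR-ing a mask into one byte of a code buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * OR `mask` into the byte at `offset` within the buffer `code`.
 * Returns 0 on success, or -1 if `offset` is outside the buffer.
 * This is an in-place read-modify-write, so the memory backing
 * `code` must be writable for it to succeed.
 */
int patch_byte_or(uint8_t *code, size_t len, size_t offset, uint8_t mask)
{
    assert(code != NULL);
    if (offset >= len)
        return -1;
    code[offset] |= mask;
    return 0;
}
```

For example, with a hypothetical four-byte buffer `{0x41, 0x0f, 0xae, 0x20}`, calling `patch_byte_or(buf, 4, 3, 0x10)` turns the last byte into 0x30 and leaves the other three untouched. The write itself is trivial, which is why the interesting part is whether the page holding those bytes allows writes at all.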

The kernel panic states that the memory cannot be modified, which means that the write permission is disabled for that location in memory.

I hope this makes sense. If it doesn't, please let me know. I'll try to explain differently.
#53
I've uploaded the custom, debug build of HardenedBSD 11-STABLE/amd64 here: https://hardenedbsd.org/~shawn/opnsense/2019-03-07_hbsd_11-stable_disc1.iso
#54
19.1 Legacy Series / Re: Kernel panic after upgrade
March 07, 2019, 04:09:04 PM
Quote from: peter008 on March 07, 2019, 09:19:40 AM
franco complained about not being paid enough for his work, the admin wants an Intel NUC from the community for 550 € just to test Hyper-V.

FYI: it takes resources to debug issues. No resources means no debugging. My employer is awesome and lent me a laptop on which I can do the necessary debugging. That's what happens when one looks for potential solutions rather than griping with feelings of entitlement. ;P

If you have a better suggestion, rather than a gripe, I'm all ears.
#55
19.1 Legacy Series / Re: Kernel panic after upgrade
March 07, 2019, 02:38:26 AM
Quote from: bitwolf on March 07, 2019, 12:21:37 AM
I have just done some tests on our lab Dell PowerEdge R340; it's not currently available for me to use, but as long as I don't touch the HDDs, and do it out of hours, I can reboot it as many times as I want, at least for now.

Given the constraint above I focussed on trying to narrow down the conditions for the kernel trap using just a bootable iso.

Franco says the problem is upstream, so instead of booting the OPNsense iso (which for some reason takes half an hour to get to the point of the crash when mounted as virtual ISO via the iDRAC) I used unmodified OS ISOs.

Here are the results so far:

UEFI ENABLED
HardenedBSD-11-STABLE-v1100056.13-amd64-bootonly.iso
doesn't even manage to boot from the iso

UEFI DISABLED
HardenedBSD-11-STABLE-v1100056.13-amd64-bootonly.iso <-
kernel trap 12

FreeBSD-11.1-RELEASE-amd64-bootonly.iso
boots all the way to the installer

HardenedBSD-12-STABLE-v1200058.3-amd64-bootonly.iso
boots all the way to the installer

I'm seeing the same type of results in Hyper-V as well. However, it's with UEFI enabled due to being Generation 2. Generation 1 works fine for me.

It's possible that the issue with the Dell systems is related to the issue with the Hyper-V systems.

Quote from: bitwolf on March 07, 2019, 12:21:37 AM
So, at least in the case of Dell bare metal, it seems that UEFI is not the culprit, as disabling it doesn't stop the kernel traps. It also seems not to be a FreeBSD 11 problem, as the vanilla FBSD iso works. This leaves changes between FreeBSD 11 and HardenedBSD 11 as the most likely cause for the kernel trap, but looking at the repo it seems the classical needle in a haystack. The interesting result from this testing is that HardenedBSD 12 works, so maybe an easier investigation path could be to look at the changes between HBSD 11 and 12 that are not merged from FBSD? Shawn, what do you think?

Another option to collect more data could be to have a 19.1 debug iso (i.e., one with DDB enabled in the kernel) so we can actually collect core dumps for these crashes. I am sure that given enough time many of us, me included, could set up a HBSD dev environment and build the image ourselves, but if this can be a useful investigation avenue it seems better if one of the lead devs could just run the existing build workflow with the kernel option set.

I'm building a custom version of HardenedBSD 11-STABLE/amd64 with DDB/KDB and remote KGDB along with CFLAGS="-g -O0" for "ALL THE THINGS!" I can upload the installation media once they're built.

As far as attempting to see what needs to be backported from 12-STABLE to 11-STABLE, that would entail _A LOT_ of work. More work than I have time for. However, if someone in the community wants to take that on, I'm definitely not going to stop him/her and would love to review patches. ;)

Quote from: bitwolf on March 07, 2019, 12:21:37 AM
I see further up the thread that a number of people complained about the same crashes on ESXi; our own production firewalls run on ESXi 6 but upgraded to 19.1 successfully. I can do some tests in that sense tomorrow, as this seems to imply there might be a simple workaround in the VM settings for the people running ESXi. This could also be a way forward for the people who have kernel traps on overspecced bare metal, at least up to the point the upstream issue is fixed, or OPNsense has moved to HBSD 12 (but that's at least a year away).

OPNsense's move to HardenedBSD 12 is eight months away, assuming Franco does the initial import of the source code soon. :)
#56
I'm seeing the same issue. I've also reproduced on vanilla HardenedBSD 11-STABLE. I'm now building a custom build of HardenedBSD 11-STABLE with extra special debugging stuffs. :)
#57
Quote from: bunchofreeds on March 06, 2019, 10:21:56 PM
NUCs are awesome, but expensive.
Any consideration for second hand, like an HP 8300 Elite with an i7, or an HP Z230 with an i7 or E3-1200, which is a little closer to a server with ECC?
Not as cool obviously, but cheap, reliable and quiet with built in PSU etc.

Still no luck with HardenedBSD? Did you get to work with your colleague using Hyper-V?

HardenedBSD's definitely open to hardware donations of any kind, especially those that fit our needs. :)

My employer has lent me a laptop on which I installed an evaluation version of Windows Server 2019. I've got Hyper-V on it. As of a few hours ago, I was able to reproduce the issue. This is gonna be a tough bug to figure out, especially since the kernel debugger freezes and doesn't accept input. But, I'll do my best to figure this out and hopefully provide a patch.
#58
Quote from: peter008 on March 06, 2019, 09:13:04 PM
So you need a 550 €-NUC with an i5 for testing? Really?

All Core i* CPUs support SLAT (Second Level Address Translation), which Hyper-V requires. The Intel NUC uses Core i* processors.
#59
Quote from: RGijsen on March 06, 2019, 10:17:34 AM
Well, as stated by multiple people, any Windows 10 Pro machine will do, as you can just enable Hyper-V, and installing a Gen2 VM with OPNsense shows the same issue. While I expect the issue for Gen2 VMs to be the same as with UEFI-enabled bare-metal hardware, I of course can't be sure of that. Having said that, it's extremely easy to create an environment that manifests the issue.

The problem is that all my systems run HardenedBSD. ;P

My employer has lent me a laptop on which I can install the 180-day Windows Server 2019 eval. I'll be working on that today. I hope to have some results to report back within the next week or two.

For full Hyper-V debugging support, I'd need a permanent system. I've set up an Amazon Wishlist for HardenedBSD: https://smile.amazon.com/registry/wishlist/2AKXCIOXYO28N/ref=cm_sw_r_cp_ep_ws_MuDECbATM7CVZ
#60
19.1 Legacy Series / Re: Kernel panic after upgrade
March 06, 2019, 07:48:23 PM
Never fear, for lattera is here!

I'm at least looking into the Hyper-V regression(s). I, too, am doing this in my spare time, but it's worth it. :)