17.1.r1 Kernel Panic

Started by Dean E. Weimer, January 22, 2017, 02:40:37 PM

I went ahead and upgraded to the 17.1 production release after the reboot. So far it has survived past the time it normally crashed, just after 7. I am waiting until 8 local time to restart my failed backups from overnight. Bacula doesn't work so well when it loses access to the database due to DNS resolution failures.

I have the same hardware you do, with a similar config, but haven't updated it yet to the 17.x releases.

Definitely hoping the production software works better for you, or I'll be using 16.x for a long while yet.

I should probably also note that I switched over to the LibreSSL option after the update to production, since I was running that in the 16.7 branch prior to the upgrade. I did, however, switch to OpenSSL and reinstall the necessary packages prior to the upgrade to 17.1.rc1.

I re-enabled the os-smart plugin as well, since disabling it didn't help. I left Suricata disabled for now.

So far so good with the production release; it has made it over 24 hours.

# uptime
7:46AM  up 1 day,  1:37, 1 users, load averages: 0.38, 0.39, 0.34


Re-enabling Suricata with the same settings I had before; hopefully it continues to run another day.

Crashed sometime between noon and 1pm today. Removed the os-smart plugin and disabled Suricata again.

Tracing pid 54070 tid 100149 td 0xfffff8003abe4a00
turnstile_broadcast() at turnstile_broadcast+0x9c/frame 0xfffffe0119d81460
__rw_wunlock_hard() at __rw_wunlock_hard+0x8f/frame 0xfffffe0119d81490
vm_map_delete() at vm_map_delete+0x3dc/frame 0xfffffe0119d81510
vm_map_remove() at vm_map_remove+0x47/frame 0xfffffe0119d81540
exec_new_vmspace() at exec_new_vmspace+0x225/frame 0xfffffe0119d815d0
exec_elf64_imgact() at exec_elf64_imgact+0xa50/frame 0xfffffe0119d816e0
kern_execve() at kern_execve+0x7f9/frame 0xfffffe0119d81a50
sys_execve() at sys_execve+0x4c/frame 0xfffffe0119d81ad0
amd64_syscall() at amd64_syscall+0x4ce/frame 0xfffffe0119d81bf0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe0119d81bf0
--- syscall (59, FreeBSD ELF64, sys_execve), rip = 0x58f49f5cdfa, rsp = 0x6188be6ae848, rbp = 0x6188be6ae990 ---

I am starting to wonder if it's more of a coincidence that it was at first happening around the same time of day. It crashed again this afternoon.

Tracing pid 24954 tid 100149 td 0xfffff80123e46500
turnstile_broadcast() at turnstile_broadcast+0x9c/frame 0xfffffe0119d81460
__rw_wunlock_hard() at __rw_wunlock_hard+0x8f/frame 0xfffffe0119d81490
vm_map_delete() at vm_map_delete+0x3dc/frame 0xfffffe0119d81510
vm_map_remove() at vm_map_remove+0x47/frame 0xfffffe0119d81540
exec_new_vmspace() at exec_new_vmspace+0x225/frame 0xfffffe0119d815d0
exec_elf64_imgact() at exec_elf64_imgact+0xa50/frame 0xfffffe0119d816e0
kern_execve() at kern_execve+0x7f9/frame 0xfffffe0119d81a50
sys_execve() at sys_execve+0x4c/frame 0xfffffe0119d81ad0
amd64_syscall() at amd64_syscall+0x4ce/frame 0xfffffe0119d81bf0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe0119d81bf0
--- syscall (59, FreeBSD ELF64, sys_execve), rip = 0x4701264adfa, rsp = 0x63d138394508, rbp = 0x63d138394650 ---
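For anyone wanting to check whether two panics like these follow the same call path, here is a rough sketch that strips each pasted trace down to its function+offset column and compares the results. The helper names normalize_trace and same_trace and the file names panic1.txt and panic2.txt are made up for illustration; it assumes each trace was saved to its own text file in the format shown above.

```shell
#!/bin/sh
# Reduce a pasted backtrace to its "function+offset" column so two panics
# can be compared without the per-crash pid, td, and register values.
normalize_trace() {
    # Print only frame lines shaped like "name() at name+0xNN/frame 0x...".
    sed -n 's|^.*() at \([A-Za-z0-9_]*+0x[0-9a-f]*\)/frame.*$|\1|p'
}

# same_trace FILE1 FILE2: succeeds when both files yield the same frame list.
same_trace() {
    [ "$(normalize_trace < "$1")" = "$(normalize_trace < "$2")" ]
}

# Usage (panic1.txt and panic2.txt are hypothetical file names):
#   same_trace panic1.txt panic2.txt && echo "identical call path"
```

Only the frame lines are compared, so the differing pid on the "Tracing" line and the rip/rsp/rbp values on the syscall line do not affect the result. Two files with no frame lines at all would also compare as equal, so it is worth eyeballing the normalize_trace output once.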

This is starting to be suspiciously precise and matches this one:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=213903

Sadly, no progress there since October. If anyone has a clue that would be great.

Quote from: franco on February 02, 2017, 01:34:56 PM
This is starting to be suspiciously precise and matches this one:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=213903

Sadly, no progress there since October. If anyone has a clue that would be great.

At first, I suspected ASLR, but I think you hit the nail right on the head. I'm glad to see that no one has had a single issue with HardenedBSD's ASLR. Shows how robust our implementation is. ;)

I'll ping upstream FreeBSD about it and see if there are any updates regarding that issue.

Hi Dean,

A FreeBSD developer provided a test kernel:

# opnsense-update -kr 17.1.1-rwdebug
# /usr/local/etc/rc.reboot

It still panics, but will print vital debug information when doing so.

Running it and providing the output will hopefully help resolve this.


Cheers,
Franco


Dean,

A patch from FreeBSD was reverted in 17.1.2. If you can, report back even on positive results.


Thanks,
Franco