OPNsense Forum

Archive => 17.1 Legacy Series => Topic started by: Dean E. Weimer on January 22, 2017, 02:40:37 pm

Title: 17.1.r1 Kernel Panic
Post by: Dean E. Weimer on January 22, 2017, 02:40:37 pm
I updated my system on Friday from 16.7.13 to 17.1.r1. So far, on both Saturday and Sunday, it has had a kernel panic just after 7:00 am local time. I was able to capture the console output from today's crash.

Code:
Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 04
fault virtual address   = 0x30
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80bfcd0c
stack pointer           = 0x28:0xfffffe0119c09430
frame pointer           = 0x28:0xfffffe0119c09460
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = resume, IOPL = 0
current process         = 87062 (sh)
[ thread pid 87062 tid 100087 ]
Stopped at      turnstile_broadcast+0x9c:       movq    0x20(%rbx,%rax,1),%rcx
db>

I looked through the crontab -l output and the /etc/crontab file, and I can't see anything set to run at about that time. But the fact that it crashed at apparently the same time both days makes me really suspicious. Of course, since the process was sh, it could be any shell script. I haven't been able to find a log message on the system indicating what happened.
Unfortunately, later today I am going to have to roll it back until next weekend. This is installed at my house for my home network and mail server. On weekdays I am in the office by 7, so if the system crashes just afterwards, my mail server will be offline all day.
Does anyone have any ideas on what I could do today to narrow down the cause?
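One way to double-check a manual read of the crontab output is to filter for entries whose hour field can fire at 7. The sketch below is a minimal, hypothetical helper (`field_matches`, `entries_in_hour` and the sample lines are illustrative, not taken from this system). It assumes standard cron syntax, looks only at the hour field, and passes `@`-style shortcuts (such as `@daily` or `@reboot`) through for manual review instead of interpreting them.

```python
def field_matches(field: str, value: int, lo: int = 0, hi: int = 23) -> bool:
    """Return True if a cron time field ('*', '7', '6-8', '*/2', '1,7') covers value."""
    for part in field.split(","):
        step = 1
        if "/" in part:
            part, step_text = part.split("/", 1)
            step = int(step_text)
        if part == "*":
            start, end = lo, hi
        elif "-" in part:
            first, last = part.split("-", 1)
            start, end = int(first), int(last)
        else:
            start = end = int(part)
        if start <= value <= end and (value - start) % step == 0:
            return True
    return False


def entries_in_hour(crontab_text: str, hour: int) -> list[str]:
    """Return crontab lines whose hour field could fire during `hour` (0-23).

    Lines starting with '@' are returned unchanged for manual review.
    """
    hits = []
    for raw in crontab_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        if line.startswith("@"):
            hits.append(line)  # @daily, @reboot, ...: not interpreted here
            continue
        fields = line.split()
        if len(fields) < 6 or "=" in fields[0]:
            continue  # environment assignment (SHELL=...) or malformed line
        if field_matches(fields[1], hour):  # fields[1] is the hour field
            hits.append(line)
    return hits


# Illustrative /etc/crontab-style input (minute hour mday month wday who command).
SAMPLE = """\
SHELL=/bin/sh
# minute hour mday month wday who command
*/5  *  *  *  *  root  /usr/libexec/atrun
1    3  *  *  *  root  periodic daily
0    7  *  *  1-5 root /usr/local/bin/backup.sh
"""

if __name__ == "__main__":
    for entry in entries_in_hour(SAMPLE, 7):
        print(entry)
```

Note that a `*` hour field matches every hour, so frequent jobs like the every-five-minutes line always show up. Since the panicking process was `sh`, those frequent jobs may be worth a look as well, not just the ones pinned to 7.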
Title: Re: 17.1.r1 Kernel Panic
Post by: weust on January 22, 2017, 04:22:01 pm
I think some information on the hardware you use would be helpful here.
Or, if it's a virtual machine, which hypervisor.

Title: Re: 17.1.r1 Kernel Panic
Post by: franco on January 22, 2017, 04:59:35 pm
This is what I could find. https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=214344

Can you post the output of "bt" as well?
Title: Re: 17.1.r1 Kernel Panic
Post by: Dean E. Weimer on January 22, 2017, 05:25:09 pm
Hardware is a Jetway Intel Celeron N2930 Quad Core Dual Intel LAN Fanless, model HBJC311U93W-2930-B (https://www.amazon.com/gp/product/B00OY8Q0QC/ref=oh_aui_search_detailpage?ie=UTF8&psc=1), with 4 GB of RAM and a 30 GB mSATA SSD.

The only plugin installed is os-smart; here is the model information from SMART for the mSATA drive:
Model Family:     Intel 525 Series SSDs
Device Model:     INTEL SSDMCEAC030B3

If it crashes again today, I will get the output of bt in ddb. I wasn't familiar with it and had no way to look up the information online until I rebooted. I will have to roll it back to the 16.7 release before the end of the day, though, as I won't be able to work around it to test during the week.
Title: Re: 17.1.r1 Kernel Panic
Post by: Dean E. Weimer on January 29, 2017, 02:22:58 pm
I reinstalled on Friday. It made it through Saturday morning OK, but crashed again on Sunday.

Code:
Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 04
fault virtual address   = 0x30
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80bfcd0c
stack pointer           = 0x28:0xfffffe0119d5e430
frame pointer           = 0x28:0xfffffe0119d5e460
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = resume, IOPL = 0
current process         = 56207 (sh)
[ thread pid 56207 tid 100142 ]
Stopped at      turnstile_broadcast+0x9c:       movq    0x20(%rbx,%rax,1),%rcx
db> where
Tracing pid 56207 tid 100142 td 0xfffff8003b47aa00
turnstile_broadcast() at turnstile_broadcast+0x9c/frame 0xfffffe0119d5e460
__rw_wunlock_hard() at __rw_wunlock_hard+0x8f/frame 0xfffffe0119d5e490
vm_map_delete() at vm_map_delete+0x3dc/frame 0xfffffe0119d5e510
vm_map_remove() at vm_map_remove+0x47/frame 0xfffffe0119d5e540
exec_new_vmspace() at exec_new_vmspace+0x225/frame 0xfffffe0119d5e5d0
exec_elf64_imgact() at exec_elf64_imgact+0xa50/frame 0xfffffe0119d5e6e0
kern_execve() at kern_execve+0x7f9/frame 0xfffffe0119d5ea50
sys_execve() at sys_execve+0x4c/frame 0xfffffe0119d5ead0
amd64_syscall() at amd64_syscall+0x4ce/frame 0xfffffe0119d5ebf0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe0119d5ebf0
--- syscall (59, FreeBSD ELF64, sys_execve), rip = 0x2eb86e92dfa, rsp = 0x617648f9ee48, rbp = 0x617648f9ef90 ---
db> where /u
Tracing pid 56207 tid 100142 td 0xfffff8003b47aa00
turnstile_broadcast() at turnstile_broadcast+0x9c/frame 0xfffffe0119d5e460
__rw_wunlock_hard() at __rw_wunlock_hard+0x8f/frame 0xfffffe0119d5e490
vm_map_delete() at vm_map_delete+0x3dc/frame 0xfffffe0119d5e510
vm_map_remove() at vm_map_remove+0x47/frame 0xfffffe0119d5e540
exec_new_vmspace() at exec_new_vmspace+0x225/frame 0xfffffe0119d5e5d0
exec_elf64_imgact() at exec_elf64_imgact+0xa50/frame 0xfffffe0119d5e6e0
kern_execve() at kern_execve+0x7f9/frame 0xfffffe0119d5ea50
sys_execve() at sys_execve+0x4c/frame 0xfffffe0119d5ead0
amd64_syscall() at amd64_syscall+0x4ce/frame 0xfffffe0119d5ebf0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe0119d5ebf0
--- syscall (59, FreeBSD ELF64, sys_execve), rip = 0x2eb86e92dfa, rsp = 0x617648f9ee48, rbp = 0x617648f9ef90 ---
Title: Re: 17.1.r1 Kernel Panic
Post by: franco on January 29, 2017, 04:48:55 pm
Can you try this without os-smart?


Thanks,
Franco
Title: Re: 17.1.r1 Kernel Panic
Post by: Dean E. Weimer on January 29, 2017, 05:08:02 pm
I went ahead and removed the os-smart plugin. It didn't actually delete the smartmontools package, so I removed that from the command line with the pkg remove command.

I have managed to set up a back door into my network: SSH on a second interface to a FreeBSD server with a locally installed IP Filter firewall, routing only one external IP (from one of our proxy servers at work) through that interface. That way I can test on weekdays as well, and if it crashes after I leave, I can SSH to that host and access the serial console to reboot it.
Title: Re: 17.1.r1 Kernel Panic
Post by: franco on January 30, 2017, 08:48:27 am
Ok, thank you.
Title: Re: 17.1.r1 Kernel Panic
Post by: Dean E. Weimer on January 30, 2017, 12:19:46 pm
This time it crashed much earlier; it appears to have been at about 3 am.

Code:
Tracing pid 67294 tid 100272 td 0xfffff800a1756000
turnstile_broadcast() at turnstile_broadcast+0x9c/frame 0xfffffe0119e8b460
__rw_wunlock_hard() at __rw_wunlock_hard+0x8f/frame 0xfffffe0119e8b490
vm_map_delete() at vm_map_delete+0x3dc/frame 0xfffffe0119e8b510
vm_map_remove() at vm_map_remove+0x47/frame 0xfffffe0119e8b540
exec_new_vmspace() at exec_new_vmspace+0x225/frame 0xfffffe0119e8b5d0
exec_elf64_imgact() at exec_elf64_imgact+0xa50/frame 0xfffffe0119e8b6e0
kern_execve() at kern_execve+0x7f9/frame 0xfffffe0119e8ba50
sys_execve() at sys_execve+0x4c/frame 0xfffffe0119e8bad0
amd64_syscall() at amd64_syscall+0x4ce/frame 0xfffffe0119e8bbf0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe0119e8bbf0
--- syscall (59, FreeBSD ELF64, sys_execve), rip = 0x57b76a57dfa, rsp = 0x7cea88a94bf8, rbp = 0x7cea88a94d40 ---
Title: Re: 17.1.r1 Kernel Panic
Post by: lattera on January 30, 2017, 03:58:40 pm
Are you running 32-bit or 64-bit?
Title: Re: 17.1.r1 Kernel Panic
Post by: Dean E. Weimer on January 30, 2017, 05:21:39 pm
64-bit. It did crash again just after 7; unfortunately, my back door failed. I used my wireless adapter to create the second network connection, and I forgot that my Ubiquiti UniFi APs go into isolation mode and shut down wireless when they lose access to their default gateway. That in turn shut down the new SSID I set up, which was connected to my Internet VLAN. I will swing by my house during my lunch hour and grab the output from ddb.
Title: Re: 17.1.r1 Kernel Panic
Post by: Dean E. Weimer on January 30, 2017, 07:28:11 pm
In case it's helpful, here's the uname -a output:

# uname -a
FreeBSD opnsense.dweimer.local 11.0-RELEASE-p7 FreeBSD 11.0-RELEASE-p7 #0 175886459(master): Mon Jan 16 02:00:58 CET 2017     root@sensey64:/usr/obj/usr/src/sys/SMP  amd64

Code:
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x30
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80bfcd0c
stack pointer           = 0x28:0xfffffe0119dc2430
frame pointer           = 0x28:0xfffffe0119dc2460
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = resume, IOPL = 0
current process         = 62022 (sh)
[ thread pid 62022 tid 100164 ]
Stopped at      turnstile_broadcast+0x9c:       movq    0x20(%rbx,%rax,1),%rcx
db> where
Tracing pid 62022 tid 100164 td 0xfffff8003ef07000
turnstile_broadcast() at turnstile_broadcast+0x9c/frame 0xfffffe0119dc2460
__rw_wunlock_hard() at __rw_wunlock_hard+0x8f/frame 0xfffffe0119dc2490
vm_map_delete() at vm_map_delete+0x3dc/frame 0xfffffe0119dc2510
vm_map_remove() at vm_map_remove+0x47/frame 0xfffffe0119dc2540
exec_new_vmspace() at exec_new_vmspace+0x225/frame 0xfffffe0119dc25d0
exec_elf64_imgact() at exec_elf64_imgact+0xa50/frame 0xfffffe0119dc26e0
kern_execve() at kern_execve+0x7f9/frame 0xfffffe0119dc2a50
sys_execve() at sys_execve+0x4c/frame 0xfffffe0119dc2ad0
amd64_syscall() at amd64_syscall+0x4ce/frame 0xfffffe0119dc2bf0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe0119dc2bf0
--- syscall (59, FreeBSD ELF64, sys_execve), rip = 0x4c20ce58dfa, rsp = 0x7520364f0aa8, rbp = 0x7520364f0bf0 ---

Title: Re: 17.1.r1 Kernel Panic
Post by: lattera on January 30, 2017, 08:40:19 pm
Do you have Suricata enabled in IPS mode?
Title: Re: 17.1.r1 Kernel Panic
Post by: Dean E. Weimer on January 30, 2017, 09:57:25 pm
It was enabled, but not in IPS mode. I went ahead and disabled it; the attachment shows the settings. The only change made when disabling it was unchecking the Enabled option.
Title: Re: 17.1.r1 Kernel Panic
Post by: Dean E. Weimer on January 31, 2017, 01:04:22 pm
No change; it crashed just after midnight.

Code:
Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 06
fault virtual address   = 0x30
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80bfcd0c
stack pointer           = 0x28:0xfffffe0119c77430
frame pointer           = 0x28:0xfffffe0119c77460
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = resume, IOPL = 0
current process         = 70014 (sh)
[ thread pid 70014 tid 100087 ]
Stopped at      turnstile_broadcast+0x9c:       movq    0x20(%rbx,%rax,1),%rcx
db> where
Tracing pid 70014 tid 100087 td 0xfffff80004ca0000
turnstile_broadcast() at turnstile_broadcast+0x9c/frame 0xfffffe0119c77460
__rw_wunlock_hard() at __rw_wunlock_hard+0x8f/frame 0xfffffe0119c77490
vm_map_delete() at vm_map_delete+0x3dc/frame 0xfffffe0119c77510
vm_map_remove() at vm_map_remove+0x47/frame 0xfffffe0119c77540
exec_new_vmspace() at exec_new_vmspace+0x225/frame 0xfffffe0119c775d0
exec_elf64_imgact() at exec_elf64_imgact+0xa50/frame 0xfffffe0119c776e0
kern_execve() at kern_execve+0x7f9/frame 0xfffffe0119c77a50
sys_execve() at sys_execve+0x4c/frame 0xfffffe0119c77ad0
amd64_syscall() at amd64_syscall+0x4ce/frame 0xfffffe0119c77bf0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe0119c77bf0
--- syscall (59, FreeBSD ELF64, sys_execve), rip = 0x6025cfe3dfa, rsp = 0x6676bfa9b668, rbp = 0x6676bfa9b7b0 ---
Title: Re: 17.1.r1 Kernel Panic
Post by: Dean E. Weimer on January 31, 2017, 02:50:52 pm
I went ahead and upgraded to the 17.1 production release after the reboot. So far it has made it past the just-after-7 time when it normally crashed. I am waiting until 8 local time to restart my failed backups from overnight; Bacula doesn't work so well when it loses access to the database due to DNS resolution failures.
Title: Re: 17.1.r1 Kernel Panic
Post by: silent_mastodon on January 31, 2017, 03:05:37 pm
I have the same hardware you do, with a similar config, but haven't updated it yet to the 17.x releases.

Definitely hoping the production software works better for you, or I'll be using 16.x for a long while yet.
Title: Re: 17.1.r1 Kernel Panic
Post by: Dean E. Weimer on January 31, 2017, 03:13:47 pm
I should probably also note that I switched over to the LibreSSL option after the update to production, as I was running that on the 16.7 branch prior to the upgrade. I did, however, switch to OpenSSL and reinstall the necessary packages before the upgrade to 17.1.rc1.

I re-enabled the os-smart plugin as well, since removing it didn't help. I left Suricata disabled for now.
Title: Re: 17.1.r1 Kernel Panic
Post by: Dean E. Weimer on February 01, 2017, 02:48:05 pm
So far so good with the production release; it has made it over 24 hours.

Code:
# uptime
 7:46AM  up 1 day,  1:37, 1 users, load averages: 0.38, 0.39, 0.34

I'm re-enabling Suricata with the same settings I had before; hopefully it continues to run for another day.
Title: Re: 17.1.r1 Kernel Panic
Post by: Dean E. Weimer on February 01, 2017, 09:30:49 pm
It crashed sometime between noon and 1 pm today. I removed the os-smart plugin and disabled Suricata again.

Code:
Tracing pid 54070 tid 100149 td 0xfffff8003abe4a00
turnstile_broadcast() at turnstile_broadcast+0x9c/frame 0xfffffe0119d81460
__rw_wunlock_hard() at __rw_wunlock_hard+0x8f/frame 0xfffffe0119d81490
vm_map_delete() at vm_map_delete+0x3dc/frame 0xfffffe0119d81510
vm_map_remove() at vm_map_remove+0x47/frame 0xfffffe0119d81540
exec_new_vmspace() at exec_new_vmspace+0x225/frame 0xfffffe0119d815d0
exec_elf64_imgact() at exec_elf64_imgact+0xa50/frame 0xfffffe0119d816e0
kern_execve() at kern_execve+0x7f9/frame 0xfffffe0119d81a50
sys_execve() at sys_execve+0x4c/frame 0xfffffe0119d81ad0
amd64_syscall() at amd64_syscall+0x4ce/frame 0xfffffe0119d81bf0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe0119d81bf0
--- syscall (59, FreeBSD ELF64, sys_execve), rip = 0x58f49f5cdfa, rsp = 0x6188be6ae848, rbp = 0x6188be6ae990 ---
Title: Re: 17.1.r1 Kernel Panic
Post by: Dean E. Weimer on February 01, 2017, 11:27:35 pm
I am starting to wonder if it was more of a coincidence that it was at first happening around the same time of day. It crashed again this afternoon.

Code:
Tracing pid 24954 tid 100149 td 0xfffff80123e46500
turnstile_broadcast() at turnstile_broadcast+0x9c/frame 0xfffffe0119d81460
__rw_wunlock_hard() at __rw_wunlock_hard+0x8f/frame 0xfffffe0119d81490
vm_map_delete() at vm_map_delete+0x3dc/frame 0xfffffe0119d81510
vm_map_remove() at vm_map_remove+0x47/frame 0xfffffe0119d81540
exec_new_vmspace() at exec_new_vmspace+0x225/frame 0xfffffe0119d815d0
exec_elf64_imgact() at exec_elf64_imgact+0xa50/frame 0xfffffe0119d816e0
kern_execve() at kern_execve+0x7f9/frame 0xfffffe0119d81a50
sys_execve() at sys_execve+0x4c/frame 0xfffffe0119d81ad0
amd64_syscall() at amd64_syscall+0x4ce/frame 0xfffffe0119d81bf0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe0119d81bf0
--- syscall (59, FreeBSD ELF64, sys_execve), rip = 0x4701264adfa, rsp = 0x63d138394508, rbp = 0x63d138394650 ---
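A rough tally of the crash times stated in this thread helps weigh the coincidence question. This is a small sketch under loose assumptions: each time is rounded to the hour as worded in the posts ("just after 7" counts as 7, "between noon and 1pm" as 12), and crashes reported without a time (the Sunday crash on the 29th, this afternoon's crash) are left out. `REPORTED_CRASH_HOURS` and `crashes_by_hour` are illustrative names.

```python
from collections import Counter

# Hour of day (local, 24 h clock) for each crash whose time was stated in this thread.
# Rough readings only; crashes reported without a time are omitted.
REPORTED_CRASH_HOURS = [
    7,   # first Saturday, just after 7:00
    7,   # first Sunday, just after 7:00
    3,   # "about 3am"
    7,   # "crash again just after 7"
    0,   # "just after midnight"
    12,  # "between noon and 1pm"
]


def crashes_by_hour(hours):
    """Count crashes per hour of day."""
    return Counter(hours)


if __name__ == "__main__":
    counts = crashes_by_hour(REPORTED_CRASH_HOURS)
    for hour in sorted(counts):
        # One '#' per crash in that hour, e.g. "07:00  ###".
        print(f"{hour:02d}:00  {'#' * counts[hour]}")
```

With these readings, half of the timed crashes fall near 7 and the rest are spread across the day, which fits the suspicion that the early pattern may have been partly coincidence. Six data points cannot settle it either way.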
Title: Re: 17.1.r1 Kernel Panic
Post by: franco on February 02, 2017, 01:34:56 pm
This is starting to be suspiciously precise and matches this one:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=213903

Sadly, no progress there since October. If anyone has a clue that would be great.
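Stripping the per-crash values (pid, tid, frame addresses) from the backtraces posted above makes the match easy to check. This is a minimal sketch with hypothetical names (`trace_signature`, `TRACE_A`, `TRACE_B`). It assumes ddb's `name() at name+offset/frame addr` line format as it appears in this thread and keeps only function names and offsets; identical offsets are only meaningful when comparing traces from the same kernel build.

```python
import re

# One ddb frame line, e.g.
# "turnstile_broadcast() at turnstile_broadcast+0x9c/frame 0xfffffe0119d5e460"
FRAME_RE = re.compile(r"^(\w+)\(\) at \1\+(0x[0-9a-f]+)/frame 0x[0-9a-f]+$")


def trace_signature(ddb_output: str) -> tuple:
    """Reduce a ddb backtrace to (function, offset) pairs, dropping per-crash addresses."""
    sig = []
    for line in ddb_output.splitlines():
        m = FRAME_RE.match(line.strip())
        if m:
            sig.append((m.group(1), m.group(2)))
    return tuple(sig)


# Frames from the January 29 panic.
TRACE_A = """\
turnstile_broadcast() at turnstile_broadcast+0x9c/frame 0xfffffe0119d5e460
__rw_wunlock_hard() at __rw_wunlock_hard+0x8f/frame 0xfffffe0119d5e490
vm_map_delete() at vm_map_delete+0x3dc/frame 0xfffffe0119d5e510
vm_map_remove() at vm_map_remove+0x47/frame 0xfffffe0119d5e540
exec_new_vmspace() at exec_new_vmspace+0x225/frame 0xfffffe0119d5e5d0
exec_elf64_imgact() at exec_elf64_imgact+0xa50/frame 0xfffffe0119d5e6e0
kern_execve() at kern_execve+0x7f9/frame 0xfffffe0119d5ea50
sys_execve() at sys_execve+0x4c/frame 0xfffffe0119d5ead0
amd64_syscall() at amd64_syscall+0x4ce/frame 0xfffffe0119d5ebf0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe0119d5ebf0
"""

# Frames from the ~3 am panic on January 30.
TRACE_B = """\
turnstile_broadcast() at turnstile_broadcast+0x9c/frame 0xfffffe0119e8b460
__rw_wunlock_hard() at __rw_wunlock_hard+0x8f/frame 0xfffffe0119e8b490
vm_map_delete() at vm_map_delete+0x3dc/frame 0xfffffe0119e8b510
vm_map_remove() at vm_map_remove+0x47/frame 0xfffffe0119e8b540
exec_new_vmspace() at exec_new_vmspace+0x225/frame 0xfffffe0119e8b5d0
exec_elf64_imgact() at exec_elf64_imgact+0xa50/frame 0xfffffe0119e8b6e0
kern_execve() at kern_execve+0x7f9/frame 0xfffffe0119e8ba50
sys_execve() at sys_execve+0x4c/frame 0xfffffe0119e8bad0
amd64_syscall() at amd64_syscall+0x4ce/frame 0xfffffe0119e8bbf0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe0119e8bbf0
"""

if __name__ == "__main__":
    print(TRACE_A == TRACE_B)                                    # raw text differs
    print(trace_signature(TRACE_A) == trace_signature(TRACE_B))  # same code path
```

The same normalized signature can then be compared against the trace attached to a bug report to see whether the crashes share a code path.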
Title: Re: 17.1.r1 Kernel Panic
Post by: lattera on February 02, 2017, 04:11:15 pm
Quote from: franco on February 02, 2017, 01:34:56 pm
This is starting to be suspiciously precise and matches this one:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=213903

Sadly, no progress there since October. If anyone has a clue that would be great.

At first I suspected ASLR, but I think you hit the nail right on the head. I'm glad to see that no one has had a single issue with HardenedBSD's ASLR; it shows how robust our implementation is. ;)

I'll ping upstream FreeBSD about it and see if there are any updates regarding that issue.
Title: Re: 17.1.r1 Kernel Panic
Post by: franco on February 17, 2017, 08:23:33 am
Hi Dean,

A FreeBSD developer provided a test kernel:

# opnsense-update -kr 17.1.1-rwdebug
# /usr/local/etc/rc.reboot

It still panics, but will print vital debug information when doing so.

Running it and providing the output will hopefully help resolve this.


Cheers,
Franco
Title: Re: 17.1.r1 Kernel Panic
Post by: franco on February 17, 2017, 08:25:42 am
PS: Please direct further info to the main thread: https://forum.opnsense.org/index.php?topic=4414.0
Title: Re: 17.1.r1 Kernel Panic
Post by: franco on February 22, 2017, 12:18:10 pm
Dean,

A patch from FreeBSD was reverted in 17.1.2. If you can, report back even on positive results.


Thanks,
Franco