[SOLVED] Kernel Panic after creating Carp VIP

Started by Edge, March 23, 2015, 07:56:54 AM

March 23, 2015, 07:56:54 AM Last Edit: April 03, 2015, 06:01:45 PM by franco
Hello,
Yesterday I was curious whether OPNsense is ready for my work environment, so I set up two Sun Blades and installed OPNsense on them.
I configured my firewall, some IPsec tunnels and a few other small things. Then I wanted to create an HA environment so that I can reboot or modify one firewall when needed.
But after I created the first CARP virtual interface and gave it an IP, my OPNsense box suddenly wasn't pingable any more. So I had a look at the console via IPMI and there it was: a kernel panic. When I reboot the server I can work on it again, but only for a few seconds before the system crashes again.
Here is exactly what I did:
I created some VLANs on my main NIC (Intel® Ethernet Converged Network Adapter X540-T1; the driver is the Intel ix driver).
Then I created a CARP VIP on one of these VLANs and voilà, kernel panic.
I wanted to send you the bug report, but that function does not work for me either; I can only click No after a login.
So here is an excerpt of the log:
<6>carp: demoted by -240 to 0 (pfsync bulk fail)
<6>carp: VHID 142@ix1_vlan3820: BACKUP -> MASTER (preempting a slower master)
kernel trap 12 with interrupts disabled


Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x17
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80a33600
stack pointer         = 0x28:0xfffffe085ec043e0
frame pointer         = 0x28:0xfffffe085ec04450
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = resume, IOPL = 0
current process = 12 (irq265: ix1:que 0)
FreeBSD 10.1-RELEASE-p6 #0 5aa5ada(master): Thu Feb 26 16:26:03 CET 2015
    root@sensey64:/usr/obj/usr/src/sys/SMP
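When passing a trap like this along, it can help to boil the console output down to the handful of lines that identify the fault. The following is only a minimal sketch: `panic_summary` and the file name `panic.txt` are hypothetical and not part of OPNsense or FreeBSD, and the field names are simply the ones visible in the excerpt above.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with OPNsense or FreeBSD): print only the
# trap lines that are usually quoted first when reporting a kernel panic.
# $1 is a text file holding the console excerpt (assumed name: panic.txt).
panic_summary() {
    grep -E '^(Fatal trap|fault virtual address|fault code|instruction pointer|current process)' "$1"
}

# Usage (file name is an assumption):
#   panic_summary panic.txt
```

For the excerpt above this would keep the trap type, the faulting address 0x17 (a very low address like this typically suggests a NULL pointer dereference with a small offset, though that is an interpretation, not something the log states) and the current process line, which names the ix1 interrupt thread.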

If you are interested in the full log, I can send it to you via e-mail.
For now: is my NIC incompatible, or can I fix this problem somehow?

Best Regards...
Edge

The official Intel website reports that the Intel® Ethernet Converged Network Adapter X540 has been supported since FreeBSD 9* ...

Well, I know that the card is supported.
The problem is indeed the Intel ix(gbe) driver. I've tuned some variables in /boot/loader.conf.local:
kern.ipc.nmbclusters="1000000"
kern.ipc.nmbjumbop="524288"
But this does not affect my kernel panic.
Even ifconfig ix0 -vlanhwfilter -vlanhwtso -tso, which disables TSO and the VLAN hardware filter/TSO offloads, has no effect.
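Whether those flags were actually cleared can be checked on the `options=` line that `ifconfig` prints for the interface. A minimal sketch, assuming the capability names `TSO4`, `TSO6`, `VLAN_HWTSO` and `VLAN_HWFILTER` as FreeBSD's `ifconfig` usually displays them; the function name `offloads_still_on` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: reads `ifconfig <interface>` output on stdin and prints
# every offload capability from the list below that is still enabled.
# Prints nothing (and returns non-zero) when none of them is listed.
# Assumption: capability names as typically shown by FreeBSD's ifconfig.
offloads_still_on() {
    grep -E '^[[:space:]]*options=' | grep -oE 'TSO4|TSO6|VLAN_HWTSO|VLAN_HWFILTER'
}

# Usage:
#   ifconfig ix0 | offloads_still_on
```

Note that the command above was applied to ix0, while the panic excerpt names ix1, so the same check may be worth running against that interface as well.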
The behaviour is always the same:
As soon as my CARP interface is promoted to master, the kernel panic occurs immediately. On the pfSense forum there are quite a few other people who have the same problem, but I have not found a working solution yet.
Does anyone have experience with the Intel ix(gbe) driver on FreeBSD with CARP and VLANs?

Best Regards

Edge, please send a mail with the full panic to franco@ project website. We've disabled sending from the crash reporter but will put the feature back soon.

I am not sure if it is a problem with the NIC driver or CARP, or maybe a very bad mix of all of them including VLAN. I'd suspect stock FreeBSD has similar issues, as our modifications are few and kernel panics are not a domain we have much say in as a "distribution" of sorts.

Hey franco,

I was curious about this issue, so I tested the exact same hardware with the exact same config on pfSense. There I can activate the CARP master state without a kernel panic.
With plain FreeBSD without any appliance on it, it works too. I've even tested it with OpenBSD and got the same result: everything works like a charm.
I will send you the crash report as soon as I can.

Regards,
Edge

Small update: we are currently trying to pin this down.

Panics are gone, XML RPC was almost completely rewritten in the process. 15.1.9 is going to be interesting.

April 03, 2015, 07:29:18 PM #7 Last Edit: April 04, 2015, 04:59:30 AM by Pulsar
This issue is perfectly reproducible in a VirtualBox machine (4.3.26) with emulated Intel cards and VirtIO net interfaces (AMD untested): defining and applying a CARP configuration for the WAN side causes no problems, but doing the same for the LAN interface afterwards triggers the kernel panic. If you define a CARP address for the LAN side first and then for the WAN side, the result is the same: as soon as the CARP IP configuration is applied on the WAN interface -> kernel panic. All of my NICs are of the same type.

PS: I did not see the previous answer before posting...

April 03, 2015, 09:18:27 PM #8 Last Edit: April 03, 2015, 09:44:12 PM by franco
Pulsar, if you feel like verifying the panics are gone you can update to a test version using:

# opnsense-update -r 15.8.3_pfsync && reboot

All testing is greatly appreciated, thanks for your comments. :)

April 04, 2015, 01:30:43 PM #9 Last Edit: April 04, 2015, 08:37:19 PM by Pulsar
Tried, but the files are not there: https://pkg.opnsense.org/sets (where opnsense-update grabs from) only contains 15.1.x-related stuff :-\ I see a _pfsync kernel (15.1.8) dated April 3rd. I guess this is the update, or should I use another repository location? After updating, the console shows the exact same version number...

UPDATE: I applied opnsense-update -r 15.1.8_pfsync and then upgraded to 15.1.8.3; the kernel crashes are now gone. I do have minor issues, but I will double-check whether they are linked to my own setup.

So opnsense-update works, nice. :D Your issues could be related to configuration issues in core.git. Ad has pushed quite a few changes over the last few days WRT CARP/pfsync/HA.

Is it supposed to be fixed in 15.1.8.4? I did a fresh install, then a direct update to 15.1.8.4, and the kernel panics as soon as the second CARP VIP is applied...

No. If you have previously applied the testing kernel you'll have to do so again, as the system always tries to upgrade to the latest known version. The 15.1.8 kernel/base hasn't changed. 15.1.9 will have the official fix.

My system is in a boot loop. How do I get it to stop and apply an update?

One way today is a clean reinstall with the old version, followed by an upgrade to 15.1.9 before you configure HA. Alternatively, wait for the 15.1.9 install media and use the installer's "import configuration" feature on reinstall before doing the actual installation.