Kernel Panics

Started by tuaris, June 27, 2017, 11:20:07 PM

July 19, 2017, 02:54:40 AM #15 Last Edit: July 19, 2017, 03:09:33 AM by tuaris
I've replaced the net6501-70 with a PC Engines APU2C0.



This looks like it's a slightly more powerful device compared to the net6501.  Temperature is again okay.



After about 6 hours, it kernel panicked.

Can we have another backtrace ("bt" at the prompt) of the panic, just to be sure? It does sound like a programming error, then.

We need to see if it has been previously recorded over at https://bugs.freebsd.org/bugzilla/


Cheers,
Franco


I upgraded the PC Engines APU2C0 to 17.7 and about 1 hour later there was a kernel panic.  Also, the net6501-70 was left powered on but disconnected from the network.  The uptime on the net6501-70 is 18 days.

I think the problem may be related to the WAN interface.

I tried to remove the LAGG configuration but every time it gets to restarting the WAN interface, it kernel panics.  I successfully repeated this 3 times.


*** stargate.morante.com: OPNsense 17.7 (i386/OpenSSL) ***

LAN (igb0_vlan1) -> v4: 192.168.0.100/24
VMWARE (igb0_vlan3) -> v4: 10.8.8.1/24
WAN (igb1_vlan100) -> v4: X.X.X.X/28

FreeBSD/i386 (stargate.morante.com) (ttyu0)

login: root
Password:
Last login: Sat Aug  5 20:29:43 on ttyu0
----------------------------------------------
|      Hello, this is OPNsense 17.7          |         @@@@@@@@@@@@@@@
|                                            |        @@@@         @@@@
| Website:      https://opnsense.org/        |         @@@\\\   ///@@@
| Handbook:     https://docs.opnsense.org/   |       ))))))))   ((((((((
| Forums:       https://forum.opnsense.org/  |         @@@///   \\\@@@
| Lists:        https://lists.opnsense.org/  |        @@@@         @@@@
| Code:         https://github.com/opnsense  |         @@@@@@@@@@@@@@@
----------------------------------------------

  0) Logout                              7) Ping host
  1) Assign interfaces                   8) Shell
  2) Set interface IP address            9) pfTop
  3) Reset the root password            10) Firewall log
  4) Reset to factory defaults          11) Reload all services
  5) Power off system                   12) Upgrade from console
  6) Reboot system                      13) Restore a backup

Enter an option: 1


Valid interfaces are:
igb0             00:0d:b9:46:74:2c Intel(R) PRO/1000 Network Connection, Version - 2.5.3-k
igb1             00:0d:b9:46:74:2d Intel(R) PRO/1000 Network Connection, Version - 2.5.3-k
igb0_vlan3       00:0d:b9:46:74:2c
igb1_vlan100     00:0d:b9:46:74:2d
igb0_vlan1       00:0d:b9:46:74:2c

You now have the opportunity to configure VLANs.  If you don't require VLANs
for initial connectivity, say no here and use the GUI to configure VLANs later.

Do you want to set up VLANs now? [y/N]: n


VLAN interfaces:

igb0_vlan3      VLAN tag 3, parent interface igb0
igb1_vlan100    VLAN tag 100, parent interface igb1
igb0_vlan1      VLAN tag 1, parent interface igb0

If you do not know the names of your interfaces, you may choose to use
auto-detection. In that case, disconnect all interfaces now before
hitting 'a' to initiate auto detection.

Enter the WAN interface name or 'a' for auto-detection: igb1_vlan100

Enter the LAN interface name or 'a' for auto-detection
NOTE: this enables full Firewalling/NAT mode.
(or nothing if finished): igb0

Optional interface 1 description found: VMWARE
Enter the Optional 1 interface name or 'a' for auto-detection
(or nothing if finished): igb0_vlan3

Enter the Optional 2 interface name or 'a' for auto-detection
(or nothing if finished):

The interfaces will be assigned as follows:

WAN  -> igb1_vlan100
LAN  -> igb0
OPT1 -> igb0_vlan3

Do you want to proceed? [y/N]: y

Writing configuration...done.
Configuring loopback interface...done.
Creating wireless clone interfaces...done.
Configuring LAGG interfaces...done.
Configuring VLAN interfaces...done.
Configuring LAN interface...done.
Configuring VMWARE interface...done.
Configuring WAN interface...[serial console output turns to garbage here as the kernel panics]

It wasn't LAGG.  Now trying without VLANs (this is a deal breaker, btw).

We suspect a configuration order issue, not a kernel change. Did you send in a crash report?

Can you send/attach a config.xml that would crash with this VLAN + WAN combination?


Thanks,
Franco

Yes, I sent in the crash report. 
Here is the original config.xml section with LAGG + VLAN


  <interfaces>
    <wan>
      <enable>1</enable>
      <if>lagg1_vlan100</if>
      <ipaddr>63.X.X.X</ipaddr>
      <ipaddrv6>dhcpv6</ipaddrv6>
      <subnet>28</subnet>
      <gateway>WANGW</gateway>
      <blockpriv>on</blockpriv>
      <blockbogons>on</blockbogons>
      <media/>
      <mediaopt/>
      <dhcp6-ia-pd-len>0</dhcp6-ia-pd-len>
      <descr>WAN</descr>
    </wan>
    <lan>
      <if>lagg0</if>
      <descr>LAN</descr>
      <enable>1</enable>
      <spoofmac/>
      <ipaddr>192.168.0.100</ipaddr>
      <subnet>24</subnet>
    </lan>
    <opt1>
      <if>lagg0_vlan3</if>
      <descr>VMWARE</descr>
      <enable>1</enable>
      <spoofmac/>
      <ipaddr>10.8.8.1</ipaddr>
      <subnet>24</subnet>
    </opt1>
    <enc0>
      <internal_dynamic>1</internal_dynamic>
      <enable>1</enable>
      <if>enc0</if>
      <descr>IPsec</descr>
      <type>none</type>
      <virtual>1</virtual>
    </enc0>
    <pptp>
      <internal_dynamic>1</internal_dynamic>
      <enable>1</enable>
      <networks>
        <network>192.168.0.192</network>
        <mask>28</mask>
      </networks>
      <virtual>1</virtual>
      <if>pptp</if>
      <type>group</type>
      <descr>pptp</descr>
    </pptp>
  </interfaces>
...
  <gateways>
    <gateway_item>
      <interface>wan</interface>
      <gateway>63.X.X.X</gateway>
      <name>WANGW</name>
      <weight>1</weight>
      <ipprotocol>inet</ipprotocol>
      <interval/>
      <descr>WAN Gateway</descr>
      <avg_delay_samples/>
      <avg_loss_samples/>
      <avg_loss_delay_samples/>
      <monitor_disable>1</monitor_disable>
      <defaultgw>1</defaultgw>
    </gateway_item>
  </gateways>
  <laggs>
    <lagg>
      <members>igb0</members>
      <descr>Uplink to Switch</descr>
      <laggif>lagg0</laggif>
      <proto>lacp</proto>
    </lagg>
    <lagg>
      <members>igb1</members>
      <descr>Uplink to Internet</descr>
      <laggif>lagg1</laggif>
      <proto>lacp</proto>
    </lagg>
  </laggs>
  <vlans>
    <vlan>
      <if>lagg0</if>
      <tag>3</tag>
      <pcp>0</pcp>
      <descr>VMWare</descr>
      <vlanif>lagg0_vlan3</vlanif>
    </vlan>
    <vlan>
      <if>lagg1</if>
      <tag>100</tag>
      <pcp>0</pcp>
      <descr>Internet</descr>
      <vlanif>lagg1_vlan100</vlanif>
    </vlan>
  </vlans>


Here's the version without LAGG.  I should point out that when attempting to apply this specific configuration, it would never get to the point of saving the config (kernel panic).  I had to manually edit config.xml and do a restore.


  <interfaces>
    <wan>
      <enable>1</enable>
      <if>igb1_vlan100</if>
      <ipaddr>63.X.X.X</ipaddr>
      <ipaddrv6>dhcpv6</ipaddrv6>
      <subnet>28</subnet>
      <gateway>WANGW</gateway>
      <blockpriv>on</blockpriv>
      <blockbogons>on</blockbogons>
      <media/>
      <mediaopt/>
      <dhcp6-ia-pd-len>0</dhcp6-ia-pd-len>
      <descr>WAN</descr>
    </wan>
    <lan>
      <if>igb0</if>
      <descr>LAN</descr>
      <enable>1</enable>
      <spoofmac/>
      <ipaddr>192.168.0.100</ipaddr>
      <subnet>24</subnet>
    </lan>
    <opt1>
      <if>igb0_vlan3</if>
      <descr>VMWARE</descr>
      <enable>1</enable>
      <spoofmac/>
      <ipaddr>10.8.8.1</ipaddr>
      <subnet>24</subnet>
    </opt1>
    <enc0>
      <internal_dynamic>1</internal_dynamic>
      <enable>1</enable>
      <if>enc0</if>
      <descr>IPsec</descr>
      <type>none</type>
      <virtual>1</virtual>
    </enc0>
    <pptp>
      <internal_dynamic>1</internal_dynamic>
      <enable>1</enable>
      <networks>
        <network>192.168.0.192</network>
        <mask>28</mask>
      </networks>
      <virtual>1</virtual>
      <if>pptp</if>
      <type>group</type>
      <descr>pptp</descr>
    </pptp>
  </interfaces>
...
  <gateways>
    <gateway_item>
      <interface>wan</interface>
      <gateway>63.X.X.X</gateway>
      <name>WANGW</name>
      <weight>1</weight>
      <ipprotocol>inet</ipprotocol>
      <interval/>
      <descr>WAN Gateway</descr>
      <avg_delay_samples/>
      <avg_loss_samples/>
      <avg_loss_delay_samples/>
      <monitor_disable>1</monitor_disable>
      <defaultgw>1</defaultgw>
    </gateway_item>
  </gateways>
  <vlans>
    <vlan>
      <if>igb0</if>
      <tag>3</tag>
      <pcp>0</pcp>
      <descr>VMWare</descr>
      <vlanif>igb0_vlan3</vlanif>
    </vlan>
    <vlan>
      <if>igb1</if>
      <tag>100</tag>
      <pcp>0</pcp>
      <descr>Internet</descr>
      <vlanif>igb1_vlan100</vlanif>
    </vlan>
  </vlans>


Finally, here is the current running config without a VLAN on the WAN:


  <interfaces>
    <wan>
      <enable>1</enable>
      <if>igb1</if>
      <ipaddr>63.X.X.X</ipaddr>
      <ipaddrv6>dhcpv6</ipaddrv6>
      <subnet>28</subnet>
      <gateway>WANGW</gateway>
      <blockpriv>on</blockpriv>
      <blockbogons>on</blockbogons>
      <media/>
      <mediaopt/>
      <dhcp6-ia-pd-len>0</dhcp6-ia-pd-len>
      <descr>WAN</descr>
    </wan>
    <lan>
      <if>igb0</if>
      <descr>LAN</descr>
      <enable>1</enable>
      <spoofmac/>
      <ipaddr>192.168.0.100</ipaddr>
      <subnet>24</subnet>
    </lan>
    <opt1>
      <if>igb0_vlan3</if>
      <descr>VMWARE</descr>
      <enable>1</enable>
      <spoofmac/>
      <ipaddr>10.8.8.1</ipaddr>
      <subnet>24</subnet>
    </opt1>
    <enc0>
      <internal_dynamic>1</internal_dynamic>
      <enable>1</enable>
      <if>enc0</if>
      <descr>IPsec</descr>
      <type>none</type>
      <virtual>1</virtual>
    </enc0>
    <pptp>
      <internal_dynamic>1</internal_dynamic>
      <enable>1</enable>
      <networks>
        <network>192.168.0.192</network>
        <mask>28</mask>
      </networks>
      <virtual>1</virtual>
      <if>pptp</if>
      <type>group</type>
      <descr>pptp</descr>
    </pptp>
  </interfaces>
...
  <gateways>
    <gateway_item>
      <interface>wan</interface>
      <gateway>63.X.X.X</gateway>
      <name>WANGW</name>
      <weight>1</weight>
      <ipprotocol>inet</ipprotocol>
      <interval/>
      <descr>WAN Gateway</descr>
      <avg_delay_samples/>
      <avg_loss_samples/>
      <avg_loss_delay_samples/>
      <monitor_disable>1</monitor_disable>
      <defaultgw>1</defaultgw>
    </gateway_item>
  </gateways>
  <vlans>
    <vlan>
      <if>igb0</if>
      <tag>3</tag>
      <pcp>0</pcp>
      <descr>VMWare</descr>
      <vlanif>igb0_vlan3</vlanif>
    </vlan>
  </vlans>
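
The three excerpts above differ mainly in what the WAN `<if>` element points at: lagg1_vlan100, igb1_vlan100, and igb1. As a quick way to confirm which variant a given config.xml actually contains, here is a minimal sketch. The function names are made up for illustration, the /conf/config.xml path is an assumption, and the classification relies only on the interface naming seen in this thread, so treat it as a helper for eyeballing a config rather than a real parser.

```sh
#!/bin/sh
# Sketch: report which interface WAN is bound to in an OPNsense config.xml
# and classify it by the naming convention seen in this thread.
# wan_if and classify_if are hypothetical helper names.

# Print the first <if> value found between <wan> and </wan>.
# Assumes one element per line, as in the excerpts above.
wan_if() {
    sed -n '/<wan>/,/<\/wan>/s:.*<if>\(.*\)</if>.*:\1:p' "$1" | head -n 1
}

# Classify an interface name such as lagg1_vlan100, igb1_vlan100, lagg1, igb1.
classify_if() {
    case "$1" in
        lagg*_vlan*) echo "vlan-on-lagg" ;;
        *_vlan*)     echo "vlan" ;;
        lagg*)       echo "lagg" ;;
        *)           echo "plain" ;;
    esac
}

# Usage: sh wan_if.sh /conf/config.xml   (path is an assumption)
if [ -n "${1:-}" ]; then
    name=$(wan_if "$1")
    echo "WAN is on $name ($(classify_if "$name"))"
fi
```

Run against each saved variant, this makes it easy to note next to every panic report whether the WAN was on a VLAN, a LAGG, both, or neither.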


So far the router has an uptime of 16:36:47.  I have a good feeling that VLAN on the WAN might be the cause.

The uptime is now 1 day 01:36:25.  I think that was it.  Having a VLAN set up on the WAN interface is causing something to kernel panic.

It just had a kernel panic.  I sent in the report.

Since the last kernel panic, it's been getting progressively worse.  I'm experiencing panics 4+ times daily.  Not sure what else to try at this point.

Here is a patch to try for the PPPoE+LAGG+VLAN issue on 17.7:

https://github.com/opnsense/core/commit/065244edf

Apply with

# opnsense-patch 065244edf

We could also try an older kernel, but at this point it seems to be a dormant bug that we simply trigger due to our changes in the interface configuration code...


Cheers,
Franco
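
Going only by the commit subject ("interfaces: the renaming in one ifconfig may be unstable"), the patch appears to avoid creating and renaming an interface within a single ifconfig call. The sketch below is not the code from the commit. It is a dry run that only prints a two-step sequence of standard FreeBSD ifconfig commands, with a hypothetical helper name (vlan_cmds) and the parent_vlanTAG naming taken from the configs in this thread.

```sh
#!/bin/sh
# Dry-run sketch: print (do not execute) the ifconfig commands for creating
# a VLAN interface and renaming it in two separate steps.
# vlan_cmds is a hypothetical helper, not part of OPNsense.
vlan_cmds() {
    parent=$1   # parent interface, e.g. igb1
    tag=$2      # VLAN tag, e.g. 100
    # Step 1: create the VLAN interface on the parent.
    echo "ifconfig vlan${tag} create vlan ${tag} vlandev ${parent}"
    # Step 2: rename it in a separate invocation.
    echo "ifconfig vlan${tag} name ${parent}_vlan${tag}"
}

vlan_cmds igb1 100
```

Printing the commands rather than running them keeps the example safe to try on a production firewall. Whether this matches what the patched interfaces.lib.inc actually does would need to be checked against the commit itself.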

Seems like I'm having a similar issue.

Quote from: interkrome on August 25, 2017, 10:35:44 AM
Seems like I'm having a similar issue.

Patch applied and then I rebooted.

opnsense-patch 065244edf
Hmm...  Looks like a unified diff to me...
The text leading up to this was:
--------------------------
|From 065244edf60aede23224f73732a8b18e494d46bf Mon Sep 17 00:00:00 2001
|From: Franco Fichtner <franco@opnsense.org>
|Date: Thu, 10 Aug 2017 15:15:42 +0200
|Subject: [PATCH] interfaces: the renaming in one ifconfig may be unstable
|
|(cherry picked from commit a7ca1661302bd200dbbcf8ba700fed36a167ad98)
|(cherry picked from commit 713f8b8d487d965f76b803e14f6a70fe51124f80)
|---
| src/etc/inc/interfaces.lib.inc | 18 +++++++++++++++---
| 1 file changed, 15 insertions(+), 3 deletions(-)
|
|diff --git a/src/etc/inc/interfaces.lib.inc b/src/etc/inc/interfaces.lib.inc
|index 198237abd..1ad59b7fb 100644
|--- a/src/etc/inc/interfaces.lib.inc
|+++ b/src/etc/inc/interfaces.lib.inc
--------------------------
Patching file etc/inc/interfaces.lib.inc using Plan A...
Hunk #1 succeeded at 72.
Hunk #2 succeeded at 97.
done
All patches have been applied successfully.  Have a nice day.
root@stargate:~ # reboot

Just had another kernel panic.  Looks like the patch didn't work.