OPNsense Forum

Archive => 17.1 Legacy Series => Topic started by: tuaris on June 27, 2017, 11:20:07 pm

Title: Kernel Panics
Post by: tuaris on June 27, 2017, 11:20:07 pm
I'm on a Soekris net6501, and after updating to OPNsense 17.1.8-i386 the firewall is kernel panicking at random intervals (sometimes after 9 hours, sometimes 2, sometimes a few minutes).

Code: [Select]
Fatal double fault:
eip = 0xc0a30252
esp = 0xeba10fc0
ebp = 0xeba11518
cpuid = 0; apic id = 00
panic: double fault
cpuid = 0
KDB: stack backtrace:
db_trace_self_wrapper(c188419d,ff,c1b0c1e0,c1b0c1f0,c796b680,...) at db_trace_self_wrapper+0x2a/frame 0xc1d7e328
kdb_backtrace(c1a5ffcd,0,c1a56ed3,c1d7e3e4,0,...) at kdb_backtrace+0x2d/frame 0xc1d7e390
vpanic(c1a56ed3,c1d7e3e4,c1d7e3e4,c1d7e3ec,c14922b6,...) at vpanic+0x114/frame 0xc1d7e3c4
panic(c1a56ed3,0,0,0,0,...) at panic+0x1b/frame 0xc1d7e3d8
dblfault_handler() at dblfault_handler+0xa6/frame 0xc1d7e3d8
--- trap 0x17, eip = 0xc0a30252, esp = 0xeba10fc0, ebp = 0xeba11518 ---
random_fortuna_pre_read(73a4bcb5,eba11530,46,8c3680,c796b680,...) at random_fortuna_pre_read+0x22/frame 0xeba11518
read_random(eba11680,100,c0defb8f,c1db6600,41474b,...) at read_random+0x26/frame 0xeba11640
arc4rand(eba117d0,2,0,1,c78a4e00,...) at arc4rand+0x74/frame 0xeba11798
ip_fillid(c8b0a810,c8b0a810,14,2,1,...) at ip_fillid+0x103/frame 0xeba117f0
pfsync_sendout(c78a4e84,0,c2172130,683,0,...) at pfsync_sendout+0xbb/frame 0xeba11844
pfsync_insert_state(c92b0340,0,8000,0,10000000,...) at pfsync_insert_state+0x118/frame 0xeba11880
pf_state_insert(c796e000,c922cbc0,c922cbc0,c92b0340,8603,...) at pf_state_insert+0x87d/frame 0xeba118d8
pf_test_rule(1,c796d800,c82ae900,14,eba11c00,...) at pf_test_rule+0x397c/frame 0xeba11bb0
pf_test(1,c79c5400,eba11d04,0,c1dc5954,...) at pf_test+0x855/frame 0xeba11cb8
pf_check_in(0,eba11d04,c79c5400,1,0,...) at pf_check_in+0x29/frame 0xeba11cd8
pfil_run_hooks(c1dc5954,eba11e24,c79c5400,1,0,...) at pfil_run_hooks+0x88/frame 0xeba11d38
enc_hhook(3,2,c79c1b30,eba11e10,0,...) at enc_hhook+0x217/frame 0xeba11d80
hhook_run_hooks(c793ec80,eba11e10,0,c8d0ca40,eba11e78,...) at hhook_run_hooks+0xa1/frame 0xeba11dd8
ipsec_run_hhooks(eba11e10,3,10,1,c2427cac,...) at ipsec_run_hhooks+0x58/frame 0xeba11df0
ipsec4_common_input_cb(c82ae900,c8ff8500,14,9,40,...) at ipsec4_common_input_cb+0x512/frame 0xeba11e78
esp_input_cb(c90babf4,eba12658,c832908a,eba11fb8,c11550f9,...) at esp_input_cb+0x88f/frame 0xeba11f80
crypto_done(c90babf4,c832908a,8,eba12060,eba12070,...) at crypto_done+0x1b9/frame 0xeba11fb8
swcr_process(c75abc80,c90babf4,0,c27be8c0,80,...) at swcr_process+0xd97/frame 0xeba126b8
crypto_invoke(0,c8329092,c8feb038,c,c,...) at crypto_invoke+0x73/frame 0xeba126f0
crypto_dispatch(c90babf4,c18ac6b8,1ad,c8feb038,c20ab55a,...) at crypto_dispatch+0x65/frame 0xeba12718
esp_input(c82ae900,c8ff8500,14,9,d4,...) at esp_input+0x556/frame 0xeba127f8
ipsec_common_input(c82ae900,14,9,2,32,...) at ipsec_common_input+0x6e7/frame 0xeba1288c
esp4_input(eba128f4,eba128f0,32,1,0,...) at esp4_input+0x34/frame 0xeba128a8
ip_input(c82ae900,c0e69bf8,b7debbe1,80015188,5f5e9218,...) at ip_input+0x32b/frame 0xeba12918
netisr_dispatch_src(1,0,c82ae900) at netisr_dispatch_src+0xd0/frame 0xeba12960
netisr_dispatch(1,c82ae900,0,c82ae900,2,...) at netisr_dispatch+0x20/frame 0xeba12974
ether_demux(c7f69400,c82ae900,6,0,7470c88c,...) at ether_demux+0x131/frame 0xeba129a0
ether_nh_input(c82ae900,c0e69bf8,dc675435,80015188,5f5e9218,...) at ether_nh_input+0x383/frame 0xeba129f0
netisr_dispatch_src(5,0,c82ae900) at netisr_dispatch_src+0xd0/frame 0xeba12a38
netisr_dispatch(5,c82ae900,c78a7400,eba12ab4,c0f88053,...) at netisr_dispatch+0x20/frame 0xeba12a4c
ether_input(c7f69400,c82ae900,1,0,10000200,...) at ether_input+0x2a/frame 0xeba12a60
vlan_input(c78a7400,c82ae900,0,c82ae900,2,...) at vlan_input+0x223/frame 0xeba12ab4
ether_demux(c78a7400,c82ae900,6,0,c8684800,...) at ether_demux+0x9a/frame 0xeba12ae0
ether_nh_input(c82ae900,801,eba12b90,eba12b8c,c8632d00,...) at ether_nh_input+0x383/frame 0xeba12b2c
netisr_dispatch_src(5,0,c82ae900) at netisr_dispatch_src+0xd0/frame 0xeba12b74
netisr_dispatch(5,c82ae900,c796b680,eba12bac,c0f79729,...) at netisr_dispatch+0x20/frame 0xeba12b88
ether_input(c7981400,c82ae900,eba12c0c,c0790343,c7981400,...) at ether_input+0x2a/frame 0xeba12b9c
if_input(c7981400,c82ae900,1,0,c827d9c0,...) at if_input+0x19/frame 0xeba12bac
em_rxeof(c7981400,c1d4ef00,c793b5c8,0,c7935680,...) at em_rxeof+0x343/frame 0xeba12c0c
em_msix_rx(c7970900,c0e6523f,c796b680,0,109,...) at em_msix_rx+0x2f/frame 0xeba12c28
intr_event_execute_handlers(109,c793b580,c187b89f,555,aa55aa55,...) at intr_event_execute_handlers+0x299/frame 0xeba12c64
ithread_loop(c7971da0,eba12ce8,aa55aa55,aa55aa55,aa55aa55,...) at ithread_loop+0xc0/frame 0xeba12ca4
fork_exit(c0e084b0,c7971da0,eba12ce8) at fork_exit+0x71/frame 0xeba12cd4
fork_trampoline() at fork_trampoline+0x8/frame 0xeba12cd4
--- trap 0, eip = 0, esp = 0xeba12d20, ebp = 0 ---
KDB: enter: panic
Title: Re: Kernel Panics after Updating to OPNsense 17.1.8
Post by: franco on July 04, 2017, 03:44:51 pm
Which version did you come from prior to the update?
Title: Re: Kernel Panics after Updating to OPNsense 17.1.8
Post by: tuaris on July 04, 2017, 09:42:32 pm
It was 17.1.4.
Title: Re: Kernel Panics after Updating to OPNsense 17.1.8
Post by: tuaris on July 05, 2017, 05:14:11 am
I gave up (having the firewall reboot in the middle of phone calls is a deal breaker!), got a new SSD, and installed 17.1.4.  Restored my configuration, reinstalled my plugins, and rebooted.

(http://venus.morante.net/downloads/unibia/screenshots/Screenshot-2017-7-5.png)

I discovered that OPNsense doesn't like multi-boot.  I was hoping to have both SSDs installed and be able to boot into different firmware, but even if I boot off the second SSD, the firmware on the first still gets loaded. Very odd.

Interestingly I noticed something on the dashboard that wasn't working in 17.1.8...

(http://venus.morante.net/downloads/unibia/screenshots/Screenshot-2017-7-5-Dashboard-Lobby-stargate.png)

The gateway status panel has content, whereas it previously did not.  I do remember this working before doing the update.  So perhaps there is indeed something broken in the latest firmware.
Title: Re: Kernel Panics after Updating to OPNsense 17.1.8
Post by: tuaris on July 07, 2017, 10:24:50 am
Sadly the kernel panics continue.  I managed to capture the full output from the console. 

http://bin.morante.net/?a7abaf76b27003b4#tzPdX3k8gOD2+SGl4w7uvU2Bxi4/hsYlQcACBAuh1HI=
Title: Re: Kernel Panics after Updating to OPNsense 17.1.8
Post by: tuaris on July 08, 2017, 09:13:34 pm
I'm starting to notice a pattern with the kernel panics.  They seem to happen regularly at ~7:30 UTC and ~13:00 UTC
Title: Re: Kernel Panics after Updating to OPNsense 17.1.8
Post by: tuaris on July 10, 2017, 03:15:39 am
I'm assuming the lack of additional responses means either this is a known bug, no one knows what is wrong, or no one wants to help?

Really hoping I can make this work.
Title: Re: Kernel Panics after Updating to OPNsense 17.1.8
Post by: franco on July 11, 2017, 05:33:07 pm
Hi tuaris,

No responses from me means not enough time for helping out here.

I don't expect the update is the issue. You could easily go back to an older kernel (it crashes there after all):

# opnsense-update -kr 17.1.4
# /usr/local/etc/rc.reboot

If the crashes continue this is due to heavy traffic and / or heat.

Your stack trace is also interesting in that it includes Firewall State Sync, IPsec and VLANs at the same time.

Also, how many services are you running? IPS? Web proxy? How is your RAM usage?

random_fortuna_pre_read() at the top is not a networking subsystem; the box crashes trying to get random bytes from the kernel for an IP packet it tries to send out.
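
As a rough aside, the stack-pointer values in the posted trace give an estimate of how much kernel stack was in use when the fault hit. This is only a sketch: it assumes the esp on the final "trap 0" line marks the top of the thread's stack and the esp reported by the double fault marks the bottom. The helper name stack_depth is made up for this example, and nothing in this thread establishes what the stack limit on this kernel actually is.

```sh
#!/bin/sh
# Estimate stack depth in bytes from two stack-pointer values of one trace.
# Assumption: both arguments are hex addresses with a 0x prefix, copied
# from the same thread's backtrace. stack_depth is a hypothetical helper.
stack_depth() {
    # $1 = esp at thread entry, $2 = esp at the fault
    echo $(( $1 - $2 ))
}

# Values taken from the trace posted at the top of this thread.
stack_depth 0xeba12d20 0xeba10fc0   # prints 7520
```

That is roughly 7.3 KiB between thread entry and the faulting frame, which fits the observation that VLAN, IPsec and state sync all appear in one call chain.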

You could also try to shape your traffic a bit to take the edge off... The Soekris net6501 isn't the fastest hardware around anymore.


Cheers,
Franco

PS: How is 17.1.9 performing?
Title: Re: Kernel Panics after Updating to OPNsense 17.1.8
Post by: tuaris on July 12, 2017, 12:49:41 pm
Thanks, I didn't mean to sound too negative.  That last post was made after the box crashed at the worst possible moment :).

I have begun to notice a pattern.  Whenever I put stress on it (heavy VPN, VLAN, and sometimes general traffic usage) it does seem to trigger the problem.  I use several VLANs, a few IPsec tunnels, the PPTP and UPnP plugins, and interface bonding with LAGG.  There are several services running behind it using port forwards: VoIP, multiple HTTP services, mail, etc.

I totally understand it's a pretty taxing setup.  Interestingly enough, with the exception of interface bonding, the previous device (a net4801) handled the load using m0n0wall (it's currently got an uptime of 780 days!).  I also get that the difference between OPNsense and m0n0wall is significant.

I purchased the higher-end net6501-70 expecting that it would be more than capable of handling my needs (50 Mbit/s up/down and 200+ nodes).  I will try the packet shaping; I had started on it but found it a little harder to use than what I was used to with m0n0wall.

17.1.9 is performing well.  It still panics, but not as often.  I even shut off some logging and stats collection and it has improved slightly.
Title: Re: Kernel Panics after Updating to OPNsense 17.1.8
Post by: weust on July 12, 2017, 01:43:31 pm
Out of curiosity, how old is your net6501-70?
I bought mine fairly soon after they came out, and it died a few years later.
Was a known issue. Something with heat, iirc.

Mine was bought by franco, and afaik still works? ;-)
Title: Re: Kernel Panics after Updating to OPNsense 17.1.8
Post by: franco on July 12, 2017, 02:45:20 pm
It sounds like a heat problem indeed, it's summer-time after all. A fan might already help...

The Soekris from you is still up and running in a remote branch, dutifully pushing IPsec, but not doing any heavy lifting. :)


Cheers,
Franco
Title: Re: Kernel Panics after Updating to OPNsense 17.1.8
Post by: weust on July 12, 2017, 05:14:54 pm
I should have mentioned mine was within warranty, so the board got replaced.
Had the newer/bigger heatsink on it.
But I had a -30, and the -70 has a fan on the heatsink, iirc?

Good to hear it's useful :-)
Title: Re: Kernel Panics after Updating to OPNsense 17.1.8
Post by: tuaris on July 12, 2017, 05:17:12 pm
Mine is no more than a month old.  Purchased brand new directly from Soekris EU.  I've already contacted them about a possible hardware issue, but they are saying it's software related.  I guess the only way to really know for sure is to do some tests.
Title: Re: Kernel Panics after Updating to OPNsense 17.1.8
Post by: tuaris on July 12, 2017, 05:21:10 pm
Currently at 67 C.   
(http://venus.morante.net/downloads/unibia/screenshots/Screenshot-2017-7-12DashboardLobbystargate%20morante%20com.png)
Title: Re: Kernel Panics after Updating to OPNsense 17.1.8
Post by: weust on July 12, 2017, 05:41:25 pm
Ok. Then you have the newer revision.
Temp is fine too. That CPU runs a bit hot, which is normal.
Title: Re: Kernel Panics
Post by: tuaris on July 19, 2017, 02:54:40 am
I've replaced the net6501-70 with a PC Engines APU2C0.

(http://venus.morante.net/downloads/unibia/screenshots/Screenshot-2017-7-19DashboardLobby.png)

This looks like it's a slightly more powerful device compared to the net6501.  Temperature is again okay.

(http://venus.morante.net/downloads/unibia/screenshots/Screenshot-2017-7-19DashboardLobby2.png)

After about 6 hours, it kernel panicked.
Title: Re: Kernel Panics
Post by: franco on July 19, 2017, 10:03:52 am
Can we have another backtrace ("bt" at prompt) of the panic just to be sure? It sounds like a programming error indeed then.

We need to see if it has been previously recorded over at https://bugs.freebsd.org/bugzilla/


Cheers,
Franco
Title: Re: Kernel Panics
Post by: tuaris on July 19, 2017, 12:16:36 pm
http://bin.morante.net/?1faf5a20ecc19403#3u0cf10sgz5Njd/dh5Hyz4m3fNOC36TO7jav85sa1L4=
Title: Re: Kernel Panics
Post by: tuaris on August 06, 2017, 01:54:36 am
I upgraded the PC Engines APU2C0 to 17.7 and about 1 hour later there was a kernel panic.  Also, the net6501-70 was left powered on but disconnected from the network.  The uptime on the net6501-70 is 18 days.
Title: Re: Kernel Panics
Post by: tuaris on August 06, 2017, 03:14:38 am
I think the problem may be related to the WAN interface.

I tried to remove the LAGG configuration but every time it gets to restarting the WAN interface, it kernel panics.  I successfully repeated this 3 times.

Code: [Select]
*** stargate.morante.com: OPNsense 17.7 (i386/OpenSSL) ***

 LAN (igb0_vlan1) -> v4: 192.168.0.100/24
 VMWARE (igb0_vlan3) -> v4: 10.8.8.1/24
 WAN (igb1_vlan100) -> v4: X.X.X.X/28

FreeBSD/i386 (stargate.morante.com) (ttyu0)

login: root
Password:
Last login: Sat Aug  5 20:29:43 on ttyu0
----------------------------------------------
|      Hello, this is OPNsense 17.7          |         @@@@@@@@@@@@@@@
|                                            |        @@@@         @@@@
| Website:      https://opnsense.org/        |         @@@\\\   ///@@@
| Handbook:     https://docs.opnsense.org/   |       ))))))))   ((((((((
| Forums:       https://forum.opnsense.org/  |         @@@///   \\\@@@
| Lists:        https://lists.opnsense.org/  |        @@@@         @@@@
| Code:         https://github.com/opnsense  |         @@@@@@@@@@@@@@@
----------------------------------------------

  0) Logout                              7) Ping host
  1) Assign interfaces                   8) Shell
  2) Set interface IP address            9) pfTop
  3) Reset the root password            10) Firewall log
  4) Reset to factory defaults          11) Reload all services
  5) Power off system                   12) Upgrade from console
  6) Reboot system                      13) Restore a backup

Enter an option: 1


Valid interfaces are:
igb0             00:0d:b9:46:74:2c Intel(R) PRO/1000 Network Connection, Version - 2.5.3-k
igb1             00:0d:b9:46:74:2d Intel(R) PRO/1000 Network Connection, Version - 2.5.3-k
igb0_vlan3       00:0d:b9:46:74:2c
igb1_vlan100     00:0d:b9:46:74:2d
igb0_vlan1       00:0d:b9:46:74:2c

You now have the opportunity to configure VLANs.  If you don't require VLANs
for initial connectivity, say no here and use the GUI to configure VLANs later.

Do you want to set up VLANs now? [y/N]: n


VLAN interfaces:

igb0_vlan3      VLAN tag 3, parent interface igb0
igb1_vlan100    VLAN tag 100, parent interface igb1
igb0_vlan1      VLAN tag 1, parent interface igb0

If you do not know the names of your interfaces, you may choose to use
auto-detection. In that case, disconnect all interfaces now before
hitting 'a' to initiate auto detection.

Enter the WAN interface name or 'a' for auto-detection: igb1_vlan100

Enter the LAN interface name or 'a' for auto-detection
NOTE: this enables full Firewalling/NAT mode.
(or nothing if finished): igb0

Optional interface 1 description found: VMWARE
Enter the Optional 1 interface name or 'a' for auto-detection
(or nothing if finished): igb0_vlan3

Enter the Optional 2 interface name or 'a' for auto-detection
(or nothing if finished):

The interfaces will be assigned as follows:

WAN  -> igb1_vlan100
LAN  -> igb0
OPT1 -> igb0_vlan3

Do you want to proceed? [y/N]: y

Writing configuration...done.
Configuring loopback interface...done.
Creating wireless clone interfaces...done.
Configuring LAGG interfaces...done.
Configuring VLAN interfaces...done.
Configuring LAN interface...done.
Configuring VMWARE interface...done.
Configuring WAN interface...[unreadable serial console output as the kernel panics]
Title: Re: Kernel Panics
Post by: tuaris on August 07, 2017, 05:48:23 pm
Wasn't LAGG.  Now trying without VLANs (this is a deal breaker btw)
Title: Re: Kernel Panics
Post by: franco on August 08, 2017, 08:53:54 am
We suspect a configuration order issue, not a kernel change. Did you send in a crash report?

Can you send / attach a config.xml that would crash with this VLAN + WAN combination?


Thanks,
Franco
Title: Re: Kernel Panics
Post by: tuaris on August 08, 2017, 10:23:48 am
Yes, I sent in the crash report. 
Here is the original config.xml section with LAGG + VLAN

Code: [Select]
  <interfaces>
    <wan>
      <enable>1</enable>
      <if>lagg1_vlan100</if>
      <ipaddr>63.X.X.X</ipaddr>
      <ipaddrv6>dhcpv6</ipaddrv6>
      <subnet>28</subnet>
      <gateway>WANGW</gateway>
      <blockpriv>on</blockpriv>
      <blockbogons>on</blockbogons>
      <media/>
      <mediaopt/>
      <dhcp6-ia-pd-len>0</dhcp6-ia-pd-len>
      <descr>WAN</descr>
    </wan>
    <lan>
      <if>lagg0</if>
      <descr>LAN</descr>
      <enable>1</enable>
      <spoofmac/>
      <ipaddr>192.168.0.100</ipaddr>
      <subnet>24</subnet>
    </lan>
    <opt1>
      <if>lagg0_vlan3</if>
      <descr>VMWARE</descr>
      <enable>1</enable>
      <spoofmac/>
      <ipaddr>10.8.8.1</ipaddr>
      <subnet>24</subnet>
    </opt1>
    <enc0>
      <internal_dynamic>1</internal_dynamic>
      <enable>1</enable>
      <if>enc0</if>
      <descr>IPsec</descr>
      <type>none</type>
      <virtual>1</virtual>
    </enc0>
    <pptp>
      <internal_dynamic>1</internal_dynamic>
      <enable>1</enable>
      <networks>
        <network>192.168.0.192</network>
        <mask>28</mask>
      </networks>
      <virtual>1</virtual>
      <if>pptp</if>
      <type>group</type>
      <descr>pptp</descr>
    </pptp>
  </interfaces>
...
  <gateways>
    <gateway_item>
      <interface>wan</interface>
      <gateway>63.X.X.X</gateway>
      <name>WANGW</name>
      <weight>1</weight>
      <ipprotocol>inet</ipprotocol>
      <interval/>
      <descr>WAN Gateway</descr>
      <avg_delay_samples/>
      <avg_loss_samples/>
      <avg_loss_delay_samples/>
      <monitor_disable>1</monitor_disable>
      <defaultgw>1</defaultgw>
    </gateway_item>
  </gateways>
  <laggs>
    <lagg>
      <members>igb0</members>
      <descr>Uplink to Switch</descr>
      <laggif>lagg0</laggif>
      <proto>lacp</proto>
    </lagg>
    <lagg>
      <members>igb1</members>
      <descr>Uplink to Internet</descr>
      <laggif>lagg1</laggif>
      <proto>lacp</proto>
    </lagg>
  </laggs>
  <vlans>
    <vlan>
      <if>lagg0</if>
      <tag>3</tag>
      <pcp>0</pcp>
      <descr>VMWare</descr>
      <vlanif>lagg0_vlan3</vlanif>
    </vlan>
    <vlan>
      <if>lagg1</if>
      <tag>100</tag>
      <pcp>0</pcp>
      <descr>Internet</descr>
      <vlanif>lagg1_vlan100</vlanif>
    </vlan>
  </vlans>

Here's the version without LAGG.  I should point out that when attempting to apply this specific configuration, it would never get to the point of saving the config (kernel panic).  I had to manually edit config.xml and do a restore.

Code: [Select]
  <interfaces>
    <wan>
      <enable>1</enable>
      <if>igb1_vlan100</if>
      <ipaddr>63.X.X.X</ipaddr>
      <ipaddrv6>dhcpv6</ipaddrv6>
      <subnet>28</subnet>
      <gateway>WANGW</gateway>
      <blockpriv>on</blockpriv>
      <blockbogons>on</blockbogons>
      <media/>
      <mediaopt/>
      <dhcp6-ia-pd-len>0</dhcp6-ia-pd-len>
      <descr>WAN</descr>
    </wan>
    <lan>
      <if>igb0</if>
      <descr>LAN</descr>
      <enable>1</enable>
      <spoofmac/>
      <ipaddr>192.168.0.100</ipaddr>
      <subnet>24</subnet>
    </lan>
    <opt1>
      <if>igb0_vlan3</if>
      <descr>VMWARE</descr>
      <enable>1</enable>
      <spoofmac/>
      <ipaddr>10.8.8.1</ipaddr>
      <subnet>24</subnet>
    </opt1>
    <enc0>
      <internal_dynamic>1</internal_dynamic>
      <enable>1</enable>
      <if>enc0</if>
      <descr>IPsec</descr>
      <type>none</type>
      <virtual>1</virtual>
    </enc0>
    <pptp>
      <internal_dynamic>1</internal_dynamic>
      <enable>1</enable>
      <networks>
        <network>192.168.0.192</network>
        <mask>28</mask>
      </networks>
      <virtual>1</virtual>
      <if>pptp</if>
      <type>group</type>
      <descr>pptp</descr>
    </pptp>
  </interfaces>
...
  <gateways>
    <gateway_item>
      <interface>wan</interface>
      <gateway>63.X.X.X</gateway>
      <name>WANGW</name>
      <weight>1</weight>
      <ipprotocol>inet</ipprotocol>
      <interval/>
      <descr>WAN Gateway</descr>
      <avg_delay_samples/>
      <avg_loss_samples/>
      <avg_loss_delay_samples/>
      <monitor_disable>1</monitor_disable>
      <defaultgw>1</defaultgw>
    </gateway_item>
  </gateways>
  <vlans>
    <vlan>
      <if>igb0</if>
      <tag>3</tag>
      <pcp>0</pcp>
      <descr>VMWare</descr>
      <vlanif>igb0_vlan3</vlanif>
    </vlan>
    <vlan>
      <if>igb1</if>
      <tag>100</tag>
      <pcp>0</pcp>
      <descr>Internet</descr>
      <vlanif>igb1_vlan100</vlanif>
    </vlan>
  </vlans>

Finally, here is the current running config without VLAN on the WAN

Code: [Select]
  <interfaces>
    <wan>
      <enable>1</enable>
      <if>igb1</if>
      <ipaddr>63.X.X.X</ipaddr>
      <ipaddrv6>dhcpv6</ipaddrv6>
      <subnet>28</subnet>
      <gateway>WANGW</gateway>
      <blockpriv>on</blockpriv>
      <blockbogons>on</blockbogons>
      <media/>
      <mediaopt/>
      <dhcp6-ia-pd-len>0</dhcp6-ia-pd-len>
      <descr>WAN</descr>
    </wan>
    <lan>
      <if>igb0</if>
      <descr>LAN</descr>
      <enable>1</enable>
      <spoofmac/>
      <ipaddr>192.168.0.100</ipaddr>
      <subnet>24</subnet>
    </lan>
    <opt1>
      <if>igb0_vlan3</if>
      <descr>VMWARE</descr>
      <enable>1</enable>
      <spoofmac/>
      <ipaddr>10.8.8.1</ipaddr>
      <subnet>24</subnet>
    </opt1>
    <enc0>
      <internal_dynamic>1</internal_dynamic>
      <enable>1</enable>
      <if>enc0</if>
      <descr>IPsec</descr>
      <type>none</type>
      <virtual>1</virtual>
    </enc0>
    <pptp>
      <internal_dynamic>1</internal_dynamic>
      <enable>1</enable>
      <networks>
        <network>192.168.0.192</network>
        <mask>28</mask>
      </networks>
      <virtual>1</virtual>
      <if>pptp</if>
      <type>group</type>
      <descr>pptp</descr>
    </pptp>
  </interfaces>
...
  <gateways>
    <gateway_item>
      <interface>wan</interface>
      <gateway>63.X.X.X</gateway>
      <name>WANGW</name>
      <weight>1</weight>
      <ipprotocol>inet</ipprotocol>
      <interval/>
      <descr>WAN Gateway</descr>
      <avg_delay_samples/>
      <avg_loss_samples/>
      <avg_loss_delay_samples/>
      <monitor_disable>1</monitor_disable>
      <defaultgw>1</defaultgw>
    </gateway_item>
  </gateways>
  <vlans>
    <vlan>
      <if>igb0</if>
      <tag>3</tag>
      <pcp>0</pcp>
      <descr>VMWare</descr>
      <vlanif>igb0_vlan3</vlanif>
    </vlan>
  </vlans>

So far the router has an uptime of 16:36:47.  I have a good feeling that VLAN on the WAN might be the cause.
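
To check which assigned interfaces sit on a VLAN device in configs like the three above, something along these lines may help. It is a rough sketch that assumes VLAN devices follow the <parent>_vlan<tag> naming seen in this thread and that each <if> element is on its own line; vlan_ifs is a made-up helper name, not an OPNsense tool.

```sh
#!/bin/sh
# List <if> values that name a VLAN device (e.g. igb1_vlan100).
# Assumptions: one XML element per line, and the <parent>_vlan<tag> naming
# used in the configs posted here. vlan_ifs is a hypothetical helper.
vlan_ifs() {
    # $1 = path to a config.xml (or an excerpt of one)
    sed -n 's|^[[:space:]]*<if>\([^<]*_vlan[0-9][0-9]*\)</if>[[:space:]]*$|\1|p' "$1"
}
```

Run it as vlan_ifs /conf/config.xml (path assumed). The parent entries under <vlans> are not matched, because their <if> values carry no _vlan suffix.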
Title: Re: Kernel Panics
Post by: tuaris on August 08, 2017, 07:21:21 pm
The uptime is now 1 day 01:36:25.  I think that was it.  Having a VLAN set up on the WAN interface is causing something to kernel panic.
Title: Re: Kernel Panics
Post by: tuaris on August 11, 2017, 09:38:58 am
It just had a kernel panic.  I sent in the report.
Title: Re: Kernel Panics
Post by: tuaris on August 15, 2017, 02:32:35 am
Since the last kernel panic, it's been getting progressively worse.  I'm experiencing panics 4+ times daily.  Not sure what else to try at this point.
Title: Re: Kernel Panics
Post by: franco on August 16, 2017, 08:36:39 am
Here is a patch to try for the PPPoE+LAGG+VLAN issue on 17.7:

https://github.com/opnsense/core/commit/065244edf

Apply with

# opnsense-patch 065244edf

We could also try an older kernel, but at this point it seems to be a dormant bug that we simply trigger due to our changes in the interface configuration code...


Cheers,
Franco
Title: Re: Kernel Panics
Post by: interkrome on August 25, 2017, 10:35:44 am
Seems like I'm having a similar issue
Title: Re: Kernel Panics
Post by: tuaris on September 03, 2017, 08:06:36 am
Patch applied and then I rebooted.

Code: [Select]
opnsense-patch 065244edf
Hmm...  Looks like a unified diff to me...
The text leading up to this was:
--------------------------
|From 065244edf60aede23224f73732a8b18e494d46bf Mon Sep 17 00:00:00 2001
|From: Franco Fichtner <franco@opnsense.org>
|Date: Thu, 10 Aug 2017 15:15:42 +0200
|Subject: [PATCH] interfaces: the renaming in one ifconfig may be unstable
|
|(cherry picked from commit a7ca1661302bd200dbbcf8ba700fed36a167ad98)
|(cherry picked from commit 713f8b8d487d965f76b803e14f6a70fe51124f80)
|---
| src/etc/inc/interfaces.lib.inc | 18 +++++++++++++++---
| 1 file changed, 15 insertions(+), 3 deletions(-)
|
|diff --git a/src/etc/inc/interfaces.lib.inc b/src/etc/inc/interfaces.lib.inc
|index 198237abd..1ad59b7fb 100644
|--- a/src/etc/inc/interfaces.lib.inc
|+++ b/src/etc/inc/interfaces.lib.inc
--------------------------
Patching file etc/inc/interfaces.lib.inc using Plan A...
Hunk #1 succeeded at 72.
Hunk #2 succeeded at 97.
done
All patches have been applied successfully.  Have a nice day.
root@stargate:~ # reboot
Title: Re: Kernel Panics
Post by: tuaris on September 04, 2017, 08:34:16 am
Just had another kernel panic.  Looks like the patch didn't work.

Title: Re: Kernel Panics
Post by: tuaris on September 09, 2017, 08:03:50 pm
The main inconvenience here is how long the device takes to actually shut down the OS and start the reboot.  It takes about 5 minutes to do the dump for the stack trace.  Is there any way we can disable that and speed up the process a bit?  I'm willing to live with this if the recovery period can be sped up.
Title: Re: Kernel Panics
Post by: franco on September 11, 2017, 03:17:48 pm
Add this as a file under /usr/local/etc/rc.syshook.d/00-reset.early

Code: [Select]
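#!/bin/sh
# Work from a temporary copy so the original can be filtered into a new file.
cp /etc/ddb.conf /etc/ddb.conf.bak
# Drop the existing kdb.enter.panic script and add a default script that
# resets the machine as soon as the debugger is entered.
(grep -v kdb.enter.panic /etc/ddb.conf.bak; echo "script kdb.enter.default=reset") > /etc/ddb.conf
rm -f /etc/ddb.conf.bak
# Load the rewritten scripts into the running kernel.
ddb /etc/ddb.conf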

# chmod 700 /usr/local/etc/rc.syshook.d/00-reset.early
# /usr/local/etc/rc.syshook.d/00-reset.early

It should automatically apply after reboot.
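
Before installing the hook, the rewrite it performs can be previewed without touching /etc/ddb.conf. This is a minimal sketch that reuses the same filter as the script above; preview_ddb_conf is a made-up name, and it only prints the result.

```sh
#!/bin/sh
# Print what the hook would write, without modifying any file.
# preview_ddb_conf is a hypothetical helper; pass it the path to a ddb.conf.
preview_ddb_conf() {
    # Same filter as the hook: drop lines mentioning kdb.enter.panic ...
    grep -v kdb.enter.panic "$1"
    # ... then append a default script that resets the machine.
    echo "script kdb.enter.default=reset"
}
```

Running preview_ddb_conf /etc/ddb.conf and comparing the output with the current file shows exactly which script lines would disappear.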

You can emulate a crash with the following command to confirm:

# sysctl debug.kdb.panic=1


Cheers,
Franco