I'm on a Soekris net6501, and after updating to OPNsense 17.1.8-i386 the firewall kernel panics at random intervals (sometimes after 9 hours, sometimes 2, sometimes a few minutes).
Fatal double fault:
eip = 0xc0a30252
esp = 0xeba10fc0
ebp = 0xeba11518
cpuid = 0; apic id = 00
panic: double fault
cpuid = 0
KDB: stack backtrace:
db_trace_self_wrapper(c188419d,ff,c1b0c1e0,c1b0c1f0,c796b680,...) at db_trace_self_wrapper+0x2a/frame 0xc1d7e328
kdb_backtrace(c1a5ffcd,0,c1a56ed3,c1d7e3e4,0,...) at kdb_backtrace+0x2d/frame 0xc1d7e390
vpanic(c1a56ed3,c1d7e3e4,c1d7e3e4,c1d7e3ec,c14922b6,...) at vpanic+0x114/frame 0xc1d7e3c4
panic(c1a56ed3,0,0,0,0,...) at panic+0x1b/frame 0xc1d7e3d8
dblfault_handler() at dblfault_handler+0xa6/frame 0xc1d7e3d8
--- trap 0x17, eip = 0xc0a30252, esp = 0xeba10fc0, ebp = 0xeba11518 ---
random_fortuna_pre_read(73a4bcb5,eba11530,46,8c3680,c796b680,...) at random_fortuna_pre_read+0x22/frame 0xeba11518
read_random(eba11680,100,c0defb8f,c1db6600,41474b,...) at read_random+0x26/frame 0xeba11640
arc4rand(eba117d0,2,0,1,c78a4e00,...) at arc4rand+0x74/frame 0xeba11798
ip_fillid(c8b0a810,c8b0a810,14,2,1,...) at ip_fillid+0x103/frame 0xeba117f0
pfsync_sendout(c78a4e84,0,c2172130,683,0,...) at pfsync_sendout+0xbb/frame 0xeba11844
pfsync_insert_state(c92b0340,0,8000,0,10000000,...) at pfsync_insert_state+0x118/frame 0xeba11880
pf_state_insert(c796e000,c922cbc0,c922cbc0,c92b0340,8603,...) at pf_state_insert+0x87d/frame 0xeba118d8
pf_test_rule(1,c796d800,c82ae900,14,eba11c00,...) at pf_test_rule+0x397c/frame 0xeba11bb0
pf_test(1,c79c5400,eba11d04,0,c1dc5954,...) at pf_test+0x855/frame 0xeba11cb8
pf_check_in(0,eba11d04,c79c5400,1,0,...) at pf_check_in+0x29/frame 0xeba11cd8
pfil_run_hooks(c1dc5954,eba11e24,c79c5400,1,0,...) at pfil_run_hooks+0x88/frame 0xeba11d38
enc_hhook(3,2,c79c1b30,eba11e10,0,...) at enc_hhook+0x217/frame 0xeba11d80
hhook_run_hooks(c793ec80,eba11e10,0,c8d0ca40,eba11e78,...) at hhook_run_hooks+0xa1/frame 0xeba11dd8
ipsec_run_hhooks(eba11e10,3,10,1,c2427cac,...) at ipsec_run_hhooks+0x58/frame 0xeba11df0
ipsec4_common_input_cb(c82ae900,c8ff8500,14,9,40,...) at ipsec4_common_input_cb+0x512/frame 0xeba11e78
esp_input_cb(c90babf4,eba12658,c832908a,eba11fb8,c11550f9,...) at esp_input_cb+0x88f/frame 0xeba11f80
crypto_done(c90babf4,c832908a,8,eba12060,eba12070,...) at crypto_done+0x1b9/frame 0xeba11fb8
swcr_process(c75abc80,c90babf4,0,c27be8c0,80,...) at swcr_process+0xd97/frame 0xeba126b8
crypto_invoke(0,c8329092,c8feb038,c,c,...) at crypto_invoke+0x73/frame 0xeba126f0
crypto_dispatch(c90babf4,c18ac6b8,1ad,c8feb038,c20ab55a,...) at crypto_dispatch+0x65/frame 0xeba12718
esp_input(c82ae900,c8ff8500,14,9,d4,...) at esp_input+0x556/frame 0xeba127f8
ipsec_common_input(c82ae900,14,9,2,32,...) at ipsec_common_input+0x6e7/frame 0xeba1288c
esp4_input(eba128f4,eba128f0,32,1,0,...) at esp4_input+0x34/frame 0xeba128a8
ip_input(c82ae900,c0e69bf8,b7debbe1,80015188,5f5e9218,...) at ip_input+0x32b/frame 0xeba12918
netisr_dispatch_src(1,0,c82ae900) at netisr_dispatch_src+0xd0/frame 0xeba12960
netisr_dispatch(1,c82ae900,0,c82ae900,2,...) at netisr_dispatch+0x20/frame 0xeba12974
ether_demux(c7f69400,c82ae900,6,0,7470c88c,...) at ether_demux+0x131/frame 0xeba129a0
ether_nh_input(c82ae900,c0e69bf8,dc675435,80015188,5f5e9218,...) at ether_nh_input+0x383/frame 0xeba129f0
netisr_dispatch_src(5,0,c82ae900) at netisr_dispatch_src+0xd0/frame 0xeba12a38
netisr_dispatch(5,c82ae900,c78a7400,eba12ab4,c0f88053,...) at netisr_dispatch+0x20/frame 0xeba12a4c
ether_input(c7f69400,c82ae900,1,0,10000200,...) at ether_input+0x2a/frame 0xeba12a60
vlan_input(c78a7400,c82ae900,0,c82ae900,2,...) at vlan_input+0x223/frame 0xeba12ab4
ether_demux(c78a7400,c82ae900,6,0,c8684800,...) at ether_demux+0x9a/frame 0xeba12ae0
ether_nh_input(c82ae900,801,eba12b90,eba12b8c,c8632d00,...) at ether_nh_input+0x383/frame 0xeba12b2c
netisr_dispatch_src(5,0,c82ae900) at netisr_dispatch_src+0xd0/frame 0xeba12b74
netisr_dispatch(5,c82ae900,c796b680,eba12bac,c0f79729,...) at netisr_dispatch+0x20/frame 0xeba12b88
ether_input(c7981400,c82ae900,eba12c0c,c0790343,c7981400,...) at ether_input+0x2a/frame 0xeba12b9c
if_input(c7981400,c82ae900,1,0,c827d9c0,...) at if_input+0x19/frame 0xeba12bac
em_rxeof(c7981400,c1d4ef00,c793b5c8,0,c7935680,...) at em_rxeof+0x343/frame 0xeba12c0c
em_msix_rx(c7970900,c0e6523f,c796b680,0,109,...) at em_msix_rx+0x2f/frame 0xeba12c28
intr_event_execute_handlers(109,c793b580,c187b89f,555,aa55aa55,...) at intr_event_execute_handlers+0x299/frame 0xeba12c64
ithread_loop(c7971da0,eba12ce8,aa55aa55,aa55aa55,aa55aa55,...) at ithread_loop+0xc0/frame 0xeba12ca4
fork_exit(c0e084b0,c7971da0,eba12ce8) at fork_exit+0x71/frame 0xeba12cd4
fork_trampoline() at fork_trampoline+0x8/frame 0xeba12cd4
--- trap 0, eip = 0, esp = 0xeba12d20, ebp = 0 ---
KDB: enter: panic
Which version did you come from prior to the update?
It was 17.1.4.
I gave up (having the firewall reboot in the middle of phone calls is a deal breaker!), got a new SSD, and installed 17.1.4. Restored my configuration, reinstalled my plugins, and rebooted.
(http://venus.morante.net/downloads/unibia/screenshots/Screenshot-2017-7-5.png)
I discovered that OPNsense doesn't like multi-boot. I was hoping to have both SSDs installed and be able to boot into different firmware, but even if I boot off the second SSD, the firmware on the first still gets loaded. Very odd.
Interestingly I noticed something on the dashboard that wasn't working in 17.1.8...
(http://venus.morante.net/downloads/unibia/screenshots/Screenshot-2017-7-5-Dashboard-Lobby-stargate.png)
The gateway status panel has content, whereas it previously did not. I do remember this working before doing the update. So perhaps there is indeed something broken in the latest firmware.
Sadly the kernel panics continue. I managed to capture the full output from the console.
http://bin.morante.net/?a7abaf76b27003b4#tzPdX3k8gOD2+SGl4w7uvU2Bxi4/hsYlQcACBAuh1HI=
I'm starting to notice a pattern with the kernel panics. They seem to happen regularly at ~7:30 UTC and ~13:00 UTC.
I'm assuming the lack of additional responses means either this is a known bug, no one knows what is wrong, or no one wants to help?
Really hoping I can make this work.
Hi tuaris,
No responses from me means not enough time for helping out here.
I don't expect the update is the issue. You could easily go back to an older kernel (it crashes there after all):
# opnsense-update -kr 17.1.4
# /usr/local/etc/rc.reboot
If the crashes continue this is due to heavy traffic and / or heat.
Your stack trace is also interesting in that it includes Firewall State Sync, IPsec and VLANs at the same time.
Also, how many services are you running? IPS? Web proxy? How is your RAM usage?
random_fortuna_pre_read() at the top is not a networking subsystem; the box crashes trying to get random bytes for the kernel, for an IP packet it tries to send out.
You could also try to shape your traffic a bit to take the edge off... The Soekris net6501 isn't the fastest hardware around anymore.
Cheers,
Franco
PS: How is 17.1.9 performing?
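The depth of the call chain in the first post's backtrace can be measured rather than eyeballed. The sketch below uses a hypothetical helper, `frame_span` (not part of OPNsense or FreeBSD), that reads KDB backtrace lines on stdin and prints the number of bytes between the highest and lowest `frame 0x...` address. It assumes it is fed only the lines below the `--- trap 0x17 ...` marker, because the frames above that marker belong to the panic handler and sit at unrelated addresses.

```sh
#!/bin/sh
# Hypothetical helper: print the byte distance between the highest and
# lowest "/frame 0x..." address in a KDB backtrace read from stdin.
frame_span() {
    min=""
    max=""
    # sed keeps only the hexadecimal frame address of each backtrace line.
    for addr in $(sed -n 's|.*/frame \(0x[0-9a-f]*\).*|\1|p'); do
        val=$(($addr))
        if [ -z "$min" ] || [ $((val < min)) -eq 1 ]; then min=$val; fi
        if [ -z "$max" ] || [ $((val > max)) -eq 1 ]; then max=$val; fi
    done
    # No frame addresses found: print nothing and fail.
    [ -n "$min" ] || return 1
    echo $((max - min))
}

# Three frames copied from the backtrace in the first post
# (innermost, one from the middle, outermost).
frame_span <<'EOF'
random_fortuna_pre_read(73a4bcb5,eba11530,46,8c3680,c796b680,...) at random_fortuna_pre_read+0x22/frame 0xeba11518
ip_input(c82ae900,c0e69bf8,b7debbe1,80015188,5f5e9218,...) at ip_input+0x32b/frame 0xeba12918
fork_exit(c0e084b0,c7971da0,eba12ce8) at fork_exit+0x71/frame 0xeba12cd4
EOF
```

For these sample frames the script prints 6076, i.e. the interrupt thread had nested roughly 6 KB of frames (driver, VLAN, IPsec, pf, pfsync) by the time it asked for random bytes. Whether that depth is a problem for the kernel stack on this build is not something the thread establishes.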
Thanks, I didn't mean to sound too negative. That last post was made after the box crashed at the worst possible moment :).
I have begun to notice a pattern. Whenever I put stress on it (by means of heavy VPN, VLAN, and sometimes traffic usage) it does seem to trigger the problem. I use several VLANs, a few IPsec tunnels, the PPTP and UPnP plugins, and interface bonding with LAGG. There are several services running behind it using port forwards: VoIP, multiple HTTP services, mail, etc.
I totally understand it's a pretty taxing setup. Interestingly enough, with the exception of interface bonding, the previous device (a net4801) handled the load using m0n0wall (it's currently got an uptime of 780 days!). I also get that the difference between OPNsense and m0n0wall is significant.
I purchased the higher-end net6501-70 expecting that it would be more than capable of handling my needs (50 Mbit/s up/down and 200+ nodes). I will try the packet shaping; I had started on it but found it a little harder to use than what I was used to with m0n0wall.
17.1.9 is performing well. It still panics, but not as often. I even shut off some logging and stats collection and it has improved slightly.
Out of curiosity, how old is your net6501-70?
I bought mine fairly soon after they came out, and it died a few years later.
Was a known issue. Something with heat, iirc.
Mine was bought by franco, and afaik still works? ;-)
It sounds like a heat problem indeed, it's summer-time after all. A fan might already help...
The Soekris from you is still up and running in a remote branch, dutifully pushing IPsec, but not doing any heavy lifting. :)
Cheers,
Franco
I should have mentioned mine was within warranty, so the board got replaced.
Had the newer/bigger heatsink on it.
But I had a -30, and the -70 has a fan on the heatsink, iirc?
Good to hear it's useful :-)
Quote from: weust on July 12, 2017, 01:43:31 PM
Out of curiosity, how old is your net6501-70?
I bought mine fairly soon after they came out, and it died a few years later.
Was a known issue. Something with heat, iirc.
Mine was bought by franco, and afaik still works? ;-)
Mine is no more than a month old. Purchased brand new directly from Soekris EU. I've already contacted them about a possible hardware issue, but they are saying it's software related. I guess the only way to really know for sure is to do some tests.
Quote from: franco on July 12, 2017, 02:45:20 PM
It sounds like a heat problem indeed, it's summer-time after all. A fan might already help...
The Soekris from you is still up and running in a remote branch, dutifully pushing IPsec, but not doing any heavy lifting. :)
Cheers,
Franco
Currently at 67 C.
(http://venus.morante.net/downloads/unibia/screenshots/Screenshot-2017-7-12DashboardLobbystargate%20morante%20com.png)
Ok. Then you have the newer revision.
Temp is fine too. That CPU runs a bit hot, which is normal.
I've replaced the net6501-70 with a PC Engines APU2C0.
(http://venus.morante.net/downloads/unibia/screenshots/Screenshot-2017-7-19DashboardLobby.png)
This looks like it's a slightly more powerful device compared to the net6501. Temperature is again okay.
(http://venus.morante.net/downloads/unibia/screenshots/Screenshot-2017-7-19DashboardLobby2.png)
After about 6 hours, it kernel panicked.
Can we have another backtrace ("bt" at prompt) of the panic just to be sure? It sounds like a programming error indeed then.
We need to see if it has been previously recorded over at https://bugs.freebsd.org/bugzilla/
Cheers,
Franco
http://bin.morante.net/?1faf5a20ecc19403#3u0cf10sgz5Njd/dh5Hyz4m3fNOC36TO7jav85sa1L4=
I upgraded the PC Engines APU2C0 to 17.7 and about 1 hour later there was a kernel panic. Also, the net6501-70 was left powered on but disconnected from the network. The uptime on the net6501-70 is 18 days.
I think the problem may be related to the WAN interface.
I tried to remove the LAGG configuration but every time it gets to restarting the WAN interface, it kernel panics. I successfully repeated this 3 times.
*** stargate.morante.com: OPNsense 17.7 (i386/OpenSSL) ***
LAN (igb0_vlan1) -> v4: 192.168.0.100/24
VMWARE (igb0_vlan3) -> v4: 10.8.8.1/24
WAN (igb1_vlan100) -> v4: X.X.X.X/28
FreeBSD/i386 (stargate.morante.com) (ttyu0)
login: root
Password:
Last login: Sat Aug 5 20:29:43 on ttyu0
----------------------------------------------
| Hello, this is OPNsense 17.7 | @@@@@@@@@@@@@@@
| | @@@@ @@@@
| Website: https://opnsense.org/ | @@@\\\ ///@@@
| Handbook: https://docs.opnsense.org/ | )))))))) ((((((((
| Forums: https://forum.opnsense.org/ | @@@/// \\\@@@
| Lists: https://lists.opnsense.org/ | @@@@ @@@@
| Code: https://github.com/opnsense | @@@@@@@@@@@@@@@
----------------------------------------------
0) Logout 7) Ping host
1) Assign interfaces 8) Shell
2) Set interface IP address 9) pfTop
3) Reset the root password 10) Firewall log
4) Reset to factory defaults 11) Reload all services
5) Power off system 12) Upgrade from console
6) Reboot system 13) Restore a backup
Enter an option: 1
Valid interfaces are:
igb0 00:0d:b9:46:74:2c Intel(R) PRO/1000 Network Connection, Version - 2.5.3-k
igb1 00:0d:b9:46:74:2d Intel(R) PRO/1000 Network Connection, Version - 2.5.3-k
igb0_vlan3 00:0d:b9:46:74:2c
igb1_vlan100 00:0d:b9:46:74:2d
igb0_vlan1 00:0d:b9:46:74:2c
You now have the opportunity to configure VLANs. If you don't require VLANs
for initial connectivity, say no here and use the GUI to configure VLANs later.
Do you want to set up VLANs now? [y/N]: n
VLAN interfaces:
igb0_vlan3 VLAN tag 3, parent interface igb0
igb1_vlan100 VLAN tag 100, parent interface igb1
igb0_vlan1 VLAN tag 1, parent interface igb0
If you do not know the names of your interfaces, you may choose to use
auto-detection. In that case, disconnect all interfaces now before
hitting 'a' to initiate auto detection.
Enter the WAN interface name or 'a' for auto-detection: igb1_vlan100
Enter the LAN interface name or 'a' for auto-detection
NOTE: this enables full Firewalling/NAT mode.
(or nothing if finished): igb0
Optional interface 1 description found: VMWARE
Enter the Optional 1 interface name or 'a' for auto-detection
(or nothing if finished): igb0_vlan3
Enter the Optional 2 interface name or 'a' for auto-detection
(or nothing if finished):
The interfaces will be assigned as follows:
WAN -> igb1_vlan100
LAN -> igb0
OPT1 -> igb0_vlan3
Do you want to proceed? [y/N]: y
Writing configuration...done.
Configuring loopback interface...done.
Creating wireless clone interfaces...done.
Configuring LAGG interfaces...done.
Configuring VLAN interfaces...done.
Configuring LAN interface...done.
Configuring VMWARE interface...done.
Configuring WAN interface...
[unreadable serial console output from here on; the kernel panics at this point]
Wasn't LAGG. Now trying without VLANs (this is a deal breaker, btw).
We suspect a configuration order issue, not a kernel change. Did you send in a crash report?
Can you send/attach a config.xml that would crash with this VLAN + WAN combination?
Thanks,
Franco
Yes, I sent in the crash report.
Here is the original config.xml section with LAGG + VLAN
<interfaces>
<wan>
<enable>1</enable>
<if>lagg1_vlan100</if>
<ipaddr>63.X.X.X</ipaddr>
<ipaddrv6>dhcpv6</ipaddrv6>
<subnet>28</subnet>
<gateway>WANGW</gateway>
<blockpriv>on</blockpriv>
<blockbogons>on</blockbogons>
<media/>
<mediaopt/>
<dhcp6-ia-pd-len>0</dhcp6-ia-pd-len>
<descr>WAN</descr>
</wan>
<lan>
<if>lagg0</if>
<descr>LAN</descr>
<enable>1</enable>
<spoofmac/>
<ipaddr>192.168.0.100</ipaddr>
<subnet>24</subnet>
</lan>
<opt1>
<if>lagg0_vlan3</if>
<descr>VMWARE</descr>
<enable>1</enable>
<spoofmac/>
<ipaddr>10.8.8.1</ipaddr>
<subnet>24</subnet>
</opt1>
<enc0>
<internal_dynamic>1</internal_dynamic>
<enable>1</enable>
<if>enc0</if>
<descr>IPsec</descr>
<type>none</type>
<virtual>1</virtual>
</enc0>
<pptp>
<internal_dynamic>1</internal_dynamic>
<enable>1</enable>
<networks>
<network>192.168.0.192</network>
<mask>28</mask>
</networks>
<virtual>1</virtual>
<if>pptp</if>
<type>group</type>
<descr>pptp</descr>
</pptp>
</interfaces>
...
<gateways>
<gateway_item>
<interface>wan</interface>
<gateway>63.X.X.X</gateway>
<name>WANGW</name>
<weight>1</weight>
<ipprotocol>inet</ipprotocol>
<interval/>
<descr>WAN Gateway</descr>
<avg_delay_samples/>
<avg_loss_samples/>
<avg_loss_delay_samples/>
<monitor_disable>1</monitor_disable>
<defaultgw>1</defaultgw>
</gateway_item>
</gateways>
<laggs>
<lagg>
<members>igb0</members>
<descr>Uplink to Switch</descr>
<laggif>lagg0</laggif>
<proto>lacp</proto>
</lagg>
<lagg>
<members>igb1</members>
<descr>Uplink to Internet</descr>
<laggif>lagg1</laggif>
<proto>lacp</proto>
</lagg>
</laggs>
<vlans>
<vlan>
<if>lagg0</if>
<tag>3</tag>
<pcp>0</pcp>
<descr>VMWare</descr>
<vlanif>lagg0_vlan3</vlanif>
</vlan>
<vlan>
<if>lagg1</if>
<tag>100</tag>
<pcp>0</pcp>
<descr>Internet</descr>
<vlanif>lagg1_vlan100</vlanif>
</vlan>
</vlans>
Here's the version without LAGG. I should point out that when attempting to apply this specific configuration, it would never get to the point of saving the config (kernel panic). I had to manually edit config.xml and do a restore.
<interfaces>
<wan>
<enable>1</enable>
<if>igb1_vlan100</if>
<ipaddr>63.X.X.X</ipaddr>
<ipaddrv6>dhcpv6</ipaddrv6>
<subnet>28</subnet>
<gateway>WANGW</gateway>
<blockpriv>on</blockpriv>
<blockbogons>on</blockbogons>
<media/>
<mediaopt/>
<dhcp6-ia-pd-len>0</dhcp6-ia-pd-len>
<descr>WAN</descr>
</wan>
<lan>
<if>igb0</if>
<descr>LAN</descr>
<enable>1</enable>
<spoofmac/>
<ipaddr>192.168.0.100</ipaddr>
<subnet>24</subnet>
</lan>
<opt1>
<if>igb0_vlan3</if>
<descr>VMWARE</descr>
<enable>1</enable>
<spoofmac/>
<ipaddr>10.8.8.1</ipaddr>
<subnet>24</subnet>
</opt1>
<enc0>
<internal_dynamic>1</internal_dynamic>
<enable>1</enable>
<if>enc0</if>
<descr>IPsec</descr>
<type>none</type>
<virtual>1</virtual>
</enc0>
<pptp>
<internal_dynamic>1</internal_dynamic>
<enable>1</enable>
<networks>
<network>192.168.0.192</network>
<mask>28</mask>
</networks>
<virtual>1</virtual>
<if>pptp</if>
<type>group</type>
<descr>pptp</descr>
</pptp>
</interfaces>
...
<gateways>
<gateway_item>
<interface>wan</interface>
<gateway>63.X.X.X</gateway>
<name>WANGW</name>
<weight>1</weight>
<ipprotocol>inet</ipprotocol>
<interval/>
<descr>WAN Gateway</descr>
<avg_delay_samples/>
<avg_loss_samples/>
<avg_loss_delay_samples/>
<monitor_disable>1</monitor_disable>
<defaultgw>1</defaultgw>
</gateway_item>
</gateways>
<vlans>
<vlan>
<if>igb0</if>
<tag>3</tag>
<pcp>0</pcp>
<descr>VMWare</descr>
<vlanif>igb0_vlan3</vlanif>
</vlan>
<vlan>
<if>igb1</if>
<tag>100</tag>
<pcp>0</pcp>
<descr>Internet</descr>
<vlanif>igb1_vlan100</vlanif>
</vlan>
</vlans>
Finally, here is the current running config without VLAN on the WAN
<interfaces>
<wan>
<enable>1</enable>
<if>igb1</if>
<ipaddr>63.X.X.X</ipaddr>
<ipaddrv6>dhcpv6</ipaddrv6>
<subnet>28</subnet>
<gateway>WANGW</gateway>
<blockpriv>on</blockpriv>
<blockbogons>on</blockbogons>
<media/>
<mediaopt/>
<dhcp6-ia-pd-len>0</dhcp6-ia-pd-len>
<descr>WAN</descr>
</wan>
<lan>
<if>igb0</if>
<descr>LAN</descr>
<enable>1</enable>
<spoofmac/>
<ipaddr>192.168.0.100</ipaddr>
<subnet>24</subnet>
</lan>
<opt1>
<if>igb0_vlan3</if>
<descr>VMWARE</descr>
<enable>1</enable>
<spoofmac/>
<ipaddr>10.8.8.1</ipaddr>
<subnet>24</subnet>
</opt1>
<enc0>
<internal_dynamic>1</internal_dynamic>
<enable>1</enable>
<if>enc0</if>
<descr>IPsec</descr>
<type>none</type>
<virtual>1</virtual>
</enc0>
<pptp>
<internal_dynamic>1</internal_dynamic>
<enable>1</enable>
<networks>
<network>192.168.0.192</network>
<mask>28</mask>
</networks>
<virtual>1</virtual>
<if>pptp</if>
<type>group</type>
<descr>pptp</descr>
</pptp>
</interfaces>
...
<gateways>
<gateway_item>
<interface>wan</interface>
<gateway>63.X.X.X</gateway>
<name>WANGW</name>
<weight>1</weight>
<ipprotocol>inet</ipprotocol>
<interval/>
<descr>WAN Gateway</descr>
<avg_delay_samples/>
<avg_loss_samples/>
<avg_loss_delay_samples/>
<monitor_disable>1</monitor_disable>
<defaultgw>1</defaultgw>
</gateway_item>
</gateways>
<vlans>
<vlan>
<if>igb0</if>
<tag>3</tag>
<pcp>0</pcp>
<descr>VMWare</descr>
<vlanif>igb0_vlan3</vlanif>
</vlan>
</vlans>
So far the router has an uptime of 16:36:47. I have a good feeling that VLAN on the WAN might be the cause.
The uptime is now 1 day 01:36:25. I think that was it. Having a VLAN set up on the WAN interface is causing something to kernel panic.
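The three excerpts above differ mainly in what the WAN `<if>` points at, so a quick check of a given config.xml can confirm which variant is in use. This is a minimal sketch: `wan_if` and `is_vlan_if` are hypothetical helper names, the `_vlan` naming test is inferred from the interface names in this thread, and the `sed` expression assumes one tag per line, as in the excerpts.

```sh
#!/bin/sh
# Hypothetical helpers, not part of OPNsense.

# Print the <if> value found inside the <wan>...</wan> section of a config.xml.
wan_if() {
    sed -n '/<wan>/,/<\/wan>/s|.*<if>\(.*\)</if>.*|\1|p' "$1"
}

# Succeed if the interface name looks like a VLAN child (e.g. igb1_vlan100).
is_vlan_if() {
    case "$1" in
        *_vlan*) return 0 ;;
        *) return 1 ;;
    esac
}

# Demo on a trimmed copy of the second excerpt instead of the live config.
sample=$(mktemp /tmp/wan-check.XXXXXX)
cat > "$sample" <<'EOF'
<interfaces>
<wan>
<enable>1</enable>
<if>igb1_vlan100</if>
</wan>
<lan>
<if>igb0</if>
</lan>
</interfaces>
EOF

ifname=$(wan_if "$sample")
if is_vlan_if "$ifname"; then
    echo "WAN is on VLAN interface $ifname"
else
    echo "WAN is on plain interface $ifname"
fi
rm -f "$sample"
```

On the firewall itself the file to pass would be the live configuration, which OPNsense normally keeps at /conf/config.xml.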
It just had a kernel panic. I sent in the report.
Since the last kernel panic, it's been getting progressively worse. I'm experiencing panics 4+ times daily. Not sure what else to try at this point.
Here is a patch to try for the PPPoE+LAGG+VLAN issue on 17.7:
https://github.com/opnsense/core/commit/065244edf
Apply with
# opnsense-patch 065244edf
We could also try an older kernel, but at this point it seems to be a dormant bug that we simply trigger due to our changes in the interface configuration code...
Cheers,
Franco
Seems like I'm having a similar issue.
Quote from: interkrome on August 25, 2017, 10:35:44 AM
Seems like I'm having a similar issue.
Patch applied and then I rebooted.
opnsense-patch 065244edf
Hmm... Looks like a unified diff to me...
The text leading up to this was:
--------------------------
|From 065244edf60aede23224f73732a8b18e494d46bf Mon Sep 17 00:00:00 2001
|From: Franco Fichtner <franco@opnsense.org>
|Date: Thu, 10 Aug 2017 15:15:42 +0200
|Subject: [PATCH] interfaces: the renaming in one ifconfig may be unstable
|
|(cherry picked from commit a7ca1661302bd200dbbcf8ba700fed36a167ad98)
|(cherry picked from commit 713f8b8d487d965f76b803e14f6a70fe51124f80)
|---
| src/etc/inc/interfaces.lib.inc | 18 +++++++++++++++---
| 1 file changed, 15 insertions(+), 3 deletions(-)
|
|diff --git a/src/etc/inc/interfaces.lib.inc b/src/etc/inc/interfaces.lib.inc
|index 198237abd..1ad59b7fb 100644
|--- a/src/etc/inc/interfaces.lib.inc
|+++ b/src/etc/inc/interfaces.lib.inc
--------------------------
Patching file etc/inc/interfaces.lib.inc using Plan A...
Hunk #1 succeeded at 72.
Hunk #2 succeeded at 97.
done
All patches have been applied successfully. Have a nice day.
root@stargate:~ # reboot
Just had another kernel panic. Looks like the patch didn't work.
The main inconvenience here is how long the device takes to actually shut down the OS and start the reboot. It takes about 5 minutes to do the dump for the stack trace. Any way we can disable that and speed up the process a bit? I'm willing to live with this if the recovery period can be sped up.
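Add this as the file /usr/local/etc/rc.syshook.d/00-reset.early:
#!/bin/sh
# Work from a backup copy so /etc/ddb.conf can be rewritten in place.
cp /etc/ddb.conf /etc/ddb.conf.bak
# Drop the kdb.enter.panic script and reset on entering the debugger instead.
(grep -v kdb.enter.panic /etc/ddb.conf.bak; echo "script kdb.enter.default=reset") > /etc/ddb.conf
rm -f /etc/ddb.conf.bak
# Load the updated scripts into the running kernel.
ddb /etc/ddb.conf
Then make it executable and run it once by hand:
# chmod 700 /usr/local/etc/rc.syshook.d/00-reset.early
# /usr/local/etc/rc.syshook.d/00-reset.early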
It should automatically apply after reboot.
You can emulate a crash with the following command to confirm:
# sysctl debug.kdb.panic=1
Cheers,
Franco
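Before letting the hook rewrite /etc/ddb.conf, the same filter can be tried on a scratch copy. The sketch below is a dry run under assumptions: `rewrite_ddb_conf` is a hypothetical wrapper around the `grep`/`echo` pipeline from the hook above, and the sample file contents are illustrative only, not the real /etc/ddb.conf shipped with OPNsense.

```sh
#!/bin/sh
# Hypothetical dry-run wrapper around the filter used in 00-reset.early:
# read a ddb.conf-style file ($1) and print the rewritten version to stdout.
rewrite_ddb_conf() {
    # Drop every line mentioning kdb.enter.panic, then append the reset script.
    grep -v kdb.enter.panic "$1"
    echo "script kdb.enter.default=reset"
}

# Illustrative input only; the real /etc/ddb.conf will differ.
sample=$(mktemp /tmp/ddb-sample.XXXXXX)
cat > "$sample" <<'EOF'
script lockinfo=show locks
script kdb.enter.panic=bt; call doadump; reset
EOF

rewrite_ddb_conf "$sample"
rm -f "$sample"
```

Only the `kdb.enter.panic` line is dropped, and the reset line is appended at the end. Note that the filter does not remove an existing `kdb.enter.default` line, so each run of the hook appends the reset line again; that should be harmless, but it explains the repeated lines if you read the file later.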