OPNsense Forum

Archive => 18.1 Legacy Series => Topic started by: taenzerme on March 15, 2018, 09:31:18 am

Title: em0 down for no reason
Post by: taenzerme on March 15, 2018, 09:31:18 am
Hello all,

We recently switched from a virtualized pfSense (Proxmox cluster) to a dedicated Deciso A10 Quad Core SSD running the latest 18.1.

This morning I came to the office and found the firewall in a bad mood: not responding on the console and not responding to any requests on the LAN interface (em0). I did a cold reset and everything came back up again.

Logs show that em0 started acting strange last night:

Code: [Select]
Mar 14 21:06:48 fw kernel: em0: Watchdog timeout Queue[0]-- resetting
Mar 14 21:06:48 fw kernel: Interface is RUNNING and ACTIVE
Mar 14 21:06:48 fw kernel: em0: TX Queue 0 ------
Mar 14 21:06:48 fw kernel: em0: hw tdh = 833, hw tdt = 716
Mar 14 21:06:48 fw kernel: em0: Tx Queue Status = -2147483648
Mar 14 21:06:48 fw kernel: em0: TX descriptors avail = 117
Mar 14 21:06:48 fw kernel: em0: Tx Descriptors avail failure = 0
Mar 14 21:06:48 fw kernel: em0: RX Queue 0 ------
Mar 14 21:06:48 fw kernel: em0: hw rdh = 79, hw rdt = 78
Mar 14 21:06:48 fw kernel: em0: RX discarded packets = 0
Mar 14 21:06:48 fw kernel: em0: RX Next to Check = 79
Mar 14 21:06:48 fw kernel: em0: RX Next to Refresh = 78
Mar 14 21:06:48 fw kernel: em0: link state changed to DOWN
Mar 14 21:06:48 fw configd.py: [79564c8a-0f17-4693-b41c-5da048d18dbb] Linkup stopping em0
Mar 14 21:06:49 fw opnsense: /usr/local/etc/rc.linkup: DEVD Ethernet detached event for lan
Mar 14 21:06:52 fw kernel: em0: link state changed to UP
Mar 14 21:06:52 fw configd.py: [10082d83-9b6d-4776-97cb-aa7dd8f286f6] Linkup starting em0
Mar 14 21:06:53 fw opnsense: /usr/local/etc/rc.linkup: DEVD Ethernet attached event for lan
Mar 14 21:06:53 fw opnsense: /usr/local/etc/rc.linkup: HOTPLUG: Configuring interface lan
Mar 14 21:06:53 fw opnsense: /usr/local/etc/rc.linkup: ROUTING: entering configure using 'lan'
Mar 14 21:06:53 fw opnsense: /usr/local/etc/rc.linkup: ROUTING: removing /tmp/em1_defaultgw
Mar 14 21:06:53 fw opnsense: /usr/local/etc/rc.linkup: ROUTING: removing /tmp/em1_defaultgwv6
Mar 14 21:06:53 fw opnsense: /usr/local/etc/rc.linkup: ROUTING: creating /tmp/em1_defaultgw
Mar 14 21:06:53 fw opnsense: /usr/local/etc/rc.linkup: ROUTING: no IPv6 default gateway set, trying wan on 'em1' ()
Mar 14 21:06:53 fw opnsense: /usr/local/etc/rc.linkup: ROUTING: skipping IPv4 default route to wan
Mar 14 21:06:53 fw opnsense: /usr/local/etc/rc.linkup: ROUTING: skipping IPv4 default route to wan

This happens about every 50 minutes from then on.
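
To check the roughly 50-minute spacing rather than eyeball it, the watchdog lines can be pulled out of the log and the gaps between them computed. Below is a minimal Python sketch, assuming the syslog line format shown above. The function names, the assumed year (syslog timestamps carry none), and the second watchdog line in the sample are illustrative only and are not taken from the actual log.

```python
import re
from datetime import datetime, timedelta

# Matches kernel lines such as:
#   Mar 14 21:06:48 fw kernel: em0: Watchdog timeout Queue[0]-- resetting
WATCHDOG = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ kernel: "
    r"(?P<ifname>\w+): Watchdog timeout"
)


def watchdog_times(lines, ifname="em0", year=2018):
    """Return timestamps of watchdog timeouts for one interface.

    Syslog timestamps have no year, so one is assumed (2018 here).
    """
    times = []
    for line in lines:
        m = WATCHDOG.match(line)
        if m and m.group("ifname") == ifname:
            stamp = f"{year} {m.group('ts')}"
            times.append(datetime.strptime(stamp, "%Y %b %d %H:%M:%S"))
    return times


def intervals(times):
    """Return the gaps between consecutive timestamps."""
    return [later - earlier for earlier, later in zip(times, times[1:])]


# The first two lines are from the log above; the last line is a
# hypothetical later occurrence, added only to demonstrate the gap.
SAMPLE = [
    "Mar 14 21:06:48 fw kernel: em0: Watchdog timeout Queue[0]-- resetting",
    "Mar 14 21:06:48 fw kernel: em0: link state changed to DOWN",
    "Mar 14 21:56:48 fw kernel: em0: Watchdog timeout Queue[0]-- resetting",
]

for gap in intervals(watchdog_times(SAMPLE)):
    print(gap)
```

Feeding the real system log lines into watchdog_times() instead of SAMPLE would show whether the resets are evenly spaced or cluster around the backup window.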

I checked the main switch connected to em0/lan and found nothing special.

Should I be looking for a hardware fault on either the switch or the firewall or is this something else?

Thanks for your help!

Sebastian
Title: Re: em0 down for no reason
Post by: tmp on March 15, 2018, 09:45:25 am
We seem to have the same problem: https://forum.opnsense.org/index.php?topic=7580.0
Title: Re: em0 down for no reason
Post by: taenzerme on March 15, 2018, 10:02:58 am
Actually, yes, this looks exactly the same.

I noticed the problem started around the time we run external backups of our storage to an off-site cloud. This leads to _a lot_ of traffic on WAN and LAN around that time.

Our firewall - as mentioned - is a fresh setup, no migration from pfSense directly, and is not virtualized, of course.

Murphy, I guess. I got a physical firewall to separate WAN connection, DNS and DHCP from our Proxmox cluster for more failure safety and boom, 2 days after the switch this happens :-)
Title: Re: em0 down for no reason
Post by: tmp on March 16, 2018, 07:57:54 am
The problem also occurs at our site under high load.
In the meantime I've found the error message documented in the em(4) driver man page in FreeBSD:

Code: [Select]
     em%d: watchdog timeout -- resetting  The device has stopped responding to
     the network, or there is a problem with the network connection (cable).
https://www.freebsd.org/cgi/man.cgi?em(4)

I've also replaced the cable with a new one, but as expected that didn't solve the problem.

Title: Re: em0 down for no reason
Post by: opnfwb on March 17, 2018, 01:01:30 pm
I have posted a reply in the second thread with suggestions for some performance tweaks/settings for the em driver. Let us know if this helps resolve the issue.

https://forum.opnsense.org/index.php?topic=7580.msg34822#msg34822