I'm not sure what is causing this; I am wondering if it is just the NICs that are in this device. Please let me know your thoughts. Every couple of days I lose the ability to connect to the LAN IP and the logs fill with the following.
Once this starts happening I have to reboot the machine to get it back.
Jan 2 01:00:49 configd.py: [c7e56997-aa8f-4f94-9c4f-2c070f67ab76] updating dyndns lan
Jan 2 01:00:46 opnsense: /usr/local/etc/rc.linkup: HOTPLUG: Configuring interface lan
Jan 2 01:00:46 opnsense: /usr/local/etc/rc.linkup: DEVD Ethernet attached event for lan
Jan 2 01:00:46 configd.py: [7f6a2423-4cdc-49d5-8181-881fc54f12e0] Linkup starting re0
Jan 2 01:00:46 devd: Executing '/usr/local/opnsense/service/configd_ctl.py interface linkup start re0'
Jan 2 01:00:46 kernel: re0: link state changed to UP
Jan 2 01:00:42 opnsense: /usr/local/etc/rc.linkup: DEVD Ethernet detached event for lan
Jan 2 01:00:42 configd.py: [e29d0834-cbfb-4aff-9c82-5bf2df12a232] Linkup stopping re0
Jan 2 01:00:41 devd: Executing '/usr/local/opnsense/service/configd_ctl.py interface linkup stop re0'
Jan 2 01:00:41 kernel: re0: link state changed to DOWN
Jan 2 01:00:41 kernel: re0: watchdog timeout
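To see how often this happens and on which interface, the watchdog lines can be tallied from a plain-text copy of the log. This is only a minimal sketch: it assumes the log has been exported to a text file in the format shown above, and the function name and the file name in the usage comment are made up for illustration.

```shell
# count_watchdog_timeouts FILE
# Prints "<interface> <count>" for every interface that logged a
# "watchdog timeout" kernel message, sorted by interface name.
count_watchdog_timeouts() {
    # Matches lines such as: "Jan 2 01:00:41 kernel: re0: watchdog timeout"
    awk '/kernel: [a-z]+[0-9]+: watchdog timeout$/ {
             iface = $(NF-2)          # third field from the end, e.g. "re0:"
             sub(/:$/, "", iface)     # drop the trailing colon
             n[iface]++
         }
         END { for (i in n) print i, n[i] }' "$1" | sort
}

# Usage (hypothetical file name):
#   count_watchdog_timeouts exported-system-log.txt
```

A count that keeps growing on re0 but never on re1 would match what others report further down in this thread.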
Same here - afaik a known issue. Disabling all HW acceleration helps extend the time before failure.
Hi guys,
Watchdog timeout points to a hardware lockup. re(4) is not very good in general. Maybe it's better when we switch to FreeBSD 11.0 with 17.1, but in general migrating to better NICs is the best (and, ironically, cheapest) solution in the long run.
It could also be temperature concerns, malfunctioning lines / cables, etc.
The main question, though: what are you using the box for? How much traffic are you pushing? The closer you get to the edge of the specification, the more visible such cases can be.
Cheers,
Franco
Hi franco,
thank you for the reply.
Since the box does not have any expansion capabilities it will be hard to replace the NICs (USB would be possible).
To replicate the failure you just need to push about 30 MB/s (megabytes) and, starting at about 20 GB of overall traffic, one or both NICs fail. dmesg tells you that they eventually come up again, but they do not forward any traffic after the first failure.
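For a rough sense of how long a load test has to run before the failure shows up: at about 30 MB/s, the roughly 20 GB threshold is reached in about 11 minutes. A small sketch of that arithmetic follows. It assumes decimal units (20 GB taken as 20000 MB), both figures are only the approximate numbers from this post, and the function name is made up.

```shell
# seconds_to_transfer TOTAL_MB RATE_MB_PER_S
# Integer division, so the result is rounded down to whole seconds.
seconds_to_transfer() {
    echo $(( $1 / $2 ))
}

secs=$(seconds_to_transfer 20000 30)   # 20 GB taken as 20000 MB
echo "$secs s, about $(( secs / 60 )) min"   # prints "666 s, about 11 min"
```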
Disabling the acceleration helps, even if you do not use the feature. For example:
If I disable VLAN HW acceleration in the options it only disables the HW feature on my LAN side (I only use VLANs on my LAN side - it might be a bug, but I did not test VLANs on the WAN side). If I disable the acceleration manually it takes much more time and traffic before the NICs fail.
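The post does not show how the acceleration was disabled manually. One way to do it on FreeBSD is with the ifconfig capability flags. The sketch below assumes the interfaces are named re0/re1 and that checksum offload, TSO and VLAN hardware tagging are the offloads in question. The helper name and the DRY_RUN switch are made up for illustration, and a change made this way does not survive a reboot.

```shell
# disable_offloads IFACE
# Builds the ifconfig call that turns off common hardware offloads.
# With DRY_RUN=1 (the default here) the command is only printed;
# set DRY_RUN=0 to actually run it as root on the firewall.
disable_offloads() {
    iface=$1
    cmd="ifconfig $iface -rxcsum -txcsum -tso -vlanhwtag"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$cmd"
    else
        $cmd
    fi
}

disable_offloads re0   # prints: ifconfig re0 -rxcsum -txcsum -tso -vlanhwtag
```

Running `ifconfig re0` afterwards should show which options are still enabled on the interface.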
Btw. using Windows as the OS (just a test installation) the NICs do not fail even under much heavier load.
If it helps to grant access to the box for debugging purposes I can add your public key next week (I won't make it earlier).
Cheers
Martin
Hi Martin,
Ok, let me rephrase: re(4) drivers on BSD are difficult. I checked the source code for fixes in newer versions and did not find a single one. This is not going to be fixed. ;(
If Windows works, maybe that's an option for the box, or Linux. But I don't know the state of it; maybe IPFire can do what you expect without issues.
Sorry,
Franco
Same here.
kernel: re0: watchdog timeout endlessly, and only a power cycle helps.
Seems to be trouble with the re(4) driver and/or Zotac hardware, or a combination thereof.
At the same time the system itself is responsive, you can log in from terminal, the error is logged, etc. And it never seems to be re1 for us, no matter which is assigned to LAN or WAN.
The most baffling thing is that it's totally random. We had one crashing regularly at one office. At the same time we have two ci323 working in different locations: one of them has never crashed in two months, and the other one crashed several times in one week and now has not done it in more than a month or so. Does not seem to be related to traffic intensity.
Generally, disabling HW acceleration etc. did not seem to help us. Temperature is also not an issue (cooled room, well below 20C). As the box might work for days, it's pretty hard to diagnose.
After a couple of weeks we decided to replace ci323 with a different mini-pc that has 2 intel NICs and have had no trouble since.
Hi Franco,
windows is not an option ;)
I already ordered this box with Intel NICs: http://www.shuttle.eu/products/slim/ds68u/
Would you be able to debug the problem if i send you the hardware? Or maybe someone else?
Cheers
Martin
Quote from: franco on January 05, 2017, 04:35:30 PM
Ok, let me rephrase: re(4) drivers on BSD are difficult. I checked the source code for fixes in newer versions and did not find a single one. This is not going to be fixed. ;(
Hi Martin,
That's a very kind offer, but I cannot allocate time for this between OPNsense and my day job.
If someone else wants to take you up on this, that would be great. We can also try to reach more people on Twitter with this request if you want. :)
Cheers,
Franco
Hi Franco,
sure!
If that works and we find someone who could take care of the problem and try to fix the bug(s) that would be great.
Cheers
Martin
Hi Franco,
any news on this topic? I sent you a private message with some details.
Cheers
Martin
I have such a CI323 box too. In about six months I never had any problems... except once when I ran an iperf test between two boxes, one on the LAN side, one on the WAN side. I had to power the box off and on again to make it run (didn't have a monitor attached).
If I can perform some tests for you to gather logfiles just ask!
If you just use the box at home it happens only under high load (over 100 Mbit/s).
As soon as I start using iperf or fast downloads the problems appear.
Seems that compiling a different realtek driver might help:
https://forum.pfsense.org/index.php?topic=103841.msg684436#msg684436
Removing re(4) from the kernel is not an easy task as the kernel gets reverted on firmware updates.
Let me look into this and provide an in-kernel driver test based on realtek's version 1.92...
https://github.com/opnsense/src/issues/15
If this works and the world doesn't end we can consider a full switch.
Cheers,
Franco
So, here's your code branch for the original re(4) driver from Realtek, with slight adaptations for FreeBSD 11.0.
https://github.com/opnsense/src/commits/re
It builds fine, but I haven't tested it yet. If anybody wants to try... it will apply on any 17.1 for amd64 including prereleases:
# opnsense-update -kr 17.1-re
# /usr/local/etc/rc.reboot
Caveats: UNTESTED, amd64 only and netmap is not in native mode, only emulated.
Cheers,
Franco
Working kernel module for FreeBSD 10.3 and OPNsense 16.7. Compiled from Realtek source V1.92.
Tested on OPNsense 16.7. It has been running for two days streaming directv now. Installer included.
Working kernel module for FreeBSD 11 and OPNsense 17.1. Compiled from Realtek source V1.92.
Tested on OPNsense 17.1. It was tested for kernel loading and general I/O, not load tested. Installer included.
Question: How does this even compile under FreeBSD 11.0 when taskqueue_enqueue_fast() is not in the OS anymore? I don't see a replacement for taskqueue_enqueue(). ;)
It's also useful to know that the module cannot be loaded without a replacement kernel without re, right?
For FreeBSD 11 I found this patch. I don't remember where, either the pfSense or the FreeBSD forums:
--- if_re.c 2016-07-19 13:50:27.716636000 -0400
+++ if_re.c.Patched 2016-07-19 13:52:06.534495000 -0400
@@ -47,6 +47,8 @@
  * This driver also support Realtek 8139C+, 8110S/SB/SC, RTL8111B/C/CP/D and RTL8101E/8102E/8103E.
  */
+#define M_DONTWAIT M_NOWAIT
+
 #include <sys/param.h>
 #include <sys/systm.h>
 #include <sys/sockio.h>
@@ -57,6 +59,7 @@
 #include <sys/taskqueue.h>
 #include <net/if.h>
+#include <net/if_var.h>
 #include <net/if_arp.h>
 #include <net/ethernet.h>
 #include <net/if_dl.h>
@@ -5529,7 +5532,7 @@
 sc->re_desc.tx_last_index = (sc->re_desc.tx_last_index+1)%RE_TX_BUF_NUM;
 txptr=&sc->re_desc.tx_desc[sc->re_desc.tx_last_index];
- ifp->if_opackets++;
+ if_inc_counter(ifp, IFCOUNTER_OPACKETS, 1);
 ifp->if_drv_flags &= ~IFF_DRV_OACTIVE;
 }
@@ -5672,7 +5675,7 @@
 }
 eh = mtod(m, struct ether_header *);
- ifp->if_ipackets++;
+ if_inc_counter(ifp, IFCOUNTER_IPACKETS, 1);
 #ifdef _DEBUG_
 printf("Rcv Packet, Len=%d \n", m->m_len);
 #endif
@@ -5747,7 +5750,7 @@
 #if OS_VER < VERSION(7,0)
 re_int_task(arg, 0);
 #else
- taskqueue_enqueue_fast(taskqueue_fast, &sc->re_inttask);
+ taskqueue_enqueue(taskqueue_fast, &sc->re_inttask);
 return (FILTER_HANDLED);
 #endif
@@ -5827,7 +5830,7 @@
 #if OS_VER>=VERSION(7,0)
 if (CSR_READ_2(sc, RE_ISR) & RE_INTRS) {
- taskqueue_enqueue_fast(taskqueue_fast, &sc->re_inttask);
+ taskqueue_enqueue(taskqueue_fast, &sc->re_inttask);
 return;
 }
 #endif
As for loading the module without replacing the kernel: it works as long as you set it to load (if_re_load="YES") in loader.conf.local. I verified this by checking dmesg; version 1.92 loaded. You cannot mix kernel modules between OS versions (10.3 and 11). The kernel module must be compiled for that kernel/OS version, but I suspect you already know that.
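For reference, a sketch of adding that loader line without duplicating it on repeated runs. The /boot/loader.conf.local path in the usage comment is an assumption based on the usual FreeBSD location, and the helper name is made up for illustration.

```shell
# ensure_if_re_load FILE
# Appends if_re_load="YES" to FILE unless that exact line is already present.
ensure_if_re_load() {
    conf=$1
    line='if_re_load="YES"'
    # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
    if ! grep -qxF "$line" "$conf" 2>/dev/null; then
        echo "$line" >> "$conf"
    fi
}

# Usage (assumed path, run as root):
#   ensure_if_re_load /boot/loader.conf.local
# After a reboot, the driver messages can be checked with:
#   dmesg | grep re0
```

The module file itself still has to match the running kernel version, as noted above.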
I don't mean to impede any development. If you can get the proper driver built into 17.1, please do. Please, please!
Franco,
I did not apply the patch to the 10.3 module, only to the 11 module. Do you think I should have? Do you have a better patch?
Hey,
Ok, better. :) I posted a similar patch: https://github.com/opnsense/src/commit/fc62dbeab5043
I built a full kernel with the new driver by replacing the FreeBSD one: https://github.com/opnsense/src/commit/9ab694091b
On OPNsense 17.1, this kernel can be installed by just running the command(s):
# opnsense-update -kr 17.1-re
# /usr/local/etc/rc.reboot
Last but not least, this kernel doesn't need a loader.conf fixup.
If it works for all testers we will consider merging it into an OPNsense 17.1.x release.
Cheers,
Franco
Great! Thanks! I will start testing your kernel when 17.1 is stable, I guess on the 31st. My wife will kill me if the router crashes while I am at work. I hope this gets backported to FreeBSD. According to the forums the pfSense and FreeNAS guys are having trouble too.
I have a Qotom Thin Mini PC with Intel Celeron J1900 processor onboard, quad core 2.42 GHz, 4GB RAM, 64GB SSD, dual LAN, dual display, serial port (thank goodness for copy and paste).
I initially installed OPNsense, but it kept failing. Wife and family would contact me, irate as they were unable to go online.
Then I tried Sophos, and that too failed.
Then I tried the one that starts with a P, and had pretty good luck with that. It would run dangerously warm and only locked up a few times.
I was bored this past Tuesday and took it back down and re-installed OPNsense RC1, and have all of my traffic running through the proxy for one machine only. The four cores are running at about 40 C and the sensor on the unit says 27 C.
So, with all of that in mind, is it possible that your unit locked up due to heat issues? Once they get warm, they do some really wonky things.
Quote from: franco on January 26, 2017, 10:31:33 PM
...
# opnsense-update -kr 17.1-re
# /usr/local/etc/rc.reboot
...
Cheers,
Franco
Did the update of the kernel as you described. After rebooting the OPNsense box it was still working :) I can browse the net and write this post!
Now performing some iperf tests between two BananaPis, one at LAN (client), one at WAN (server). Seems like I have a "weak" network cable since it only transfers at around 100 Mbps - but it is still transferring!
How long should it take until it fails with the old driver/kernel?
EDIT: after replacing that 100 Mbps cable, iperf was hitting the OPNsense box for more than 30 minutes and I can still post here without any reboot :)
I like the new driver - thank you franco!
Thanks for the report. :)
I'm still waiting for another user and a test on a hardware that I have here, but it looks good and the change will be queued up for the development track soon. And if that works out ok we may be looking at inclusion in a 17.1.x release in a month or so.
Cheers,
Franco
Quote from: franco on February 07, 2017, 09:06:31 AM
Thanks for the report. :)
I'm still waiting for another user and a test on a hardware that I have here, but it looks good and the change will be queued up for the development track soon. And if that works out ok we may be looking at inclusion in a 17.1.x release in a month or so.
Cheers,
Franco
Hi,
I have a CI323 with 16GB memory and a 128GB SSD.
I just found this post, and I will test today whether it is working.
Currently I'm using a gigabit PPPoE connection and it hangs on upload.
Let's see after the patch.
Just tried the latest kernel on the ci323 and it's working great.
Also note that with powerd enabled in hiadaptive mode it's working OK with the PPPoE connection.
Thanks, I'll upload another kernel for 17.1.1 in a few days, likely changing to the driver by default in 17.1.2 if all goes well.
After about 48 hours still online with the new network driver/kernel...
Rebooted after the "load test" with iperf to reset the interface statistics and had around 2 GB of traffic since then (didn't have the chance to stream a movie this week...).
Looking forward to the next update - are there special steps to do before the next upgrade, or does the "manually changed" kernel stay?
Not yet. The upgrade installs the normal 17.1.1 kernel for safety reasons, so you need to reinstall the -re kernel afterwards.
If IPS tests work well here locally I'm very sure 17.1.2 will switch re(4) permanently.
Cheers,
Franco
Now,
if I used:
# opnsense-update -kr 17.1-re
# /usr/local/etc/rc.reboot
will the next build be updated with the 17.1-re or the 17.1.1 kernel?
I think you did answer this question. :)
I'll provide a kernel for 17.1.1-re tomorrow.
FWIW, if you don't have issues with 17.1, you don't strictly need the 17.1.1 kernel.
Just put 17.1-re on one of my ci323 boxes that seemed to do the watchdog timeout thing on a weekly basis. Let's see if it lasts longer now.
17.1 upgrade lost the console with my VGA monitor though - neither VGA or EFI works. But i'd rather live without a console than without Internet :-)
Quote from: xofer on February 10, 2017, 03:20:45 PM
...
17.1 upgrade lost the console with my VGA monitor though - neither VGA or EFI works. But i'd rather live without a console than without Internet :-)
Take a look at System > Settings > Administration - there is a setting named primary console - you might want to switch to VGA console and reboot.
I don't see kernel-17.1.1-re-amd64.txz. Was this rolled into 17.1.1?
It's not there yet. Sorry.
Quote from: the-mk on February 10, 2017, 03:32:52 PM
Quote from: xofer on February 10, 2017, 03:20:45 PM
...
17.1 upgrade lost the console with my VGA monitor though - neither VGA or EFI works. But i'd rather live without a console than without Internet :-)
take a look at System>Setting>Administration - there is a setting named primary console - you might want to switch to VGA console and reboot
Well, as i said - neither VGA or EFI works - where do you suppose I changed from one to another?
But I don't actually want to bring new topics to this thread. If we can get rid of those pesky re0 watchdog timeouts on Zotac ci323, I will be over the moon and don't care that much about the console. Or I will drag an HDMI monitor to that particular room and try that.
Quote from: xofer on February 10, 2017, 05:41:22 PM
Quote from: the-mk on February 10, 2017, 03:32:52 PM
Quote from: xofer on February 10, 2017, 03:20:45 PM
...
17.1 upgrade lost the console with my VGA monitor though - neither VGA or EFI works. But i'd rather live without a console than without Internet :-)
take a look at System>Setting>Administration - there is a setting named primary console - you might want to switch to VGA console and reboot
Well, as i said - neither VGA or EFI works - where do you suppose I changed from one to another?
But I don't actually want to bring new topics to this thread. If we can get rid of those pesky re0 watchdog timeouts on Zotac ci323, I will be over the moon and don't care that much about the console. Or I will drag an HDMI monitor to that particular room and try that.
It's not just the Zotac; it's one or more Realtek LAN chips on the whole FreeBSD platform.
opnsense-update -kr 17.1.1-re is not working.
Also, I now have the same issue on the ci323 with an re1 watchdog timeout.
I said I did not have the time to do it then. ;)
But now it's up for amd64:
# opnsense-update -kr 17.1.1-re
Cheers,
Franco
Fantastic, I just had time today to upgrade to 17.1.1. The upgrade was successful on my Zotac ci323.
I then ran opnsense-update -kr 17.1.1-re, which also ran successfully for me.
I am now up and running on the new version and will watch closely for any further timeouts.
I just want to say thank you, this is a random issue with this hardware that can be frustrating when it happens. You could go weeks with no issues or you could have it start timing out a few times in a day.
My fingers are crossed that the driver update resolves the issue. I will report back when I know more.
17.1-re has been up for 5 days now without a single watchdog timeout in the logs.
Even the occasional non-fatal timeouts I used to have from time to time are gone now.
I vote for this realtek driver to be included in the next release.
Realtek released the FreeBSD driver version 1.93 with built-in support for FreeBSD 11.0. All the more reason to go forward, I've queued it up for 17.1.2.
Thanks to everyone for the discussion and testing!
Cheers,
Franco
I know this is closed but I just wanted to report back. It has been 8 days for me and I don't have a single timeout in my logs. Thank you!!!
Hi Brady,
Thanks, really happy with the switch so far. :)
Cheers,
Franco