I've noticed a quirky issue that I'm hopeful someone has seen before or can suggest troubleshooting steps for. I've searched online but can't find any similar situations.
Almost exactly every 48 hours (around 3am every second day), I see a LAN detached event in OPNsense's logs, for example:
2020-12-25T03:00:14 opnsense[939] /usr/local/etc/rc.linkup: DEVD Ethernet detached event for lan
This causes dhcp6c to restart, so it goes through its process of sending a release, soliciting for an IPv6 address/prefix on the WAN, and getting an advertise of WAN GUA and prefix. Sometimes it gets to the point of requesting the address/prefix. It is interrupted though by another detached event, this time on one of the VLAN interfaces (say OPT1) that is on the same interface as the LAN.
The process then repeats, cycling through every VLAN (OPT2, OPT3 etc).
Then there is a series of attached events, again for the LAN interface and every VLAN in sequence. For each attached event, dhcp6c restarts and goes through its process (or part of its process).
This whole process of detached and attached events then repeats itself, sometimes once, sometimes two or more times.
This all lasts maybe 30 to 45 seconds in the logs. Most times it stops after a while and everything seems to return to normal.
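For anyone wanting to check whether their own box shows the same pattern, here is a rough sketch of a filter that counts these events per interface. It assumes the attached events are logged with the same wording as the detached line above (only that one word changes), and the function name summarise_link_events is made up for illustration. It reads log text on stdin, so feed it whatever log file your system writes these lines to.

```shell
#!/bin/sh
# Count DEVD link events per interface from log text on stdin.
# Assumption: "attached" lines use the same wording as the "detached" example.
summarise_link_events() {
    awk '/DEVD Ethernet (detached|attached) event for / {
        # $NF is the interface name (e.g. lan); $(NF-3) is detached or attached.
        count[$NF " " $(NF-3)]++
    }
    END {
        for (key in count) print key, count[key]
    }' | sort
}
```

Piping the relevant log lines into summarise_link_events prints one line per interface and event type with a count, which makes it easy to see whether every VLAN is cycling or just the parent interface.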
But on occasion it causes dhcp6c to fail. A few minutes after the attached/detached events cycle stops, dhcp6c reports an "XID mismatch", and then dhclient goes into a cycle of "Creating resolv.conf" every 15 minutes.
The end result is that the WAN GUA and prefix disappear, and there is no external IPv6 connectivity. IPv4 is unaffected.
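A quick way to spot the failed state without pinging anything is to look for a global address on the WAN. The following is only a sketch: it assumes the WAN normally holds a global unicast address (2000::/3, so starting with 2 or 3) and that ifconfig prints addresses as "inet6 <address> ..." lines. The function name has_global_inet6 is hypothetical.

```shell
#!/bin/sh
# Succeed (exit 0) if ifconfig output on stdin contains a global unicast
# IPv6 address, i.e. one in 2000::/3. Link-local (fe80::) and ULA (fd..)
# addresses do not count.
has_global_inet6() {
    awk '$1 == "inet6" && $2 ~ /^[23][0-9a-fA-F]*:/ { found = 1 }
         END { exit !found }'
}
```

Pipe the ifconfig output for your WAN interface into has_global_inet6 and test the exit status; a non-zero status means no GUA was found.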
Any ideas?
Versions:
OPNsense 20.7.7_1-amd64
FreeBSD 12.1-RELEASE-p11-HBSD
OpenSSL 1.1.1i 8 Dec 2020
Hoping someone has some thoughts on this - happened again this morning on cue and I lost external IPv6 connectivity. The loss of connectivity now seems to happen every 4 or 6 days.
There must be something running to a schedule that causes the LAN detached event to happen on such a specific timetable. But I don't know whether it is OPNsense or something else.
I've now resorted to running a custom cronjob a bit after 3am each day to check for external IPv6 connectivity and if there is none to restart dhcp6c. Bit hacky, but means I don't need to check and restart manually every few days.
I really would like to solve the underlying issue though. Still no-one out there with any thoughts?
Mind sharing the cron job? I too have noticed the IPv6 disconnection issue and have had to manually restart radvd to get external IPv6 working again. I never dug into the logs to see what was happening when it started.
I'll keep an eye for it.
No problem. My script is a simplified version of something marjohn56 posted in the forum for an unrelated but similar issue. I run the cronjob every minute for 5 minutes from 3:03am each day.
Contents of /usr/local/sbin/ping6_check.sh:
#!/bin/sh
# Script to test IPv6 connectivity and restart dhcp6c if necessary.
# Try a few pings to Cloudflare's IPv6 servers.
# ping6 -o exits as soon as a single reply comes back.
# If neither server responds at all then restart dhcp6c.
counting=$(ping6 -o -c 10 2606:4700:4700::1111 | grep 'received' | awk -F',' '{ print $2 }' | awk '{ print $1 }')
# An empty result (ping6 printed no statistics, e.g. no route) counts as zero replies.
if [ "${counting:-0}" -eq 0 ]; then
    counting=$(ping6 -o -c 10 2606:4700:4700::1001 | grep 'received' | awk -F',' '{ print $2 }' | awk '{ print $1 }')
    if [ "${counting:-0}" -eq 0 ]; then
        # Restart dhcp6c
        service dhcp6c restart
    fi
fi
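If you'd rather not repeat the long grep/awk pipeline, the parsing can be pulled into a small helper. This is just a sketch and not part of the script I actually run: the function name replies_received is made up, and it assumes ping6 prints its usual "N packets transmitted, M packets received, ..." summary line.

```shell
#!/bin/sh
# Print the number of replies reported in ping6 statistics read from stdin.
# Prints 0 when there is no statistics line at all (e.g. ping6 failed outright).
replies_received() {
    awk -F', *' '/packets received/ { split($2, part, " "); n = part[1] }
                 END { print n + 0 }'
}
```

The check then becomes something like [ "$(ping6 -o -c 10 2606:4700:4700::1111 2>/dev/null | replies_received)" -eq 0 ], which always compares a number, even when ping6 prints nothing.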
Contents of /usr/local/opnsense/service/conf/actions.d/actions_ping6_check.conf:
[load]
command:/usr/local/sbin/ping6_check.sh
parameters:
type:script
message:starting IPv6 connectivity check
description:Run IPv6 check
After setting these up, as root run the following to get the job to appear in the cronjob list in the GUI:
service configd restart
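Before relying on cron, it should be possible to trigger the new action once by hand to confirm configd picked it up. As far as I understand configd's naming, the file actions_ping6_check.conf with a [load] section is addressed as "ping6_check load"; treat that mapping as my assumption and verify it on your own box. The wrapper function name is made up.

```shell
#!/bin/sh
# Trigger the custom action once by hand via configd.
# Assumption: actions_ping6_check.conf + [load] is addressed as "ping6_check load".
run_ping6_check_action() {
    configctl ping6_check load
}
```

Run run_ping6_check_action as root; note that on a box with no IPv6 connectivity this will restart dhcp6c, since it runs the real script.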
On a hunch yesterday I had an idea that Sensei might be behind this behaviour - for example it might be doing some sort of refresh or update or health check every 2 days. I had Sensei configured on the LAN interface. As a check I have disabled it for the time being and will see whether the behaviour continues.
After my hunch, a search revealed your post about Sensei and IPv6 (https://forum.opnsense.org/index.php?topic=9521.msg55708#msg55708), which shows logs somewhat similar to what I have been seeing. That makes me think my hunch may be right.
Can confirm it is Sensei causing this issue. I was "due" for a LAN detached/attached sequence this morning and it didn't occur with Sensei off.
I will raise this with the Sensei folks.
Hi @Greelan, thanks for the heads-up!
Yes, we confirm that this is due to the Sensei engine restarting at 3am. This was a dirty workaround to avoid some nasty netmap bugs. Basically, a restart of the process also refreshed netmap's internal data structures. When the process exits and starts again, netmap closes and reopens the interface, which causes the interface down/up events.
Since netmap is a lot more stable now, we believe we don't need this anymore. We have removed it in the upcoming 1.7 release.
1.7 is scheduled for tomorrow/this weekend. Stay tuned.
@mb, aah, I see. Thanks for the explanation. "Dirty workaround" indeed! Particularly as the interfaces went down and up multiple times, which I think is what led to dhcp6c borking.
I look forward to the new release, and assuming all is well being able to re-enable the Sensei engine.
Have now installed Sensei 1.7 on OPNsense 20.7.8 and re-enabled it. Will monitor and report back if I see anything negative. Thanks.
@Greelan, thanks, looking forward to it.
Dumb question. Now that Security, App Controls and Web Controls have been merged under Policies, do I need to separately configure the default policy to select the interface and the VLANs on that interface? Previously I'd understood that it was sufficient simply to select the parent protected interface (LAN in my case) under Configuration > General. Currently under Policy Configuration in the default policy the LAN interface is not selected and no VLANs are specified. Thanks
@Greelan, not really. Default policy already matches everything that passes through your protected interfaces which you've set up in Configuration.
Got it, thanks. Presumably then the options in the policy could be used to exclude VLANs if desired, or is that only possible with the premium edition?
Yes, correct. That requires the creation of additional policies which are part of paid subscriptions.
Also, are you saying that VLANs aren't protected unless specifically selected under Configuration? I'd understood previously that only the parent interface needed to be selected in order to protect VLANs, and indeed that it was not desirable to select VLANs specifically as that could cause issues.
You are right. If you select the parent interface, you are also protecting the vlans on it.
Thanks, @mb, appreciate the input
Always a pleasure