OPNsense Forum
Archive => 20.7 Legacy Series => Topic started by: direx on September 08, 2020, 07:53:05 am
-
Hi,
I have a problem which was introduced after updating to 20.7:
After roughly two days of uptime on my OPNsense box, IPv6 in my networks stops working. This has nothing to do with a changing prefix (mine changes every 180 days); I figured out that radvd does not announce the IPv6 prefix any more. This means all clients will eventually lose IPv6 connectivity.
Clicking the restart button for "radvd" in the web UI fixes this and clients re-gain their internet connectivity after this. The strange part is that radvd is always running (output before restart):
# ps aux|grep rad
root 42763 0.0 0.1 1061048 3196 - Ss Sun21 0:30.35 /usr/local/sbin/radvd -p /var/run/radvd.pid -C /var/etc/radvd.conf -m syslog
Between radvd restarts the radvd.conf and the output of "netstat -6an" does not change.
This really looks like a bug to me (radvd freezing) but I don't know how I can debug this. Any hints here on how to get to the root cause of the radvd issue? It looks like the "strace" command is not available so I am a little helpless here.
Regards,
direx
-
Yesterday after a cold boot, I didn't notice I had no IPv6 until 90 minutes later. I had to save/apply the unchanged WAN interface and then save/apply the unchanged LAN interface before routing started. It looked like I had IPv6 addresses on hosts, but no connectivity (IPv6 is monitored by a Nagios ping). There are a few more IPv6 threads that may be related (one was solved by moving to the 21.1 development version). In my testing, which may be unrelated to the above problem, it looks like radvd is not responding to host solicitations directly. It does advertise, and I increased the frequency of that using manual settings.
https://forum.opnsense.org/index.php?topic=18868.0
https://forum.opnsense.org/index.php?topic=18549.0
https://forum.opnsense.org/index.php?topic=18591.0
Just an FYI. Oh, and I have not seen the problem you describe where radvd stops altogether. You might want to try manual router adv settings. Something definitely seems wrong as compared to 20.1.x series.
Cheers.
-
I've experienced this issue, too.
-
I can confirm, too. Two OPNsense systems are affected by this issue.
-
I registered on this forum just to say that I've been hit with this problem too.
A restart of the radvd service fixes the problem immediately, but radvd then stops working after 24-48 hours (ipv6 solicitation stops working) until you manually restart the service.
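One way to confirm that advertisements have really stopped, rather than inferring it from client behaviour, is to wait for a Router Advertisement on the LAN interface with tcpdump. The following is a minimal sketch and not something posted in this thread: the function names, the interface name em0 and the 15-minute timeout are assumptions for illustration, and the filter assumes the packet carries no IPv6 extension headers.

```shell
#!/bin/sh
# Sketch: wait for a single Router Advertisement on an interface.

# pcap filter for ICMPv6 type 134 (Router Advertisement). Assumes no IPv6
# extension headers, so the ICMPv6 type is the byte at offset 40.
ra_filter() {
    printf '%s' 'icmp6 and ip6[40] == 134'
}

# Succeeds if one RA is captured on interface $1 within $2 seconds.
# Must run as root; timeout(1) ends the capture if nothing arrives.
wait_for_ra() {
    timeout "$2" tcpdump -n -i "$1" -c 1 "$(ra_filter)" >/dev/null 2>&1
}

# Example (hypothetical interface name):
#   wait_for_ra em0 900 || echo "no RA seen on em0 in 15 minutes"
```

If the call fails while the radvd process is still listed by ps, that would match the symptom described in this thread.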
-
Somehow the kernel hits a limit for multicast join/leave in FreeBSD 12. We haven't had the chance to debug this further and there seem to be no relevant patches in FreeBSD.
Cheers,
Franco
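Since the suspicion is around multicast group membership, it may help to check whether the interface is still a member of the all-routers group (ff02::2) when the advertisements stop. The sketch below is an assumption-laden illustration, not a confirmed diagnosis: it assumes FreeBSD's ifmcstat lists memberships as "group <address>%<interface>" lines, and the function names and the interface name em0 are made up for the example.

```shell
#!/bin/sh
# Sketch: report whether an interface is still a member of the IPv6
# all-routers multicast group (ff02::2).

# Reads ifmcstat-style text on stdin; succeeds if an ff02::2 group line
# is present. Longer addresses starting with ff02::2: do not match.
has_allrouters_group() {
    grep -Eq 'group[[:space:]]+ff02::2(%|[[:space:]]|$)'
}

# Queries the kernel for one interface using FreeBSD's ifmcstat.
check_iface() {
    ifmcstat -i "$1" -f inet6 | has_allrouters_group
}

# Example (hypothetical interface name):
#   check_iface em0 && echo "joined" || echo "not joined"
```

Comparing the result before and after a radvd restart would show whether a lost membership coincides with the failure.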
-
I'm new to OPNsense, but noticed a couple days ago IPv6 addresses were not being given to devices on the LAN side. A restart of radvd got it working again.
Would a potential work-around for this issue be to set up a cron job to pkill and then restart radvd once a day?
-
Better set the cronjob once per hour....
-
Hey folks,
I just found that our office in Frankfurt is suffering the same problem. They have an uplink ISP that does not provide v6 at all, so we deployed an OPNsense (20.7) to route through a WireGuard tunnel to our main office in Karlsruhe.
Works great, if it wasn't for the router advertisements.
Since the OPNsense box in Frankfurt is the only router for v6 in the LAN, we are in control of everything, and Macs (our preferred developer platform) will probably honour almost anything - would it be a viable workaround to switch to DHCPv6 from SLAAC?
From https://github.com/opnsense/core/issues/4338 I get the issue is not quite fixed in the update planned for tomorrow?
Thanks!
Patrick
EDIT: I just found that DHCPv6 does not work without RA ... ok.
-
One question:
Why are we using radvd at all? I assume this is this product?
http://www.litech.org/radvd/
The FreeBSD base system contains rtadvd which is running in production on our site without a single problem ...
Kind regards,
Patrick
-
https://www.freebsd.org/cgi/man.cgi?query=radvd&apropos=0&sektion=0&manpath=FreeBSD+12.1-RELEASE+and+Ports&arch=default&format=html
-
Yes - but that's a port of an external piece of software:
https://www.freshports.org/net/radvd/
rtadvd is in base and basically what we run everywhere if it's plain FreeBSD and not OPNsense:
https://www.freebsd.org/cgi/man.cgi?query=rtadvd&apropos=0&sektion=0&manpath=FreeBSD+12.1-RELEASE&arch=default&format=html
Why the extra package?
-
Can the element in question be rolled back to what was used in 20.1? Add a watchdog to restart it or make restarting radvd an option in the cron task menu?
-
The HardenedBSD base was also upgraded to 12 with the 20.7 release. As discussed on the OPNsense GitHub, this issue might involve more than just radvd, so simply reverting to the old radvd package might not guarantee a solution. I am not sure why radvd is used over the FreeBSD native rtadvd, though.
For what it's worth, I am restarting radvd via cron every 30 minutes as a stopgap for now.
-
I think it's in the kernel since radvd is the same. Rollback is not easily possible in this regard.
As for why radvd and not rtadvd... radvd came before or was more reliable (think over 10 years ago) and nobody did the work to evaluate rtadvd migration since then. Maybe now that 12.1 is out and provides challenges to radvd it's time to do this evaluation, but even then the kernel issue might be affecting rtadvd too.
At this point it is still too early to tell.
Cheers,
Franco
-
@pmhausen - are you running it in place of radvd? If so have you modified dhcpd.inc to create the correct config file for it, or are you just calling it manually?
- Edit -
Hmm, looks the same... I'll try it.
-
We are not running rtadvd on OPNsense. I just happen to work in an environment with about a hundred FreeBSD machines in total and on some of them we run rtadvd - the ones that are routers, of course.
And I was just puzzled OPNsense included a 3rd party package instead of using what is in base. Specifically because rtadvd has been in FreeBSD since 2000 when KAME IPv6 was integrated. Of course it is in base that long, because *some* router advertisement daemon is mandatory for a router ;)
I am currently looking at the source to find what the two daemons might be doing differently, and I am completely flabbergasted when I browse the radvd source. In the original product, still in their git repo, the function
setup_allrouters_membership()
is just a stub with a single "return 0;" statement. The code to actually join the all routers multicast group was added by the port maintainer and just recently improved/fixed by @franco.
https://svnweb.freebsd.org/ports/head/net/radvd/files/patch-device-bsd44.c?view=log
Second, from the control flow the function should be called only once at startup of the daemon when the interface is initialised. So I wonder why the group is joined repeatedly (?) until some kernel bug kicks in. Is that the case or did I completely misunderstand the problem?
The actual code to join the mcast group looks more or less identical for both, rtadvd's is here:
https://svnweb.freebsd.org/base/releng/12.1/usr.sbin/rtadvd/if.c?revision=352546&view=markup
It's essentially
setsockopt(sock, IPPROTO_IPV6, IPV6_JOIN_GROUP, &mreq, sizeof(mreq))
Kind regards,
Patrick
-
Well I have it running on my test unit, no changes at present just a manual stop of radvd and manual start of rtadvd using the same config file etc, it's working. So I'll leave it running and see what happens. I think I'll do the same on my live router as that will get a lot more action. As a side note, radvd did fall over on my live machine this morning, so that proves the issue is still there, but it's very intermittent.
-
Maybe a newbie question, but how would I set up a cronjob to restart radvd hourly from the UI?
I navigated to System/Settings/Cron but it seems to have only a list of predefined commands and doesn't allow custom commands. Or did I miss anything here?
thanks, Till
-
It came as a surprise to me too, that you cannot execute arbitrary commands via Cron, but here you go:
Create /usr/local/opnsense/service/conf/actions.d/actions_radvd.conf with e.g. this content:
[restart]
command:/usr/local/sbin/pluginctl -s radvd restart
type:script
description:Restart radvd
Enter this command:
service configd restart
Voila - new option in the Cron UI.
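The steps above can also be scripted. This is only a sketch: the function name write_radvd_action is made up for illustration, the file content is copied from the post above, and running the action by hand through configctl is an assumption about how configd exposes it.

```shell
#!/bin/sh
# Sketch: write the configd action file described above into a directory.
write_radvd_action() {
    dir="$1"
    cat > "$dir/actions_radvd.conf" <<'EOF'
[restart]
command:/usr/local/sbin/pluginctl -s radvd restart
type:script
description:Restart radvd
EOF
}

# On the firewall itself (as root), the steps would be:
#   write_radvd_action /usr/local/opnsense/service/conf/actions.d
#   service configd restart
#   configctl radvd restart    # assumed way to run the new action once by hand
```

Taking the target directory as a parameter makes it possible to generate the file in a scratch directory and inspect it before installing it.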
-
thank you. Worked like a charm!
-
It's important that it has a description
-
is this still a problem in 20.7.4?
-
I suppose so.
-
Without looking at the logs I can't say for sure, but I'm definitely still having what I assume is this problem. Adding the cron job and we'll see how it goes.
-
Just chiming in. I'm having this issue as well. Found this:
https://forum.netgate.com/topic/142363/ipv6-broken-radvd-can-t-join-ipv6-allrouters-on-interface/137
https://github.com/pfsense/FreeBSD-ports/pull/773
@franco does this help narrow it down?
-
Having the same issue with OPNsense 20.7.7_1-amd64 on APU4 hardware. I have set up the work-around with a daily restart of radvd by cron for now (using the Cron UI as pointed out by pmhausen in Reply #19 above: https://forum.opnsense.org/index.php?topic=19032.msg90983#msg90983 ).
-
For those just joining the party, see https://github.com/opnsense/core/issues/4338#issuecomment-732397405
We have a working fix and pull request. Running opnsense-patch 9a4a908
will replace radvd with rtadvd and seems to rectify the issue for everyone.
-
Thank you for the pointer.
The opnsense-patch 9a4a908 applied cleanly to OPNsense 20.7.7_1-amd64, and rtadvd has been running for 15+ hours since I reloaded the WebUI and restarted the Router Advertisement Daemon manually. I have not rebooted so far, to avoid loss of connectivity and BGP route flaps upstream.
I am now keeping an eye on it as rtadvd approaches the 20-hour mark, around which radvd used to get stuck, start filling the router log with its messages, and require a restart.
-
I've dug through kernel changes for FreeBSD 12 to find something that would indicate why radvd stopped working the way it used to when we were still on 11. Although I'm not sure this isn't the new reality, I can't say that moving from radvd to rtadvd is the obvious solution if we understand that radvd works pretty much how we want it to and all we did was move from FreeBSD 11 to 12.
I'm also trying a new approach for the BSD-based fix that FreeBSD carries exclusively (not part of upstream for diversity reasons most likely) more closely resembling the way that rtadvd handles its multicast group join internally.
Hopefully that will make radvd usable again in OPNsense in 2021 for the affected users.
Cheers,
Franco
-
As I said to you, I was a little concerned when I created the rtadvd patch that there appear to be features in radvd that have no equivalent in rtadvd. However, after a couple of months of people running it, there seems to be nothing that doesn't work the way it should. I've just added a second commit to allow remote log functions to work.
-
I was looking through the old changes and from what I could tell from the comments, the reason pfsense moved from rtadvd to radvd, was they were having problems with CARP and VIPs in ipv6 at the time with rtadvd. Maybe rtadvd has solved the issues from back then.
It is possible that from FreeBSD 11 to 12, something else in the network stack introduced a dependency on rtadvd.
-
I was just having issues in a lab setup with rtadvd and CARP VIPs. I'm not sure it isn't related to KVM and virtualized networking, but it definitely seemed to be a contributing factor.
-
With CARP etc., it's likely that RTADVD needs to be signalled to stop listening on the interfaces and only become active when needed. That shouldn't be too difficult to implement if RTADVD became the default daemon. CARP isn't something I play with, so I've not looked into it.
-
I've been using the cron script to restart radvd and dhcpv6 since 2020. I've been on 22.1 for a few weeks and haven't had the issue since. Yay!