radvd stops advertising prefixes after a while

Started by meyergru, July 01, 2023, 12:55:29 PM

I know this was a thing with 22.7 before (there is an archived thread about this here):

On one of my OPNsense boxes, radvd stops advertising the prefix after a while. That is, the RAs lack the prefix information; radvd does not stop working completely. Restarting radvd helps immediately: after the restart, the prefix is announced again. In the old thread, a cron job to restart radvd every hour was suggested, but I thought this problem had been fixed long ago?

There are two other boxes which do not have this behaviour (with the same ISP!) and I do not see any configuration difference. Also, to my knowledge, the prefix also does not even change...
Intel N100, 4 x I226-V, 16 GByte, 256 GByte NVME, ZTE F6005

1100 down / 440 up, Bufferbloat A+

It was a pretty nasty problem but I think it's still fixed. When the prefix is not being advertised, can you check whether it is present in /var/etc/radvd.conf -- if it's not there it's a simpler (but still elusive) timing problem.
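For anyone who wants to script that check, here is a minimal sketch in Python. It only looks for `prefix ... {` stanzas in the text of radvd.conf. The function name and the sample configs (interface `igb1`, documentation prefix `2001:db8:1:2::/64`) are made up for illustration and are not what OPNsense actually generates.

```python
import re
from pathlib import Path

# Matches the start of a prefix stanza such as "prefix 2001:db8:1:2::/64 {".
PREFIX_RE = re.compile(r"^\s*prefix\s+(\S+)\s*\{", re.MULTILINE)


def advertised_prefixes(conf_text: str) -> list[str]:
    """Return every prefix declared in the given radvd.conf text."""
    # Strip "#" comments first so a commented-out stanza is not counted.
    without_comments = re.sub(r"#.*", "", conf_text)
    return PREFIX_RE.findall(without_comments)


# Hypothetical sample configs, for illustration only.
SAMPLE_OK = """
interface igb1 {
    AdvSendAdvert on;
    prefix 2001:db8:1:2::/64 {
        AdvOnLink on;
        AdvAutonomous on;
    };
};
"""

SAMPLE_BROKEN = """
interface igb1 {
    AdvSendAdvert on;
    # prefix 2001:db8:1:2::/64 {
};
"""

if __name__ == "__main__":
    conf = Path("/var/etc/radvd.conf")  # path mentioned in this thread
    if conf.exists():
        found = advertised_prefixes(conf.read_text())
        print("prefixes in config:", found or "none")
```

If the list comes back empty while the RAs lack the prefix, that points at the config generation rather than at radvd itself.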

It's being discussed here: https://github.com/opnsense/core/issues/6637


Cheers,
Franco

> It was a pretty nasty problem but I think it's still fixed.

I don't think so. Pretty sure I just hit on this error today.

Been running on whatever the latest version of 23.1 was in February; fresh install on a Protectli Vault FW6A. Things were working fine for months. I was not using DHCPv6 yet, though I intended to use it later this year since I don't like stateless auto assignment.

Replaced the FW6A with an FW6C about a week ago; fresh install followed by an update to the latest release: 23.1.11-amd64

Things seemed fine for a few days but then I noticed my devices/clients were missing their IPv6 addresses.

Tried all sorts of stuff on the client side, and the clients ranged from Windows PCs to Linux PCs to Android devices to managed switches running whatever embedded OS they use. All had the same problem. So it's not the clients.

Rebooting the router and then rebooting my devices caused ipv6 to reappear (i.e. be assigned) on my client devices.

I then waited a few days for the problem to hit again, which it did. However, instead of a full reboot, the only thing I did today was log into the OPNsense router and restart the radvd service. I didn't restart any of my clients or even bring the network interfaces down and up. The Windows and Linux computers regained their IPv6 addresses.

I went onto the forums and saw that radvd had this problem in the past. Well, it looks like it's back.

This looks like a regression to me, guys. And a significant one at that.

July 01, 2023, 10:13:05 PM #3 Last Edit: July 01, 2023, 10:14:39 PM by franco
I strongly disagree.

1. https://github.com/opnsense/core/issues/4338 -- there has not been a single post since 2021, and people are very vocal about it even after issues are closed.

2. The radvd workaround that I put in place is still there since 2021.

3. https://github.com/opnsense/core/issues/6637 matches in timing and problem scope and it's clearly not #4338. Furthermore, #6637 is NOT significant after having spent an hour on triage -- if you see it you are affected, but jumping to conclusions about scope is futile.

4. Nobody did the debugging so far that I asked for. OP even confirmed two installs work fine. Indicates timing as well.

5. If it were true what you said it's still a FreeBSD kernel bug so good luck over there. https://bugs.freebsd.org


Cheers,
Franco

The /var/etc/radvd.conf does not include the prefix in the erroneous situations, which does seem to indicate that there is a timing issue. The LAN interface does have an IPv6 address, though. After a radvd restart, the prefix appears in both the config file and the router advertisements.
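The mismatch described above (the interface has a global address but the config has no matching prefix) can be tested mechanically. This is a rough sketch, assuming you paste in the interface addresses by hand. The function name `missing_prefixes` and the sample addresses are hypothetical, and it only understands literal `prefix <network> {` stanzas.

```python
import ipaddress
import re

# Matches the start of a prefix stanza such as "prefix 2001:db8:1:2::/64 {".
PREFIX_RE = re.compile(r"^\s*prefix\s+(\S+)\s*\{", re.MULTILINE)


def missing_prefixes(conf_text, iface_addrs):
    """Return networks configured on the interface but absent from radvd.conf.

    iface_addrs is a list of strings like "2001:db8:1:2::1/64".
    Link-local and IPv4 addresses are ignored.
    """
    declared = {ipaddress.ip_network(p) for p in PREFIX_RE.findall(conf_text)}
    missing = []
    for addr in iface_addrs:
        iface = ipaddress.ip_interface(addr)
        if iface.version != 6 or iface.ip.is_link_local:
            continue
        if iface.network not in declared:
            missing.append(iface.network)
    return missing


# Hypothetical example data using the IPv6 documentation prefix.
LAN_ADDRS = ["fe80::1/64", "2001:db8:1:2::1/64"]
CONF_WITHOUT_PREFIX = "interface igb1 {\n    AdvSendAdvert on;\n};\n"
CONF_WITH_PREFIX = (
    "interface igb1 {\n"
    "    AdvSendAdvert on;\n"
    "    prefix 2001:db8:1:2::/64 {\n"
    "        AdvOnLink on;\n"
    "    };\n"
    "};\n"
)

if __name__ == "__main__":
    print(missing_prefixes(CONF_WITHOUT_PREFIX, LAN_ADDRS))
    print(missing_prefixes(CONF_WITH_PREFIX, LAN_ADDRS))
```

A non-empty result while the LAN address is present would match the erroneous state I am seeing.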

I am at a loss as for why one system has this problem and two do not.

This is worse than I thought.

I've been messing around with most of the available settings. Managed. Unmanaged. DHCPv6 assisted. SLAAC only.

Doesn't matter.

Within less than 12 hours my IPv6 addresses disappear, replicated on both Windows and Linux clients. Not even 24 hours and the problem rears its head. Restarting the radvd service restores IPv6 connectivity.

I don't remember my IPv6 addresses disappearing this fast some months ago. I think I would have noticed.

I'll reinstall to an older version of OPNsense when I get some time, just to verify, but this looks like a regression to me.

And it doesn't matter if the root problem is the radvd service or FreeBSD. Bottom line is that IPv6 in this release of OPNsense is broken. IPv6 addresses are being briefly assigned but are somehow getting yanked away/expiring/vanishing into the ether/whatever you want to call it.

Now the other system has started to develop the same problem as well, so it seems to be independent of the configuration.

Trying to find the underlying issue. A test patch and a request to try the development version are open for feedback:

https://github.com/opnsense/core/issues/6637#issuecomment-1637463799


Cheers,
Franco