24.7.2 IPv6 woes

Started by CruxtheNinth, August 26, 2024, 08:28:06 AM

September 05, 2024, 10:50:42 AM #45 Last Edit: September 05, 2024, 10:56:22 AM by CruxtheNinth
Quote from: franco on September 05, 2024, 08:59:52 AM
good: v20240710, bad v20240820

Bisecting: 7 revisions left to test after this (roughly 3 steps)
[ba9cddfcb6e568a5c83775ed0c917a316acb64db] dhcp6c: fix prototype

# pkg add -f https://pkg.opnsense.org/FreeBSD:14:amd64/snapshots/misc/dhcp6c-20240820_1.pkg

All green so far with this version.
IA_NA directly after reboot, IA_PD a few moments later. RA is also fine; the route arrived a few minutes after reboot.
Will observe a bit longer.

EDIT: logs attached; reboot was 10:46

Quote from: meyergru on September 05, 2024, 10:41:02 AM
@CruxtheNinth: It now seems obvious that DG has different PON setups in different areas. My 3 installations are all in Lower Saxony and are based on Nokia equipment. Just for the fun of it: I have read that DG also uses Genexis ONTs. Is that what you have?

South Hessen here, also on Nokia ONT

Quote from: CruxtheNinth on September 05, 2024, 10:51:16 AM
South Hessen here, also on Nokia ONT

Just to complete this:
Lower Saxony; here we have both types in the same area. It seems to depend on when it was commissioned... Nokia seems to be the newer one.
I am not an expert... just trying to help...

Quote from: CruxtheNinth on September 05, 2024, 10:50:42 AM
All green so far with this version.
IA_NA directly after reboot, IA_PD a few moments later. RA is also fine; the route arrived a few minutes after reboot.
Will observe a bit longer.

EDIT: logs attached; reboot was 10:46

Thanks a lot, here is the next one:

Bisecting: 3 revisions left to test after this (roughly 2 steps)
[6db1c1f5c07c0b53b3ab7671ab52598c4647a54d] common: simplify setloglevel()

# pkg add -f https://pkg.opnsense.org/FreeBSD:14:amd64/snapshots/misc/dhcp6c-20240820_2.pkg

September 05, 2024, 01:03:48 PM #49 Last Edit: September 05, 2024, 01:08:52 PM by CruxtheNinth
Quote from: franco on September 05, 2024, 11:14:03 AM
Thanks a lot, here is the next one:

Bisecting: 3 revisions left to test after this (roughly 2 steps)
[6db1c1f5c07c0b53b3ab7671ab52598c4647a54d] common: simplify setloglevel()

# pkg add -f https://pkg.opnsense.org/FreeBSD:14:amd64/snapshots/misc/dhcp6c-20240820_2.pkg


Reboot 12:50
Same as with _1, but it took approx. 10 minutes until the default route for IPv6 was available; probably just the usual DG RA shenanigans. Everything has been green for a minute now, though.

Bisecting: 1 revision left to test after this (roughly 1 step)
[4bd3f0c78be1683f0a1343af129d829e1a69f8f6] git: 21st century

# pkg add -f https://pkg.opnsense.org/FreeBSD:14:amd64/snapshots/misc/dhcp6c-20240820_3.pkg

PS: as far as spoilers go, this seems to be circling around https://github.com/opnsense/dhcp6c/commit/fe3ed661a5fb9, so it would be neither of the fixes that caused this.

The good news in this case would be that it's something simple to fix, but the bad news is that I don't see what's wrong with these transformations, only that the old code was not even random...

September 06, 2024, 08:38:35 AM #52 Last Edit: September 06, 2024, 09:31:30 AM by CruxtheNinth
Quote from: franco on September 05, 2024, 09:52:37 PM
Bisecting: 1 revision left to test after this (roughly 1 step)
[4bd3f0c78be1683f0a1343af129d829e1a69f8f6] git: 21st century

# pkg add -f https://pkg.opnsense.org/FreeBSD:14:amd64/snapshots/misc/dhcp6c-20240820_3.pkg

Now mayhem started (logs are attached)

Rebooted with _3 approx 08:05
- Directly after reboot IA_NA/IA_PD + IPv4 online, but DNS was failing weirdly
- Restarted unbound, which fixed the issue for v4 resolution
- RA received approx. 10 minutes later and it seemed OK

Next I wanted to replicate the DNS problem and did another reboot with _3.

Reboot approx. 08:23
- Directly after reboot no IPv4, no IA_NA, no IA_PD
- IPv4 took a while, but still no IA_NA or IA_PD
- A few minutes later the RA arrived (default route visible for v6), but still no IA_NA or IA_PD
(https://imgur.com/210CaAe)

Still on _3 now and so far no working IPv6 in sight.

EDIT: 08:42 now, just did a manual restart of igc0/WAN (via the reload button in the UI) and it instantly brought back IA_NA/IA_PD, and it's working now. (https://imgur.com/t1Hi43v)

EDIT: 08:49 it's broken again. IA_NA/IA_PD are gone (added another latest.log zip (5)).

EDIT: 09:31 still broken

Some weird race condition??

Ok, kind of expected that, so last one:

Bisecting: 0 revisions left to test after this (roughly 0 steps)
[fe3ed661a5fb97f9eb9f14d51e7dadcf7a9364bb] dhcp6c: use arc4random_uniform()

# pkg add -f https://pkg.opnsense.org/FreeBSD:14:amd64/snapshots/misc/dhcp6c-20240820_4.pkg

And thanks a lot for your help so far!


Cheers,
Franco

September 07, 2024, 08:32:22 AM #54 Last Edit: September 07, 2024, 08:39:58 AM by CruxtheNinth
Quote from: franco on September 06, 2024, 11:46:37 PM
Ok, kind of expected that, so last one:

Bisecting: 0 revisions left to test after this (roughly 0 steps)
[fe3ed661a5fb97f9eb9f14d51e7dadcf7a9364bb] dhcp6c: use arc4random_uniform()

# pkg add -f https://pkg.opnsense.org/FreeBSD:14:amd64/snapshots/misc/dhcp6c-20240820_4.pkg

And thanks a lot for your help so far!


Cheers,
Franco

you are welcome, happy to do some testing.

Behaviour is similar to _3

First reboot with _4 at 08:08:
- Everything fine directly after reboot, in fact it felt like it was never online this quickly. IA_NA/IA_PD and route there.

However, after a second reboot at 08:15, just for verification, it started to act like _3.
- Link-local fe80 only, no IA_NA or IA_PD, BUT the RA was visible
- Tried to recover with an igc0 reload (log: <27>1 2024-09-07T08:20:26+02:00), which again did not help

It's so odd to see the dhcp6c log actually show that it received both IA_NA and IA_PD, but then nothing happens.

Will do another set of reboots. Logs are attached.

EDIT: 08:39 multiple reboots later, IPv6 is staying down.

September 07, 2024, 08:45:53 AM #55 Last Edit: September 07, 2024, 08:47:27 AM by franco
Ok so that's it as unexpectedly expected...

fe3ed661a5fb97f9eb9f14d51e7dadcf7a9364bb is the first bad commit
commit fe3ed661a5fb97f9eb9f14d51e7dadcf7a9364bb
Author: Franco Fichtner <franco@opnsense.org>
Date:   Wed Aug 14 10:39:41 2024 +0200

    dhcp6c: use arc4random_uniform()
   
    No compatibility shim for now.

common.c | 17 +++--------------
common.h |  1 -
config.c |  2 +-
dhcp6c.c | 10 +---------
4 files changed, 5 insertions(+), 25 deletions(-)

https://github.com/opnsense/dhcp6c/commit/fe3ed661a5fb97f9eb9f14d51e7dadcf7a9364bb

So as a safe point for 24.7.4 we can revert this, but we're only about half-way there then

https://github.com/opnsense/dhcp6c/commit/6cb8c154d6

# pkg add -f https://pkg.opnsense.org/FreeBSD:14:amd64/snapshots/misc/dhcp6c-20240820_5.pkg

I honestly don't see the error in the arc4random-related transformations, which just seem to provide correct randomness as designed now instead of using a pseudo-random number from random(), which, bottom line, wasn't really random to begin with. We'll have to dissect this commit a bit more, but I'm sure we can fix this reliably and securely.

My theory is that one of those timers is now giving "unexpected" values that may be out of spec much more often than before.
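
To make the "out of spec" idea a bit more concrete, here is a minimal sketch of the retransmission timing rules from RFC 3315 section 14 that such timer values could be checked against. It is not taken from dhcp6c: the names rand_in_spec() and next_rt() are made up for illustration, and RAND is passed in as a parameter instead of being drawn at random so the arithmetic is easy to follow.

```c
#include <assert.h>
#include <stdbool.h>

/* RFC 3315 section 14: RAND is chosen uniformly between -0.1 and +0.1. */
bool
rand_in_spec(double rnd)
{
	return rnd >= -0.1 && rnd <= 0.1;
}

/*
 * Next retransmission timeout (RT) in seconds, following RFC 3315
 * section 14. Hypothetical helper for illustration, not dhcp6c code.
 *   rt_prev: previous RT, or 0 for the first transmission
 *   irt:     initial retransmission time
 *   mrt:     maximum retransmission time, 0 means no upper limit
 *   rnd:     the RAND factor used for this step
 */
double
next_rt(double rt_prev, double irt, double mrt, double rnd)
{
	double rt;

	assert(irt > 0 && mrt >= 0);

	if (rt_prev <= 0)
		rt = irt + rnd * irt;              /* first RT */
	else
		rt = 2 * rt_prev + rnd * rt_prev;  /* every later RT */

	if (mrt > 0 && rt > mrt)
		rt = mrt + rnd * mrt;              /* cap around MRT */

	return rt;
}
```

With the Solicit values from RFC 3315 (SOL_TIMEOUT of 1 second, SOL_MAX_RT of 120 seconds), next_rt(0, 1, 120, rnd) stays between 0.9 and 1.1 seconds for any in-spec rnd, and it only exceeds 1 second when rnd is greater than 0, which is what section 17.1.2 asks for on the first Solicit.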


Cheers,
Franco

This one is probably safe as is, not used anyway: https://github.com/opnsense/dhcp6c/commit/4711abcce51

I'll be back later, need to find an airport. ;)


Cheers,
Franco

Nothing really obvious sticks out, but some remote ideas:

1. The srandom() call was omitted. The arc4random documentation from OpenBSD says: "The subsystem is re-seeded from the kernel random(4) subsystem using getentropy(2) on a regular basis, and also upon fork(2)." Maybe one should still call srandom() to initialize arc4random alongside?

2. Also, I do not see how "r = (double)(arc4random_uniform(1000) + 1) / 10000;" could always lead to RAND being "strictly greater than 0 ([RFC3315 17.1.2])" - however, that was not guaranteed to be the case before, either.
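
As a quick sanity check on point 2, the range of that expression can simply be enumerated. The sketch below is only an illustration, not dhcp6c code: first_rand(), first_rand_min() and first_rand_max() are made-up names, and the random draw is replaced by an explicit input, relying on the documented behaviour that arc4random_uniform(1000) returns a value from 0 to 999.

```c
#include <assert.h>
#include <stdint.h>

/* Upper bound handed to arc4random_uniform() in the quoted expression. */
#define RAND_BOUND 1000u

/*
 * Mirror of the quoted expression, with the random draw replaced by an
 * explicit input v so that every possible outcome can be enumerated.
 */
double
first_rand(uint32_t v)
{
	assert(v < RAND_BOUND);
	return (double)(v + 1) / 10000;
}

/* Smallest RAND the expression can produce. */
double
first_rand_min(void)
{
	double min = first_rand(0);

	for (uint32_t v = 1; v < RAND_BOUND; v++) {
		if (first_rand(v) < min)
			min = first_rand(v);
	}
	return min;
}

/* Largest RAND the expression can produce. */
double
first_rand_max(void)
{
	double max = first_rand(0);

	for (uint32_t v = 1; v < RAND_BOUND; v++) {
		if (first_rand(v) > max)
			max = first_rand(v);
	}
	return max;
}
```

Under that assumption the result runs from 0.0001 (for a draw of 0) to 0.1 (for a draw of 999), so the expression on its own looks strictly positive and within the +0.1 bound. That would point the search more towards the other timer paths than towards this line.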

September 07, 2024, 10:19:06 AM #59 Last Edit: September 07, 2024, 10:36:11 AM by CruxtheNinth
_5 seems "stable" so far. In quotation marks because I had the funny situation where I had working IPv6 but no DHCPv4 for about 3 minutes, but it recovered.

Rebooted twice for testing and it still seems fine, but weirdly it all feels like it takes a bit longer to converge compared to 24.7.1. I had a few visual glitches in the UI where interfaces showed the prefix as gone, and after reloading the browser tab it was there again.

I am trying to replicate that behaviour but I can't anymore. Trying not to chase ghosts; will do a few more tests. It may have been a local/PEBCAK glitch.

EDIT: 10:30 another reboot and the problems are back. It took a manual reload of igc0 to get an IA_NA and IA_PD; no RA yet (attached logs for this).

EDIT2: 10:35 RA received but IA_NA and IA_PD are gone :/ - seems it's not stable

EDIT3: All green again. It feels weird.