dhclient adding static routes for learned dns

Started by RedVortex, February 04, 2022, 11:22:12 PM

Hello,

I don't think this was the behaviour before 22.1 but I may be mistaken. Here's my issue.

I have "Allow DNS server list to be overridden by DHCP/PPP on WAN" unchecked (off) in Settings/General.

I have multiple interfaces using DHCP and I use multi-WAN:

- ISP1 IPV4 PPPoE (Original WAN)
- ISP1 IPTV DHCP

- ISP2 IPv4 DHCP
- ISP2 IPv6 DHCPv6

For some reason, the IPv4 DNS servers that are learned from the DHCP interfaces are being added as static routes. Even if I put a static route for the DNS server to another interface, it gets overwritten automatically after some time.

When I check Interfaces/Overview, I see that all the interfaces, except the ISP1 IPv4 (Original WAN), have learned DNS servers in them. They should all be empty since I've unchecked the "Allow DNS server list to be overridden by DHCP/PPP on WAN" setting. Or, if they are learned, they should at least not be acted upon.

Those learned DNS servers are not being pushed to clients, so that part is actually working, but opnsense itself seems to be creating static routes to those DNS server IPs (only for IPv4 DNS servers from what I'm seeing) via the interface it learned them from.

Could it be that only the first, original WAN interface's DNS servers are being ignored? If so, how can I prevent those automatically added static routes, or how can I prevent the other interfaces from learning DNS servers through DHCP?

Bottom line for me, my ISP2 is much slower (40ms vs 1ms) than my ISP1 and since opnsense seems to add static routes to the interface where it learned the DNS servers automatically, all DNS queries are going to ISP2 instead of ISP1 like I would prefer.

I tried setting the DNS servers in General to use the ISP1 gateway. It works for some time, then the ISP2 DHCP renewal happens (I think that's the trigger) and the DNS server routes change back to ISP2. I tried adding a static route to ISP1 for the DNS IPs and again, it works for some time, then automatically goes back to ISP2.

Any suggestion or idea is welcome as it seems I have no control over the behaviour and cannot force it.

Here's what I should normally have, and also what I get if I save settings in General or routes, etc. So I think opnsense does set everything properly by itself (normal.png)

And what I end up having after a few moments when DHCP renews on ISP2 (Starlink) (afterdhcprenewal.png)

Hi RedVortex,

Thanks for the analysis. I did rework this for 22.1 assuming the code above was in charge of creating routes, but it does not appear so. It's a bummer we lose the cache files for nameservers and searchdomain, since data and operation are coupled down there, but for now this should do the trick:

https://github.com/opnsense/core/commit/02dc1ebd93

# opnsense-patch 02dc1ebd93

If you can confirm I'll get this into 22.1.1 and work on the other implications later.


Thanks,
Franco

Quote from: franco on February 05, 2022, 08:21:58 AM
If you can confirm I'll get this into 22.1.1 and work on the other implications later.

Thanks a lot, that indeed works perfectly for me !

As a side note, I still see in Overview the IPv6 DNS servers gathered from DHCPv6 but they don't end up in the route table. Not sure what is different for IPv6? Maybe not the same script or part of the script? That is what we would also like the IPv4 DNS to do, I think (show up in the Overview but not in the routes). Anyway, just to mention that the IPv4 vs IPv6 behaviour seems different for DHCP-learned DNS and there is no issue on the IPv6 side, it works as expected for me. Could it be because I have prefer IPv4 over IPv6 enabled? I doubt it...

And yes, I totally agree that it would be very nice in the Overview to still "see" everything that gets pulled from DHCP even though it is ignored as per my current configs.

If I could push my luck on what should also be in Overview: see any routes also being pushed through DHCP, like Starlink does for instance. Those routes end up in the route table (as they should since we respect what DHCP sends us). But that will be for some other time (DHCP learned routes in Overview), you've successfully resolved the initial issue of this post.

Also, maybe in the future, we could add an option to the DHCP client interface config to actually ignore those routes as well, like for DNS. I see many reasons, including route hijacking, security or traffic management that would justify this. I'll open up a "feature request" or something maybe for this one at some point.

However, even though they are there, they don't seem to work properly for some reason. I'm still trying to figure out why, maybe because of multi-WAN or some NAT issue, but it was already not working pre-22.1 so it's not related to this version. I'll continue to troubleshoot this part, which has been bugging me for months, and I'll let you know if I find something.

For your information, here's an example of what we receive from Starlink DHCP regarding the routes I'm talking about.

14:29:28.321436 IP (tos 0x0, ttl 64, id 0, offset 0, flags [none], proto UDP (17), length 331)
    100.127.255.1.67 > xx.xx.xx.xx.68: [no cksum] BOOTP/DHCP, Reply, length 303, xid 0x29719d56, Flags [none] (0x0000)
  Your-IP xx.xx.xx.xx
  Client-Ethernet-Address xx:xx:xx:xx:xx:xx
  Vendor-rfc1048 Extensions
    Magic Cookie 0x63825363
    DHCP-Message Option 53, length 1: ACK
    Server-ID Option 54, length 4: 100.127.255.1
    Subnet-Mask Option 1, length 4: 255.192.0.0
    Default-Gateway Option 3, length 4: 100.127.255.1
    Domain-Name-Server Option 6, length 8: 8.8.8.8,8.8.4.4
    Lease-Time Option 51, length 4: 300
    Classless-Static-Route Option 121, length 23: (192.168.100.1/32:0.0.0.0),(34.120.255.244/32:0.0.0.0),(default:100.127.255.1)
    END Option 255, length 0


192.168.100.1 is the dish itself, so we can talk to it directly without going through the gateway/Internet, or when the service is down but the dish itself is powered on. I suppose there is a reason for the 34.x.x.x as well, maybe for support or something similar; I never checked.
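For anyone wanting to double-check the capture, here is a minimal Python sketch of how the 23 bytes of option 121 decode under RFC 3442. The function name and the reconstructed payload are my own (rebuilt from the tcpdump line above), not anything taken from opnsense or dhclient. A router of 0.0.0.0 means the destination is on-link, which matches the link#4 entries in the route table.

```python
import ipaddress


def decode_classless_routes(data: bytes):
    """Decode a DHCP option 121 payload (RFC 3442) into (destination, router) pairs.

    Each entry is: 1 byte prefix length, then only the significant octets of
    the destination (ceil(prefix_len / 8) bytes), then a 4-byte router address.
    """
    routes = []
    i = 0
    while i < len(data):
        prefix_len = data[i]
        if prefix_len > 32:
            raise ValueError(f"invalid prefix length {prefix_len}")
        i += 1
        significant = (prefix_len + 7) // 8
        if i + significant + 4 > len(data):
            raise ValueError("truncated option 121 payload")
        # Pad the destination back to 4 octets with zeros.
        dest = ipaddress.IPv4Address(data[i:i + significant] + bytes(4 - significant))
        i += significant
        router = ipaddress.IPv4Address(data[i:i + 4])
        i += 4
        network = ipaddress.IPv4Network((dest, prefix_len), strict=False)
        routes.append((str(network), str(router)))
    return routes


# Payload rebuilt by hand from the capture above (23 bytes in total).
payload = bytes([
    32, 192, 168, 100, 1, 0, 0, 0, 0,    # 192.168.100.1/32 via 0.0.0.0 (on-link)
    32, 34, 120, 255, 244, 0, 0, 0, 0,   # 34.120.255.244/32 via 0.0.0.0 (on-link)
    0, 100, 127, 255, 1,                 # default via 100.127.255.1
])

for destination, router in decode_classless_routes(payload):
    print(destination, "via", router)
```

The byte count works out to 9 + 9 + 5 = 23, which agrees with the "length 23" reported in the capture.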

Here's what ends up in the route table (link#4/igb0 is Starlink's interface)


34.120.255.244     link#4             UHS        igb0
192.168.100.1      link#4             UHS        igb0


Which seems good. Like I said, I'm still troubleshooting what the issue is with those, most likely my configs or something similar. For now, the workaround is to put an IP alias on the Starlink interface (192.168.100.2 for instance), which allows us to talk to the dish directly. It seems to work without doing this on some other routers and firewalls, so there might be a slight difference with opnsense. Maybe I should open up another post and see what the community thinks about this... Anyway, that's another story.

Thanks a lot for the quick fix, I really appreciate it !

February 07, 2022, 11:27:47 AM #4 Last Edit: February 07, 2022, 11:29:27 AM by franco
Quote from: RedVortex on February 05, 2022, 09:24:59 PM
As a side note, I still see in Overview the IPv6 DNS servers gathered from DHCPv6 but they don't end up in the route table. Not sure what is different for IPv6? Maybe not the same script or part of the script? That is what we would also like the IPv4 DNS to do, I think (show up in the Overview but not in the routes). Anyway, just to mention that the IPv4 vs IPv6 behaviour seems different for DHCP-learned DNS and there is no issue on the IPv6 side, it works as expected for me. Could it be because I have prefer IPv4 over IPv6 enabled? I doubt it...

Let's just say the code to this day lacks certain structure. It's amazing to look at the code now and find an ancient bug or implementation shortcomings done just a couple of years ago. As I said I can clean this up for 22.7 now that I know how it should act.

Quote from: RedVortex on February 05, 2022, 09:24:59 PM
And yes, I totally agree that it would be very nice in the Overview to still "see" everything that gets pulled from DHCP even though it is ignored as per my current configs.

Thanks, happy to hear this.

Quote from: RedVortex on February 05, 2022, 09:24:59 PM
If I could push my luck on what should also be in Overview: see any routes also being pushed through DHCP, like Starlink does for instance. Those routes end up in the route table (as they should since we respect what DHCP sends us). But that will be for some other time (DHCP learned routes in Overview), you've successfully resolved the initial issue of this post.

Does not sound like a bad idea. Happy to look at it in exchange for a GitHub issue.

Quote from: RedVortex on February 05, 2022, 09:24:59 PM
Also, maybe in the future, we could add an option to the DHCP client interface config to actually ignore those routes as well, like for DNS. I see many reasons, including route hijacking, security or traffic management that would justify this. I'll open up a "feature request" or something maybe for this one at some point.

Yep. Focus on visibility now, then discuss how something can be achieved with this new runtime data.

Quote from: RedVortex on February 05, 2022, 09:24:59 PM
However, even though they are there, they don't seem to work properly for some reason. I'm still trying to figure out why, maybe because of multi-WAN or some NAT issue, but it was already not working pre-22.1 so it's not related to this version. I'll continue to troubleshoot this part, which has been bugging me for months, and I'll let you know if I find something.

Sure, let me know when you find something. :)


Cheers,
Franco

Need to remove the patch from 22.1.1 for now... differing config.xml content and PATH not knowing about the /usr/local prefix in dhclient-script make this worse than before.


Cheers,
Franco

Hi!  :)

Here is a new patch to test on top of 22.1.1_3:

https://github.com/opnsense/core/commit/3d42186f

# opnsense-patch 3d42186f

(more info in the commit message)


Cheers,
Franco


Hello !

Woah ! That's quite a rework of the whole thing, a new script to handle it and all :D I'll definitely test this tonight and get back to you.

It seems to work for me, however I'm not sure for others.

I enabled "Allow DNS server list to be overridden by DHCP/PPP on WAN" to determine if I would get my ISPs DNS servers instead of the ones I have configured myself in the UI (System/Settings/General).

My multiple providers each push me their own DNS servers either through DHCP or PPPoE. However, I don't see those end up in the firewall. I checked my opnsense's /etc/resolv.conf and I only see the DNS servers I configured myself from the System/Settings/General.

Should I remove the DNS servers I've put there myself for the providers DNS servers to kick in or should it automatically work when I check the "Allow DNS server list to be overridden" ?

I also checked the Interfaces Overview and I do not see the providers DNS appear there but I think they used to show up there when the "Allow DNS server list to be overridden" is checked.

When I revert the patch, everything works as expected: I see the DNS servers appear in the interfaces overview and I also see all the providers' DNS servers appear in opnsense's /etc/resolv.conf.

To be clear, I do not need this functionality myself, I'm just testing for others to make sure they are still able to use their provider's DNS servers but the patch seems to break this functionality.

I can't say this doesn't work from here in both modes, but what I know is that it won't write the nameserver files until you renew the DHCP lease (or reboot). Otherwise there are no nameservers to operate on, put into /etc/resolv.conf, etc.

But it could also be I messed up the backport. Let me take a fresh look today :)


Cheers,
Franco

There is a typo in PPPoE handling, maybe you had this issue. Working on it.


Cheers,
Franco

Another proposal: https://github.com/opnsense/core/commit/be4496f2fe

# opnsense-revert opnsense
# opnsense-patch be4496f2fe


Cheers,
Franco

Hello,

Same results it seems.

Now, hehehe, you may want to get a cup of coffee before digging into this ;-)

I'm trying to properly identify what is happening with and without the patch. This is way more complicated than I first thought, however, and I see many different behaviours depending on how the DNS servers are learned. On initial bootup, on a forced DHCP reload from the interface overview or on an automatic DHCP renewal, they all behave differently. Some of those may be desired or not, I guess... I mean, you may want to have specific routes to the DHCP-learned DNS servers if you have many WANs, since you may want to reach those learned DNS servers only through the provider that gave them to you. But a problem arises when you have manually defined routes, or gateway monitoring using those DNS addresses on the other provider, or when you have configured DNS using a specific gateway, etc. When that happens, who should win? The configuration in opnsense or the DHCP-learned one? I believe it should always be what we manually configure or enforce, not what is dynamically learned. Otherwise, remove your configs and let the dynamic stuff do its thing, right?

Also, take into account that my PPPoE WAN is my primary gateway and the DHCP WAN is my secondary gateway.

Also take into account that I use one of the DHCP learned DNS as the IP address for gateway monitoring and that's how I discovered I had an issue in 22.1 vs 21.x. I think opnsense adds routes to this IP to make sure it uses this specific route/gateway for monitoring marking the gateway as up (which is how it should be). Since this opnsense-added route for monitoring for gateway A was getting overridden by the DHCP-learned DNS IP from gateway B, the gateway A monitoring wasn't working properly anymore. The monitoring for gateway A was going through gateway B (instead of the configured A) after DHCP renewal happened on gateway B because dhcp client triggered the addition of the static route when it learned this DNS IP on gateway B, which happens to be the IP address I use for gateway A monitoring in the gateway A configuration.
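To make the collision easier to spot, here is a small Python sketch that flags a gateway monitor IP which also shows up as a learned DNS server on a different interface. The gateway names, the `pppoe0` interface name and the 203.0.113.53 address are made up for illustration, and this is not how opnsense itself tracks any of this.

```python
def find_monitor_conflicts(monitors, learned_dns):
    """Return (gateway, monitor_ip, other_interface) for each monitor IP that
    another interface has also learned as a DNS server.

    monitors:    gateway name -> (interface, monitor IP)
    learned_dns: interface -> list of DNS server IPs learned via DHCP/PPPoE
    """
    conflicts = []
    for gateway, (iface, monitor_ip) in monitors.items():
        for other_iface, servers in learned_dns.items():
            if other_iface != iface and monitor_ip in servers:
                conflicts.append((gateway, monitor_ip, other_iface))
    return conflicts


# Hypothetical setup: gateway A (PPPoE) is monitored via 8.8.8.8, while the
# DHCP WAN on igb0 learns 8.8.8.8 and 8.8.4.4 as DNS servers.
monitors = {
    "ISP1_PPPOE": ("pppoe0", "8.8.8.8"),
    "ISP2_DHCP": ("igb0", "1.1.1.1"),
}
learned_dns = {
    "pppoe0": ["203.0.113.53"],
    "igb0": ["8.8.8.8", "8.8.4.4"],
}

for gateway, ip, iface in find_monitor_conflicts(monitors, learned_dns):
    print(f"{gateway}: monitor IP {ip} is also a learned DNS server on {iface}")
```

Any hit here is a host route that two parties will fight over on every renewal.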

So... Trying to sum it up as best as I could, here's what I currently see since 22.1...

WITHOUT the patch and "Allow DNS server list to be overridden" enabled:

- I see all learned DNS servers listed in Interfaces/Overview for both DHCP and PPPoE WAN Interfaces [GOOD]
- I see all learned DNS servers listed in /etc/resolv.conf [GOOD]

DHCP WAN

- Routes are added for each DHCP-learned DNS server of a DHCP WAN interface. It "seems" to happen on DHCP renewal only, not after initial bootup. On renewal, the routes being added have the link# (of the interface itself) as a gateway, not the DHCP-learned gateway IP. [BAD: behaviour should be the same for bootup, auto-renewal and forced-renewal. BAD: the routes should not be changed for anything defined manually in the opnsense config]

- When I manually go in interface overview and force DHCP reload, the behaviour seems a bit different than the automatic DHCP renewal behaviour. [BAD: inconsistent behaviour]
-- The DNS routes for any DNS servers defined in System/Settings/General/DNS revert back to what they should be, which is usually the default gateway IP. [GOOD: what is configured in opnsense should take precedence over dynamic]
-- The "link-numbered DNS routes" for any DNS servers not defined in the opnsense config remain or get added. [Could be GOOD, since we want to use the learned DNS in this mode, so DNS servers or routes not configured in opnsense can be added from what is learned dynamically in this case]
-- When the auto-renewal of DHCP re-occurs, the route for the opnsense-defined DNS server gets overridden again with a link# route associated with the DHCP WAN interface, which, for me, becomes a bad route since this is not the gateway I want for this DNS server. [BAD: what is configured in opnsense should take precedence over dynamic]

- DHCP-learned routes (Classless-Static-Route Option 121)
-- Are not added to the routing table on bootup and on automatic renewal [BAD: DHCP-learned routes should always be added to the routing table, unless a manually added static route is already configured or this subnet is already configured on an interface]
-- Are added to the routing table on forced DHCP reload from the interfaces overview. [GOOD: unless a manually added static route is already configured or this subnet is already configured on an interface]

PPPoE WAN

- Routes are added for each PPPoE-learned DNS servers of PPPoE WAN Interfaces on bootup and renewal. Those routes do point to the PPPoE-learned gateway IP, never to the link#. [Could be GOOD, but manually configured routes or DNS should take precedence]

WITHOUT the patch and "Allow DNS server list to be overridden" disabled:

- I see all learned DNS servers listed in Interfaces/Overview only for DHCP WAN Interfaces, missing from PPPoE interfaces [BAD: we want to see them always, on all WAN types, whatever mode we are in]
- No learned DNS servers listed in /etc/resolv.conf  [GOOD]

DHCP WAN

- Same behaviours as above for DNS routes [BAD: no DNS routes should be added or changed in this mode since we should disregard learned DNS. We should still honor DHCP-learned routes, if any though]
- Same behaviours as above for DHCP-learned routes (Classless-Static-Route Option 121) [BAD and GOOD as above]

PPPoE WAN

- NO routes are added for each PPPoE-learned DNS servers of PPPoE WAN Interfaces on bootup and renewal [GOOD in this mode]

WITH the patch and "Allow DNS server list to be overridden" disabled (My ideal situation - Same as right above but with the patch):

- I DO NOT see any learned DNS servers listed in Interfaces/Overview for both DHCP and PPPoE WAN Interfaces [BAD: we want to see them always, on all WAN types, whatever mode we are in]
- No learned DNS servers listed in /etc/resolv.conf  [GOOD]
- No added static routes for DNS on bootup, DHCP or PPPoE automatic-renewal or forced reload [GOOD]

WITH the patch and "Allow DNS server list to be overridden" enabled (me testing this patch for others):

Behaviour is the exact same as above on all 3 points, as if the "Allow DNS server list to be overridden" is completely ignored even though it is properly configured

    <dnsallowoverride>1</dnsallowoverride>
    <dnsallowoverride_exclude/>


The part of the patch that pertains to this seems to be here. It seems to generate an empty list of learned servers, as only the DNS servers configured in the settings end up in the resolv.conf file.

    if (isset($syscfg['dnsallowoverride'])) {
        $search = array_merge($search, get_searchdomains());
-        foreach (get_nameservers() as $nameserver) {
-            $resolvconf .= "nameserver $nameserver\n";
+        foreach (get_nameservers(null, true) as $nameserver) {
+            $resolvconf .= "nameserver {$nameserver['host']}\n";
+            $routes[] = $nameserver;
        }
    }


Unsure if it is related or not ?

Also, to be sure we're testing against the same version of opnsense here's my version: 22.1.1_3

Hello, :)

Quote from: RedVortex on February 26, 2022, 06:22:44 AM
I'm trying to properly identify what is happening with and without the patch. This is way more complicated than I first thought, however, and I see many different behaviours depending on how the DNS servers are learned. On initial bootup, on a forced DHCP reload from the interface overview or on an automatic DHCP renewal, they all behave differently. Some of those may be desired or not, I guess... I mean, you may want to have specific routes to the DHCP-learned DNS servers if you have many WANs, since you may want to reach those learned DNS servers only through the provider that gave them to you. But a problem arises when you have manually defined routes, or gateway monitoring using those DNS addresses on the other provider, or when you have configured DNS using a specific gateway, etc. When that happens, who should win? The configuration in opnsense or the DHCP-learned one? I believe it should always be what we manually configure or enforce, not what is dynamically learned. Otherwise, remove your configs and let the dynamic stuff do its thing, right?

Basically yes. What you describe is part of the patch notes:

This is still subject to a lot of funky races for overlapping host
routes either by ISP, manual DNS, gateway monitors or static routes.


What stands out about this new method is that we now use the same code to set these routes as for the manual routes configured on System: Settings: General, and it still configures those last, which means they do indeed always win. That wasn't the case previously (static assignments were sometimes done earlier than dynamic ones).

So I think currently the last route wins on production 22.1.x, as was the case for years and years on the project and even on other projects before that. Now the code merges the routes advertised by connections (DHCP and PPPoE, both IPv4 and IPv6) and serializes them into a single array to add host routes from. Between different ISPs giving the same nameservers that's still a "last one wins", which can present itself as either "deterministic" (fixed by interface names, configuration setup) or "random" (depending on timing when multiple ISP responses are handled in parallel).
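As a rough illustration of the ordering described here (not the actual opnsense code, and with a hypothetical `pppoe0` interface name), a Python sketch of "last one wins" with the manually configured entries applied last could look like this:

```python
def merge_host_routes(dynamic, manual):
    """Merge host routes so that later entries overwrite earlier ones.

    dynamic: ordered list of (host, via) pairs advertised by connections
    manual:  list of (host, via) pairs configured by hand

    Manual entries are applied last, so they always win over dynamic ones.
    """
    table = {}
    for host, via in list(dynamic) + list(manual):
        table[host] = via
    return table


# Two ISPs advertise the same nameserver; the manual entry decides the outcome.
dynamic = [("8.8.8.8", "pppoe0"), ("8.8.8.8", "igb0"), ("8.8.4.4", "igb0")]
manual = [("8.8.8.8", "pppoe0")]
print(merge_host_routes(dynamic, manual))

# Without a manual entry, the result depends purely on arrival order.
print(merge_host_routes([("8.8.4.4", "pppoe0"), ("8.8.4.4", "igb0")], []))
print(merge_host_routes([("8.8.4.4", "igb0"), ("8.8.4.4", "pppoe0")], []))
```

The last two calls show why overlapping nameservers from different ISPs can look random: the same inputs in a different order give a different route.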

Quote from: RedVortex on February 26, 2022, 06:22:44 AM
Also take into account that I use one of the DHCP learned DNS as the IP address for gateway monitoring and that's how I discovered I had an issue in 22.1 vs 21.x. I think opnsense adds routes to this IP to make sure it uses this specific route/gateway for monitoring marking the gateway as up (which is how it should be). Since this opnsense-added route for monitoring for gateway A was getting overridden by the DHCP-learned DNS IP from gateway B, the gateway A monitoring wasn't working properly anymore. The monitoring for gateway A was going through gateway B (instead of the configured A) after DHCP renewal happened on gateway B because dhcp client triggered the addition of the static route when it learned this DNS IP on gateway B, which happens to be the IP address I use for gateway A monitoring in the gateway A configuration.

Yes, that is basically the problem in a system where 4 different entities can "claim" the same route and you can't find out from the GUI what is going on. The console is slightly better but also not overly helpful WRT what happens. In the patch above there is now a mode for the nameserver.sh script to view the contents, so it's easier to inspect different devices. The whole idea of hinging the functionality on a single script makes it a lot easier to enforce an ordering on the behaviour if we decide to do it. But we are not there just yet.

[...]

Quote from: RedVortex on February 26, 2022, 06:22:44 AM
WITH the patch and "Allow DNS server list to be overridden" enabled (me testing this patch for others):

Behaviour is the exact same as above on all 3 points, as if the "Allow DNS server list to be overridden" is completely ignored even though it is properly configured

The part of the patch that pertains to this seems to be here. It seems to generate an empty list of learned servers, as only the DNS servers configured in the settings end up in the resolv.conf file.

    if (isset($syscfg['dnsallowoverride'])) {
        $search = array_merge($search, get_searchdomains());
-        foreach (get_nameservers() as $nameserver) {
-            $resolvconf .= "nameserver $nameserver\n";
+        foreach (get_nameservers(null, true) as $nameserver) {
+            $resolvconf .= "nameserver {$nameserver['host']}\n";
+            $routes[] = $nameserver;
        }
    }


Unsure if it is related or not ?

Also, to be sure we're testing against the same version of opnsense here's my version: 22.1.1_3

I can't agree on this: "is completely ignored even though it is properly configured". The intention was to not touch the handling of the configuration to avoid side effects, and you can see the handling of the "if (isset(..." didn't change at all. Could be that I screwed up something again, but not sure what it was this time :)


Cheers,
Franco