[RECREATED] BUG - dnsmasq - Query DNS servers sequentially not working as expected

Started by gspannu, January 01, 2025, 07:54:57 PM

Update: 2025/Mar/14 16:00 hrs

Please see Post 11 for details to recreate this issue




The dnsmasq option "Query DNS servers sequentially" is not working as expected in 25.1.b_20-amd64.

A fairly simple setup:

192.168.1.111 is a PiHole machine on the same LAN
8.8.8.8 is the external Google DNS

The two DNS servers are defined in System > Settings > General in this order
- 192.168.1.111
- 8.8.8.8

The underlying idea is that OPNsense should first try to resolve each DNS query using PiHole (192.168.1.111) and, ONLY if that fails, resolve the query using the next DNS server, i.e. Google (8.8.8.8).


Working behaviour:
  • dnsmasq receives the queries from clients.
  • DNS queries are forwarded to 192.168.1.111
  • No queries are forwarded to 8.8.8.8 (as 'Query DNS servers sequentially' is set).
- Verified this with dnsmasq logs. All good.
- Just as information, if the 'Query DNS servers sequentially' flag is unset, queries are forwarded to both upstream servers, exactly as expected.
All good so far.

Problematic behaviour:
  • Turn the PiHole machine (192.168.1.111) off or unplug its network cable (i.e. make PiHole inaccessible)
  • dnsmasq should forward query to 192.168.1.111 (It does, all good)
  • On failing to resolve the query (i.e. timeout), dnsmasq should now forward the query to 8.8.8.8, but it never does.
  • No query is ever sent to 8.8.8.8
- Essentially, all DNS queries from clients now start to fail and dnsmasq never forwards any queries to the next DNS server (8.8.8.8)
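
The expected behaviour described above can be sketched in a few lines of Python. This only models the expectation (try each upstream in order, move on when one does not answer); it is not how dnsmasq is implemented. The names `resolve_in_order`, `ask` and `fake_ask` are hypothetical, and the returned address is a documentation-range placeholder.

```python
from __future__ import annotations

from typing import Callable, Optional, Sequence


def resolve_in_order(
    name: str,
    servers: Sequence[str],
    ask: Callable[[str, str], Optional[str]],
) -> tuple[Optional[str], list[str]]:
    """Try each upstream in the listed order and stop at the first answer.

    `ask(server, name)` is a hypothetical callback that returns an answer,
    or None when the server times out or is unreachable.
    Returns (answer, servers_tried).
    """
    tried: list[str] = []
    for server in servers:
        tried.append(server)
        answer = ask(server, name)
        if answer is not None:
            return answer, tried
    # Every server failed: only now should the client see a failure.
    return None, tried


def fake_ask(server: str, name: str) -> Optional[str]:
    # Stand-in for a real DNS query: the PiHole is "down", Google answers.
    return "203.0.113.10" if server == "8.8.8.8" else None


answer, tried = resolve_in_order("bbc.com", ["192.168.1.111", "8.8.8.8"], fake_ask)
print(answer, tried)
```

In the problematic case reported here, the behaviour looks as if the loop stopped after the first server instead of continuing to 8.8.8.8.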

As info, this setup was working fine until 24.7 (from what I recall)

-----------------------------------------

Additional information:

  • Unbound is running on port 53535 (I know not needed, but should not be relevant for the use case)
  • Also using a custom dnsmasq config file (/usr/local/etc/dnsmasq.conf.d/0-myfile.conf).
  • It contains two entries so that PiHole can identify the client correctly.
add-mac
add-subnet=32,128


Hmm, not that I know of. Also wrong in 24.7.x? At first glance -- if this is an actual issue -- I would still consider Dnsmasq as the culprit.

The only recent change in the binary is https://github.com/opnsense/ports/commit/74191b13c03 but this is a fix for dhcp-relay from upstream itself.


Cheers,
Franco

Hi Franco,

This 'bug/issue' has likely been introduced sometime in 24.7.x, as I was running this setup for many months without any issue and the failover principle always worked.

I then switched over to AGH at some point, so I cannot pinpoint in which 24.7.x build this may have crept in.

It is definitely not working as expected in 25.1 beta.

Happy to send any logs or any other information required.

As a side question, is there any plan to build DHCP into dnsmasq itself?

@franco:

Are you aware of the default timeout that dnsmasq (in OPNsense) uses for its forwarded queries, or of any way to find out?

I think there may be an issue with the default timeout or some other code base that is causing dnsmasq not to use the next available server (if the first fails).


Quote from: gspannu on January 08, 2025, 09:49:35 PM
Hi Franco,

This 'bug/issue' has likely been introduced sometime in 24.7.x, as I was running this setup for many months without any issue and the failover principle always worked.

I then switched over to AGH at some point, so I cannot pinpoint in which 24.7.x build this may have crept in.

It is definitely not working as expected in 25.1 beta.

Happy to send any logs or any other information required.

As a side question, is there any plan to build DHCP into dnsmasq itself?

I agree, I believe it came in with 24.7.12. All was fine until I updated then had a dnsmasq issue of some sort. I also noticed that my advanced settings conf file in the dnsmasq.conf.d folder was wiped out after the update. This would have just simplified my client reporting to pihole. Something else must have also happened since my DNS wasn't working at all.

Quote from: opensourcefan on January 18, 2025, 06:49:36 AM
I agree, I believe it came in with 24.7.12. All was fine until I updated then had a dnsmasq issue of some sort. I also noticed that my advanced settings conf file in the dnsmasq.conf.d folder was wiped out after the update. This would have just simplified my client reporting to pihole. Something else must have also happened since my DNS wasn't working at all.

You may be right that something definitely changed around 24.7.x

I also recall that in earlier versions (24.7.?) the checkbox setting 'Query DNS servers sequentially' did not work at all; the only way to make this work was to write 'strict-order' in a custom conf file.
However, the checkbox setting now works, but dnsmasq does not utilise the next server.

There is definitely something that has happened over the last few updates... Hopefully @Franco or others will look into this.

Sequential server queries continue to work normally for me under 24.7.12

> I agree, I believe it came in with 24.7.12.

Wow, but how? Did you use opnsense-revert to verify which should be as easy as claiming this here? These blanket statements are not helping this move along.

Quote from: franco on January 20, 2025, 09:35:14 AM
> I agree, I believe it came in with 24.7.12.

Wow, but how? Did you use opnsense-revert to verify which should be as easy as claiming this here? These blanket statements are not helping this move along.

@franco:

Just an update.

The bug is still present in the recent 25.1.2 update.

Checking the option 'Query DNS servers sequentially' ensures that queries are sent in the order of the specified DNS servers. However, if the first server does not respond, the query just times out and dnsmasq does not forward it to the next defined DNS server.

With ISC DHCP planned for deprecation and dnsmasq due to be updated further with more DHCP options in the 25.7 release, may I request that this bug be looked into? Many thanks.


I am happy to test or provide more information...

@franco

I think I have identified the problem....



dnsmasq in OPNsense does not behave as expected if there is a custom .conf file in the `/usr/local/etc/dnsmasq.conf.d` folder.


A simple test to recreate the bug:

1. Add DNS servers to the OPNsense Settings
System > Settings > General > DNS servers
Server: 192.168.99.99
Server: 192.168.22.22
Server: 8.8.4.4

Ensure that the first 2 servers are dummy (i.e. will not respond to any DNS queries) and the 3rd server is a proper DNS server

2. Set strict-order
Services > dnsmasq > Settings
Query DNS servers sequentially - Checked

3. Restart dnsmasq

4. On any client machine, run some nslookup queries, e.g.
nslookup bbc.com
nslookup google.com


5. After a while, the nslookup queries will be resolved.
dnsmasq will try the 1st server, time out, then try the 2nd server, time out, and then finally resolve on 8.8.4.4
Check the dnsmasq logs to confirm

All working fine as expected until now
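
One possible way to do the log check in step 5 is to count how often each upstream appears in the "forwarded" lines. The sketch below assumes query logging is enabled and that the log contains lines of the form `forwarded <name> to <server>`. The sample lines are invented for illustration (they are not from a real log), and `count_forwards` is a hypothetical helper.

```python
import re
from collections import Counter

# Matches the "forwarded <name> to <server>" part of a dnsmasq query-log line.
FORWARDED = re.compile(r"forwarded (\S+) to (\S+)")


def count_forwards(lines):
    """Return a Counter mapping upstream server -> number of forwarded queries."""
    counts = Counter()
    for line in lines:
        match = FORWARDED.search(line)
        if match:
            counts[match.group(2)] += 1
    return counts


# Illustrative sample only (invented lines, not a real log).
sample = [
    "dnsmasq[1234]: query[A] bbc.com from 192.168.1.50",
    "dnsmasq[1234]: forwarded bbc.com to 192.168.99.99",
    "dnsmasq[1234]: forwarded bbc.com to 192.168.22.22",
    "dnsmasq[1234]: forwarded bbc.com to 8.8.4.4",
    "dnsmasq[1234]: reply bbc.com is 203.0.113.10",
]

for server, n in count_forwards(sample).items():
    print(server, n)
```

If only the first server ever shows up in the counts, dnsmasq is not moving on to the next one in the list.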



Now to recreate the problem

1. Create a custom configuration file in the OPNsense /usr/local/etc/dnsmasq.conf.d/ folder

Create a file e.g. /usr/local/etc/dnsmasq.conf.d/0-custom.conf
Add two simple entries and save the file
add-mac
add-subnet=32,128


2. Restart dnsmasq service

3. Now run the same nslookup test
On any client machine, run nslookup...
nslookup bbc.com
nslookup google.com


Result:
dnsmasq DOES NOT go to the next server in the sequence
The nslookup query will eventually time out and not resolve.
dnsmasq does not work as expected.



4. Now delete the custom config file from /usr/local/etc/dnsmasq.conf.d/ folder

5. Run the same test again.

Result:
dnsmasq works as expected.
dnsmasq will try the first server in sequence, time out, go to the next one, time out, and will then finally resolve on 8.8.4.4



It appears that dnsmasq does not work as expected when there is a custom configuration file.
dnsmasq was working fine in early versions of 24.7.x and this new incorrect behaviour was introduced sometime in 24.7.x

I cannot recall in which exact 24.7.x version this behaviour changed, but dnsmasq used to work fine with custom configurations.

Custom configurations are crucial, as there is no way to send the MAC addresses and IP addresses of requesting clients to upstream DNS servers without the `add-mac` and `add-subnet` directives defined in a custom conf file.

Could I request that this be looked at and addressed please?

> dnsmasq in OPNsense does not behave as expected if there is a custom .conf file in the `/usr/local/etc/dnsmasq.conf.d` folder.

Congratulations, you played yourself?


Cheers,
Franco

Quote from: franco on March 14, 2025, 05:05:19 PM
> dnsmasq in OPNsense does not behave as expected if there is a custom .conf file in the `/usr/local/etc/dnsmasq.conf.d` folder.

Congratulations, you played yourself?


Cheers,
Franco

Hi Franco,

Is there a planned fix for this at some point?

This behaviour was introduced at some point in the 24.7 sub-releases; before that, strict-order worked as expected with custom configurations.

Thanks for your support...

What exactly makes this an OPNsense bug? You should have the same issue on FreeBSD 14.2 or any Linux distro that runs dnsmasq 2.90_5