Likely BUG - dnsmasq - Query DNS servers sequentially not working as expected

Started by gspannu, January 01, 2025, 07:54:57 PM

dnsmasq option for "Query DNS servers sequentially" is not working as expected in 25.1.b_20-amd64

A fairly simple setup:

192.168.1.111 is a PiHole machine on the same LAN
8.8.8.8 is the external Google DNS

The two DNS servers are defined in System > Settings > General in this order
- 192.168.1.111
- 8.8.8.8

The underlying idea is that OPNsense should first try and resolve the DNS query using PiHole (192.168.1.111) and ONLY if it fails, should then resolve the query using the next DNS server i.e. Google (8.8.8.8)


Working behaviour:
  • dnsmasq receives the queries from clients.
  • DNS queries are forwarded to 192.168.1.111
  • No queries are forwarded to 8.8.8.8 (as 'Query DNS servers sequentially' is set).
- Verified this with dnsmasq logs. All good.
- Just as information, if the 'Query DNS server sequentially' flag is unset, queries are forwarded to both upstream servers, exactly as expected.
All good so far.

Problematic behaviour:
  • Turn the PiHole machine (192.168.1.111) off or remove network cable (i.e. make PiHole inaccessible)
  • dnsmasq should forward query to 192.168.1.111 (It does, all good)
  • On failing to resolve the query (i.e. timeout), dnsmasq should now forward the query to 8.8.8.8, but it never does.
  • No query is ever sent to 8.8.8.8
- Essentially, all DNS queries from clients now start to fail and dnsmasq never forwards any queries to the next DNS server (8.8.8.8)
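
The expected behaviour described above can be written down as a small model. This is only a sketch of the failover order the option is expected to give, not of how dnsmasq implements it internally; `resolve_sequentially`, `UpstreamTimeout` and `fake_upstream` are hypothetical names made up for the illustration, and the upstreams are simulated rather than queried over the network.

```python
from typing import Callable, List, Optional, Sequence, Tuple


class UpstreamTimeout(Exception):
    """Raised by a query callable when an upstream does not answer in time."""


def resolve_sequentially(
    name: str,
    servers: Sequence[str],
    query: Callable[[str, str], str],
) -> Tuple[Optional[str], List[str]]:
    """Try each upstream in the configured order.

    The next server is only contacted when the previous one timed out.
    Returns the answer (or None if every server timed out) together
    with the list of servers that were actually tried.
    """
    tried: List[str] = []
    for server in servers:
        tried.append(server)
        try:
            return query(server, name), tried
        except UpstreamTimeout:
            continue  # expected failover: move on to the next server
    return None, tried


def fake_upstream(down: Sequence[str]) -> Callable[[str, str], str]:
    """Build a simulated upstream where the servers listed in `down` time out."""
    def query(server: str, name: str) -> str:
        if server in down:
            raise UpstreamTimeout(server)
        return f"{name} answered by {server}"
    return query


# Same order as in System > Settings > General
SERVERS = ["192.168.1.111", "8.8.8.8"]

if __name__ == "__main__":
    # PiHole reachable: only the first server is contacted.
    print(resolve_sequentially("example.com", SERVERS, fake_upstream(down=[])))
    # PiHole unreachable: the query should fall through to 8.8.8.8.
    print(resolve_sequentially("example.com", SERVERS,
                               fake_upstream(down=["192.168.1.111"])))
```

In terms of this model, the problem reported here is that the second case never reaches 8.8.8.8: the list of tried servers stays at 192.168.1.111 only.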

As info, this setup was working fine until 24.7 (from what I recall)

-----------------------------------------

Additional information:

  • Unbound is running on port 53535 (I know not needed, but should not be relevant for the use case)
  • Also using a custom dnsmasq config file (/usr/local/etc/dnsmasq.conf.d/0-myfile.conf).
  • It contains two entries so that PiHole can identify the client correctly.
add-mac
add-subnet=32,128


Hmm, not that I know of. Also wrong in 24.7.x? At first glance -- if this is an actual issue -- I would still consider Dnsmasq as the culprit.

The only recent change in the binary is https://github.com/opnsense/ports/commit/74191b13c03 but this is a fix for dhcp-relay from upstream itself.


Cheers,
Franco

Hi Franco,

This 'bug/issue' has likely been introduced sometime in 24.7.x, as I was running this setup for many months without any issue and the failover principle always worked.

I then switched over to AGH at some point, so cannot pinpoint in which 24.7.x build this may have crept in.

It is definitely not working as expected in 25.1 beta.

Happy to send any logs or any other information required.

As a side question, is there any plan to build DHCP into dnsmasq itself?

@franco:

Are you aware of the default timeout used by dnsmasq (in OPNsense) for its forwarded queries? Or is there any way of finding out?

I think there may be an issue with the default timeout, or with some other part of the code base, that is causing dnsmasq not to use the next available server (if the first fails).
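
Independently of dnsmasq's internal timeout, each upstream can be probed directly to see whether it answers and how long that takes. The following is a minimal sketch using only the Python standard library; `build_query` and `probe` are hypothetical helper names, the 2 second timeout is an arbitrary assumption and not dnsmasq's value, and the script says nothing about what dnsmasq does internally.

```python
import socket
import struct
import time
from typing import Tuple


def build_query(name: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query for an A record (RFC 1035 wire format)."""
    # Header: ID, flags (only RD set = 0x0100), QDCOUNT=1, other counts 0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed with its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN)
    return header + qname + struct.pack("!HH", 1, 1)


def probe(server: str, name: str, timeout: float = 2.0,
          port: int = 53) -> Tuple[bool, float]:
    """Send one UDP query to `server`; return (answered, elapsed_seconds)."""
    packet = build_query(name)
    start = time.monotonic()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(packet, (server, port))
            data, _ = sock.recvfrom(512)
            # Accept the reply only if it echoes our query ID
            answered = data[:2] == packet[:2]
        except OSError:
            # Covers timeouts as well as unreachable host/network errors
            answered = False
    return answered, time.monotonic() - start
```

For example, `probe("192.168.1.111", "example.com")` run from a LAN client with the PiHole unplugged should report no answer, while `probe("8.8.8.8", "example.com")` shows whether the second upstream is reachable at all.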


Quote from: gspannu on January 08, 2025, 09:49:35 PM
Hi Franco,

This 'bug/issue' has likely been introduced sometime in 24.7.x, as I was running this setup for many months without any issue and the failover principle always worked.

I then switched over to AGH at some point, so cannot pinpoint in which 24.7.x build this may have crept in.

It is definitely not working as expected in 25.1 beta.

Happy to send any logs or any other information required.

As a side question, is there any plan to build DHCP into dnsmasq itself?

I agree, I believe it came in with 24.7.12. All was fine until I updated, and then I had a dnsmasq issue of some sort. I also noticed that my advanced settings conf file in the dnsmasq.conf.d folder was wiped out after the update. That file only simplified my client reporting to PiHole, so something else must also have happened, since my DNS wasn't working at all.

Quote from: opensourcefan on January 18, 2025, 06:49:36 AM
I agree, I believe it came in with 24.7.12. All was fine until I updated, and then I had a dnsmasq issue of some sort. I also noticed that my advanced settings conf file in the dnsmasq.conf.d folder was wiped out after the update. That file only simplified my client reporting to PiHole, so something else must also have happened, since my DNS wasn't working at all.

You may be right that something definitely changed around 24.7.x

I also recall that in earlier versions (24.7.?) the checkbox setting 'Query DNS servers sequentially' did not work at all; the only way to make this work was to write 'strict-order' in a custom conf file.
However, the checkbox setting does work now, but dnsmasq does not utilise the next server.

There is definitely something that has happened over the last few updates... Hopefully @Franco/ others will look into these.

Sequential server queries continue to work normally for me under 24.7.12

> I agree, I believe it came in with 24.7.12.

Wow, but how? Did you use opnsense-revert to verify, which should be as easy as claiming this here? These blanket statements are not helping this move along.