[SOLVED] 16.7.10: DNS Forwarder host overrides have stopped working

Started by danuary, December 05, 2016, 03:26:27 PM

Hi,

As of this morning, dnsmasq host overrides return NXDOMAIN when looked up. All other DNS lookups (those forwarded to an external DNS server) work fine. I can't see anything related in the logs: resolver.log hasn't been updated in quite some time and only shows reloads.

Two things happened yesterday:
- I updated the box. I don't see how to tell what was updated, but dnsmasq is version 2.76,1 if it matters.
- I had a DHCP client register a hostname with a space (" "). Since removed.

Any ideas?

Hi,

We had one report with bad permissions on the new file /var/etc/dnsmasq-hosts, can you provide output for the following command?

# ls -lah /var/etc/dnsmasq-hosts

600 is the wrong mode; 644 is correct.
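The same check can be scripted. A minimal sketch in POSIX sh; the function name check_hosts_mode is hypothetical, and it only compares the file's mode against 0644:

```sh
#!/bin/sh
# check_hosts_mode [FILE]: report whether FILE has exactly mode 0644.
# Defaults to the dnsmasq overrides file discussed in this thread.
check_hosts_mode() {
    f=${1:-/var/etc/dnsmasq-hosts}
    if [ ! -e "$f" ]; then
        echo "missing: $f"
        return 2
    fi
    # find -perm with a plain octal mode matches only that exact mode.
    if [ -n "$(find "$f" -prune -perm 0644)" ]; then
        echo "ok: $f is 0644"
        return 0
    fi
    echo "wrong mode: $f"
    ls -l "$f"
    return 1
}
```

From the console, after defining it: # check_hosts_mode /var/etc/dnsmasq-hosts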


Cheers,
Franco

Hi Franco,
Because I needed to get things up and running this morning, I reinstalled the router and restored a recent backup, which did resolve the issue. I did take a clone of it, though, and upgraded the clone to current as of this morning to experiment. I'm almost certain dnsmasq-hosts was 600, and that would indeed have been the source of the issue, but I can't reproduce it. After the update the file is root:wheel and 644. I'll keep an eye on it in case it changes permissions for some reason. Thanks very much for your help.

Hi danuary,

Yes, it is some sort of issue with the upgrade having a slightly different environment during its execution, but we've not gotten to the bottom of it because it's difficult to catch.

So far this is the second case we've had recorded. I'll keep an eye out for this.


Cheers,
Franco

I ran into the same problem too, but my /var/etc/dnsmasq-hosts was mode 640. Once I ran chmod 644 and restarted the service (Services: DNS Forwarder), it started working again.

In my opinion, the solution is two parts:

1. do a chmod after you finish the update to make sure the file is the proper mode. 

2. Log an error message when you can't read the file.  This is so much harder to solve without useful log messages.
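A sketch of what this two-part proposal could look like as a post-update step. This is not what OPNsense ships; the function name is hypothetical, and it assumes (from the 600 and 640 reports in this thread) that dnsmasq needs the world-read bit on the file:

```sh
#!/bin/sh
# ensure_world_readable [FILE]: sketch of the two-part proposal above.
# Assumption: dnsmasq reads the hosts file as an unprivileged user,
# so the "other" read bit must be set (600 and 640 both failed here).
ensure_world_readable() {
    f=${1:-/var/etc/dnsmasq-hosts}
    if [ ! -e "$f" ]; then
        echo "error: $f does not exist" >&2
        return 2
    fi
    # find -perm -0004 matches when at least the o+r bit is set.
    if [ -n "$(find "$f" -prune -perm -0004)" ]; then
        return 0
    fi
    # Part 2: report the problem instead of failing silently.
    # (On a real system this message could go to syslog via logger(1).)
    echo "error: $f is not world-readable: $(ls -l "$f")" >&2
    # Part 1: put the expected mode back.
    chmod 644 "$f"
}
```

Testing the o+r bit rather than the exact mode keeps it from touching a file that is already readable.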


Also, a bunch of my logs have all grown really large for some reason:

root@gw:/var/log # ls -ltr
total 9364
drwx------  2 root   wheel     512 Jul 25 19:01 suricata
drwx------  2 www    www       512 Jul 25 19:04 lighttpd
drwxr-x---  2 squid  squid     512 Jul 25 19:11 squid
drwxr-xr-x  2 root   wheel     512 Sep 10 23:06 installer
-rw-------  1 root   wheel  511488 Sep 10 23:08 ipsec.log
-rw-------  1 root   wheel  511488 Sep 10 23:08 openvpn.log
-rw-------  1 root   wheel  511488 Sep 10 23:08 squid.syslog.log
-rw-------  1 root   wheel  511488 Sep 10 23:08 portalauth.log
-rw-------  1 root   wheel  511488 Sep 10 23:08 ppps.log
-rw-------  1 root   wheel  511488 Sep 10 23:08 relayd.log
-rw-------  1 root   wheel  511488 Sep 10 23:08 wireless.log
-rw-------  1 root   wheel  511488 Sep 10 23:08 vpn.log
-rw-------  1 root   wheel  511488 Sep 10 23:08 lighttpd.log
drwxr-xr-x  2 root   wheel     512 Sep 10 23:10 ntp
-rw-------  1 root   wheel  511488 Sep 10 23:26 gateways.log
-rw-------  1 root   wheel  511488 Sep 10 23:26 resolver.log
-rw-------  1 root   wheel  511488 Sep 10 23:26 routing.log
-rw-------  1 root   wheel  511488 Sep 10 23:27 ntpd.log
-rw-------  1 root   wheel  511488 Sep 10 23:27 filter.log
-rw-------  1 root   wheel  511488 Sep 10 23:27 system.log
-rw-------  1 root   wheel     130 Sep 11 03:01 mount.today
-rw-------  1 root   wheel  511488 Oct 22 14:44 suricata.syslog.log
-rw-------  1 root   wheel    3061 Nov 29 03:01 setuid.yesterday
-rw-------  1 root   wheel   10660 Nov 29 03:01 dmesg.yesterday
-rw-------  1 root   wheel   41398 Dec  1 03:01 dmesg.today
-rw-------  1 root   wheel    3061 Dec  4 03:01 setuid.today
-rw-------  1 root   wheel     284 Dec  4 03:01 pf.yesterday
-rw-------  1 root   wheel     407 Dec  5 03:01 pf.today
-rw-------  1 root   wheel  511488 Dec  5 08:25 dhcpd.log
-rw-------  1 root   wheel   10245 Dec  5 08:29 userlog
-rw-r--r--  1 root   wheel       0 Dec  5 08:29 lastlog
-rw-r--r--  1 root   wheel     197 Dec  5 19:02 utx.lastlogin
-rw-r--r--  1 root   wheel     290 Dec  5 19:02 utx.log


And they have a bunch of binary data at the end.  It's as if they all got corrupted somehow. 

Hi there,

1. It would be the only chmod() in all of that code, code that also writes its config files to the same place as before. That means the problem is somewhere else, and masking it without knowing the underlying cause is bound to blow up in half a year, a year or maybe five years, and who knows which poor soul will have to deal with our legacy code by then. I'd rather have people report this so we know the impact and scope; even if it is annoying, it's important.

2. Dnsmasq can't read that file but remains allegedly silent. If it doesn't throw an error, what can we do short of reporting it upstream if we have a substantial report for (1.)? :)

The log files are normal: these are clog files, binary ring buffer log data so they don't have to be rotated. This is true for most of the logs contained.
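Reading such a file with cat or tail shows the binary padding. A minimal sketch for viewing one as text, assuming the clog(8) reader that these releases ship is on the PATH; the helper name is hypothetical, and the fallback only strips NUL bytes and does not restore the ring order:

```sh
#!/bin/sh
# show_log FILE: print a circular (clog) log as text.
# Assumption: clog(8) is installed, as on these OPNsense releases.
show_log() {
    if command -v clog >/dev/null 2>&1; then
        clog "$1"
    else
        # Crude fallback: drop the NUL padding; entries may be out of order.
        tr -d '\000' < "$1"
    fi
}
```

For example: # show_log /var/log/resolver.log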


Franco

Hello,

After upgrading OPNsense to 16.7.10-amd64, Host Overrides in the DNS Forwarder (dnsmasq 2.76,1) don't work.

Quote: "A potential DNS Rebind attack has been detected.
Try to access the router by IP address instead of by hostname."

I've checked:
root@opnsense:~ # ls -lah /var/etc/dnsmasq-hosts
-rw-r-----  1 root  wheel    46B Dec  6 14:30 /var/etc/dnsmasq-hosts


So I ran chmod and restarted the dnsmasq service from both the web GUI and the console (service dnsmasq stop, service dnsmasq onestart), but it still doesn't work.

root@opnsense:chmod 644 /var/etc/dnsmasq-hosts
root@opnsense:/var/log # ls -lah /var/etc/dnsmasq-hosts
-rw-r--r--  1 root  wheel    46B Dec  6 14:50 /var/etc/dnsmasq-hosts
root@opnsense:/var/log # service dnsmasq stop
Stopping dnsmasq.
Waiting for PIDS: 62881.
root@opnsense:/var/log # service dnsmasq onestart
Starting dnsmasq.



Am I missing something?


Unfortunately, "service dnsmasq onestart" is not the right way to start dnsmasq. We've made this work for newer components we've added, but not for the ones that were always there, like dnsmasq.

Make sure to kill the stray dnsmasq to avoid further problems:

# pkill dnsmasq

Go to Services: DNS Forwarder and simply save the configuration and it should be back.


Cheers,
Franco

Bad news,

Running "pkill dnsmasq" and saving the configuration didn't solve the problem.
I've also rebooted the firewall; that didn't help either.

Everything seems to be correctly configured:
root@opnsense:~ # ls -lah /var/etc/dnsmasq-hosts
-rw-r--r--  1 root  wheel    46B Dec  6 19:48 /var/etc/dnsmasq-hosts
root@opnsense:~ # cat /var/etc/dnsmasq-hosts
10.143.7.245    support.XXX.fr support


But when I try to resolve support.XXX.fr from the LAN, OPNsense still returns a public address (not 10.143.7.245).

At the same time, Domain Overrides work perfectly.
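To separate "the override is missing from the file" from "dnsmasq is not answering from the file", one can check what the file itself maps the name to. A hypothetical helper in POSIX sh and awk:

```sh
#!/bin/sh
# hosts_lookup NAME [FILE]: print the address(es) a hosts-format file
# assigns to NAME, i.e. what a Host Override should make dnsmasq answer.
hosts_lookup() {
    name=$1
    file=${2:-/var/etc/dnsmasq-hosts}
    # Skip comment lines; field 1 is the address, fields 2..NF are names.
    awk -v n="$name" '
        /^[ \t]*#/ { next }
        { for (i = 2; i <= NF; i++) if (tolower($i) == tolower(n)) { print $1; break } }
    ' "$file"
}
```

Then compare with what the forwarder actually answers, e.g. with drill(1) from the FreeBSD base system: # drill support.XXX.fr @<OPNsense LAN IP>. If the file has the private address but the answer is public, dnsmasq is most likely not serving that file.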

I don't know what to do now. Any suggestions?

Yes, there was a bug in dnsmasq for many years that tainted /etc/hosts and therefore local resolution.

You have "Do not use the DNS Forwarder/Resolver as a DNS server for the firewall" checked, but expect it to be unchecked?
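One way to see what that setting did on the firewall itself is to look at /etc/resolv.conf. A minimal sketch, assuming that with the box unchecked the firewall lists 127.0.0.1 as its first nameserver; the helper name is hypothetical:

```sh
#!/bin/sh
# uses_local_forwarder [FILE]: succeed if the first nameserver in a
# resolv.conf-style file is the loopback address.
# Assumption: with "Do not use the DNS Forwarder/Resolver as a DNS server
# for the firewall" unchecked, the firewall lists 127.0.0.1 first.
uses_local_forwarder() {
    file=${1:-/etc/resolv.conf}
    first=$(awk '$1 == "nameserver" { print $2; exit }' "$file")
    [ "$first" = "127.0.0.1" ]
}
```

This should only concern lookups made by the firewall itself; LAN clients that query the LAN address directly do not go through that file.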


Cheers,
Franco

I've just tried checking and unchecking "Do not use the DNS Forwarder/Resolver as a DNS server for the firewall" in System > Settings > General, but the result is still the same.

Computer on LAN has OPNsense LAN IP address as first DNS.
OPNsense should resolve the host support.XXX.fr using a Host Override entry in dnsmasq, but it doesn't: I get a public IP instead of the private one.

Tomorrow I'm going to configure another firewall and run some tests with a previous firmware...

But there is still a bug :/

Hello,

On a new OPNsense firewall running OPNsense 16.7.9-amd64 firmware,
dnsmasq works perfectly.

I won't upgrade it to 16.7.10-amd64; I'll wait for a patch and keep watching the changelog.

I've noticed something strange: both firmware versions run dnsmasq 2.76,1
(see it in System > Firmware > Updates)

To be continued

The same problem, but it emerged only after adding a new Host Override record, and not immediately after upgrade to 16.7.10.

Fixed with "chmod 644 /var/etc/dnsmasq-hosts" and the "Restart Service" button in the DNS Forwarder settings.

Not my most favourite solution, but more reports came up. Run this from the console and restart/reboot:

# opnsense-patch cd6cdba1b

https://github.com/opnsense/core/commit/cd6cdba1b


Cheers,
Franco

That's definitely strange:
the patch doesn't work in my configuration...

What I've done so far:
* On 16.7.10, I reinstalled the dnsmasq plugin: it doesn't work.
* I reset to factory defaults (same 16.7.10/2.76,1 version, fresh config): after reconfiguring, it still doesn't work.
* I reinstalled a fresh 16.7, upgraded to 16.7.10 and reconfigured dnsmasq: it doesn't work, and the file permissions are OK.
* I forced "chmod 644 /var/etc/dnsmasq-hosts" and "Restart Service", but got the same result.
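One thing the list above does not check is whether the running dnsmasq is actually told to read the file. dnsmasq takes extra hosts files via --addn-hosts (short form -H); that OPNsense passes /var/etc/dnsmasq-hosts this way on the command line, rather than in a config file, is an assumption. A hypothetical helper:

```sh
#!/bin/sh
# cmdline_reads_hosts ARGS [FILE]: succeed if a dnsmasq command line
# names FILE via --addn-hosts (with "=" or a space) or -H.
# Assumption: the overrides file is passed on the command line.
cmdline_reads_hosts() {
    args=$1
    f=${2:-/var/etc/dnsmasq-hosts}
    case " $args " in
        *" --addn-hosts=$f "*|*" --addn-hosts $f "*|*" -H $f "*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Usage on the firewall: # cmdline_reads_hosts "$(ps -o args= -p "$(pgrep -x dnsmasq | head -n 1)")" && echo yes. A "no" here may just mean the option lives in a config file instead, so treat it as a hint rather than proof.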

I'm going crazy :o Am I missing something?

Any help will be appreciated :)