After a reboot, DNS resolution by either Unbound or DNSmasq doesn't work. DNS requests from the LAN can still go through the firewall to the internet during this time. Port 53 (TCP) can be opened from the LAN, but DNS requests time out.
Going into SYSTEM: SETTINGS: GENERAL and clicking "Save" fixes the issue (with no setting changes).
This behavior also affects requests made by the firewall itself, like update checks and DNS diagnostics.
I looked for any of this in the firewall logs but couldn't find anything until I did the "General - Save" fix, after which I could see allowed DNS requests.
Any suggestions for what settings I may have missed - or further tests I can run?
Update 1
Further testing seems to show that the issue doesn't self-correct, and that after the "General - Save" fix a reboot breaks DNS again when using the OPNsense/DNSmasq DNS resolver.
For anyone wondering: I have been using another DNS server on the LAN since this issue started.
I have the same issue: https://forum.opnsense.org/index.php?topic=23342.0
Manually saving in 'System: Settings: General' forces whatever DNS configuration you have to re-initialize, so it sounds like DNS isn't being configured and initialized correctly at boot. To narrow down what's happening, perhaps you could try the following with Unbound.
- Configure 'Services: Unbound DNS: General'. Defaults should be fine, but the most important options are likely:
  - Enable: checked
  - Network Interfaces: All
  - Outgoing Network Interfaces: All
- Configure the system to use Unbound in 'System: Settings: General':
  - DNS servers: blank
  - Allow DNS server list to be overridden by DHCP/PPP on WAN: unchecked
  - Do not use the local DNS service as a nameserver for this system: unchecked
Then reboot the system, immediately start a shell session and check for the following
Is unbound running?
root@OPNsense:~ # ps auxwww | grep unbound
unbound 78629 0.0 1.2 99436 49424 - Is Mon20 0:17.84 /usr/local/sbin/unbound -c /var/unbound/unbound.conf
root 60108 0.0 0.1 1060900 3196 3 R+ 21:08 0:00.00 grep unbound
If it's not running, check to see if the generated config file exists and contains something meaningful. It should look something like the following
root@OPNsense:~ # head -n 20 /var/unbound/unbound.conf
##########################
# Unbound Configuration
##########################
##
# Server configuration
##
server:
chroot: /var/unbound
username: unbound
directory: /var/unbound
pidfile: /var/run/unbound.pid
root-hints: /var/unbound/root.hints
use-syslog: yes
port: 53
[...]
If it is running, will it respond to queries?
root@OPNsense:~ # drill @127.0.0.1 google.com
;; ->>HEADER<<- opcode: QUERY, rcode: NOERROR, id: 46171
;; flags: qr rd ra ; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;; google.com. IN A
;; ANSWER SECTION:
google.com. 300 IN A 172.217.17.110
;; AUTHORITY SECTION:
;; ADDITIONAL SECTION:
;; Query time: 1294 msec
;; SERVER: 127.0.0.1
;; WHEN: Sun Jun 13 21:09:59 2021
;; MSG SIZE rcvd: 44
If it responds to queries locally, the issue is likely with the firewall rules.
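If it helps, the checks above can be wrapped into a few shell functions to run right after a reboot. This is only a rough sketch: the function names are made up for illustration, the config path and the `rcode: NOERROR` string are taken from the output above, and it assumes `pgrep` and `drill` are available on the box.

```shell
#!/bin/sh
# Sketch of the post-reboot checks described above.
# All function names here are hypothetical helpers, not part of OPNsense.

# Succeeds if a process named exactly "unbound" is running.
unbound_running() {
    pgrep -x unbound >/dev/null 2>&1
}

# Succeeds if the generated config exists, is non-empty,
# and contains a "server:" section.
# $1: config path (defaults to the path shown in the thread).
config_looks_sane() {
    conf="${1:-/var/unbound/unbound.conf}"
    [ -s "$conf" ] && grep -q '^[[:space:]]*server:' "$conf"
}

# Succeeds if the resolver on 127.0.0.1 answers with NOERROR.
# $1: name to look up (defaults to google.com, as in the thread).
resolver_answers() {
    drill @127.0.0.1 "${1:-google.com}" 2>/dev/null | grep -q 'rcode: NOERROR'
}

# Walk the checks in the same order as the steps above.
run_checks() {
    if unbound_running; then
        echo "unbound: running"
        if resolver_answers; then
            echo "127.0.0.1: answers queries"
        else
            echo "127.0.0.1: no answer"
        fi
    else
        echo "unbound: not running"
        if config_looks_sane; then
            echo "unbound.conf: present and has a server: section"
        else
            echo "unbound.conf: missing, empty, or incomplete"
        fi
    fi
}
```

Source the file (or paste the functions into the shell) and call `run_checks`; nothing runs until you do.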
Unbound is running, `drill @127.0.0.1 google.com` worked, and requests from outside the firewall (from LAN devices) succeeded. However, the update check and diagnostic DNS queries through the GUI still fail. I haven't found any rules that are blocking these DNS queries. Doing "General - Save" then gets things working correctly again.
If requests can be successfully made from the shell (i.e. the host itself) and clients on the LAN (i.e. at least one network external to the host), the problem likely isn't firewall rules. Requests made through the web interface are effectively the same as what you've just tested.
Looking at the code behind 'Interfaces: Diagnostics: DNS Lookup' (`diag_dns.php`, https://github.com/opnsense/core/blob/stable/21.1/src/www/diag_dns.php), I can't see too many places where it could be going wrong. Perhaps `/etc/resolv.conf` is not being populated correctly. It should look like this:
root@OPNsense:~ # cat /etc/resolv.conf
domain localdomain
nameserver 127.0.0.1
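A quick way to test this after each reboot, as a sketch only: the two function names below are made up, and the check simply assumes that the first `nameserver` line should be 127.0.0.1 when the local DNS service is in use.

```shell
#!/bin/sh
# Sketch: check which resolver the host itself is using.
# Function names are hypothetical.

# Print the address from every "nameserver" line.
# $1: file to read (defaults to /etc/resolv.conf).
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Succeeds only if the first nameserver listed is 127.0.0.1.
uses_local_resolver() {
    [ "$(list_nameservers "$1" | head -n 1)" = "127.0.0.1" ]
}
```

For example, `uses_local_resolver || list_nameservers` prints the nameservers actually in use whenever the local resolver is not first.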
Interesting result:
# Generated by resolvconf
nameserver 10.x.y.z
... and this IP is the fixed DNS IP of a WireGuard VPN.
"General - Save" set the values as you have described.
WireGuard has no concept of issuing DNS servers via a DHCP-like mechanism, so I'm not sure where this IP could be coming from.
`/etc/resolv.conf` ultimately gets generated by system_resolvconf_generate() (https://github.com/opnsense/core/blob/stable/21.1/src/etc/inc/system.inc#L161) which uses the various 'System: Settings: General' DNS parameters, and whether the Unbound (and/or dnsmasq) service is enabled.
Perhaps have a look through your `/conf/config.xml` file for that 10.x.y.z IP. `dnsserver` should be empty, and `dns[1-9]gw` should be `none`, i.e.
<dnsserver/>
<dns1gw>none</dns1gw>
<dns2gw>none</dns2gw>
<dns3gw>none</dns3gw>
<dns4gw>none</dns4gw>
<dns5gw>none</dns5gw>
<dns6gw>none</dns6gw>
<dns7gw>none</dns7gw>
<dns8gw>none</dns8gw>
WireGuard 'servers' have a `<dns/>` key which should probably be empty; I can't see a way for this value to be populated through the webui.
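Something like the following could speed up the search. It is a sketch with made-up function names, and it assumes the keys sit on their own lines as in the snippet above; the second function only looks for a literal `<dns>` element with content, so it won't match `<dnsserver>` or the `dnsNgw` keys.

```shell
#!/bin/sh
# Sketch: look for stray DNS settings in the OPNsense config.
# Function names are hypothetical.

# Print every line (with its line number) that mentions an address.
# $1: address to search for, $2: config file (defaults to /conf/config.xml).
find_ip_in_config() {
    grep -nF -- "$1" "${2:-/conf/config.xml}"
}

# Print every <dns>...</dns> element that is not empty.
# $1: config file (defaults to /conf/config.xml).
nonempty_dns_keys() {
    grep -n '<dns>[^<][^<]*</dns>' "${1:-/conf/config.xml}"
}
```

Call `find_ip_in_config` with the address you saw in `/etc/resolv.conf`, then `nonempty_dns_keys` to list any populated `<dns>` entries.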
Going through config.xml, I found the IP address under WireGuard:
<dns>10.x.y.z</dns>
This is configured on the WireGuard Local Configuration screen when advanced mode is enabled in the GUI.
For some reason the WireGuard DNS setting is used to populate resolv.conf on reboot, but the General DNS settings are used when I do "General - Save".
At least we got to the bottom of this. As to why the logic works like this, I have no idea.
Perhaps open an issue (https://github.com/opnsense/core/issues) pointing back to this thread. At the very least it will hopefully attract the attention of actual OPNsense developers.
When WG is first enabled or restarted, wg-tools overwrites resolv.conf. Then you overwrite it again by saving the general settings. Better not to define DNS in your WG config.