DNS resolution timeout

Started by neopard, January 28, 2025, 12:40:16 PM

I have a network connected to an OPNsense firewall, which in turn is the only Ethernet device connected to a SOHO modem (which, unfortunately, I must keep because it supports a VoIP telephone line).

The SOHO modem has very limited configurability; its only role is to forward all ports (1-65535) to the IP address of OPNsense (the only connected device).

OPNsense (version 24.7.12_2-amd64; FreeBSD 14.1-RELEASE-p6; OpenSSL 3.0.15, just updated, but the issue was present before the update) is currently configured with UnboundDNS.

Under System → Settings → General, the DNS server is set to 127.0.0.1.
The "Do not use the local DNS service as a nameserver for this system" option is ticked, while "Allow DNS server list to be overridden by DHCP/PPP on WAN" is not.

UnboundDNS is enabled, and the only option selected in its configuration is "Aggressive NSEC".
DNS over TLS is also enabled using Cloudflare's servers (1.1.1.1 on port 853, hostname one.one.one.one, etc.).

Clients are configured with automatic DHCP and receive the OPNsense interface as their DNS server.

With this setup, the network initially works fine, but after a few hours, users start reporting issues such as web pages failing to load and very slow browsing performance. The problem appears to be related to DNS requests failing, as ongoing downloads (already initiated) continue at normal speeds.

More specifically, most DNS queries time out, though not all: the majority of lookups fail, while a few succeed with normal response times.

The issue sometimes resolves itself after a few minutes but can persist for hours.


I performed various tests using different tools:

    From Linux:
    From Windows:
    • (c) nslookup google.com
    • (d) nslookup google.com 1.1.1.1
    • (e) Using the DNSJumper.exe tool to query multiple DNS servers directly.
    From OPNsense itself:
    • (f) From the web GUI, under Interfaces → Diagnostics → DNS Lookup
    • (g) From a root SSH shell, with the hostname command

    From what I understand:

    In cases (a) and (c), the requests are routed through UnboundDNS on OPNsense.
    In cases (b), (d), and (e), the requests are made directly to external DNS servers (e.g., Cloudflare, Google, OpenDNS, etc.).
    When the issue occurs, even queries that bypass OPNsense and are sent directly to external DNS servers experience 70-90% timeouts. On DNSJumper.exe, most of the queried servers show as red (timeout), while the remaining servers respond within 10-40ms.
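
The 70-90% figure above could be measured more systematically with a small probe script. The following is only a rough sketch using the Python standard library, not something taken from the tests described here: the function names (`build_query`, `query_once`, `timeout_rate`, `probe`) are made up for illustration, and the 2-second timeout and the count of 10 queries per server are arbitrary assumptions.

```python
import random
import socket
import struct
import time


def build_query(name, qid):
    """Build a minimal DNS query packet (type A, class IN, recursion desired)."""
    # Header: ID, flags (0x0100 = recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(part)]) + part.encode("ascii")
        for part in name.strip(".").split(".")
    )
    return header + labels + b"\x00" + struct.pack("!HH", 1, 1)


def query_once(server, name="google.com", timeout=2.0, port=53):
    """Send one UDP query; return the round-trip time in ms, or None on failure."""
    qid = random.getrandbits(16)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)  # assumed value, adjust as needed
        start = time.monotonic()
        try:
            sock.sendto(build_query(name, qid), (server, port))
            data, _ = sock.recvfrom(512)
        except OSError:  # covers timeouts and ICMP "unreachable" errors
            return None
        elapsed_ms = (time.monotonic() - start) * 1000.0
    # Accept the reply only if it echoes our query ID.
    if len(data) < 2 or struct.unpack("!H", data[:2])[0] != qid:
        return None
    return elapsed_ms


def timeout_rate(results):
    """Fraction of probes that got no answer (None)."""
    if not results:
        return 0.0
    return sum(r is None for r in results) / len(results)


def probe(servers, count=10):
    """Return {server: timeout fraction} after `count` queries to each server."""
    return {s: timeout_rate([query_once(s) for _ in range(count)]) for s in servers}
```

Calling `probe(["192.168.1.1", "1.1.1.1", "8.8.8.8"])` (where 192.168.1.1 is a placeholder for the OPNsense LAN address) from a client behind OPNsense, and at the same moment from the PC attached directly to the modem, would give comparable numbers for cases like (c), (d) and (e).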

    The problem is the same when the requests are made from Interfaces → Diagnostics →

    I tried connecting a PC directly to the SOHO router (in parallel with OPNsense, using a second Ethernet port on the modem). On this PC, the issue never occurs, even when the problem is active on the rest of the network. Multiple tests confirmed that DNS queries from this directly connected PC were successful with no timeouts.
    So this makes me think the problem is somewhere in the OPNsense configuration.

    I have tried the following alternatives, but none made any difference:

    • Disabling UnboundDNS and setting the DNS servers directly under System → Settings → General.
    • Using UnboundDNS without DNS over TLS.
    • Manually configuring DNS servers on individual PCs.
    • Adding a firewall rule to allow all traffic to and from 1.1.1.1, 8.8.8.8, and 8.8.4.4.

    I have also attached an image showing the last page of the Unbound DNS log.



    Question:
    What can I do to identify the cause of the problem the next time it occurs?

    Thanks

I don't know if this is relevant, but I'm out of ideas...

A traceroute to a DNS server shows no asterisks when everything works correctly, but it does show them while the DNS timeouts are occurring.
BUT pinging an internet IP address works without problems.

Again, if at the same time I try to resolve names, run a traceroute, etc. from a PC connected directly to the SOHO router (as opposed to one connected through OPNsense), I find no problems.
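
One possible way to narrow this down: traceroute on Linux and FreeBSD sends UDP probes by default, while ping uses ICMP, so if the traceroute was run from one of those systems, the pattern above might point at UDP traffic specifically. That is only a guess. A rough sketch of a check, assuming the target server answers DNS over both UDP and TCP on port 53 (as the public resolvers mentioned above do), is to send the same query over both transports while the problem is active. All function names here are hypothetical and the 2-second timeout is an arbitrary choice.

```python
import socket
import struct


def build_query(name, qid=0x1234):
    """Minimal DNS query packet: type A, class IN, recursion desired."""
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(p)]) + p.encode("ascii") for p in name.strip(".").split(".")
    )
    return header + labels + b"\x00" + struct.pack("!HH", 1, 1)


def tcp_frame(packet):
    """DNS over TCP prefixes each message with its length as a 2-byte integer."""
    return struct.pack("!H", len(packet)) + packet


def recv_exact(sock, n):
    """Read exactly n bytes from a TCP socket, or raise if it closes early."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed before full reply")
        buf += chunk
    return buf


def ask_udp(server, name, timeout=2.0):
    """True if a reply with the matching query ID arrives over UDP."""
    packet = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(packet, (server, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:
            return False
    return reply[:2] == packet[:2]


def ask_tcp(server, name, timeout=2.0):
    """True if a reply with the matching query ID arrives over TCP."""
    packet = build_query(name)
    try:
        with socket.create_connection((server, 53), timeout=timeout) as sock:
            sock.sendall(tcp_frame(packet))
            length = struct.unpack("!H", recv_exact(sock, 2))[0]
            reply = recv_exact(sock, length)
    except OSError:
        return False
    return reply[:2] == packet[:2]


def classify(udp_ok, tcp_ok):
    """Turn the two results into a short label for a log line."""
    if udp_ok and tcp_ok:
        return "both work"
    if tcp_ok:
        return "UDP-only failure"
    if udp_ok:
        return "TCP-only failure"
    return "both fail"
```

For example, `classify(ask_udp("1.1.1.1", "google.com"), ask_tcp("1.1.1.1", "google.com"))` could be run repeatedly during an incident. Since individual UDP queries sometimes succeed even when the problem is active, a single result says little; only a consistent "UDP-only failure" over many runs would suggest looking at how UDP is handled on the path.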