Messages - RutgerDiehard

#1
I've spent a lot of time following the docs to set up ISP IPv6 PD + dnsmasq + DHCP6 which has been successful apart from one issue. All IPv6 hosts are registered in dnsmasq as host.home.arpa rather than the domain name that's listed in the DHCP6 range in dnsmasq.

I know there have been a lot of discussions on dnsmasq recently and some fixes have been created for other issues. Is this a known issue or is it something new?

I can provide further information on setup if required.
#2
Quote from: Drinyth on May 21, 2025, 12:47:47 PM
Quote from: meyergru on May 21, 2025, 11:47:22 AM
I can create a band-aid or manually configured variant, that works for me, as well, but I think normal users should have an option that is supported via the GUI.

For what it's worth, I'm using AGH that is listening on port 53 and forwarding queries for local and reverse domains to dnsmasq running on a different port. So similar to having unbound running on port 53 and handling everything non-local. AGH can forward to upstream providers using DoT or DoH. It is also available to configure via the GUI.

Granted, it's not part of the default opnsense offering and one has to add Mimugmail's repo to enable it.

I am also using AGH running on 53. So I thought this will be a simple enough fix to bypass Unbound and use AGH to forward local domains and rDNS queries to Dnsmasq to resolve while everything else goes out over DoT (or even DoH).
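
The split described above (local domains and private reverse lookups go to Dnsmasq, everything else goes upstream) can be sketched as a small routing decision. This is only an illustration of the logic, not how AGH implements it: the function names and the "dnsmasq" / "upstream" labels are hypothetical, and only IPv4 reverse names are handled.

```python
import ipaddress


def reverse_name_to_ip(name):
    """Convert a full IPv4 PTR name (d.c.b.a.in-addr.arpa) to an address, else None."""
    labels = name.rstrip(".").lower().split(".")
    if len(labels) != 6 or labels[-2:] != ["in-addr", "arpa"]:
        return None
    try:
        # PTR names list the octets in reverse order.
        return ipaddress.IPv4Address(".".join(reversed(labels[:4])))
    except ValueError:
        return None


def pick_resolver(name, local_domains):
    """Return "dnsmasq" for local or private reverse names, otherwise "upstream".

    Both return values are hypothetical labels used for illustration only.
    """
    clean = name.rstrip(".").lower()
    for domain in local_domains:
        if clean == domain or clean.endswith("." + domain):
            return "dnsmasq"
    ip = reverse_name_to_ip(clean)
    if ip is not None and ip.is_private:
        return "dnsmasq"
    return "upstream"
```

For example, with `local_domains=["home.arpa"]`, a query for `nas.home.arpa` or `10.1.168.192.in-addr.arpa` would be routed to Dnsmasq, while `example.com` would go upstream.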

Unfortunately, even when configured correctly (I think), there are still lookup timeouts with nslookup. The only thing that stops them is adding a DNS server (1.1.1.1) in System > Settings > General. As soon as that's added, 1.1.1.1:53 traffic starts appearing in the firewall logs.

Whilst I originally thought only Windows boxes had the issue, I've now seen the same on Linux (Ubuntu Server 24.04 LTS).

Taking another look at the configuration in System > Settings > General, there is an option to NOT use the local DNS service as the nameserver for this system. It was initially unchecked, so I removed 1.1.1.1 from the DNS servers list and checked this box to not use local DNS.

With AGH configured as above, no delays were noticed on any client running nslookup and no traffic to 1.1.1.1:53 was observed in the firewall logs, as expected.

Is anybody able to test this when Unbound is providing DNS duties?
#3
I can now confirm that all nslookup timeouts have been resolved by adding a single DNS server (1.1.1.1) in System > Settings > General, with Use System Nameservers left unchecked in Unbound DNS Query Forwarding.

I can see in the firewall logs outbound 1.1.1.1:53 connections from the WAN address in addition to DoH connections that are configured in Unbound. If I create a firewall rule to block :53, the nslookup timeouts return.

I removed all Query Forwarding rules and included them in a /usr/local/etc/unbound.opnsense.d/local.conf file as per https://github.com/opnsense/core/issues/7639. This did not work; I received SERVFAIL errors for every query sent for internal domains, so this doesn't look like a solution, for my installation anyway.

As @OPNenthu mentioned, I've only started to experience this because I moved to multiple internal domains by following the Dnsmasq migration guide.
#4
I'm seeing exactly the same symptoms; DNS timeouts for internal and external addresses.

I do have a slightly different configuration though in that I use Adguard Home running on OPNsense for all DNS queries (port 53). This is then passed on to Unbound running on port 65353 as an upstream DNS server with Private Reverse DNS Server configured to point to Dnsmasq. I've followed the instructions exactly including Dnsmasq running on port 53053 and created the necessary query forwards for all internal domains and reverse lookups.

However, nslookup gives timeouts for every query, even when the same query is repeated.

One thing I have noticed though is that the timeouts only happen on Windows devices. I can run nslookup on a Macbook and on Linux and don't see any timeouts, just an instant response.
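
To compare the resolvers in a chain like this independently of nslookup, one option is to send the same query to each port directly and time the reply. The following is a minimal sketch using only the Python standard library; `build_query` and `time_query` are hypothetical helper names, it only handles plain UDP over IPv4, and it does not parse or validate the response.

```python
import socket
import struct
import time


def build_query(name, qtype=1, query_id=0x1234):
    """Build a minimal DNS query packet (RFC 1035 wire format) for `name`."""
    # Header: ID, flags (recursion desired), QDCOUNT=1, other counts 0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE (default 1 = A record) and QCLASS 1 (IN).
    return header + qname + struct.pack("!HH", qtype, 1)


def time_query(server, port, name, timeout=2.0):
    """Return elapsed seconds for one UDP query, or None on timeout or socket error."""
    packet = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.monotonic()
        try:
            sock.sendto(packet, (server, port))
            sock.recvfrom(4096)
        except OSError:  # socket.timeout is a subclass of OSError
            return None
        return time.monotonic() - start
```

Calling `time_query(server, 53, "example.com")` a few times in a row, and then again with ports 65353 and 53053 (where `server` is the firewall's LAN address), should show which hop, if any, fails to answer within the timeout.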
 
#5
Great explanation, many thanks @meyergru.

One thing I can't find for Unbound is:

Quote from: meyergru on May 20, 2025, 02:13:13 PM
You also tell it to "Do not forward private reverse lookups".

Please tell me where this setting is.
#6
Update:

Zenarmor Support has replied:

"We have identified some issues with IPDRStreamer and addressed them in version 1.18.4, which is scheduled for release next week."

I have been given access to the beta version to test and, so far, it has fixed the high IO and IPv4 issues. CPU usage is slightly higher than pre-update levels, but I'll wait and see whether the official release shows the same.

Many thanks to Zenarmor Support for the quick response and for access to the fix.
#7
Thanks sy, report sent including logs, configuration and OPNsense configuration.
#8
But what is confusing is that this is not reflected in the interface stats. Does this mean it's OPNsense only and not something on the network?

I've not seen anything like this before.

#9
And SYN Cookies
#10
This is also reflected in Handshake issues
#11
I've spent some time looking at this and have come across something that's concerning. Checking the other Netdata graphs at the time the high disk writes occur, I can see similar patterns for IPv4 Bandwidth usage.
#12
Another screenshot showing Elasticsearch as the culprit
#13
Java process screenshot
#14
Following the recent update, disk and CPU usage have increased significantly, which appears to be caused by Elasticsearch and the Java process.

I've run a health check on the settings database and run a check on the database indexes which both completed successfully. I also increased the memory disk size to 500MB. I've even completely reset the reporting database but nothing has helped.

If I restart the Zenarmor filter engine, the system behaves itself for a while but then CPU load increases and disk IO - specifically write operations - goes through the roof.

The screenshots show IO after the latest update was applied, which was around the 10:00am mark, and the Java process hitting the disk.

This does seem to be update related as the system has been rock solid with very little load up to this point.





#15
Many thanks sy, that worked.

I can now select the Zenarmor widget and it displays in the dashboard correctly.