Either Unbound or the latest patch (23.7.xxxx) broke my connection

Started by lar.hed, November 02, 2023, 12:51:02 AM

Nope - no matter what I do Unbound just refuses to resolve any names.

I disabled DNSSEC
And then disabled DoT
And then disabled BlockLists

There seems to be a ghost in the system when it cannot even work with an empty configuration, so to speak.

Is there an easy way to restore only the Unbound settings to their defaults, for example by editing the config file and removing everything Unbound-related? Is such a "cold reset" doable, so that I know I am not fighting some old leftover setting that I have forgotten or cannot see?

Edit: Worth noting in all this: this config used to work, and did so for about 6 months since the last change (and it worked before that too). So I find it very strange that all of a sudden, just because of an upgrade of OPNsense, and most likely of Unbound, it stops resolving names. There is at least one ghost in the system...

I don't have a problem with my DoT on Unbound and I am still on OPNsense 23.1.11_2, but the last posts in this thread make me wonder whether people are putting a problematic mix of resolvers into Unbound.
Specifically regarding DNSSEC:
Cloudflare only mentions DNSSEC on their main resolver 1.1.1.1 https://developers.cloudflare.com/1.1.1.1/encryption/dnskey/. Encryption is mentioned on two IPs only.
Quad9 mentions that their resolver reports a DNSSEC auth failure as SERVFAIL, just like a nonexistent one, and describes a way to differentiate: https://docs.quad9.net/FAQs/
In other words, if you have a mix of DNSSEC-capable and non-capable upstream resolvers, Unbound might not be able to work properly. Just a thought, I haven't had a need to diagnose this, but aligning those upstreams might be helpful in this thread.

Okay, well, I have found the error in my configuration, and it is not what I expected...

The short "executive summary" is: under Services, Unbound, Access Lists there is a "Default Action" drop-down selection. For some ugly reason this was set to BLOCK - that was the error in my case. I replaced it with ALLOW and the problem was gone, and now I am running Unbound, DoT, DNSSEC and Block Lists. And maybe something more.

The thing I cannot get my head around is WHY that was changed in my config. I just cannot believe I changed that myself.

So I went back through my config backups, to one from 2023-08-07 - before the upgrades started. There are NO records at all about Access Lists in that backup file. So clearly, until someone points me in the right direction, I have to think that this BLOCK selection came from the upgrade process - and not from something I did. If this is how the upgrade "disabled" Unbound, then this is the ghost in the system. My current guess is that it is.
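
For anyone who wants to repeat that comparison between an old and a new config backup, here is a minimal sketch in Python. It assumes the backup is a plain XML file and simply lists every element whose tag contains a keyword. The function name, the sample fragment and its tag names are made up for illustration; I have not checked what the Access Lists section is actually called in a real OPNsense backup.

```python
import xml.etree.ElementTree as ET


def find_elements(xml_text, keyword):
    """Return (path, text) for every element whose tag contains keyword."""
    root = ET.fromstring(xml_text)
    hits = []

    def walk(node, path):
        here = f"{path}/{node.tag}"
        if keyword.lower() in node.tag.lower():
            hits.append((here, (node.text or "").strip()))
        for child in node:
            walk(child, here)

    walk(root, "")
    return hits


# Hypothetical fragment for illustration only; real tag names and values may differ.
SAMPLE = """<opnsense>
  <unbound><enable>1</enable></unbound>
  <example><acls><default_action>block</default_action></acls></example>
</opnsense>"""

if __name__ == "__main__":
    for keyword in ("acl", "default_action"):
        for path, text in find_elements(SAMPLE, keyword):
            print(keyword, path, repr(text))
```

Running the same search over the old and the new backup (read with `open(path).read()`) should show whether the setting exists in one file and not in the other.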

Sorry, but I spoke too soon...

Yes, Unbound works, but it pins one (of 8) cores at 100% load all the time, no matter what. Something is running hard in that Unbound process...

So I am back on Dnsmasq...

And I am back on Unbound DNS without DoT.

Disabling DNSSEC did not help: with DoT enabled and using Quad9, it stopped working again after a very short time. I will probably try Cloudflare to rule out Quad9 itself as the issue, but I doubt it.

However, it is not entirely clear to me how to disable DNSSEC, apart from the flags under Services -> Unbound -> General -> Enable DNSSEC Support (uncheck and apply) and Services -> Unbound -> Advanced -> Harden DNSSEC Data (uncheck and apply), but I think that's it.

I have OPNsense 23.7.9 installed (the latest as of now) with Unbound 1.19.0. No change in the issue.

My Unbound config (as configured through the UI, with DoT disabled), though it is probably not very helpful:

/var/unbound/unbound.conf

##########################
# Unbound Configuration
##########################

##
# Server configuration
##
server:
chroot: /var/unbound
username: unbound
directory: /var/unbound
pidfile: /var/run/unbound.pid
root-hints: /var/unbound/root.hints
use-syslog: yes
port: 53
include: /var/unbound/advanced.conf
harden-referral-path: no
do-ip4: yes
do-ip6: yes
do-udp: yes
do-tcp: yes
do-daemonize: yes
so-reuseport: yes
module-config: "python iterator"
num-threads: 4
msg-cache-slabs: 8
rrset-cache-slabs: 8
infra-cache-slabs: 8
key-cache-slabs: 8




# Interface IP(s) to bind to
interface: 0.0.0.0
interface: ::
interface-automatic: yes



# Private networks for DNS Rebinding prevention (when enabled)
private-address: 0.0.0.0/8
private-address: 10.0.0.0/8
private-address: 100.64.0.0/10
private-address: 169.254.0.0/16
private-address: 172.16.0.0/12
private-address: 192.0.2.0/24
private-address: 192.168.0.0/16
private-address: 198.18.0.0/15
private-address: 198.51.100.0/24
private-address: 203.0.113.0/24
private-address: 233.252.0.0/24
private-address: ::1/128
private-address: 2001:db8::/32
private-address: fc00::/8
private-address: fd00::/8
private-address: fe80::/10


# Private domains (DNS Rebinding)
include: /var/unbound/private_domains.conf

# Static host entries
include: /var/unbound/host_entries.conf

# DHCP leases (if configured)


# Custom includes
include: /var/unbound/etc/*.conf



python:
python-script: dnsbl_module.py

remote-control:
    control-enable: yes
    control-interface: 127.0.0.1
    control-port: 953
    server-key-file: /var/unbound/unbound_server.key
    server-cert-file: /var/unbound/unbound_server.pem
    control-key-file: /var/unbound/unbound_control.key
    control-cert-file: /var/unbound/unbound_control.pem


Another "fun" fact about what happens when DNS stops working with DoT enabled:

I do not get any more packets over WAN when sniffing via


tcpdump -i igb1 'port 853' # WAN DoT


But I still get packets via


tcpdump -i igb0 'port 53' # LAN DNS


And the Unbound service still seems to be running as a process, and is shown in the GUI as a running service...

Also, my overrides list for my internal apps still works, which means Unbound generally works except for name resolution towards the world wide web :o

As a diagram, this means


WWW <--DoT:853--> Unbound (DoT) <-x-BROKEN?-> Unbound (DNS) <--DNS:53-WORKS--> Local Clients
  |                                                      |
  -------------------- DNS:53-WORKS ----------------------
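
One way to narrow down the broken leg in that picture is to test, from the firewall itself, whether a plain TCP connection to the upstream DoT port still succeeds while Unbound is stuck. A minimal sketch in Python; the function name is made up, 9.9.9.9 is only used because Quad9 is the upstream mentioned above, and a successful TCP connect says nothing about the TLS handshake or about Unbound itself.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False


if __name__ == "__main__":
    # Assumption: Quad9 on its DoT port, as used earlier in this thread.
    print("9.9.9.9:853 reachable:", tcp_port_open("9.9.9.9", 853))
```

If the port is reachable while tcpdump shows no traffic on 853, that would point at Unbound no longer sending queries rather than at the WAN path.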

FWIW....

After I applied the patch that franco requested to be tested: https://forum.opnsense.org/index.php?topic=37243

I had no issues with Unbound - this is a bit unexpected, but I am happy anyway. My problem seems to be solved (and now I will run away and grab that egg timer until the next challenge presents itself!).

@lar.hed Thank you for pointing this out.

Quote
After I applied the patch that franco requested to be tested: https://forum.opnsense.org/index.php?topic=37243

Glad it solved your issue 👍. I will probably wait for the next OPNsense release, which hopefully will include this. I have had no chance to apply the patch yet.

Right....

I can now say I still have this problem with Unbound. It just hung at 100% on one core (of 8). I had to kill -9 it to get it to release the core and start behaving again. Trying to restart the process from the GUI just does not work.

No, I do not know what is wrong... I would love to build a Monit watchdog that does a kill -9 and restart when it hits CPU above 99% on one core... Does anyone know how to write such a thing?
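
A rough sketch of such a watchdog in Python, in case it helps as a starting point. It assumes `ps -axo pid,pcpu,comm` is available on the box and that the process is named unbound; the function names and the 99% threshold are just illustrative. By default it only reports, and the restart after a kill is left to whatever supervises the service (Monit, for example).

```python
import os
import signal
import subprocess


def parse_ps(output):
    """Parse 'ps -axo pid,pcpu,comm' output into (pid, cpu, command) tuples."""
    procs = []
    for line in output.splitlines()[1:]:  # skip the header line
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue
        try:
            procs.append((int(parts[0]), float(parts[1]), parts[2].strip()))
        except ValueError:
            continue  # ignore lines that do not look like process rows
    return procs


def hot_pids(procs, name="unbound", threshold=99.0):
    """PIDs of processes called `name` at or above `threshold` percent CPU."""
    return [pid for pid, cpu, comm in procs if comm == name and cpu >= threshold]


def check_once(kill=False):
    """Sample ps once; optionally send SIGKILL (kill -9) to runaway processes."""
    out = subprocess.run(
        ["ps", "-axo", "pid,pcpu,comm"],
        capture_output=True, text=True, check=True,
    ).stdout
    pids = hot_pids(parse_ps(out))
    for pid in pids:
        if kill:
            os.kill(pid, signal.SIGKILL)
    return pids


if __name__ == "__main__":
    # Dry run: report only. Pass kill=True once the output looks right.
    print("unbound processes above threshold:", check_once(kill=False))
```

A single sample can be misleading, so running this from cron or Monit and only acting after several hits in a row is probably safer. As far as I know, Monit's own process checks can also test CPU usage directly.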

Hmm, does anyone know how I can interpret this error:

Quote
2023-12-03T19:58:27   Error   unbound   [24652:3] error: reading root hints /root.hints 2:6: Syntax error, could not parse the RR's type
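
My reading of that message, hedged: the two numbers after the file name appear to be a line number and a position within the record, and /root.hints is presumably /var/unbound/root.hints as seen from inside the chroot shown in the config earlier in this thread. So the file itself looks damaged or incomplete at the moment Unbound reads it. A minimal sketch in Python of a sanity check on the file; it is not Unbound's parser, it only expects the usual "name TTL [IN] type data" layout with NS, A and AAAA records, and the function name is made up.

```python
ALLOWED_TYPES = {"NS", "A", "AAAA"}


def check_root_hints(text):
    """Return (line_number, problem) for lines that do not look like hints."""
    problems = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split(";", 1)[0].strip()  # ';' starts a comment
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2 or not fields[1].isdigit():
            problems.append((lineno, "missing or non-numeric TTL"))
            continue
        rest = fields[2:]
        if rest and rest[0].upper() == "IN":
            rest = rest[1:]  # the class is optional in this layout
        if len(rest) != 2 or rest[0].upper() not in ALLOWED_TYPES:
            problems.append((lineno, "unexpected record type or field count"))
    return problems


SAMPLE = """; sample in the usual root hints layout
.                        3600000      NS    A.ROOT-SERVERS.NET.
A.ROOT-SERVERS.NET.      3600000      A     198.41.0.4
A.ROOT-SERVERS.NET.      3600000      AAAA  2001:503:ba3e::2:30
"""

if __name__ == "__main__":
    print(check_root_hints(SAMPLE))
```

Feeding it the contents of /var/unbound/root.hints should flag truncated or garbled lines. Valid zone-file variations that differ from this layout would be flagged too, so treat any hit as a hint to look at the line, not as proof.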

Quote from: lar.hed on November 02, 2023, 11:22:54 AM
For anyone reading up on my issue: Unbound seems to have broken with the upgrade to 23.7.7.x. Unbound worked perfectly before the latest and greatest - and now it just doesn't. I am not sure when I did the last upgrade before 23.7.7, so I cannot say exactly which version broke Unbound. But something sure did.


Hello, I have the same issue with OPNsense 23.7. How can I fix the problem with Unbound?

It's a little bit strange. If I use only the default LAN, Unbound works perfectly.

If I create a new network/VLAN on OPNsense with the same rules as the default LAN, Unbound crashes.
I have only one LAN interface and connect the new network/VLAN via this interface.

Here are some log entries from Unbound:

2024-01-28T10:46:57   Critical   unbound   [16075:3] fatal error: Could not initialize thread   
2024-01-28T10:46:57   Critical   unbound   [16075:0] fatal error: Could not initialize main thread   
2024-01-28T10:46:57   Error   unbound   [16075:0] error: Could not set root or stub hints   
2024-01-28T10:46:57   Error   unbound   [16075:0] error: reading root hints /root.hints 24:4: Syntax error, could not parse the RR's TTL   
2024-01-28T10:46:57   Error   unbound   [16075:3] error: Could not set root or stub hints   
2024-01-28T10:46:57   Error   unbound   [16075:3] error: reading root hints /root.hints 2:8: Syntax error, could not parse the RR's type   
2024-01-28T10:46:37   Critical   unbound   [69178:4] fatal error: Could not initialize thread   
2024-01-28T10:46:37   Warning   unbound   [69178:1] warning: root hints /root.hints:29 skipping type A   
2024-01-28T10:46:37   Error   unbound   [69178:4] error: Could not set root or stub hints   
2024-01-28T10:46:37   Error   unbound   [69178:4] error: reading root hints /root.hints 2:11: Syntax error, could not parse the RR's type   
2024-01-28T10:45:21   Critical   unbound   [91266:1] fatal error: Could not initialize thread   
2024-01-28T10:45:21   Error   unbound   [91266:1] error: Could not set root or stub hints   
2024-01-28T10:45:21   Error   unbound   [91266:1] error: reading root hints /root.hints 2:17: Syntax error, could not parse the RR's type   
2024-01-28T10:43:48   Critical   unbound   [32545:2] fatal error: Could not initialize thread   
2024-01-28T10:43:48   Critical   unbound   [32545:0] fatal error: Could not initialize main thread   
2024-01-28T10:43:48   Error   unbound   [32545:2] error: Could not set root or stub hints   
2024-01-28T10:43:48   Error   unbound   [32545:0] error: Could not set root or stub hints   
2024-01-28T10:43:48   Error   unbound   [32545:0] error: reading root hints /root.hints 28:30: Syntax error, could not parse the RR's class

Regards
Kevin

I think I will quote myself from two other threads where this has been a standing challenge, which for the moment seems to be under control until I find another challenge (16 days without a problem so far - do note the Monit scripts that help out if something happens anyway):

Quote from: lar.hed on January 26, 2024, 05:20:13 PM
Hi @Fright!

Thanks for helping out.

I can add this that I wrote in the other Unbound thread:
Quote from: lar.hed on January 23, 2024, 10:44:08 AM
I need to be more precise, I think...

So, my current setup is OPNsense 23.7.11-amd64.

On this I have the two patches earlier referenced:
opnsense-patch a086f40b
opnsense-patch 845fbd384fe


Then I have removed two plugins: mDNS and IGMP Proxy - and am only running UDP Broadcast Relay: https://forum.opnsense.org/index.php?topic=38114.0

Also, since in my case there seems to be some kind of connection to IP address changes or something, I decided to uncheck "Register DHCP Leases" and "Register DHCP Static Mappings".

So 6 changes in all. I cannot say that each change has anything to do with this challenge I have with Unbound; however, the changes above have stopped Unbound from pinning a core at 100% CPU. Which one would I vote for? The patches, all day long...

I have had one Unbound stop for which I have no explanation. Monit restarted Unbound right away, and since I'm not at home where the OPNsense box is installed, I have not been able to check anything...

I have not had any more 100% CPU on one core since I changed the above. Currently I do not know exactly which change is most likely to have solved this. Although I have to say that removing the extra plugins should not be the reason...

I have installed OPNsense version 23.7.12 with Monit, and it looks like the Unbound service is now permanently online (but it's a workaround). However, DNS within the VLANs doesn't work (local network and WAN). On my VLANs I can connect to every client by IP but not by DNS name. I can ping 1.1.1.1 and 8.8.8.8.
If I connect to my OPNsense via SSH and ping my local clients, only the IP works. But a ping to google works with the DNS name...


It's a little bit strange: I installed a fresh OPNsense version 23.7 and everything works fine with the default LAN interface and the same rules. I have only one LAN interface. If I create new VLANs/networks and connect them to my single NIC (the default LAN interface), Unbound seems to crash.
I have the same rules for every VLAN/network as for the default LAN network.


Regards
Kevin

If you have unchecked the "Register DHCP Leases" and "Register DHCP Static Mappings" - then DNS name resolution on your intranet will not work.

I have installed version 24.1 with the same configuration and now it seems to be fine.
Unbound has not crashed for 1 day so far.
