Either Unbound or the latest patch (23.7.xxxx) broke my connection

Started by lar.hed, November 02, 2023, 12:51:02 AM

I have now read the threads about the latest release, and I might have a different problem, but it is strange either way:

This morning started with my KEF LSX2 speakers not having any internet connection. I was a bit tired, so I just pulled the power cord on the main speaker to hard-reboot the speakers, and everything worked again.

Later in the day I decided to upgrade to the latest and greatest - and then everything went in the wrong direction. Slowly...

First my LAN port (I have 8 ports on my OPNsense firewall hardware, bare metal, so nothing in between) stopped working. Well, I could still connect to my OPNsense box and my Home Assistant - I reach both by IP address...

Then my WAN dropped somehow, so my dual WAN took over (LTE).
Then my server dropped.
And my WiFi (Unifi AP).
And then my Home Assistant.
And finally my LTE WAN backup.

Well, I could still use IP addresses - so I could do some stuff, but it was not working as it used to.

After reading up on OPNsense issues here in the forum on my mobile (this has got to be the first time I loved having a folding phone!), I decided, based on other reports, to reinstall and apply a config backup. So I did. But do note two things: a) connections by IP address seemed to work (one could enter 1.1.1.1 and get that web page) and b) my firewall hardware was VERY hot. Something was running maxed out and I could not figure out what it was, since the Dashboard seemed just fine...

...until later this evening, when service after service also started to fail - by the time I did the reinstall I think there were 8 red boxes for services that had stopped. Something was way off.

Now, I had not prepared for this scenario, so I had to download everything. But reinstall I did. 23.7.

And STILL no DNS service. IP worked, and my hardware was once again cool.

So I decided to try Dnsmasq DNS instead of Unbound DNS.

And now everything is back to normal I think - I will have to check tomorrow and so on.

But the thing I would like to share here is:
1) Double check whether an IP address like 1.1.1.1 (which is a web page) works. If it does, look at the DNS solution you have chosen and, just for the sake of testing, switch to one of the others.
2) Be a bit careful with the latest patch. This got very bad after the upgrade; however, it seems it might be Unbound that overwrites something, since it seems to kill port after port slowly....
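
The first check above (IP works, names do not) can be scripted from any client on the LAN. What follows is only a rough sketch: the function names are made up for illustration, and the hostname `opnsense.org` and port 443 are arbitrary choices, not something taken from this thread.

```python
import socket


def ip_reachable(ip="1.1.1.1", port=443, timeout=3.0):
    """Return True if a TCP connection to a literal IP succeeds (no DNS involved)."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def name_resolves(hostname="opnsense.org"):
    """Return True if the system resolver can turn a hostname into an address."""
    try:
        socket.getaddrinfo(hostname, 443)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


def diagnose(ip_ok, dns_ok):
    """Map the two probe results to a rough verdict."""
    if not ip_ok:
        return "no IP connectivity: look at link/routing first"
    if not dns_ok:
        return "IP works but names do not: suspect the DNS service"
    return "IP and DNS both work"


# Example use on a client behind the firewall:
#   print(diagnose(ip_reachable(), name_resolves()))
```

If the verdict points at DNS, switching the resolver for a test (as in point 1) is a cheap way to confirm it.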

And if I am wrong about anything above, then I apologize in advance - this is how it behaved for me this evening, and I lacked the energy to debug the crap out of it.

The same is happening to me on OPNsense 23.7.7_3 (I think I upgraded from 23.7.3 or 23.7.4 to 23.7.7_1, then to 23.7.7_3). After that, DNS (Unbound) randomly stopped/crashed, and I had to switch to Dnsmasq and add DNS servers under Settings -> General -> DNS Servers to make my router work again.

If I remember correctly, ping still works, which means it definitely looks like a DNS issue... Does it possibly only affect DoT users with DNSSEC enabled, or also people without it? I use DoT to Quad9 (9.9.9.9 and 149.112.112.112), but I am not sure if this is relevant. Can someone confirm that it also stops and crashes without DoT and DNSSEC enabled?

No clue where to look, as it does not look like Unbound DNS throws any errors.

I also did not find a matching issue report on

Another thread that may be about the same issue is this one: https://forum.opnsense.org/index.php?topic=35527.75

It would be really nice to have at least a clue about the progress or what the cause could be...

Yes, I was running DoT (Unbound), but turned it off. No difference. Since I also run DNSSEC I guess I should have tested turning that off as well, but I did not (too tired).

I just compared an old (2023-08-07) config XML file with the new one from last night when all the problems started, when I still had Unbound DNS enabled. A few things I noted:

1) There is a section in the old one that is missing in the new one:
  <unbound>
    <enable>1</enable>
    <custom_options/>
    <dnssec>1</dnssec>
    <regdhcp>1</regdhcp>
    <regdhcpstatic>1</regdhcpstatic>
    <stats>1</stats>
  </unbound>


2) The Unbound DNS section (unboundplus) was version 1.0.4 in the old file and 1.0.8 in the new one. And there is a lot more in the new version under the unboundplus section.

Here is the OLD one (yes at this time I had only two DoT servers defined):
    <unboundplus version="1.0.4">
      <service_enabled/>
      <advanced>
        <hideidentity>0</hideidentity>
        <hideversion>0</hideversion>
        <prefetch>0</prefetch>
        <prefetchkey>0</prefetchkey>
        <dnssecstripped>0</dnssecstripped>
        <serveexpired>0</serveexpired>
        <serveexpiredreplyttl/>
        <serveexpiredttl/>
        <serveexpiredttlreset>0</serveexpiredttlreset>
        <serveexpiredclienttimeout/>
        <qnameminstrict>0</qnameminstrict>
        <extendedstatistics>0</extendedstatistics>
        <logqueries>0</logqueries>
        <logreplies>0</logreplies>
        <logtagqueryreply>0</logtagqueryreply>
        <logverbosity>1</logverbosity>
        <privatedomain/>
        <privateaddress>0.0.0.0/8,10.0.0.0/8,100.64.0.0/10,169.254.0.0/16,172.16.0.0/12,192.0.2.0/24,192.168.0.0/16,198.18.0.0/15,198.51.100.0/24,203.0.113.0/24,233.252.0.0/24,::1/128,2001:db8::/32,fc00::/8,fd00::/8,fe80::/10</privateaddress>
        <insecuredomain/>
        <msgcachesize/>
        <rrsetcachesize/>
        <outgoingnumtcp/>
        <incomingnumtcp/>
        <numqueriesperthread/>
        <outgoingrange/>
        <jostletimeout/>
        <cachemaxttl/>
        <cacheminttl/>
        <infrahostttl/>
        <infracachenumhosts/>
        <unwantedreplythreshold/>
      </advanced>
      <dnsbl>
        <enabled>1</enabled>
        <type>bla0,bla,blc,bld,blf,blf0,blg,blm,blp,blp0,blp1,blr,blr0,bls,blt,blt0,blt1,el,ep,nc,pt,sa,st,sb,ws,yy</type>
        <lists>https://raw.githubusercontent.com/larhedse/hostnamelistan/master/BaraHostLista.txt</lists>
        <whitelists/>
        <address/>
        <nxdomain>0</nxdomain>
      </dnsbl>
      <forwarding>
        <enabled>0</enabled>
      </forwarding>
      <dots>
        <dot uuid="9dc79fd6-5c5e-41bc-b193-f94b5cb007bc">
          <enabled>1</enabled>
          <type>dot</type>
          <domain/>
          <server>1.1.1.3</server>
          <port>853</port>
          <verify>cloudflare-dns.com</verify>
        </dot>
        <dot uuid="d9de93d8-a4ed-4283-b44b-aea29794de07">
          <enabled>1</enabled>
          <type>dot</type>
          <domain/>
          <server>1.1.1.2</server>
          <port>853</port>
          <verify>cloudflare-dns.com</verify>
        </dot>
      </dots>
      <hosts/>
      <aliases/>
      <domains/>
    </unboundplus>


And then the NEW one with all the extras:
    <unboundplus version="1.0.8">
      <general>
        <enabled>1</enabled>
        <port>53</port>
        <stats>1</stats>
        <active_interface/>
        <dnssec>1</dnssec>
        <dns64>0</dns64>
        <dns64prefix>64:ff9b::/96</dns64prefix>
        <noarecords>0</noarecords>
        <regdhcp>1</regdhcp>
        <regdhcpdomain/>
        <regdhcpstatic>1</regdhcpstatic>
        <noreglladdr6>0</noreglladdr6>
        <noregrecords>0</noregrecords>
        <txtsupport>0</txtsupport>
        <cacheflush>0</cacheflush>
        <local_zone_type>transparent</local_zone_type>
        <outgoing_interface/>
        <enable_wpad>0</enable_wpad>
      </general>
      <advanced>
        <hideidentity>0</hideidentity>
        <hideversion>0</hideversion>
        <prefetch>0</prefetch>
        <prefetchkey>0</prefetchkey>
        <dnssecstripped>0</dnssecstripped>
        <serveexpired>0</serveexpired>
        <serveexpiredreplyttl/>
        <serveexpiredttl/>
        <serveexpiredttlreset>0</serveexpiredttlreset>
        <serveexpiredclienttimeout/>
        <qnameminstrict>0</qnameminstrict>
        <extendedstatistics>0</extendedstatistics>
        <logqueries>0</logqueries>
        <logreplies>0</logreplies>
        <logtagqueryreply>0</logtagqueryreply>
        <logservfail>0</logservfail>
        <loglocalactions>0</loglocalactions>
        <logverbosity>1</logverbosity>
        <valloglevel>0</valloglevel>
        <privatedomain/>
        <privateaddress>0.0.0.0/8,10.0.0.0/8,100.64.0.0/10,169.254.0.0/16,172.16.0.0/12,192.0.2.0/24,192.168.0.0/16,198.18.0.0/15,198.51.100.0/24,203.0.113.0/24,233.252.0.0/24,::1/128,2001:db8::/32,fc00::/8,fd00::/8,fe80::/10</privateaddress>
        <insecuredomain/>
        <msgcachesize/>
        <rrsetcachesize/>
        <outgoingnumtcp/>
        <incomingnumtcp/>
        <numqueriesperthread/>
        <outgoingrange/>
        <jostletimeout/>
        <cachemaxttl/>
        <cachemaxnegativettl/>
        <cacheminttl/>
        <infrahostttl/>
        <infrakeepprobing>0</infrakeepprobing>
        <infracachenumhosts/>
        <unwantedreplythreshold/>
      </advanced>
      <acls>
        <default_action>deny</default_action>
      </acls>
      <dnsbl>
        <enabled>1</enabled>
        <safesearch>0</safesearch>
        <type>bla0,bla,blc,bld,blf,blf0,blg,blm,blp,blp0,blp1,blr,blr0,bls,blt,blt0,blt1,el,ep,nc,pt,sa,st,sb,ws,yy</type>
        <lists>https://raw.githubusercontent.com/larhedse/hostnamelistan/master/BaraHostLista.txt</lists>
        <whitelists/>
        <blocklists/>
        <wildcards/>
        <address/>
        <nxdomain>0</nxdomain>
      </dnsbl>
      <forwarding>
        <enabled>0</enabled>
      </forwarding>
      <dots>
        <dot uuid="9dc79fd6-5c5e-41bc-b193-f94b5cb007bc">
          <enabled>1</enabled>
          <type>dot</type>
          <domain/>
          <server>1.1.1.3</server>
          <port>853</port>
          <verify>cloudflare-dns.com</verify>
        </dot>
        <dot uuid="d9de93d8-a4ed-4283-b44b-aea29794de07">
          <enabled>1</enabled>
          <type>dot</type>
          <domain/>
          <server>1.1.1.2</server>
          <port>853</port>
          <verify>cloudflare-dns.com</verify>
        </dot>
        <dot uuid="2fef985d-b3a5-42fc-9665-3bca6e5bee6b">
          <enabled>1</enabled>
          <type>dot</type>
          <domain/>
          <server>9.9.9.9</server>
          <port>853</port>
          <verify>dns.quad9.net</verify>
        </dot>
        <dot uuid="8e309da4-9864-4442-ba6f-76c7d409109c">
          <enabled>1</enabled>
          <type>dot</type>
          <domain/>
          <server>149.112.112.112</server>
          <port>853</port>
          <verify>dns.quad9.net</verify>
        </dot>
      </dots>
      <hosts/>
      <aliases/>
      <domains/>
    </unboundplus>
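
Comparing two sections like these by eye is error-prone, so here is a small sketch that lists which element paths were added or removed between two XML fragments. The `OLD` and `NEW` strings are heavily trimmed stand-ins for the excerpts above, and the helper names are made up for illustration. Repeated elements such as the individual `<dot>` entries collapse into one path, so this shows structural changes only, not value changes. In the excerpts above, the dnssec, regdhcp, regdhcpstatic and stats settings from the old `<unbound>` section show up again under `<general>` in the new one, which looks more like a migration than a loss, though that is only a reading of the two files and is not confirmed anywhere in this thread.

```python
import xml.etree.ElementTree as ET


def element_paths(xml_text):
    """Return the set of slash-separated tag paths found in an XML fragment."""
    paths = set()

    def walk(node, prefix):
        path = f"{prefix}/{node.tag}" if prefix else node.tag
        paths.add(path)
        for child in node:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return paths


def compare(old_xml, new_xml):
    """Return (added, removed) element paths, each as a sorted list."""
    old, new = element_paths(old_xml), element_paths(new_xml)
    return sorted(new - old), sorted(old - new)


# Trimmed stand-ins for the two config excerpts (not the full sections).
OLD = """<unboundplus version="1.0.4">
  <advanced><logverbosity>1</logverbosity></advanced>
  <dnsbl><enabled>1</enabled></dnsbl>
</unboundplus>"""

NEW = """<unboundplus version="1.0.8">
  <general><enabled>1</enabled><dnssec>1</dnssec></general>
  <advanced><logverbosity>1</logverbosity><logservfail>0</logservfail></advanced>
  <acls><default_action>deny</default_action></acls>
  <dnsbl><enabled>1</enabled><safesearch>0</safesearch></dnsbl>
</unboundplus>"""

added, removed = compare(OLD, NEW)
for path in added:
    print("+", path)
for path in removed:
    print("-", path)
```

To run it against real backups, the two strings could be replaced by the `<unboundplus>` sections cut out of each config file.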


I guess the next thing for me to do is to replace the NEW Unbound section with the OLD one to see if that works. I would say that for the moment the OPNsense update is breaking Unbound...

Also, running 23.7 means I cannot install any plugins, since they ALL say "23.7.7_3 is required." - How funny is that?

Sorry to interfere. I wouldn't replace a new config with an old one after an update/upgrade of the software that uses it. It's normal for configs to differ from one version to another.
You really need to diagnose the setup that doesn't seem to work correctly post-update, if there is time, or roll back completely. A partial rollback will (most likely) just make it worse.

I agree. My fault. It was logical in my head, but reality is exactly like you say (write).

So I upgraded to the latest again, but with Unbound NOT enabled - I need to see and understand what destroyed Unbound before I enable it again.

For the moment everything works - well, except for Unbound of course.

OK. First I'd run unbound-checkconf as a very basic sanity check. That tells you the configuration itself is OK and Unbound won't bomb out on it.
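
For anyone who wants to script that sanity check, here is a minimal sketch that wraps `unbound-checkconf` and reports whether it exited cleanly. The default path `/var/unbound/unbound.conf` is an assumption about where the generated config lives on the box and should be checked first; the function name is made up for illustration.

```python
import subprocess


def check_unbound_config(config_path="/var/unbound/unbound.conf",
                         binary="unbound-checkconf"):
    """Run unbound-checkconf on a config file and return (ok, output).

    config_path is an assumed location; adjust it to the actual system.
    """
    try:
        result = subprocess.run(
            [binary, config_path],
            capture_output=True,
            text=True,
            timeout=30,
        )
    except FileNotFoundError:
        return False, f"{binary} not found on PATH"
    except subprocess.TimeoutExpired:
        return False, f"{binary} timed out"
    # Exit code 0 means the tool found no errors in the file.
    output = (result.stdout + result.stderr).strip()
    return result.returncode == 0, output


# Example use on the firewall shell:
#   ok, message = check_unbound_config()
#   print("config OK" if ok else "config problem", "-", message)
```

A clean result only says the file parses; it says nothing about why the service stops at runtime.
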
Then you need to look around it: what rules are in place in the firewall that might be problematic?
Frankly, this shouldn't be a problem from one minor OPNsense version to the next.
Any chance of diagramming your setup? See, "my wan/lan/etc dropped" doesn't give anything to work with :)
When it happens, you'd want to drop to a shell on the affected client, do dig or nslookup requests, and follow the packets on the firewall in the live log view with adequate logging set, or with a packet capture.
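
Where dig or nslookup is not at hand on the client, roughly the same test can be done with a few lines of Python: send one A-record query over UDP straight to a chosen server and look at the response code. This is a bare-bones sketch of the RFC 1035 query format, not a replacement for dig; the function names are made up, and `192.168.1.1` in the usage comment is a placeholder for the firewall's LAN address.

```python
import random
import socket
import struct


def build_query(hostname, query_id=None):
    """Build a minimal DNS query packet for an A record (RFC 1035 layout)."""
    if query_id is None:
        query_id = random.randint(0, 0xFFFF)
    # Header: ID, flags (0x0100 = recursion desired), QDCOUNT=1, other counts 0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question


def query_rcode(server, hostname, timeout=3.0):
    """Send the query to server on UDP port 53.

    Returns the DNS response code (0 = NOERROR, 2 = SERVFAIL, 3 = NXDOMAIN),
    or None if nothing usable came back before the timeout.
    """
    packet = build_query(hostname)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(packet, (server, 53))
            data, _ = sock.recvfrom(512)
        except OSError:  # covers timeouts and unreachable-network errors
            return None
    if len(data) < 12 or data[:2] != packet[:2]:
        return None  # too short, or the ID does not match our query
    flags = struct.unpack("!H", data[2:4])[0]
    return flags & 0x000F


# Example use on an affected client (192.168.1.1 is a placeholder address):
#   print("firewall:", query_rcode("192.168.1.1", "opnsense.org"))
#   print("upstream:", query_rcode("9.9.9.9", "opnsense.org"))
```

Asking the firewall and an upstream server for the same name separates "the resolver on the firewall is not answering" from "nothing gets out at all".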

For what it is worth: I have not touched Unbound for "ages" - that is it has been running just fine for months. At least around 6 months...

The ONLY thing I have added, about two months ago, was an "IoT" interface, which is a VLAN for IoT stuff running over my Unifi AP. And in my mind that has nothing to do with DNS at all.

And this is a bit why I "blamed" the upgrade at first - nothing has really changed lately....

For anyone reading up on my issue: Unbound seems to break when upgrading to 23.7.7.x. Unbound worked perfectly before the latest and greatest - and now it just doesn't. I am not sure when I did my last upgrade before 23.7.7, so I cannot say exactly which version broke Unbound. But something sure did.


I don't know, and for the moment I hesitate to even try - I would love to see my OPNsense setup (with Dnsmasq) work for more than 24h at least before I try anything else. To get proof, kind of, that it is/was Unbound that killed itself, so to speak....


Okay, after close to 40 hours or so, my connection was lost again. However, Dnsmasq was still in use, so this has no connection to Unbound DNS. And this time I lost everything - I could only get a partial connection to the outside (some web pages loaded part of the page - and then everything just stopped). Like last time, it started on the first port on my firewall while all the others still worked. This time I just did a reboot right away to see what would happen - and now it is back online. For how long I do not know. What I do know is that DNS does not seem to be involved.

I would love to be able to roll back to an older version, and not be forced to use the latest, since the latest does not work.

Where can I download 23.7.5 or .6 - I would like to be able to validate something.... I just found a somewhat odd error in one of the log files, but I would like to be able to separate things out....

Okay I have (at least) two different issues here....

Unbound DNS stopped working after upgrade - I will return to this later....
link down/up = no connection at all. This issue I will write about in a separate thread so I can handle them better...