I had hoped that this version (24.1) had somehow magically solved the issue of Unbound pinning one core at 100% CPU, but it has NOT been solved. It is the same as before:
2024-02-14T15:19:55 Critical unbound [71574:6] fatal error: Could not initialize thread
2024-02-14T15:19:55 Error unbound [71574:6] error: Could not set root or stub hints
2024-02-14T15:19:55 Error unbound [71574:6] error: reading root hints /root.hints 2:8: Syntax error, could not parse the RR's type
So I still need to run "kill -9 <unbound pid>" to kill the process and then restart it. It is just like in 23.7, no difference at all. This happened after 5 days and a bit over 5 hours. One thing I have to mention is that this one could have been kicked off by me starting my PC, which is connected directly to one of the interfaces, if that is of any help. Oh, and the root.hints file was/is identical, so the trouble is not what is inside the file. Maybe it is the access to it, as in locked while another process reads it?
Edit: running bare metal with OPNsense 24.1.1-amd64
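One way to test the "contents versus access" question is to check the hints file on disk at the moment the error shows up. The following is only a sketch: `check_hints` is a hypothetical helper, it assumes the usual root hints layout (owner, TTL, optional class IN, then a type of NS, A or AAAA), and the real location of the file (possibly /var/unbound/root.hints, since the log path looks relative to a chroot) is an assumption.

```shell
# check_hints FILE: report lines whose RR type is not NS, A or AAAA.
# Returns 0 when every record looks sane, 1 otherwise.
check_hints() {
    awk '
        /^[ \t]*;/ { next }      # comment line
        NF == 0    { next }      # blank line
        {
            type = ($3 == "IN") ? $4 : $3   # the class field is optional
            if (type != "NS" && type != "A" && type != "AAAA") {
                printf "line %d: unexpected RR type \"%s\"\n", NR, type
                bad = 1
            }
        }
        END { exit bad + 0 }
    ' "$1"
}

# Example (path is an assumption):
#   check_hints /var/unbound/root.hints || echo "hints file looks damaged"
```

If the file always passes this check while Unbound still reports a parse error at changing line:column positions, that would point away from the contents and towards how or when the file is read.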
You could try to use truss or ktrace on the process to find out what the heck it is doing ;)
Thanks - I will keep that in mind for next time, which we all of course wish will never happen, but then again, where is that egg timer?
https://forum.opnsense.org/index.php?topic=37973.msg188912#msg188912
Quote from: Fright on February 04, 2024, 08:16:00 AM
still root-hint file read error?
best i can offer for now:
https://github.com/opnsense/core/commit/2e2294c0642cdc537cccd785464059edea4948a6
opnsense-patch -a kulikov-a 2e2294c
then enable "Use built-in root hints" in Services: Unbound DNS: General (with advanced mode "on") and Apply
Although I do not mind running any patches, they have a tendency to never be included in any updates, so in the end I will need to re-apply them after each upgrade. That might be okay also, however since this has been the case since 23.7.x, I would like either this patch or something else (sorry don't know what) more permanent.
Here is the thing: I have a Monit service enabled that restarts Unbound on high CPU and/or high CPU temperature (this time it was actually the high-temperature check that kicked in; it should have been the high-CPU one, but my fingers seem to have modded the Monit script so that it does not work - my bad). This kind of works as a workaround, just like the patches?
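For anyone wanting to build a similar high-CPU check, here is a minimal sketch, not the script used above. It assumes the check runs as a program whose exit status Monit acts on; the 90% threshold and the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical threshold in percent of one core; tune to taste.
MAX_CPU=${MAX_CPU:-90}

# cpu_over CPU MAX: succeed when CPU is at or above MAX (decimals allowed).
cpu_over() {
    awk -v cpu="$1" -v max="$2" 'BEGIN { exit !(cpu + 0 >= max + 0) }'
}

# Print the %CPU of the first process named exactly "unbound", if any.
unbound_cpu() {
    pid=$(pgrep -x unbound | head -n 1)
    [ -n "$pid" ] || return 1
    ps -o %cpu= -p "$pid" | tr -d ' '
}

# check_unbound: return 1 when Unbound is running hot, 0 otherwise.
check_unbound() {
    cpu=$(unbound_cpu) || return 0   # not running: leave that to another check
    if cpu_over "$cpu" "$MAX_CPU"; then
        echo "unbound at ${cpu}% CPU"
        return 1
    fi
    return 0
}
```

Saved as a script that ends with a call to `check_unbound`, this gives a non-zero exit status only while Unbound is at or above the threshold, which a restart rule can then react to. Doing the comparison in awk avoids shell integer arithmetic on values like 97.5.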
I am seeing the exact same thing.
I tried applying the patches within the 23.7 unbound thread here - https://forum.opnsense.org/index.php?topic=35527.msg187426#msg187426
These haven't appeared to make any real difference. There was a small hotfix update after I applied them, so I am not sure whether they get overwritten each time or whether it depends on what the update was.
I have just applied the linked patch and enabled the option, so I will see how it goes.
Could you try
cd /tmp
ktrace -p <pid of misbehaving unbound>
# wait a couple of seconds
ktrace -C
kdump
This will catch all system calls the process performs in that time and state. It will not catch anything if it's calculating "something" internally. But frequently this gives hints about problems, e.g. server processes trying to open a logfile in a nonexistent directory, so they cannot log why they fail to start. For file accesses you want to look for NAMI calls, for example.
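To pick out the NAMI records mentioned above, something like the following could help. It is a sketch that assumes kdump's default line layout (pid, command name, record type, then the data); `nami_paths` and the trace file name are hypothetical.

```shell
# nami_paths: read kdump text on stdin, print the path of each NAMI record.
nami_paths() {
    awk '$3 == "NAMI" { sub(/^[^"]*"/, ""); sub(/"$/, ""); print }'
}

# Usage on the firewall (trace file name is hypothetical):
#   ktrace -f /tmp/unbound.ktrace -p <pid of misbehaving unbound>
#   sleep 5
#   ktrace -C
#   kdump -f /tmp/unbound.ktrace | nami_paths | sort | uniq -c | sort -rn
```

The last pipeline counts how often each path was looked up, so a file that is opened in a tight loop stands out at the top.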
Patrick, I will do that "next time" since it is running just fine for the moment.
I have to be a bit more specific about the setup as well: I am back to my old config, so to speak, so I am NOT using any of the old patches from 23.7. I have also re-enabled DHCP registration ("Register DHCP Leases" and "Register DHCP Static Mappings"). The reason for mentioning this is that my old 23.7 OPNsense installation was a bit more stable without the DHCP stuff enabled... Remember that in my case, with bare metal and a directly attached LAN PC, there might be something that triggers when I wake up that PC...?
And just to be perfectly clear, I have applied the patch mentioned above:
opnsense-patch -a kulikov-a 2e2294c
I have done so to see if that makes any difference - I need to see that, so I am now just waiting for either nothing or, as I think, a new 100% CPU Unbound... Eggtimer 8)
I have been running into the same issue since 24.1. Is there any way to help debug this? It appears every few days, but I couldn't find any pattern:
It appeared again last night; Monit showed failing DNS resolution during the night. However, once my PC was running again, it seemed to work again without any intervention. So I am not sure whether this is related or was just some random event.
Error when it stopped working was this:
2024-02-15T01:58:19 Error unbound [47314:1] error: reading root hints /root.hints 7:8: Syntax error, could not parse the RR's type
2024-02-15T01:58:19 Error unbound [47314:3] error: reading root hints /root.hints 2:13: Syntax error, could not parse the RR's type
TypeError: an integer is required (got type NoneType)
os.write(self._pipe_fd, res.encode())
File "dnsbl_module.py", line 227, in log_entry
logger.log_entry(query)
File "dnsbl_module.py", line 548, in operate
Suspiciously, I shut down my PC at around 2 PM, and it looks like Unbound started failing after that. After I booted it up again in the morning at around 08:50, Unbound worked again? Could this have something to do with a specific interface going up or down? It's the first time I have seen a client going up and down influence Unbound, but maybe it's just random. The error seems to be the same as the thread creator's:
2024-02-15T08:46:24 Informational unbound [84952:0] info: generate keytag query _ta-4f66. NULL IN
2024-02-15T08:46:23 Informational unbound [84952:0] info: start of service (unbound 1.19.0).
2024-02-15T08:46:23 Notice unbound [84952:0] notice: init module 2: iterator
2024-02-15T08:46:23 Notice unbound [84952:0] notice: init module 1: validator
2024-02-15T08:46:23 Notice unbound [84952:0] notice: init module 0: python
2024-02-15T08:46:20 Notice unbound Closing logger
2024-02-15T01:58:19 Notice unbound Backgrounding unbound logging backend.
2024-02-15T01:58:19 Notice unbound daemonize unbound dhcpd watcher.
2024-02-15T01:58:19 Error unbound [47314:0] error: str: syscall error with errno No error: 0
2024-02-15T01:58:19 Notice unbound [47314:0] notice: failed connection from 127.0.0.1 port 44860
2024-02-15T01:58:19 Informational unbound [47314:0] info: start of service (unbound 1.19.0).
2024-02-15T01:58:19 Critical unbound [47314:1] fatal error: Could not initialize thread
2024-02-15T01:58:19 Critical unbound [47314:3] fatal error: Could not initialize thread
2024-02-15T01:58:19 Informational unbound [47314:3] info: server stats for thread 3: requestlist max 0 avg 0 exceeded 0 jostled 0
2024-02-15T01:58:19 Informational unbound [47314:3] info: server stats for thread 3: 0 queries, 0 answers from cache, 0 recursions, 0 prefetch, 0 rejected by ip ratelimiting
2024-02-15T01:58:19 Informational unbound [47314:1] info: server stats for thread 1: requestlist max 0 avg 0 exceeded 0 jostled 0
2024-02-15T01:58:19 Informational unbound [47314:1] info: server stats for thread 1: 0 queries, 0 answers from cache, 0 recursions, 0 prefetch, 0 rejected by ip ratelimiting
2024-02-15T01:58:19 Error unbound [47314:1] error: Could not set root or stub hints
2024-02-15T01:58:19 Error unbound [47314:3] error: Could not set root or stub hints
2024-02-15T01:58:19 Error unbound [47314:1] error: reading root hints /root.hints 7:8: Syntax error, could not parse the RR's type
2024-02-15T01:58:19 Error unbound [47314:3] error: reading root hints /root.hints 2:13: Syntax error, could not parse the RR's type
Also the General log shows this for the morning:
2024-02-15T08:46:20 Error opnsense /usr/local/etc/rc.linkup: The command '/bin/kill -'TERM' '47314''(pid:/var/run/unbound.pid) returned exit code '1', the output was 'kill: 47314: No such process'
2024-02-15T08:46:10 Error opnsense /usr/local/etc/rc.linkup: The command `/sbin/ifconfig 'bridge0' addm 'igb1'' failed to execute
Quote from: Sensler3000 on February 15, 2024, 10:11:45 AM
Suspiciously, I shut down my PC at around 2 PM, and it looks like Unbound started failing after that. After I booted it up again in the morning at around 08:50, Unbound worked again? Could this have something to do with a specific interface going up or down? It's the first time I have seen a client going up and down influence Unbound, but maybe it's just random.
So you have your PC connected directly to the firewall LAN interface, without a switch? That will cause tons of issues if that's the case.
I am using a mini PC with multiple ports and have connected all devices directly to it. It has been running this way for 2 years, so I don't see any issue with that? I just saw the log, so I assumed it was worth posting. Why would a switch between the PC and the firewall change any of this? I don't need a switch; the firewall has enough ports to cover all devices. I don't have any issues with the devices whatsoever; only Unbound started causing the mentioned issues since the recent update.
I also have one of my PCs directly connected to OPNsense on bare metal. I do not have a "ton" of issues with that, and before 23.7 even Unbound worked without ANY issues. Something was introduced in 23.7, and is obviously still present in 24.1, that gives OPNsense challenges at some point. Interface up/down is most likely not part of the Unbound issue; maybe DHCP registration of name/IP is, though. I will have to test that at a later date when my current setup fails (if ever).
So, you bridge all the ports on the firewall and connect ALL devices directly to it?
Well, long story short - you absolutely DO need a switch. Total waste of time debugging similar setups. Anything you unplug disrupts the bridge, disrupts unbound which binds to all the interfaces by default - and generally fires up scripts on interface (un)plug events that disrupt things even further.
Just read the logs:
2024-02-15T08:46:10 Error opnsense /usr/local/etc/rc.linkup: The command `/sbin/ifconfig 'bridge0' addm 'igb1'' failed to execute
Not to mention the horrible performance penalty using SW bridge vs. HW switching with ASIC.
Do not do this. Ever.
I have a 8-port setup, no bridge needed.
Since the OP has no bridge but the same error, it seems to be unrelated. Also, it has been running like this for 2+ years (with no penalty affecting me), so maybe I was on the wrong track here and the interface has nothing to do with it. Pretty sure it's still something with Unbound.
He doesn't bridge them; instead he uses a port for a single device (PC), which I've been advising against for some time for the reason you explained. Switch the device on, scripts fire, disruption ensues.
I've stated that that is the expected behaviour and unbound should not keel over but that is adding to the mix.
Quote from: lar.hed on February 15, 2024, 10:37:26 AM
I have a 8-port setup, no bridge needed.
The other guy clearly has one (at least).
Once again, the port will go down when you unplug device from it. Unbound listens on that interface. Unbound will get disrupted. Do NOT do this. You can get an unmanaged 8 port switch pretty much for free from a dumpster. Instead of setting up 8 different subnets and routing packets b/w them for no good reason at all.
SMH.
P.S. The Unbound interfaces are configurable. Exclude anything not permanently connected from its configuration. With similar broken-by-design setups, it will most likely need to listen on WAN only, with any internal devices pointed to the WAN IP address for DNS.
I understand that it fires a script when changes happen on the interface. But something still seems to cause the issue when the script fires (at least I assume this based on your answer), so shouldn't we try to debug this instead of saying never bring an interface up or down to avoid running the script?
Also, I assume it's a pretty normal setup for people to connect devices to the firewall itself without a switch?
Also, I am pretty sure this problem has happened when no device changes were taking place (so everything was running). So maybe it's just a coincidence? @lar.hed did you ever see this error triggering when one of your devices connects to / disconnects from the interface?
Quote from: Sensler3000 on February 15, 2024, 10:47:42 AM
i understand it fires a script when changes on the interface can happen. But still something seems to cause the issue when the script fires (at least i assume this based on your answer) so we should try to debug this instead of saying never up and down an interface to avoid running the script ?
Also i assume its a pretty normal setup for people connecting devices to the firewall itself without a switch ?
Doubt it. Reason switches exist. People do it, sure. Want to live with the necessary disruption? I don't.
So you are saying the Unbound error is directly connected to not using a switch? Or is this just an assumption? I know it's not best practice, but it worked flawlessly for years, and then this error suddenly appeared after some changes?
Quote from: cookiemonster on February 15, 2024, 10:50:43 AM
Quote from: Sensler3000 on February 15, 2024, 10:47:42 AM
i understand it fires a script when changes on the interface can happen. But still something seems to cause the issue when the script fires (at least i assume this based on your answer) so we should try to debug this instead of saying never up and down an interface to avoid running the script ?
Also i assume its a pretty normal setup for people connecting devices to the firewall itself without a switch ?
Doubt it. Reason switches exist. People do it, sure. Want to live with the necessary disruption? I don't.
Pretty much. Except for things like a dedicated management port with a /30 subnet or similar, configured carefully in a way that you can plug in your laptop to it with a statically configured IP in order to fix issues on a headless box, or similar, and no unneeded services bound to that interface if possible. SSH (and possibly webgui) only.
OK, understood. Since I have had this issue over the last weeks without any device connecting or disconnecting as well, I still think it's at least not only related to not using a switch, so I will try to gather additional data.
Quote from: doktornotor on February 15, 2024, 10:54:23 AM
Pretty much.
I do not run bridged ports, but I do have a directly connected PC, and I have ALSO had this challenge with Unbound when that directly connected LAN PC has NOT been used (as in: I am not home, nothing started that particular PC, so no interface up/down). So no, your assumption is not correct in this Unbound case. There have been other things related to interface up/down, but that is a totally different story. Do also note that there is at least ONE virtual installation that has been experiencing this Unbound challenge.
Development!!!! Yihaa - or maybe not, hang on ::)
So I just had another 100% CPU Unbound incident. This time there was NO root.hints error. So that in a way is great, or is it?
Well, there is still that 100% Unbound CPU usage; I just do not get the root.hints error anymore, and this might be related to the patch mentioned earlier:
opnsense-patch -a kulikov-a 2e2294c
So this removes an "error" about the root.hints file that no one could relate to anything. Simply put, it did not make anyone happier.
So I am still left with the challenge that something trips up Unbound at some point in time. My current test approach is to see if I can get it more stable. Do remember that it was a lot more stable at the end of my 23.7 testing, with two patches (not applied now) and after I removed a few plugins (most likely not involved in anything related to Unbound); the final thing I did back then was to NOT update Unbound from DHCP. So now I will uncheck:
1) Register DHCP Leases
2) Register DHCP Static Mappings
This will of course sabotage my name resolution on my intranet - however out of pure luck I have never used name resolution at all on my intranet - everything goes over IP addresses... So the impact for me is slim to none, so this is an easy way to test/validate if the update of any IP address from DHCP might be involved in this Unbound challenge...... Stay tuned.... 8) Starting that egg timer, yet again ::)
Quote from: Patrick M. Hausen on February 14, 2024, 11:24:02 PM
Could you try
cd /tmp
ktrace -p <pid of misbehaving unbound>
# wait a couple of seconds
ktrace -C
kdump
This will catch all system calls the process performs in that time and state. It will not catch anything if it's calculating "something" internally. But frequently this gives hints about problems, e.g. server processes trying to open a logfile in a nonexistent directory, so they cannot log why they fail to start. For file accesses you want to look for NAMI calls, for example.
After my latest 100% CPU Unbound incident, I added those two commands to my kill script so that it will create something at each restart. I might not be home and all that, so it was an easy way. Well, the files from the two ktrace commands are empty - maybe I am doing something wrong?
pgrep "unbound" | grep -v "$$" | xargs ktrace -p > /home/lars/ktracep_`date +'%y%m%d_%T'`
sleep 5
ktrace -C > /home/lars/ktraceC_`date +'%y%m%d_%T'`
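One likely reason the files are empty: ktrace does not print the trace on standard output. It writes binary records to ktrace.out in the current directory (or to the file given with -f), and kdump turns that file into text. A sketch of the same idea with that in mind; `trace_name`, `capture_trace` and the output directory are hypothetical, and ktrace accepts only a single -p, so one PID is passed in.

```shell
# trace_name DIR [STAMP]: build a timestamped trace file name.
trace_name() {
    stamp=${2:-$(date +%y%m%d_%H%M%S)}
    printf '%s/ktrace_%s.out\n' "$1" "$stamp"
}

# capture_trace PID DIR [SECONDS]: trace one process, then decode to text.
capture_trace() {
    out=$(trace_name "$2")
    ktrace -f "$out" -p "$1" || return 1
    sleep "${3:-5}"
    ktrace -C                              # stop tracing
    kdump -f "$out" > "${out%.out}.txt"    # decoded, human-readable copy
}

# Example call from a kill script, before the kill itself:
#   capture_trace "$(pgrep -x unbound | head -n 1)" /home/lars
```

If the decoded .txt file is still empty after this, the process really made no system calls during the capture window, which is useful information in itself.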
Quote from: Patrick M. Hausen on February 14, 2024, 11:24:02 PM
Could you try
cd /tmp
ktrace -p <pid of misbehaving unbound>
# wait a couple of seconds
ktrace -C
kdump
I caught a crash live today and tried these commands, but the output was empty. To make sure I did it correctly, I tested it on another process and got plenty of output. I tried multiple times, waiting more than 5 minutes, but the output was still empty, so I guess the process is just dead? So the unbound process just sits there at 97% usage, and DNS resolution does not work anymore until I kill it.
Output was again:
2024-02-15T19:34:08 Critical unbound [18464:3] fatal error: Could not initialize thread
2024-02-15T19:34:08 Error unbound [18464:3] error: Could not set root or stub hints
2024-02-15T19:34:08 Error unbound [18464:3] error: reading root hints /root.hints 8:14: Syntax error, could not parse the RR's class
TypeError: an integer is required (got type NoneType)
os.write(self._pipe_fd, res.encode())
File "dnsbl_module.py", line 227, in log_entry
mod_env['logger'].log_entry(
File "dnsbl_module.py", line 379, in cache_cb
logger.close()
File "dnsbl_module.py", line 444, in deinit
Not dead but mostly dead ;D
So the process is occupying one core with just internal calculations of *whatever* without issuing any system calls. Weird ...
DTrace would be the next larger gun.
I might have a way to trigger it now, but I am still testing. Could you tell me how to use DTrace? Then I am happy to help debug.
No, sorry. I know it exists and I did use it on one occasion or two but I am definitely not fluent.
I wonder if the problem persists when you revert to a "default" configuration. I can imagine such behaviour when you create an internal loop like a CNAME pointing to itself.
I know that unbound does some magic with appending the default domain when none is given, that could probably lead to something similar (like when there is a machine that has the same name as a "local" domain, such as home.home). Whatever, I would check what happens when you remove all of your overrides.
I don't use any overrides. The only customization I use for Unbound is DoT.
I don't use overrides either.
I "only" use BlockList and DoT.
A DNS loop is just one thing that might explain such a hangup. What I wanted to point out is that obviously, not everybody has those problems, so it seems to be triggered by something special in your setup.
Now, that you have one thing in common, namely DoT:
I do not use it and have no problem. Did you try to disable it?
Regarding DoT: No not yet.
And there is a reason for that: this config, as I mentioned before, was running just fine until I upgraded to 23.7.x. Since that upgrade, this Unbound challenge has been around. Different patches have been tested, but none so far has solved it. So currently, as mentioned, I have removed DHCP registration within Unbound, which, fingers crossed and all that, seems to make this so much more stable.
So why might this be an issue for, say, myself and not for others? That is of course a very good question, and we are still looking for answers. My thinking is that this will happen sooner or later for everybody (who is using Unbound, that is), but for some reason my setup is faster (!?) to trigger it. Say you upgrade your OPNsense as soon as something is released, or just reboot once a week or so: this will, so to speak, reset the egg timer... And I seem to drag my feet a bit before I upgrade...
Is it possible this is related to
DNSSEC validators -- denial-of-service/CPU exhaustion from KeyTrap and NSEC3 vulnerabilities
CVE: CVE-2023-50868
CVE: CVE-2023-50387
WWW: https://vuxml.FreeBSD.org/freebsd/21a854cc-cac1-11ee-b7a7-353f1e043d9a.html
which is patched in unbound 1.19.1?
Hopefully that update will find its way to OPNsense soon.
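To check whether a box already runs a release at or above 1.19.1, a small comparison helper can be used. This is only a sketch: `version_ge` is a hypothetical function, and reading the version from the first line of `unbound -V` is an assumption about its output format (the "start of service" log line shows the version as well).

```shell
# version_ge A B: succeed when dotted version A is the same as or newer than B.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, "."); nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            if (x[i] + 0 > y[i] + 0) exit 0
            if (x[i] + 0 < y[i] + 0) exit 1
        }
        exit 0
    }'
}

# Example (output format of "unbound -V" is an assumption):
#   installed=$(unbound -V | awk 'NR == 1 { print $2 }')
#   version_ge "$installed" 1.19.1 && echo "KeyTrap/NSEC3 fixes included"
```

Components are compared numerically, so 1.19.10 counts as newer than 1.19.9, which a plain string comparison would get wrong.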
I have been running this since I posted earlier.
I cleared all my logs.
Within the settings screen under General for Unbound I have only ticked:
enable unbound
enable DNSSEC support
Flush DNS Cache during reload
I have not seen that error again since adding the patch; it's been a few days now. The device has been pretty stable so far.
@joshndroid which of the 3 patches did you apply ?
I had applied the 3 from the original thread that is linked in the 23.7 unbound thread.. However there was a hot fix update so not sure if they stuck as I never reapplied them - https://forum.opnsense.org/index.php?topic=35527.msg187426#msg187426
I then applied the patch in this thread - https://forum.opnsense.org/index.php?topic=37973.msg188912#msg188912
Regarding the switch vs direct plug discussion. I've had many times in my career where I was sure something was completely unrelated and didn't test it until I had gone down many other rabbit holes. Sometimes it fixed the problem, sometimes it didn't. But switches are cheap and easy to test. Because sometimes computers gonna computer.
I'm using DoT, DNSSEC, and DHCP registrations and I don't have any issues with Unbound. I'm also using multiple blocklists, but not anything like HAProxy, AdGuard, etc. Just wanted to provide some more data.
I'm still waiting for the next 100% CPU. Currently I am running DoT, DNSSEC, BlockList and NO DHCP registration.
If this works for, say, 14 days (I will not update for any fixes until a crash or something... partly a bad idea, I know, but then again...), I will attach a switch (I have a rather simple D-Link somewhere, it will do for this test) to see if anything changes (with DHCP registration re-enabled). I do agree with CJ that a switch can sometimes move the problem away...
I'm not out of ideas to test, it just takes for ever...
EDIT: And I do run ONLY IP4 - no IP6 at all.
I don't use unbound myself but a quick internet search gives comments that you should disable DNSSEC & caching - I don't know if that will help.
I prefer to run my DNS servers on the LAN and fwiw, I use Powerdns Authoritative & Recursor plus dnsdist for load balancing - never had any problems with that configuration.
I just had a strange thing happen: Unbound just stopped. I have no clue why; there is nothing in the Unbound log or in System/Log/General. My Monit setup started Unbound again, so no biggie in that way, but why did it just stop?
Hope it was just a glitch...
Yesterday I did a firmware update, so now I am running:
OPNsense 24.1.2-amd64
FreeBSD 13.2-RELEASE-p10
OpenSSL 3.0.13
Within less than 24h I got another 100% CPU, and my Monit setup killed Unbound and everything got back to normal. I have combined three logs: General (under System), Monit and Unbound. It is way easier to follow what happens this way (why is there no built-in option to look at several logs at the same time? Or is there, and I just do not know how to combine logs?):
LOG Date Severity Process Line
Monit 2024-02-22T00:09:22 Informational monit 'UnboundHighCPU' status succeeded (1) -- exit: Expression Syntax.
Unbound 2024-02-22T00:07:29 Informational unbound [17657:1] info: generate keytag query _ta-4f66. NULL IN
Unbound 2024-02-22T00:07:29 Informational unbound [17657:0] info: start of service (unbound 1.19.1).
Unbound 2024-02-22T00:07:29 Notice unbound [17657:0] notice: init module 2: iterator
Unbound 2024-02-22T00:07:29 Notice unbound [17657:0] notice: init module 1: validator
Unbound 2024-02-22T00:07:22 Informational unbound [17657:0] info: dnsbl_module: blocklist loaded. length is 3687688
General 2024-02-22T00:07:18 Error opnsense /usr/local/sbin/pluginctl: The command '/bin/kill -'TERM' '59284''(pid:/var/run/unbound.pid) returned exit code '1', the output was 'kill: 59284: No such process'
General 2024-02-22T00:07:18 Notice opnsense /usr/local/sbin/pluginctl: plugins_configure unbound_start (execute task : unbound_configure_do(1))
General 2024-02-22T00:07:18 Notice opnsense /usr/local/sbin/pluginctl: plugins_configure unbound_start (1)
Monit 2024-02-22T00:07:18 Informational monit 'UnboundHighCPU' start: '/usr/local/sbin/pluginctl -c unbound_start'
Unbound 2024-02-22T00:07:18 Informational unbound [17657:0] info: dnsbl_module: updating blocklist.
Unbound 2024-02-22T00:07:18 Notice unbound [17657:0] notice: init module 0: python
Monit 2024-02-22T00:07:13 Informational monit 'UnboundHighCPU' stop: '/usr/local/opnsense/scripts/OPNsense/Monit/Unbound_Kill.sh stop'
Monit 2024-02-22T00:07:13 Informational monit 'UnboundHighCPU' trying to restart
Monit 2024-02-22T00:07:08 Error monit 'UnboundHighCPU' status failed (100) -- no output
Unbound 2024-02-22T00:03:14 Informational unbound [59284:4] info: generate keytag query _ta-4f66. NULL IN
Unbound 2024-02-22T00:03:14 Critical unbound [59284:3] fatal error: Could not initialize thread
Unbound 2024-02-22T00:03:14 Informational unbound [59284:0] info: start of service (unbound 1.19.1).
Unbound 2024-02-22T00:03:14 Informational unbound [59284:3] info: server stats for thread 3: requestlist max 0 avg 0 exceeded 0 jostled 0
Unbound 2024-02-22T00:03:14 Informational unbound [59284:3] info: server stats for thread 3: 0 queries, 0 answers from cache, 0 recursions, 0 prefetch, 0 rejected by ip ratelimiting
Unbound 2024-02-22T00:03:14 Error unbound [59284:3] error: Could not set root or stub hints
Unbound 2024-02-22T00:03:14 Error unbound [59284:3] error: reading root hints /root.hints 2:6: Syntax error, could not parse the RR's type
Unbound 2024-02-22T00:03:14 Notice unbound [59284:0] notice: init module 2: iterator
Unbound 2024-02-22T00:03:14 Notice unbound [59284:0] notice: init module 1: validator
Unbound 2024-02-22T00:03:07 Informational unbound [59284:0] info: dnsbl_module: blocklist loaded. length is 3687688
General 2024-02-22T00:03:06 Notice opnsense /usr/local/etc/rc.linkup: DEVD: Ethernet detached event for opt2(igb2)
General 2024-02-22T00:03:06 Notice kernel <6>igb2: link state changed to DOWN
Unbound 2024-02-22T00:03:03 Informational unbound [59284:0] info: dnsbl_module: updating blocklist.
Unbound 2024-02-22T00:03:03 Notice unbound [59284:0] notice: init module 0: python
General 2024-02-22T00:03:00 Notice opnsense /usr/local/etc/rc.linkup: plugins_configure dns (execute task : unbound_configure_do())
General 2024-02-22T00:03:00 Notice opnsense /usr/local/etc/rc.linkup: plugins_configure dns (execute task : dnsmasq_configure_do())
General 2024-02-22T00:03:00 Notice opnsense /usr/local/etc/rc.linkup: plugins_configure dns ()
General 2024-02-22T00:03:00 Notice opnsense /usr/local/etc/rc.linkup: plugins_configure dhcp (execute task : dhcpd_dhcp_configure())
General 2024-02-22T00:03:00 Notice opnsense /usr/local/etc/rc.linkup: plugins_configure dhcp ()
General 2024-02-22T00:03:00 Notice opnsense /usr/local/etc/rc.linkup: plugins_configure ipsec (execute task : ipsec_configure_do(,opt2))
General 2024-02-22T00:03:00 Notice opnsense /usr/local/etc/rc.linkup: plugins_configure ipsec (,opt2)
General 2024-02-22T00:03:00 Notice opnsense /usr/local/etc/rc.linkup: ROUTING: entering configure using 'opt2'
General 2024-02-22T00:03:00 Notice kernel <6>igb2: link state changed to UP
Unbound 2024-02-22T00:03:00 Informational unbound [50504:0] info: service stopped (unbound 1.19.1).
General 2024-02-22T00:02:59 Notice opnsense /usr/local/etc/rc.linkup: DEVD: Ethernet attached event for opt2(igb2)
General 2024-02-22T00:01:38 Notice dhclient dhclient-script: Creating resolv.conf
General 2024-02-22T00:01:38 Notice dhclient dhclient-script: Reason RENEW on igb1 executing
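A combined view like the one above can be approximated on the command line by tagging each line with its source and sorting on the ISO timestamp. A minimal sketch, assuming each log has been exported to a text file whose lines start with the timestamp; `merge_logs` and the file names are hypothetical.

```shell
# merge_logs LABEL=FILE ...: prefix every line with LABEL, sort newest first.
merge_logs() {
    for spec in "$@"; do
        label=${spec%%=*}
        file=${spec#*=}
        awk -v l="$label" '{ print l, $0 }' "$file"
    done | LC_ALL=C sort -r -k2,2
}

# Example (file names are hypothetical):
#   merge_logs General=general.txt Monit=monit.txt Unbound=unbound.txt
```

Sorting works on the plain text of the timestamp because the YYYY-MM-DDTHH:MM:SS format orders the same way lexically as it does chronologically.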
Do NOT get confused about the root.hints error - it is not the problem, as I have shown earlier. It is a symptom maybe? But it is not the challenge itself; it is just "a thing" that seems to happen...
Anyway, for the record: DNSSEC, BlockLists and DoT enabled and NO DHCP registration --> this kills my idea of something DHCP related. Instead I have to admit that this might very well be interface related. I have written earlier that I do not think so, and part of me is still in that odd corner; however, when 23.7 was released I had problems with my directly connected PC (LAN, if you will), as one might recall: https://forum.opnsense.org/index.php?topic=36807.msg180191#msg180191
So it does indeed look like there is a connection with link up/down. However, as I wrote earlier, this worked in 23.1, and from 23.7 on it has been a bit of a challenge...
I will go and search for my D-Link switch, so I can take link up/down out of the equation and see if this works better. Do note, however, that I also have, for example, my Home Assistant RPi 4 server directly attached to another port, and a Linux server to yet another port. However, they run 24x7, so no link up/down on them...
I'll be back 8)
hi
Quote: Do NOT get confused about the roots.hint error - it is not the problem, I have showed that earlier - it is a symptom maybe?
Sorry, what makes you think so? From what I see, it is (one of?) the reasons.
Quote: I did a firmware update
So the 'use internal hints' patch is gone?
Well I can of course not say for 100% sure that the roots.hint is not the challenge - but I had this 100% CPU Unbound challenge even with that patch installed. That patch got overwritten I guess when I upgraded to latest, since Unbound was part of that firmware upgrade (ports: unbound 1.19.1) - the option that was installed with that patch got removed anyway so it is not available anymore (except if I re-install the patch of course).
Unbound not running...
This time it seems to have been triggered by me switching on my LAN PC; at least it happened at about the same time. But there are NO lines in the Unbound log that say anything useful to me, except the fatal error (which of course is something bad?):
2024-02-22T13:50:58 Critical unbound [43239:3] fatal error: Could not initialize thread
2024-02-22T13:50:58 Critical unbound [43239:2] fatal error: Could not initialize thread
It would be lovely if I knew why - just telling me the above gives no clue. The general log has some more details maybe, but I can not interpret what is going on:
2024-02-22T13:50:58 Notice kernel <6>pid 43239 (unbound), jid 0, uid 59: exited on signal 11
2024-02-22T13:50:43 Notice opnsense /usr/local/etc/rc.linkup: plugins_configure dns (execute task : unbound_configure_do())
2024-02-22T13:50:43 Notice opnsense /usr/local/etc/rc.linkup: plugins_configure dns (execute task : dnsmasq_configure_do())
2024-02-22T13:50:43 Notice opnsense /usr/local/etc/rc.linkup: plugins_configure dns ()
2024-02-22T13:50:43 Notice opnsense /usr/local/etc/rc.linkup: plugins_configure dhcp (execute task : dhcpd_dhcp_configure())
2024-02-22T13:50:43 Notice opnsense /usr/local/etc/rc.linkup: plugins_configure dhcp ()
2024-02-22T13:50:43 Notice opnsense /usr/local/etc/rc.linkup: plugins_configure ipsec (execute task : ipsec_configure_do(,opt2))
2024-02-22T13:50:43 Notice opnsense /usr/local/etc/rc.linkup: plugins_configure ipsec (,opt2)
2024-02-22T13:50:43 Notice opnsense /usr/local/etc/rc.linkup: ROUTING: entering configure using 'opt2'
2024-02-22T13:50:43 Notice opnsense /usr/local/etc/rc.linkup: DEVD: Ethernet attached event for opt2(igb2)
2024-02-22T13:50:43 Notice kernel <6>igb2: link state changed to UP
If anyone cares: I had another 100% CPU from Unbound. This time around there was NO interface / link going up or down; my LAN PC had been turned off for hours before this incident.
So here I give up. The Monit script works by running kill -9 and then restarting. It would be nice if this gets solved some day, but I have very little faith in that happening. There does not seem to be anything to work on: no dependence on interfaces, blocklists, DoT, DNSSEC or anything; it just freaks out. Someone has made a change, and my prediction is that this will get worse.
I gave up as well...
I disabled Unbound and moved over to the AdGuard plugin package... almost a drop-in replacement in the end.
I still get to keep all my router rules/settings, and still get to keep my upstream DoT/DoH. I just had to put the DNS server values into AdGuard as upstreams and I was off...
I was even able to clean up some of my rewrites in the process.
It's been days since any sort of internet issue.
Having the same issue. Everything is smooth until I try to look at the Unbound log files and get stuck at 'working...'; then the CPU instantly jumps to 100% and stays there. Not sure if it is the same cause as for others, but when I cleared out the log files it instantly dropped back to normal, with no issues so far. I am pretty new to OPNsense but figured I'd share my experience in the hope of helping to narrow it down. I am running log level verbosity 2, blocklist and DoT, if that helps any. Maybe the verbosity level is causing issues with the log size or something?