I'm stumped. I need help.
I have been going back and forth for three days now, but I cannot get Unbound to query the nameserver of my other domain properly.
Situation:
I have two locations in different buildings.
One is running a local LAN as the local '
I am at the 'far' location at the moment, as I am working on the firewall there. Counterintuitively, the 'far' LAN is currently my local LAN.
The 'ned' domain is the remote domain at the other location.
I have OPNsense 24.7.2 and Unbound running at the local 'far' location, serving the local 'far' domain for the local LAN.
I have a different nameserver running to serve the remote LAN 'ned' domain at the 'ned' location.
There is a WireGuard VPN connection between the local 'far' domain and the remote 'ned' domain.
On the LAN in the 'far' domain, I can query the 'ned' nameserver with dig (dig @<nameserver 'ned' IP> <hostname in 'ned' domain>).
Success. No problems.
Working.
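For reference, the direct test above boils down to a single UDP packet. A minimal Python sketch of such a query, roughly what `dig +noedns @server name` sends (no EDNS OPT record). The function names are hypothetical, and `hostname.ned` is the placeholder used in this thread:

```python
import secrets
import socket
import struct
from typing import Optional


def encode_qname(name: str) -> bytes:
    """Encode a domain name as DNS labels: a length byte per label, ending with a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) <= 63:
            raise ValueError(f"invalid label: {label!r}")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_query(name: str, qtype: int = 1, qid: Optional[int] = None) -> bytes:
    """Build a minimal recursive query (QTYPE A by default, QCLASS IN) without EDNS."""
    if qid is None:
        qid = secrets.randbelow(65536)  # random 16-bit transaction ID
    flags = 0x0100  # only RD (recursion desired) is set
    header = struct.pack("!HHHHHH", qid, flags, 1, 0, 0, 0)  # QDCOUNT=1, other counts 0
    return header + encode_qname(name) + struct.pack("!HH", qtype, 1)


def udp_query(server: str, name: str, timeout: float = 2.0) -> bytes:
    """Send the query to UDP port 53 and return the raw reply bytes (not validated)."""
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(query, (server, 53))
        reply, _ = sock.recvfrom(4096)
    return reply
```

Usage would be something like `udp_query("192.168.11.2", "hostname.ned")`, run from a host that can reach the 'ned' nameserver.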
On the GUI of the OPNsense firewall at 'far' I can use menu:/Interfaces/Diagnostics/DNS Lookup/ with
Hostname <hostname in 'ned' domain> and
server <address of nameserver in 'ned'>
It returns the expected result, i.e. the proper IP address for the remote host.
Working.
Problem:
The 'ned' nameserver is configured in menu:/Services/Unbound/Query forwarding
The internet upstream is configured in menu:/Services/Unbound/DNS over TLS
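Unbound picks the forwarder by the most specific zone that contains the query name, which is why the log further down says "at zone ned.". A simplified sketch of that longest-suffix match, assuming the 'ned' entry becomes a forward zone `ned.` and the DNS over TLS servers form a catch-all zone `.` (the value `upstream-dot` is a hypothetical placeholder). This illustrates the idea only; it is not Unbound's code:

```python
from typing import Dict, List, Optional


def select_forwarder(qname: str, zones: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return the servers of the most specific forward zone that contains qname."""
    labels = qname.lower().rstrip(".").split(".")
    # For hostname.ned this tries 'hostname.ned.', then 'ned.', then falls back to '.'
    for i in range(len(labels)):
        candidate = ".".join(labels[i:]) + "."
        if candidate in zones:
            return zones[candidate]
    return zones.get(".")
```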
I query my local Unbound in the 'far' domain, expecting the query to be forwarded to the remote 'ned' nameserver:
dig <hostname in 'ned' domain>
I receive no address, but a SERVFAIL.
Not working.
In the Unbound log, I can see it decided to query the remote 'ned' nameserver over WireGuard, but it failed to parse the answer:
[3506:1] error: SERVFAIL <hostname.ned. A IN>: all the configured stub or forward servers failed, at zone ned. from <nameserver 'ned' IP> could not parse upstream response
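To get a feel for what "could not parse upstream response" can mean, here is a simplified Python sketch of sanity checks a resolver typically applies to a reply: header length, matching transaction ID, the QR bit, and the echoed question. The selection of checks is an assumption for illustration; Unbound's real parser is stricter and also handles EDNS, and the function names are hypothetical:

```python
import struct


def skip_name(msg: bytes, off: int) -> int:
    """Return the offset just past an encoded name (a compression pointer ends the name)."""
    while True:
        if off >= len(msg):
            raise ValueError("truncated name")
        length = msg[off]
        if length == 0:
            return off + 1
        if length & 0xC0 == 0xC0:  # 2-byte compression pointer
            return off + 2
        if length & 0xC0:
            raise ValueError("unknown label type")
        off += 1 + length


def check_response(query: bytes, resp: bytes) -> dict:
    """Apply basic sanity checks to a reply; raise ValueError on the first problem."""
    if len(resp) < 12:
        raise ValueError("shorter than a DNS header")
    rid, flags, qd, an, ns, ar = struct.unpack("!HHHHHH", resp[:12])
    if rid != struct.unpack("!H", query[:2])[0]:
        raise ValueError("transaction ID does not match the query")
    if not flags & 0x8000:
        raise ValueError("QR bit not set, so this is not a response")
    if qd != 1:
        raise ValueError("expected exactly one question")
    q_name_end = skip_name(query, 12)
    r_name_end = skip_name(resp, 12)
    if r_name_end + 4 > len(resp):
        raise ValueError("truncated question section")
    # Names compare case-insensitively; QTYPE and QCLASS must match exactly.
    if resp[12:r_name_end].lower() != query[12:q_name_end].lower():
        raise ValueError("question name differs from the query")
    if resp[r_name_end:r_name_end + 4] != query[q_name_end:q_name_end + 4]:
        raise ValueError("question type or class differs from the query")
    return {"rcode": flags & 0x000F, "tc": bool(flags & 0x0200),
            "ancount": an, "nscount": ns, "arcount": ar}
```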
Okay. So I disabled menu:/System/Settings/Administration/DNS Rebind check
Did not fix the problem.
I removed 192.168.0.0/16 from menu:/Services/Unbound/Advanced/Rebind protection networks
Not working.
menu:/Services/Unbound/General/Enable DNSSEC support is unchecked
Not working.
I added 'ned' to menu:/Services/Unbound/Advanced/Insecure Domains
Not working.
I logged in with ssh and
cd /var/unbound/etc
unbound-host -v -C ./dot.conf hostname.ned
Response is fine:
hostname.ned has address <hostname IPv4 address> (insecure)
hostname.ned has no IPv6 address (insecure)
hostname.ned has no mail handler record (insecure)
Personal conclusions (I am probably wrong):
The connection is ok.
Firewall rules are ok.
Both nameservers are responsive.
The Unbound config for query forwarding is OK. Forwarding to the public nameservers with DNS over TLS works OK.
There must be something in the configuration of Unbound that is stopping the query with an odd-looking error, but I cannot find the problem, even with more logging switched on. In the log I can see the query being forwarded to the remote 'ned' nameserver, but I get the error while parsing the response.
Yet, when the 'ned' nameserver is queried from the LAN or even from the firewall, the response is fine.
Even unbound-host can do it!
I am stuck. I have no idea where to look next.
Help is appreciated.
cheers,
Michiel
Have you tried (apologies if I'm missing it in your report) adding 'ned' to Services -> Unbound DNS -> Advanced -> Private Domains ?
Quote from: dseven on August 27, 2024, 12:41:55 PM
Have you tried (apologies if I'm missing it in your report) adding 'ned' to Services -> Unbound DNS -> Advanced -> Private Domains ?
Yes I did. It is in there.
Thanks.
Michiel
Quote from: mifi42 on August 27, 2024, 02:52:13 PM
Yes I did. It is in there.
In where? If you mean your problem description; note that "Private Domains" is not the same as "Insecure Domains" - the latter means that DNSSEC can be broken....
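The difference matters because the two settings act at different stages. As far as I understand Unbound's options, "Rebind protection networks" corresponds to `private-address` (private IPs are removed from answers) and "Private Domains" to `private-domain` (names under it may carry private IPs). A hypothetical sketch of that filtering step; the function name and sample addresses are illustrative only. In this sketch, filtering yields an empty answer rather than a parse failure:

```python
import ipaddress
from typing import Iterable, List


def filter_private(qname: str, addresses: Iterable[str],
                   private_nets: Iterable[str], private_domains: Iterable[str]) -> List[str]:
    """Drop private addresses from an answer unless qname is under a private domain."""
    name = qname.lower().rstrip(".")
    for dom in private_domains:
        dom = dom.lower().rstrip(".")
        if name == dom or name.endswith("." + dom):
            return list(addresses)  # exempt: private addresses are allowed here
    nets = [ipaddress.ip_network(n) for n in private_nets]
    return [a for a in addresses
            if not any(ipaddress.ip_address(a) in net for net in nets)]
```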
Quote from: dseven on August 27, 2024, 04:07:09 PM
Quote from: mifi42 on August 27, 2024, 02:52:13 PM
Yes I did. It is in there.
In where? If you mean your problem description; note that "Private Domains" is not the same as "Insecure Domains" - the latter means that DNSSEC can be broken....
It is in 'private domains'. I replied to what you asked.
And 'ned' is also in 'Insecure domains', it is in both fields.
michiel
hmm, I don't know, then. I don't think you said what type 'ned's nameserver is, but... can you see if it logs anything at the time of the failure? Maybe you could use tcpdump to capture and examine the response and see if there's anything unusual about it....
Quote from: dseven on August 27, 2024, 09:52:09 PM
hmm, I don't know, then. I don't think you said what type 'ned's nameserver is, but... can you see if it logs anything at the time of the failure? Maybe you could use tcpdump to capture and examine the response and see if there's anything unusual about it....
Yes, thank you. Looking at the logs on the other side might be a good idea. The 'ned' nameserver is an instance of dnsmasq, by the way.
Last night I figured I should try tcpdump on the WireGuard interfaces, as the Unbound logs do not seem to help me any further.
Cheers,
Michiel
It is interesting to mention that I cannot find any reference to what error messages mean. I have looked in opnsense documentation, and in unbound documentation (at nlnetlabs.nl).
Searching this forum and the internet for 'unbound+could+not+parse+upstream+response' did not yield any results.
The closest result I could get was this:
https://github.com/NLnetLabs/unbound/issues/946
I am tempted to search the source code of unbound on a cloned github repo, to find when a parse error is generated. It is getting ridiculous, is it not?
Start at https://github.com/NLnetLabs/unbound/blob/b5951ce1fa30b64b4fb079e36d5d98d57fb53372/iterator/iterator.c#L646-L650
Do you see a log message "parse error on reply packet"? If not, it'd have to be eDNS, I think? https://github.com/NLnetLabs/unbound/blob/b5951ce1fa30b64b4fb079e36d5d98d57fb53372/iterator/iterator.c#L4311-L4321
Were you able to capture a response with tcpdump?
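The EDNS hypothesis can be made concrete. An EDNS(0) query carries one extra OPT pseudo-record in the additional section (RFC 6891): root owner name, TYPE 41, the advertised UDP payload size in the CLASS field, and the DO bit in the TTL field. A minimal Python sketch to add such a record to a query and to look for it in a reply; 1232 bytes is just a commonly used payload size, and all function names are hypothetical:

```python
import struct
from typing import Optional

OPT_TYPE = 41  # EDNS(0) OPT pseudo-RR type


def opt_record(udp_size: int = 1232, do_bit: bool = False) -> bytes:
    """Root owner, TYPE 41, CLASS = UDP payload size, TTL = ext-RCODE|version|flags, RDLEN 0."""
    ttl = 0x8000 if do_bit else 0  # extended RCODE 0, version 0, DO flag in bit 15
    return b"\x00" + struct.pack("!HHIH", OPT_TYPE, udp_size, ttl, 0)


def add_edns(query: bytes, udp_size: int = 1232, do_bit: bool = False) -> bytes:
    """Append an OPT record to a query that has none yet, and increment ARCOUNT."""
    arcount = struct.unpack("!H", query[10:12])[0] + 1
    return query[:10] + struct.pack("!H", arcount) + query[12:] + opt_record(udp_size, do_bit)


def skip_name(msg: bytes, off: int) -> int:
    """Return the offset just past an encoded name (a compression pointer ends the name)."""
    while True:
        if off >= len(msg):
            raise ValueError("truncated name")
        length = msg[off]
        if length == 0:
            return off + 1
        if length & 0xC0 == 0xC0:
            return off + 2
        off += 1 + length


def find_opt(resp: bytes) -> Optional[dict]:
    """Walk all sections of a reply and decode the OPT record, if present."""
    qd, an, ns, ar = struct.unpack("!HHHH", resp[4:12])
    off = 12
    for _ in range(qd):
        off = skip_name(resp, off) + 4  # name + QTYPE + QCLASS
    for i in range(an + ns + ar):
        off = skip_name(resp, off)
        rtype, rclass, ttl, rdlen = struct.unpack("!HHIH", resp[off:off + 10])
        off += 10 + rdlen
        if rtype == OPT_TYPE and i >= an + ns:  # OPT belongs in the additional section
            return {"udp_size": rclass, "version": (ttl >> 16) & 0xFF,
                    "do": bool(ttl & 0x8000)}
    return None
```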
WTF?
tcpdump -v -i wg0
shows queries going across the tunnel with
dig @192.168.11.2 <hostname.ned>
But
dig <hostname.ned>
shows nothing at all. Not even a query.
Still, the error in the unbound log shows it could not parse the result. It also shows it knew where to send the query, because the nameserver address is correct. ???
[79782:1] error: SERVFAIL <hostname.ned. A IN>: all the configured stub or forward servers failed, at zone ned. from 192.168.11.2 could not parse upstream response
It appears to understand it has to query 192.168.11.2, but it doesn't?
Tcpdump tells me it seems to not even query the server!
Time for a coffee.
michiel
Quote from: dseven on August 28, 2024, 12:39:04 PM
Start at https://github.com/NLnetLabs/unbound/blob/b5951ce1fa30b64b4fb079e36d5d98d57fb53372/iterator/iterator.c#L646-L650
Do you see a log message "parse error on reply packet"? If not, it'd have to be eDNS, I think? https://github.com/NLnetLabs/unbound/blob/b5951ce1fa30b64b4fb079e36d5d98d57fb53372/iterator/iterator.c#L4311-L4321
Were you able to capture a response with tcpdump?
I have just posted the exact error message. It uses the word 'response' so your first link is accurate. Thanks.
However, in the meantime I am totally confused, because I see neither a query by Unbound nor a response in tcpdump.
It must be me slowly going mad! :-[
I am not surprised there would be a parse error on a non-existent response. Why do I not see the query going out?
Michiel
Quote from: mifi42 on August 28, 2024, 12:53:17 PM
Quote from: dseven on August 28, 2024, 12:39:04 PM
Start at https://github.com/NLnetLabs/unbound/blob/b5951ce1fa30b64b4fb079e36d5d98d57fb53372/iterator/iterator.c#L646-L650
Do you see a log message "parse error on reply packet"? If not, it'd have to be eDNS, I think? https://github.com/NLnetLabs/unbound/blob/b5951ce1fa30b64b4fb079e36d5d98d57fb53372/iterator/iterator.c#L4311-L4321
I have just posted the exact error message. It uses the word 'response' so your first link is accurate. Thanks.
It would be an additional log message, before the error message. You might have to turn on verbose logging to see it, but I think you have done that already(?)
Quote from: dseven on August 28, 2024, 12:39:04 PM
Do you see a log message "parse error on reply packet"? If not, it'd have to be eDNS, I think?
Indeed I'd perhaps start with making unbound use TCP only.
Quote from: dseven on August 28, 2024, 01:22:56 PM
It would be an additional log message, before the error message. You might have to turn on verbose logging to see it, but I think you have done that already(?)
Unbound is capable of logging every single query, all phases of it. Perhaps the loglevel is not bumped enough.
I had to kick all users off (by disconnecting cables ::) because otherwise I am flooded with log messages.
I scrolled the log (verbosity at 5 at the moment).
I have seen the query going out and a response coming in as a UDP packet. Round-trip time 17 ms.
I see UDP responses coming in, but they are not decoded. It is sending the query to the right server, requesting the right records ('IN A'), and it notices a response from that server over UDP.
That sure is sifting through a lot of log!
Selecting on 192.168.11 omits the response. It does several tries each with a response received.
It shows 'serviced query: EDNS works for ipv4 192.168.11.2 port 53 (len 16)'
once, then attempts four queries:
sending to target: <ned.> 192.168.11.2#53
each with a response received, and each fails the parse.
I admit I have no idea what I am supposed to be looking for. I have noticed though, the queries to other servers seem to have the decoded responses in text in the log, and the responses for 'ned' do not.
So I have a piece of the log on level 5 debug selected and saved, but it is not making me any wiser :(
The UDP response received seems to be binary, but it is not decoded.
Michiel
Quote from: doktornotor on August 28, 2024, 01:34:17 PM
Quote from: dseven on August 28, 2024, 12:39:04 PM
Do you see a log message "parse error on reply packet"? If not, it'd have to be eDNS, I think?
Indeed I'd perhaps start with making unbound use TCP only.
I vaguely remember having seen such a setting, but I cannot find it at the moment. Unbound is currently sending and receiving UDP packets.
As for the "parse error on reply packet": I do not see that.
I am seeing "parse error on response packet".
I am also seeing that Unbound detects that the 'ned' server is EDNS capable.
m
server:
do-udp: no
IIRC. Read the man page.
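Over TCP, each DNS message is simply prefixed with a two-byte length (RFC 1035 section 4.2.2, RFC 7766), which avoids UDP truncation and fragmentation issues. `do-udp: no` turns UDP off for Unbound entirely; the unbound.conf man page also documents `tcp-upstream: yes`, which only affects the queries Unbound sends upstream. Before touching the config, `dig +tcp @192.168.11.2 <hostname.ned>` shows whether the 'ned' server answers over TCP at all. A minimal sketch of the framing, with hypothetical function names:

```python
import socket
import struct


def frame(msg: bytes) -> bytes:
    """Prefix a DNS message with its length as a 16-bit big-endian integer."""
    if len(msg) > 0xFFFF:
        raise ValueError("DNS message too long for TCP framing")
    return struct.pack("!H", len(msg)) + msg


def read_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes; TCP may deliver the message in several chunks."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed mid-message")
        buf += chunk
    return buf


def tcp_query(server: str, query: bytes, port: int = 53, timeout: float = 3.0) -> bytes:
    """Send one framed query and return the unframed reply."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(frame(query))
        (length,) = struct.unpack("!H", read_exact(sock, 2))
        return read_exact(sock, length)
```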
Quote from: doktornotor on August 28, 2024, 03:55:33 PM
server:
do-udp: no
IIRC. Read the man page.
Ah, yes, when hacking the config files is allowed ???
I restricted myself to using the OPNsense GUI, but yes...
I must admit, I am about to give up on this. I have spent the last three days, going through logs, tcpdumps, and even looked at some source code of unbound, albeit not in the right spot.
I will experiment further with running dnsmasq as my internal root server for the internal domains, without all the strict controls of Unbound. I cannot afford to dive deeper or spend more time, especially if that leads me away from managing the firewall via the GUI.
It feels like defeat, but it is what it is.
Sorry for bothering you with this.
Michiel
I have used a workaround for now.
Unbound still does not successfully parse the response from my older nameserver for NED, so I decided to add 'host overrides' in Unbound for the most important hosts in the NED network.
See menu:/Services/Unbound/Overrides/Host Overrides
Not very neat, a bit of a hack, but usable for the time being.
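Host overrides work around the problem because they are answered from local data before any forwarding takes place, so the failing upstream reply is never parsed. A toy Python sketch of that ordering, assuming a plain dictionary of overrides and a hypothetical `forward` callback standing in for the 'ned' forwarder; the host names and address in the usage lines are made up:

```python
from typing import Callable, Dict, List


def resolve(qname: str, overrides: Dict[str, List[str]],
            forward: Callable[[str], List[str]]) -> List[str]:
    """Answer from local overrides first; only names without an override are forwarded."""
    name = qname.lower().rstrip(".")
    if name in overrides:
        return overrides[name]  # local data: no upstream query, nothing to parse
    return forward(name)  # may still fail for names that have no override
```

The trade-off is the one mentioned above: every host that should resolve needs its own entry.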
Thanks for the support,
Michiel
I am seeing the same since 24.7.4_1.
Thought I was the only one until I saw your post. It looks like OPNsense just stopped resolving. Doing manual lookups and specifying the DNS server while the tunnel is up works as normal.
I have a nameserver resolving a domain behind an IPsec tunnel. I had it set in Unbound's domain overrides, and it worked flawlessly until recently. I was looking for 2 days and then just used the domain overrides in dnsmasq.
It's not a fix but a workaround. I switch on dnsmasq when I need the remote domain.
You could add dnsmasq as a DNS server to Unbound. Ugly, but it doesn't require adding all the single hosts to the override file.
Quote from: janb-de on September 22, 2024, 09:14:23 PM
It's not a fix but a workaround. I switch on dnsmasq when I need the remote domain.
You could add dnsmasq as a DNS server to Unbound. Ugly, but it doesn't require adding all the single hosts to the override file.
You probably mean you switch on dnsmasq locally on OPNsense when you need the remote DNS server?
Yes, that is doable.
m