[WORKAROUND] OPNsense 24.7.2: Unbound forwarding to private server riddle

Started by mifi42, August 27, 2024, 11:53:52 AM

I'm stumped. I need help.
I have been going back and forth for three days now, but I cannot get Unbound to query my other name server properly.

Situation:
I have two locations in different buildings.
One is running a local LAN as the local '
I am at the 'far' location at the moment, as I am working on the firewall there. Counterintuitively, the 'far' LAN is currently my local LAN.

The 'ned' domain is the remote domain at the other location.
I have OPNsense 24.7.2 with Unbound running at the local 'far' location, serving the 'far' domain for the local LAN.
A different name server at the 'ned' location serves the 'ned' domain for the remote LAN.
A WireGuard VPN connects the local 'far' domain and the remote 'ned' domain.


On the LAN in the 'far' domain, I can query the 'ned' name server directly with dig (dig @<'ned' nameserver IP> <hostname in 'ned' domain>).
Success. No problems.
Working.
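For reference, this is roughly what such a direct query looks like on the wire: a sketch, assuming plain RFC 1035 framing over UDP port 53. `build_query` and `query_udp` are hypothetical helper names, and real dig normally also appends an EDNS OPT record, which this sketch leaves out.

```python
import socket
import struct


def build_query(qname: str, qtype: int = 1, txid: int = 0x1234) -> bytes:
    """Build a minimal DNS query: one question, RD bit set, no EDNS (hypothetical helper)."""
    # Header: ID, flags (0x0100 = recursion desired), QDCOUNT=1, AN/NS/ARCOUNT=0
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # QNAME: every label is prefixed with its length, the name ends with a zero byte
    labels = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in qname.rstrip(".").split(".")
    )
    # QTYPE (1 = A) and QCLASS (1 = IN)
    return header + labels + b"\x00" + struct.pack("!HH", qtype, 1)


def query_udp(server: str, qname: str, timeout: float = 2.0) -> bytes:
    """Send one query to server:53 over UDP and return the raw response bytes."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(qname), (server, 53))
        response, _ = sock.recvfrom(4096)
        return response
```

Calling `query_udp("<'ned' nameserver IP>", "hostname.ned")` from a LAN host should get back the same bytes dig decodes for you.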

In the GUI of the OPNsense firewall at 'far', I can use menu:/Interfaces/Diagnostics/DNS Lookup/ with
Hostname <hostname in 'ned' domain> and
Server <address of nameserver in 'ned'>
This returns the expected result, i.e. the proper IP address for the remote host.
Working.


Problem:
The 'ned' name server is configured in menu:/Services/Unbound/Query forwarding
The internet upstream is configured in menu:/Services/Unbound/DNS over TLS

I query my local Unbound in the 'far' domain, expecting the query to be forwarded to the remote 'ned' name server:
dig <hostname in 'ned' domain>
I get no address, only a SERVFAIL.
Not working.

In the Unbound log, I can see it decided to query the remote 'ned' name server over WireGuard, but it failed to parse the answer:

[3506:1] error: SERVFAIL <hostname.ned. A IN>: all the configured stub or forward servers failed, at zone ned. from <nameserve'ned-IP> could not parse upstream response
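To get a feel for what "could not parse upstream response" might cover, here is a sketch of a few generic sanity checks a resolver can apply to an answer. `check_response` is a hypothetical helper using plain RFC 1035 header fields; it is not Unbound's actual parser, which checks considerably more.

```python
import struct


def check_response(query: bytes, response: bytes) -> list[str]:
    """Return basic problems found in a raw DNS response (hypothetical helper)."""
    if len(response) < 12:
        return ["response shorter than the 12-byte DNS header"]
    problems = []
    (qid,) = struct.unpack("!H", query[:2])
    rid, flags, qdcount = struct.unpack("!HHH", response[:6])
    if rid != qid:
        problems.append(f"ID mismatch: sent {qid:#06x}, got {rid:#06x}")
    if not flags & 0x8000:
        problems.append("QR bit not set: packet is not a response")
    if flags & 0x0200:
        problems.append("TC bit set: answer truncated, retry over TCP")
    rcode = flags & 0x000F
    if rcode != 0:
        problems.append(f"RCODE {rcode} (0 = NOERROR)")
    # The question section should echo the query's; compare case-insensitively
    # because resolvers may randomize the letter case of the QNAME.
    question = query[12:]
    if qdcount != 1 or response[12:12 + len(question)].lower() != question.lower():
        problems.append("question section does not echo the query")
    return problems
```

An empty list only means these particular checks pass; a response can still fail a stricter parser.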


Okay. So I disabled menu:/System/Settings/Administration/DNS Rebind check
Did not fix the problem.

I removed 192.168.0.0/16 from menu:/Services/Unbound/Advanced/Rebind protection networks
Not working.

menu:/Services/Unbound/General/Enable DNSSEC support is unchecked
Not working.

I added 'ned' to menu:/Services/Unbound/Advanced/Insecure Domains
Not working.

I logged in with SSH and ran:

cd /var/unbound/etc
unbound-host -v -C ./dot.conf hostname.ned
The response is fine:
hostname.ned has address <hostname IPv4 address> (insecure)
hostname.ned has no IPv6 address (insecure)
hostname.ned has no mail handler record (insecure)


Personal conclusions (I am probably wrong):
The connection is OK.
Firewall rules are OK.
Both name servers are responsive.
The Unbound configuration for query forwarding is OK. Forwarding to the public name servers with DNS over TLS works fine.


There must be something in the configuration of Unbound that is stopping the query with an odd-looking error, but I cannot find the problem, even with more logging switched on. In the log I can see the query being forwarded to the remote 'ned' name server, but I get the error while parsing the response.
Yet, when the 'ned' nameserver is queried from the LAN or even from the firewall, the response is fine.
Even unbound-host can do it!

I am stuck. I have no idea where to look next.
Help is appreciated.

cheers,
Michiel

Have you tried (apologies if I'm missing it in your report) adding 'ned' to Services -> Unbound DNS -> Advanced -> Private Domains?

Quote from: dseven on August 27, 2024, 12:41:55 PM
Have you tried (apologies if I'm missing it in your report) adding 'ned' to Services -> Unbound DNS -> Advanced -> Private Domains ?
Yes, I did. It is in there.
Thanks.
Michiel

Quote from: mifi42 on August 27, 2024, 02:52:13 PM
Quote from: dseven on August 27, 2024, 12:41:55 PM
Have you tried (apologies if I'm missing it in your report) adding 'ned' to Services -> Unbound DNS -> Advanced -> Private Domains ?
Yes I did. It is in there.

In where? If you mean your problem description, note that "Private Domains" is not the same as "Insecure Domains": the latter means that DNSSEC can be broken....

Quote from: dseven on August 27, 2024, 04:07:09 PM
Quote from: mifi42 on August 27, 2024, 02:52:13 PM
Quote from: dseven on August 27, 2024, 12:41:55 PM
Have you tried (apologies if I'm missing it in your report) adding 'ned' to Services -> Unbound DNS -> Advanced -> Private Domains ?
Yes I did. It is in there.

In where? If you mean your problem description; note that "Private Domains" is not the same as "Insecure Domains" - the latter means that DNSSEC can be broken....

It is in 'Private domains'. I replied to what you asked.
And 'ned' is also in 'Insecure domains'; it is in both fields.

michiel

Hmm, I don't know, then. I don't think you said what type of name server 'ned' runs, but can you see whether it logs anything at the time of the failure? Maybe you could use tcpdump to capture and examine the response and see if there's anything unusual about it....

Quote from: dseven on August 27, 2024, 09:52:09 PM
hmm, I don't know, then.  I don't think you said what type 'ned's nameserver is, but... can you see if it logs anything at the time of the failure? Maybe you could use tcpdump to capture and examine the response and see if there's anything unusual about it....

Yes, thank you. Looking at the logs on the other side might be a good idea. The 'ned' name server is an instance of dnsmasq, by the way.
Last night I figured I should try running tcpdump on the WireGuard interfaces, as the Unbound logs do not seem to get me any further.
Cheers,
Michiel

It is worth mentioning that I cannot find any reference explaining what the error messages mean. I have looked in the OPNsense documentation and in the Unbound documentation (at nlnetlabs.nl).

Searching this forum and the internet for 'unbound+could+not+parse+upstream+response' did not yield any results.
The closest result I could get was this:
https://github.com/NLnetLabs/unbound/issues/946

I am tempted to clone the Unbound repository from GitHub and search the source code to find where a parse error is generated. It is getting ridiculous, is it not?



WTF?

tcpdump -v -i wg0

shows queries going across the tunnel with
dig @192.168.11.2 <hostname.ned>

But
dig <hostname.ned>
shows nothing at all. Not even a query.

Still, the error in the Unbound log says it could not parse the result. It also shows it knew where to send the query, because the name server address is correct.  ???
[79782:1] error: SERVFAIL <hostname.ned. A IN>: all the configured stub or forward servers failed, at zone ned. from 192.168.11.2 could not parse upstream response

It appears to know it has to query 192.168.11.2, but does it actually do so?
tcpdump tells me it does not even seem to query the server!

Time for a coffee.
michiel

Quote from: dseven on August 28, 2024, 12:39:04 PM
Start at https://github.com/NLnetLabs/unbound/blob/b5951ce1fa30b64b4fb079e36d5d98d57fb53372/iterator/iterator.c#L646-L650

Do you see a log message "parse error on reply packet"? If not, it'd have to be eDNS, I think? https://github.com/NLnetLabs/unbound/blob/b5951ce1fa30b64b4fb079e36d5d98d57fb53372/iterator/iterator.c#L4311-L4321

Were you able to capture a response with tcpdump?
I have just posted the exact error message. It uses the word 'response', so your first link is accurate. Thanks.
However, in the meantime I am totally confused, because I see neither a query by Unbound nor a response in tcpdump.
It must be me slowly going mad!  :-[

I am not surprised there would be a parse error on a non-existent response. Why do I not see the query going out?

Michiel

Quote from: mifi42 on August 28, 2024, 12:53:17 PM
Quote from: dseven on August 28, 2024, 12:39:04 PM
Start at https://github.com/NLnetLabs/unbound/blob/b5951ce1fa30b64b4fb079e36d5d98d57fb53372/iterator/iterator.c#L646-L650

Do you see a log message "parse error on reply packet"? If not, it'd have to be eDNS, I think? https://github.com/NLnetLabs/unbound/blob/b5951ce1fa30b64b4fb079e36d5d98d57fb53372/iterator/iterator.c#L4311-L4321
I have just posted the exact error message. It uses the word 'response' so your first link is accurate. Thanks.

It would be an additional log message, before the error message. You might have to turn on verbose logging to see it, but I think you have done that already(?)



Quote from: dseven on August 28, 2024, 12:39:04 PM
Do you see a log message "parse error on reply packet"? If not, it'd have to be eDNS, I think?

Indeed, I'd perhaps start by making Unbound use TCP only.
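In a plain unbound.conf the switch for this is `tcp-upstream: yes`. Over TCP every DNS message is prefixed with a two-byte length field (RFC 1035, section 4.2.2), and EDNS buffer-size negotiation plays no role, which is why it is a useful way to take the UDP/EDNS path out of the picture. The sketch below shows that framing; `frame_tcp`, `recv_exact` and `query_tcp` are hypothetical helper names and `message` is assumed to be an already built query.

```python
import socket
import struct


def frame_tcp(message: bytes) -> bytes:
    """Prefix a DNS message with its 2-byte big-endian length (RFC 1035 4.2.2)."""
    return struct.pack("!H", len(message)) + message


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes from a stream socket, or raise if it closes early."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed mid-message")
        buf += chunk
    return buf


def query_tcp(server: str, message: bytes, timeout: float = 2.0) -> bytes:
    """Send a prebuilt DNS query to server:53 over TCP and return the raw response."""
    with socket.create_connection((server, 53), timeout=timeout) as sock:
        sock.sendall(frame_tcp(message))
        (length,) = struct.unpack("!H", recv_exact(sock, 2))
        return recv_exact(sock, length)
```

The same comparison from the command line is `dig +tcp @192.168.11.2 <hostname.ned>`. If answers over TCP parse but answers over UDP do not, that would point at the UDP/EDNS path rather than at the forwarding configuration.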

Quote from: dseven on August 28, 2024, 01:22:56 PM
It would be an additional log message, before the error message. You might have to turn on verbose logging to see it, but I think you have done that already(?)

Unbound is capable of logging every single query, all phases of it. Perhaps the log level is not set high enough.

I had to kick all users off by disconnecting cables ::) because otherwise I am flooded with log messages.

I scrolled through the log (verbosity is at 5 at the moment).
I can see the query going out and a response coming in as a UDP packet. Round-trip time: 17 ms.

I see UDP responses coming in, but they are not decoded. Unbound sends the query to the right server, requesting the right record ('IN A'), and it registers a response from that server over UDP.

That sure is sifting through a lot of log!

Filtering on 192.168.11 omits the response. It makes several attempts, each with a response received.
It shows "serviced query: EDNS works for ipv4 192.168.11.2 port 53 (len 16)" once, then attempts four queries:

sending to target: <ned.> 192.168.11.2#53

each with a response received, and each one fails the parse.

I admit I have no idea what I am supposed to be looking for. I have noticed, though, that the queries to other servers have their decoded responses logged as text, while the responses for 'ned' do not.

So I have saved a piece of the level-5 debug log, but it is not making me any wiser  :(

The UDP response received seems to be binary, but it is not decoded.
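If the raw response bytes can be pulled out of a capture (for example from the hex dump `tcpdump -X` prints, or a pcap written with `tcpdump -w`), a small decoder can show whether the packet is structurally sound and whether it carries an EDNS OPT record (type 41), which dseven suggested looking at. This is a sketch under those assumptions, with hypothetical helper names `skip_name` and `summarize`: it skips record data and only checks lengths and label encoding, so it is far less strict than Unbound's parser.

```python
import struct


def skip_name(msg: bytes, off: int) -> int:
    """Return the offset just past a (possibly compressed) domain name."""
    while True:
        if off >= len(msg):
            raise ValueError("name runs past end of packet")
        length = msg[off]
        if length == 0:
            return off + 1
        if length & 0xC0 == 0xC0:  # compression pointer: 2 bytes, ends the name
            return off + 2
        if length & 0xC0:
            raise ValueError(f"bad label type {length:#04x} at offset {off}")
        off += 1 + length


def summarize(msg: bytes) -> dict:
    """Decode the header and list the record types found in each section."""
    if len(msg) < 12:
        raise ValueError("shorter than a DNS header")
    txid, flags, qd, an, ns, ar = struct.unpack("!HHHHHH", msg[:12])
    off = 12
    for _ in range(qd):
        off = skip_name(msg, off) + 4  # QTYPE + QCLASS
        if off > len(msg):
            raise ValueError("truncated question section")
    result = {}
    for name, count in (("answer", an), ("authority", ns), ("additional", ar)):
        types = []
        for _ in range(count):
            off = skip_name(msg, off)
            if off + 10 > len(msg):
                raise ValueError(f"truncated record header in {name} section")
            rtype, _rclass, _ttl, rdlen = struct.unpack("!HHIH", msg[off:off + 10])
            off += 10 + rdlen
            if off > len(msg):
                raise ValueError(f"RDATA overruns packet in {name} section")
            types.append("OPT" if rtype == 41 else rtype)  # 1 = A, 41 = EDNS OPT
        result[name] = types
    if off != len(msg):
        result["trailing_bytes"] = len(msg) - off
    result.update(id=txid, rcode=flags & 0x000F, tc=bool(flags & 0x0200))
    return result
```

Usage would be along the lines of `summarize(bytes.fromhex("<hex bytes of the DNS payload>"))`, pasting only the DNS part of the packet (after the IP and UDP headers). A `ValueError` or a non-zero `trailing_bytes` would suggest the 'ned' answer really is malformed.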


Michiel