Weird DNS issues

Started by garbinc, September 10, 2023, 11:41:02 PM

Hello everyone,

I've been having a weird issue since my K8s migration to Cilium BGP and a firewall upgrade.

Every DNS request is sent to my actual DNS server, but it's being picked up and "resent" by the firewall.

I have a rule for IoT that should block 8.8.8.8 (Google) and that adds my blocky DNS server to the list.

You can see my rules for IoT and the "weird behavior" in the attachments.

I also attached what my blocky instance is receiving as requests; you can see clearly that the requests are coming from OPNsense.local (192.168.1.1).


Does anyone have any idea what is going on?
I'm not a complete noob when it comes to DNS, but this is the first time I've seen this. Even if I specifically target a server with `dig plex.domain.com @192.168.50.7`, it's being picked up by the firewall.

I appreciate any insight you can give me; I've been on this for a few days.

If it helps, this is a packet capture of the request in question
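(Roughly speaking, the capture boils down to watching DNS on the firewall while an IoT client runs the dig above; the interface name below is just a stand-in for whatever the IoT VLAN interface is actually called:)

# capture DNS traffic on the IoT-facing interface while a client queries 192.168.50.7
tcpdump -ni igb0_vlan20 'udp port 53'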

Could you please share some information:

* Which interfaces are configured with which networks?
* Are you using NAT?
* Are you using Unbound or Dnsmasq on OPNsense?
* Which DNS server do you have in place on .50.7?

Quote from: tron80 on September 11, 2023, 04:29:58 PM
Could you please share some information:

* Which interfaces are configured with which networks?
* Are you using NAT?
* Are you using Unbound or Dnsmasq on OPNsense?
* Which DNS server do you have in place on .50.7?

Hey there, thanks for your reply. I really do appreciate it.
I'll add the firewall rules at the bottom too.

* Which interfaces are configured with which networks?

CILIUM   - 192.168.50.X/24 (not sure I needed an entire VLAN, but I needed the base interface for BGP)
MGMT     - 192.168.30.X/24
IoT      - 192.168.20.X/24
WIRELESS - 192.168.10.X/24
VIDEO    - 192.168.8.X/24
LAN      - 192.168.1.X/24

BGP runs on OPNsense at 50.1 and peers with each node of the Kubernetes cluster. Simple enough, really; this part is currently working.


(⎈|k3s:kube-system)➜  ~ cilium bgp peers
Node               Local AS   Peer AS   Peer Address   Session State   Uptime    Family         Received   Advertised
fluffy             65442      65401     192.168.50.1   established     4h4m32s   ipv4/unicast   10         10
                                                                                 ipv6/unicast   0          0
frenzy             65442      65401     192.168.50.1   established     4h4m33s   ipv4/unicast   10         10
                                                                                 ipv6/unicast   0          0
k3s-worker-2       65442      65401     192.168.50.1   established     4h4m32s   ipv4/unicast   10         10
                                                                                 ipv6/unicast   0          0
k3s-worker-5       65442      65401     192.168.50.1   established     4h4m34s   ipv4/unicast   10         10
                                                                                 ipv6/unicast   0          0
k3s-worker-gpu-1   65442      65401     192.168.50.1   established     4h4m34s   ipv4/unicast   10         10
                                                                                 ipv6/unicast   0          0
k3s-worker-gpu-4   65442      65401     192.168.50.1   established     4h4m32s   ipv4/unicast   10         10
                                                                                 ipv6/unicast   0          0
whirlwind          65442      65401     192.168.50.1   established     4h4m31s   ipv4/unicast   10         10
                                                                                 ipv6/unicast   0          0
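(For what it's worth, the sessions can also be checked from the OPNsense side; assuming the os-frr plugin is what terminates BGP there, something like this from a firewall shell:)

# summary of the BGP neighbors as seen by FRR on OPNsense
vtysh -c "show ip bgp summary"
# routes learned from the cluster nodes
vtysh -c "show ip bgp"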



* Are you using NAT?

Yes.

For port forwards I only have the "Anti-Lockout Rule".
For outbound, I have some rules for consoles.

Here they are:


Interface  Source        Source Port  Destination  Destination Port  NAT Address        NAT Port  Static Port  Description
LAN        any           udp/*        *            udp/*             Interface address  *         YES          Tailscale v4
LAN        any           udp/*        *            udp/*             Interface address  *         YES          Tailscale v6
WAN        WIRELESS net  tcp/udp/*    *            tcp/udp/*         WAN address        *         YES
WAN        LAN net       tcp/udp/*    *            tcp/udp/*         WAN address        *         YES


(sorry, I really tried to make this as clean as possible)

I also have two automatic NAT rules (autogenerated; I have no control over them).

* Are you using Unbound or Dnsmasq on OPNsense?
I'm using Unbound. I tried both and I'm getting the same weird issue either way.
I even tried disabling both DNS servers on the firewall, and the requests still always come from the firewall, even when they are sent from the clients.

* Which DNS server do you have in place on .50.7?
The DNS server is a stateless DNS server called blocky -> https://github.com/0xERR0R/blocky

It's similar to AdGuard, but I can run as many replicas as I want. Sometimes I get more load when people are over, and it autoscales up.

You can see my entire configuration here: https://github.com/larivierec/home-cluster/blob/main/kubernetes/apps/networking/blocky/app/config/config.yml

The important bit is this: https://github.com/larivierec/home-cluster/blob/main/kubernetes/apps/networking/blocky/app/config/config.yml#L82-L83 where it gets the names / IPs from upstream (which is supposed to be populated by Unbound).
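(If I understand blocky's clientLookup correctly, that bit boils down to a reverse (PTR) lookup against Unbound for each client IP, roughly the equivalent of the following, where the client IP is just an example:)

# what clientLookup effectively does for a client at 192.168.20.50,
# with Unbound on the OPNsense LAN address as the upstream
dig -x 192.168.20.50 @192.168.1.1 +short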

* Firewall Rules

LAN -> I just have the standard Anti-Lockout rule.
The other interfaces are much the same, except IoT, which has the rules in the attachment below.

Let me know if I can supply anything else, I'm really at a loss with this.

I noticed something: this is only happening on the VLAN networks.
On the wired clients attached directly to LAN, a dig request gives the proper behaviour.

Something is clearly not right with your setup.

Some observations:
* Why do you have a NAT rule on LAN for all UDP traffic? That would partially explain why your OPNsense appears to send the DNS packets, since you rewrite the source IP to the LAN IP of OPNsense (see the capture sketch below this list).
* What it does not explain is why this happens for traffic between two networks outside LAN.
* Given your additional comment regarding the intended behaviour in LAN, I wonder if you have some quirks in your network outside OPNsense's responsibility. Are you sure that VLAN IDs are correctly set on all involved (virtual/physical) hosts, (virtual/physical) switches and OPNsense?
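A quick way to see where the rewrite happens would be to capture on both sides of the firewall while an IoT client repeats the dig from your first post. The interface names below are placeholders for your IoT- and CILIUM-facing interfaces:

# on the interface the IoT client comes in on: the source should be the 192.168.20.x client
tcpdump -ni <iot_if> 'udp port 53'
# on the interface towards 192.168.50.7: if outbound NAT is rewriting the packet,
# the source will show up as an OPNsense address instead of the client
tcpdump -ni <cilium_if> 'udp port 53'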

We can rule out DNS, I guess.

September 12, 2023, 12:19:03 PM #6 Last Edit: September 12, 2023, 12:48:13 PM by garbinc
Well, you're right. I disabled the LAN rules for UDP and everything is back to normal.
I added those rules because of Tailscale (which is basically a WireGuard mesh and relies solely on UDP):

https://tailscale.com/kb/1097/install-opnsense/

EDIT: I also realize that I misread the rule; it's supposed to be on the WAN interface, not LAN. That's a mega facepalm.
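In other words, the rule should presumably have looked more like this (same columns as the table above, and probably with the source narrowed down to the host running Tailscale instead of any):

Interface  Source            Source Port  Destination  Destination Port  NAT Address        NAT Port  Static Port  Description
WAN        <tailscale host>  udp/*        *            udp/*             Interface address  *         YES          Tailscale v4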

I have one running in the cluster and one on the router in case of cluster failure.
I blindly added the outbound rule without really thinking about the consequences, and what you stated makes sense.

I guess I can try NAT-PMP?

Thanks again for your help in identifying my mistakes!


For your other comment regarding VLANs: yes, I'm sure they are all set up properly. :)