Messages - Ben S

#1
Glad you got it sorted.  I've created a bug report for this now: https://github.com/opnsense/core/issues/9792
#2
Quote from: LemurTech on February 14, 2026, 10:09:58 PM
The order of the parameters doesn't seem to matter here:

Well, I beg to differ, because...

Quote:
root@fw01:~ # drill @127.0.0.1 -p 53053 emporia.iot.lan
;; ->>HEADER<<- opcode: QUERY, rcode: NOERROR, id: 6682
;; flags: qr rd ra ; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
...

root@fw01:~ # drill -p 53053 @127.0.0.1 emporia.iot.lan
;; ->>HEADER<<- opcode: QUERY, rcode: NOERROR, id: 33100
;; flags: qr aa rd ra ; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

Note the difference in flags.  You are getting different results here: aa = Authoritative Answer, because that query is hitting Dnsmasq directly.  The other is not, because the port is being ignored and the query is going via Unbound.
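As a rough illustration of what to look for, you can check the header's flags field for 'aa' to tell the two cases apart.  This is just a sketch using canned header lines (like the ones above), not a live query:

```shell
#!/bin/sh
# Sketch: distinguish an authoritative answer from a recursive one by
# looking for the 'aa' flag in a drill/dig header line.  The sample
# lines passed in below are canned, not from a live query.
check_flags() {
  case "$1" in
    *" aa "*) echo "authoritative (answered by Dnsmasq directly)" ;;
    *)        echo "non-authoritative (went via a recursive resolver)" ;;
  esac
}

check_flags ";; flags: qr aa rd ra ; QUERY: 1, ANSWER: 1"
check_flags ";; flags: qr rd ra ; QUERY: 1, ANSWER: 1"
```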

But yes, I have what is a broadly similar setup.

  • I am forwarding a subdomain of a real, existing domain to Dnsmasq (e.g. subdomain.example.net) rather than a .lan domain.  I've just tried with a .lan domain like yours though, and it all still seems to work.
  • Another difference in my setup may be that I'm using DoT forwarders (Quad9) instead of doing full recursion.  Potentially that makes some difference to DNSSEC validation.  But I also tried temporarily turning those off and everything still seemed to work.
  • I don't have any Windows AD stuff, but you're seeing problems even without that involved, so I don't think that's the culprit.

I'm kind of out of ideas, but it does seem like what you're trying to do should be possible.  If no-one else has better ideas, you might need to turn up the Unbound logging levels and see if there are any clues, especially with DNSSEC enabled, as to what's different between the working queries and the ones which fail after 30 seconds.
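For reference, the stock unbound.conf knobs for query and validation logging look something like the below.  This is a sketch of upstream Unbound option names, not the exact config OPNsense generates (the UI exposes at least the log verbosity somewhere under the Unbound advanced settings, if I remember right):

```
server:
  verbosity: 3        # 0-5; 3 logs per-query information
  log-queries: yes    # log each incoming query
  val-log-level: 2    # log the reason for each DNSSEC validation failure
```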
#3
This looks like it's due to not having any old style rules.  I'm guessing you've migrated to new style rules, and deleted all of your old rules?  I can reproduce this by ensuring at least one old style rule exists, running step 5 of the rules migration, and then trying to delete a gateway group.  Deleting the last rule by hand doesn't cause the problem.

As a workaround until this is fixed, if you add any old style rule (even if it is disabled) then you should be able to delete the gateway group.
#4
I do a similar thing, using Unbound as the resolver, forwarding my local domain to Dnsmasq and it seems fine.

Do you have DNSSEC validation turned on in Dnsmasq?  I don't in my setup, only in Unbound.  I don't have DNSSEC hardening turned on, but I notice you've tried that both ways anyway.

Another thing I notice which may not be relevant is that Dnsmasq seems to be trying to forward the query, given the response mentions NXDOMAIN with a reference to root-servers.net.  In my setup I have enabled 'Do not forward to system defined DNS servers' in Dnsmasq settings.  It may not help but since you're using Dnsmasq just to serve local domains, and Unbound should be the recursive resolver, it makes sense to me to never allow Dnsmasq to do any forwarding.  If nothing else it may make it easier to diagnose what's going on since you'll know any answer from Dnsmasq is _only_ from Dnsmasq.
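For reference, outside the OPNsense UI that kind of setting corresponds to stock dnsmasq options along these lines.  This is only an illustrative sketch (the domain is an example based on your query, and OPNsense generates its own config, so the exact file will differ):

```
# dnsmasq.conf (illustrative sketch, not the OPNsense-generated config)
no-resolv        # never read upstream servers from /etc/resolv.conf
local=/iot.lan/  # answer iot.lan locally; never forward it upstream
```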

I notice another potential problem in your tests: doing some similar tests myself, I noticed that the port specifier must come first.

This will use the specified port:
drill -p PORT @127.0.0.1 NAME
This will not, the port seems to just be silently ignored:
drill @127.0.0.1 -p PORT NAME
(I normally use dig instead, where the order doesn't seem to matter as much.)
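If you want a belt-and-braces guard, a tiny wrapper can reorder the arguments so -p always lands before the @server token.  This is my own toy sketch, not anything drill ships: it echoes the final command instead of executing it, and only understands the -p flag:

```shell
#!/bin/sh
# Sketch: reorder arguments so '-p PORT' always precedes the @server
# token, since drill silently ignores the port when it comes later.
# Echoes the resulting command instead of running it, for illustration.
drill_safe() {
  port=""
  rest=""
  while [ $# -gt 0 ]; do
    case "$1" in
      -p) port="$2"; shift 2 ;;
      *)  rest="$rest $1"; shift ;;
    esac
  done
  if [ -n "$port" ]; then
    echo "drill -p $port$rest"
  else
    echo "drill$rest"
  fi
}

drill_safe @127.0.0.1 -p 53053 emporia.iot.lan
```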

So what you're seeing is probably Unbound behaviour changing, rather than Dnsmasq.  The fact you don't see 'aa' (authoritative answer) in the response flags is a clue that you're going via a recursive resolver and not hitting Dnsmasq directly.

tl;dr I don't know why it doesn't work, sorry!  But maybe you can at least fix your drill command and be sure it's really Dnsmasq you're testing here.
#5
I have done part 1 here; everything seems OK so far.  I'll keep an eye on things.  I got a bit confused about whether I should see lifetimes, but I'm using "prevent release" for reasons I can no longer remember, so if I've understood correctly, I shouldn't expect to see lifetimes in that case.

I haven't done part 2, as I'm not running the development version.
#6
I'm not 100% sure of the logic, but I think dhcp6c may only run if an IPv6 router advertisement is received on the WAN interface.  It's perhaps worth checking (e.g. with tcpdump) whether you're seeing those.
#7
From your last message it sounds like you have DNS problems.  If you do a traceroute to an IP address, do you see it routing out via the VPN?  And if you try from a machine which shouldn't route via the VPN, or from OPNsense itself, do you see it routing out not via the VPN?

It might be useful to see screenshots of your VPN, gateway, NAT, and firewall settings in case anyone can see a problem, redacted as you see fit.  I have a similar setup where one network routes out via Mullvad VPN, I usually use WireGuard but I've tried OpenVPN too and both work fine with suitable gateway, NAT, and firewall settings.

Also Unbound: I'm not sure what you'd need to change there, but since you mention DNS problems, maybe that's somewhere to start.  Are the clients behind OPNsense using OPNsense as their resolver?
#8
Are you still not seeing any DHCPv6 responses from your ISP?  I wonder if it's worth copying the DUID from your working ISP router into OPNsense > Interfaces > Settings in case their DHCP server restricts to 'known' DUIDs only somehow.  That assumes the ISP router lets you find out what the DUID is of course.
#9
It looks like the tailscale 'network' alias is a single IP on IPv4 - confirmed from Firewall > Diagnostics > Aliases and also ifconfig showing a netmask of 0xffffffff.

'any' should be safe as there should only be trusted traffic on the tailnet but you'd have to use 100.64.0.0/10 if you wanted to lock it down to tailscale IPs.  Seems like for IPv6 the interface does have a /48 mask so you'd only see this problem for IPv4.
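For what it's worth, the /10 boundary means the second octet runs from 64 to 127.  A quick sketch of that membership test (my own toy helper, not anything OPNsense or Tailscale ships):

```shell
#!/bin/sh
# Sketch: check whether an IPv4 address falls inside Tailscale's CGNAT
# range 100.64.0.0/10, i.e. first octet 100 and second octet 64-127.
in_cgnat() {
  oldIFS=$IFS
  IFS=.
  set -- $1          # split the dotted quad into $1 $2 $3 $4
  IFS=$oldIFS
  [ "$1" -eq 100 ] && [ "$2" -ge 64 ] && [ "$2" -le 127 ]
}

in_cgnat 100.101.1.2 && echo "in 100.64.0.0/10"
in_cgnat 100.200.1.2 || echo "outside 100.64.0.0/10"
```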
#10
It's been a while since I set this up and tested it but I think the gateway IP should be the remote exit node's Tailnet IP, not the OPNsense Tailnet IP.
#11
In the absence of any better ideas, you could try:

sh -x /var/etc/rtsold_script.sh em0
(replace em0 with your WAN interface)

If I kill dhcp6c and run that, the expected output is something like:

+ [ -z em0 ]
+ grep -q '^interface em0 ' /var/etc/radvd.conf
+ [ -n '' ]
+ [ -f /var/run/dhcp6c.pid ]
+ [ -f /var/run/dhcp6c.pid ]
+ /usr/bin/logger -t dhcp6c 'RTSOLD script - Starting dhcp6c daemon'
+ /usr/local/sbin/dhcp6c -c /var/etc/dhcp6c.conf -p /var/run/dhcp6c.pid -D

If that works, and you see dhcp6c running, the question is why it isn't starting automatically.  If that doesn't work, then you might have to look through the logs to see if there are any errors; perhaps there is something invalid in the dhcp6c config file, for example.
#12
Create an issue on GitHub: https://github.com/opnsense/plugins/issues

It looks like the widget might be using the wrong attribute from the Tailscale status info, but the JSON structure isn't very well documented, as far as I can see.
#14
From a quick read of the code, and confirming with my dashboard, it looks as if the interface statistics widget is sorted by total packets, descending, and the interfaces widget is just presented in the order displayed by ifconfig.  Whether they should/will change is not for me to say, of course; I just thought I'd answer the question of what the current order actually is.

Edit: I noticed the OP is right about the descending order by interface name when the statistics widget is showing a table - the packet-count descending order is for the pie chart legend.  Perhaps that's ordered that way to improve how the pie chart looks, or something.
#15
I don't use a custom headscale server, but I think I was able to reproduce the problem of bootup stalling just by trying a fake headscale URL.  I found a crude workaround; you could see if it works for you, if you're comfortable checking that code from a random stranger on the Internet is safe to use.  It's a very small change: https://github.com/bensmithurst/opnsense-plugins/commit/0cbcf2d54412e2899348083ee46dd3d198e6ea3c

curl https://github.com/bensmithurst/opnsense-plugins/commit/0cbcf2d54412e2899348083ee46dd3d198e6ea3c.patch > tailscale-timeout.patch
patch -d /usr/local -p4 < tailscale-timeout.patch

Go to tailscale > settings in the UI and press apply to make it re-generate the config.  You should see the change in /etc/rc.conf.d/tailscaled:

tailscaled_up_args="--timeout=10s .....

Then reboot and see if it works as expected.  This should make 'tailscale up' give up after 10 seconds during bootup rather than stalling completely.  If that works, it at least gives an idea of what a proper fix for the plugin might be (e.g. maybe something like what I've done, but made into a config option).