OPNsense Forum

English Forums => 26.1, 26.4 Series => Topic started by: Orionrise on June 15, 2026, 07:31:10 PM

Title: Been driving me mad STAFF VLAN client DNS to gateway: queries arrive at firewall
Post by: Orionrise on June 15, 2026, 07:31:10 PM
Summary

A client on a tagged VLAN (STAFF, VLAN 20) cannot resolve DNS via the firewall's own interface IP (192.168.20.1), but the same VLAN routes to the internet fine (TCP/443/80 works, ping 1.1.1.1 works). A packet capture proves the DNS query arrives at 192.168.20.1:53 on the VLAN interface, but the firewall never sends a reply — Unbound never logs the query, and pf logs no block or pass verdict for it. DNS works perfectly for a client on the MGMT VLAN (192.168.10.0/24) using the same Unbound instance.

Disabling the firewall entirely (Firewall > Settings > Advanced > Disable all packet filtering) makes STAFF DNS work — so it is pf/packet-path related, not Unbound itself.

I have eliminated a large number of causes (listed below) and would appreciate eyes from anyone who knows igc / VLAN receive-path behaviour.

Hardware / software


Appliance: Protectli VP2420 (Intel quad-port, igc driver)
OPNsense: 26.1 Community Edition
LAN trunk: igc0, carrying all VLANs (10–80) tagged to a UniFi USW-Pro-Max-16 switch
STAFF interface: vlan02, parent igc0, VLAN tag 20, static 192.168.20.1/24
Resolver: Unbound, recursion mode (forwarding disabled), listening on All interfaces, port 53, transparent local zone
DHCP: Kea, hands out 192.168.20.1 as gateway + DNS to STAFF clients
WAN: behind an ISP router that cannot bridge (double-NAT; OPNsense WAN is 192.168.1.119)
Firmware status: the config confirms the community flavour (no business subscription). No firmware update was offered at the last check, so this is believed to be the latest 26.1.x point release.


Symptoms


STAFF client gets correct DHCP lease: IP 192.168.20.50, GW 192.168.20.1, DNS 192.168.20.1
STAFF client can ping 1.1.1.1 and reach internet hosts over TCP (443/80) — confirmed in state table, NAT'd out WAN correctly
STAFF client cannot resolve any DNS name via 192.168.20.1
MGMT client (192.168.10.51) resolves through the same Unbound with no issue
Reproduced on a second, different laptop with a built-in NIC (rules out the first client's USB-Ethernet dongle)


The decisive evidence (packet capture)

Packet capture on interface vlan02 [STAFF], UDP, port 53, host filter blank (both directions). After running nslookup on the STAFF client, the capture shows only outbound queries, never a single reply:

192.168.20.50.57604 > 192.168.20.1.53: [udp sum ok] 4+ A? google.com. (28)
192.168.20.50.57605 > 192.168.20.1.53: [udp sum ok] 5+ AAAA? google.com. (28)
... (many more, all client -> 192.168.20.1:53, all [udp sum ok])

There is not one 192.168.20.1.53 > 192.168.20.50 line in the entire capture. Queries arrive with valid checksums; the firewall answers nothing.
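For anyone who wants to repeat that check on a longer capture, here is a minimal Python sketch that counts queries and replies in tcpdump-style text output. It assumes lines in the same format as the two shown above; the function name and the regular expression are my own illustration, not part of any OPNsense tool.

```python
import re

# Matches the "src_ip.src_port > dst_ip.dst_port:" part of a tcpdump line, e.g.
#   192.168.20.50.57604 > 192.168.20.1.53: [udp sum ok] 4+ A? google.com. (28)
ENDPOINTS = re.compile(
    r"(\d+\.\d+\.\d+\.\d+)\.(\d+) > (\d+\.\d+\.\d+\.\d+)\.(\d+):"
)

def count_dns_directions(lines, server_ip="192.168.20.1", port=53):
    """Return (queries, replies) seen for server_ip:port in the capture text."""
    queries = replies = 0
    for line in lines:
        match = ENDPOINTS.search(line)
        if not match:
            continue  # not a packet line (e.g. the "..." summary row)
        src_ip, src_port = match.group(1), int(match.group(2))
        dst_ip, dst_port = match.group(3), int(match.group(4))
        if dst_ip == server_ip and dst_port == port:
            queries += 1
        elif src_ip == server_ip and src_port == port:
            replies += 1
    return queries, replies

capture = [
    "192.168.20.50.57604 > 192.168.20.1.53: [udp sum ok] 4+ A? google.com. (28)",
    "192.168.20.50.57605 > 192.168.20.1.53: [udp sum ok] 5+ AAAA? google.com. (28)",
]
queries, replies = count_dns_directions(capture)
print(f"queries={queries} replies={replies}")  # queries=2 replies=0
```

A non-zero query count with zero replies is the pattern described here: the packets reach the interface and nothing comes back.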

Searching the Unbound query log for 192.168.20.50 returns no results — Unbound never receives the query. Meanwhile MGMT (192.168.10.51) queries appear and are answered NOERROR in the same log.

The firewall live/plain log filtered on 192.168.20.50 shows only the DHCP pass (UDP 68->67); there is no pass and no block verdict for the port-53 traffic. Default block AND default pass logging are both enabled.

What I have already checked / eliminated


Firewall rules (STAFF tab): "Allow STAFF DNS" pass rule for destination 192.168.20.1:53, quick, at the top of the automation rules. The only block rule targets MGMT net, not RFC1918. Confirmed correct.
Floating rules: none defined.
Outbound NAT: Automatic mode, both auto rules are Interface=WAN only — do not apply to traffic destined to the firewall's own VLAN IP.
reply-to: "Disable reply-to on WAN rules" toggled ON, states reset — no change.
Normalization / scrub: all default, no custom scrub rules.
Unbound config: Enabled, Listen Port 53, Network Interfaces = All (recommended), DNSSEC off, transparent local zone, ACL explicitly allows 192.168.20.0/24. Full stop/start performed. Reboot performed.
Hardware offloading: disabled hardware checksum / TSO / LRO, and set VLAN Hardware Filtering = Disable, rebooted — no change.
DNS over TLS forwarders: removed (recursion only).
Dnsmasq: was on port 53053/LAN only, now stopped — not involved.
Listening sockets (netstat): unbound is bound to *:53 (udp4 and tcp4) — so a valid socket exists for 192.168.20.1:53.
ARP: STAFF client correctly resolves 192.168.20.1 to igc0/LAN trunk MAC; L2 adjacency confirmed.
Config-level verification: a saved config snapshot from before the recreate confirms the old vlan02 (STAFF) was defined identically to its working siblings — tag 20, pcp 0, same parent igc0, and a minimal clean interface block (just IP 192.168.20.1/24, enabled). No malformed entry, no stray gateway/MTU/flags. The fault is not in the config.
Switch: STAFF client learned on the correct access port (Port 5) in VLAN 20; firewall uplink trunk (Port 1) carries VLAN 20 tagged. UniFi port health for Port 5 reports Cables & Power, Network Loops, Broadcasts, and Traffic Path Health all "Excellent", anomaly score 0, zero errors/discards. MAC table shows the client correctly on VLAN 20. Switch is clean.
Same parent NIC for working vs broken VLAN: MGMT (vlan01, tag 10) and STAFF (vlan02, tag 20) are BOTH children of parent igc0. MGMT resolves fine, STAFF does not — same physical NIC, same trunk, same switch, same Unbound. The only difference between working and broken is the VLAN tag itself. (Note: UniFi cosmetically labels both networks' subnet as 192.168.1.0/24 and STAFF's native VLAN as "20 (undefined)"; MGMT shows the same cosmetic labels and works, so these are not causal.)
Client: single active adapter, single default route via 192.168.20.1, correct DNS server, Wi-Fi disabled. Reproduced on a second machine with a built-in NIC.
Disable Firewall test: with all packet filtering disabled, STAFF DNS works — confirming it is pf/packet-path, not Unbound.
Full VLAN interface recreate (the big one): I deleted the STAFF interface assignment, deleted the underlying vlan02 device entirely, recreated a fresh VLAN device (auto-named vlan09, parent igc0, tag 20, Best Effort priority), reassigned it as the STAFF interface (it re-took the opt2 identifier), re-added 192.168.20.1/24, and re-added an explicit "Allow STAFF DNS to gateway" pass rule (STAFF net -> STAFF address, port 53, TCP/UDP, ordered above the inter-VLAN block). The fault is completely unchanged on the brand-new device — STAFF DNS still fails identically. This rules out a corrupted or half-bound interface: the problem follows VLAN tag 20 on igc0 regardless of how cleanly the interface is rebuilt.


The question

A UDP/53 packet is captured arriving on the STAFF VLAN interface (vlan02 at the time of the capture), destined for the firewall's own IP (192.168.20.1), with a valid checksum and a valid listening socket (*:53), yet:


it is never delivered to Unbound (no query log entry),
the firewall never emits a reply,
pf logs no pass/block verdict for it,
the same Unbound answers another VLAN (MGMT) fine,
and disabling pf entirely makes it work.


What would cause inbound UDP-to-self on one tagged VLAN to be dropped between the NIC and the listening socket, without a pf verdict, while transit traffic and a SECOND VLAN on the very same parent NIC (igc0) work normally? Both MGMT (tag 10) and STAFF (tag 20) ride igc0; MGMT resolves, STAFF doesn't. I have already deleted and recreated the entire VLAN interface from scratch (fresh device, fresh assignment, fresh IP and rules) and the fault is identical on the new device — so this is not a corrupted interface. The problem follows VLAN tag 20 on igc0 specifically. Is this a known igc / VLAN hardware-filtering / netmap interaction, and what is the fix? Would changing the VLAN tag (e.g. moving STAFF off tag 20 to a different number) be a reasonable diagnostic to try?
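To make the test repeatable from a STAFF client without relying on nslookup, a hand-built query can be sent over both UDP and TCP. This is a minimal sketch using only the Python standard library; the helper names (build_query, probe_udp, probe_tcp, rcode) are hypothetical, and the transaction ID and timeout are arbitrary choices.

```python
import socket
import struct

def build_query(name, qtype=1, txid=0x1234):
    """Build a minimal DNS query: one question, class IN, recursion desired."""
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    labels = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)

def rcode(data):
    """Return the RCODE (low 4 bits of header byte 3), or None if no usable reply."""
    return data[3] & 0x0F if data and len(data) >= 4 else None

def probe_udp(server, name, timeout=2.0):
    """Send the query over UDP; None means no reply within the timeout."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
            s.settimeout(timeout)
            s.sendto(build_query(name), (server, 53))
            data, _ = s.recvfrom(4096)
    except OSError:  # includes socket timeouts
        return None
    return rcode(data)

def _recv_exact(sock, n):
    """Read exactly n bytes from a stream socket, or None if it closes early."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            return None
        buf += chunk
    return buf

def probe_tcp(server, name, timeout=2.0):
    """Send the same query over TCP (2-byte length prefix); None means failure."""
    query = build_query(name)
    try:
        with socket.create_connection((server, 53), timeout=timeout) as s:
            s.sendall(struct.pack("!H", len(query)) + query)
            prefix = _recv_exact(s, 2)
            if prefix is None:
                return None
            (length,) = struct.unpack("!H", prefix)
            data = _recv_exact(s, length)
    except OSError:
        return None
    return rcode(data)

# Same payload size tcpdump reports for the A query in the capture: (28)
print(len(build_query("google.com")))  # 28

# On a STAFF client (not run here):
#   print("udp:", probe_udp("192.168.20.1", "google.com"))
#   print("tcp:", probe_tcp("192.168.20.1", "google.com"))
```

Comparing the two results would show whether the drop is specific to UDP or affects any port-53 traffic to the firewall's own address, which may help narrow down where in the path the packet is lost.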

Any pointers gratefully received — happy to provide pfctl output, full configs, or further captures.
Title: Re: Been driving me mad STAFF VLAN client DNS to gateway: queries arrive at firewall
Post by: Orionrise on June 15, 2026, 07:47:46 PM
Follow-up: possible related lead (GitHub #8231)

This is the only thing I have found after many days of fighting this.

While waiting I did some digging and came across OPNsense core issue #8231, which may be the same class of problem — I'd be grateful if anyone can confirm or rule it out.

In that report, traffic to one network on a physical interface silently stopped passing after a version/kernel change, while other VLANs on the same NIC continued to work, with the configuration provably correct. The reporter notes that inbound VLAN traffic was visible in OPNsense but tagged traffic originating from the firewall did not pass, with no rule blocking it — and that moving the VLAN to a different igc port behaved identically. They resolved it only by rolling the kernel back (running 24.7.9 / kernel 24.7.8).

That lines up closely with what I'm seeing:


queries arrive on the VLAN interface but the firewall never replies (no pf verdict, no Unbound log entry),
a second VLAN on the same igc0 parent works perfectly,
and a full delete/recreate of the VLAN interface made no difference.


Their trigger was tagged-vs-untagged on one NIC; mine is one tagged VLAN vs another on the same parent — but the underlying shape (a receive/transmit-path issue on igc that silently affects one network on a multi-VLAN interface, independent of config) seems to match.

Two questions for anyone who knows this area:


Is my symptom likely the same igc / kernel-level class as #8231, and is it present in the current 26.1 line?
The thread also mentions the netmap tunable (dev.netmap.admode) and "can't allocate llinfo" errors. I don't knowingly run an inline IPS or Sensei/Zenarmor, but is netmap on igc a plausible mechanism here, and is there a quick way to confirm whether it's intercepting on my STAFF interface?


Reference: github.com/opnsense/core/issues/8231

Happy to provide any further captures, pfctl output, or the full config. Thanks again.
Title: Re: Been driving me mad STAFF VLAN client DNS to gateway: queries arrive at firewall
Post by: newsense on June 15, 2026, 08:24:45 PM
Need to see the Services > Unbound DNS > General page.
Title: Re: Been driving me mad STAFF VLAN client DNS to gateway: queries arrive at firewall
Post by: Orionrise on June 15, 2026, 08:46:42 PM
Thanks. Here are the full Services → Unbound DNS → General settings (advanced mode on):

Enable Unbound: enabled
Listen Port: 53
Network Interfaces: All (recommended)
Enable DNSSEC Support: off
Enable DNS64 Support: off
Enable AAAA-only mode: off
Register ISC DHCP4 Leases: off
DHCP Domain Override: empty
Register DHCP Static Mappings: on
Do not register IPv6 Link-Local addresses: off
Do not register system A/AAAA records: off
TXT Comment Support: off
Flush DNS Cache during reload: on
Force SafeSearch: off
Local Zone Type: transparent

Recursion mode (no query forwarding). Access Lists allow 192.168.20.0/24 (STAFF). Listening socket confirmed on *:53 (udp4/tcp4). The firewall's own DNS Lookup tool resolves fine, and MGMT VLAN clients resolve through this same Unbound with no problem — only STAFF (VLAN 20) fails, and its queries never appear in the Unbound query log at all.
Title: Re: Been driving me mad STAFF VLAN client DNS to gateway: queries arrive at firewall
Post by: cookiemonster on June 15, 2026, 11:02:33 PM
I was going to ask about Network Interfaces: All (recommended), but that is set correctly.
At this point I wonder if you have one of those obscure cases of VLANs mixing tagged and untagged traffic. Can you check for that? The setup we want is one where your trunk to OPN carries ONLY tagged traffic.
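One way to check that is to capture on the parent interface (igc0) with link-level headers printed and count how many frames carry a VLAN tag. The sketch below is a rough Python helper for that; it assumes the capture text comes from something like tcpdump -e -n, where tagged frames include "vlan <id>," in the line. The function name and the sample lines (including the MAC addresses) are made up for illustration.

```python
import re
from collections import Counter

# With link-level headers printed, tagged frames are assumed to contain "vlan <id>,"
VLAN_TAG = re.compile(r"\bvlan (\d+),")

def classify_frames(lines):
    """Count frames per VLAN tag; frames with no tag are counted as 'untagged'."""
    counts = Counter()
    for line in lines:
        if not line.strip():
            continue
        match = VLAN_TAG.search(line)
        counts[f"vlan {match.group(1)}" if match else "untagged"] += 1
    return counts

# Illustrative lines only, not taken from a real capture.
sample = [
    "aa:aa:aa:aa:aa:01 > bb:bb:bb:bb:bb:01, ethertype 802.1Q (0x8100), length 74: "
    "vlan 20, p 0, ethertype IPv4 (0x0800), 192.168.20.50.57604 > 192.168.20.1.53: 4+ A? google.com. (28)",
    "aa:aa:aa:aa:aa:02 > bb:bb:bb:bb:bb:01, ethertype 802.1Q (0x8100), length 74: "
    "vlan 10, p 0, ethertype IPv4 (0x0800), 192.168.10.51.50000 > 192.168.10.1.53: 7+ A? google.com. (28)",
    "aa:aa:aa:aa:aa:03 > bb:bb:bb:bb:bb:01, ethertype IPv4 (0x0800), length 70: "
    "192.168.1.20.40000 > 192.168.1.1.53: 9+ A? google.com. (28)",
]
for tag, count in sorted(classify_frames(sample).items()):
    print(tag, count)
```

Any "untagged" count on the trunk would point to the mixed tagged/untagged setup described above.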