OPNsense 26.1.4 VLAN odd behavior

Started by Shoresy, Today at 02:06:03 PM

I'm troubleshooting an internal DNS issue on OPNsense 26.1.4 / FreeBSD 14.3.
Setup
  • one client VLAN
  • one services VLAN
  • internal DNS servers are on the services VLAN
  • client VLAN has explicit allow rules to both DNS servers on TCP/UDP 53
  • hardware offloading is disabled:
      • checksum offload
      • TSO
      • LRO
      • VLAN hardware filtering
Problem
A client on the client VLAN repeatedly sends DNS queries to two internal DNS servers. The DNS servers reply immediately, but the client keeps retrying the same queries as if it never receives the responses.
Packet capture results
I captured on:
  • the client VLAN interface
  • the services VLAN interface
  • the parent LACP interface
  • the individual LACP member interfaces
What I see:
  • On the services VLAN interface, I see the full exchange:
      • client -> DNS query
      • DNS server -> client reply
  • On the client VLAN interface, I only see:
      • client -> DNS query
      • I do not see the matching reply
  • On the parent trunk interface, I see:
      • VLAN-tagged DNS frames from the client MAC to the firewall MAC
      • but I do not see matching VLAN-tagged reply frames from the firewall MAC back to the client MAC
  • On the LACP member interfaces, this client's VLAN traffic consistently hashes to a single member (igc4), and on that physical member I still only see the client's DNS requests, not the return replies
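The comparison across capture points can be sketched in Python as below. This is only an illustration of the reasoning, not a tool I ran: the client and DNS server addresses and the transaction ID are hypothetical, and the packet records stand in for whatever fields are read out of the real captures.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DnsPacket:
    src: str        # source IP
    dst: str        # destination IP
    txid: int       # DNS transaction ID
    is_reply: bool

def unanswered(packets):
    """Return the queries in one capture that have no matching reply."""
    replies = {(p.src, p.dst, p.txid) for p in packets if p.is_reply}
    return [
        p for p in packets
        if not p.is_reply and (p.dst, p.src, p.txid) not in replies
    ]

# Hypothetical addresses and transaction ID, for illustration only.
query = DnsPacket("192.168.20.50", "192.168.40.10", 0x1A2B, False)
reply = DnsPacket("192.168.40.10", "192.168.20.50", 0x1A2B, True)

captures = {
    "services VLAN": [query, reply],  # full exchange seen here
    "client VLAN": [query],           # reply never seen here
}

# Capture points where at least one query never got its reply.
missing_at = [name for name, pkts in captures.items() if unanswered(pkts)]
print(missing_at)  # ['client VLAN']
```

Matching on the transaction ID plus the swapped address pair is what lets the same query be followed from one capture point to the next.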
Rule/state details
pfctl -vvsr shows the DNS traffic is matching the intended explicit client VLAN DNS allow rules, specifically the VLAN interface rules for UDP/53 to each DNS server.
pfctl -vvss shows states being created for both directions of the flow.
So this does not appear to be:
  • the wrong rule matching
  • a forced gateway/policy route on the client VLAN rule
  • a hardware offload artifact
Additional notes
  • clearing the affected client's DNS states did not fix the problem
  • pfctl -si shows a nonzero state-mismatch counter, though not a huge value
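One way to tell whether the state-mismatch counter is tied to this client is to snapshot pfctl -si before and after the client retries and compare the totals. A minimal Python sketch, assuming the usual "name, total, rate" layout of the Counters section; the sample text and its numbers are made up for illustration and are not output from my firewall.

```python
import re

def parse_pf_counters(text):
    """Parse the 'Counters' section of `pfctl -si` style text into {name: total}."""
    counters = {}
    in_counters = False
    for line in text.splitlines():
        if line.strip() == "Counters":
            in_counters = True
            continue
        if in_counters:
            m = re.match(r"\s+(\S+)\s+(\d+)", line)
            if not m:
                break  # first non-counter line ends the section
            counters[m.group(1)] = int(m.group(2))
    return counters

# Illustrative snapshots only; the values are invented.
SAMPLE_BEFORE = """\
Counters
  match                              1200            1.2/s
  state-mismatch                       14            0.0/s
"""
SAMPLE_AFTER = """\
Counters
  match                              1260            1.2/s
  state-mismatch                       19            0.0/s
"""

before = parse_pf_counters(SAMPLE_BEFORE)
after = parse_pf_counters(SAMPLE_AFTER)
delta = after["state-mismatch"] - before["state-mismatch"]
print(delta)  # 5
```

If the delta only grows while the affected client is retrying, that would point toward pf dropping the replies rather than a capture visibility quirk.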
Question
Has anyone seen OPNsense / FreeBSD behave like this:
  • correct VLAN DNS rule matches
  • pf state exists
  • reply packet is visible arriving from the services VLAN
  • but the packet does not appear to be emitted back onto the client VLAN / parent trunk / physical LACP member carrying the flow
I'm trying to determine whether this points to:
  • pf/state handling
  • VLAN forwarding inside OPNsense
  • LACP/VLAN interaction
  • or a known packet capture visibility quirk
OPNsense 26.1.x-amd64
Intel(R) Celeron(R) N5105 CPU @ 2.00GHz
Intel I226-V 2.5GbE ports x6
16GB DDR4 RAM
256GB NVMe SSD
Dual WAN 1Gb symmetrical Fiber + 1Gb Cable

Check the client VLAN interface settings. Maybe you've set the wrong mask, so that the client in question falls outside its subnet.

The VLANs are intentionally separate routed /24 networks. The problem is not that the client can't identify its own subnet; the problem is that DNS replies are visible arriving on the services VLAN but are not visible leaving back toward the client VLAN.

Well, a possible reason for this behavior could be that the network mask is set incorrectly on the client VLAN interface in OPNsense.

"Client VLAN" is what you called the network in your initial post.

Today at 10:00:53 PM #4 Last Edit: Today at 10:08:06 PM by Shoresy
I checked the live interface config on OPNsense. The client VLAN interface is 192.168.20.1/24 (255.255.255.0), which matches the intended subnet. So the client VLAN interface mask on OPNsense does not appear to be incorrect.

The client on the VLAN network is getting a valid IP/subnet mask/gateway/DNS server assignment as well. Appreciate the input so far.
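For anyone who wants to sanity-check the mask question, here is a small Python sketch using the standard ipaddress module. The interface address 192.168.20.1/24 is from the live config; the client address and the /25 "wrong mask" case are hypothetical and only there to show what a bad mask would look like.

```python
import ipaddress

client_if = ipaddress.ip_interface("192.168.20.1/24")  # live client VLAN interface
client_ip = ipaddress.ip_address("192.168.20.200")     # hypothetical client address

# With the configured /24 the client is on-link for the firewall.
print(client_if.network, client_if.netmask)  # 192.168.20.0/24 255.255.255.0
on_link = client_ip in client_if.network
print(on_link)  # True

# Hypothetical misconfiguration: a /25 would put the same client outside
# the interface's subnet, so replies would not be sent out that interface.
wrong_if = ipaddress.ip_interface("192.168.20.1/25")
on_link_wrong = client_ip in wrong_if.network
print(on_link_wrong)  # False
```

With the /24 that is actually configured, any client address in 192.168.20.0 through 192.168.20.255 is inside the interface network, so a mask mismatch on the firewall side does not explain the missing replies.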

Did you set a gateway on any of the involved interfaces, by any chance?

No explicit gateway is set on the VLAN20 (client) or VLAN40 (services) DNS rules involved. The VLAN20 and VLAN40 interfaces are just the local routed interfaces (192.168.20.1/24 and 192.168.40.1/24). So this does not appear to be caused by a gateway being attached to the DNS pass rules.

Have you enabled logging on all rules? I would expect logs to be available for state mismatches. There are lots of ways to (generally inadvertently) stop traffic flow (e.g. using reply-to incorrectly, using "bind states to interface" when you have alternate paths, etc.) - a state mismatch might offer a clue. Say... can you ping the client from the firewall? I didn't see that noted.
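If it helps, the rule dump can be scanned for the options that most often cause this. A rough Python sketch, assuming the rules are saved as text from pfctl -sr; the two sample rule lines (interface names, addresses, gateway) are hypothetical and not taken from your firewall.

```python
import re

# pf keywords that redirect traffic or replies, or bind a state to one interface.
SUSPECT = ("route-to", "reply-to", "dup-to", "if-bound")

def flag_rules(rule_dump):
    """Return (line, keywords) pairs for rule lines containing a suspect keyword."""
    hits = []
    for line in rule_dump.splitlines():
        found = [
            kw for kw in SUSPECT
            if re.search(rf"(?<![\w-]){re.escape(kw)}(?![\w-])", line)
        ]
        if found:
            hits.append((line.strip(), found))
    return hits

# Hypothetical rule text, for illustration only.
RULES = """\
pass in quick on vlan0.20 inet proto udp from 192.168.20.0/24 to 192.168.40.10 port = domain keep state
pass in quick on vlan0.20 reply-to (igc0 203.0.113.1) inet proto tcp from any to any keep state (if-bound)
"""

hits = flag_rules(RULES)
for line, keywords in hits:
    print(keywords, "->", line)
```

Only the second sample line is flagged here, for reply-to and if-bound. Anything flagged on the DNS path in the real dump would be worth a closer look.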


Logging is not enabled on every rule, only selectively. For the DNS path itself, the explicit VLAN20 DNS pass rules are matching, and packet captures show the DNS queries leaving VLAN20 and the replies arriving on VLAN40. I have not gotten a useful drop or state-mismatch clue from pflog yet.

I agree the nonzero state-mismatch counter may still be relevant.

I have not yet tested ping from the firewall to the affected client, so that is a good suggestion and I'll check that next. Client -> firewall ping was not useful because ICMP to the gateway was intentionally disabled on that VLAN during testing.