Quote from: franco on April 09, 2025, 07:14:40 PM
I'm trying to keep the scope small but I seem to have failed at that. I also checked the code and at first glance disabling the gateway should force it down so the kill state should work, but feel free to challenge me on this.
Cheers,
Franco
Hi Franco,
Thank you again for the feedback. We understand that "Kill states when down" might not reliably trigger when a gateway is disabled manually. That would explain why a manual state reset is needed to enforce blocking after the fact. However, our primary issue seems to be that the traffic never even attempts the path where the kill switch rule lives, which makes the state-killing trigger largely irrelevant to the initial failure.
To demonstrate, we performed the following test using the standard Tag + Floating Rule configuration (detailed fully in my original post, currently active):
Test Steps & Evidence:
VPN Enabled: NordVPN gateway object enabled and confirmed Online in Status. Continuous ping 10.0.10.11 -> 1.1.1.1 runs successfully.
Evidence: Live Log (Screenshot_1.png) confirms the ping packets pass via the LAN rule "Route LAN traffic via NordVPN". The state table (Screenshot_2.png) shows the expected NATted outbound state 10.5.0.2 -> 1.1.1.1 and the return state.
NordVPN Gateway Manually Disabled: The gateway object was Disabled via System -> Gateways -> Configuration -> Edit -> Check Disable -> Save -> Apply.
Observation: The continuous ping 10.0.10.11 -> 1.1.1.1 continued to succeed indefinitely without manual intervention (visually confirmed in Screenshot_5.png foreground).
Evidence: Gateway Configuration page confirmed NordVPN object was Disabled (Screenshot_3.png context).
Diagnostics While Kill Switch Failing (Gateway Disabled, Ping Succeeding):
Live Log (action=pass): Confirmed the ping packets were still being logged as PASSED by the LAN rule "Route LAN traffic via NordVPN" (Screenshot_4.png, Screenshot_5.png background), even though this rule points to the now-disabled NordVPN gateway.
Live Log (action=block): Showed NO blocks of the ping traffic by the floating rule "Kill Switch block for NordVPN" (Screenshot_6.png).
Packet Capture WAN (em0): Capture filtered for ICMP 1.1.1.1 was EMPTY (OPN1_CAPTURE2.jpg). The traffic was not attempting to leave via WAN.
Packet Capture wg0: Capture filtered for ICMP 1.1.1.1 showed continuous successful echo requests/replies (10.5.0.2 <-> 1.1.1.1) egressing/ingressing directly via the wg0 interface (Screenshot_7.png). This proves the bypass path.
State Table: Showed the initial state matching the LAN rule (Route...NordVPN) plus the anomalous second state (icmp 10.5.0.2 -> ...).
Manual State Reset: Clicked Reset state table (Firewall -> Diagnostics -> States).
Observation: The continuous ping 10.0.10.11 -> 1.1.1.1 stopped immediately (visually confirmed in Screenshot_8.png).
Evidence: Live Log (action=block) then showed the ICMP packets being blocked by the floating rule "Kill Switch block for NordVPN" (Screenshot_10.png), confirming that the rule only takes effect after the state reset forces traffic to attempt the WAN path.
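The sequence in the test steps (pings keep passing after the gateway is disabled, and the floating block rule only matches after a state reset) is consistent with how a stateful filter generally orders its work: a packet matching an existing state is forwarded according to that state and is not re-evaluated against the rule set. The Python sketch below is a toy model of that ordering, not OPNsense or pf code. The `Gateway`/`ToyFirewall` classes are hypothetical, and it assumes, as simplifications, that an established state keeps the egress interface chosen when it was created, and that a newly evaluated packet falls back to the default route on WAN (em0) once the NordVPN gateway is disabled.

```python
from dataclasses import dataclass, field

WAN_IFACE = "em0"  # default-route egress in this toy model (assumption)

@dataclass
class Gateway:
    iface: str
    enabled: bool = True

@dataclass
class ToyFirewall:
    gateways: dict
    states: dict = field(default_factory=dict)  # flow key -> egress interface
    log: list = field(default_factory=list)     # (action, rule) for rule-evaluated packets only

    def _evaluate_rules(self):
        # LAN policy rule: route via the NordVPN gateway if it is enabled,
        # otherwise (assumed fallback) leave via the default route on WAN.
        gw = self.gateways["NordVPN"]
        egress = gw.iface if gw.enabled else WAN_IFACE
        # Floating kill-switch rule: block this traffic if it tries to leave via WAN.
        if egress == WAN_IFACE:
            return "block", "Kill Switch block for NordVPN", None
        return "pass", "Route LAN traffic via NordVPN", egress

    def send(self, src, dst, proto="icmp"):
        key = (src, dst, proto)
        if key in self.states:
            # Existing state: forwarded via the stored interface, rules are skipped.
            return "pass", self.states[key]
        action, rule, egress = self._evaluate_rules()
        self.log.append((action, rule))
        if action == "pass":
            self.states[key] = egress  # create a state remembering the egress
            return "pass", egress
        return "block", None

    def reset_states(self):
        # Equivalent of "Reset state table": every flow is re-evaluated next time.
        self.states.clear()

if __name__ == "__main__":
    fw = ToyFirewall(gateways={"NordVPN": Gateway(iface="wg0")})
    print(fw.send("10.0.10.11", "1.1.1.1"))  # ('pass', 'wg0'): state created
    fw.gateways["NordVPN"].enabled = False   # gateway disabled
    print(fw.send("10.0.10.11", "1.1.1.1"))  # ('pass', 'wg0'): old state still used
    fw.reset_states()
    print(fw.send("10.0.10.11", "1.1.1.1"))  # ('block', None): kill switch matches
```

In this model the kill-switch rule is never consulted while the old state exists, which mirrors why flushing states was needed before the block showed up in the Live Log.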
Conclusion from Test:
The evidence clearly shows that when the policy route target (NordVPN gateway object) is disabled, traffic matching the rule is not failing over to the default route (where the WAN kill switch rule lives). Instead, it seems OPNsense internally routes the traffic directly out the associated wg0 interface (as shown by the wg0 capture), effectively bypassing the gateway's disabled status and all kill switch logic until states are manually flushed.
This seems linked to the persistent 10.5.0.1 via wg0 host route which appears in netstat -rn (and WG logs) even though Disable Routes is checked in the WG Instance settings.
So, while Kill states when down might not trigger on manual disable, the bigger issue seems to be that the traffic isn't even reaching the point where state killing (or the floating block rule) on the correct failover path (WAN) can occur, due to this apparent routing override via wg0.
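To check whether the 10.5.0.1 via wg0 host route keeps reappearing after toggling Disable Routes, the `netstat -rn` output can be scanned programmatically. A minimal Python sketch, assuming the FreeBSD column layout (Destination, Gateway, Flags, Netif, ...) and that host routes carry the `H` flag. The `find_host_routes` helper and the `SAMPLE` table (including the `link#8` gateway and the flag strings) are hypothetical and not taken from the screenshots; on the firewall itself, the live output of `netstat -rn` could be passed in instead.

```python
# Hypothetical sample of FreeBSD `netstat -rn` output, for illustration only.
SAMPLE = """\
Routing tables

Internet:
Destination        Gateway            Flags     Netif Expire
default            192.0.2.1          UGS         em0
10.5.0.1           link#8             UHS         wg0
127.0.0.1          link#4             UH          lo0
"""

def find_host_routes(netstat_output: str, iface: str) -> list:
    """Return (destination, gateway) pairs for host routes (flag 'H') on iface."""
    routes = []
    for line in netstat_output.splitlines():
        cols = line.split()
        # Data rows have at least Destination, Gateway, Flags and Netif columns;
        # skip section titles and the column header.
        if len(cols) < 4 or cols[0] == "Destination":
            continue
        dest, gateway, flags, netif = cols[:4]
        if "H" in flags and netif == iface:
            routes.append((dest, gateway))
    return routes

if __name__ == "__main__":
    print(find_host_routes(SAMPLE, "wg0"))  # [('10.5.0.1', 'link#8')]
```

Running this after each configuration change (and after a reboot) would show whether the wg0 host route survives with Disable Routes checked.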
Is this direct egress via wg0 for traffic policy-routed to a disabled gateway expected? Could the handling of the 10.5.0.1 via wg0 route be involved?
Thanks for looking into this!
https://imgur.com/a/AYsj1vj