24.1 Legacy Series / pf routing rules randomly start being ignored till service restart
« on: March 21, 2024, 11:51:24 pm »
Every couple of months, pf randomly starts ignoring the firewall rules that force traffic from our untrusted lab DMZs through our WireGuard VPN gateway, and instead routes that traffic out via our default underlying gateway. If the pf service, and ONLY the pf service, is restarted, without touching anything else, the DMZ starts routing over the VPN again. This is obviously a gigantic problem, because it means that other pf rules might be failing as well, letting in traffic that shouldn't be coming in and letting out traffic that shouldn't be leaving.
We have OPNsense configured to:
1) Group the DMZ subnet and non-DMZ canary into an alias
2) pf rule (first match) on the LAN interface to pass traffic from the DMZ alias with destination '!LAN net' via Gateway WAN_WG_IPv4 - this rule is at the top, so it sits above the default rule for "pass LAN net -> * (any)"
3) pf rule (first match) for WAN interface to block traffic for 'DMZ alias -> !LAN net' on * (any) gateway
What commands or logs can I dump to figure out exactly which routing and firewall rules are being used or ignored when it enters this failed state, or whether pf is even still alive?
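As a starting point, a minimal sketch of a snapshot script that could be run on the firewall while it is in the failed state, before pf is restarted. The output directory, the file names and the `capture` helper are hypothetical; the sketch assumes the stock `pfctl` and `netstat` tools, and that the `wg` utility is installed. Commands that are missing are skipped rather than treated as errors.

```sh
#!/bin/sh
# Hypothetical snapshot helper: dump pf and routing state into one directory
# so the failed state can be compared against a known-good capture.
# OUT_DIR and the file names are placeholders, not anything OPNsense provides.
OUT_DIR="${OUT_DIR:-/tmp/pf-snapshot-$(date +%Y%m%d-%H%M%S)}"
mkdir -p "$OUT_DIR"

capture() {
    # $1 = output file name, remaining arguments = command to run
    name="$1"; shift
    if command -v "$1" >/dev/null 2>&1; then
        # Keep stderr too, and record a non-zero exit status instead of aborting
        "$@" > "$OUT_DIR/$name.txt" 2>&1 || echo "exit status: $?" >> "$OUT_DIR/$name.txt"
    else
        echo "skipped: $1 not found" > "$OUT_DIR/$name.txt"
    fi
}

capture pf-info   pfctl -s info          # is pf enabled, plus global counters
capture pf-rules  pfctl -vv -s rules     # loaded rules with numbers and per-rule counters
capture pf-states pfctl -vv -s states    # state table, including the rule each state matched
capture routes    netstat -rn            # kernel routing table
capture wg        wg show                # WireGuard status (assumes wg is installed)

echo "snapshot written to $OUT_DIR"
```

Running it once while everything works and once in the failed state, then diffing the two directories, should show whether the route-to rule is still loaded and whether DMZ states are being created by it or by the default LAN rule.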
The above configuration passed the following tests:
1) As normal, `curl icanhazip.com` reports the WireGuard gateway IP, not the default underlying gateway IP
2) When the WireGuard connection is offline (via a forced configuration error and disabling it in the web UI), no traffic passes from the DMZ and no external routes are reachable from the DMZ machines
(Note: the non-DMZ canary checks that its route is protected; if it is not, it will notify, but it doesn't have a kill capability, since the first few times this happened we thought it was a configuration error.)
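For anyone wanting to reproduce the canary, here is a rough sketch of that kind of check, not our exact script. It reuses the `curl icanhazip.com` test from above; the function names, the `EXPECTED_IP` variable and the plain `echo` standing in for the notification are all hypothetical.

```sh
#!/bin/sh
# Hypothetical canary check: compare the public IP seen from the canary host
# with the IP expected when traffic leaves via the WireGuard gateway.

route_is_protected() {
    # $1 = observed public IP, $2 = expected WireGuard exit IP
    # An empty observation (lookup failed) is treated as "not protected".
    [ -n "$1" ] && [ "$1" = "$2" ]
}

check_route() {
    # EXPECTED_IP is a placeholder that must be set by the caller.
    observed="$(curl -s --max-time 10 icanhazip.com || true)"
    if route_is_protected "$observed" "$EXPECTED_IP"; then
        echo "OK: traffic exits via $observed"
    else
        # Notify only; the canary deliberately has no kill capability.
        echo "ALERT: expected $EXPECTED_IP but saw '${observed:-nothing}'"
    fi
}
```

Something like `EXPECTED_IP=203.0.113.7 check_route` (a documentation-range address used as a stand-in) could then be run from cron, with the `echo` replaced by whatever notification mechanism is in use.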