20.1 Legacy Series / [BUG] inc/filter.inc: check for 'kill_states' / state flush on ruleset update
« on: July 14, 2020, 03:36:54 pm »
With "Firewall / Settings / Advanced" still at its defaults:
if gateway monitoring is active, and any monitored gateway is down,
OPNsense will flush all pf states ("/sbin/pfctl -Fs") whenever the
firewall ruleset is reloaded. Such reloads happen e.g. when a gateway
goes down and /usr/local/etc/rc.syshook.d/monitor/10-dpinger invokes
"configctl filter reload". But they also happen when firewall rules are
changed using the admin GUI.
Killed sessions include the current admin sessions to the firewall via
web GUI and ssh.
After the flush, TCP ACK packets from the browser (in reply to GUI
output) are no longer part of an established session and get dropped by
"Default deny rule".
The admin GUI web request actually succeeds in installing the new
firewall rules, but the GUI HTTP response will time out and the browser
will display a corresponding error message.
Perhaps this is even related to some instances of the well-known
"slow GUI" problem seen with OPNsense clusters?
Firewall / Settings / Advanced:
    Gateway Monitoring
        Kill states (default: checked)
            Disable State Killing on Gateway Failure
            Help text: The monitoring process will flush states
            for a gateway that goes down if this box is not
            checked. Check this box to disable this behavior.
The corresponding config value "kill_states" has a slightly confusing
name: kill_states=1 means states should _not_ be killed on gateway
failures.
The help text is also misleading: "flush states for a gateway" sounds
as if only states using the failed gateway were affected, while
actually each and every pf connection state gets flushed.
So far I have seen three possible values for "kill_states" in
/conf/config.xml:
    a) default after install (20.1 ISO)
       + upgrades to 20.1.8_1:   <kill_states/>
    b) after unchecking:         ... no kill_states entry at all ...
    c) after checking again:     <kill_states>1</kill_states>
Bug 1) misinterpreting "kill_states" default value
/usr/local/etc/inc/filter.inc:
128  function filter_delete_states_for_down_gateways()
129  {
...
145      if ($any_gateway_down == true) {
146          mwexec("/sbin/pfctl -Fs");
147      }
148  }
...
211  function filter_configure_sync($verbose = false, $flush_states = false, $load_aliases = true)
212  {
...
561      if (empty($config['system']['kill_states'])) {
562          filter_delete_states_for_down_gateways();
563      }
...
570      if ($flush_states) {
571          mwexec('/sbin/pfctl -Fs');
572      }
...
590      unlock($filterlck);
591  }
Line 561 works for case c) ("<kill_states>1</kill_states>"), but not for
case a) ("<kill_states/>"): the empty element still counts as empty(),
so states get flushed even though the box is checked. Perhaps
array_key_exists() would be better?
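To make the difference concrete, here is a minimal sketch comparing the two checks on the three config shapes listed above. The arrays are hypothetical stand-ins for $config['system'], the two function names are made up for illustration, and it assumes that an empty <kill_states/> element ends up as an empty string after config parsing.

```php
<?php
// Hypothetical stand-ins for $config['system'] in cases a), b) and c).
// Assumption: an empty <kill_states/> element is parsed to an empty string.
$cases = [
    'a' => ['kill_states' => ''],   // <kill_states/>
    'b' => [],                      // no kill_states entry at all
    'c' => ['kill_states' => '1'],  // <kill_states>1</kill_states>
];

// The check as quoted from line 561: flush whenever the value is empty.
function flushes_with_empty(array $system): bool
{
    return empty($system['kill_states']);
}

// The suggested alternative: flush only when the key is absent.
function flushes_with_key_check(array $system): bool
{
    return !array_key_exists('kill_states', $system);
}

foreach ($cases as $name => $system) {
    printf(
        "%s) empty(): %s, array_key_exists(): %s\n",
        $name,
        flushes_with_empty($system) ? 'flush' : 'keep',
        flushes_with_key_check($system) ? 'flush' : 'keep'
    );
}
// a) empty(): flush, array_key_exists(): keep
// b) empty(): flush, array_key_exists(): flush
// c) empty(): keep, array_key_exists(): keep
```

Only case a) differs between the two checks, which matches the behaviour seen with the default config.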
Bug 2) flushing state on ruleset update
Lines 561-563 happen to be in the same codepath for "gateway down" and
for "admin ruleset update"; they ought to be in the "gateway down"
codepath only.
Function "filter_configure_sync()" already has an explicit parameter
"flush_states", which is used e.g. from /usr/local/etc/rc.newwanip
(reacting to an updated WAN IP address).
I think lines 561-563 should be removed from
/usr/local/etc/inc/filter.inc, and be moved to
/usr/local/etc/rc.syshook.d/monitor/10-dpinger instead.
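The intended decision can be sketched as a small helper. This is only an illustration of the logic, not existing OPNsense code: the function name should_flush_states() and its parameters are hypothetical, and it again assumes that <kill_states/> is parsed to an empty string.

```php
<?php
// Hypothetical helper, not part of filter.inc or 10-dpinger.
// $gatewayEvent:   true only when the reload was triggered by the gateway monitor
// $anyGatewayDown: true when at least one monitored gateway is down
function should_flush_states(array $system, bool $gatewayEvent, bool $anyGatewayDown): bool
{
    if (!$gatewayEvent) {
        // Ruleset updates from the admin GUI never flush states.
        return false;
    }
    if (array_key_exists('kill_states', $system)) {
        // Box checked (including the default <kill_states/>): state killing disabled.
        return false;
    }
    return $anyGatewayDown;
}

var_dump(should_flush_states([], false, true));                    // GUI rule change: bool(false)
var_dump(should_flush_states([], true, true));                     // gateway down, box unchecked: bool(true)
var_dump(should_flush_states(['kill_states' => ''], true, true));  // gateway down, box checked: bool(false)
```

With the check living in the gateway-monitor codepath, an admin changing rules while a gateway is down would keep the GUI and ssh sessions alive.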
Regards
Matthias Ferdinand