BUG: There were error(s) loading the rules: /tmp/rules.debug:806: sticky-address

Started by nzkiwi68, July 25, 2022, 03:23:08 AM

Previous topic - Next topic
2022-07-23T18:51:24 Error opnsense /usr/local/etc/rc.filter_configure: There were error(s) loading the rules: /tmp/rules.debug:806: sticky-address cannot be redefined - The line in question reads [806]: pass in quick on vlan01 route-to {( vlan02 202.202.202.202 )} sticky-address inet proto {tcp udp} from $groveseg to $Marshal_updates port $http_https keep state label "9e64a311a494a21cfdbefcba91dad3a5" # : Allow ServerSEG license check


As soon as a add WAN fail-over capability to rules, this break badly.
I can't seem to pin down exactly what is going on, my best guess is the WAN fail-over "WAN1_failover_WAN2" gateway group is just not working.
Often, I can get the issue to go away by moving the rule to the top of the interface rules, or to the end. But that doesn't always work either.

Trouble shooting steps I have tried

  • Deleted the offending rule and made it again (doesn't always fix it)
  • Moving the rule around for rule order (also doesn't always fix it)
  • Rebooting OPNsense (definitely doesn't fix it)
  • Exported the config, looked the config by hand (seems fine) - re-import the config and reboot (doesn't fix it)
  • Also occurs when I take an existing rule and change the default gateway to the new WAN fail-over gateway

The WAN fail-over group looks perfect.

Any ideas anyone?


Any one have any additional troubleshooting steps I can take on this issue?

Hi
Can you try to add any (dummy?) rule without icmp-type option right above "sticky-address" rule?

IMHO it's an upstream bug and there is  a typo in pfctl parser at
https://github.com/opnsense/src/blob/47fa565344c38a0d104e3913920546f07d7807e0/sbin/pfctl/parse.y#L4086
should check pool_opts.marker for POM_STICKYADDRESS (not filter_opts.marker)
* I understand that this part of the code has been there forever  ;) but I have seen complaints about the pfctl behavior with rules with various icmp-type/sticky-address combinations before

pool options is parsed BEFORE filter options (and before previous rule filter options 'bzero'-ed) so actually rule with "sticky-address" is checked against previos rule filter_opts.marker (in this case icmp-type option from previous rule interferes with sticky-address pool option in current rule  :o )

@franco could you check my theory when you have time please? (im too bad at C)



I'm hoping to look at this tomorrow. Crunched the OpenBSD code for this a few days ago and it's also the way it is in FreeBSD. Something is off, but might not be the variable in use here.


Cheers,
Franco


pass in quick on vlan01 route-to {( vlan02 202.202.202.202 )} sticky-address inet proto {tcp udp} from $groveseg to $Marshal_updates port $http_https keep state label "9e64a311a494a21cfdbefcba91dad3a5" # : Allow ServerSEG license check

What sticks out to me is that "sticky-address" doesn't seem to follow a keyword as per grammar...

pooltype       = (   "bitmask" | "random" |
            "source-hash" [ (   hex-key   | string-key ) ] |
            "round-robin" ) [   sticky-address ]

Could it be that we emit it falsely here and it should be inside the multiple route-to conditional to follow round-robin keyword?

https://github.com/opnsense/core/blob/e5006e9e441ac19dcf04b389737fe6f4bfdf4108/src/opnsense/mvc/app/library/OPNsense/Firewall/Plugin.php#L149-L154


Cheers,
Franco

I was also surprised at first. but because this is the default pool type afaik and the parser accepts this syntax (and  obviously did not affect the error), then I did not dig deeper ...

There is definitely an issue somewhere. Maybe here even we have the parser not clearing the previous structure because the keyword triggering the struct clear was omitted?

So in that theory prepending "round-robin sticky-address" would fix this, but it also means we can just use that in our code and move on.


Cheers,
Franco

QuoteSo in that theory prepending "round-robin sticky-address" would fix this
Unfortunately no
i used
pfctl -onone -vv -nf /tmp/test_rules.debug > /tmp/out.txt
with various rules combinations (including "correct" rules with "round-robin") in test_rules.debug
(without any opnsense rules. just synthetic rules for checking the parser)
for me "icmp-type" and "sticky-address" is definitely an error triggers in pfctl parser.
did not find any reason to think that the matter is in the opnense code

Can you share your canonical test code?

I copied the rule over and tried to reproduce but failed so far.


Cheers,
Franco


Ok this is weird. It only fails with ICMP rule in front not setting sticky-address at all. So can you spot the issue? :)

root@sensey:/usr/src/sbin/pfctl # git grep 0x02
parse.y:#define FOM_ICMP        0x02
parse.y:#define SOM_MAXMSS      0x02
parse.y:#define QOM_SCHEDULER   0x02
parse.y:#define POM_STICKYADDRESS       0x02
pfctl_osfp.c:#define T_MSS      0x02    /* Allow MSS multiple */
pfctl_parser.h:#define PF_OPT_DEBUG             0x0200
pfctl_parser.h:#define PFCTL_FLAG_FILTER        0x02

It means there is two issues, not one.


Cheers,
Franco

if i understand the code right, a see that:
1) POM_STICKYADDRESS = FOM_ICMP
2) at
https://github.com/opnsense/src/blob/47fa565344c38a0d104e3913920546f07d7807e0/sbin/pfctl/parse.y#L4086
POM_STICKYADDRESS is checked against filter_opts.marker. so if filter options contains ICMP TYPE then STICKYADDRESS in pool options will trigger error (I'm pretty sure it should be checked against pool_opts.marker. because POM_STICKYADDRESS is a part of pool optons, not filter options)
3) filter options is bzero-ed on next rule parsing. so if pool options is parsed before filter options (and it looks like it is: i tried adding extra "yyerror" in different places and remaking pfctl) then if (filter_opts.marker & POM_STICKYADDRESS) can be made with filter_opts.marker of previous rule

Yep, but it looks like you were initially correct about missing the read to pool_opts.marker instead of filter_opts.marker.

It's a bit astonishing the bug appears in OpenBSD to this day and was made all the way back in 2003 when the feature was added:

https://github.com/openbsd/src/commit/fd77740701#diff-86c043ba8dbd8f91c48ffba7fce3e2da260fe937b5ed1f053fd4eaa090a3d467R2718

It's clearly an error as POM_ and FOM_ suffix amongst others indicate that these are not to be shared over different structs.

Furthermore "sticky-address" appears to be parsed but ignored if no pool option is set (like round-robin). In the case where only one router is given all connections are sticky to that one anyway. So we better move the sticky-address into the conditional for consistency.

And now comes the kicker: the original author simply wanted to prevent syntax such as "sticky-address sticky-address" which is a condition the parser can reach being able to set different pool settings, not just main pool option which cannot be redefined anyway.

Actually because of the typo you can insert "sticky-address sticky-address sticky-address" without issue at the moment. ;)

Great spot!


Cheers,
Franco