2022-07-23T18:51:24 Error opnsense /usr/local/etc/rc.filter_configure: There were error(s) loading the rules: /tmp/rules.debug:806: sticky-address cannot be redefined - The line in question reads [806]: pass in quick on vlan01 route-to {( vlan02 202.202.202.202 )} sticky-address inet proto {tcp udp} from $groveseg to $Marshal_updates port $http_https keep state label "9e64a311a494a21cfdbefcba91dad3a5" # : Allow ServerSEG license check
As soon as a add WAN fail-over capability to rules, this break badly.
I can't seem to pin down exactly what is going on, my best guess is the WAN fail-over "WAN1_failover_WAN2" gateway group is just not working.
Often, I can get the issue to go away by moving the rule to the top of the interface rules, or to the end. But that doesn't always work either.
Trouble shooting steps I have tried
- Deleted the offending rule and made it again (doesn't always fix it)
- Moving the rule around for rule order (also doesn't always fix it)
- Rebooting OPNsense (definitely doesn't fix it)
- Exported the config, looked the config by hand (seems fine) - re-import the config and reboot (doesn't fix it)
- Also occurs when I take an existing rule and change the default gateway to the new WAN fail-over gateway
The WAN fail-over group looks perfect.
Any ideas anyone?
Any one have any additional troubleshooting steps I can take on this issue?
Hi
Can you try to add any (dummy?) rule without icmp-type option right above "sticky-address" rule?
IMHO it's an upstream bug and there is a typo in pfctl parser at
https://github.com/opnsense/src/blob/47fa565344c38a0d104e3913920546f07d7807e0/sbin/pfctl/parse.y#L4086
should check pool_opts.marker for POM_STICKYADDRESS (not filter_opts.marker)
* I understand that this part of the code has been there forever ;) but I have seen complaints about the pfctl behavior with rules with various icmp-type/sticky-address combinations before
pool options is parsed BEFORE filter options (and before previous rule filter options 'bzero'-ed) so actually rule with "sticky-address" is checked against previos rule filter_opts.marker (in this case icmp-type option from previous rule interferes with sticky-address pool option in current rule :o )
@franco could you check my theory when you have time please? (im too bad at C)
I'm hoping to look at this tomorrow. Crunched the OpenBSD code for this a few days ago and it's also the way it is in FreeBSD. Something is off, but might not be the variable in use here.
Cheers,
Franco
thanks!!
pass in quick on vlan01 route-to {( vlan02 202.202.202.202 )} sticky-address inet proto {tcp udp} from $groveseg to $Marshal_updates port $http_https keep state label "9e64a311a494a21cfdbefcba91dad3a5" # : Allow ServerSEG license check
What sticks out to me is that "sticky-address" doesn't seem to follow a keyword as per grammar...
pooltype = ( "bitmask" | "random" |
"source-hash" [ ( hex-key | string-key ) ] |
"round-robin" ) [ sticky-address ]
Could it be that we emit it falsely here and it should be inside the multiple route-to conditional to follow round-robin keyword?
https://github.com/opnsense/core/blob/e5006e9e441ac19dcf04b389737fe6f4bfdf4108/src/opnsense/mvc/app/library/OPNsense/Firewall/Plugin.php#L149-L154
Cheers,
Franco
I was also surprised at first. but because this is the default pool type afaik and the parser accepts this syntax (and obviously did not affect the error), then I did not dig deeper ...
There is definitely an issue somewhere. Maybe here even we have the parser not clearing the previous structure because the keyword triggering the struct clear was omitted?
So in that theory prepending "round-robin sticky-address" would fix this, but it also means we can just use that in our code and move on.
Cheers,
Franco
QuoteSo in that theory prepending "round-robin sticky-address" would fix this
Unfortunately no
i used
pfctl -onone -vv -nf /tmp/test_rules.debug > /tmp/out.txt
with various rules combinations (including "correct" rules with "round-robin") in test_rules.debug
(without any opnsense rules. just synthetic rules for checking the parser)
for me "icmp-type" and "sticky-address" is definitely an error triggers in pfctl parser.
did not find any reason to think that the matter is in the opnense code
Can you share your canonical test code?
I copied the rule over and tried to reproduce but failed so far.
Cheers,
Franco
sure
attached
Ok this is weird. It only fails with ICMP rule in front not setting sticky-address at all. So can you spot the issue? :)
root@sensey:/usr/src/sbin/pfctl # git grep 0x02
parse.y:#define FOM_ICMP 0x02
parse.y:#define SOM_MAXMSS 0x02
parse.y:#define QOM_SCHEDULER 0x02
parse.y:#define POM_STICKYADDRESS 0x02
pfctl_osfp.c:#define T_MSS 0x02 /* Allow MSS multiple */
pfctl_parser.h:#define PF_OPT_DEBUG 0x0200
pfctl_parser.h:#define PFCTL_FLAG_FILTER 0x02
It means there is two issues, not one.
Cheers,
Franco
if i understand the code right, a see that:
1) POM_STICKYADDRESS = FOM_ICMP
2) at
https://github.com/opnsense/src/blob/47fa565344c38a0d104e3913920546f07d7807e0/sbin/pfctl/parse.y#L4086
POM_STICKYADDRESS is checked against filter_opts.marker. so if filter options contains ICMP TYPE then STICKYADDRESS in pool options will trigger error (I'm pretty sure it should be checked against pool_opts.marker. because POM_STICKYADDRESS is a part of pool optons, not filter options)
3) filter options is bzero-ed on next rule parsing. so if pool options is parsed before filter options (and it looks like it is: i tried adding extra "yyerror" in different places and remaking pfctl) then if (filter_opts.marker & POM_STICKYADDRESS)
can be made with filter_opts.marker of previous rule
Yep, but it looks like you were initially correct about missing the read to pool_opts.marker instead of filter_opts.marker.
It's a bit astonishing the bug appears in OpenBSD to this day and was made all the way back in 2003 when the feature was added:
https://github.com/openbsd/src/commit/fd77740701#diff-86c043ba8dbd8f91c48ffba7fce3e2da260fe937b5ed1f053fd4eaa090a3d467R2718
It's clearly an error as POM_ and FOM_ suffix amongst others indicate that these are not to be shared over different structs.
Furthermore "sticky-address" appears to be parsed but ignored if no pool option is set (like round-robin). In the case where only one router is given all connections are sticky to that one anyway. So we better move the sticky-address into the conditional for consistency.
And now comes the kicker: the original author simply wanted to prevent syntax such as "sticky-address sticky-address" which is a condition the parser can reach being able to set different pool settings, not just main pool option which cannot be redefined anyway.
Actually because of the typo you can insert "sticky-address sticky-address sticky-address" without issue at the moment. ;)
Great spot!
Cheers,
Franco
;D
Hooray! could you please do the rest of the corrective steps? (I'm afraid that with my knowledge of C I would still need help with the request :( )
We only need https://github.com/opnsense/src/commit/52a6acf15e44 and https://github.com/opnsense/core/commit/8f8449ebe8 for consistency.
Cheers,
Franco
And FreeBSD review here https://reviews.freebsd.org/D36050
Wow! Thanks!!
Thanks so much guys!
This has been doing my head in the error reloading rules.
What version will include a fix for this?
It will be fixed in 22.7.2.
Cheers,
Franco