22.1rc1 crashes when the Statistics checkbox in a host alias is enabled

Started by html, January 24, 2022, 05:34:02 PM

I enabled the "Statistics" checkbox (previously unchecked) in a host alias entry. When I apply this change the firewall crashes and reboots immediately.

Might be the same as https://forum.opnsense.org/index.php?topic=26367.0 though it points to an error when the alias is in use and statistics are being updated as soon as the reload takes place.

In plain GUI terms it doesn't crash when we try to reproduce.


Cheers,
Franco

Created a new (unused) alias and enabled Statistics -> no problem
Created a rule with this new alias and firewall crashed immediately after applying.

Attached is a kernel dump, can't send it directly because this firewall has no internet connection.

Is this with a specific configuration import or plain configuration? Hardware or VM?


Cheers,
Franco

It is hardware, with a configuration imported from version 21.7.

Made some other tests.
Created a new alias and enabled Statistics in the same step, then saved and applied. Firewall:Diagnostics:Aliases shows 0 for all values. Then created a rule with this alias -> no problem. See attached file alias10.png

Next I created a new alias with Statistics disabled (which is the default), then saved and applied. I then enabled Statistics in a second step, saved and applied. Firewall:Diagnostics:Aliases shows strange values. Then created a rule with this alias -> the firewall crashes. See attached file alias20.png

It seems that enabling Statistics for already existing aliases is the problem.


This isn't new to the RC. Aliases with stats enabled crash the firewall on the latest stable as well, so it's not something introduced in the beta. I just never check Statistics, as it crashes...

Pretty weird, but ;) ... it seems that
pfctl -t tablename -vT show
itself may crash the system
(quickly tested by creating an alias with a local subnet, adding a rule, enabling stats and running this command in a shell, on a 21.7.7 VM)


Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address = 0x0
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80d5ad9b
stack pointer         = 0x28:0xfffffe004deee030
frame pointer         = 0x28:0xfffffe004deee030
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 63236 (pfctl)
trap number = 12
panic: page fault
cpuid = 1


Certainly fun but if it's already there we need to look into it after 22.1 is out. ;)


Cheers,
Franco

Hi All!
Any changes in behavior on 22.1?
For me, after several attempts it was not possible to reproduce the system crash, but the behavior remains odd:
enabling counters "on the fly" generates huge numbers in the statistics (with rapidly growing outgoing packets/bytes), even without including the table in any rule.
If I understand correctly, a large number of recent changes in pf/pfctl concern table counters.
In my opinion this is definitely an upstream issue.

Killing the table before loading the new rules.debug seems to work around the issue, but I'm not sure if it's the right way.
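For anyone who wants to try the same workaround by hand, a minimal sketch of those two steps might look like the one below. The function name, the DRY_RUN switch and the example arguments are hypothetical; only pfctl's "-T kill" (drop a table) and "-f" (load a ruleset file) are real options. It defaults to a dry run that just prints the commands, since this area is known to panic the kernel.

```shell
#!/bin/sh
# Sketch only: drop a pf table, then reload the ruleset so the table is recreated.
# Defaults to a dry run that prints the commands; set DRY_RUN=0 to execute them.
kill_table_then_reload() {
    table="$1"   # pf table (alias) name; caller supplies it
    rules="$2"   # ruleset file to load; the exact path is an assumption
    if [ "${DRY_RUN:-1}" = "1" ]; then
        run="echo"   # print each command instead of running it
    else
        run=""       # empty prefix: the pfctl commands run for real
    fi
    $run pfctl -t "$table" -T kill   # remove the table and its counters
    $run pfctl -f "$rules"           # load the ruleset again
}
```

Called as "kill_table_then_reload my_alias /tmp/rules.debug" it only prints the two pfctl lines; the alias name and the /tmp location of rules.debug are assumptions, so adjust them to your system before running with DRY_RUN=0.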

Looks like uninitialized memory or previous stack contents of some sort. Definitely an issue with pfctl or the kernel receiving the data. We will be looking into it, but not with a high priority.


Cheers,
Franco

@franco
Thanks! At least it's not crashing the whole system now (and the conditions for reproducing the issue are quite rare, IMHO).

I just ran into this issue as well on version 22.1.4_1-amd64. 
Enabled statistics for an Alias and boom.

I repeated my steps after rebooting and it crashed again.

Searched here and found this. Also submitted the crash.

We are aware of the issue and a fix is supposed to be in FreeBSD 13.1 which we will be shipping in 22.7. Fingers crossed. :)


Cheers,
Franco


I also just ran into this one. It was a classic curiosity-killed-the-cat situation. I wanted to see how well one of my block rules was working and had never tried the statistics option before. Shortly after enabling it I noticed the internet was down. I logged into Proxmox and saw on the console that my OPNsense VM had crashed. Rebooted and disabled statistics immediately. Everything is back to normal.

It would have been nice to at least have some sort of warning in the interface until it gets fixed. Leaving a known crashing bug available for some unsuspecting user to stumble on and crash their router is not good IMO. This is the first time I've really had a critical failure with OPNsense. I really want my routers/firewalls to be as bulletproof as possible.

It's good to know a fix is coming, but maybe in the future, when a bug like this is confirmed, a quick warning note in the GUI might be something to consider. :)