This makes me want to cry! WebGUI instability on all different hardware!

Started by roohoo, April 17, 2026, 01:11:29 PM

Quote from: roohoo on April 20, 2026, 04:04:27 PM
The only constant factor with each of my attempts is my home network.

Could a faulty device sending malformed packets on my network take down OPNsense's webGUI?

Maybe... but it's a long shot...

You could firewall the webGUI off from all devices on your network, allow only one PC/laptop that you know for sure is "clean", and see what happens?!

Do you have any results from that friend you were talking about, the one you gave one of the systems to run on his network?

At this point I hope Patrick is right and it's the Host Discovery Service messing around...
Weird guy who likes everything Linux and *BSD on PC/Laptop/Tablet/Mobile and funny little ARM based boards :)

Quote from: Patrick M. Hausen on April 20, 2026, 08:39:56 AM
Disable Automatic Discovery.
Disable Automatic Discovery.
Disable Automatic Discovery.
...

I really hoped that this was the solution, but no: Already disabled.



Quote from: nero355 on April 20, 2026, 07:07:47 PM
Do you have any results from that friend you were talking about who you have given one of the systems to run on his network ?

I haven't got the machine to him yet. Hopefully later this week.

Quote from: nero355 on April 20, 2026, 07:07:47 PM
At this point I hope Patrick is right and it's the Host Discovery Service messing around...

Sadly not.

Another thing to be mindful of:

After you've checked and fixed the hardware clock issues and replaced the battery, keep in mind that NTP will still fail to sync if the system time is too far from the real time (I think by 1000 seconds or more).

To get NTP working in OPNsense, set the system time close to the actual wall-clock time and then restart the NTP service. You can use the FreeBSD 'date' command to do this.
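
As a rough illustration of the check described above, here is a minimal Python sketch. It assumes the 1000-second figure mentioned in the post (which matches ntpd's default panic threshold) and formats a timestamp in the `[[[[[cc]yy]mm]dd]HH]MM[.ss]` form that FreeBSD `date` accepts. The function names are hypothetical and not part of OPNsense.

```python
from datetime import datetime

# Assumption: the 1000-second limit quoted in the post (ntpd's default panic threshold).
NTP_PANIC_THRESHOLD_S = 1000


def needs_manual_set(system_time, reference_time, threshold_s=NTP_PANIC_THRESHOLD_S):
    """Return True when the clock offset is larger than the threshold."""
    offset = abs((reference_time - system_time).total_seconds())
    return offset > threshold_s


def date_argument(reference_time):
    """Format a time as ccyymmddHHMM.ss, the form FreeBSD `date` accepts for setting the clock."""
    return reference_time.strftime("%Y%m%d%H%M.%S")


# Example: a clock that came up 30 minutes slow after a power outage.
system_clock = datetime(2026, 4, 21, 10, 0, 0)
wall_clock = datetime(2026, 4, 21, 10, 30, 0)

if needs_manual_set(system_clock, wall_clock):
    print("date " + date_argument(wall_clock))
```

The printed string is meant to be run as root in the shell, in local time, before restarting the NTP service.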

I found this out the hard way on my Protectli that had a weak battery and NTP wouldn't sync after a power outage.

(Side note: the 'coreboot' UEFI on Protectli is very stripped down and at present doesn't have a method to set the system time, unlike the AMI BIOS, so it's left as a runtime activity for the user to set the clock.  A good CMOS battery is important.)
N5105 | 8/250GB | 4xi226-V | Community

https://www.youtube.com/watch?v=XI9NG068TwI

Quote from: roohoo on April 20, 2026, 09:20:35 PM
Not enabled ☹️

Prior to my previous post, I had disabled this option too, but its effect was inconclusive.

Yesterday afternoon (my time: GMT +08:00), I re-enabled the option and left it running overnight.

Looking this morning, this is how the process appears in top (time is BST) - viewed with and without listing threads:

Quote
last pid: 13247;  load averages:  0.19,  0.30,  0.30    up 1+07:11:09  00:46:42
66 threads:    1 running, 65 sleeping
CPU:  0.3% user,  0.0% nice,  0.1% system,  0.0% interrupt, 99.6% idle
Mem: 75M Active, 101M Inact, 5424K Laundry, 466M Wired, 204M Buf, 1325M Free
Swap: 8192M Total, 189M Used, 8003M Free, 2% Inuse

  PID USERNAME    PRI NICE  SIZE    RES SWAP STATE    C  TIME    WCPU COMMAND
94669 hostd        20    0    31M  2996K  0B bpf      3  0:01  0.00% /usr/local/bin/hostwatch -c -S -p -P /var/run/hostwatch/hostwatch.pid -d /var/db/hostwatch/hosts.db -u hostd -g hostd{hostwatch}
94669 hostd        20    0    31M  2996K  0B uwait    2  0:01  0.00% /usr/local/bin/hostwatch -c -S -p -P /var/run/hostwatch/hostwatch.pid -d /var/db/hostwatch/hosts.db -u hostd -g hostd{hostwatch}


last pid: 18592;  load averages:  0.17,  0.29,  0.30    up 1+07:11:17  00:46:50
54 processes:  1 running, 53 sleeping
CPU:  0.6% user,  0.0% nice,  0.0% system,  0.0% interrupt, 99.4% idle
Mem: 75M Active, 101M Inact, 5424K Laundry, 466M Wired, 204M Buf, 1325M Free
Swap: 8192M Total, 189M Used, 8003M Free, 2% Inuse


  PID USERNAME    THR PRI NICE   SIZE    RES SWAP STATE    C   TIME    WCPU COMMAND
94669 hostd         2  20    0    31M  3008K   0B uwait    2   0:01   0.02% /usr/local/bin/hostwatch -c -S -p -P /var/run/hostwatch/hostwatch.pid -d /var/db/hostwatch/hosts.db -u hostd -g hostd


I've since rebooted this system and it is back to using 8GB of RAM.

Perhaps the issue could be something in the environment, like in this story - https://www.theregister.com/2026/04/17/on_call/?td=rt-4a

When you installed OPNsense on the new computers, did you restore the configuration from a file or did you reconfigure from scratch?
As far as I can ascertain, based upon these settings in OPNsense, FreeBSD will periodically write the OS time to the RTC:

 - machdep.disable_rtc_set: 0
 - machdep.rtc_save_period: 1800

There have probably been changes, but my understanding is that BSD would read the RTC at start-up and rely upon other sources for time whilst running. During shut down, the TOD would be written back to the RTC.

Perhaps you could post the current listing of /var/run/dmesg.boot.

At this time it appears you have two problems:
  • OPNsense detects an anomaly whereby the uptime is reported to be around 50 years
  • Your system may be using excessive amounts of memory

Have you tried resetting the BIOS to its factory settings, making no changes to it, and then installing OPNsense?

I don't have IPv6 in my environment so I can't test with this.

Did you alter anything in Services:Network Time: General?

It may be worth removing all the listed servers here so OPNsense doesn't attempt to synchronise the time from an external source.

To obtain more information about the processes in top, you could use the following in a wide screen session;

top -awo res -s 1


Thank you for the suggestions regarding time, but the system time is perfectly correct and doesn't change.  As you can see from the original screenshot I posted, it's only the webGUI reporting the uptime incorrectly.  The current time and the last configuration change time are both spot-on.  The time reported by the date command in the shell is the same.

On each of the new machines, I have performed a clean, bare install and gone on to configure just the interfaces.  I have not imported any configurations.

I have tried a complete return to BIOS factory settings as well as curating each BIOS option individually.  Neither makes a difference.  Google Gemini, ChatGPT and Grok all told me that the problem was almost certain to be the C-states configuration (with the Intel chips), but disabling C-states made no difference either.

I hope to get a chance this evening to investigate where my RAM is going...

Again, thank you all.

Well let's trace what calculates the uptime then:

https://github.com/opnsense/core/blob/9f56b9ea20d721558ae6baec07ce5adb2afbed53/src/opnsense/www/js/widgets/SystemInformation.js#L42-L43
--->
https://github.com/opnsense/core/blob/9f56b9ea20d721558ae6baec07ce5adb2afbed53/src/opnsense/mvc/app/controllers/OPNsense/Diagnostics/Api/SystemController.php#L134
--->

root@opn-dev-02:~ # configctl system sysctl values kern.boottime
{"kern.boottime":"{ sec = 1776690464, usec = 852213 } Mon Apr 20 15"}

And here without configd as wrapper:

root@opn-dev-02:~ # sysctl kern.boottime
kern.boottime: { sec = 1776690464, usec = 852213 } Mon Apr 20 15:07:44 2026
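
To make the chain above concrete, here is a minimal Python sketch of how an uptime could be derived from that value. It is not the OPNsense implementation (that lives in the linked PHP controller). The function names are hypothetical, and `naive_uptime_seconds` is a deliberately faulty variant that shows how a missing boot time treated as 0 would yield an uptime of decades. Whether that is what happens on the affected machines is unconfirmed.

```python
import re


def boot_epoch(sysctl_value):
    """Return the boot time in epoch seconds from a kern.boottime value, or None."""
    match = re.search(r"\bsec\s*=\s*(\d+)", sysctl_value or "")
    return int(match.group(1)) if match else None


def uptime_seconds(sysctl_value, now_epoch):
    """Return now minus boot time, or None when the boot time cannot be parsed."""
    boot = boot_epoch(sysctl_value)
    return None if boot is None else now_epoch - boot


def naive_uptime_seconds(sysctl_value, now_epoch):
    """Hypothetical faulty variant: a missing boot time silently becomes 0."""
    return now_epoch - (boot_epoch(sysctl_value) or 0)


sample = "{ sec = 1776690464, usec = 852213 } Mon Apr 20 15:07:44 2026"
one_hour_later = 1776690464 + 3600

print(uptime_seconds(sample, one_hour_later))   # 3600
print(uptime_seconds(None, one_hour_later))     # None
years = naive_uptime_seconds(None, one_hour_later) / (365.25 * 24 * 3600)
print(round(years, 1))                          # 56.3
```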

If you are using AI it should be able to go from there to see what is wrong.

EDIT: Also if that is broken you are in FreeBSD territory, since OPNsense does not rewrite such a sysctl
Hardware:
DEC740

Also a very minor addition to this weird thread: don't use .local as your domain suffix. It is reserved for mDNS (RFC 6762), and queries for it may never reach your DNS resolver. While at it, also don't use .test, .example, .invalid and .localhost, as they are reserved per RFC 2606 and later RFC 6761.

I can imagine a lot of "weirdness" can be explained by this.
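
A quick way to check a configured domain against the suffixes listed above is sketched below in Python. The list contains only the names mentioned in this post, and the function name and example hostnames are hypothetical.

```python
# Suffixes named in the post: .local (RFC 6762) plus the RFC 2606 / RFC 6761 names.
RESERVED_SUFFIXES = ("local", "test", "example", "invalid", "localhost")


def uses_reserved_suffix(domain):
    """Return True when the last label of the domain is one of the reserved suffixes."""
    labels = domain.rstrip(".").lower().split(".")
    return labels[-1] in RESERVED_SUFFIXES


for name in ("opnsense.home.local", "fw.mydomain.net", "router.LOCAL."):
    print(name, uses_reserved_suffix(name))
```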

Quote from: Monviech (Cedrik) on April 21, 2026, 10:09:30 AM
https://github.com/opnsense/core/blob/9f56b9ea20d721558ae6baec07ce5adb2afbed53/src/opnsense/mvc/app/controllers/OPNsense/Diagnostics/Api/SystemController.php#L134

Looking at this section of code and the image @roohoo has provided, in which the load average is reported as "N/A", would it be reasonable to say that the $boottime variable contained NULL values for kern.boottime and vm.loadavg, given that the count check on line 139 was evidently not satisfied and so the load averages were not shown?


Exactly what the command returns is of the highest interest. While looking at the outputs, I did find a small parsing error, which a colleague has fixed.

Pretty sure that is not related to anything here, but it was a nice catch.
https://github.com/opnsense/core/commit/293e645d89edb53c7f3fd2388740f28f5ec50346

Quote from: roohoo on April 18, 2026, 06:49:10 PM
This is what my WebGUI looks like...

The other oddity in @roohoo's screenshot of OPNsense is the CPU widget, which isn't listing the CPU information that is produced here:
https://github.com/opnsense/core/blob/9f56b9ea20d721558ae6baec07ce5adb2afbed53/src/opnsense/mvc/app/controllers/OPNsense/Diagnostics/Api/CpuUsageController.php#L42

Perhaps the high memory usage @roohoo is experiencing may all tie in with the System Information, CPU, Disk and Services widgets.


Quote from: lmoore on Today at 03:20:26 PM
Perhaps the high memory usage @roohoo is experiencing may all tie in with the System Information, CPU, Disk and Services widgets.

I was also wondering what would happen if you removed all of them?

Maybe things would quiet down?!

Quote from: nero355 on Today at 03:42:01 PM
I was also wondering what would happen if you would remove all of them ?

The widgets @roohoo has are stock standard. The test machine I reconfigured to use 2GB of RAM had the same widgets. I set up five aliases using URL IP tables, each loading the same IP set of over 3 million entries. Enabling three tables was fine, keeping in mind this is a test system doing nothing more than loading these IP tables and checking them for updates every 30 minutes, plus other housekeeping tasks.

When the 4th table was loaded, thrashing was observed. After several minutes it would settle down. Whenever the IP table was being checked or updated, the swapping was evident again.

When the 5th table was loaded, it never seemed to settle down. I was able to disable the aliases and bring it back down to only three being loaded. It took a while, but it did happen.

While the memory was being hammered with swapping, the Memory widget's graph didn't show it clearly, because the graph only reflects the points in time at which the system gathers the information. Using systat to show vmstat information every second, RAM usage was often at 90%+ until it settled down.

Using top and watching the swap activity, the numbers for paging in and out were clearly high.

That said, the worst I experienced on the test machine was the widgets failing to retrieve data, most evident with the 5th table loaded. They would often refresh themselves after a period of time, but the Uptime never went out of whack.

Hopefully, @roohoo can get to the root cause of the issue.

Quote from: Monviech (Cedrik) on Today at 02:44:13 PM
Pretty sure that is not related to anything of here but it was a nice catch.
https://github.com/opnsense/core/commit/293e645d89edb53c7f3fd2388740f28f5ec50346

I see. You are avoiding upsetting the apple cart when a ":" is included in the value of a kernel state, e.g.:

dev.em.0.%location: slot=31 function=6 dbsf=pci0:0:31:6 handle=\_SB_.PCI0.GLAN

I agree, not in this case.
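
The class of problem discussed here can be sketched in Python as follows. This is an illustration only, not the configd code or the actual fix in the linked commit, and both function names are hypothetical. It contrasts splitting a sysctl line on every ":" with splitting only at the first one, using the two lines quoted in this thread.

```python
def parse_split_all(line):
    """Naive: split on every ':' and keep only the second field."""
    fields = line.split(":")
    return fields[0].strip(), fields[1].strip()


def parse_first_colon(line):
    """Split only at the first ':', so colons inside the value survive."""
    name, _, value = line.partition(":")
    return name.strip(), value.strip()


boottime = "kern.boottime: { sec = 1776690464, usec = 852213 } Mon Apr 20 15:07:44 2026"
location = r"dev.em.0.%location: slot=31 function=6 dbsf=pci0:0:31:6 handle=\_SB_.PCI0.GLAN"

print(parse_split_all(boottime)[1])    # value is cut off after "Mon Apr 20 15"
print(parse_first_colon(boottime)[1])  # full value, including 15:07:44 2026
print(parse_first_colon(location)[1])  # full value, including pci0:0:31:6
```

The truncated result of the naive variant matches the shortened string seen in the configctl output earlier in the thread, while the "sec = ..." part that matters for uptime is intact either way.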