Messages - roohoo

#1
I installed Sophos firewall to see how it fared.  For the first 15 hours, it worked perfectly, then all internet access stopped.  It had dropped the connection to my (Gigaclear) fibre modem.  Rebooting the VM had no effect.  Only physically turning off the machine and turning it back on worked.

I'm starting to think that my wildly unlikely hypothesis, that something on my network (or Gigaclear's) is sending malformed packets that can kill a router, might actually be correct.

I was going to try pfsense too, but I've lost my enthusiasm.

Sorry I didn't get to try your troubleshooting ideas, but constantly changing my router got really old really fast, and my wife and others grew most perturbed every time their connectivity was interrupted!

Thank you again.

If I decide to investigate the benefits of OPNsense again in the future, and the troubles remain the same, I'll report in on this thread.

Thank you all.
#2
I don't believe it! I decided to try reinstalling OPNsense as a VM under Proxmox, so that Proxmox would act as a sort of hardware abstraction layer. I even boosted the RAM to 64GB so that OPNSense could have 16GB.

The result: exactly the same. The firewall appears to keep working, and all my network's devices keep their Internet connectivity, but the webGUI is displaying nonsense, including the characteristic 20000+ days uptime (the correct uptime, as displayed in the shell, is 1 hour 45 minutes).

I'm so disappointed. I thought this might cure my issues, with Linux drivers presenting virtual hardware to OPNsense, but it wasn't to be!

#3
Thank you for the suggestions regarding time, but the system time is perfectly correct and doesn't change. As you can see from the original screenshot I posted, only the webGUI reports the uptime incorrectly. The "Current" and "Last configuration change" times are spot-on. The time reported by the date command in the shell is the same.

On each of the new machines, I have performed a clean, bare install and gone on to configure just the interfaces.  I have not imported any configurations.

I have tried a complete return to BIOS factory settings, as well as curating each BIOS option individually. Neither makes a difference. Google Gemini, ChatGPT and Grok all told me that, with the Intel chips, the problem was almost certainly the C-state configuration, but disabling C-states made no difference either.

I hope to get a chance to investigate where my RAM is going this evening...

Again, thank you all.
#4
Quote from: nero355 on April 20, 2026, 07:07:47 PM
Do you have any results from that friend you were talking about who you have given one of the systems to run on his network ?

I haven't got the machine to him yet. Hopefully later this week.

Quote from: nero355 on April 20, 2026, 07:07:47 PM
At this point I hope Patrick is right and it's the Host Discovery Service messing around...

Sadly not.
#6
Quote from: Patrick M. Hausen on April 20, 2026, 08:39:56 AM
Disable Automatic Discovery.
Disable Automatic Discovery.
Disable Automatic Discovery.
...

I really hoped that this was the solution, but no: it's already disabled.

#7
Quote from: nero355 on April 20, 2026, 03:00:59 PM
Quote from: franco on April 20, 2026, 12:48:08 PM
What is this thread?

20561 days uptime is roughly 56 years. 2026 minus 56 is 1970.

Congrats, fix your hardware clock.
I don't think that's it :
- Why doesn't the whole system crash ?!
- When logged in via SSH there is nothing wrong with the uptime or the time & date ?!

Quote from: franco on April 20, 2026, 01:04:37 PM
You've said a lot of things, but the math doesn't lie.
Actually he still has not told us the hardware used of all these systems except for the Ryzen one that also had issues ?!

I still think it's the widgets tripping over the Power Saving or Clockspeed Switching of the CPU or something like that...

Quote from: connervt on April 20, 2026, 12:57:04 PM
He's running a very old version.  Worked great on ARPANET.

/*grin*
LOL! NICE! ^_^


Thank you for your interest in my predicament!

System one was an HP ProDesk 400 G6 SFF with an Intel Core i5-9500 @ 3.00GHz and 8GB RAM. When I encountered issues, I tried swapping out various components, including several different sticks of RAM (singly and in pairs) and various combinations of SATA and NVMe drives, including SSDs and spinning disks. I also tried different NICs with Intel X540, Intel X710, Intel I226, Intel 82571GB and Broadcom BCM5719 chipsets.

System two was a Dell Precision 3430 SFF with an i5-8500 @ 3.00GHz, 4GB RAM and a 256GB NVMe drive. I tried swapping all the same RAM sticks, drives and NICs into this one too.

System three was a home-built Ryzen 3900X-powered machine with 128GB of RAM, an M.2 NVMe drive and a Quadro video card. Again, I swapped out the same components (RAM, drives, NICs).

All of these machines have yielded the same results with two different versions of OPNSense (25.x & 26.x).

I've been into the BIOS on each machine to ensure that all power-saving measures (C-states, etc.) are disabled.

The install is completely unmodified beyond the interfaces: a clean install with one WAN interface using DHCP and one LAN interface with a static IP address of 192.168.2.1/24. DHCP is enabled over the range 192.168.2.50 to 192.168.2.199. On some of my attempts I enabled various logging options, and when running on systems with lots of RAM, I enabled logging to RAM with a maximum of 10% permitted for each of the two types.

On my current install, I have not changed anything but the interfaces.
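As a quick sanity check on the addressing described above, the LAN settings are self-consistent: the DHCP pool sits inside 192.168.2.0/24 and does not include the router's own 192.168.2.1. Below is a minimal Python sketch using the standard `ipaddress` module. The helper `dhcp_pool_size` is a hypothetical name used for illustration and is not part of OPNsense. The last lines also compare the current /24 with the 192.168.0.0/16 layout that the original post says was used at first.

```python
import ipaddress

def dhcp_pool_size(lan_if: str, first: str, last: str) -> int:
    """Return the number of addresses in a DHCP pool after sanity-checking it.

    lan_if is the router's LAN address in CIDR form, e.g. '192.168.2.1/24'.
    Raises ValueError if the pool leaves the subnet, runs backwards,
    or contains the router's own address.
    """
    iface = ipaddress.ip_interface(lan_if)
    net = iface.network
    start = ipaddress.ip_address(first)
    end = ipaddress.ip_address(last)
    if start not in net or end not in net:
        raise ValueError("pool must lie inside the LAN subnet")
    if start > end:
        raise ValueError("pool start must not exceed pool end")
    if start <= iface.ip <= end:
        raise ValueError("router address falls inside the pool")
    return int(end) - int(start) + 1

# Values taken from the post: LAN 192.168.2.1/24, pool .50 to .199.
print(dhcp_pool_size("192.168.2.1/24", "192.168.2.50", "192.168.2.199"))  # 150

# The earlier /16 layout versus the current /24.
old = ipaddress.ip_network("192.168.0.0/16")
new = ipaddress.ip_network("192.168.2.0/24")
print(new.subnet_of(old), old.num_addresses, new.num_addresses)  # True 65536 256
```

The function raises `ValueError` for a pool that leaves the subnet, runs backwards, or overlaps the router's address, so the same check can be reused when trying other layouts.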

The only constant factor with each of my attempts is my home network.  Could a faulty device sending malformed packets on my network take down OPNSense's webGUI?

If any of you are near (North) London, UK, you're very welcome to come and poke and prod my setup!

Quote from: Patrick M. Hausen on April 20, 2026, 08:39:56 AM
Disable Automatic Discovery.
Disable Automatic Discovery.
Disable Automatic Discovery.
...

I shall investigate this when I get home this afternoon.  Thanks.

Quote from: Patrick M. Hausen on April 19, 2026, 06:37:28 PM
With top running type "o" for order, then "res" for resident size followed by ENTER. Other possibilities are e.g. "swap" instead of "res". Lets try to find out who is using all that memory and why.

Also with the fresh install up and running could you try to disable Interfaces: Neighbors: Automatic Discovery?

And this!

Thank you.
#8
Quote from: franco on April 20, 2026, 12:48:08 PM
What is this thread?

20561 days uptime is roughly 56 years. 2026 minus 56 is 1970.

Congrats, fix your hardware clock.


Cheers,
Franco

The hardware clock is working fine, and has been on all three completely different systems I have tried to run OPNSense on. The CMOS batteries have all been working fine too.
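franco's arithmetic can be made concrete. An uptime is a difference, "now minus boot time", so a correct "now" paired with a boot timestamp at or near the Unix epoch (1970-01-01) produces a figure of exactly this size. The thread does not establish where such an epoch-like boot time would come from. The plain-Python sketch below only illustrates the subtraction. The helper names are hypothetical. The reading time is made up to reproduce the "20561 days, 16:30:07" reported elsewhere in the thread, and treating it as UTC is an assumption.

```python
from datetime import datetime, timedelta, timezone

# The Unix epoch: the instant a boot timestamp of 0 would correspond to.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def uptime(boot: datetime, now: datetime) -> timedelta:
    """Uptime computed as 'now minus boot time'."""
    return now - boot

def format_uptime(delta: timedelta) -> str:
    """Render a timedelta as 'N days, HH:MM:SS'."""
    hours, rem = divmod(delta.seconds, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{delta.days} days, {hours:02d}:{minutes:02d}:{seconds:02d}"

# Hypothetical reading moment (assumed UTC), chosen to match the dashboard figure.
now = datetime(2026, 4, 18, 16, 30, 7, tzinfo=timezone.utc)

real_boot = now - timedelta(hours=2, minutes=10)      # boot time implied by the shell's 2:10
print(format_uptime(uptime(real_boot, now)))          # 0 days, 02:10:00
print(format_uptime(uptime(EPOCH, now)))              # 20561 days, 16:30:07
print(round(uptime(EPOCH, now).days / 365.2425, 1))   # 56.3
```

The same "now" gives either a sensible 2:10 or roughly 56 years, depending only on which boot timestamp it is subtracted from. This matches the pattern reported here: the shell shows a correct date and uptime while the dashboard does not.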
#9
Quote from: lmoore on April 19, 2026, 04:24:00 PM
When you first sign in to the Web GUI, is the Uptime being reported correctly and does the time on your computer match OPNsense?

The next time this happens, instead of rebooting, select option 11 to restart all services.

Which time zone have you selected in OPNsense?

In your environment, where is your DNS server located?

When you SSH to OPNsense, do you use the IP address or FQDN?

The screen shot you posted on the 18th shows your memory usage at 76.5%, has it gone above this mark?


When you first sign in to the Web GUI, is the Uptime being reported correctly and does the time on your computer match OPNsense?

It is occasionally correct, but most of the time it's over 20,000 days. The OPNSense time is correct.

The next time this happens, instead of rebooting, select option 11 to restart all services.

I have tried this. Sometimes it works; most of the time it has no effect.

Which time zone have you selected in OPNsense?

Europe/London

In your environment, where is your DNS server located?

It's the OPNSense box.

When you SSH to OPNsense, do you use the IP address or FQDN?

IP address [ssh root@192.168.2.1]

The screen shot you posted on the 18th shows your memory usage at 76.5%, has it gone above this mark?

I'm not sure I trust this figure, but it often goes above it. At the moment it's 93.48%, which is nearly 19GB! If this is correct, I put it down to either ZFS or FreeBSD making use of some free RAM. Here's the output of the top shell command:

1568 processes:1 running, 1567 sleeping
CPU:  0.9% user,  0.0% nice,  1.6% system,  0.0% interrupt, 97.5% idle
Mem: 11G Active, 1250M Inact, 4031M Laundry, 1685M Wired, 72K Buf, 1048M Free
ARC: 760M Total, 239M MFU, 364M MRU, 32M Anon, 13M Header, 110M Other
     538M Compressed, 1384M Uncompressed, 2.57:1 Ratio
Swap: 8192M Total, 916M Used, 7276M Free, 11% Inuse

  PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
11658 root          1  68    0    55M    25M lockf    0   0:00   5.13% php
21957 root          1  21    0    20M  6528K CPU2     2   0:02   2.54% top
  344 root       1500  68    0   657M   295M accept   0 100:31   0.46% python3.13
14678 root          4  20    0    53M  7920K kqread   3   4:17   0.23% syslog-ng
96093 root          2  37    0    24M  6424K select   3   0:00   0.13% ntpd
24195 unbound       6  20    0   134M    38M kqread   1   3:23   0.06% unbound
93316 root          1  36    0    14M  2024K bpf      1   0:00   0.05% filterlog
42146 root          1  68    0    13M  1536K select   0   0:11   0.05% dhcp6c
11732 nobody        1  20    0    15M  1816K select   5   0:29   0.04% dnsmasq
95265 root          1  20    0    28M  7752K select   1   0:17   0.01% python3.13
41669 root          1  20    0    28M  5000K select   0   0:05   0.01% python3.13
65213 _flowd        1  20    0    13M  1660K select   2   0:06   0.01% flowd
33021 root          1  20    0    32M  5428K nanslp   0   0:08   0.01% python3.13
39760 root          1  20    0    13M  1256K select   5   0:16   0.01% powerd
57172 root          1  20    0    53M    27M nanslp   1  35:15   0.00% python3.13
80715 root          1  20    0    20M  7668K select   4   0:00   0.00% sshd-session
36291 root          1  20    0    14M  1528K kqread   3   0:01   0.00% rtsold
96883 nobody        1  20    0    13M  1212K sbwait   2   0:02   0.00% samplicate
38379 root          1  20    0    14M  1508K select   1   0:01   0.00% rtsold
76958 root          1  20    0    23M  6984K kqread   4   0:08   0.00% lighttpd
34669 _dhcp         1  20    0    14M  1684K select   1   0:01   0.00% dhclient
98197 root          1  68    0    14M  1616K nanslp   4   0:01   0.00% cron
49260 root          1  20    0    69M    14M accept   4   0:01   0.00% php-cgi
91065 root          1  20    0    53M    13M accept   5   0:01   0.00% php-cgi
90927 root          1  26    0    60M  8192B accept   1   0:01   0.00% <php-cgi>
  342 root          1  68    0    29M  8192B wait     3   0:01   0.00% <python3.13>
21933 root          1   4    0    14M  1612K select   3   0:01   0.00% dhclient
 9902 root          1  20    0    69M    14M accept   0   0:01   0.00% php-cgi
96048 root          1  20    0    53M  9440K accept   2   0:01   0.00% php-cgi
82910 root          1  20    0    53M    11M accept   3   0:00   0.00% php-cgi
 4016 root          1  20    0    53M    14M accept   2   0:00   0.00% php-cgi
 3936 root          1  20    0    53M    24M accept   2   0:00   0.00% php-cgi
47989 root          1  68    0    55M    24M lockf    0   0:00   0.00% php
68226 root          1  68    0    55M    24M lockf    0   0:00   0.00% php
90853 root          1  68    0    55M    24M lockf    4   0:00   0.00% php
35933 root          1  68    0    55M    24M lockf    4   0:00   0.00% php
13691 root          1  68    0    55M    24M lockf    2   0:00   0.00% php
60305 root          1  68    0    55M    24M lockf    1   0:00   0.00% php
94602 root          1  68    0    55M    24M lockf    4   0:00   0.00% php
30761 root          1  68    0    55M    24M lockf    0   0:00   0.00% php
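Patrick's suggestion to order top by `res` can also be applied after the fact to a pasted snapshot like the one above. The following is a minimal Python sketch, not an OPNsense or FreeBSD tool. The helper names (`to_bytes`, `parse_mem_line`, `largest_by_res`) are hypothetical. The sketch assumes the column layout shown in this snapshot (PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND). It also assumes that the B/K/M/G suffixes are powers of 1024.

```python
import re

# Multipliers for the size suffixes in this snapshot; powers of 1024 are an assumption.
UNITS = {"B": 1, "K": 1024, "M": 1024**2, "G": 1024**3}

def to_bytes(size: str) -> int:
    """Convert a size field such as '295M', '6528K' or '8192B' to bytes."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([BKMG])", size)
    if match is None:
        raise ValueError(f"unrecognised size field: {size!r}")
    number, unit = match.groups()
    return int(float(number) * UNITS[unit])

def parse_mem_line(line: str) -> dict:
    """Turn 'Mem: 11G Active, 1250M Inact, ...' into {'Active': bytes, ...}."""
    _, body = line.split(":", 1)
    totals = {}
    for part in body.split(","):
        size, name = part.split()
        totals[name] = to_bytes(size)
    return totals

def largest_by_res(process_lines, n=5):
    """Return the n process rows with the largest RES (7th column), largest first."""
    rows = []
    for line in process_lines:
        fields = line.split(None, 11)  # COMMAND is the 12th and last field
        pid, user, threads, res, command = fields[0], fields[1], fields[2], fields[6], fields[11]
        rows.append((to_bytes(res), int(pid), user, int(threads), command))
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows[:n]

# A few lines copied from the snapshot above.
MEM = "Mem: 11G Active, 1250M Inact, 4031M Laundry, 1685M Wired, 72K Buf, 1048M Free"
PROCS = """\
11658 root          1  68    0    55M    25M lockf    0   0:00   5.13% php
  344 root       1500  68    0   657M   295M accept   0 100:31   0.46% python3.13
24195 unbound       6  20    0   134M    38M kqread   1   3:23   0.06% unbound
57172 root          1  20    0    53M    27M nanslp   1  35:15   0.00% python3.13
90927 root          1  26    0    60M  8192B accept   1   0:01   0.00% <php-cgi>
""".splitlines()

MiB = 1024**2
for name, size in parse_mem_line(MEM).items():
    print(f"{name:8} {size / MiB:8.1f} MiB")
for res, pid, user, threads, command in largest_by_res(PROCS, n=3):
    print(f"{pid:>6} {user:8} thr={threads:<5} res={res / MiB:.1f} MiB {command}")
```

On these sample lines, the sketch puts PID 344 (python3.13, 295M resident) first, followed by unbound and the second python3.13 process. This is the same ordering that `o` then `res` would give inside top.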

#11
This is what my WebGUI looks like...
#12
Quote from: Patrick M. Hausen on April 17, 2026, 09:17:39 PM
Are you running OPNsense for a couple of hours and when you connect to the web UI again, it's not working?

Or are you leaving the UI open for a couple of hours? I vaguely remember the latter not working for some people. I don't know, I never use OPNsense like that. Log in, configure or check stuff, close tab.

I'm setting it running and then leaving it alone. Nothing is configured except the interfaces and SSH access. Every now and again I log in to see what it's up to. I log into it from a myriad of different machines running many different browsers, but the results are always the same.

The WebGUI currently shows an uptime of 20561 days, 16:30:07 (running uptime in the shell shows 2:10). The WebGUI also shows "Failed to load widget" under Interface Statistics and "No services found" under Services. The CPU, Firewall, Traffic Graph and Traffic Out graphs all appear to be showing normally. Memory use is currently showing 63% (12828/20260 MB), which seems high.

Internet access appears to be working fine for all devices on my network.

I think I'm going to try installing pfSense to see if it can handle my network. I can't understand how the OPNSense WebGUI keeps dying on lots of different hardware, installed from different USB drives using images downloaded at different times. The only conclusion I can draw is that there must be a device on my network sending traffic that is fatal to OPNSense!

I am going to give one of my machines to a colleague at work who currently runs OPNSense at home. He is going to try it on his network, which should help troubleshoot further.

Thank you for all your suggestions thus far.

#13
Interestingly, this afternoon I grabbed another computer: a Ryzen 3900X-powered machine with 128GB of RAM, an M.2 NVMe drive and a Quadro video card. I installed OPNSense with a bare setup (just the interfaces configured) and started it up.

Now, around two hours later, I have exactly the same issue. That's a third completely different computer running OPNSense whose webGUI has died within a few hours!

The only connecting factor appears to be my home network!
#14
Hi

NTP is working fine and the system time is correct.  Even after leaving the machine off for several days, the time remains correct, so the CMOS battery and motherboard time are fine!

I have searched the forum and can't find any similar complaints! All I know is that I have tried two different computer systems and lots of different components, and the outcome is always the same. Could there be a device on my network capable of sending packets that are fatal to OPNSense's webGUI?

Thanks for the suggestion.



Quote from: chemlud on April 17, 2026, 03:02:21 PM
"...and the uptime will show something like 20000 days"

System time fails? Results in all sorts of errors, as time is essential for many, many things on the interwebs. Check NTP. System board time? CMOS battery?

You are really alone with this kind of error...
#15
I have been trying to set up an OPNSense router for quite some time now, but each time I end up with a setup that seems to die somewhere between 40 minutes and 18 hours after it starts.

I have tried two different computers with three different NICs (all with different chipsets, at speeds of 10G, 2.5G and 1G), different amounts of RAM made up of different DIMMs, and different storage media (M.2 NVMe and SATA SSD). I have tried versions 25 and 26 of OPNSense.

Initially my network (no VLANs or subdomains) was 192.168.0.0/16, but I have reconfigured it to 192.168.2.0/24 in case that was the problem.

I configure OPNSense with a single WAN ethernet connection to my fibre modem (with IP address provided by DHCP) and a single LAN connection to my network (with IP address 192.168.2.1).  I make no further configuration changes whilst trying to get everything to work.

Each time, after a while (between 40 minutes and 18 hours), the webGUI starts to break: every "display section" of the dashboard shows "Failed to load widget" and the uptime shows something like 20000 days. Internet connectivity and DHCP services on my LAN do not seem to be affected. I can still SSH into the server, and the system time remains accurate.

I don't know what else to try!  How can I keep getting exactly the same fault on different hardware? Please, learned friends, give me the benefits of your experience and expertise!