GUI crashed? SSH unavailable. Can I restore from backup?

Started by maxxell, January 07, 2025, 05:04:52 PM

Hello!

My OPNsense instance is half-crashed...? I noticed my Home Assistant plugin stopped working, and when I visit the OPNsense web GUI, only the Announcements widget on the Lobby: Dashboard is working. Every other widget spins while waiting for data, then gives up with "Failed to load widget".

I am pretty sure I am on the latest firmware, though I don't see anywhere in the only-mostly-functional web GUI to confirm that. (Home Assistant still shows OPNsense as firmware 24.7.11_2, even though the plugin is no longer working.)

When I visit the plugins page, I see "os-ddclient (missing)" with a bunch of N/A's. If I try to install the missing plugin, I jump to the Updates page, where the circle spins but nothing really happens. If I try the "automatic resolver" or the "reset all conflicts" options, I again wind up at the Updates page with the same spinning circle. I suspect this plugin is what I use for DuckDNS, but I don't know why it's suddenly missing. It's been working fine for years.

The only way to get anything interesting to happen is to hit the "Check for updates" button on the Status page. That bounces me to the Updates page, but I eventually get "No previous action log found" and nothing else happens.

The web GUI works well enough for me to turn SSH back on (I don't keep it enabled), but SSH connection attempts time out.

I was able to pull a configuration backup and am wondering whether I should just restore from backup at this point?  Any other ideas/suggestions?


Your backup can serve as a fallback in case you can't fix your install.
I'd start with interface diagnostics (checking WAN, LAN, DNS...). Then look at the firewall logs to see why SSH is blocked and fix that.

I am open to attempting diagnostics, but my GUI doesn't seem capable. Nothing under Interfaces/Diagnostics gives me any interesting information. Everything seems to end in "No results found!" (I checked ARP Table and got no results. DNS Lookup doesn't do anything. NDP Table says no results. All 6 Netstat options are blank. I tried a Trace Route and just got a blank response.) I am open to alternative suggestions for further diagnosis.

If I go with the 'fallback' restore from backup, what do I lose? I have WireGuard settings, DuckDNS tracking my public IP, and some static IP addresses. All of that would come back with a restore from backup, right?
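
One way to check what the backup actually holds before relying on it: the exported configuration is an XML file, so its top-level sections can be listed offline. This is only a minimal sketch. The function names and the file name in the usage comment are made up for illustration, and it assumes nothing about which section names OPNsense uses, so treat a keyword match as a hint and not as proof that a setting will be restored.

```python
import xml.etree.ElementTree as ET


def summarize_backup(path: str) -> list[tuple[str, int]]:
    """List each top-level section of an XML backup with its descendant count."""
    root = ET.parse(path).getroot()
    # child.iter() yields the child itself plus all descendants, hence the -1.
    return [(child.tag, sum(1 for _ in child.iter()) - 1) for child in root]


def tags_matching(path: str, keyword: str) -> list[str]:
    """Return the distinct element names that contain keyword (case-insensitive)."""
    keyword = keyword.lower()
    root = ET.parse(path).getroot()
    return sorted({el.tag for el in root.iter() if keyword in el.tag.lower()})


# Example usage (hypothetical file name, keywords are guesses):
#   for tag, count in summarize_backup("config-backup.xml"):
#       print(tag, count)
#   print(tags_matching("config-backup.xml", "wireguard"))
```

An empty result for a keyword only means no element name contains it; the setting may be stored under a different name.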

Broken disk/SSD and consequently corrupt installation? If you can log in via SSH, does "dmesg" show anything helpful?
Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)

Per the next-to-last paragraph of the initial message, the OP has no SSH access.

I'm actually a bit confused about state, because the OP claims diagnostics are all broken, yet he seems to have enough connectivity to access the GUI...
Console access with screen and keyboard?

I haven't taken the time to connect a keyboard/mouse and screen.  Should I do so?  It's a mild pain to do it.  I might sooner restore from backup if the group thinks that will get me functional again.

If you haven't done that, it means you actually accessed the web GUI from another machine.
You have some IP connectivity, likely enough to make forward progress (e.g. enable SSH, test WAN, fix DNS).

Yes, can confirm. I can log into the web GUI from a laptop by visiting the device's 192.168.x.x IP address. But even after doing so, and ensuring that SSH is enabled (which I don't leave on by default), I still can't get in via SSH. My attempts to do so just time out.

I used SSH fairly recently to get into the router and install the Home Assistant plugin, so I know how to enable it. That plugin seemed to work well, but may well be the underlying cause of this problem. I don't know; that's why I am asking for diagnostic suggestions before doing a restore from backup.

Do the firewall rules on LAN permit SSH access to the OPNsense box? If the PC you are using for UI access is connected to a different interface, do the rules on that interface permit it? Is SSH listening on the standard port (22)? What is the listen interface set to? Anything but "All (recommended)"?
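
To narrow this down from the client side, it can help to see how the connection attempt fails. A minimal sketch using only the Python standard library is below; `probe_tcp` is a hypothetical helper name and the address in the usage comment is a placeholder for the box's LAN IP. As a rough guide, and not a definitive diagnosis: "refused" usually means the host answered but nothing is listening on that port, while "timeout" usually means the packets were dropped or never answered.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Try one TCP connection and report 'open', 'refused', 'timeout', or 'error: ...'."""
    try:
        # create_connection performs the full TCP handshake or raises.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. no route to host or name resolution failure.
        return f"error: {exc}"


# Example usage (placeholder address; compare the GUI port with the SSH port):
#   for port in (443, 22):
#       print(port, probe_tcp("192.168.1.1", port))
```

Comparing the web GUI port with port 22 from the same PC shows whether the problem is specific to SSH or affects the box more broadly.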

I don't see anything in my firewall rules that would prevent me from accessing the box. And the same PC that's reaching it over the web is the one trying via SSH.

Beyond that, according to the web GUI as of now, SSH is enabled. It's allowed over LAN. It's listening on port 22. Password-based login is enabled.

All of that was set up fairly recently (a month ago or so) so I could SSH in for that Home Assistant plugin. When that project concluded, all I did was disable SSH entirely. Now re-enabling it isn't helping for some reason.

Other suggestions?

You ought to be able to locate the rule that enables SSH ("nothing that would prevent it" is not good enough when deny all is the default).
For that matter, as you attempt to SSH in, with logging enabled for default rules, you should see a pass or fail in the live view (filter on dst_port = 22 if too noisy).

As Patrick mentioned, the interface that's relevant is the one the PC is connected to.

The PC is on LAN. Wouldn't the default "LAN to any" rule allow SSH?

Quote from: EricPerl on January 08, 2025, 08:34:45 AM
You ought to be able to locate the rule that enables SSH ("nothing that would prevent it" is not good enough when deny all is the default).
For that matter, as you attempt to SSH in, with logging enabled for default rules, you should see a pass or fail in the live view (filter on dst_port = 22 if too noisy).

As Patrick mentioned, the interface that's relevant is the one the PC is connected to.

Can you confirm where to find the live view?

Never mind, I found Firewall / Log Files / Live View.

But nothing's happening. I have Auto refresh enabled, and I hit the refresh button. Even before applying the filter, I see ZERO activity.

I tried an SSH connection with this page up. The connection timed out, but the log still showed nothing.

I am not seeing anything online that talks about this kind of problem. I'm also not getting much from Reddit or this forum. Should I just restore from backup? Anything I should know, like "definitely don't select this option, or the problem won't be solved"?