Messages - rkubes

#1
Hello

Today I noticed a number of my services were in a stopped state, including Acme. I was able to manually restart them all except Acme. I had to go to its settings and hit apply again to get it working.

I then saw some threads on here about certain services "not auto starting" after a reboot, so I decided to reboot to see if that was my problem. The UI showed the spinning wheel as if it was rebooting, then after some time just displayed the dashboard again (without even requiring a new login). I knew this was strange, so I looked at my uptime, which is about 25 days. I always reboot after upgrades, so I know for sure I also attempted a reboot last week.

Reading about the past CrowdSec issue, I know this is probably indicating some service is not cleanly stopping and is holding up the reboot. (Hence why some services stop but it never actually reboots.) With that said, I'm not sure how to identify which service is holding up the reboot.

Edit/Update: I was looking at the commands people ran for the CrowdSec issue, but couldn't find a process named crowdsec. So I just tried the reboot command from SSH to see if it would output any errors, but the system actually rebooted immediately. After it booted, I'm now able to also reboot from the GUI again. I'm not sure what I did to fix it.
#2
Thanks for the info. I just did another test overnight and that one worked. It's possible that the other times I tested, I also happened to update the SFTP settings, so the "manual" backup covered what the scheduled backup would have done. I'll keep an eye on it going forward.
#3
25.1, 25.4 Production Series / SFTP Backup Interval
March 17, 2025, 06:10:19 AM
Are there any details on the update interval for the SFTP backup?

Since I started using it, it has only created backups when I save the configuration of the SFTP plugin itself and it runs its self-test. When I later change some other configuration, no backup follows. I assumed it would work like the Google Drive backup and create a backup overnight after a configuration change, but it doesn't seem to be doing that in my case.

I believe I can create a cron job for the Remote Backup action to schedule a daily backup, but I really only want backups created after there have been configuration changes.

Does anyone know if the SFTP backup plugin is designed to work that way? If so, what can I look at to see why it may not be triggering? And what time is it supposed to trigger? (The documentation for the Google Drive backup just says "early in the morning".) Or, if it's not meant to work like that, is there any advice on how I can create a script to look for configuration changes? Just take a hash of the configuration file? Or is there a "last modified" timestamp (I guess from the file system)?
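
A minimal sketch of the hash idea, in case it helps anyone else: it hashes the configuration file and compares the result with the hash recorded on the previous run. The paths are assumptions (`/conf/config.xml` is where I'd expect the config to live, and the state file name is made up), and the function names are mine, not part of any plugin.

```python
import hashlib
from pathlib import Path

# Assumed locations: adjust to your system. The state file is hypothetical.
CONFIG_PATH = Path("/conf/config.xml")
STATE_PATH = Path("/var/db/config_backup.sha256")


def file_sha256(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def config_changed(config: Path, state: Path) -> bool:
    """Compare the current hash with the stored one, then update the store.

    Returns True when the config differs from the last recorded hash,
    or when no hash has been recorded yet.
    """
    current = file_sha256(config)
    previous = state.read_text().strip() if state.exists() else None
    if current == previous:
        return False
    state.write_text(current + "\n")
    return True
```

A cron job could call `config_changed(CONFIG_PATH, STATE_PATH)` and only start the backup when it returns `True`. Hashing avoids false positives from a file that was rewritten with identical content, which a plain "last modified" check would flag.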

Thanks.
#4
Quote from: patient0 on March 14, 2025, 06:31:38 PM
Quote from: franco on March 14, 2025, 05:31:09 PMBumpy start for all the Windows users
Mmmh, I created the key and uploaded/pasted it on a Debian machine yesterday and got the same issue. Was it supposed to only affect Windows users?

Anyway, it does work now, thank you Franco!

I think it's assumed to be a Windows issue because of the \r\n line ending. But mine was triggered by a cert created on a Debian box and copied and pasted into Chrome running on an iPhone.

Still, it's mostly fixed now, and there is another commit that fixes the issue of identity files that are already in a bad state.

Agreed, overall a positive outcome, and good to see a lot of interest in the feature.
#5
Quote from: franco on March 14, 2025, 09:20:24 AMHotfix was published to the main mirror now, probably takes up to a few hours for it to arrive on the others.

I installed it and the one issue is it didn't "auto correct" the existing identity file. I commented on the issue in GitHub with a fix that would probably solve that.

In the meantime, I should just have to "Edit" the Private Key to make it slightly different (or just delete the existing identity file) for it to trigger rewriting the identity file. I'm just going to generate a new Private Key anyway.
#6
Quote from: franco on March 13, 2025, 07:23:02 PMProbably the Microsoft line endings issue. We will hotfix this tomorrow.


Cheers,
Franco

This was it. I was surprised since I created the private key on a Linux box, and pasted it on an iPhone, but indeed Windows line endings got added to the saved key.

I manually edited the identity file (I realized the error told me the path to the file) to remove the line endings and I could get in from the command line. However doing the test from the GUI just put the line endings back even though I didn't edit the private key field again.
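
For anyone who wants to check or clean a key the same way, here is a rough sketch of what I did by hand. The function names are my own and this is not how the plugin itself handles the key; it only detects carriage returns and rewrites the text with Unix line endings.

```python
def has_windows_line_endings(raw: bytes) -> bool:
    """Report whether raw file content contains any carriage returns."""
    return b"\r" in raw


def normalize_key_text(key_text: str) -> str:
    """Convert CRLF (and stray CR) line endings to LF.

    Also leaves exactly one trailing newline, which is my own choice
    for a tidy key file rather than something the thread established.
    """
    unix_text = key_text.replace("\r\n", "\n").replace("\r", "\n")
    return unix_text.rstrip("\n") + "\n"
```

Reading the identity file in binary mode and passing the bytes to `has_windows_line_endings` shows whether the GUI has put the line endings back.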

I'll wait for the hot fix and try again. Thanks.
#7
I'm working to transition from Google Drive to a locally (on my network, but not on the router) hosted SFTP server for config backups.

I created an ed25519 key pair, and for the "sftp user" on the SFTP server, I added the public key to the "authorized_keys" file. For the private key, I copied and pasted the key text into the "SSH Private Key" field on the OPNsense backup configuration page. I tried with and without the "----BEGIN..." lines, and with and without new lines.

Every time I hit save/test, I still get a public key error. My remote (SFTP) server is already set up so that I can SSH into it from the OPNsense machine, because of a different process that I use to scp certain files over. So, the "root" OPNsense account can "SSH" into User1 on my backup server. However, I cannot SFTP from "root" on OPNsense to SFTP_USER on my backup server (public key error).

One thing I'm not following is where the private key I typed into the configuration page is going. I think the issue is that SFTP isn't even pulling that private key for the keypair I set up for the SFTP server. I looked in /root/.ssh/ and the only key file is an "id_ed25519" which is the private key for the SSH connection I mentioned above.

Do I need to potentially just use that same private key I already set up internally to generate a public key for the SFTP_USER account? Or is there a specific file/path that the "private key" I'm entering into the Backups configuration page should go that I can continue troubleshooting with?
#8
24.7, 24.10 Legacy Series / Re: New Dashboard
July 26, 2024, 11:43:30 PM
Not to pile on, since it's already been reported twice, but also noting that my biggest concern since the update is the lack of ability for multiple devices to see the dashboard at once. It just doesn't handle it gracefully and individual widgets start to break down.

I also miss the configurability of the widgets. For example, I used to have the Firewall Log and the System Log configured to show something like 20+ entries (I typically view the dashboard on a 4K monitor). However, the current Firewall Log only shows 5 entries and you can't expand it. The System Log only shows about 4 entries. Any time there's any noise on top of what you're looking for, it makes both of them essentially useless.

I'm sure they'll continue to improve, and I think they have a lot of potential, but so far it's very jarring due to these issues.
#9
23.7 Legacy Series / Re: Weird Gateway/Routing Failure
January 08, 2024, 02:54:17 PM
Thanks. I checked Unbound and it is already configured to listen on all interfaces. One of the tests I did when it wasn't working was pinging IPs directly to rule out a DNS issue, and I could ping anything local, even the router itself - but could not ping anything on the other side of WAN.

Is there a service that specifically handles the routing through the gateway that I can check? Or any other configurations you suggest reviewing?

I appreciate the thought for "Solution 1", but unfortunately it is also just a work around. I could not accept an environment where unplugging and replugging the LAN cable would break access to the Internet until services on the router are restarted.

Edit:
Also, if it was an effect of services no longer listening because the port goes down, wouldn't it be consistently reproducible? The WAN access/routing doesn't fail to come back every time the port cycles, just more often than not. I'm not sure yet if the length of time it's down has an impact.
#10
23.7 Legacy Series / Weird Gateway/Routing Failure
January 08, 2024, 05:17:39 AM
I have OPNsense installed on a device with four Intel 2.5GbE NICs and one Wi-Fi card.

One of the four NICs is used for the WAN. The Wi-Fi card is used for a Wi-Fi failover, in case the WAN goes down (to connect to my hotspot). It's very rarely used, but it works correctly when needed.

The other three NICs are for three different LAN networks. All of my firewall rules are working correctly and as expected under normal circumstances.

With that said, on one of the three networks I don't always have the downstream switch powered on (usually off over the weekend). It does have a firewall rule that allows the devices on that network to reach out using the failover gateway (same as the other two networks). So, it will primarily use my main WAN, but can failover to use the Wi-Fi backup when needed.

What is odd is sometimes when I power up that network and its devices, nothing on that network will be able to reach out over the WAN. Everything internal between the LANs will work. All of the devices that are supposed to be able to route to each other over the three different LANs will all be fine. But that third LAN will not have WAN access.

I can "correct" it by going into the firewall rules, picking any rule, and just toggling a setting, such as turning logging on and then back off. Once I do this, the "Apply" button appears, I hit apply, and that network regains its WAN access. I know it's some kind of issue/defect (as opposed to an incorrect or missing allow firewall rule) since I'm not actually changing a configuration to get it going; all I'm doing is making it think I changed a config so I can get it to reload the firewall.

This "workaround" will hold until the next time I power down that network for a long time. However, after it has been off for a weekend, the issue usually (though not always) presents itself again when I power it back on. I believe this started happening with 23.7.10, but it may have started with a release prior to that. This is definitely a "newer" issue over the last couple of months, after being stable for almost the full year of 2023.

Are there any other tips/tricks that I can use to try to diagnose exactly what is happening, and why that network does not seem to respond right away? How is it that getting the "Apply" button with no changes gets it to work? Should I just try restarting the "pf" service or some other service next time? Are there any specific logs I should review to try to identify what's going on?

Any assistance will be greatly appreciated. This isn't an "urgent" issue as I'm able to work around it. The workaround does hold as long as I keep the downstream switch powered on. However, I usually turn that whole network off over the weekend, thus it's an annoyance to have to log in and "fix" this most weeks.
#11
Thanks for pointing me in the right direction. This was a super simple change to make. I also submitted a PR on the plugins git to hopefully get the correction merged.
#12
Does anyone know what the "source" is for the "Ident" column on the SMART Status dashboard widget?

I have a USB thumb drive that has such a long "Ident" that it causes the "Status" column to get pushed behind the next column of widgets. See the attached screenshot.

I'd like to "shorten" this Ident, if it's possible. Or, preferably, not have the "da0" device show up here, since it's not SMART capable anyway. Or, as a third option, have the SMART Status widget automatically truncate the long name for display purposes. However, I didn't see any configuration available for the SMART widget itself.
#13
23.7 Legacy Series / Re: CAM Command timeout & DSM TRIM
October 28, 2023, 06:43:52 AM
There are some cheap SSDs that don't support the trim command and will get hung up on this error. For example, some (or maybe all) of the vnopn brand mini-PCs on Amazon come with SSDs that don't support it. There's documentation on their product page for how to disable the trim command in OPNsense, or you might find it by just googling how to disable trim in OPNsense.

For the vnopn specifically, there's an SSD firmware update you can request from them. I doubt it actually implements trim, it probably just provides a dummy response so the OS doesn't get hung up.

I'm just using the vnopn brand as an example, obviously you'd need to identify your manufacturer and research accordingly. Just hoping this points you in the right direction.
#14
23.7 Legacy Series / ZFS Scrub Cron Task - Notifications?
September 17, 2023, 09:00:42 PM
My hardware is a simple single drive mini-PC, thus during a ZFS scrub, there wouldn't be any opportunity for it to correct data - just potentially identify bad data (failed checksums).

My question is does setting this Cron task provide any notifications if failures are found? Or would it just fail silently in the background and not provide benefit?

Even if there's a Monit alert that I can use in tandem that would still suffice. I just want to avoid having to log in specifically to look for errors, I'd want "proactive" alerting.

Any assistance or feedback would be greatly appreciated.

Edit:
For now, I've added a script that will run "zpool status -x" and check the output for the text that all pools are healthy. If it says that, then it returns 0, else it returns an error code.

Then I added a "custom" Monit alert for non-zero status that runs this script. This could potentially be superfluous if the scrub cron job would provide some kind of error or alert anyway.
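
Roughly, the check looks like the sketch below. This is a Python rendering of the idea rather than the exact script; the function names and the specific non-zero exit codes are arbitrary choices of mine.

```python
import subprocess

# Text that "zpool status -x" prints when nothing is wrong.
HEALTHY_MARKER = "all pools are healthy"


def pools_healthy(status_output: str) -> bool:
    """Return True when the status output reports all pools as healthy."""
    return HEALTHY_MARKER in status_output


def check_pools() -> int:
    """Run "zpool status -x" and map the result to an exit code.

    0 = healthy, 1 = a pool reports a problem, 2 = the command itself failed.
    """
    result = subprocess.run(
        ["zpool", "status", "-x"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return 2
    return 0 if pools_healthy(result.stdout) else 1
```

A small wrapper that ends with `sys.exit(check_pools())` gives Monit the non-zero status to alert on.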

I'm not going to mark this as "solved" yet, as I don't know if this is the right approach. I don't know of a way to simulate corruption, and I don't want to spin up a test environment just to test the Monit service.
#15
I'm trying to get the opnsense-importer to work to test out my DR process.

When I boot from the "vga" 23.7 flash drive, it gets to the point where I interrupt to use the importer.

It then properly lists my thumb drive as device "da0"

However, when I type "da0" as the device to use, it tries to use "mount_cd9660" to mount it and of course fails - then complains that it can't find the conf/config.xml file (because it never properly mounted the USB drive).

Does anyone know why this occurs or any workarounds for it?


edit:
I was able to figure out the root issue. The USB drive I was using was formatted with a GPT partition table. Once I reformatted it as MBR, I was able to get it working again.

It's odd since in the VM I was testing it was working as a GPT formatted drive, but on the actual hardware it was failing.
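
If you want to check which partition scheme a drive has before trying the importer, a sketch like the one below works on the first two sectors of the device. It assumes 512-byte sectors, the function names are mine, and the "mbr" result only means a boot signature is present with no GPT header, so a drive formatted without any partition table can also land there.

```python
def partition_scheme(first_sectors: bytes, sector_size: int = 512) -> str:
    """Classify the start of a disk as "gpt", "mbr", or "unknown".

    Expects at least the first two sectors of the device.
    """
    if len(first_sectors) < 2 * sector_size:
        raise ValueError("need at least two sectors of data")
    # A GPT header sits in the second sector and starts with "EFI PART".
    if first_sectors[sector_size:sector_size + 8] == b"EFI PART":
        return "gpt"
    # A classic MBR ends its first sector with the bytes 0x55 0xAA.
    if first_sectors[510:512] == b"\x55\xaa":
        return "mbr"
    return "unknown"


def read_scheme(device: str) -> str:
    """Read the first 1024 bytes of a device or image file and classify it."""
    with open(device, "rb") as handle:
        return partition_scheme(handle.read(1024))
```

The GPT check comes first because GPT disks also carry a protective MBR with the same boot signature.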