updates never finish

Started by securid, June 09, 2024, 04:45:54 PM

I have one install running in KVM (on Arch Linux, if that matters) that has always updated fine. The last time I clicked to check for updates, it said it was waiting for another process to finish. A little later it started to update, but it seemed really, really slow. Eventually all it did was output dots, and it never finished. I kind of forgot about that, updated the Arch Linux server, and rebooted. Luckily, OPNsense did come back up, but it is still not updating properly. It still seems really slow for some reason, even though I don't think that's the root cause.

From the CLI it says this:

root@opnsense:~ # opnsense-update -c
root@opnsense:~ #
root@opnsense:~ # opnsense-update
Nothing to do.
root@opnsense:~ # opnsense-update -p
Updating OPNsense repository catalogue...
OPNsense repository is up to date.
All repositories are up to date.
Updating OPNsense repository catalogue...
OPNsense repository is up to date.
All repositories are up to date.
Checking for upgrades (0 candidates): 100%
Processing candidates (0 candidates): 100%
Checking integrity... done (0 conflicting)
Your packages are up to date.
Checking integrity... done (0 conflicting)
Nothing to do.
Checking all packages: 100%
Nothing to do.


But the GUI is still outputting dots ... it's working on something.


***GOT REQUEST TO UPDATE***
Currently running OPNsense 24.1.8 at Sun Jun  9 14:26:19 CEST 2024
Updating OPNsense repository catalogue...
OPNsense repository is up to date.
All repositories are up to date.
Updating OPNsense repository catalogue...
OPNsense repository is up to date.
All repositories are up to date.
Checking for upgrades (0 candidates): . done
Processing candidates (0 candidates): . done
Checking integrity... done (0 conflicting)
Your packages are up to date.
Checking integrity... done (0 conflicting)
Nothing to do.
Checking all packages: .......... done
Nothing to do.
Nothing to do.
Starting web GUI...done.
Generating RRD graphs...done.
Fetching base-24.1.8-amd64.txz: ...100 or more rows of dots ....
...


Any ideas? Do I need to run some checks to see whether something has been corrupted?

Thanks!

Bumping this: a friend has an OPNsense in the same situation. It is a Proxmox QEMU VM without client tools. Updating from 24.1.5_3, there were 97 updates, which took hours instead of the usual ten minutes.

Salient points in the output so far:

Checking integrity... done (0 conflicting)
Conflicts with the existing packages have been found.
One more solver iteration is needed to resolve them.
The following 113 package(s) will be affected (of 0 checked)

Fetching base-24.1.8-amd64.txz:.........................................

A week later, the dots are still being written to the screen, both in the GUI and on the console. I have an XML backup and the firewall is still working normally. Happy to take advice on whether to reboot, rebuild, or wait.

Bart...


Same as https://github.com/opnsense/update/issues/90 but still don't know what's going on. Usually the following works:

# killall fetch

And retry from the GUI.


Cheers,
Franco

June 12, 2024, 07:43:58 AM #3 Last Edit: June 12, 2024, 08:49:09 AM by securid
Thanks but this comes too late for me unfortunately. I should have made a snapshot  :-[

I tried the opnsense bootstrap command, but the same thing happened and now it's dead.

I need to reinstall and restore the backup. That will have to wait until next week.

Correction: it didn't die. I don't know how it survived, but it came back up after a reboot. The problem still persists, so I can have another look at it later today.

Start with a connectivity audit from the firmware page.


Cheers,
Franco

Quote from: franco on June 11, 2024, 11:54:22 PM
Same as https://github.com/opnsense/update/issues/90 but still don't know what's going on. Usually the following works:

# killall fetch

And retry from the GUI.


Cheers,
Franco

Hi Franco, sorry to put you on the spot, but do you reckon it is likely to keep running for three weeks without intervention, please?

Remote firewall troubleshooting is rather tricky  ;)

Bart...

For some reason, either the fetch doesn't release, or the wrapper is looking for a process that has exited and doesn't realize it. I'm not really sure which it is, but the download is stalled either way. No need to let it run for more than a couple of hours (the dots are seconds, not bytes).


Cheers,
Franco

Thanks Franco, that's cool  8)

That reduces my fear that the process will run out of resources and crash the system. I'll leave it for a couple of weeks.

Bart...

If you have one running I'm wondering if you could help debug it?

# ps auxwww | grep fetch
# ls -lah /tmp/opnsense-fetch.*



Thanks,
Franco

To add more detail:

The wrapper script printing the dot characters loops here waiting for the actual fetch process to end.

https://github.com/opnsense/update/blob/f2a68310adc96dda4b49b928a2fe50ef7a27b974/src/fetch/opnsense-fetch.sh#L57-L62
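
For readers who don't want to open the link, the general pattern can be sketched as below. This is a minimal illustration and not the actual opnsense-fetch code: the function name wait_with_dots and the use of kill -0 as the liveness check are assumptions made for the example. It shows why the dots track elapsed time rather than download progress, and why they never stop if the background process never exits.

```sh
#!/bin/sh
# Hypothetical helper (not the real opnsense-fetch): run a command in the
# background and print one dot per second until that command exits.
wait_with_dots() {
    "$@" &                          # start the real work in the background
    pid=$!
    # kill -0 sends no signal; it only tests whether the PID still exists.
    while kill -0 "$pid" 2>/dev/null; do
        printf '.'
        sleep 1
    done
    wait "$pid"                     # collect the command's exit status
    status=$?
    echo " done"
    return "$status"
}

# Demo: a stand-in "download" that takes about two seconds.
wait_with_dots sleep 2
```

If the background command hangs forever, the loop condition stays true and the dots keep coming, which matches the symptom in this thread.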


Cheers,
Franco

I figured I could help you with those questions, but my issue is gone, and to be honest I'm not sure what it was. Yesterday the connectivity audit showed IPv6 wasn't working properly, so I ticked the option to prefer IPv4 over IPv6. It didn't help: updates still hung, and the connectivity audit showed no change. I left it as I had no time to spend on it, and this morning it actually worked.

I just got the kernel and core update to 24.1.8, coming from 24.1.5, and I no longer see the truncated message. It downloads, installs and reboots without delay. After the reboot another check is fast, and shows everything is up to date.

Sorry, I wish I could help you with the questions; they are simple enough.

June 13, 2024, 08:35:48 AM #11 Last Edit: June 13, 2024, 09:25:02 AM by bartjsmit
Quote from: franco on June 12, 2024, 04:04:03 PM
If you have one running I'm wondering if you could help debug it?

# ps auxwww | grep fetch
# ls -lah /tmp/opnsense-fetch.*

Absolutely! It will be tomorrow before I can work on it.

Hi Franco, this is the output of the two commands:

ps:
root    28929   0.0  0.1   12724   1524  -  Ss    4Jun24      0:00.79 daemon: fetch[28959] (daemon)
root    28959   0.0  0.3   19472   6540  -  S     4Jun24      6:21.84 fetch -a -w 1 -T 30 -q -o /var/cache/opnsense-update/24582/base-24.1.8-amd64.txz.sig https://pkg.opnsense.org/FreeBSD:13:amd64/24.1/sets/base-24.1.8-amd64.txz.sig
root    27858   0.0  0.1   13488   1992 v0  S+    4Jun24      1:14.25 /bin/sh /usr/local/sbin/opnsense-fetch -a -w 1 -T 30 -q -o /var/cache/opnsense-update/24582/base-24.1.8-amd64.txz.sig https://pkg.opnsense.org/FreeBSD:13:amd64/24.1/sets/base-24.1.8-amd64.txz.sig

ls:
-rw-------  1 root  wheel   3.3M Jun 14 14:37 /tmp/opnsense-fetch.out.rfgLhj
-rw-------  1 root  wheel     5B Jun  4 23:22 /tmp/opnsense-fetch.pid.DS1jXX
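
One way to tell the two possibilities apart (fetch still running, or the wrapper waiting on a PID that has already exited) is to compare the PID stored in the pid file with the process table. The sketch below is only an illustration: the function name pid_file_status is made up, and it assumes the pid file contains nothing but the PID, which the 5-byte size above suggests but does not prove.

```sh
#!/bin/sh
# Hypothetical diagnostic: report whether the PID recorded in a pid file
# still refers to a live process. Assumes the file holds just the PID.
pid_file_status() {
    file=$1
    [ -r "$file" ] || { echo "missing"; return 2; }
    pid=$(tr -cd '0-9' < "$file")   # keep digits only, drop any newline
    [ -n "$pid" ] || { echo "empty"; return 2; }
    # kill -0 sends no signal; it fails if the PID is gone (or, for a
    # non-root user, if the process belongs to someone else).
    if kill -0 "$pid" 2>/dev/null; then
        echo "running $pid"
    else
        echo "stale $pid"
        return 1
    fi
}

# Demo: check every pid file matching the pattern used in this thread.
for f in /tmp/opnsense-fetch.pid.*; do
    pid_file_status "$f" || true
done
```

A "running" result alongside dots that never end would point at fetch itself not exiting; a "stale" result would point at the wait loop.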

Sorry, I missed the reply earlier.

> root    28959   0.0  0.3   19472   6540  -  S     4Jun24      6:21.84 fetch

So it looks as expected... the process simply isn't exiting, which causes this. The global timeout is set to 30 seconds, but it keeps retrying, which is maybe related to the "-a" option... but the scripting around it is solid IMO.
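
If the retries really are not bounded by the 30 second timeout, one generic workaround is an overall deadline enforced from outside the process. The sketch below is not something the update scripts do, as far as this thread shows: the function name run_with_deadline and the exit code 124 are choices made for the example. It amounts to an automated version of the manual killall fetch.

```sh
#!/bin/sh
# Hypothetical watchdog: run a command, but terminate it if it is still
# running after a total of $1 seconds, regardless of its own retry logic.
run_with_deadline() {
    deadline=$1
    shift
    "$@" &                          # start the command in the background
    pid=$!
    elapsed=0
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$elapsed" -ge "$deadline" ]; then
            kill "$pid" 2>/dev/null # deadline reached: send SIGTERM
            wait "$pid" 2>/dev/null
            return 124              # arbitrary "timed out" exit code
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    wait "$pid"                     # pass through the real exit status
}

# Demo: a one-second job under a two-second deadline.
run_with_deadline 2 sleep 1
echo "exit status: $?"
```

The deadline is checked once per second, so it is only accurate to within about a second.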


Cheers,
Franco

Thanks Franco, is your original recommendation still applicable (killall fetch)?