OPNsense Forum » English Forums » 24.7 Production Series » snapshot auto recovery
Topic: snapshot auto recovery (Read 607 times)
bman
Newbie
Posts: 7
Karma: 0
snapshot auto recovery
« on: October 16, 2024, 10:37:50 am »
I recently switched to ZFS in order to have snapshots.
I am looking for a way to automatically activate a snapshot and reboot, to get back to the original state.
I do not see such an option in the cron section. Could it be added?
The scenario for a remote upgrade is:
- create a new snapshot of a known good state
- create a cron job 20-30 minutes in the future
- run the upgrade
- if I cannot log back in to the box and manage it -> cron switches the active snapshot and reboots
- if I can log in, I just disable the cron job
The goal is automatic recovery from a state where the VPN tunnel is down, PPPoE does not come up with the new version, etc.
Ideally the recovery would start automatically some time after boot. A cron job at a specific time would be sufficient, though; it would just need to accept the name of the snapshot to activate.
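A minimal sketch of what such a revert job could look like, assuming the snapshot is a ZFS boot environment managed with FreeBSD's `bectl`. The marker file path and the function names are hypothetical and not part of OPNsense; the admin would create the marker file after a successful login to cancel the revert.

```python
import os
import subprocess
from typing import List

# Hypothetical marker file (an assumption of this sketch): the admin
# creates it after a successful login to cancel the pending revert.
CONFIRM_FILE = "/var/run/upgrade_confirmed"


def revert_commands(known_good_be: str) -> List[List[str]]:
    """Commands that activate a known-good boot environment and reboot."""
    return [
        ["bectl", "activate", known_good_be],  # default for the next boot
        ["shutdown", "-r", "now"],             # reboot into it
    ]


def revert_unless_confirmed(known_good_be: str,
                            confirm_file: str = CONFIRM_FILE,
                            dry_run: bool = True) -> List[List[str]]:
    """Return the commands that were (or would be) run; empty if confirmed."""
    if os.path.exists(confirm_file):
        return []  # the admin confirmed the upgrade, nothing to do
    commands = revert_commands(known_good_be)
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands
```

With `dry_run=True` (the default) nothing is executed, so the command list can be inspected before the job is scheduled for real.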
franco
Administrator
Hero Member
Posts: 17665
Karma: 1611
Re: snapshot auto recovery
« Reply #1 on: October 16, 2024, 01:33:50 pm »
> The goal is automatic recovery from a state where the VPN tunnel is down, PPPoE does not come up with the new version, etc.
That is a top tier set of goals but much more complex than you currently realise.
Cheers,
Franco
« Last Edit: October 16, 2024, 02:00:03 pm by franco »
Patrick M. Hausen
Hero Member
Posts: 6832
Karma: 574
Re: snapshot auto recovery
« Reply #2 on: October 16, 2024, 01:49:44 pm »
You would have to rework the entire update mechanism to install into a new, inactive boot environment (BE) and then set "boot once" for it.
The necessary mechanisms in FreeBSD are all there, though I am perfectly aware that this does not make it simple in any way.
Deciso DEC750
People who think they know everything are a great annoyance to those of us who do.
(Isaac Asimov)
franco
Administrator
Hero Member
Posts: 17665
Karma: 1611
Re: snapshot auto recovery
« Reply #3 on: October 16, 2024, 02:04:40 pm »
The technical framework is not the issue. The issue is what defines a working PPPoE connection and how to verify that, and how to verify that PPPoE is not simply glitching for non-OPNsense reasons. Then multiply this by every other condition that should be met (each needing custom scripting, also for the cases where it does not apply).
Even timers aren't very good in this regard.
The simplest means is confirming that the boot works, i.e. that /etc/rc finishes its startup, and this doesn't run on an arbitrary timer. That gives a functional system, but maybe not a perfect one yet. It needs proper administration anyway (snapshot revert or fixing settings).
Also consider people redoing (redownloading) updates because of failsafes that may be causing false positives. It looks worse than it probably was.
Cheers,
Franco
Patrick M. Hausen
Hero Member
Posts: 6832
Karma: 574
Re: snapshot auto recovery
« Reply #4 on: October 16, 2024, 02:06:47 pm »
Using the BE boot-once feature, one could perform the update and, if all is well, persist the boot environment after a successful login and check.
Failing that, an on-site ("sneaker") admin could simply power cycle the box to revert to the pre-update BE.
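A rough sketch of the boot-once idea, assuming `bectl activate -t` is used for the temporary activation and plain `bectl activate` to persist. The `BootState` class is only a toy model of the loader's behaviour, written for illustration; it is not how OPNsense or FreeBSD implement it.

```python
from dataclasses import dataclass
from typing import List, Optional


def boot_once_command(be_name: str) -> List[str]:
    """bectl invocation that activates a BE for the next boot only."""
    return ["bectl", "activate", "-t", be_name]


def persist_command(be_name: str) -> List[str]:
    """bectl invocation that makes a BE the permanent default."""
    return ["bectl", "activate", be_name]


@dataclass
class BootState:
    """Toy model: one default BE plus an optional boot-once BE."""
    default_be: str
    boot_once_be: Optional[str] = None

    def arm_boot_once(self, be_name: str) -> None:
        self.boot_once_be = be_name

    def persist(self, be_name: str) -> None:
        self.default_be = be_name

    def reboot(self) -> str:
        """Return the BE that boots; the boot-once setting is consumed."""
        booted = self.boot_once_be or self.default_be
        self.boot_once_be = None
        return booted
```

In this model, arming `update` and rebooting boots `update`, and a second reboot (the power cycle) boots the old default again unless `persist("update")` was called in between.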
franco
Administrator
Hero Member
Posts: 17665
Karma: 1611
Re: snapshot auto recovery
« Reply #5 on: October 16, 2024, 02:13:05 pm »
Successful login might be sensible as an additional opt-in, but it obviously increases the risk of a spurious reboot bringing the old BE back.
Someone should also think about how the 3 consecutive reboots during upgrades should be handled in this regard, and what the retention policy on upgrades is.
IMHO, at the end of the day you would be trusting complex code to do what you want, rather than spending 10 minutes learning and documenting an upgrade procedure and then doing it manually in 1 minute per upgrade, to maximum effect.
Cheers,
Franco
Monviech (Cedrik)
Global Moderator
Hero Member
Posts: 1613
Karma: 176
Re: snapshot auto recovery
« Reply #6 on: October 16, 2024, 04:29:12 pm »
This is comparable to a Junos feature called "commit confirmed":
https://www.juniper.net/documentation/us/en/software/junos/cli/topics/topic-map/junos-configuration-commit.html#id-activating-a-junos-os-configuration-but-requiring-confirmation
The one who confirms that it works is the admin who initiated the commit confirmed, and will complete the commit after they are able to reach their box again.
There is no other automatism, it will just revert automatically when the admin doesn't confirm within the timeframe.
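The confirm-or-revert rule can be sketched as a small timer check. This is only an illustration of the logic described here, not the Junos implementation; the class and field names are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class PendingChange:
    """A change that reverts unless the admin confirms it in time."""
    started_at: float        # seconds, e.g. from time.time()
    timeout: float           # seconds the admin has to confirm
    confirmed: bool = False

    def confirm(self) -> None:
        """Called by the admin once they can reach the box again."""
        self.confirmed = True

    def should_rollback(self, now: float) -> bool:
        """True once the timeout has passed without a confirmation."""
        return not self.confirmed and (now - self.started_at) >= self.timeout
```

Note that the only input is the admin's confirmation; there is no health check involved, which matches the behaviour described above.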
Hardware: DEC740
franco
Administrator
Hero Member
Posts: 17665
Karma: 1611
Re: snapshot auto recovery
« Reply #7 on: October 17, 2024, 08:44:06 am »
> There is no other automatism, it will just revert automatically when the admin doesn't confirm within the timeframe.
I do like the manual confirm, but not the timer. People forget, systems revert for the wrong reasons.
I still don't know if a coded solution is more elegant than doing it manually all the way. The reboot won't happen automatically either if the system is stuck for whatever reason (or the timer confirms it's OK when it's not, because the user could not reach the box). There's just a lot of guesswork involved.
I'm not trying to shoot this down, but I want people to understand that this is hard and that shipping a half-baked implementation could be a maintenance nightmare, causing more work than not having it.
Cheers,
Franco
Patrick M. Hausen
Hero Member
Posts: 6832
Karma: 574
Re: snapshot auto recovery
« Reply #8 on: October 17, 2024, 09:00:25 am »
I also would not take "successful login" as a trigger for automatic confirmation/persistence of the update. But let me again outline how this could work with the "boot once" feature, even with three reboots:
- create a new boot environment
- update the new boot environment instead of the live system (this would be the most work to implement, IMHO)
- set the BE to boot once
- reboot for the first time
- discover we are in an ongoing update process
- apply the second update step, this time to the active BE
- set the BE to boot once again (!)
- reboot for the second time
- discover we are in the final step of an ongoing update process
- apply the last update step to the active BE
- set the BE to boot once again (!)
- reboot
If all is well, the system will come up, the admin can log in, check everything, and then manually persist the currently active BE!
If any of these steps fails: power cycle the system, and it's back to the pre-update state.
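The steps above can be simulated with a short sketch. It only models which BE ends up running and which stays the default; the update steps are stand-in callables, and all names are hypothetical.

```python
from typing import Callable, List, Tuple


def simulate_staged_update(
    steps: List[Callable[[], bool]],
    old_be: str = "default",
    new_be: str = "update",
) -> Tuple[str, str, List[str]]:
    """Simulate the flow; return (running BE, default BE, log).

    The default BE never changes here: only the admin persists the new one.
    """
    log: List[str] = [f"created {new_be} from {old_be}"]
    running = old_be
    for number, step in enumerate(steps, start=1):
        if not step():
            # A failed step ends in a power cycle. No boot-once setting
            # is armed at that point, so the default BE boots.
            log.append(f"step {number} failed, power cycle boots {old_be}")
            return old_be, old_be, log
        log.append(f"step {number} applied to {new_be}, boot once set, reboot")
        running = new_be  # the boot-once setting is consumed by this boot
    log.append(f"running {running}, waiting for the admin to persist it")
    return running, old_be, log
```

Even after three successful steps the default BE is still the old one, which is why a power cycle at any point lands back in the pre-update state.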
MenschAergereDichNicht
Full Member
Posts: 108
Karma: 3
Re: snapshot auto recovery
« Reply #9 on: October 17, 2024, 10:09:51 am »
@Patrick:
You may not always be in a position to power cycle the system easily.
I guess the whole point of this feature is systems that are only reachable remotely?
So maybe one could enhance your proposal with some kind of cron-based success check, e.g. reboot if the BE has not been persisted after a certain amount of time?
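One way such a check could be sketched, assuming the output of `bectl list -H` is tab-separated with the BE name in the first field and the active flags in the second (`N` for active now, `R` for active on reboot). That format is an assumption to verify on the target system, and the function names and grace period are hypothetical.

```python
from typing import Optional


def running_be_flags(listing: str) -> Optional[str]:
    """Return the active flags of the BE that is running now, if any."""
    for line in listing.splitlines():
        fields = line.split("\t")
        if len(fields) >= 2 and "N" in fields[1]:
            return fields[1]
    return None


def running_be_is_persisted(listing: str) -> bool:
    """True if the running BE is also the one that boots by default."""
    flags = running_be_flags(listing)
    return flags is not None and "R" in flags


def needs_revert_reboot(listing: str, uptime_seconds: float,
                        grace_seconds: float = 1800.0) -> bool:
    """Reboot only if the BE is still unpersisted after the grace period."""
    return (not running_be_is_persisted(listing)
            and uptime_seconds >= grace_seconds)
```

A cron job could feed the real `bectl list -H` output and the system uptime into `needs_revert_reboot` and reboot when it returns true. This still leaves open the concern raised earlier in the thread that a timer can revert a system for the wrong reasons.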
« Last Edit: October 17, 2024, 12:32:51 pm by MenschAergereDichNicht »
Patrick M. Hausen
Hero Member
Posts: 6832
Karma: 574
Re: snapshot auto recovery
« Reply #10 on: October 17, 2024, 10:21:40 am »
If the system hangs after an update, there are not many options apart from a watchdog that automates the power cycle.
MenschAergereDichNicht
Full Member
Posts: 108
Karma: 3
Re: snapshot auto recovery
« Reply #11 on: October 17, 2024, 10:24:50 am »
There could be problems with the WAN (PPPoE) connection after an update but the system could still be "active". For such cases some kind of auto-reboot would be nice.
bman
Newbie
Posts: 7
Karma: 0
Re: snapshot auto recovery
« Reply #12 on: October 17, 2024, 10:34:43 am »
My point was about remote systems only.
I think there are 2 different scenarios:
1. The upgrade itself is fine -> the system boots and works, but the admin cannot log back in for management.
For example, due to some bug the PPPoE client does not authenticate, or the VPN tunnel is down so the box is not accessible from outside, or there is some other functional problem.
2. The upgrade itself fails and the system does not boot -> broken kernel, kernel panic, data written incorrectly to disk, etc.
My point was to resolve scenario 1.
I use something similar on MikroTik devices.
Scenario 1 on MikroTik:
- I have a script which switches the active partition and reboots
- before the upgrade, I enable a scheduler which runs the script 10 minutes after boot
- if I can log in and my tests pass, I disable the scheduler
- if I cannot log in, the system reverts to the backup partition and everything should work as before
Scenario 2 on MikroTik:
- the docs say there is some fallback/recovery to the next partition if the system does not boot
- I have never experienced it
- I do not know how they do that, but they have their own bootloader/BIOS (and even a watchdog feature), so it may be tied to that: if the system does not boot, the BIOS reboots and boots the next partition. I do not know.
Anyway, my point was to somehow resolve scenario 1. The idea was to do it through a cron job.
It should be a manual process: the admin enables the cron job that reverts the snapshot and reboots, and then disables the job if all is OK.
Nothing big, just an option for a controlled revert.
Scenario 2 is much more complicated to think about and implement.
MenschAergereDichNicht
Full Member
Posts: 108
Karma: 3
Re: snapshot auto recovery
« Reply #13 on: October 17, 2024, 02:25:09 pm »
> There is no other automatism, it will just revert automatically when the admin doesn't confirm within the timeframe.
> I do like the manual confirm, but not the timer. People forget, systems revert for the wrong reasons.
Maybe the timer thing could be an option that defaults to off and can be manually activated for certain use cases.
« Last Edit: October 17, 2024, 02:28:03 pm by MenschAergereDichNicht »
bman
Newbie
Posts: 7
Karma: 0
Re: snapshot auto recovery
« Reply #14 on: October 21, 2024, 04:30:37 pm »
I see that FreeBSD has cron with @reboot, and 'at', so in theory there could be a general option in cron to run a job after reboot, or after reboot plus a delay.
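As a sketch of what such entries might look like, the helpers below build a user crontab line using `@reboot` (with an optional `sleep` for the delay) and an `at` invocation. The script path is hypothetical, and whether `at` jobs are actually executed on a given OPNsense install is an assumption to check.

```python
from typing import List


def reboot_cron_line(command: str, delay_minutes: int = 0) -> str:
    """Build a user crontab line that runs once after each boot."""
    if delay_minutes > 0:
        # Delay by sleeping first; cron itself has no "boot + N" syntax.
        command = f"sleep {delay_minutes * 60} && {command}"
    return f"@reboot {command}"


def at_command(delay_minutes: int) -> List[str]:
    """Build an `at` invocation; the job itself is read from stdin."""
    return ["at", "now", "+", str(delay_minutes), "minutes"]


# Hypothetical revert script and boot environment name.
line = reboot_cron_line("/usr/local/sbin/revert_snapshot.sh pre-upgrade", 10)
print(line)
# prints: @reboot sleep 600 && /usr/local/sbin/revert_snapshot.sh pre-upgrade
```

The `@reboot` variant fires on every boot until the line is removed, so the admin would delete it after a successful upgrade, as in the manual process described earlier in the thread.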