Corrupted opnsense installation

Started by kevindd992002, August 09, 2024, 06:25:54 PM

August 09, 2024, 06:25:54 PM Last Edit: August 09, 2024, 06:43:00 PM by kevindd992002
Yesterday at around 12PM GMT+8, my whole network at home went down. I wasn't at home until now so I thought it was just an ISP issue. But then I checked my opnsense VM (proxmox hypervisor) and saw that opnsense was rebooting in an endless loop as shown here:

https://youtu.be/EovhQ0KbWnc

I forgot which version I am on, but I last updated within a month or so, so it's pretty recent. I have cloud backups to my Google Drive and for some reason the latest backup I see there is July 24, 2024. I don't remember making any changes since then, so I guess that's fine if I need to restore from backup.

However, before restoring, I want to know what caused the issue in the first place, so I'm fine with troubleshooting this. I don't have a lot going on there aside from the AdGuard Home plugin and a couple of other plugins that I'm not yet actively using.

Thoughts?

The root cause seems to be a bad directory inode at /; you also get a warning about an improperly dismounted root volume.

However, it could well be that the root filesystem got corrupted because it was full. I do not know how you configured your logging, but AdGuard Home might be the culprit. Either way, you could boot FreeBSD into single-user mode and try to repair the filesystem, or at least verify whether it is full, as I suspect.
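
A minimal sketch of what that could look like from the VM console, assuming a default UFS layout (device names may differ on your install):

    # pick the single-user option in the FreeBSD/OPNsense boot menu, then at the shell:
    fsck -y /            # repair the root filesystem, answering yes to all prompts
    mount -u /           # remount root read-write once fsck comes back clean
    mount -a -t ufs      # mount the remaining UFS filesystems
    df -h                # check whether any filesystem is (nearly) full
    du -sh /var/log/*    # see whether the logs are what filled it up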

To get the VM up and running again, I would either rewind to an earlier Proxmox snapshot (this is what I use), restore from a backup, or reinstall and reload the OpnSense configuration. The latter is probably the best way to proceed, because you seem to be on UFS and I would recommend ZFS.
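
The snapshot rewind is a one-liner on the Proxmox host; a sketch, with the VM ID 100 and the snapshot name as placeholders:

    qm listsnapshot 100            # list the snapshots that exist for the VM
    qm rollback 100 pre-upgrade    # roll disk and config back to that snapshot
    qm start 100                   # boot it again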
Intel N100, 4 x I226-V, 16 GByte, 256 GByte NVME, ZTE F6005

1100 down / 440 up, Bufferbloat A+

Ahh, that makes sense.

I also forgot to mention that I haven't taken even a single snapshot of this VM! I'm new to Proxmox and opnsense (I'm a longtime pfsense user but I had enough of their BS), so forgive my noobness.

The reason I chose UFS over ZFS for opnsense is that Proxmox is already running on ZFS. According to my research, running a ZFS vDisk on a ZFS Proxmox OS is useless and redundant. Is this not the case?

It is the case, but only if you take regular snapshots ;) The ZFS layer underneath does not protect your UFS from getting corrupted by an unclean shutdown, but it gives you a chance to rewind.
Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)

Correct, ZFS on ZFS does not exactly help with performance, but OpnSense does not need disk performance, and having an indestructible filesystem like ZFS is a plus - as you can see here.

Using something like cv4pve-autosnap prevents you from ever coming into the situation where you think: Damn, I wish I had done a snapshot before this. And it comes cheap if Proxmox is running on ZFS.
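
If you would rather not add a tool, even a plain cron entry on the Proxmox host covers the basics; cv4pve-autosnap essentially automates this plus pruning old snapshots. A sketch, with VM ID 100 as a placeholder:

    # /etc/cron.d/opnsense-snapshot on the Proxmox host:
    # nightly snapshot of VM 100 at 03:00 (the % must be escaped for cron)
    0 3 * * * root qm snapshot 100 auto-$(date +\%Y\%m\%d) --description "nightly auto snapshot"

Old snapshots can be dropped again with "qm delsnapshot".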
Intel N100, 4 x I226-V, 16 GByte, 256 GByte NVME, ZTE F6005

1100 down / 440 up, Bufferbloat A+

I advise against ZFS on ZFS because it thwarts thin provisioning of virtual disks. Apart from that of course it works.
Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)

Quote from: kevindd992002 on August 09, 2024, 06:53:02 PM
running a ZFS vDisk on a ZFS Proxmox OS is useless and redundant. Is this not the case?

Shrug. You've seen the result with UFS... Don't use thin provisioning. I wouldn't use snapshots on the host either. Other than that, my experience with UFS has been nothing but a pure nightmare; I won't touch it even with a 10-foot pole.

Quote from: meyergru on August 09, 2024, 06:59:06 PM
Correct, ZFS on ZFS does not exactly help with performance, but OpnSense does not need disk performance, and having an indestructible filesystem like ZFS is a plus - as you can see here.

Using something like cv4pve-autosnap prevents you from ever coming into the situation where you think: Damn, I wish I had done a snapshot before this. And it comes cheap if Proxmox is running on ZFS.

So ZFS on ZFS it is then. Do I still get the advantage of ZFS even if I just have one disk for Proxmox (copies = 2) and one vDisk for opnsense?

As for the thin provisioning concern, I have thin provisioning enabled on the Proxmox ZFS disk, but I can't see an option for the opnsense VM vDisk, so I'm not sure what it is currently set to. Should I also say no to thin provisioning on vDisks if I want ZFS on ZFS?

I checked the size of /dev/vtbdp03, which is the freebsd-ufs partition, and only 7.1G out of 54G is used.


It does not matter what the VM sees inside the virtualised environment. With ZFS in the VM, the disk will always grow to the maximum size provisioned outside of the VM, even if only a fraction is really used inside.

That doesn't matter much if the disk is just 30 or maybe 60 G in size. Any reasonable hypervisor host should have that much space for a VM.
Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)

The advantage of ZFS for the OpnSense VM in itself is that it is nearly indestructible, even by problems inside the VM, unlike UFS. ZFS has checksumming on all levels and is a purely transactional COW filesystem, whatever the host does.

Redundancy like RAID-Z2 should be handled on the Proxmox host. Thin provisioning of the VM disks there will not be effective, because the COW in the VM guest overwrites everything at some point, but you can use "discard" for the VM disk to reuse those free blocks.
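
On the Proxmox side that is just a flag on the virtual disk. A sketch, with VM ID 100, the scsi0 bus and the volume name as placeholders for your actual setup; discard needs a disk bus that passes it through, e.g. VirtIO SCSI:

    # enable discard (and SSD emulation) on the OPNsense disk
    qm set 100 --scsi0 local-zfs:vm-100-disk-0,discard=on,ssd=1

Inside a ZFS guest you can additionally run "zpool set autotrim=on zroot" (zroot being the default pool name) so freed blocks get trimmed continuously.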
Intel N100, 4 x I226-V, 16 GByte, 256 GByte NVME, ZTE F6005

1100 down / 440 up, Bufferbloat A+

Quote from: meyergru on August 09, 2024, 09:00:28 PM
[...] but you can use "discard" for the VM disk to reuse those free blocks.
Can you give me a pointer to that? Interesting, I wonder how that should work. How does the host know which blocks are free in the guest? TRIM?
Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)

Correct. The trim command is used to discard the blocks. Under Proxmox, you can even enable SSD emulation.

BTW: If /dev/vtbdp03 is only 7.1 of 54 GByte used, then maybe it was not logging that caused the problem. But I do not know which partitions are used under UFS, so I do not know if that one holds the log data.
Intel N100, 4 x I226-V, 16 GByte, 256 GByte NVME, ZTE F6005

1100 down / 440 up, Bufferbloat A+

Quote from: meyergru on August 09, 2024, 09:07:05 PM
Correct. The trim command is used to discard the blocks. Under Proxmox, you can even enable SSD emulation.

BTW: If /dev/vtbdp03 is only 7.1 of 54 GByte used, then maybe it was not logging that caused the problem. But I do not know which partitions are used under UFS, so I do not know if that one holds the log data.

Ok, I think I'm convinced to use ZFS on the VM. However, I need to have this corrupted VM running first so I can just copy the settings from the old to the new VM manually, side-by-side. Or can I use a UFS opnsense backup xml to reload the config in a ZFS opnsense?

These are the partitions in a UFS system:

https://media.discordapp.net/attachments/1271505834261614713/1271537474346287147/image.png?ex=66b7b30d&is=66b6618d&hm=8b8941d81ef22e5536dc55d384c91b787e610f4e9614211d3f68fd740733d484&=&format=webp&quality=lossless
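
In case the image link goes stale: the same layout can be printed from a single-user shell with gpart (assuming the virtio disk shows up as vtbd0):

    gpart show vtbd0       # partition table of the virtio disk
    gpart show -l vtbd0    # same, with partition labels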

I also tried fixing the UFS filesystem:

https://cdn.discordapp.com/attachments/1271505834261614713/1271540560439541952/image.png?ex=66b7b5ed&is=66b6646d&hm=7777d32610eeb1cc0f9e01dae43fe4d59b4758ec48462719151f91294be6156f&

Then I tried booting normally again and didn't see the mounting warning anymore. However, it still rebooted like before, and I'm back to square one with the mounting warning/error.