Update issues when upgrading from version 25.7.11 to 26.1.x

Started by LP, Today at 02:46:06 PM

Hello everyone,

We run OPNsense at a total of 4 sites. Unfortunately, we are experiencing update issues at 3 of them. Here are the details:

OPNsense (virtualized)
Installed version: 25.7.11_9
Hypervisor: VMware ESXi-7.0U3w-24784741-standard
Hardware: HPE ProLiant DL325 Gen10 Plus v2 server

After updating OPNsense to version 26.1.x, the following errors occur during boot:

Startup log excerpt:

Trying to mount root from ufs:/dev/ufs/OPNsense [rw,noatime]...
Root mount waiting for: CAM
Mounting filesystems...
tunefs: soft updates remains unchanged as enabled
tunefs: issue TRIM to the disk remains unchanged as enabled
** /dev/ufs/OPNsense

...

(da0:mptt0:0:1:0): UNMAP failed, switching to WRITE SAME(16) with UNMAP BIO_DELETE
(da0:mptt0:0:1:0): UNMAP. CDB: 42 00 00 00 00 00 00 00 08 00
(da0:mptt0:0:1:0): CAM status: SCSI status error
(da0:mptt0:0:1:0): SCSI status: Check Condition
(da0:mptt0:0:1:0): SCSI sense: ILLEGAL Request asc:24,0 (invalid field in CDB)
(da0:mptt0:0:1:0): Command byte 7 is invalid
(da0:mptt0:0:1:0): Error 22, Unretryable error
g_vfs_done():ufs/OPNsense[DELETE(offset=55167287296, length=4096)]error=5

...

(da0:mptt0:0:1:0): WRITE SAME(16). CDB: 93 08 00 00 00 00 00 d8 d1 8f 00 00 00 40 00 00
(da0:mptt0:0:1:0): CAM status: SCSI Status Error
(da0:mptt0:0:1:0): SCSI status: Check Condition
(da0:mptt0:0:1:0): SCSI sense: Vendor Specific asc:80,85 (Vendor Specific ASC)
(da0:mptt0:0:1:0): Info 0
(da0:mptt0:0:1:0): Error 5, Unretryable error
g_vfs_done():ufs/OPNsense[DELETE(offset=7275184128, length=32768)]error=5

Even after OPNsense has booted up, error messages appear spontaneously on the login screen:

FreeBSD/amd64                 (ttyv0)

login: (da0:pvscsi0:0:1:0): WRITE SAME(16). CDB: 93 08 00 00 00 00 02 65 58 cf 00 00 00 40 00 00
(da0:pvscsi0:0:1:0): CAM status: SCSI Status Error
(da0:pvscsi0:0:1:0): SCSI status: Check Condition
(da0:pvscsi0:0:1:0): SCSI sense: Vendor Specific asc:80,85 (Vendor Specific ASC)
(da0:pvscsi0:0:1:0): Info: 0
(da0:pvscsi0:0:1:0): Error 5, Unretryable error
g_vfs_done():ufs/OPNsense[DELETE(offset=20580499456, length=32768)]error = 5
g_vfs_done():ufs/OPNsense[DELETE(offset=20580564992, length=32768)]error = 5
g_vfs_done():ufs/OPNsense[DELETE(offset=20580466688, length=32768)]error = 5

Difference at the other site, where there are no issues: it is virtualized on Proxmox.

Research into the problem suggests that the log entries point to an issue with TRIM (sent as SCSI UNMAP) in virtualized environments. The VM resides on a thin-provisioned SSD storage pool. OPNsense attempts to free unused storage blocks, but the virtualization host or the emulated controller rejects the SCSI commands used for this (UNMAP / WRITE SAME). The rejected requests then surface as errors on the file system (Error 22 is EINVAL, Error 5 is EIO).
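
To check whether the failures are limited to TRIM requests, the kernel messages can be counted by type. The following is a minimal sketch, not something taken from OPNsense itself: the function name `summarize_delete_errors` is made up for illustration, and the patterns assume the message format shown in the excerpts above.

```shell
# Summarize failed BIO_DELETE (TRIM) requests from kernel messages on stdin.
# Prints three counters: failed DELETE requests reported by g_vfs_done(),
# and the UNMAP and WRITE SAME(16) commands that were logged with a CDB.
summarize_delete_errors() {
    awk '
        /g_vfs_done\(\):.*\[DELETE\(/ { failed++ }
        /\): UNMAP\. CDB:/            { unmap++ }
        /\): WRITE SAME\(16\)\. CDB:/ { ws16++ }
        END {
            printf "failed DELETE requests: %d\n", failed
            printf "UNMAP commands reported: %d\n", unmap
            printf "WRITE SAME(16) commands reported: %d\n", ws16
        }
    '
}
```

On the firewall this would be used as `dmesg | summarize_delete_errors`. If the only failing requests are DELETE requests, that supports the TRIM explanation rather than general disk corruption.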

Quick fix I tried: disable UNMAP/TRIM

echo 'vfs.unmap_enabled=0' >> /boot/loader.conf.local
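
It is not clear that `vfs.unmap_enabled` is a tunable the kernel actually reads, so this line may have had no effect. Two knobs that do exist in FreeBSD for the same purpose are the da(4) sysctl `kern.cam.da.0.delete_method` and the UFS TRIM flag set with `tunefs -t`. An untested sketch, assuming the disk is da0 and the file system label is ufs/OPNsense as in the log above (the function names are made up for illustration):

```shell
# Untested sketch for FreeBSD/OPNsense; run as root.
# Assumptions: the disk is da0 and the UFS label is ufs/OPNsense,
# both taken from the boot log. Adjust them for other systems.

# Runtime switch: tell the da(4) driver to stop sending BIO_DELETE to da0.
disable_da_delete() {
    sysctl kern.cam.da.0.delete_method=DISABLE
}

# Persistent switch: clear the TRIM flag in the UFS superblock.
# tunefs only works on a file system that is unmounted or mounted read-only.
disable_ufs_trim() {
    tunefs -t disable /dev/ufs/OPNsense
}
```

`disable_da_delete` takes effect immediately but is lost on reboot; OPNsense offers System > Settings > Tunables for persistent sysctl values. `disable_ufs_trim` has to be run from single-user mode, because tunefs cannot change a file system that is mounted read-write.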

Long-term configuration (host side)

If TRIM is to be used (to keep the VM disk thin):

Proxmox: Set the controller to "VirtIO SCSI single" and enable the "Discard" option for the disk.
VMware: Check that the virtual hardware version is up to date and the correct controller type (e.g., VMware Paravirtual) is selected.


So I switched the storage controller in VMware to Paravirtual. I also updated the firmware of all host server components, since outdated RAID controller firmware can sometimes be the cause. I booted OPNsense in single-user mode and ran a file system check with `fsck`; the output reported "FILE SYSTEM IS CLEAN". I tested the update repeatedly.

Despite this, the errors persist when starting OPNsense after the update, and I had to roll back to a pre-update snapshot. Does anyone have any ideas regarding the cause or how to fix this? It is possible that the VM's .vmdk file is corrupted, but right now I'm a bit stumped. I'm happy to provide further information if needed. I would appreciate any help!

Best regards,
Luca

The easiest approach would be to prepare a new VM on ZFS, install all updates and plugins, import the configuration file of the VM being replaced, and then swap the VMs.

Hi,
is your thin-provisioned SSD storage pool an iSCSI LUN or DAS?

I ask because your problem sounds more like a SCSI error.
I wouldn't use a PVSCSI controller for a VM like this; instead, I'd use a standard LSILogic SAS controller.
That's because a PVSCSI only really shows its speed when you have a huge amount of random I/O (e.g., HANA DB). But that's not the case here.
Why don't you set up a new OPNsense VM with an LSILogic controller and import the configuration (just to test whether your SCSI errors are gone)?

Markus