Hardware RAID bootup messages: mfi0 - Unexpected sense: PD 0x

Started by caliber-it, September 15, 2023, 08:26:31 PM

Hello there,
I'm fairly new to OPNsense, to be honest, and have already tried searching the web and this forum for something related to this "problem?", but couldn't really find anything, so I'd be very thankful for any replies. :)
I installed the latest OPNsense image on a Dell R320 server with an H720 Mini RAID controller and two (brand new) Samsung 870 Evo drives configured in RAID 1. I installed OPNsense in UEFI mode on bare metal, with the RAID configured on the mentioned Dell controller running the latest manufacturer firmware (not flashed to IT mode or anything like that). I made sure to run the newest firmware on all the hardware (BIOS, NICs etc.) and even ran the Dell server's diagnostic check without it finding any faults.

After reading a lot here and elsewhere about UFS and ZFS, I decided to use ZFS, since I have 48 GB of RAM in this machine. That is probably way overkill for a bare-metal firewall, but it is supposed to run in a data center for my servers, and RAM for that server generation is cheap these days, so why not.

Unfortunately I get thousands of these boot-up messages (I already got them when booting the installation image from the USB stick), telling me:

[...]
mfi0 70012 (boot + 21s/0x0002/info) - unexpected sense: PD 01(e0x20/s1) Path 4433221105000000, CDB: b5 01 07 fe 80 00 00 00 00 01 00 00, Sense: b/00/00
[...]


After a while of scrolling through them it boots up normally and runs (as far as I can tell) stably and without any problems, but I want to be sure about that before putting it back in the rack and praying it won't cause any trouble in the future.
The messages don't always mention PD 01; sometimes it's PD 00. As far as I understand, those are the physical disks on the RAID controller (mfi0?).
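If I understood it right, the physical drives behind mfi0 can be listed from the shell with mfiutil(8), so you can check which disks PD 00 and PD 01 refer to. Rough sketch (I haven't verified this on the box yet, so treat it as a guess):

# list the physical drives mfi0 sees (the PD numbers should match the log)
mfiutil show drives
# show the RAID configuration (which PDs make up the RAID 1 volume)
mfiutil show config
# dump the controller's own event log, where these sense entries also appear
mfiutil show events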

Does anyone know more about this? Thanks a lot in advance! :)

I haven't come across these, but maybe that is because of the way I ordinarily set up disks for ZFS use: I put the HBA cards in IT mode. That is, I follow the long-standing advice not to put ZFS on top of hardware RAID. I did that in the distant past, had trouble, and could only recover with a reinstall and a full backup restoration (that was storage data, not boot devices, so it was worse).
So it might not be related and might not get rid of the messages, but be reminded: ZFS on top of hardware RAID will work, but it is strongly discouraged.
P.S.: see https://man.freebsd.org/cgi/man.cgi?mfi for some additional info on the driver.

Thanks for your reply and this advice.
Really interesting to hear this, because it was difficult to find reviews and articles about others running OPNsense on hardware RAID. I only found some threads about ZFS handling the redundancy in combination with HBAs in IT mode, which I'm kind of afraid of because I'm not running this in a homelab but in a data center, where stability and reliability are most important to me. Since those cards and their firmware are proven and built for that, it seemed logical not to touch or change them in any way (even if that brings some performance disadvantages compared to IT mode with ZFS, as I read).

But the advice about ZFS on hardware RAID is really good. I wasn't sure about that in the first place, since most of the ZFS features aren't even an advantage in my case (cache deactivated on the disks, RAID controller with a physical battery-backed cache), so the disadvantage of a corrupted system with UFS (that's what I read and what made me choose ZFS in the end) shouldn't be a problem then, right?
But that reassures my first "feeling" about the filesystem. I'll reinstall it with UFS. Thanks. :)

You're probably right that it won't remove the boot messages, since they already appeared the first time I booted from the USB drive, with an empty RAID disk that had no ZFS or anything else on it at that point.

Flash to IT mode, use ZFS, be happy. So-called "hardware RAID" is outdated technology; ZFS beats it in just about every aspect.

Refer to e.g.

https://www.truenas.com/docs/references/zfsprimer/
https://www.truenas.com/community/resources/whats-all-the-noise-about-hbas-and-why-cant-i-use-a-raid-controller.139/

Fact: the mfi driver is known to have problems. That's why it is being replaced by mrsas. But the best option is flashing to IT firmware and using mpr.
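If you want to see whether the messages go away before reflashing, FreeBSD can be told to attach mrsas instead of mfi via a loader tunable; on OPNsense custom tunables usually go into /boot/loader.conf.local. Sketch from memory, so double-check against mrsas(4):

# /boot/loader.conf.local
# attach supported controllers with mrsas(4) instead of mfi(4)
hw.mfi.mrsas_enable="1"

Be aware that the disk device names change (mfid* becomes da*) when mrsas drives the card, so boot references may need adjusting. Still, IT firmware plus mpr is the cleaner solution.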

Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)

Not saying this is the case here, but in my line of work my ex-employer had, alongside some hefty IBM Power servers with their proprietary ways, thousands (yes, that many) of physical Wintel servers. Over years of use the engineers got so accustomed to having these with battery-backed RAID cards for the local storage and OS that they freak out when hearing there are other alternatives. They go by "the best thing for reliability that I know of", "never been let down", etc. Worse, these were for installing vSphere, so sometimes four or more 480 GB SAS disks in RAID. What a waste.
A lot of it came down to vendor support for those machines: if they weren't set up like that, the vendor or the third-party break-fix data-centre contracts wouldn't touch them. Wintel people who would not diagnose, who just wanted the easy job of pulling the faulted disk, replacing it and letting the RAID rebuild. That's all they knew.
Madness.

That's interesting, I always thought of custom-flashing the firmware of those parts as more of an "experimental", janky homelab thing, especially for something as crucial as the main gateway/firewall in the rack of my data center. But hey, I'm open to learning something new, and if both of you say that's the right way to go and even more reliable (since the mfi driver tends to have problems, as you say and as I can see here), then I'll give it a try.
Well, I guess I won't reinstall with UFS on the hardware RAID then, but will try ZFS with IT mode again tomorrow morning.

Just out of curiosity, even if it's off-topic and not OPNsense related: I also run a similar setup elsewhere, Dell battery-backed hardware RAID controllers with RAID 5 (sometimes 1 or 6), mostly with SATA SSDs (no SAS or enterprise-grade ones). On those RAID volumes I primarily use ext4, mostly because that's what I grew up with and for always-worked-so-why-change reasons (and because I'm not a fan of Btrfs, which "replaced" ext4 as the suggested FS on my main distribution). Would you also scrap that approach and go a similar way there, meaning throw out the battery, flash IT mode and run something like ZFS as well?

Thanks for the replies!  :)

Hello again,
Just wanted to give some short feedback. I flashed the RAID controller to IT mode and reinstalled OPNsense with a mirrored ZFS setup instead of the hardware RAID 1. It works like a charm, and the boot-up messages (as you expected) disappeared. This is probably the right way to go for anyone running into the same issue. Maybe it would also have worked to ignore the messages and run UFS on top of the hardware RAID, but this solution seems more reliable. Thanks for your encouragement to go this way. :)
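In case it helps anyone, this is roughly how I sanity-checked the result from the shell afterwards (device names are from my box and may differ on yours):

# confirm the controller now attaches via the HBA driver rather than mfi
dmesg | grep -iE 'mpr|mps|mfi'
# confirm both SSDs show up as plain disks
camcontrol devlist
# confirm the ZFS mirror is healthy
zpool status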

This situation is specific to FreeBSD, or at least to ZFS, so I would not recommend doing it by default for other use cases. While ZFS works on Linux, the support is not as well integrated as on FreeBSD and you might get a worse experience - especially if the RAID driver has not given you any trouble so far.

Whenever your primary use is to run ZFS, do not use "hardware RAID".

HTH,
Patrick
Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)