[SOLVED] Boot errors after upgrading to 21.1.8_1-amd64

Started by clarknova, July 08, 2021, 09:43:56 PM

July 08, 2021, 09:43:56 PM Last Edit: July 09, 2021, 10:29:42 PM by clarknova
I upgraded my OPNsense from 21.1.7_1-amd64 to 21.1.8_1-amd64, but after the reboot it fails to boot fully. I connected a keyboard and monitor and saw the screen in the attached photo. I hit RETURN and got a shell. ifconfig showed my two physical interfaces, but no VLANs or IP configuration. I typed 'reboot' and ended up right back at the same screen. Any idea what went wrong? I searched for the displayed errors but didn't find anything that looked relevant.

For the sake of searchability, an excerpt of the errors seen in the photo:

Launching the init system...done.
Initializing...
Fatal error: Uncaught Error: Call to undefined function OPNsense\Base\FieldTypes\gettext() in /usr/local/opnsense/mvc/app/models/OPNsense/Base/FieldTypes/OptionField.php:51
Stack trace:
#0 /usr/local/opnsense/mvc/app/models/OPNsense/Base/BaseModel.php(221): OPNsense\Base\FieldTypes\OptionField->setOptionValues(Array)
...
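
For searching, note that the namespace in that message is misleading. Inside a namespace, PHP resolves an unqualified call such as gettext() by trying the namespaced function first and then falling back to the global one, so the error really means the global gettext() function (provided by PHP's gettext extension) was not available. A small sh sketch that pulls the bare function name out of a saved console log; the helper name undefined_php_functions is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: read a PHP error log on stdin and print the bare name
# of each function reported as "Call to undefined function NAME()",
# with any namespace prefix (everything up to the last backslash) removed.
undefined_php_functions() {
    # First sed keeps only NAME from matching lines; second strips namespaces.
    sed -n 's/.*Call to undefined function \([^ (]*\)().*/\1/p' |
        sed 's/.*\\//'
}
```

For the error above this prints "gettext", which is a more useful search term than the full namespaced name.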


Hardware is a Supermicro X7SPA-H with an Intel Atom D510, 4GB RAM, and dual on-board Intel GbE. The WireGuard and FRR packages are installed.

https://www.supermicro.com/products/motherboard/ATOM/ICH9/X7SPA.cfm?typ=H

July 09, 2021, 12:52:03 AM #1 Last Edit: July 09, 2021, 01:00:17 AM by clarknova
I have a second firewall with nearly the same hardware (just 2GB RAM) and a similar config (they are a CARP pair). It is still running OPNsense 21.1.7_1-amd64. On this box, /usr/local/opnsense/mvc/app/models/OPNsense/Base/FieldTypes/OptionField.php has 57 lines and looks like this. I'm omitting the first 28 lines as they're just comments:
namespace OPNsense\Base\FieldTypes;

/**
* Class OptionField
* @package OPNsense\Base\FieldTypes
*/
class OptionField extends BaseListField
{
    /**
     * setter for option values
     * @param $data
     */
    public function setOptionValues($data)
    {
        if (is_array($data)) {
            $this->internalOptionList = array();
            // copy options to internal structure, make sure we don't copy in array structures
            foreach ($data as $key => $value) {
                if (!is_array($value)) {
                    if ($key === "__empty__") {
                        $this->internalOptionList[""] = gettext($value);
                    } else {
                        $this->internalOptionList[$key] = gettext($value);
                    }
                }
            }
        }
    }
}


Of course, I don't know whether the file looks the same on the broken box. I'll be on site tomorrow and can check.

The problem with UFS is that sometimes you install all packages and a new kernel but after reboot the data is suddenly gone and you end up with no kernel in the worst case. In your case there is the PHP gettext package missing after upgrade.
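
To confirm that diagnosis from the console, "php -m" lists the loaded PHP modules, one per line. A minimal sh sketch that checks that listing for a given module; the helper name php_has_module is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) if the module named in $1
# appears as a whole line in the "php -m" listing read from stdin.
# Matching is fixed-string and case-insensitive.
php_has_module() {
    grep -qixF -- "$1"
}
```

Usage on the affected box: php -m | php_has_module gettext || echo "PHP gettext extension is missing". Separately, "pkg check -s -a" verifies the checksums of all installed package files, which can help spot the kind of silent file loss described here.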

It might also be the disk for one reason or another.

We do have the health audit for this, but it's obviously problematic when you can't get the box to boot due to these situations.

As mentioned in the other thread, 21.7-RC1 has a new installer that also supports ZFS, which might be a more stable solution if the hardware is capable.


Cheers,
Franco

Quote from: franco on July 09, 2021, 08:24:45 AM
The problem with UFS is that sometimes you install all packages and a new kernel but after reboot the data is suddenly gone

lol, that does sound like a problem!

Quote
In your case there is the PHP gettext package missing after upgrade.

It might also be the disk for one reason or another.

Fair enough. I will check the disk, but what is the recommended fix at this point? I'm thinking I could

  • Install clean and restore a recent backup
  • Boot from a live USB and attempt to restore the missing file
  • ...?
It's late here and I can't really come up with a third option. Once I have this firewall up and running I intend to upgrade its mate to 21.1.8. Is it reasonable to hope the upgrade process will be more successful on that one?

Quote from: clarknova on July 09, 2021, 08:34:05 AM
It's late here and I can't really come up with a third option. Once I have this firewall up and running I intend to upgrade its mate to 21.1.8. Is it reasonable to hope the upgrade process will be more successful on that one?

It's relatively rare in general, but with common factors like the same disks, hardware, and location it *might* be clustering behaviour. Sometimes during testing, for example, I get file checksum errors when the power supply or battery state is inadequate or out of spec for the devices being fed.

The most effective way is to use the 21.1 installer image, import the configuration, and reinstall in place.

Though if you can manage to manually set up networking (dhclient on WAN interface) you can use the "opnsense-bootstrap" utility instead to get back to 21.1.8 quicker.


Cheers,
Franco

Quote from: franco on July 09, 2021, 10:47:20 AM
if you can manage to manually set up networking (dhclient on WAN interface) you can use the "opnsense-bootstrap" utility instead to get back to 21.1.8 quicker.

Interesting. I was not aware of that utility. Is this still the best information on it?
https://forum.opnsense.org/index.php?topic=3116.0

I don't have a DHCP service on the WAN network, but if I configure my static address on the WAN, I should be able to just run:

fetch https://raw.githubusercontent.com/opnsense/update/master/bootstrap/opnsense-bootstrap.sh
sh ./opnsense-bootstrap.sh


Is that the theory? I can give it a try.

Sorry for the pedestrian questions; I'm quite unfamiliar with GitHub. The link to the bootstrap script in the forum post I linked is invalid. Some web searches brought me to a couple of links on GitHub, but the first one has a very old copyright stamp and the second one has a .in file extension, and I'm not sure what that signifies. Should I be running either of these?

https://raw.githubusercontent.com/opnsense/update/9d5ccfac89/bootstrap/opnsense-bootstrap.sh

https://github.com/opnsense/update/blob/master/bootstrap/opnsense-bootstrap.sh.in

A static address is fine too, using ifconfig and an edited resolv.conf.
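
Concretely, that means an address on the WAN interface, a default route, and a nameserver line. A minimal sh sketch that prints those commands for review before anything is changed; the helper name static_net_cmds is hypothetical, and the addresses in the usage line are documentation placeholders:

```shell
#!/bin/sh
# Hypothetical helper: print (not run) the commands that give a console-only
# box temporary static networking. Review the output, then pipe it to sh.
# Arguments: interface, address, netmask, gateway, nameserver.
static_net_cmds() {
    if [ "$#" -ne 5 ]; then
        echo "usage: static_net_cmds IFACE ADDR NETMASK GATEWAY NAMESERVER" >&2
        return 1
    fi
    printf 'ifconfig %s inet %s netmask %s\n' "$1" "$2" "$3"
    printf 'route add default %s\n' "$4"
    printf 'echo "nameserver %s" > /etc/resolv.conf\n' "$5"
}
```

Usage: static_net_cmds em0 192.0.2.10 255.255.255.192 192.0.2.1 1.1.1.1 prints the three commands; append "| sh" to apply them. Printing first is a deliberate choice, since a typo in the gateway on a remote console is easy to make and hard to notice.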

opnsense-bootstrap is ready to use on any system in case of such failures too.

You can get the most information from the manual page:

# man opnsense-bootstrap


Cheers,
Franco

PS: I double-checked, and the latest instructions for opnsense-bootstrap on GitHub are here: https://github.com/opnsense/update#opnsense-bootstrap

It was recently changed so that the version is added at build time instead of being hardcoded.
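
The .in suffix is a common convention for a template that the build fills in before installation. A minimal sh sketch of that idea, assuming a hypothetical %%VERSION%% placeholder; the real placeholder and build tooling in the opnsense/update repository may well differ, and the helper name render_template is also hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: read a template on stdin and replace every occurrence
# of the (assumed) placeholder %%VERSION%% with the version given in $1.
render_template() {
    # Escape backslash, ampersand and the "|" delimiter so the version
    # string is inserted literally by the sed replacement below.
    ver=$(printf '%s' "$1" | sed 's/[\\&|]/\\&/g')
    sed "s|%%VERSION%%|$ver|g"
}
```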

OK, I see it in /usr/local/sbin/opnsense-bootstrap on the working system. Thanks, I'll give it a try and report back.

We were able to get this firewall back on its feet with the opnsense-bootstrap script. I had remote hands support, but I believe this was the procedure he followed on the console:
ENTER (to get a shell)
ifconfig em0 inet x.y.z.171 netmask 255.255.255.192
route add default x.y.z.129
echo "nameserver 1.1.1.1" > /etc/resolv.conf
sh /usr/local/sbin/opnsense-bootstrap


At this point we got the attached error about a PHP conflict, so we proceeded as follows:
rm /usr/local/bin/php
sh /usr/local/sbin/opnsense-bootstrap
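
Deleting the file worked here; moving it aside instead keeps a copy in case it turns out to be needed. Before removing anything, "pkg which /usr/local/bin/php" reports whether a registered package owns the file. A minimal sh sketch; the helper name move_aside and the .bak suffix are hypothetical choices:

```shell
#!/bin/sh
# Hypothetical helper: rename a file that blocks a package install to
# FILE.bak instead of deleting it, so it can be restored later.
# Refuses to act if the file is missing or a backup already exists.
move_aside() {
    f=$1
    if [ ! -e "$f" ]; then
        echo "move_aside: $f does not exist" >&2
        return 1
    fi
    if [ -e "$f.bak" ]; then
        echo "move_aside: $f.bak already exists" >&2
        return 1
    fi
    mv -- "$f" "$f.bak"
}
```

Usage: move_aside /usr/local/bin/php && sh /usr/local/sbin/opnsense-bootstrap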


The script failed again, this time with the error that pkg.freebsd.org was unreachable, so we re-added the default route and DNS server and ran the script a third time:
route add default x.y.z.129
echo "nameserver 1.1.1.1" > /etc/resolv.conf
sh /usr/local/sbin/opnsense-bootstrap
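
Since the route and resolver had to be re-added before the third attempt, a quick preflight check before each rerun could save a round trip. A minimal sh sketch, assuming FreeBSD "netstat -rn" prints the default route on a line starting with "default"; the helper names has_default_route and has_nameserver are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: read "netstat -rn" output on stdin and succeed if a
# line for the default route is present.
has_default_route() {
    grep -q '^default[[:space:]]'
}

# Hypothetical helper: succeed if the resolv.conf file given in $1
# (default /etc/resolv.conf) contains at least one nameserver line.
has_nameserver() {
    grep -q '^[[:space:]]*nameserver[[:space:]]' "${1:-/etc/resolv.conf}" 2>/dev/null
}
```

Usage before rerunning the script: netstat -rn | has_default_route || echo "no default route"; has_nameserver || echo "no nameserver in /etc/resolv.conf"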


This time the script completed, and I was able to log into the web UI. The configuration appeared to be intact, but I had to reinstall the WireGuard and FRR packages. Then I ran a firmware health audit, with the following result:
***GOT REQUEST TO AUDIT HEALTH***
Currently running OPNsense 21.1.8_1 (amd64/OpenSSL) at Fri Jul  9 11:56:16 MDT 2021
>>> Check installed kernel version
Version 21.1.8 is correct.
>>> Check for missing or altered kernel files
No problems detected.
>>> Check installed base version
Version 21.1.8 is correct.
>>> Check for missing or altered base files
No problems detected.
>>> Check for missing package dependencies
Checking all packages: .......... done
>>> Check for missing or altered package files
Checking all packages: .......... done
>>> Check for core packages consistency
Core package "opnsense" has 66 dependencies to check.
Checking packages: ..................................................
pkg-1.16.3 repository mismatch: unknown-repository
Checking packages: .................. done
***DONE***
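
In a long audit the one line that matters is easy to miss. A rough sh filter that drops the headers, progress lines, and "all clear" messages, leaving only lines that look like findings; it is a heuristic based solely on the output shown above (other audit messages may be formatted differently), and the helper name audit_findings is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: read health audit output on stdin and print only the
# lines that are not banners, section headers, progress or "OK" messages.
# Like grep, it returns status 1 when nothing is left (no findings).
audit_findings() {
    grep -v \
        -e '^\*\*\*' \
        -e '^>>> ' \
        -e '^Currently running ' \
        -e 'is correct\.$' \
        -e '^No problems detected\.$' \
        -e '^Checking ' \
        -e '^Core package ' \
        -e '^[[:space:]]*$'
}
```

Applied to the audit above, it leaves just the "pkg-1.16.3 repository mismatch: unknown-repository" line.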


I hope this is helpful to anyone that encounters a similar problem.

The second box upgraded to 21.1.8_1 without incident.

Thanks for reporting back. I wonder why php73 got stuck there, since the bootstrap process ought to remove all packages first, but in that particular case removing it seemed like the right course of action.


Cheers,
Franco