OPNsense Forum

Archive => 16.7 Legacy Series => Topic started by: Alphabet Soup on November 21, 2016, 11:28:23 pm

Title: [SOLVED] A10 Serial Console stops working mid-boot after update to 16.7.8
Post by: Alphabet Soup on November 21, 2016, 11:28:23 pm
I upgraded my A10 from 16.7.5 to 16.7.8 remotely last night via SSH CLI.  No errors during the update, but the box never came back after the reboot.  Unfortunately I didn't have a console cable plugged in at the time, so I don't know whether it reported any errors as it tried to boot.

When I arrived on site, I plugged the console cable into the A10 but could not get any response.  Eventually I pulled the A10's power plug, then replugged it.  Now output appeared on the serial console.  The usual POST and OPNsense/FreeBSD boot stuff followed, kernel modules loaded, devices enumerated, etc.  That lasted until it got to the point where it was bringing up my interfaces... the last serial console output was about two of my three used em* interfaces now being UP.  Then silence, no response.  The OS appears to continue to boot, as eventually I can ping, SSH and HTTP to the A10, and it seems to be following its OPNsense configuration.  Looking at /var/log/system.log shows that it just kept right on booting and logging:
Code: [Select]
(snip)
Nov 22 06:34:29 OPNsense kernel: Root mount waiting for: usbus4 usbus2
Nov 22 06:34:29 OPNsense kernel: uhub2: 4 ports with 4 removable, self powered
Nov 22 06:34:29 OPNsense kernel: uhub4: 4 ports with 4 removable, self powered
Nov 22 06:34:29 OPNsense kernel: Trying to mount root from ufs:/dev/mmcsd0s1a [rw,noatime]...
Nov 22 06:34:29 OPNsense kernel: em0: link state changed to UP
Nov 22 06:34:29 OPNsense kernel: em2: link state changed to UP
*****  About here is where the serial console stops outputting.  *****
Nov 22 06:34:29 OPNsense kernel: em3: link state changed to UP
Nov 22 06:34:29 OPNsense kernel: done.
Nov 22 06:34:30 OPNsense kernel:
Nov 22 06:34:30 OPNsense kernel: em0: link state changed to DOWN
Nov 22 06:34:30 OPNsense devd: Executing '/usr/local/opnsense/service/configd_ctl.py interface linkup stop em0'
Nov 22 06:34:30 OPNsense sshlockout[16911]: sshlockout/webConfigurator v3.0 starting up
Nov 22 06:34:30 OPNsense configd.py: [563b1c51-64da-4082-8468-30c6defac169] Linkup stopping em0
Nov 22 06:34:30 OPNsense opnsense: /usr/local/etc/rc.bootup: The command '/sbin/ifconfig 'pppoe0' inet6 -accept_rtadv' returned exit code '1', the output was 'ifconfig: interface pppoe0 does not exist'
Nov 22 06:34:30 OPNsense kernel:
Nov 22 06:34:30 OPNsense kernel: em2: link state changed to DOWN
Nov 22 06:34:30 OPNsense devd: Executing '/usr/local/opnsense/service/configd_ctl.py interface linkup stop em2'
Nov 22 06:34:30 OPNsense configd.py: [09a8de75-38c4-4711-90e3-addbc5b2166b] Linkup stopping em2
Nov 22 06:34:31 OPNsense kernel:
Nov 22 06:34:31 OPNsense kernel: ng0: changing name to 'pppoe0'
Nov 22 06:34:32 OPNsense opnsense: /usr/local/etc/rc.bootup: The command '/sbin/ifconfig 'pppoe1' inet6 -accept_rtadv' returned exit code '1', the output was 'ifconfig: interface pppoe1 does not exist'
Nov 22 06:34:32 OPNsense kernel:
Nov 22 06:34:32 OPNsense kernel: em3: link state changed to DOWN
Nov 22 06:34:32 OPNsense devd: Executing '/usr/local/opnsense/service/configd_ctl.py interface linkup stop em3'
Nov 22 06:34:33 OPNsense configd.py: [9ec92e6b-b17b-4575-8078-0e1e263e8156] Linkup stopping em3
Nov 22 06:34:33 OPNsense kernel:
Nov 22 06:34:33 OPNsense kernel: ng1: changing name to 'pppoe1'
Nov 22 06:34:33 OPNsense kernel: em2: link state changed to UP
Nov 22 06:34:33 OPNsense devd: Executing '/usr/local/opnsense/service/configd_ctl.py interface linkup start em2'
Nov 22 06:34:34 OPNsense configd.py: [19561326-c76a-4047-a362-1c6295b7c439] Linkup starting em2
Nov 22 06:34:34 OPNsense kernel: em0: link state changed to UP
(snip)

I tried rebooting a couple of times, and each time the serial console goes silent and unresponsive after em* UP.  I rechecked for updates, but it reports being up-to-date.

Last time I updated (to 16.7.5) I think I did it via the serial console, so I can sort-of confirm that it worked then.

I can dismiss the first box-dead-after-reboot as a one-time post-upgrade thing, but this serial console issue doesn't want to self-heal so nicely.  Any advice?
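
For anyone comparing notes: since the serial console stops but logging continues, one way to see how far the boot got is to pull the link-state events out of the log. This is only a minimal sketch; the function names link_events and last_link_state are made up for illustration, and it assumes the log lines look like the excerpt above.

```shell
#!/bin/sh
# Hypothetical helpers (not part of OPNsense) for reading link-state events.

# Print the interface link-state events found in the log, in order.
# Argument: path to the log file (defaults to /var/log/system.log).
link_events() {
    log="${1:-/var/log/system.log}"
    # Matches lines like "kernel: em0: link state changed to UP"
    grep -E 'kernel: [a-z]+[0-9]+: link state changed to (UP|DOWN)' "$log"
}

# Print the last recorded state per interface, e.g. "em0 UP".
last_link_state() {
    link_events "$1" | awk '
        {
            # the interface name is the field just before the word "link"
            for (i = 1; i <= NF; i++)
                if ($i == "link") { ifname = $(i - 1); sub(/:$/, "", ifname) }
            state[ifname] = $NF
        }
        END { for (n in state) print n, state[n] }
    ' | sort
}
```

Running last_link_state with no argument reads /var/log/system.log; comparing its output with the last line seen on the serial console shows which events were logged but never printed.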
Title: Re: A10 Serial Console stops working mid-boot after update to 16.7.8
Post by: franco on November 22, 2016, 09:42:17 am
Hi there,

Can you check your System: Administration settings? Serial console should be first, second one off?


Cheers,
Franco
Title: Re: A10 Serial Console stops working mid-boot after update to 16.7.8
Post by: chol on November 22, 2016, 12:17:57 pm
Can confirm the broken update process on both console and WebGUI.

Hardware:
 Atom D510 amd64 4GB ram 2 x SSD 64GB (mirror)

Software-system:
OPNsense 16.7-amd64
FreeBSD 10.3-RELEASE-p5
OpenSSL 1.0.2h 3 May 2016
 Firmware Mirror    Amsterdam, NL
 Firmware Flavour      OpenSSL

.. both console and WebGUI upgrades of my vanilla OPNsense 16.7 (default, fresh install) broke whilst fetching packages. After reboot, a new upgrade attempt failed, and so on. Sent error report from WebGUI.

Code: [Select]
***GOT REQUEST TO UPGRADE: all***
Updating OPNsense repository catalogue...
OPNsense repository is up-to-date.
All repositories are up-to-date.
Updating OPNsense repository catalogue...
OPNsense repository is up-to-date.
All repositories are up-to-date.
Checking for upgrades (93 candidates): .......... done
Processing candidates (93 candidates): ........ done
The following 91 package(s) will be affected (of 0 checked):

Installed packages to be REMOVED:
openldap-client-2.4.44

New packages to be INSTALLED:
heimdal: 1.5.3_7
db5: 5.3.28_6
libwww: 5.4.0_5
openldap-sasl-client: 2.4.44
cyrus-sasl: 2.1.26_12
zip: 3.0_1

Installed packages to be UPGRADED:
unbound: 1.5.9 -> 1.5.10
suricata: 3.1.1 -> 3.1.3
sudo: 1.8.17p1 -> 1.8.18p1
strongswan: 5.4.0 -> 5.5.0
sqlite3: 3.13.0_2 -> 3.14.1_1
samplicator: 1.3.7.b6_2 -> 1.3.8.r1
py27-setuptools27: 23.1.0 -> 28.1.0
py27-requests: 2.10.0 -> 2.11.1
py27-pytz: 2016.6.1,1 -> 2016.7,1
png: 1.6.23 -> 1.6.25
php56-zlib: 5.6.24 -> 5.6.28
php56-xml: 5.6.24 -> 5.6.28
php56-sqlite3: 5.6.24 -> 5.6.28
php56-sockets: 5.6.24 -> 5.6.28
php56-simplexml: 5.6.24 -> 5.6.28
php56-session: 5.6.24 -> 5.6.28
php56-pdo: 5.6.24 -> 5.6.28
php56-openssl: 5.6.24 -> 5.6.28
php56-mcrypt: 5.6.24 -> 5.6.28
php56-ldap: 5.6.24 -> 5.6.28
php56-json: 5.6.24 -> 5.6.28
php56-hash: 5.6.24 -> 5.6.28
php56-gettext: 5.6.24 -> 5.6.28
php56-filter: 5.6.24 -> 5.6.28
php56-dom: 5.6.24 -> 5.6.28
php56-curl: 5.6.24 -> 5.6.28
php56-ctype: 5.6.24 -> 5.6.28
php56: 5.6.24 -> 5.6.28
php-suhosin: 0.9.38 -> 0.9.38_3
phalcon: 2.0.13 -> 3.0.1
pftop: 0.7_6 -> 0.7_8
perl5: 5.20.3_13 -> 5.24.1.r4
pecl-radius: 1.3.0 -> 1.4.0.b1
opnsense-update: 16.7 -> 16.7.7_1
opnsense-lang: 16.7 -> 16.7.7
opnsense: 16.7 -> 16.7.8
openvpn: 2.3.11 -> 2.3.13_1
openssl: 1.0.2_14 -> 1.0.2j_1,1
openssh-portable: 7.2.p2,1 -> 7.3.p1_1,1
ntp: 4.2.8p8 -> 4.2.8p8_1
lighttpd: 1.4.39_1 -> 1.4.43_2
libxml2: 2.9.3 -> 2.9.4
jansson: 2.7_3 -> 2.9
isc-dhcp43-server: 4.3.4 -> 4.3.5
isc-dhcp43-relay: 4.3.4 -> 4.3.5
isc-dhcp43-client: 4.3.4 -> 4.3.5
indexinfo: 0.2.4 -> 0.2.6
hyperscan: 4.2.0 -> 4.3.1
curl: 7.49.1 -> 7.51.0_1
ca_root_nss: 3.25 -> 3.27.1
bsdinstaller: 16.7 -> 16.7_1
bind910: 9.10.4P2 -> 9.10.4P4

Installed packages to be REINSTALLED:
wol-0.7.1_2 (options changed)
squid-3.5.20 (options changed)
rrdtool12-1.2.30_7 (options changed)
relayd-5.5.20140810_2 (needed shared library changed)
python27-2.7.12 (needed shared library changed)
py27-Jinja2-2.8 (options changed)
py27-Babel-2.3.4 (options changed)
pcre-8.39 (options changed)
nettle-3.2 (options changed)
miniupnpd-1.9.20160113,1 (needed shared library changed)
lzo2-2.09 (options changed)
libyaml-0.1.6_2
libucl-0.8.0
libnet-1.1.6_4,1 (options changed)
libmcrypt-2.5.8_3
libltdl-2.4.6
libiconv-1.14_9 (options changed)
libffi-3.2.1
libevent2-2.0.22_1 (needed shared library changed)
libedit-3.1.20150325_2,1
libart_lgpl-2.3.21_2,1
ldns-1.6.17_5 (options changed)
idnkit-1.0_5 (options changed)
gmp-5.1.3_3
gettext-runtime-0.19.8.1
freetype2-2.6.3
flowd-0.9.1_3 (direct dependency changed: perl5)
expat-2.2.0
easy-rsa-3.0.1_1 (options changed)
dnsmasq-2.76,1 (options changed)
dhcp6-20080615_7 (options changed)
GeoIP-1.6.9 (options changed)

Number of packages to be removed: 1
Number of packages to be installed: 6
Number of packages to be upgraded: 52
Number of packages to be reinstalled: 32

The operation will free 4 MiB.
69 MiB to be downloaded.
Fetching wol-0.7.1_2.txz: ... done
Fetching unbound-1.5.10.txz: .......... done
Fetching suricata-3.1.3.txz: .......... done
Fetching sudo-1.8.18p1.txz: .......... done
Fetching strongswan-5.5.0.txz: .......... done
Fetching squid-3.5.20.txz: .......... done
Fetching sqlite3-3.14.1_1.txz: .......... done
Fetching samplicator-1.3.8.r1.txz: .. done
Fetching rrdtool12-1.2.30_7.txz: .......... done
Fetching relayd-5.5.20140810_2.txz: .......... done
Fetching python27-2.7.12.txz: .......... done
Fetching py27-setuptools27-28.1.0.txz: .......... done
Fetching py27-requests-2.11.1.txz: .......... done
Fetching py27-pytz-2016.7,1.txz: .......... done
Fetching py27-Jinja2-2.8.txz: .......... done
Fetching py27-Babel-2.3.4.txz: .......... done
Fetching png-1.6.25.txz: .......... done
Fetching php56-zlib-5.6.28.txz: .. done
Fetching php56-xml-5.6.28.txz: .. done
Fetching php56-sqlite3-5.6.28.txz: .. done
Fetching php56-sockets-5.6.28.txz: .... done
Fetching php56-simplexml-5.6.28.txz: ... done
Fetching php56-session-5.6.28.txz: .... done
Fetching php56-pdo-5.6.28.txz: ..... done
Fetching php56-openssl-5.6.28.txz: ..... done
Fetching php56-mcrypt-5.6.28.txz: .. done
Fetching php56-ldap-5.6.28.txz: ... done
Fetching php56-json-5.6.28.txz: .. done
Fetching php56-hash-5.6.28.txz: .......... done
Fetching php56-gettext-5.6.28.txz: . done
Fetching php56-filter-5.6.28.txz: .. done
Fetching php56-dom-5.6.28.txz: ...... done
Fetching php56-curl-5.6.28.txz: ... done
Fetching php56-ctype-5.6.28.txz: . done
Fetching php56-5.6.28.txz: .......... done
Fetching php-suhosin-0.9.38_3.txz: ...... done
Fetching phalcon-3.0.1.txz: .......... done


...broke kernel
Title: Re: A10 Serial Console stops working mid-boot after update to 16.7.8
Post by: chol on November 22, 2016, 12:38:07 pm
 :o  To get text output from the update process, I ran the upgrade 5 times. On the 6th attempt the new kernel got installed and the upgrade to 16.7.8 (amd64/OpenSSL) finally succeeded.

1) Got a WebGUI error:
Code: [Select]
An API exception occured

Error at /usr/local/opnsense/mvc/app/library/OPNsense/Core/Backend.php:94 - stream_socket_client(): unable to connect to unix:///var/run/configd.socket (Connection refused) (errno=2)

Did "Restart web interface" from the console, logged in again and got:

Code: [Select]
A problem was detected. Click here for more information.
(..)
<6>pid 35631 (python2.7), uid 0: exited on signal 4 (core dumped)
panic: bad pte va 401000 pte 0
cpuid = 1
KDB: enter: panic
panic.txt: bad pte va 401000 pte 0
version.txt: FreeBSD 10.3-RELEASE-p5 #0 48f6860(master): Fri Jul 22 17:54:41 CEST 2016
    root@sensey64:/usr/obj/usr/src/sys/SMP

Sent the full error report via WebGUI.

After another reboot there was no WebGUI access from the browser, so I ran the upgrade via the console. That resulted in the kernel and base-16-7-7 install for the second time, followed by another auto-reboot. This time I got WebGUI access, showing:

Code: [Select]
OPNsense 16.7.8-amd64
FreeBSD 10.3-RELEASE-p11
OpenSSL 1.0.2j 26 Sep 2016

Your system is up to date.

 ;D ;D ;D

NOW - I upgraded to LibreSSL (w/OPNsense, Amsterdam, NL mirror):

.. this resulted in a rather quick upgrade to the LibreSSL flavour (console output said (amd64/LibreSSL)) but broke the WebGUI access  :-\

... got back to OpenSSL (this time 16.7.9 (amd64/OpenSSL))

then did:
-login via webgui
-changed to LibreSSL repository
-saved
-triggered upgrade from console
-no reboot required

FINALLY:

it got me an up-to-date OPNsense 16.7.9 (amd64/LibreSSL)   :D :D :D

Code: [Select]
OPNsense 16.7.9-amd64
FreeBSD 10.3-RELEASE-p11
LibreSSL 2.4.4

everything is fine now!

Thanks to Franco, Ad and the team  :-*



Hope that helps 
Title: Re: A10 Serial Console stops working mid-boot after update to 16.7.8
Post by: franco on November 22, 2016, 07:16:57 pm
Hi chol,

I suspect there is something problematic with the hardware or your install setup. This shouldn't happen, especially not this frequently, only to end up being fine after enough tries.

Unfortunately, you also seem to have hit the update window with your upgrades / side-grades to LibreSSL, so the versions flickered. It's better not to use the mirrors unless you can't find what you're looking for in latest updates.


Cheers,
Franco
Title: Re: A10 Serial Console stops working mid-boot after update to 16.7.8
Post by: chol on November 22, 2016, 10:49:02 pm
#1 .. not use the mirrors??

Just for clarification, what do you call the <default>?

Isn't it the OPNsense, Amsterdam, NL mirror?

#2 .. hardware is fine (how do I know? because a temporary  ;)  reference install of pfSense worked flawlessly, including all updates and connections to the PPPoE bare-bone VDSL modem).
It happens to be coincidental that the updates on my two Alix 2d3 machines w/OPNsense 16.7 also had issues only with the upgrade process. That is why I wrote at such length (<irony>) here .. so please accept my apologies for that, but all that time spent got on my nerves ...  ::)

Regards, Chol.
Title: Re: A10 Serial Console stops working mid-boot after update to 16.7.8
Post by: Alphabet Soup on November 23, 2016, 02:22:42 am
Quote from: franco
Can you check your System: Administration settings? Serial console should be first, second one off?

Yes, that's what it shows.  /etc/ttys looks unchanged, /boot.config is still "-S115200 -D", and at the tail of /boot/loader.conf:
Code: [Select]
# dynamically generated settings follow
comconsole_speed="115200"
#boot_multicons
boot_serial="YES"
console="comconsole"

Also, when I gracefully reboot the A10, the serial console comes to life starting with:
Code: [Select]
Waiting (max 60 seconds) for system process `vnlru' to stop...done
Waiting (max 60 seconds) for system process `bufdaemon' to stop...done
Waiting (max 60 seconds) for system process `syncer' to stop...
Syncing disks, nodes remaining...1 0 0 done
All buffers synced.
Uptime: 3h26m6s
SeaBIOS (version 1.7.5-20141105_115023-ubuntu)
Found mainboard Deciso Netboard A10
Relocating init from 0x000e76b9 to 0xbf0e5df0 (size 41283)
(snip)

Now running 16.7.9.  No change to this problem.
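
In case it helps others check the same places, here is a rough sketch that prints the console-related settings from the three files mentioned above in one go. The function name show_console_settings and its optional root-directory argument are my own invention for illustration, not an OPNsense tool.

```shell
#!/bin/sh
# Hypothetical helper: show serial-console settings under a given root
# directory (empty argument means the live system's "/").
show_console_settings() {
    root="${1:-}"
    # Loader variables that select and configure the console.
    if [ -r "$root/boot/loader.conf" ]; then
        grep -E '^#?(console|comconsole_speed|boot_serial|boot_multicons)' \
            "$root/boot/loader.conf" || true
    fi
    # Flags read early in the boot process, e.g. "-S115200 -D".
    if [ -r "$root/boot.config" ]; then
        printf 'boot.config: %s\n' "$(cat "$root/boot.config")"
    fi
    # Serial terminal entries; ttyu0 is the first serial port on FreeBSD.
    if [ -r "$root/etc/ttys" ]; then
        grep -E '^ttyu[0-9]' "$root/etc/ttys" || true
    fi
    return 0
}
```

On the running box, `sysctl kern.console` additionally shows which console devices the kernel itself is currently using, which can be compared against what the files request.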
Title: Re: A10 Serial Console stops working mid-boot after update to 16.7.8
Post by: franco on November 23, 2016, 08:44:27 am
@chol, sorry, I meant to say "consider not switching your preferred mirror during a single box upgrade cycle", especially when changing both version and flavour. Mirrors can go out of sync, some showing older versions for a couple of hours. The system will attempt to downgrade into a former OpenSSL/LibreSSL version, but may end up with a defunct system. It doesn't have to, but it will surely cause a bad experience.
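
To illustrate the out-of-sync case, a simple guard could compare the installed version with whatever version a mirror currently advertises and refuse to continue if that would be a step backwards. This is just a sketch: is_downgrade is a made-up helper, it is not something the firmware tooling does, and how you obtain the two version strings is left open.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if the candidate version ($2)
# is older than the installed version ($1), i.e. a downgrade.
is_downgrade() {
    installed="$1"
    candidate="$2"
    # Identical versions are not a downgrade.
    [ "$installed" = "$candidate" ] && return 1
    # sort -V orders version strings; the first line is the older one.
    oldest=$(printf '%s\n%s\n' "$installed" "$candidate" | sort -V | head -n 1)
    [ "$oldest" = "$candidate" ]
}
```

For example, `is_downgrade 16.7.9 16.7.8` succeeds, which would be the signal to wait for the mirror to catch up. On FreeBSD, `pkg version -t 16.7.9 16.7.8` does a comparable pairwise test natively.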

I would agree that other software uses different approaches. Also, I can't think of a problem with the firmware upgrades. It merely copies files. It makes sure kernels are there. It will immediately stop and not reboot if there is an actual problem. I suspect the reboot does not fully sync the file system in some instances (due to the way we handle it -- maybe by forcing a reboot command), leaving the file system in worse shape than before reboot. Unfortunately, that's hard to debug and thus inconclusive.

Maybe there are ways to solve this. OTOH, firmware upgrades have a challenging history for many projects. And if a different way of unpacking files is the solution, it could be that we're looking at the wrong problem?

@Alphabet Soup, would you mind contacting Deciso about this? I have the A10 here, it looks ok from my end.
Title: Re: A10 Serial Console stops working mid-boot after update to 16.7.8
Post by: chol on November 24, 2016, 09:02:27 am
@franco: yes thank you: acknowledged.

The interesting and good thing is that I could upgrade the CF cards of my x86 32-bit Alix 2c3/2d3 nano OPNsense machines using the x86-64 Intel Atom D510 machine (only 1 reboot required this time), see:

Code: [Select]
Versions OPNsense 16.7.9-i386
                FreeBSD 10.3-RELEASE-p11
                LibreSSL 2.4.4
Updates Click to check for updates.
CPU Type Intel(R) Atom(TM) CPU D510 @ 1.66GHz (4 cores)

Works steady now! That was a relief yesterday. Kind regards, chol  :)
Title: [Solved] Re: A10 Serial Console stops working mid-boot after update to 16.7.8
Post by: Alphabet Soup on November 29, 2016, 06:48:23 am
Quote from: franco
@Alphabet Soup, would you mind contacting Deciso about this? I have the A10 here, it looks ok from my end.

The A10 now does not output anything on the serial console and does not seem to boot at all.  Power light and NIC lights work, but otherwise dead.  Looks like this was a hardware problem, and this A10 is headed for the trash.

Not an OPNsense problem, sorry for the noise.
Title: Re: [SOLVED] A10 Serial Console stops working mid-boot after update to 16.7.8
Post by: franco on December 03, 2016, 08:42:24 pm
We do have a new image for the 16.7 Nano:

https://pkg.opnsense.org/snapshots/OPNsense-16.7-OpenSSL-nano-amd64.img.bz2

This one only has one slice, but should fit more cards and seems to not have file system weaknesses like you've described. Feedback welcome.


Cheers,
Franco