OPNsense Forum

English Forums => 24.7, 24.10 Production Series => Topic started by: chemlud on August 21, 2024, 05:16:52 PM

Title: OPNsense suddenly unreachable - How to debug
Post by: chemlud on August 21, 2024, 05:16:52 PM
Hy!

Have here a Dell Optiplex box (doing fine for years) that failed some weeks ago after an update (new SSD installed some weeks before). As a result I reinstalled another fresh SSD and ZFS some weeks ago. Afterwards box was stable.

About 1 week ago I upgraded to 24.7.1 and today all of a sudden one LAN client was unreachable, a reboot resulted in no IP via DHCP. Moreover I could not reach the OPNsense via GUI or serial console. Rebooting (hard reset) brought the OPNsense back, but I want to learn what is failing here.

Where to start looking? Which logs might help?
Title: Re: OPNsense suddenly unreachable - How to debug
Post by: crazywolf13 on August 21, 2024, 08:03:05 PM
What are the boot logs showing?
Title: Re: OPNsense suddenly unreachable - How to debug
Post by: chemlud on August 22, 2024, 08:44:54 AM
...looks normal, no errors reported, just in the end a little chatter (Suricata at work, I guess)

# dmesg
...
845.542461 [ 850] iflib_netmap_config       txr 2 rxr 2 txd 1024 rxd 1024 rbufsz 2048
845.549880 [ 850] iflib_netmap_config       txr 2 rxr 2 txd 1024 rxd 1024 rbufsz 2048
845.557405 [ 850] iflib_netmap_config       txr 2 rxr 2 txd 1024 rxd 1024 rbufsz 2048
845.678136 [ 850] iflib_netmap_config       txr 2 rxr 2 txd 1024 rxd 1024 rbufsz 2048
igb4: link state changed to DOWN
846.207238 [ 850] iflib_netmap_config       txr 2 rxr 2 txd 1024 rxd 1024 rbufsz 2048
846.214659 [ 850] iflib_netmap_config       txr 2 rxr 2 txd 1024 rxd 1024 rbufsz 2048
846.222222 [ 850] iflib_netmap_config       txr 2 rxr 2 txd 1024 rxd 1024 rbufsz 2048
846.792433 [ 850] iflib_netmap_config       txr 2 rxr 2 txd 1024 rxd 1024 rbufsz 2048
846.800203 [ 850] iflib_netmap_config       txr 2 rxr 2 txd 1024 rxd 1024 rbufsz 2048
846.807833 [ 850] iflib_netmap_config       txr 2 rxr 2 txd 1024 rxd 1024 rbufsz 2048
846.820656 [ 850] iflib_netmap_config       txr 2 rxr 2 txd 1024 rxd 1024 rbufsz 2048
igb3: link state changed to DOWN
847.368629 [ 850] iflib_netmap_config       txr 2 rxr 2 txd 1024 rxd 1024 rbufsz 2048
847.376714 [ 850] iflib_netmap_config       txr 2 rxr 2 txd 1024 rxd 1024 rbufsz 2048
847.384138 [ 850] iflib_netmap_config       txr 2 rxr 2 txd 1024 rxd 1024 rbufsz 2048
847.917440 [ 850] iflib_netmap_config       txr 2 rxr 2 txd 1024 rxd 1024 rbufsz 2048
847.924851 [ 850] iflib_netmap_config       txr 2 rxr 2 txd 1024 rxd 1024 rbufsz 2048
847.932375 [ 850] iflib_netmap_config       txr 2 rxr 2 txd 1024 rxd 1024 rbufsz 2048
847.944330 [ 850] iflib_netmap_config       txr 2 rxr 2 txd 1024 rxd 1024 rbufsz 2048
igb2: link state changed to DOWN
848.469548 [ 850] iflib_netmap_config       txr 2 rxr 2 txd 1024 rxd 1024 rbufsz 2048
848.476990 [ 850] iflib_netmap_config       txr 2 rxr 2 txd 1024 rxd 1024 rbufsz 2048
848.484500 [ 850] iflib_netmap_config       txr 2 rxr 2 txd 1024 rxd 1024 rbufsz 2048
849.031964 [ 850] iflib_netmap_config       txr 2 rxr 2 txd 1024 rxd 1024 rbufsz 2048
849.039428 [ 850] iflib_netmap_config       txr 2 rxr 2 txd 1024 rxd 1024 rbufsz 2048
849.046955 [ 850] iflib_netmap_config       txr 2 rxr 2 txd 1024 rxd 1024 rbufsz 2048
849.060860 [ 850] iflib_netmap_config       txr 2 rxr 2 txd 1024 rxd 1024 rbufsz 2048
igb0: link state changed to DOWN
849.974814 [ 850] iflib_netmap_config       txr 2 rxr 2 txd 1024 rxd 1024 rbufsz 2048
849.982272 [ 850] iflib_netmap_config       txr 2 rxr 2 txd 1024 rxd 1024 rbufsz 2048
849.989856 [ 850] iflib_netmap_config       txr 2 rxr 2 txd 1024 rxd 1024 rbufsz 2048
igb4: link state changed to UP
850.891927 [ 850] iflib_netmap_config       txr 1 rxr 1 txd 1024 rxd 1024 rbufsz 2048
850.899350 [ 850] iflib_netmap_config       txr 1 rxr 1 txd 1024 rxd 1024 rbufsz 2048
850.906881 [ 850] iflib_netmap_config       txr 1 rxr 1 txd 1024 rxd 1024 rbufsz 2048
850.918983 [ 850] iflib_netmap_config       txr 1 rxr 1 txd 1024 rxd 1024 rbufsz 2048
igb3: link state changed to UP
em0: link state changed to DOWN
igb2: link state changed to UP
igb0: link state changed to UP
853.993664 [ 850] iflib_netmap_config       txr 1 rxr 1 txd 1024 rxd 1024 rbufsz 48
854.001073 [ 850] iflib_netmap_config       txr 1 rxr 1 txd 1024 rxd 1024 rbufsz 2048
854.008590 [ 850] iflib_netmap_config       txr 1 rxr 1 txd 1024 rxd 1024 rbufsz 2048
em0: link state changed to UP


What would be the equivalent to

journalctl -r -b -1

in BSD, maybe there is something in the systemlogs before the box apparently crashed?
Title: Re: OPNsense suddenly unreachable - How to debug
Post by: doktornotor on August 22, 2024, 08:52:39 AM
Quote from: chemlud on August 22, 2024, 08:44:54 AM
What would be the equivalent to

journalctl -r -b -1

in BSD, maybe there is something in the systemlogs before the box apparently crashed?

Fortunately, none...  :-X ::) I'd suggest to use the old-fashioned grep, tail or whatever, looking at /var/log/*/latest.log

Latest 20 lines from each log:

# tail -n 20 /var/log/*/latest.log

Title: Re: OPNsense suddenly unreachable - How to debug
Post by: chemlud on August 22, 2024, 09:02:15 AM
....I didn't invent systemd :-P only have to use it...

In the last 20 lines of each log nothing special.

Is there a way to view specifically the logs before the (second) last boot?

It crashed yesterday afternoon, in the evening I updated to 24.7.2 (including reboot), so the interesting stuff might be much more than 20 lines away...
Title: Re: OPNsense suddenly unreachable - How to debug
Post by: doktornotor on August 22, 2024, 09:14:26 AM
Quote from: chemlud on August 22, 2024, 09:02:15 AM
....I didn't invent systemd :-P only have to use it...

;D ;D ;D

Quote from: chemlud on August 21, 2024, 05:16:52 PM
I could not reach the OPNsense via serial console.

Well, that is rather weird. Maybe the hardware has had it overall... not just the SSD. Perhaps run some memtests, CPU stress tests or whatever to how it goes.
Title: Re: OPNsense suddenly unreachable - How to debug
Post by: chemlud on August 22, 2024, 10:14:17 AM
Had a look in the system logs for yesterday, especially afternoon, nothing special in there, some unbound error

# tail -n 300 /var/log/system/system_20240821.log

....
<11>1 2024-08-21T18:14:51+02:00 OPN.arpa opnsense 3186 - [meta sequenceId="159"] /usr/local/etc/rc.linkup: The command '/sbin/mount -r -t nullfs '/lib' '/var/unbound/lib'' returned exit code '1', the output was 'mount_nullfs: /var/unbound/lib: Resource d'
<12>1 2024-08-21T18:14:51+02:00 OPN.arpa opnsense 19375 - [meta sequenceId="160"] /usr/local/etc/rc.newwanip: warning: ignoring missing default tunable request: debug.pfftpproxy
<12>1 2024-08-21T18:14:51+02:00 OPN.arpa opnsense 3186 - [meta sequenceId="161"] /usr/local/etc/rc.linkup: warning: ignoring missing default tunable request: debug.pfftpproxy
<13>1 2024-08-21T18:14:53+02:00 OPN.arpa opnsense 19375 - [meta sequenceId="162"] /usr/local/etc/rc.newwanip: plugins_configure newwanip (execute task : vxlan_configure_do())
<13>1 2024-08-21T18:14:53+02:00 OPN.arpa opnsense 19375 - [meta sequenceId="163"] /usr/local/etc/rc.newwanip: plugins_configure newwanip (execute task : webgui_configure_do(,wan))
<13>1 2024-08-21T18:14:53+02:00 OPN.arpa opnsense 19375 - [meta sequenceId="164"] /usr/local/etc/rc.newwanip: plugins_configure newwanip (execute task : wireguard_sync(,wan))
....


...but that should not block the serial console, huh?

Boss says: No problem...

https://forum.opnsense.org/index.php?topic=39469.msg196003#msg196003

Think of changing hardware (except NICs, RAM and SSD) and see what happenz next.
Title: Re: OPNsense suddenly unreachable - How to debug
Post by: doktornotor on August 22, 2024, 10:50:18 AM
Quote from: chemlud on August 22, 2024, 09:02:15 AM
It crashed yesterday afternoon, in the evening I updated to 24.7.2 (including reboot), so the interesting stuff might be much more than 20 lines away...

Well, there are timestamps, use them with grep to get something around the time it got frozen. The unbound bind-mount remount is certainly not causing the freeze.

grep 2024-08-21 /var/log/*/latest.log

or more specific, incl. hour, minutes...
Title: Re: OPNsense suddenly unreachable - How to debug
Post by: chemlud on August 22, 2024, 11:05:58 AM
Quote from: doktornotor on August 22, 2024, 10:50:18 AM
...

grep 2024-08-21 /var/log/*/latest.log

or more specific, incl. hour, minutes...

Thanks! What would be the grep for anything after 10 am on 21-AUG-2024?

If there is nothing in system.log, what should freeze the whole box and write something meaningful to a log before saying bye-bye?

I changed the hardware now (keeping SSD, RAM and NICs), hope that the nightmare is over for now.

Need a fresh spare box though...

PS: You should really add your avatar here. I liked it... :-p

Perfect! :-D