System Hang - zroot issue

Started by jdiesel, February 26, 2025, 05:12:45 PM

Seems like I had a zroot error last night, which resulted in a complete hang.

Solaris: WARNING: Pool 'zroot' has encountered an uncorrectable I/O failure and has been suspended.
Pool 'zroot' has encountered an uncorrectable I/O failure and has been suspended.

I actually do not have most of these logs locally. I do remote logging, and that system captured the logs just prior to the hang; the last entry was at 00:15:18 local time.
I do not seem to have access to disk hygiene, and this could be relevant?
I restarted and all seems just fine now.


2025-02-26T01:23:13-05:00 firewall kernel - - [meta sequenceId="15"] AMD Features2=0x121<LAHF,ABM,Prefetch>
2025-02-26T01:23:13-05:00 firewall kernel - - [meta sequenceId="14"] AMD Features=0x2c100800<SYSCALL,NX,Page1GB,RDTSCP,LM>
2025-02-26T01:23:13-05:00 firewall kernel - - [meta sequenceId="13"] Features2=0x7ffafbbf<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,SDBG,FMA,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND>
2025-02-26T01:23:13-05:00 firewall kernel - - [meta sequenceId="12"] Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
2025-02-26T01:23:13-05:00 firewall kernel - - [meta sequenceId="11"] Origin="GenuineIntel" Id=0x406e3 Family=0x6 Model=0x4e Stepping=3
2025-02-26T01:23:13-05:00 firewall kernel - - [meta sequenceId="10"] CPU: Intel(R) Core(TM) i3-6006U CPU @ 2.00GHz (2000.00-MHz K8-class CPU)
2025-02-26T01:23:13-05:00 firewall kernel - - [meta sequenceId="9"] VT(efifb): resolution 800x600
2025-02-26T01:23:13-05:00 firewall kernel - - [meta sequenceId="8"] FreeBSD clang version 18.1.6 (https://github.com/llvm/llvm-project.git llvmorg-18.1.6-0-g1118c2e05e67)
2025-02-26T01:23:13-05:00 firewall kernel - - [meta sequenceId="7"] FreeBSD 14.2-RELEASE-p1 stable/25.1-n269632-cc316253c68 SMP amd64
2025-02-26T01:23:13-05:00 firewall kernel - - [meta sequenceId="6"] FreeBSD is a registered trademark of The FreeBSD Foundation.
2025-02-26T01:23:13-05:00 firewall kernel - - [meta sequenceId="5"] The Regents of the University of California. All rights reserved.
2025-02-26T01:23:13-05:00 firewall kernel - - [meta sequenceId="4"] Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
2025-02-26T01:23:13-05:00 firewall kernel - - [meta sequenceId="3"] Copyright (c) 1992-2023 The FreeBSD Project.
2025-02-26T01:23:13-05:00 firewall kernel - - [meta sequenceId="2"] ---<<BOOT>>---
2025-02-26T01:23:13-05:00 firewall syslog-ng 11475 - [meta sequenceId="1"] syslog-ng starting up; version='4.8.1'
2025-02-26T00:15:18-05:00 firewall kernel - - [meta sequenceId="6485644"] <7>sonewconn: pcb 0xfffff80017f06a80 ([::]:53 (proto 6)): Listen queue overflow: 193 already in queue awaiting acceptance (139 occurrences), euid 0, rgid 0, jail 0
2025-02-26T00:14:18-05:00 firewall kernel - - [meta sequenceId="6485643"] <7>sonewconn: pcb 0xfffff80017f06a80 ([::]:53 (proto 6)): Listen queue overflow: 193 already in queue awaiting acceptance (176 occurrences), euid 0, rgid 0, jail 0
2025-02-26T00:14:01-05:00 firewall kernel - - [meta sequenceId="6485642"] Solaris: WARNING: Pool 'zroot' has encountered an uncorrectable I/O failure and has been suspended.
2025-02-26T00:14:01-05:00 firewall kernel - - [meta sequenceId="6485641"] (aprobe0:ahcich0:0:0:0): Error 5, Retries exhausted
2025-02-26T00:14:01-05:00 firewall kernel - - [meta sequenceId="6485640"] (aprobe0:ahcich0:0:0:0): CAM status: Command timeout
2025-02-26T00:14:01-05:00 firewall kernel - - [meta sequenceId="6485639"] (aprobe0:ahcich0:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
2025-02-26T00:14:01-05:00 firewall kernel - - [meta sequenceId="6485638"] ahcich0: is 00000000 cs 00000200 ss 00000000 rs 00000200 tfd 1d0 serr 00000000 cmd 0000c917
2025-02-26T00:14:01-05:00 firewall kernel - - [meta sequenceId="6485637"] ahcich0: Timeout on slot 9 port 0
2025-02-26T00:13:31-05:00 firewall kernel - - [meta sequenceId="6485636"] (aprobe0:ahcich0:0:0:0): Retrying command, 0 more tries remain
2025-02-26T00:13:31-05:00 firewall kernel - - [meta sequenceId="6485635"] (aprobe0:ahcich0:0:0:0): CAM status: Command timeout
2025-02-26T00:13:31-05:00 firewall kernel - - [meta sequenceId="6485634"] (aprobe0:ahcich0:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
2025-02-26T00:13:31-05:00 firewall kernel - - [meta sequenceId="6485633"] ahcich0: is 00000000 cs 00000100 ss 00000000 rs 00000100 tfd 1d0 serr 00000000 cmd 0000c817
2025-02-26T00:13:31-05:00 firewall kernel - - [meta sequenceId="6485632"] ahcich0: Timeout on slot 8 port 0
2025-02-26T00:13:17-05:00 firewall kernel - - [meta sequenceId="6485631"] <7>sonewconn: pcb 0xfffff80017f06a80 ([::]:53 (proto 6)): Listen queue overflow: 193 already in queue awaiting acceptance (204 occurrences), euid 0, rgid 0, jail 0
2025-02-26T00:13:01-05:00 firewall kernel - - [meta sequenceId="6485630"] Solaris: WARNING: Pool 'zroot' has encountered an uncorrectable I/O failure and has been suspended.
2025-02-26T00:13:01-05:00 firewall kernel - - [meta sequenceId="6485629"] Solaris: WARNING: Pool 'zroot' has encountered an uncorrectable I/O failure and has been suspended.
2025-02-26T00:13:01-05:00 firewall kernel - - [meta sequenceId="6485628"] Pool 'zroot' has encountered an uncorrectable I/O failure and has been suspended.
2025-02-26T00:13:01-05:00 firewall kernel - - [meta sequenceId="6485627"] (ada0:ahcich0:0:0:0): Error 6, Periph was invalidated
2025-02-26T00:13:01-05:00 firewall kernel - - [meta sequenceId="6485626"] Solaris: WARNING: (ada0:ahcich0:0:0:0): CAM status: Unconditionally Re-queue Request
2025-02-26T00:13:01-05:00 firewall kernel - - [meta sequenceId="6485625"] (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 f0 28 63 f6 40 06 00 00 00 00 00
2025-02-26T00:13:01-05:00 firewall kernel - - [meta sequenceId="6485624"] (ada0:ahcich0:0:0:0): Error 6, Periph was invalidated
2025-02-26T00:13:01-05:00 firewall kernel - - [meta sequenceId="6485623"] (ada0:ahcich0:0:0:0): CAM status: Unconditionally Re-queue Request
2025-02-26T00:13:01-05:00 firewall kernel - - [meta sequenceId="6485622"] (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 f0 38 62 f6 40 06 00 00 00 00 00
2025-02-26T00:13:01-05:00 firewall kernel - - [meta sequenceId="6485621"] (ada0:ahcich0:0:0:0): Error 6, Periph was invalidated
2025-02-26T00:13:01-05:00 firewall kernel - - [meta sequenceId="6485620"] (ada0:ahcich0:0:0:0): CAM status: Command timeout
2025-02-26T00:13:01-05:00 firewall kernel - - [meta sequenceId="6485619"] (ada0:ahcich0:0:0:0): READ_FPDMA_QUEUED. ACB: 60 10 10 be e7 40 0e 00 00 00 00 00
2025-02-26T00:13:01-05:00 firewall kernel - - [meta sequenceId="6485618"] (ada0:ahcich0:0:0:0): Error 6, Periph was invalidated
2025-02-26T00:13:01-05:00 firewall kernel - - [meta sequenceId="6485617"] (ada0:ahcich0:0:0:0): CAM status: Unconditionally Re-queue Request
2025-02-26T00:13:01-05:00 firewall kernel - - [meta sequenceId="6485616"] (ada0:ahcich0:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00
2025-02-26T00:13:01-05:00 firewall kernel - - [meta sequenceId="6485615"] ahcich0: is 00000000 cs 00000000 ss 000000e0 rs 000000e0 tfd 40 serr 00000000 cmd 0000c717
2025-02-26T00:13:01-05:00 firewall kernel - - [meta sequenceId="6485614"] ahcich0: Timeout on slot 5 port 0
2025-02-26T00:12:31-05:00 firewall kernel - - [meta sequenceId="6485613"] (aprobe0:ahcich0:0:0:0): Retrying command, 0 more tries remain
2025-02-26T00:12:31-05:00 firewall kernel - - [meta sequenceId="6485612"] (aprobe0:ahcich0:0:0:0): CAM status: Command timeout
2025-02-26T00:12:31-05:00 firewall kernel - - [meta sequenceId="6485611"] (aprobe0:ahcich0:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
2025-02-26T00:12:31-05:00 firewall kernel - - [meta sequenceId="6485610"] ahcich0: is 00000000 cs 80000000 ss 00000000 rs 80000000 tfd 1d0 serr 00000000 cmd 0000df17
2025-02-26T00:12:31-05:00 firewall kernel - - [meta sequenceId="6485609"] ahcich0: Timeout on slot 31 port 0
2025-02-26T00:12:17-05:00 firewall kernel - - [meta sequenceId="6485608"] <7>sonewconn: pcb 0xfffff80017f06a80 ([::]:53 (proto 6)): Listen queue overflow: 193 already in queue awaiting acceptance (154 occurrences), euid 0, rgid 0, jail 0
2025-02-26T00:12:01-05:00 firewall kernel - - [meta sequenceId="6485607"] (ada0:ahcich0:0:0:0): Error 6, Periph was invalidated
2025-02-26T00:12:01-05:00 firewall kernel - - [meta sequenceId="6485606"] (ada0:ahcich0:0:0:0): CAM status: Command timeout
2025-02-26T00:12:01-05:00 firewall kernel - - [meta sequenceId="6485605"] (ada0:ahcich0:0:0:0): READ_DMA. ACB: c8 00 10 2a 08 41 00 00 00 00 10 00
2025-02-26T00:12:01-05:00 firewall kernel - - [meta sequenceId="6485604"] (ada0:ahcich0:0:0:0): Error 6, Periph was invalidated
2025-02-26T00:12:01-05:00 firewall kernel - - [meta sequenceId="6485603"] (ada0:ahcich0:0:0:0): CAM status: Unconditionally Re-queue Request
2025-02-26T00:12:01-05:00 firewall kernel - - [meta sequenceId="6485602"] (ada0:ahcich0:0:0:0): READ_FPDMA_QUEUED. ACB: 60 10 10 bc e7 40 0e 00 00 00 00 00
2025-02-26T00:12:01-05:00 firewall kernel - - [meta sequenceId="6485601"] ahcich0: is 00000000 cs 40000000 ss 00000000 rs 40000000 tfd d0 serr 00000000 cmd 0000de17
2025-02-26T00:12:01-05:00 firewall kernel - - [meta sequenceId="6485600"] ahcich0: Timeout on slot 30 port 0
2025-02-26T00:11:50-05:00 firewall filterlog 91002 - [meta sequenceId="6485599"] 117,,,78ab8ef80d8e923479f99b265b627394,igb3,match,pass,out,4,0x0,,62,38021,0,DF,6,tcp,60,216.212.95.186,37.120.205.197,57294,36069,0,S,998208525,,64240,,mss;sackOK;TS;nop;wscale
2025-02-26T00:11:48-05:00 firewall filterlog 91002 - [meta sequenceId="6485598"] 114,,,fae559338f65e11c53669fc3642c93c2,vlan0.2,match,pass,out,4,0x0,,127,19463,0,DF,6,tcp,52,10.20.1.230,10.20.2.128,3435,24800,0,S,342899722,,65535,,mss;nop;wscale;nop;nop;sackOK

Your disk/SSD is probably dying and needs to be replaced. A controller failure is also possible but less common.
Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)

Those CAM errors are saying the system could not talk to the storage subsystem, which led to the ZFS failure.
It could be a transient problem: a loose connection, for instance.
If it is not transient: a disk, cable or storage controller is failing, but has not yet failed completely.
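
If it helps to see which device the timeouts are piling up on, here is a rough Python sketch that tallies the "CAM status" lines from a remote syslog capture like the one posted above. The function name `summarize_cam_status` and the regular expression are my own illustration, not part of any FreeBSD or OPNsense tool, and the pattern assumes the messages keep the `(periph:channel:...): CAM status: ...` shape seen in this thread.

```python
import re
from collections import Counter

# Assumed message shape, taken from the log lines in this thread:
#   (ada0:ahcich0:0:0:0): CAM status: Command timeout
CAM_STATUS = re.compile(
    r"\((?P<periph>\w+):(?P<channel>\w+):[\d:]+\): CAM status: (?P<status>.+?)\s*$"
)


def summarize_cam_status(lines):
    """Count CAM status messages per (peripheral, channel, status)."""
    counts = Counter()
    for line in lines:
        match = CAM_STATUS.search(line)
        if match:
            key = (match["periph"], match["channel"], match["status"])
            counts[key] += 1
    return counts


# A few message bodies copied from the log above (syslog prefix trimmed).
sample = [
    "ahcich0: Timeout on slot 9 port 0",
    "(aprobe0:ahcich0:0:0:0): CAM status: Command timeout",
    "(aprobe0:ahcich0:0:0:0): CAM status: Command timeout",
    "(ada0:ahcich0:0:0:0): CAM status: Command timeout",
    "(ada0:ahcich0:0:0:0): CAM status: Unconditionally Re-queue Request",
]

for (periph, channel, status), n in sorted(summarize_cam_status(sample).items()):
    print(f"{periph} on {channel}: {status} x{n}")
```

Feeding it the full log file (for example `summarize_cam_status(open(path))`, with `path` pointing at your own capture) should show whether every error sits on the same channel. That alone cannot tell a dying disk from a bad cable or controller, so treat it as a starting point only.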