Messages - zzyzx

#1
Today the GUI was very slow, practically unresponsive. When the dashboard finally loaded, all the tables were empty. Login via SSH had no problems.

Before the reboot, this was in the system log. Do these entries indicate a problem?

<13>1 2023-06-26T16:07:54-07:00 thechekt.lunas.lan dhclient 43547 - [meta sequenceId="1"] Creating resolv.conf
<11>1 2023-06-26T21:03:00-07:00 thechekt.lunas.lan configctl 79000 - [meta sequenceId="1"] error in configd communication  Traceback (most recent call last):   File "/usr/local/sbin/configctl", line 66, in exec_config_cmd     line = sock.recv(65536).decode() socket.timeout: timed out
<11>1 2023-06-26T22:02:00-07:00 thechekt.lunas.lan configctl 51892 - [meta sequenceId="1"] error in configd communication  Traceback (most recent call last):   File "/usr/local/sbin/configctl", line 66, in exec_config_cmd     line = sock.recv(65536).decode() socket.timeout: timed out
<11>1 2023-06-26T22:03:00-07:00 thechekt.lunas.lan configctl 36621 - [meta sequenceId="1"] error in configd communication  Traceback (most recent call last):   File "/usr/local/sbin/configctl", line 66, in exec_config_cmd     line = sock.recv(65536).decode() socket.timeout: timed out


Earlier entries were the usual repeats:

pid 29620 (python3.9), jid 0, uid 0: exited on signal 11 (core dumped)
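
One way to see how often entries like these recur is to tally them by process name and severity. The following is a minimal sketch, assuming the log lines use the RFC 5424-style header shown above (`<PRI>VERSION TIMESTAMP HOSTNAME APP PID ...`); the function name `tally_entries` and the abridged sample lines are illustrative only, not part of any firewall tooling.

```python
import re
from collections import Counter

# Header layout assumed from the entries quoted above:
# <PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID ...
HEADER = re.compile(
    r"^<(?P<pri>\d+)>\d+ (?P<ts>\S+) (?P<host>\S+) (?P<app>\S+) (?P<pid>\S+) "
)

def tally_entries(lines):
    """Count log entries per (app, severity); severity is PRI modulo 8."""
    counts = Counter()
    for line in lines:
        match = HEADER.match(line)
        if match is None:
            continue  # not a syslog header line (e.g. kernel messages)
        severity = int(match.group("pri")) % 8  # 3 = error, 5 = notice
        counts[(match.group("app"), severity)] += 1
    return counts

# Abridged copies of the entries from the post (messages shortened).
sample = [
    '<13>1 2023-06-26T16:07:54-07:00 thechekt.lunas.lan dhclient 43547 - [meta sequenceId="1"] Creating resolv.conf',
    '<11>1 2023-06-26T21:03:00-07:00 thechekt.lunas.lan configctl 79000 - [meta sequenceId="1"] error in configd communication',
    '<11>1 2023-06-26T22:02:00-07:00 thechekt.lunas.lan configctl 51892 - [meta sequenceId="1"] error in configd communication',
    '<11>1 2023-06-26T22:03:00-07:00 thechekt.lunas.lan configctl 36621 - [meta sequenceId="1"] error in configd communication',
    'pid 29620 (python3.9), jid 0, uid 0: exited on signal 11 (core dumped)',
]

for (app, severity), n in sorted(tally_entries(sample).items()):
    print(app, severity, n)
```

Lines without a syslog header, such as the kernel's "exited on signal 11" message, are skipped here and would need a separate pattern.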

Load average from top seemed OK. Temperatures are often on the high side, at 50-55C, but not crazy.


last pid: 82800;  load averages:  0.38,  0.38,  0.31                                                                                                up 0+19:21:47  22:22:23
49 processes:  1 running, 48 sleeping
CPU:  0.8% user,  0.0% nice,  2.4% system,  0.0% interrupt, 96.8% idle
Mem: 87M Active, 367M Inact, 622M Wired, 40K Buf, 6624M Free
ARC: 263M Total, 65M MFU, 162M MRU, 280K Anon, 2329K Header, 33M Other
     183M Compressed, 527M Uncompressed, 2.88:1 Ratio
Swap: 8192M Total, 8192M Free
#2
Thanks for the responses.

I agree, the ZFS filesystem issues are likely a symptom of another underlying issue. I'm swapping out hardware this weekend, and I'll run some RAM tests to see whether any culprits show up.

One thing I'm considering is that these lockups happen most frequently when WireGuard is in heavier use. It's hard to test, but I'll report back if something more conclusive surfaces.
#3
crash report!
#4
More info from the most recent lockup. Same symptoms: the firewall becomes unresponsive and the hardware is very hot. A hard reset often results in a kernel panic on reboot:
Solaris(panic): zfs: removing nonexistent segment from range tree (offset (4a7172000 size=1000)

although I think this is a result of the hard reset and not the root cause of the initial lockup.
#5
Hardware is a fitlet2 with a Celeron J3455 quad-core, 8GB RAM, and a 105GB SSD.

No SMART errors are listed. The only oddity I could see in dmesg and the system logs was this error, repeated multiple times:
pid 29620 (python3.9), jid 0, uid 0: exited on signal 11 (core dumped)

Which logs can I provide to help diagnose?

Thanks for the help.
#6
Since updating to the 23 series (maybe just coincidentally), my firewall frequently locks up (three times in the past month) and becomes unresponsive. When it does lock up, the hardware gets much hotter, so the CPU seems to be chewing on something.

When I (hard) reset, it sometimes recovers normally, but I've had to reinstall/restore twice now due to a kernel panic, I assume from the reset. What is the best way to diagnose the cause?

Thanks.