OPNsense Forum

Archive => 19.7 Legacy Series => Topic started by: sporkman on September 25, 2019, 11:23:38 pm

Title: [SOLVED] configd not running, won't start
Post by: sporkman on September 25, 2019, 11:23:38 pm
Logged in to the GUI today to see if the box is still panicking every night, and most of the data in the dashboard was blank.

In the "services" pane, I saw that "configd" was not running. When I try to start it, this is logged in the system log:

Code:
opnsense: /status_services.php: The command '/usr/local/etc/rc.d/configd start' returned exit code '1', the output was 'Starting configd.
Traceback (most recent call last):
  File "/usr/local/opnsense/service/configd.py", line 37, in <module>
    import logging
  File "/usr/local/lib/python3.7/logging/__init__.py", line 26, in <module>
    import sys, os, time, io, traceback, warnings, weakref, collections.abc
  File "/usr/local/lib/python3.7/traceback.py", line 5, in <module>
    import linecache
  File "/usr/local/lib/python3.7/linecache.py", line 8, in <module>
    import functools
ModuleNotFoundError: No module named 'functools'
/usr/local/etc/rc.d/configd: WARNING: failed to start configd'

Any idea what that's about?

No access via SSH either; I assume something is broken there as well.
Title: Re: configd not running, won't start
Post by: franco on September 25, 2019, 11:26:08 pm
Can you run this from the console?

# pkg check -da
# pkg check -sa

I'm assuming multiple files are missing or have clobbered checksums.


Cheers,
Franco
Title: Re: configd not running, won't start
Post by: sporkman on September 26, 2019, 01:59:50 am
Yep, the first one is OK, the second fails:

Code:
root@SporkLab:/home/sporkadmin # pkg check -da
Checking all packages: 100%
root@SporkLab:/home/sporkadmin # pkg check -sa
Checking all packages:  83%
py37-yaml-5.1: missing file /usr/local/lib/python3.7/site-packages/yaml/__pycache__/constructor.cpython-37.pyc
Checking all packages:  85%
python37-3.7.4: missing file /usr/local/lib/python3.7/__pycache__/optparse.cpython-37.opt-2.pyc
python37-3.7.4: missing file /usr/local/lib/python3.7/__pycache__/random.cpython-37.pyc
python37-3.7.4: missing file /usr/local/lib/python3.7/__pycache__/stat.cpython-37.opt-1.pyc
python37-3.7.4: missing file /usr/local/lib/python3.7/__pycache__/stat.cpython-37.opt-2.pyc
python37-3.7.4: missing file /usr/local/lib/python3.7/functools.py
python37-3.7.4: missing file /usr/local/lib/python3.7/lib2to3/pgen2/__pycache__/token.cpython-37.opt-1.pyc
Checking all packages: 100%
root@SporkLab:/home/sporkadmin
Title: Re: configd not running, won't start
Post by: sporkman on September 26, 2019, 04:43:55 am
I ran "pkg install -f pkgname" for every package with missing files or mismatches (a second run of "pkg check -sa" turned up a few more).
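A minimal sketch of that reinstall loop, assuming `pkg check -sa` reports problems as `name-version: missing file ...` (the form shown in the output above) or `name-version: checksum mismatch for ...` (that second wording is an assumption). The function name `affected_pkgs` is hypothetical; it prints each affected package once, with the version suffix stripped, so the list can be handed to `pkg install -f`.

```shell
#!/bin/sh
# affected_pkgs (hypothetical helper): read `pkg check -sa` output on stdin,
# keep only "missing file" / "checksum mismatch" lines (the latter wording is
# an assumption), and print each package name once with its version removed,
# e.g. py37-yaml-5.1 -> py37-yaml. Progress lines are ignored.
affected_pkgs() {
    sed -n -E 's/^([^ :]+): (missing file|checksum mismatch).*/\1/p' |
        sed -E 's/-[^-]+$//' |
        LC_ALL=C sort -u
}

# Example run (as root): force-reinstall every affected package, then re-check.
#   pkg check -sa 2>&1 | affected_pkgs | xargs -n 1 pkg install -fy
#   pkg check -sa
```

As happened here, a later run of `pkg check -sa` may turn up more damaged packages, so repeat until it reports nothing.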

I assume this is corruption/data loss from the box panicking.
Title: Re: configd not running, won't start
Post by: banym on September 26, 2019, 02:56:49 pm
Was configd working after reinstalling the packages?
Title: Re: configd not running, won't start
Post by: sporkman on September 26, 2019, 10:13:43 pm
Yep, started right up.
Title: Re: configd not running, won't start
Post by: franco on September 27, 2019, 08:54:31 am
> I assume this is corruption/data loss from the box panicking.

Indeed. Glad the package manager still worked to recover the system. :)


Cheers,
Franco
Title: Re: [SOLVED] configd not running, won't start
Post by: sporkman on September 27, 2019, 10:18:51 pm
So those panics...

That seems abnormal, and it's new to this box since migrating from pfSense.

It's an old Dell SFF, Core2Duo. If I get a clean bill of health on the RAM from memtest86+, what's next?

I did report the last two with the built-in "report this problem" tool; nothing stood out to me, but what do I know?
Title: Re: [SOLVED] configd not running, won't start
Post by: packet loss on September 28, 2019, 05:56:04 pm
sporkman, you can submit any issues you encounter with OPNsense at the following locations:

OPNsense Issues - GitHub:
Title: Re: [SOLVED] configd not running, won't start
Post by: sporkman on September 29, 2019, 01:57:47 am
I will, but I'm looking for some guidance on what I can do on the hardware side first.

I just ran a single pass of memtest86+ and that completed fine.

I ran "stresscpu" for about 20 minutes and nothing odd happened.

I ran the manufacturer's disk diag tools and they came up empty.

I should probably run memtest again and let it do like 10 passes or so to be sure.

After that, though, my gut feeling is that it's a HardenedBSD issue; perhaps my older hardware is an edge case. In addition to the panics, I occasionally see a message about Python crashing with SIGSEGV, with an extra note from HBSD's stack protection or something similar.

Anyhow, some pointers before I waste time on the issue tracker would be appreciated.
Title: Re: [SOLVED] configd not running, won't start
Post by: sporkman on September 30, 2019, 12:16:42 am
Just noting: when I went out last night, I let memtest run for a few hours, still no errors.

Do I open a github issue or no?
Title: Re: [SOLVED] configd not running, won't start
Post by: franco on September 30, 2019, 04:52:05 pm
Panics are in the domain of the operating system. Some may come from bad use of the software, but ideally the OS shouldn't be prone to "denial of service" from userland. Sometimes it is, but it is still a bug in the OS to react that way.

HardenedBSD may be involved in panics, but I've rarely seen that to be the case.

The bulk of panics comes from FreeBSD base code by nature of the code base: drivers and networking.

Now, the question is what type of panic you are getting. Do you have a stack trace? It can help when looking for clues on e.g. https://bugs.freebsd.org/bugzilla/ and when isolating the crashing component. It may be a network card or some other auxiliary hardware that doesn't crash during memtests but does so reliably when you put traffic through it. The possibilities are many, so the stack trace is key.

Your best bet, other than identifying the problematic piece of hardware (if any), is waiting for the next OS update. We've scheduled HardenedBSD 12.1 for OPNsense 20.1, which is quite a jump forward in the code base, so if it is a software error there's a good chance the bug has been fixed.


Cheers,
Franco
Title: Re: [SOLVED] configd not running, won't start
Post by: sporkman on September 30, 2019, 10:36:47 pm
Is there any way for me to retrieve the stack traces that were submitted via the built-in reporting system? I sent the last two in.

Not sure if OPNsense runs the "daily" scripts like FreeBSD does, but both panics happened shortly after 3 a.m., which is when those scripts kick off on stock FreeBSD.

Prior to the crash there's usually a bunch of "signal 11" crashes of configd and other applications.

Older releases had this issue too, with a similar pattern: a few panics every night after updating, then fewer panics over time, which seems odd. This started with my move from pfSense, for whatever that's worth; I know the base OS in these two firewall distros is pretty different at this point.
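One way to check the 3 a.m. correlation against the "signal 11" crashes is to tally them by hour of day. A minimal sketch, assuming syslog-style lines and the FreeBSD kernel message that ends in `exited on signal 11`; the function name `segv_by_hour` and the file name `system-log.txt` are hypothetical, and how the log is exported to plain text depends on the OPNsense version.

```shell
#!/bin/sh
# segv_by_hour (hypothetical helper): read syslog-style text on stdin, e.g.
#   Sep 30 03:01:02 fw kernel: pid 123 (python3.7), uid 0: exited on signal 11
# and print "HH count" for each hour of day that saw a signal 11 (SIGSEGV) exit,
# aggregated across all days in the input.
segv_by_hour() {
    awk '$0 ~ /exited on signal 11/ && $0 !~ /exited on signal 11[0-9]/ {
             split($3, t, ":")   # $3 is the HH:MM:SS timestamp field
             n[t[1]]++
         }
         END { for (h in n) print h, n[h] }' |
        LC_ALL=C sort
}

# Example: segv_by_hour < system-log.txt
```

A cluster in the `03` hour would be consistent with (though not proof of) a link to the nightly jobs.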
Title: Re: [SOLVED] configd not running, won't start
Post by: sporkman on October 18, 2019, 02:59:21 am
I think I might have found something here.

Putting together a few things:

- lots of panics, which means just repeatedly trashing the filesystem
- background fsck on "/" (does it really work?)
- "pkg check" returning bad checksums after each panic
- today's panics happening during "health check", which likely walks large parts of the filesystem
- panics come in clusters; maybe after one of the last ones, fsck finally fixes something (but not everything, because UFS with SU does not get the attention ZFS does these days)
- SU is enabled on root, which may or may not be a good idea depending on who you ask

I have a theory... One of the first panics after moving from pfSense was just random; it happens. But it mangled something, which at some point caused another panic, which in turn led to more corruption of the filesystem and another panic, and so on.

All the panics I have logged right now look like this:

panic: ufs_dirbad: /: bad dir ino 7877697 at offset 0: mangled entry
panic: ufs_dirbad: /: bad dir ino 7877697 at offset 0: mangled entry
panic: handle_written_inodeblock: Invalid link count 65535 for inodedep 0xfffff8001da83000
cpuid = 0

I wish I had access to some older ones, but these are all clearly dirty filesystem issues, no?

Going to the garage to shut down and run fsck in single-user mode...
Title: Re: [SOLVED] configd not running, won't start
Post by: sporkman on October 18, 2019, 03:22:21 am
Also, is there an easy way from the command line to force a reinstall of the base system? I'd like to be sure I don't have any corrupt files hanging around. The base OS is not pkg-ified, right?
Title: Re: [SOLVED] configd not running, won't start
Post by: sporkman on October 21, 2019, 10:21:53 pm
And another panic yesterday, but no mention of UFS.

Code:
Fatal trap 9: general protection fault while in kernel mode
cpuid = 0; apic id = 00
instruction pointer = 0x20:0xffffffff80c1166d
stack pointer         = 0x28:0xfffffe0119a9b750
frame pointer         = 0x28:0xfffffe0119a9b7c0
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 52270 (ntpd)

Code:
db:0:kdb.enter.default>  bt
Tracing pid 52270 tid 100179 td 0xfffff8001df9c000
__mtx_lock_sleep() at __mtx_lock_sleep+0xcd/frame 0xfffffe0119a9b7c0
_cv_wait_sig() at _cv_wait_sig+0x1f8/frame 0xfffffe0119a9b810
seltdwait() at seltdwait+0xc3/frame 0xfffffe0119a9b850
kern_select() at kern_select+0x850/frame 0xfffffe0119a9ba40
sys_select() at sys_select+0x56/frame 0xfffffe0119a9ba80
amd64_syscall() at amd64_syscall+0xa38/frame 0xfffffe0119a9bbb0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe0119a9bbb0
--- syscall (93, FreeBSD ELF64, sys_select), rip = 0x2e98a427d0a, rsp = 0x7249552dfca8, rbp = 0x7249552dfce0 ---

Submitted via the built-in reporting tool.