Messages - sporkman

#16
Also, is there an easy way to force a reinstall of the base from the command line? I'd like to be sure I don't have any corrupt files hanging around. The base OS is not pkg-ified, right?
#17
I think I might have found something here.

Putting together a few things:

- lots of panics, which means just repeatedly trashing the filesystem
- background fsck on "/" (does it really work?)
- "pkg check" returning bad checksums after each panic
- today's panics happening during "health check", which likely walks large parts of the filesystem
- panics come in clusters - maybe after one of the last ones fsck finally fixes something (but not everything, because UFS w/SU does not get the attention ZFS does these days)
- SU is enabled on root, which may or may not be a good idea depending on who you ask

I have a theory... One of the first panics after moving from pfsense was just random - it happens. But it mangled something, which at some point caused another panic, which then led to more corruption of the fs and another panic, etc.

All the panics I have logged right now look like this:

panic: ufs_dirbad: /: bad dir ino 7877697 at offset 0: mangled entry
panic: ufs_dirbad: /: bad dir ino 7877697 at offset 0: mangled entry
panic: handle_written_inodeblock: Invalid link count 65535 for inodedep 0xfffff8001da83000
cpuid = 0

I wish I had access to some older ones, but these are all clearly dirty filesystem issues, no?
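One quick way to sanity-check that claim is to group the logged panic strings by the kernel function that raised them. The sketch below is only illustrative: the panic lines are copied from this post, and the `FS_FUNCTIONS` set is an assumption listing the two function names seen here (`ufs_dirbad` belongs to the UFS directory code and `handle_written_inodeblock` to the soft-updates code), not an exhaustive list.

```python
import re
from collections import Counter

# Panic lines copied from the post above.
PANICS = [
    "panic: ufs_dirbad: /: bad dir ino 7877697 at offset 0: mangled entry",
    "panic: ufs_dirbad: /: bad dir ino 7877697 at offset 0: mangled entry",
    "panic: handle_written_inodeblock: Invalid link count 65535 "
    "for inodedep 0xfffff8001da83000",
]

# Assumption: an illustrative set of function names treated as
# filesystem-related (UFS directory handling and soft updates).
FS_FUNCTIONS = {"ufs_dirbad", "handle_written_inodeblock"}

# Captures the function name that follows "panic: ".
PANIC_RE = re.compile(r"^panic: (\w+):")


def panic_functions(lines):
    """Count panics per originating kernel function."""
    counts = Counter()
    for line in lines:
        match = PANIC_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


counts = panic_functions(PANICS)
# True only if every panic seen came from a function in FS_FUNCTIONS.
all_fs = all(name in FS_FUNCTIONS for name in counts)
print(dict(counts), "all filesystem-related:", all_fs)
```

If older panics with other function names turn up, they would show as not filesystem-related until the set is extended, which is a reasonable prompt to look at them separately.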

Going to the garage to shut down and do an fsck in single user...
#18
19.7 Legacy Series / Re: Recommend me a VPN
October 11, 2019, 05:25:46 AM
Never had much luck with IPSEC, and since it's a kernel-level thing instead of a userland daemon, it's generally a real pain in the ass to debug.
#19
19.7 Legacy Series / Recommend me a VPN
October 10, 2019, 01:02:57 AM
I'm kind of annoyed with OpenVPN as I could never get it to work in my particular scenario for site-to-site use. I find it's great for getting from a coffee shop to my home net though, so I'll leave that as-is.

But I have 3-4 other sites where I would like to have site-to-site setups between my home (simple network - two WANs, one just for backup, one LAN net, that's it) and some remote networks.

My requirements are:

- The other end only has proprietary stuff that only does IPSEC, so I have to tunnel back to a FreeBSD host at the other end rather than the router (I know this complicates things)
- I need to filter the traffic on my end - I should be able to reach out, none of the remote sites should reach in
- I do need to add additional routes, accessed via the remote sites
- The other end is FreeBSD in all cases, so whatever I run has to support FreeBSD

OpenVPN confuses me in these types of use cases as it has its own internal/hidden routing table. If anyone thinks it could support the above, I'd give it a try, but I've had no luck with this on OPNsense (worked on pfsense, but not with any setup that let me filter traffic).

Or if you want to make a case for using the Cisco and SonicWall IPSEC VPNs at these sites instead, I'm all ears, but I fear interoperability headaches, and it seems like adding additional remote routes is a real pain.

Or pitch me on something I've not mentioned! :)
#20
Is there any way for me to retrieve the stack traces that were submitted via the built-in reporting system? I sent the last two in.

Not sure if opnsense runs the "daily" scripts like FreeBSD does, but both panics happened shortly after 3 a.m., which is when the scripts kick off on stock FreeBSD.
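A rough way to test the timing hunch is to check whether each crash falls in a window after the daily run starts. This is a sketch under assumptions: stock FreeBSD's `/etc/crontab` starts `periodic daily` at 03:01, but whether opnsense keeps that schedule is not confirmed here, the 30-minute window is arbitrary, and the crash times below are hypothetical placeholders rather than real data from this box.

```python
from datetime import datetime, time, timedelta

# Stock FreeBSD starts `periodic daily` at 03:01 (assumed unchanged here).
DAILY_START = time(3, 1)


def near_daily_run(crash, window=timedelta(minutes=30)):
    """Return True if `crash` is within `window` after the daily run starts."""
    start = datetime.combine(crash.date(), DAILY_START)
    return start <= crash <= start + window


# Hypothetical crash timestamps, for illustration only.
crashes = [
    datetime(2019, 1, 1, 3, 4),
    datetime(2019, 1, 2, 3, 12),
    datetime(2019, 1, 2, 14, 30),
]

for crash in crashes:
    print(crash.isoformat(), near_daily_run(crash))
```

Feeding in the real reboot or crash times from the logs would show whether the 3 a.m. pattern holds beyond the two panics mentioned.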

Prior to the crash there's usually a bunch of "signal 11" crashes of configd and other applications.

Older releases have had this issue and it's a similar pattern - a few panics every night after updating and, over time, fewer panics, which seems odd. This started with my move from pfsense, for whatever that's worth - I know the base OS in these two firewall distros is pretty different at this point.
#21
And just noting when I went out last night I let memtest run for a few hours, still no errors.

Do I open a github issue or no?
#22
I will, but I'm looking for some guidance on what I can do on the hardware side first.

I just ran a single pass of memtest86+ and that completed fine.

I ran "stresscpu" for about 20 minutes and nothing odd happened.

I ran the manufacturer's disk diag tools and they came up empty.

I should probably run memtest again and let it do like 10 passes or so to be sure.

After that though, my gut feeling is it's a HardenedBSD issue - perhaps my older hardware is an edge case or something. In addition to the panics, I'll occasionally see a message about python SIGSEGV'ing with an extra note from HBSD's stack protection or something.

Anyhow, some pointers before I waste time on the issue tracker would be appreciated.
#23
So those panics...

That seems kind of abnormal and something that's new to this box after migrating from pfsense.

It's an old Dell SFF, Core2Duo. If I get a clean bill of health on the RAM from memtest86+, what's next?

I did report the last two with the included "report this problem" tool, nothing stood out to me, but what do I know?
#24
19.7 Legacy Series / Re: configd not running, won't start
September 26, 2019, 10:13:43 PM
Yep, started right up.
#25
19.7 Legacy Series / Re: configd not running, won't start
September 26, 2019, 04:43:55 AM
I did a "pkg install -f pkgname" for all of them with mismatches (a later run of "pkg check -sa" turned up a few more).

I assume this is corruption/data loss from the box panicking.
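For anyone repeating this by hand, the list of packages can be pulled out of the `pkg check -sa` output with a few lines of script. The sketch below is a minimal illustration, not a tested tool: the sample lines are taken from the output posted in this thread, only the "missing file" message format seen there is matched (other messages would need their own pattern), and it only prints the `pkg install -f` commands rather than running them.

```python
import re

# Sample lines taken from the `pkg check -sa` output posted in this thread.
SAMPLE = """\
Checking all packages:  83%
py37-yaml-5.1: missing file /usr/local/lib/python3.7/site-packages/yaml/__pycache__/constructor.cpython-37.pyc
Checking all packages:  85%
python37-3.7.4: missing file /usr/local/lib/python3.7/functools.py
python37-3.7.4: missing file /usr/local/lib/python3.7/__pycache__/random.cpython-37.pyc
Checking all packages: 100%
"""

# Matches "<name>-<version>: missing file <path>" and captures name-version.
MISSING_RE = re.compile(r"^(\S+): missing file ")


def packages_to_reinstall(check_output):
    """Return sorted, de-duplicated package names with missing files."""
    names = set()
    for line in check_output.splitlines():
        match = MISSING_RE.match(line)
        if match:
            # Drop the trailing "-<version>" to get the bare package name.
            names.add(match.group(1).rsplit("-", 1)[0])
    return sorted(names)


for name in packages_to_reinstall(SAMPLE):
    # Printed for review only; nothing is executed.
    print(f"pkg install -f {name}")
```

Printing the commands first keeps a human in the loop, which seems wise on a box whose filesystem is already suspect.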
#26
19.7 Legacy Series / Re: configd not running, won't start
September 26, 2019, 01:59:50 AM
Yep, first one is OK, second fails:


root@SporkLab:/home/sporkadmin # pkg check -da
Checking all packages: 100%
root@SporkLab:/home/sporkadmin # pkg check -sa
Checking all packages:  83%
py37-yaml-5.1: missing file /usr/local/lib/python3.7/site-packages/yaml/__pycache__/constructor.cpython-37.pyc
Checking all packages:  85%
python37-3.7.4: missing file /usr/local/lib/python3.7/__pycache__/optparse.cpython-37.opt-2.pyc
python37-3.7.4: missing file /usr/local/lib/python3.7/__pycache__/random.cpython-37.pyc
python37-3.7.4: missing file /usr/local/lib/python3.7/__pycache__/stat.cpython-37.opt-1.pyc
python37-3.7.4: missing file /usr/local/lib/python3.7/__pycache__/stat.cpython-37.opt-2.pyc
python37-3.7.4: missing file /usr/local/lib/python3.7/functools.py
python37-3.7.4: missing file /usr/local/lib/python3.7/lib2to3/pgen2/__pycache__/token.cpython-37.opt-1.pyc
Checking all packages: 100%
root@SporkLab:/home/sporkadmin
#27
Logged in to the GUI today to see if the box is still panicking every night, and most of the data in the dashboard was blank.

In the "services" pane, I saw that "configd" was not running.  On trying to start it, this is logged in the system logs:

opnsense: /status_services.php: The command '/usr/local/etc/rc.d/configd start' returned exit code '1',
the output was 'Starting configd. Traceback (most recent call last): File "/usr/local/opnsense/service/configd.py",
line 37, in <module> import logging File "/usr/local/lib/python3.7/logging/__init__.py", line 26, in <module> import sys,
os, time, io, traceback, warnings, weakref, collections.abc File "/usr/local/lib/python3.7/traceback.py", line 5,
in <module> import linecache File "/usr/local/lib/python3.7/linecache.py", line 8, in <module> import functools
ModuleNotFoundError: No module named 'functools' /usr/local/etc/rc.d/configd: WARNING: failed to start configd'


Any idea what that's about?

No access via ssh, I assume something is broken there as well.
#28
19.7 Legacy Series / Re: System Reboots Itself!
September 24, 2019, 04:54:01 AM
Quote from: JdeFalconr on September 20, 2019, 03:27:49 PM
Thanks in advance for your help. Newly-built system that looks to be rebooting itself randomly. I'm really not sure how to troubleshoot this. I don't see much in logs but one of the problem/crash reports comes up every time after this happens.

Interesting. On a Core2Duo box I've had nightly panics since the last update. Always happens shortly after 3:00 a.m., which on stock FreeBSD is when a bunch of daily cron jobs run.

It's odd because some prior releases had random panics and I just assumed it was either the particular kernel + whatever patches these folks use or some combo of under-tested modules (for me, turning off IDS in one of the old releases seemed to stop the panics). Prior to that this was a pfsense box and no panics there, so I'm not really thinking it's hardware. I sometimes wonder if it's the HardenedBSD stuff - it's always in the logs and in the old days of OpenBSD I remember lots of their security precautions/checks could end up causing stability issues on quirky systems they'd not tested with.

Trying to find a time when I can go without internet for the many hours memtest86 consumes. :)
#29
Spoke too soon, got another one tonight:

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x65
fault code = supervisor write data, page not present
instruction pointer = 0x20:0xffffffff810bb549
stack pointer         = 0x28:0xfffffe011a1e8890
frame pointer         = 0x28:0xfffffe011a1e88e0
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 67259 (php)


I sent a report in with the automated thing.

Is there interest in debugging this or no? There are things I dislike about pfsense, but the basic "it doesn't panic" feature I do enjoy.
#30
Sounds not unlike this:

https://forum.opnsense.org/index.php?topic=9916

Basically I'm seeing traffic that should go down the tunnel go out the main WAN interface. Probably something weird with how OpenVPN has its own routing table, or something to do with outbound NAT rules...