Topics - stuckoff

#1
Hi everyone,

I'm experiencing a recurring system instability issue with one of my appliances that started around December 21st, 2025. After 6 months of perfect stability, the system now becomes unresponsive almost every night at exactly 00:01.

Symptoms:
    Connectivity: Internet access stops for the network.
    Management: No access to WebGUI or SSH.
    Persistence: The management IP still responds to Pings.
    HA/CARP: Interestingly, services do not fail over to the secondary node because the primary node keeps its CARP VIPs (the kernel is still "alive" enough to prevent failover, but the userland is dead).

Logs: The system logs point clearly to an Out of Memory (OOM) event and swap exhaustion:

2026-01-06T00:11:26 Notice lockout_handler lockout 138.197.98.69 [using table sshlockout] after 6 attempts
2026-01-06T00:06:30 Notice kernel swp_pager_getswapspace(15): failed
2026-01-06T00:06:26 Notice kernel <3>pid 67504 (i2RsVQl2), jid 0, uid 0, was killed: failed to reclaim memory
2026-01-06T00:06:21 Notice kernel swp_pager_getswapspace(14): failed
2026-01-06T00:06:21 Notice kernel swap_pager: out of swap space
2026-01-06T00:05:41 Notice kernel <3>pid 70232 (i2RsVQl2), jid 0, uid 0, was killed: a thread waited too long to allocate a page
2026-01-06T00:05:26 Notice kernel <3>pid 66425 (8bcK6gTx), jid 0, uid 0, was killed: a thread waited too long to allocate a page
2026-01-06T00:05:06 Notice kernel <3>pid 64687 (8bcK6gTx), jid 0, uid 0, was killed: a thread waited too long to allocate a page
2026-01-06T00:04:47 Notice kernel <3>pid 62108 (8bcK6gTx), jid 0, uid 0, was killed: a thread waited too long to allocate a page
2026-01-06T00:04:22 Notice kernel <3>pid 61411 (i2RsVQl2), jid 0, uid 0, was killed: a thread waited too long to allocate a page
2026-01-06T00:04:07 Notice kernel <3>pid 60957 (8bcK6gTx), jid 0, uid 0, was killed: a thread waited too long to allocate a page
2026-01-06T00:03:53 Notice kernel <3>pid 59937 (i2RsVQl2), jid 0, uid 0, was killed: a thread waited too long to allocate a page
2026-01-06T00:03:33 Notice kernel <3>pid 59157 (8bcK6gTx), jid 0, uid 0, was killed: a thread waited too long to allocate a page
2026-01-06T00:03:14 Notice kernel <3>pid 59042 (i2RsVQl2), jid 0, uid 0, was killed: a thread waited too long to allocate a page
2026-01-06T00:02:54 Notice kernel <3>pid 57475 (8bcK6gTx), jid 0, uid 0, was killed: a thread waited too long to allocate a page
2026-01-06T00:02:37 Notice kernel <3>pid 57642 (i2RsVQl2), jid 0, uid 0, was killed: a thread waited too long to allocate a page
2026-01-06T00:02:24 Notice kernel <3>pid 54836 (8bcK6gTx), jid 0, uid 0, was killed: failed to reclaim memory
2026-01-06T00:02:21 Notice kernel <3>pid 54500 (i2RsVQl2), jid 0, uid 0, was killed: failed to reclaim memory
2026-01-06T00:02:20 Notice kernel <3>pid 50993 (i2RsVQl2), jid 0, uid 0, was killed: failed to reclaim memory
2026-01-06T00:02:18 Notice kernel <3>pid 51388 (8bcK6gTx), jid 0, uid 0, was killed: failed to reclaim memory
2026-01-06T00:02:13 Notice kernel swp_pager_getswapspace(24): failed
2026-01-06T00:02:13 Notice kernel swap_pager: out of swap space
2026-01-06T00:02:01 Notice kernel swap_pager: out of swap space
2026-01-06T00:02:00 Notice kernel <3>pid 40094 (i2RsVQl2), jid 0, uid 0, was killed: a thread waited too long to allocate a page
2026-01-06T00:01:47 Notice kernel <3>pid 40583 (8bcK6gTx), jid 0, uid 0, was killed: a thread waited too long to allocate a page
2026-01-06T00:01:10 Notice kernel <3>pid 49376 (PfNxCZtE), jid 0, uid 0, was killed: a thread waited too long to allocate a page
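
To see which process names are being killed most often, the kill lines above can be tallied with a short script. This is only a sketch that assumes the log format is exactly as pasted here; the function name `tally_kills` and the three-line sample are illustrative, and the script says nothing about which process actually leaks memory (the kernel may kill victims, not culprits).

```python
import re
from collections import Counter

# Matches kernel kill lines in the format pasted above, e.g.
# "... kernel <3>pid 67504 (i2RsVQl2), jid 0, uid 0, was killed: failed to reclaim memory"
KILL_RE = re.compile(r"pid (\d+) \(([^)]+)\), jid \d+, uid \d+, was killed: (.+)$")


def tally_kills(lines):
    """Count kill events per (process name, reason) pair."""
    counts = Counter()
    for line in lines:
        match = KILL_RE.search(line.strip())
        if match:
            _pid, name, reason = match.groups()
            counts[(name, reason)] += 1
    return counts


# A few lines copied from the log above; the swap line is ignored by the regex.
sample = [
    "2026-01-06T00:06:30 Notice kernel swp_pager_getswapspace(15): failed",
    "2026-01-06T00:06:26 Notice kernel <3>pid 67504 (i2RsVQl2), jid 0, uid 0, was killed: failed to reclaim memory",
    "2026-01-06T00:05:26 Notice kernel <3>pid 66425 (8bcK6gTx), jid 0, uid 0, was killed: a thread waited too long to allocate a page",
    "2026-01-06T00:05:06 Notice kernel <3>pid 64687 (8bcK6gTx), jid 0, uid 0, was killed: a thread waited too long to allocate a page",
]

for (name, reason), count in tally_kills(sample).most_common():
    print(f"{count:3d}  {name}  {reason}")
```

Feeding the full log export to `tally_kills` instead of the sample would give the per-name totals for the whole incident.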

Scheduled Tasks: The crash timing (00:01) coincides with a cron job that I have set to run hourly:

1      *     *       *       *       (/usr/local/sbin/configctl -d syslog archive) > /dev/null

My Questions:

    Identification: How can I identify which process is actually causing the leak? The process names shown in the logs (i2RsVQl2, 8bcK6gTx, PfNxCZtE) look randomized/obfuscated. Is this normal for certain plugins, or a sign of something else?
    Timing: If the cron job runs every hour, why does the crash only occur at the midnight (00:01) run and not at 23:01 or 01:01?
    Root Cause: Since this was stable for 6 months, could this be related to log rotation/archiving of a specifically large "daily" log file that builds up?

Any advice on how to debug this via console or remote logging before the crash occurs would be greatly appreciated.
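
One way to capture this before the box locks up would be to record the largest processes by resident memory every few seconds around midnight and send the result off-box. The following is a minimal sketch, assuming Python 3 is available on the appliance and that `ps -axo pid,rss,comm` prints a header line followed by PID, RSS in KiB, and command name. The helper names and the sample output are hypothetical.

```python
import subprocess


def parse_ps(ps_output):
    """Parse `ps -axo pid,rss,comm` output into (pid, rss_kib, command) tuples."""
    rows = []
    for line in ps_output.splitlines()[1:]:  # skip the header line
        parts = line.split(None, 2)
        if len(parts) == 3 and parts[0].isdigit() and parts[1].isdigit():
            rows.append((int(parts[0]), int(parts[1]), parts[2]))
    return rows


def top_by_rss(rows, n=5):
    """Return the n rows with the largest resident set size."""
    return sorted(rows, key=lambda row: row[1], reverse=True)[:n]


def snapshot(n=5):
    """Run ps once and return the n largest processes (not called here)."""
    out = subprocess.run(
        ["ps", "-axo", "pid,rss,comm"],
        capture_output=True, text=True, check=True,
    ).stdout
    return top_by_rss(parse_ps(out), n)


# Hypothetical ps output, for illustration only.
HYPOTHETICAL_PS = """  PID   RSS COMMAND
    1  1200 init
  512 90432 python3.11
  733  4096 sshd
"""

for pid, rss_kib, command in top_by_rss(parse_ps(HYPOTHETICAL_PS), n=2):
    print(f"{pid:6d} {rss_kib:8d} KiB  {command}")
```

Calling `snapshot()` in a loop from a few minutes before 00:01 and appending each result to a file or a remote syslog target would leave a trail showing which process grows just before the kills start.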
#2
25.7, 25.10 Series / DEC2752 - How to check hardware
January 16, 2026, 07:59:14 AM
Hi

I have some issues with my appliance and I want to run some hardware tests.
Does anyone know how to run tests like these on the system:
- memtest86 or similar
- disk check for bad sectors
- CPU, etc.

Thanks
#3
Development and Code Review / API result:failed
February 02, 2025, 10:14:48 AM
Hi

I'm trying to automate some simple tasks on OPNsense via the API, but for some reason the request fails. Here is my cURL command:
curl -H Accept: application/json -k -u  XXXXXXXXX:XXXXXXXXX https://192.168.100.3/api/interfaces/vlan_settings/addItem  -d '{vlanif:vlan0.777,if: lagg1,tag: 777,pcp:0,proto:,descr:VLAN_777_TEST}'
The result of this is:
{"result":"failed"}
At the same time if I change the request to:
curl -H Accept: application/json -k -u  XXXXXXXXX:XXXXXXXXX https://192.168.100.3/api/interfaces/vlan_settings/get
the system returns a correct response.

Do I need to change the data format?
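
For comparison, here is a sketch of how the same request could be built with a strictly valid JSON body and an explicit `Content-Type: application/json` header. It rests on two assumptions that the post does not confirm: that the endpoint only accepts properly quoted JSON sent with that header, and that the fields must be wrapped in a `vlan` object. The helper name `build_add_vlan_request` is made up for illustration, and the request is only constructed, not sent (authentication is left out).

```python
import json
import urllib.request


def build_add_vlan_request(base_url, fields):
    """Build (but do not send) a POST request for vlan_settings/addItem."""
    # Assumption: the API expects the fields nested under a "vlan" key.
    body = json.dumps({"vlan": fields}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/interfaces/vlan_settings/addItem",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )


# Field values taken from the cURL command above.
req = build_add_vlan_request(
    "https://192.168.100.3",
    {
        "vlanif": "vlan0.777",
        "if": "lagg1",
        "tag": "777",
        "pcp": "0",
        "proto": "",
        "descr": "VLAN_777_TEST",
    },
)

print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

The printed body shows the quoting that `json.dumps` produces, which can be compared against the unquoted `-d` string in the original command.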