Since v2.0 upgrade, packet engine won't stay running

Started by RutgerDiehard, June 12, 2025, 10:30:36 AM

Quote from: RutgerDiehard on June 15, 2025, 12:51:49 PM
Quote from: Lurick on June 15, 2025, 12:46:54 PM
Quick question, do you all have "Do not pin engine packet processor to dedicated CPU cores" checked or unchecked?
I had mine checked but I tried unchecking it now and will see if that does anything.
I have Suricata installed but not enabled for IPS mode.

Mine was unchecked. I didn't test with it checked.

Hmmm, ok, I had crashes with it checked so that's likely just a red herring then

Helpdesk suggested the dev.netmap.ring_num=1024 fix for me as well, and now I'm observing different behavior. It is still reporting "eastpect   stack overflow detected; terminated", but the process keeps running and the firewall appears to be working. I manually restarted the engine ~6h ago, and htop reports that the process has been running since.
61 processes:  1 running, 60 sleeping
CPU:  2.2% user,  0.0% nice,  0.7% system,  0.0% interrupt, 97.1% idle
Mem: 1692M Active, 4284M Inact, 792K Laundry, 8716M Wired, 192K Buf, 993M Free

  PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
34985 root         13  20  -20  8457M   274M nanslp   1   6:36  11.39% eastpect
25132 root         11  20  -20  1254M    39M uwait    0   1:51   0.91% ipdrstreamer
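
For anyone who wants to check whether the tunable actually took effect, here is a minimal sketch. It assumes the value can be read with `sysctl -n dev.netmap.ring_num` on the firewall itself; the helper names (`parse_sysctl_value`, `read_tunable`, `needs_change`) are made up for illustration and are not part of Zenarmor or OPNsense. The value 1024 is just the one helpdesk suggested above, not a general recommendation.

```python
import subprocess

TUNABLE = "dev.netmap.ring_num"  # tunable named in the post above
SUGGESTED = 1024                 # value suggested by helpdesk in the post above


def parse_sysctl_value(output: str) -> int:
    """Parse the output of `sysctl -n <name>`, which is a bare integer."""
    return int(output.strip())


def read_tunable(name: str = TUNABLE) -> int:
    """Read the current value; only works on a host that exposes this sysctl."""
    result = subprocess.run(
        ["sysctl", "-n", name], capture_output=True, text=True, check=True
    )
    return parse_sysctl_value(result.stdout)


def needs_change(current: int, wanted: int = SUGGESTED) -> bool:
    """Return True when the running value differs from the wanted one."""
    return current != wanted
```

On the firewall, `needs_change(read_tunable())` would tell you whether the running value still differs from 1024. As the rest of the thread shows, the tunable only reduced the crashes for some people and did not fix them.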

Quote from: vutt01 on June 15, 2025, 03:55:17 PM
Helpdesk suggested the dev.netmap.ring_num=1024 fix for me as well, and now I'm observing different behavior. It is still reporting "eastpect   stack overflow detected; terminated", but the process keeps running and the firewall appears to be working. I manually restarted the engine ~6h ago, and htop reports that the process has been running since.

It helps stabilize it a bit, but it doesn't fix it. If the engine starts crashing in a cascade, ZA stops it and it has to be started manually.

The issue is not fixed by increasing ring_num.

Regards,
S.
Networking is love. You may hate it, but in the end, you always come back to it.

OPNSense HW
APU2D2 - deceased
N5105 - i226-V | Patriot 2x8G 3200 DDR4 | L 790 512G - VM HA(SOON)
N100   - i226-V | Crucial 16G  4800 DDR5 | S 980 500G - PROD

Yup, just had another crash after about 5-6 hours of stability

Hi all,

Thank you for your patience. We've identified a fix for the issue and are currently testing it. If you'd like to test it as well, please reach out to support for detailed instructions. The fix is scheduled to be included in the 2.0.1 maintenance release later this week.


Quote from: sy on June 16, 2025, 04:18:33 PM
Hi all,

Thank you for your patience. We've identified a fix for the issue and are currently testing it. If you'd like to test it as well, please reach out to support for detailed instructions. The fix is scheduled to be included in the 2.0.1 maintenance release later this week.



2.0.1 was offered as an upgrade yesterday after checking for updates in OPNsense. I've installed it, removed and re-added my subscription key (to enable additional device and policy support), and can confirm the packet engine is running without issue this morning.

Same here: after the 2.0.1 update yesterday, no crashes so far.

Same here, 2.0.1 appears to have fixed this issue. No crashes for almost 24 hours so far since upgrading.

Thanks for the 2.0.1 update reports, I was sitting back and waiting to do this on my production system until things were "burned in" a little longer. This was mostly a time consideration right now since there were some workarounds in place.

Upgrading to 2.0.1 addressed my issues as well.

Sadly, neither 2.0 nor 2.0.1 works for me... No data in the graphs, no devices detected as online...
I reverted to 1.18.6 and everything is working...

I'm using VLANs, with Zenarmor protecting only one of them (VLAN0.1 is the main one and VLAN0.10 is the protected one, for the kids).
I've excluded VLAN0.1 and some IPs on VLAN0.10 (the WiFi router's .1 and .2 IPs).

So I don't know what to do for now.

Hi @deuch,

Could you share the logs via the Have Feedback option in the bottom left corner of the UI, selecting all checkboxes?



June 20, 2025, 07:55:13 PM #42 Last Edit: June 20, 2025, 08:08:51 PM by ColonelKurtz
I was having network disconnects, WiFi dropping, and the packet engine stopping after upgrading to 2.0. Increasing ring_num to 1024 did improve WiFi and network stability, but the engine still kept stopping. Installing 2.0.1 fixed this without the edited ring setting (back to the default value).

I also noticed that my RAM usage dropped from ~80% of 8GB with V1.X to ~30% now with V2.0 and V2.0.1. I have been running the SQLite database.

Quote from: sy on June 19, 2025, 11:41:23 AM
Hi @deuch,

Could you share the logs via the Have Feedback option in the bottom left corner of the UI, selecting all checkboxes?




I'm seeing a lot of these in dmesg:

pid 8814 (eastpect), jid 0, uid 0: exited on signal 6 (no core dump - bad address)
pid 30133 (eastpect), jid 0, uid 0: exited on signal 6 (no core dump - bad address)
pid 64611 (eastpect), jid 0, uid 0: exited on signal 6 (no core dump - bad address)
pid 38308 (eastpect), jid 0, uid 0: exited on signal 6 (no core dump - bad address)
pid 23508 (eastpect), jid 0, uid 0: exited on signal 6 (no core dump - bad address)
pid 27460 (eastpect), jid 0, uid 0: exited on signal 6 (no core dump - bad address)
pid 51134 (eastpect), jid 0, uid 0: exited on signal 6 (no core dump - bad address)
pid 23321 (eastpect), jid 0, uid 0: exited on signal 6 (no core dump - bad address)
pid 41747 (eastpect), jid 0, uid 0: exited on signal 6 (no core dump - bad address)
pid 88047 (eastpect), jid 0, uid 0: exited on signal 6 (no core dump - bad address)
pid 52813 (eastpect), jid 0, uid 0: exited on signal 6 (no core dump - bad address)
pid 18000 (eastpect), jid 0, uid 0: exited on signal 6 (no core dump - bad address)
pid 12243 (eastpect), jid 0, uid 0: exited on signal 6 (no core dump - bad address)
pid 12324 (eastpect), jid 0, uid 0: exited on signal 6 (no core dump - bad address)
pid 54145 (eastpect), jid 0, uid 0: exited on signal 6 (no core dump - bad address)
pid 25375 (eastpect), jid 0, uid 0: exited on signal 6 (no core dump - bad address)
pid 78260 (eastpect), jid 0, uid 0: exited on signal 6 (no core dump - bad address)
pid 79066 (eastpect), jid 0, uid 0: exited on signal 6 (no core dump - bad address)
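
If it helps when reporting this, here is a rough sketch for counting these exits instead of pasting the whole list. It assumes the dmesg lines look exactly like the ones above; the function name and the sample text are made up for illustration. Signal 6 is SIGABRT, which would be consistent with the "stack overflow detected; terminated" message mentioned earlier in the thread, but that link is an assumption, not something the logs prove.

```python
import re
from collections import Counter

# Matches kernel lines such as:
# pid 8814 (eastpect), jid 0, uid 0: exited on signal 6 (no core dump - bad address)
EXIT_RE = re.compile(
    r"pid (\d+) \(([^)]+)\), jid \d+, uid \d+: exited on signal (\d+)"
)


def count_signal_exits(dmesg_text: str, process: str = "eastpect") -> Counter:
    """Count exits of `process` per signal number in dmesg-style text."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        match = EXIT_RE.search(line)
        if match and match.group(2) == process:
            counts[int(match.group(3))] += 1
    return counts


# Illustrative sample: two lines copied from the post above plus one
# hypothetical unrelated process to show the filtering.
sample = """\
pid 8814 (eastpect), jid 0, uid 0: exited on signal 6 (no core dump - bad address)
pid 30133 (eastpect), jid 0, uid 0: exited on signal 6 (no core dump - bad address)
pid 4242 (otherproc), jid 0, uid 0: exited on signal 11 (core dumped)
"""

print(count_signal_exits(sample))
```

To run it against the real log, you could save the `dmesg` output to a file and pass its contents to `count_signal_exits`.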

Hi,

Version 2.0.2 is expected to resolve all crash issues. Could you please confirm if it does?