After upgrading to v2.0 yesterday, I noticed this morning that the packet engine was not running. I started it from the dashboard, which updates to "Running", but a page refresh shows it as stopped.
This is a pretty significant issue, which I've also raised through the "Have Feedback" section. Let's hope for a swift response and a fix.
It seems the eastpect service started to have issues way before the last event at 02:44 this morning where it filled the logs every 30 seconds with:
2025-06-12T02:44:11 Critical eastpect stack overflow detected; terminated
Now, attempting to start the packet service results in nothing in logs at all.
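For anyone who wants to check the crash cadence in their own logs, here is a minimal Python sketch. It assumes log lines in the format quoted above (ISO timestamp first, then severity and message). The helper names `crash_times` and `gaps_in_seconds` are made up for illustration, and the first sample line is hypothetical; only the 02:44:11 line comes from the post.

```python
from datetime import datetime

# Message text as it appears in the log line quoted in this thread.
MARKER = "eastpect stack overflow detected; terminated"

def crash_times(lines):
    """Return the timestamps of lines containing the eastpect crash message."""
    times = []
    for line in lines:
        if MARKER not in line:
            continue
        # Assumption: the timestamp is the first whitespace-separated field.
        stamp = line.split(maxsplit=1)[0]
        times.append(datetime.fromisoformat(stamp))
    return times

def gaps_in_seconds(times):
    """Return the seconds elapsed between consecutive crash timestamps."""
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]

sample = [
    # Hypothetical line, added only to illustrate a 30-second gap.
    "2025-06-12T02:43:41 Critical eastpect stack overflow detected; terminated",
    # Line quoted in the post above.
    "2025-06-12T02:44:11 Critical eastpect stack overflow detected; terminated",
]
print(gaps_in_seconds(crash_times(sample)))
```

A steady run of roughly 30-second gaps would match the restart loop described above; longer gaps suggest the engine stayed up for a while between crashes.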
Quote from: RutgerDiehard on June 12, 2025, 11:43:37 AM
It seems the eastpect service started to have issues way before the last event at 02:44 this morning where it filled the logs every 30 seconds with:
2025-06-12T02:44:11 Critical eastpect stack overflow detected; terminated
Now, attempting to start the packet service results in nothing in logs at all.
Same issue. This update caused a disruption in all environments. Seems to be eastpect based on the logs. Additionally, network traffic stops. The issue is present in emulated and native modes. I'm unsure as to why this wasn't caught during QA.
Yep same error here as well after upgrade.
2025-06-12T13:47:14 Critical eastpect stack overflow detected; terminated
Zenarmor has just responded to the ticket and suggests it's a netmap buffer issue.
They've asked to follow the step in the last post of this thread (https://forum.opnsense.org/index.php?topic=47283.0).
I've just added the bottom three tunables in the referenced post (the first was already there) and rebooted.
Zenarmor packet engine is now running and continues to run.
Note, I didn't change the MTU setting as I'm using PPPoE on the WAN and it is already set lower than the suggested 1500.
This seems to have fixed the issue but will monitor for changes.
Hi,
Could you please share the logs via the "Have Feedback" option?
Hello again,
Could you please provide detailed information about the issue, such as the OPNsense version and whether the Zenarmor engine is crashing? Submitting a report through the "Have Feedback" option will supply the necessary details, and our team will investigate it promptly. Thank you for your cooperation.
Quote from: sy on June 12, 2025, 06:33:50 PM
Hello again,
Could you please provide detailed information about the issue, such as the OPNsense version and whether the Zenarmor engine is crashing? Submitting a report through the "Have Feedback" option will supply the necessary details, and our team will investigate it promptly. Thank you for your cooperation.
Hi sy, as stated in the first post, I had already submitted a report using Have Feedback (request 12614) and had received the suggestion of netmap buffer issues as an email reply to that from Salih.
Woke up this morning to the same issue; packet engine stopped and won't restart. Logs show this every 30 seconds or so up until the last at 02:23:
2025-06-13T02:23:04 Critical eastpect stack overflow detected; terminated
This is getting tedious now.
I'm interested to know what else you're running on the crashing instances. Mine has been doing fine, and I've tinkered with my Zenarmor installation, putting the DB in RAM etc. Aside from Zenarmor I'm only using Suricata, so there's not much that can interfere.
Quote:
2025-06-13T12:30:02 Critical eastpect stack overflow detected; terminated
is still there. Bug reported via Feedback chat.
Hi,
It seems that a tunable setting resolves the issue. To fix it, add "dev.netmap.ring_num" with the value "1024" under System - Settings - Tunables, or change the value if the tunable already exists, and then restart OPNsense. After making this change, the Zenarmor engine should function correctly.
I was given the dev.netmap.ring_num = 1024 recommendation and that did not work for me. However, when I changed it to 2048 (because why not, bigger must be better lol), that corrected my issue. Now, I might be an edge case, as I get a lot of incoming traffic from a lot of IP addresses since I host a public NTP server.
So is the answer 1024 or 2048, and what's the downside of either? (Particularly the higher value).
The dev.netmap.ring_num = 1024 recommendation also did not work for me. Setting the value to 2048 addressed my issue. I'm no longer seeing this error: Critical eastpect stack overflow detected; terminated.
Never mind. I found out why it was "working": Zenarmor had crashed and nothing was being filtered, and I did not notice until I looked at the dashboard. I'm sorry about the invalid data. I am personally working with the emulated netmap driver, and 1024 did not make a difference.
Same here; 2048 is too large. This problem has not been resolved, ZA team.
Hi,
The value range is between 2 and 1024. Therefore, 2048 is not applicable. Additionally, 1024 is not functioning for some users, and we are currently investigating the issue.
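To make the stated limit concrete, here is a small Python sketch of a range check. The bounds 2 and 1024 are taken from the post above and have not been checked against the netmap documentation; the function name `ring_num_in_range` is hypothetical.

```python
# Bounds as stated in this thread (assumption: both ends are inclusive).
RING_NUM_MIN = 2
RING_NUM_MAX = 1024

def ring_num_in_range(value):
    """Return True if value lies within the range quoted for dev.netmap.ring_num."""
    return RING_NUM_MIN <= value <= RING_NUM_MAX

for candidate in (1024, 2048):
    status = "within" if ring_num_in_range(candidate) else "outside"
    print(f"dev.netmap.ring_num={candidate} is {status} the stated range")
```

By this check, 2048 falls outside the range, which is consistent with the reports that setting it did not actually fix anything.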
Same issue again this morning; packet engine stopped and unable to be restarted.
2025-06-14T03:42:57 Critical eastpect stack overflow detected; terminated
These are the settings that Support worked on remotely yesterday:
dev.netmap.buf_num runtime 1000000
dev.netmap.admode runtime 0
dev.netmap.ring_num runtime 1024
dev.netmap.buf_size runtime 2048
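For comparing these values against another box, a minimal Python sketch that reads a listing like the one above into a dictionary. It assumes three whitespace-separated columns (name, type, value) exactly as pasted here and integer values; `parse_tunables` is a hypothetical helper, not part of OPNsense or Zenarmor.

```python
def parse_tunables(text):
    """Parse 'name type value' lines into a {name: int(value)} dictionary."""
    settings = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip lines that do not have exactly three columns
        name, _kind, value = parts
        settings[name] = int(value)  # assumption: every value is an integer
    return settings

listing = """\
dev.netmap.buf_num runtime 1000000
dev.netmap.admode runtime 0
dev.netmap.ring_num runtime 1024
dev.netmap.buf_size runtime 2048
"""
for name, value in parse_tunables(listing).items():
    print(name, value)
```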
How can I downgrade to the previous stable version while this is worked on and resolved? I cannot have these crashes in the middle of the night.
Further to this, I run Suricata on the WAN interface which has been perfectly fine and have not had any issues for months. Recently, Suricata has been very quiet and there have been no detections for several weeks. So, questioning its value, I've just disabled Suricata IDS/IPS. I am now able to start the Zenarmor packet engine and it stays running.
One to note for troubleshooting purposes and maybe a workaround while this is resolved.
For those for whom it's failing, are you configured for RSS? I see a warning in the console regarding RSS, but it worked in the past.
Quote from: Cljackhammer on June 14, 2025, 12:45:02 PM
For those for whom it's failing, are you configured for RSS? I see a warning in the console regarding RSS, but it worked in the past.
I have RSS enabled as well. I enabled it about a month ago and haven't had issues until now.
RSS isn't the issue. The problem remains if it's disabled. Something changed with eastpect.
Are you all using Suricata?
FWIW as another data point, have been updated for a few hours now and no issues. Not using Suricata.
I have the same problem.
ring_num increased to 1024, ZA still crashing after 7 hours of stability.
Regards,
S.
Same here. I have increased ring_num to 1024, am not running Suricata, and ZA crashes within hours, even sooner when I do any performance testing. I have temporarily uninstalled ZA so as to have a fresh start when a fix is released (I have been running the 2.1 alpha releases as well).
Crashed again this morning with Suricata completely disabled. However, I could start it again from the dashboard page.
Zenarmor has provided instructions to downgrade to previous stable version, so I will roll back while the issue is worked on.
Quick question, do you all have "Do not pin engine packet processor to dedicated CPU cores" checked or unchecked?
I had mine checked but I tried unchecking it now and will see if that does anything.
I have Suricata installed but not enabled for IPS mode.
Quote from: Lurick on June 15, 2025, 12:46:54 PM
Quick question, do you all have "Do not pin engine packet processor to dedicated CPU cores" checked or unchecked?
I had mine checked but I tried unchecking it now and will see if that does anything.
I have Suricata installed but not enabled for IPS mode.
Mine was unchecked. I didn't test with it checked.
Quote from: RutgerDiehard on June 15, 2025, 12:51:49 PM
Quote from: Lurick on June 15, 2025, 12:46:54 PM
Quick question, do you all have "Do not pin engine packet processor to dedicated CPU cores" checked or unchecked?
I had mine checked but I tried unchecking it now and will see if that does anything.
I have Suricata installed but not enabled for IPS mode.
Mine was unchecked. I didn't test with it checked.
Hmmm, ok, I had crashes with it checked so that's likely just a red herring then
Helpdesk suggested the dev.netmap.ring_num=1024 fix to me as well, and now I'm observing different behavior. It is still reporting "eastpect stack overflow detected; terminated", but the process keeps running and the firewall appears to be working. I manually restarted the engine ~6h ago, and htop reports that the process has been running since.
61 processes: 1 running, 60 sleeping
CPU: 2.2% user, 0.0% nice, 0.7% system, 0.0% interrupt, 97.1% idle
Mem: 1692M Active, 4284M Inact, 792K Laundry, 8716M Wired, 192K Buf, 993M Free
PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
34985 root 13 20 -20 8457M 274M nanslp 1 6:36 11.39% eastpect
25132 root 11 20 -20 1254M 39M uwait 0 1:51 0.91% ipdrstreamer
Quote from: vutt01 on June 15, 2025, 03:55:17 PM
Helpdesk suggested the dev.netmap.ring_num=1024 fix to me as well, and now I'm observing different behavior. It is still reporting "eastpect stack overflow detected; terminated", but the process keeps running and the firewall appears to be working. I manually restarted the engine ~6h ago, and htop reports that the process has been running since.
61 processes: 1 running, 60 sleeping
CPU: 2.2% user, 0.0% nice, 0.7% system, 0.0% interrupt, 97.1% idle
Mem: 1692M Active, 4284M Inact, 792K Laundry, 8716M Wired, 192K Buf, 993M Free
PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
34985 root 13 20 -20 8457M 274M nanslp 1 6:36 11.39% eastpect
25132 root 11 20 -20 1254M 39M uwait 0 1:51 0.91% ipdrstreamer
It helps to stabilize it a bit, but doesn't fix it. If the engine starts to crash in a cascade, ZA stops it and it needs to be started manually.
Issue is not fixed by increasing the ring_num.
Regards,
S.
Yup, just had another crash after about 5-6 hours of stability
Hi all,
Thank you for your patience. We've identified a fix for the issue and are currently testing it. If you'd like to test it as well, please reach out to support for detailed instructions. The fix is scheduled to be included in the 2.0.1 maintenance release later this week.
Quote from: sy on June 16, 2025, 04:18:33 PM
Hi all,
Thank you for your patience. We've identified a fix for the issue and are currently testing it. If you'd like to test it as well, please reach out to support for detailed instructions. The fix is scheduled to be included in the 2.0.1 maintenance release later this week.
2.0.1 was offered as an upgrade yesterday after checking for updates in OPNsense. I've installed, removed and re-added my subscription key - to enable additional device and policy support - and can confirm the packet engine is running without issue this morning.
Same here. After the 2.0.1 update yesterday, no crashes so far.
Same here, 2.0.1 appears to have fixed this issue. No crashes for almost 24 hours so far since upgrading.
Thanks for the 2.0.1 update reports, I was sitting back and waiting to do this on my production system until things were "burned in" a little longer. This was mostly a time consideration right now since there were some workarounds in place.
Upgrading to 2.0.1 addressed my issues as well.
Sadly, neither 2.0 nor 2.0.1 works for me... No data in the graphs, no devices detected as online...
I reverted to 1.18.6 and everything is working...
I'm using VLANs with Zenarmor to protect only one of them (VLAN0.1 is the main one and VLAN0.10 is the protected one, for the kids).
I've excluded VLAN0.1 and some IPs on VLAN0.10 (the WiFi router, the .1 and .2 IPs).
So I don't know what to do for now.
Hi @deuch,
Could you share the logs via the "Have Feedback" option in the bottom left corner of the UI, selecting all checkboxes?
I was having network disconnects, WiFi dropping, and packet engine stopping after upgrading to 2.0. Increasing ring to 1024 did increase WiFi and network stability but engine still kept stopping. Installing 2.0.1 fixed this without the edited ring setting (back to default value).
I also noticed that my RAM usage dropped from ~80% of 8GB with V1.X to ~30% now with V2.0 and V2.0.1. I have been running the SQLite database.
Quote from: sy on June 19, 2025, 11:41:23 AM
Hi @deuch,
Could you share the logs via the "Have Feedback" option in the bottom left corner of the UI, selecting all checkboxes?
I have a lot of these in dmesg:
pid 8814 (eastpect), jid 0, uid 0: exited on signal 6 (no core dump - bad address)
pid 30133 (eastpect), jid 0, uid 0: exited on signal 6 (no core dump - bad address)
pid 64611 (eastpect), jid 0, uid 0: exited on signal 6 (no core dump - bad address)
pid 38308 (eastpect), jid 0, uid 0: exited on signal 6 (no core dump - bad address)
pid 23508 (eastpect), jid 0, uid 0: exited on signal 6 (no core dump - bad address)
pid 27460 (eastpect), jid 0, uid 0: exited on signal 6 (no core dump - bad address)
pid 51134 (eastpect), jid 0, uid 0: exited on signal 6 (no core dump - bad address)
pid 23321 (eastpect), jid 0, uid 0: exited on signal 6 (no core dump - bad address)
pid 41747 (eastpect), jid 0, uid 0: exited on signal 6 (no core dump - bad address)
pid 88047 (eastpect), jid 0, uid 0: exited on signal 6 (no core dump - bad address)
pid 52813 (eastpect), jid 0, uid 0: exited on signal 6 (no core dump - bad address)
pid 18000 (eastpect), jid 0, uid 0: exited on signal 6 (no core dump - bad address)
pid 12243 (eastpect), jid 0, uid 0: exited on signal 6 (no core dump - bad address)
pid 12324 (eastpect), jid 0, uid 0: exited on signal 6 (no core dump - bad address)
pid 54145 (eastpect), jid 0, uid 0: exited on signal 6 (no core dump - bad address)
pid 25375 (eastpect), jid 0, uid 0: exited on signal 6 (no core dump - bad address)
pid 78260 (eastpect), jid 0, uid 0: exited on signal 6 (no core dump - bad address)
pid 79066 (eastpect), jid 0, uid 0: exited on signal 6 (no core dump - bad address)
Hi,
Version 2.0.2 is expected to resolve all crash issues. Could you please confirm if it does?