Since v2.0 upgrade, packet engine won't stay running

Started by RutgerDiehard, June 12, 2025, 10:30:36 AM

June 12, 2025, 10:30:36 AM Last Edit: June 12, 2025, 11:44:03 AM by RutgerDiehard
After upgrading to v2.0 yesterday, I noticed this morning that the packet engine was not running. I started it from the dashboard, which updates to "Running", but a page refresh shows it as stopped again.

This is a pretty significant issue, which I've also raised through the "Have Feedback" section. Let's hope a swift response is received and a fix provided.

It seems the eastpect service started to have issues way before the last event at 02:44 this morning where it filled the logs every 30 seconds with:

2025-06-12T02:44:11 Critical eastpect stack overflow detected; terminated
Now, attempting to start the packet service results in nothing in logs at all.
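
To check how often the crash actually repeats, the log lines can be parsed for their timestamps. This is only a minimal sketch: it assumes the log format is exactly as pasted above (ISO timestamp, severity, process name, message), the function names are made up for illustration, and the first sample line is invented (30 seconds before the real one) just to show the interval calculation.

```python
from datetime import datetime

# Message text taken from the log line quoted in the post.
CRASH_MARKER = "stack overflow detected; terminated"


def crash_times(lines):
    """Return the timestamps of eastpect stack-overflow log lines."""
    times = []
    for line in lines:
        parts = line.split()  # tolerates single or multiple spaces between fields
        if len(parts) < 3 or "eastpect" not in parts or CRASH_MARKER not in line:
            continue
        try:
            times.append(datetime.strptime(parts[0], "%Y-%m-%dT%H:%M:%S"))
        except ValueError:
            continue  # first field is not a timestamp in the assumed format
    return times


def crash_intervals(times):
    """Return the gaps in seconds between consecutive crash timestamps."""
    ordered = sorted(times)
    return [(b - a).total_seconds() for a, b in zip(ordered, ordered[1:])]


# Hypothetical sample: only the second line is from the actual log.
sample = [
    "2025-06-12T02:43:41 Critical eastpect stack overflow detected; terminated",
    "2025-06-12T02:44:11 Critical eastpect stack overflow detected; terminated",
]
intervals = crash_intervals(crash_times(sample))
```

Feeding it an exported log instead of `sample` would show whether the 30-second restart loop is steady or whether the gaps change before the engine gives up.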

Quote from: RutgerDiehard on June 12, 2025, 11:43:37 AM
It seems the eastpect service started to have issues way before the last event at 02:44 this morning where it filled the logs every 30 seconds with:

2025-06-12T02:44:11 Critical eastpect stack overflow detected; terminated
Now, attempting to start the packet service results in nothing in logs at all.

Same issue. This update caused a disruption in all environments. Seems to be eastpect based on the logs.  Additionally, network traffic stops. The issue is present in emulated and native modes. I'm unsure as to why this wasn't caught during QA.

Yep same error here as well after upgrade.
2025-06-12T13:47:14 Critical eastpect stack overflow detected; terminated

Zenarmor has just responded to the ticket and suggests it's a netmap buffer issue.

They've asked me to follow the step in the last post of this thread.

I've just added the bottom three tunables in the referenced post (the first was already there) and rebooted.

Zenarmor packet engine is now running and continues to run.

Note, I didn't change the MTU setting, as I'm using PPPoE on the WAN and it is already set lower than the suggested 1500.

This seems to have fixed the issue but will monitor for changes.

Hi,

Could you please share the logs via the Have Feedback option?

Hello again,

Could you please provide detailed information about the issue, such as the OPNsense version and whether the Zenarmor engine is crashing? Submitting a report through the "Have Feedback" option will supply the necessary details, and our team will investigate it promptly. Thank you for your cooperation.

Quote from: sy on June 12, 2025, 06:33:50 PM
Hello again,

Could you please provide detailed information about the issue, such as the OPNsense version and whether the Zenarmor engine is crashing? Submitting a report through the "Have Feedback" option will supply the necessary details, and our team will investigate it promptly. Thank you for your cooperation.

Hi sy, as stated in the first post, I had already submitted a report using Have Feedback (request 12614) and had received the suggestion of netmap buffer issues as an email reply to that from Salih.

Woke up this morning to the same issue; packet engine stopped and won't restart. Logs show this every 30 seconds or so up until the last at 02:23:

2025-06-13T02:23:04 Critical eastpect stack overflow detected; terminated
This is getting tedious now.


Interested to know what else you're running on the crashing instances. Mine has been doing fine, and I've tinkered with my Zenarmor installation, putting the DB in RAM etc. Aside from Zenarmor I'm only using Suricata, so there's not much that can interfere.

2025-06-13T12:30:02 Critical eastpect stack overflow detected; terminated
is still there. Bug reported via Feedback chat.

Hi,

It seems that a Tunable setting resolves the issue. To fix it, add "dev.netmap.ring_num" with the value "1024" under System - Settings - Tunables (or change the value if the tunable already exists), and then restart OPNsense. After making this change, the Zenarmor engine should function correctly.
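
After the reboot it may be worth confirming the tunable actually took effect. A rough sketch, assuming it is run on the firewall shell with Python available and that the netmap sysctls are readable there; the helper names are hypothetical, and only the standard `sysctl -n` command is used to read the value:

```python
import subprocess


def parse_sysctl_value(output: str) -> int:
    """Convert the text printed by `sysctl -n <name>` into an integer."""
    return int(output.strip())


def read_tunable(name: str) -> int:
    """Read a numeric sysctl value; raises if the name is unknown."""
    result = subprocess.run(
        ["sysctl", "-n", name],  # -n prints only the value, not the name
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_sysctl_value(result.stdout)


def tunable_matches(current: int, wanted: int) -> bool:
    """True when the running value equals the value set in Tunables."""
    return current == wanted
```

For example, `tunable_matches(read_tunable("dev.netmap.ring_num"), 1024)` should be true once the setting has been applied and the box restarted; if it is false, the tunable was probably not saved or the reboot was skipped.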

I was given the dev.netmap.ring_num = 1024 recommendation and that did not work for me. However, changing it to 2048 (because why not, bigger must be better lol) corrected my issue. I might be an edge case, though, as I get a lot of incoming traffic from a lot of IP addresses since I host a public NTP server.

So is the answer 1024 or 2048, and what's the downside of either? (Particularly the higher value).