OPNsense down after update, multiple netmap errors, Zenarmor conflict?

Started by doug_phoenix, July 23, 2023, 12:33:24 AM

Hi all,

Yesterday I updated OPNsense to (what I presume is) the latest production version. It took a couple of reboots to bring the network back up. All seemed well for several hours, then the network went down hard. (Prior to this, OPNsense had been running great as configured below.)

Hardware: Protectli VP2410 (four port), 16 GB DRAM, 250 GB SSD

After restoring an old backup (see below):
OPNsense 23.1.11-amd64
FreeBSD 13.1-RELEASE-p8
OpenSSL 1.1.1u 30 May 2023

Assignments: WAN (igb0), LAN (igb1), LACP (igb2 and igb3) with a few VLANs

Plugins: os-sunnyvalley, os-sensei, os-sensei-agent, os-sensei-updater
Zenarmor is configured with Elasticsearch.

Today I reconnected the console and I've rebooted a few times. I've also restored backups from June via the console. My network does come up briefly after a ~25min reboot cycle (!), but goes back down (occasionally going up again for a few moments).

BTW my version after update showed 23.1.11, and I don't think that's the latest.

My console shows a large number of messages relating to netmap and emulated adapters. For example:
... generic_netmap_unregister Emulated adapter for vlan02 deactivated
... generic_netmap_dtor Emulated adapter for vlan02 destroyed

Same for vlans 03 and 04.

Watching the console for a while, I see:
igb2 (and igb3): Interface stopped DISTRIBUTING, possible flapping

I spent hours troubleshooting and rebooting. I checked and rebooted my switch. Access to the UI was hit and mostly-miss. I recalled from some time ago that there could be issues with LAGG and netmap under Zenarmor, so after several attempts I put the Zenarmor packet engine in bypass mode. This seemed to help, but the same errors reappeared. Next I stopped all three Zenarmor services (packet engine, Elasticsearch, and Cloud agent) and set each NOT to start on reboot. Still seeing the same behavior, I rebooted again.

Now OPNsense is running and my network is back up. I do not see netmap or other errors on the console. So my 'emergency' is contained.  :)

However, the UI is sluggish and I'm unable to update OPNsense from the UI.
pkg: sqlite error while executing CREATE TABLE packages ...

Could someone please offer some help/advice? I'd appreciate it!

Thanks, Doug

Update:

I've removed the Zenarmor/Sensei plugins, and was able to download updates to OPNsense (minor updates, still on v. 23.1.11).

Running an audit I see:

***GOT REQUEST TO AUDIT HEALTH***
Currently running OPNsense 23.1.11 at Sun Jul 23 11:30:08 MST 2023
>>> Check installed kernel version
Version 23.1.11 is correct.
>>> Check for missing or altered kernel files
No problems detected.
>>> Check installed base version
Version 23.1.11 is correct.
>>> Check for missing or altered base files
Error 2 ocurred.
etc/sysctl.conf:
   size (311, 345)
   sha256digest (0x8c57d647047d84b9be4cddbb0b6d58c1d5839f148b62d1137b8bf2611f681cfd, 0x06ec8255e5fdfb4ccaf2059bc0d12c92554e4ba8f92b9d4c51af74ba58ba00c9)
>>> Check installed repositories
OPNsense
>>> Check installed plugins
No plugins found.
>>> Check locked packages
No locks found.
>>> Check for missing package dependencies
Checking all packages: .......... done
>>> Check for missing or altered package files
Checking all packages: .......... done
>>> Check for core packages consistency
Core package "opnsense" has 67 dependencies to check.
Checking packages: .................................................................... done
***DONE***

So it looks like the checksum for sysctl.conf is incorrect. Looking for a way to resolve this.
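For anyone wanting to reproduce the audit's numbers by hand, here is a minimal Python sketch that computes the size and SHA-256 digest of a file so they can be compared with the pair in the audit output. The function names are made up for illustration. Treating the first value of each pair as "expected" and the second as "found on disk" is an assumption, as is having Python 3 available on the firewall.

```python
import hashlib
from pathlib import Path

# Values copied from the audit output above. Treating the first value of each
# pair as the expected one (and the second as what is on disk) is an assumption.
EXPECTED_SIZE = 311
EXPECTED_SHA256 = "8c57d647047d84b9be4cddbb0b6d58c1d5839f148b62d1137b8bf2611f681cfd"


def fingerprint(path):
    """Return (size in bytes, SHA-256 hex digest) for the file at path."""
    data = Path(path).read_bytes()
    return len(data), hashlib.sha256(data).hexdigest()


def matches_expected(path):
    """True if the file has the size and digest the audit expected."""
    return fingerprint(path) == (EXPECTED_SIZE, EXPECTED_SHA256)
```

Calling `fingerprint("/etc/sysctl.conf")` on the firewall should show whether the file on disk is the one the audit is complaining about. It only confirms the mismatch; it does not say what changed.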

Any ideas??

Install OPNsense in a VM, upgrade to 23.1.11, copy the sysctl.conf and then open both files in WinMerge.

The changes you have there could be related to Sensei, or to some other manual tweaks you may have made.

Thank you.  :)

That seems like a good idea. Unfortunately for me, I'm not running any VMs, so it would be a major undertaking for me to use this approach.  :(

Looking for something easier.

This is what I'm seeing on my FWs and on the file copied from the ISO:

# $FreeBSD$
#
#  This file is read when going to multi-user and its contents piped thru
#  ``sysctl'' to adjust kernel values.  ``man 5 sysctl.conf'' for details.
#

# Uncomment this to prevent users from seeing information about processes that
# are being run under another UID.
#security.bsd.see_other_uids=0
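If you'd rather not use WinMerge, the comparison can be sketched in a few lines of Python with the standard `difflib` module. This is only an illustration: the function name is made up, and it assumes you have the stock file's text (for example, pasted from the listing above) and your own file's text as strings.

```python
import difflib


def changed_lines(reference_text, local_text):
    """Return only the lines that were added to or removed from the reference.

    Lines starting with '+ ' exist only in the local file; lines starting
    with '- ' exist only in the reference file.
    """
    diff = difflib.ndiff(reference_text.splitlines(), local_text.splitlines())
    return [line for line in diff if line.startswith(("+ ", "- "))]
```

For example, `changed_lines(stock, Path("/etc/sysctl.conf").read_text())` (with `stock` holding the reference text and `Path` imported from `pathlib`) would list just the lines that differ, which is all the audit mismatch really needs explained.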

Interesting; thank you.  :D

Mine looks the same except there is a last line that looks like a location for a kernel core file:

kern.corefile = /root/%N.%P.core

I'm a Linux novice, but I don't see any files in /root that end with .core.

I'm not certain that it would be safe to delete this last line - what is your opinion?

Thanks

Put a # in front of it, save and reboot.

Doubt it's being used by anything. You can leave it commented out for a while and delete later
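If you prefer to script the change rather than edit by hand, here is a rough Python sketch of the same idea: it puts a # in front of any active line that sets a given key and leaves everything else alone. The function name is made up for illustration, and it assumes simple `key = value` lines like the `kern.corefile` one above.

```python
def comment_out(text, key):
    """Prefix '#' to every active line that sets `key`; leave the rest untouched."""
    result = []
    for line in text.splitlines():
        stripped = line.strip()
        # Blank lines and lines that are already comments are never modified.
        is_active = bool(stripped) and not stripped.startswith("#")
        if is_active and stripped.split("=", 1)[0].strip() == key:
            line = "#" + line
        result.append(line)
    return "\n".join(result) + "\n"
```

Running it twice gives the same result as running it once, since a line that is already commented out is skipped. Keep a copy of the original file before writing anything back.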

Done. System rebooted with no issues.

Running the health audit, I see an equivalent error. But I'm no longer concerned about this.

Digging a bit deeper, I've learned that this is one of two config files used by OPNsense to tweak performance. See "Tunables" at: https://docs.opnsense.org/manual/settingsmenu.html

Also https://teklager.se/en/knowledge-base/opnsense-performance-optimization/ and https://calomel.org/freebsd_network_tuning.html

I'll leave that stuff to the experts like you.  :)

I think I'll take a break from troubleshooting the issue with Zenarmor. But I'm already missing the ad-blocking feature!

Thank you very much for your help with this.

Doug

Not entirely sure what happened there. As for cleaning up the traffic, there are 3 more options:


1) use AdGuard Home on the firewall; installation is easy from mimugmail's repo

https://github.com/mimugmail/opn-repo


2) Docker/Podman containers with AdGuard Home or Pi-hole


3) use the blocking lists in Unbound on OPNsense

Thank you!

I hadn't realized I could run AdGuard directly from the firewall.

I do like Zenarmor, except of course for the issues I dealt with lately.

It's curious that most of the errors I saw were related to emulated netmap. I had configured native netmap for Zenarmor.

I appreciate your excellent help.