Hi,
I notice in the Zenarmor dashboard that the Cloud Nodes Status drops to 0% over time.
This has been the case since 1.17.1 (maybe earlier; I am new to OPNsense and Zenarmor).
In 1.17.2 the Global CTI server disappeared (as described in the release notes), leaving only Europe and Europe2 for me.
In addition, in 1.17.2, under Zenarmor -> Settings -> Cloud Threat Intelligence -> Cloud Reputation Servers,
I no longer see any servers. In 1.17.1 a number of servers were listed there (US, Australia, etc.).
The "Re-check Reputation Servers" button is still there.
In the log under System -> Log Files -> Backend I get entries like the following once per
minute (so likely triggered by the "sensai periodicals" cron job), which could be related:
[d65d2101-36a1-46cf-861d-7cfaaa43934b] Script action failed with Command '/usr/local/opnsense/scripts/OPNsense/Zenarmor/nodes_status.py --mode 'read'' returned non-zero exit status 1. at Traceback (most recent call last): File "/usr/local/opnsense/service/modules/actions/script_output.py", line 44, in execute subprocess.check_call(script_command, env=self.config_environment, shell=True, File "/usr/local/lib/python3.11/subprocess.py", line 413, in check_call raise CalledProcessError(retcode, cmd) subprocess.CalledProcessError: Command '/usr/local/opnsense/scripts/OPNsense/Zenarmor/nodes_status.py --mode 'read'' returned non-zero exit status 1.
I also get log entries once per minute pointing to
/usr/local/opnsense/scripts/OPNsense/Zenarmor/userenrich.py
which may or may not be related.
Does Cloud Nodes Status = Down, or Cloud Nodes Status < 100%, imply a loss of security
because reputation data cannot be retrieved, or does it cause network delays because
queries to the reputation servers are stalling?
I am not 100% sure, but it could be that the CTI servers disappeared after the last
upgrade of OPNsense itself, which came 2 days or so after Zenarmor 1.17.2. OPNsense is at 24.1.7.
best regards,
Stephan
I have the same problem :(
Hello,
I'm experiencing the same issue and have contacted Zenarmor support about it. Let's see what they recommend. The latest version update broke the Cloud Reputation Servers list.
My backend log is full of entries like this, as with the original poster:
2024-05-19T16:23:06 Error configd.py [1736955c-ddf2-4ca4-99b4-6864a9bec47e] Script action failed with Command '/usr/local/opnsense/scripts/OPNsense/Zenarmor/userenrich.py ' returned non-zero exit status 1. at Traceback (most recent call last): File "/usr/local/opnsense/service/modules/actions/script_output.py", line 44, in execute subprocess.check_call(script_command, env=self.config_environment, shell=True, File "/usr/local/lib/python3.11/subprocess.py", line 413, in check_call raise CalledProcessError(retcode, cmd) subprocess.CalledProcessError: Command '/usr/local/opnsense/scripts/OPNsense/Zenarmor/userenrich.py ' returned non-zero exit status 1.
Hi,
Thanks for reporting. It doesn't seem to be a general problem; all servers are up and running. Can you check whether OPNsense can reach the servers in use via both ICMP and UDP port 5355? The server IP list is below:
US-West 104.198.6.78
US-Central 104.155.129.221
US-East 34.74.12.235
Europe 35.198.172.108
Europe2 34.65.117.157
Asia 34.92.15.156
Asia2 35.244.50.89
Australia 35.189.37.160
The fact remains that the problem started immediately after the last Zenarmor update, when the Cloud Reputation Server functionality broke. Why, and how do I fix it?
This is already the second time in a short while that I am spending time fixing unknown problems after a Zenarmor update, and it is getting somewhat frustrating.
Tests against the Europe servers: ICMP works fine, and the UDP test with nc gives:
Connection to 35.198.172.108 5355 port [udp/*] succeeded!
^C
root@OPNsense:~ # nc -u -v 34.65.117.157 5355
Connection to 34.65.117.157 5355 port [udp/*] succeeded!
^C
SY,
I tried a Zenarmor reset, as that was the only thing working in the Zenarmor menu. The reset seemed to work initially, up to selecting a database type. I noticed you can now choose between an Elastic 5 and an Elastic 8 database. I tried version 8, and the installer said to make sure the Zenarmor cloud agent was connected. After running that routine twice with no joy, I uninstalled from the OPNsense package manager and then reinstalled. The reinstall failed, as no Zenarmor entry appeared in the OPNsense menu. So I have uninstalled again and will leave it until the next OPNsense update. I am on the dev firmware, so I understand these things will happen.
Cheers!
I can ping the CTI servers.
The test with nc -u seems to be a bit pointless:
I also get the "succeeded" reply, but when I try random ports
or IP addresses I get the same "succeeded" message.
I always have to use CTRL-C to get out of nc afterwards, and no further activity
occurs after the "succeeded" message.
What would be the correct way to test UDP connectivity to those servers?
Running
root@opnsense:~ # nc -u -v 104.198.6.78 5355
Connection to 104.198.6.78 5355 port [udp/*] succeeded!
in one window and
root@opnsense:~ # tcpdump -n -i pppoe1 host 104.198.6.78
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on pppoe1, link-type NULL (BSD loopback), capture size 262144 bytes
11:20:20.368732 IP 113.30.181.18.2034 > 104.198.6.78.5355: UDP, length 1
11:20:20.368768 IP 113.30.181.18.2034 > 104.198.6.78.5355: UDP, length 1
11:20:20.368801 IP 113.30.181.18.2034 > 104.198.6.78.5355: UDP, length 1
11:20:20.368818 IP 113.30.181.18.2034 > 104.198.6.78.5355: UDP, length 1
in another window tells me nothing comes back.
Adding the CTI servers as source with source port 5355 to an ACL didn't change
anything, so either nothing is really coming back, or the servers only respond to
specific UDP requests that authenticate as legitimate queries, which is what I think and hope ...
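On the question of a meaningful UDP test: nc -u prints "succeeded" as soon as the datagram has been sent without an immediate error, because UDP has no handshake, so that message says nothing about reachability. Only a reply proves the path works in both directions. Below is a minimal sketch in Python (which OPNsense ships, per the tracebacks above) that sends one datagram and waits for an answer. The function name udp_probe and the one-byte payload are illustrative assumptions. As suspected above, the CTI servers may simply ignore payloads that are not valid queries, so a timeout here does not prove that a server is unreachable.

```python
import socket


def udp_probe(host, port, payload=b"\x00", timeout=3.0):
    """Send one UDP datagram and wait for any reply.

    Returns the reply bytes, or None if nothing came back before the
    timeout or the OS reported an error (e.g. ICMP port unreachable).
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            # A connected UDP socket lets the OS report ICMP errors
            # as exceptions instead of silently dropping them.
            sock.connect((host, port))
            sock.send(payload)
            return sock.recv(4096)
        except OSError:
            # Covers timeouts, refused ports and unreachable networks.
            return None


if __name__ == "__main__":
    # IP and port taken from the server list posted in this thread.
    reply = udp_probe("104.198.6.78", 5355)
    print("reply received" if reply is not None else "no reply")
```

Unlike nc, this returns None for a random closed port or an unused address, so the result actually distinguishes "something answered" from "nothing answered".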
This used to happen in the past as well, IIRC it had something to do with Zenarmor flagging the cloud servers as down after let's say 10 minutes, but then it only pings them once every 20 minutes. So they're always shown as down towards the end of the ping interval.
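If that recollection is right, the effect can be illustrated with a few lines of Python. This is only a toy model using the assumed numbers from the post above (a status that expires after 10 minutes, a re-check every 20 minutes); the function name and parameters are hypothetical and not taken from Zenarmor.

```python
def status_timeline(check_interval_min=20, expiry_min=10, total_min=40):
    """Return the displayed status for each minute of the simulation.

    Assumes every check succeeds and that a result is shown as "up"
    only while it is younger than expiry_min minutes.
    """
    timeline = []
    for minute in range(total_min):
        # Minutes elapsed since the most recent (successful) check.
        since_last_check = minute % check_interval_min
        timeline.append("up" if since_last_check < expiry_min else "down")
    return timeline


if __name__ == "__main__":
    timeline = status_timeline()
    down_share = timeline.count("down") / len(timeline)
    print(f"shown as down {down_share:.0%} of the time")
```

With these numbers the nodes would be shown as down for the second half of every interval even though every check succeeded. A status that stays down for hours, as reported later in the thread, cannot be explained by this timing alone.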
Hi All,
Do you protect the WAN interface with Zenarmor?
I do not protect WAN interfaces with Zenarmor, only the internal LAN and guest networks.
I was LAN-only with Zenarmor as well. I use CrowdSec and Suricata for WAN.
Recent Updates (OPNsense 24.1.7_4-amd64 and Zenarmor 1.17.3 - May 20, 2024 4:08 PM)
brought back the CTI Server list in the Zenarmor Settings -> Cloud Threat Intelligence.
Cloud Nodes Status still goes down to "DOWN 0%"
Same here.
Cloud Nodes Status is down
for Europe and Europe2.
This started after the update.
Ok,
after toggling off -> on in "~ui/zenarmor/#/0/settings/cloud-threat-intelligence" for the Europe nodes, it works now.
lewald,
can you please describe this in a little more detail? I cannot find those files / directories.
cd ~ui/zenarmor/#/0/settings gives me "no such user".
Thanks a lot,
Regards,
Stephan
He meant in the UI: there is an enable/disable button at the top of that section.
@Sy
I upgraded today to 24.1.7 and 1.17.3.
The Cloud Reputation Servers list is indeed empty; there is nothing in that section after upgrading to 1.17.3.
It looks like something is failing:
6c348fd6-a31a-4f60-805e-85accda81ccb] Script action failed with Command '/usr/local/opnsense/scripts/OPNsense/Zenarmor/nodes_status.py --mode 'read'' returned non-zero exit status 1. at Traceback (most recent call last): File "/usr/local/opnsense/service/modules/actions/script_output.py", line 44, in execute subprocess.check_call(script_command, env=self.config_environment, shell=True, File "/usr/local/lib/python3.11/subprocess.py", line 413, in check_call raise CalledProcessError(retcode, cmd) subprocess.CalledProcessError: Command '/usr/local/opnsense/scripts/OPNsense/Zenarmor/nodes_status.py --mode 'read'' returned non-zero exit status 1.
This looks like it is related to the issues in your previous post:
https://forum.opnsense.org/index.php?topic=40513.msg198743#msg198743
Running
pkg install -fy os-sensei
fixes it.
Regards,
S.
Hi Stephan,
Do you still have this issue?
Yes, the node status is still "DOWN" 0% after some time. I am still not sure whether it is really an issue or whether the system is using
the cache. Only my wife and I are behind this Zenarmor, so not too many different websites are visited.
Hi,
They should appear as online even if the results come from the cache. Is it the same if you restart the Engine service?
Hi,
I was away on holiday for a few days, so I didn't follow this thread, sorry.
I now have 1.17.4 installed and see the same behaviour. But I have been able to narrow it down.
Because my ISP forces a PPPoE reset every 24 hours to renew the IP address assigned to my firewall,
I run a cron job that resets the PPPoE interface in the very early
morning at 4:00, so the IP renewal always happens at night.
This seems to be the point in time at which the node status goes down and does not recover, although I would expect it to. I verified this with another cron job that resets the interface during "wake hours" ;-). Right before
execution I restarted Zenarmor so that the node status was at 100%; right after execution it was down at 0%
and didn't recover in the following 10 minutes.
Thanks a lot,
Stephan
Hi Stephan,
Zenarmor checks the node status every 10 minutes.
I repeated the test. The node status goes down right after the PPPoE interface reset via cron job and stays down,
now for more than 2 hours.
Hi,
Please increase the log level in Settings - Logging - Level from INFO to DEBUG4, then share the /usr/local/zenarmor/log/active/worker*.log files with the support team. You can use the Have Feedback option to create a support ticket and attach the worker*.log files to it.
It looks like Zenarmor either misses that the public IP address has changed, or it ignores the change and does not open
a new connection to the CTI servers.
root@opnsense:/usr/local/etc # netstat -n | grep 5355
udp4 0 0 185.171.XXX.YYY.13285 35.198.172.108.5355
udp4 0 0 185.171.XXX.YYY.44679 34.65.117.157.5355
udp4 0 0 185.171.XXX.YYY.25812 34.65.117.157.5355
udp4 0 0 185.171.XXX.YYY.56199 35.198.172.108.5355
After the PPPoE interface reset, this still shows the UDP connections from the old IP address instead of the new IP address on the PPPoE interface.
In the logs, the Cloud Reputation Servers just go from healthy to unhealthy.
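A quick way to spot this condition is to compare the local address of the sockets on port 5355 with the address currently assigned to the WAN interface. Below is a rough Python sketch that parses FreeBSD-style `netstat -n` output like the listing above. The function name and the sample addresses (from the documentation range 203.0.113.0/24) are made up for illustration.

```python
def stale_udp_sockets(netstat_output, current_ip, remote_port=5355):
    """Find UDP sockets to remote_port that are bound to an outdated local IP.

    Expects lines in the form printed by `netstat -n` on FreeBSD:
        udp4  0  0  <local_ip>.<port>  <remote_ip>.<port>
    Returns a list of (local_ip, remote_ip) tuples.
    """
    stale = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers, blank lines and non-UDP sockets.
        if len(fields) < 5 or not fields[0].startswith("udp"):
            continue
        # FreeBSD appends the port after the last dot of the address.
        local_ip, _, _local_port = fields[3].rpartition(".")
        remote_ip, _, port = fields[4].rpartition(".")
        if port == str(remote_port) and local_ip != current_ip:
            stale.append((local_ip, remote_ip))
    return stale


if __name__ == "__main__":
    # Hypothetical sample: 203.0.113.10 is the old WAN address,
    # 203.0.113.20 the address assigned after the PPPoE reset.
    sample = (
        "udp4 0 0 203.0.113.10.13285 35.198.172.108.5355\n"
        "udp4 0 0 203.0.113.10.44679 34.65.117.157.5355\n"
    )
    for local_ip, remote_ip in stale_udp_sockets(sample, "203.0.113.20"):
        print(f"socket to {remote_ip} still bound to old address {local_ip}")
```

Feeding it the real `netstat -n` output and the current PPPoE address would show whether the engine is still holding sockets bound to the previous address.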
My workaround for now is to restart Zenarmor after a new IP is acquired, by adding this to
/usr/local/etc/rc.newwanip
/* restart zenarmor engine to re-establish connection to CTI servers */
log_msg("Restarting zenarmor engine");
mwexecf('/usr/local/sbin/zenarmorctl engine restart');
This seems to work OK for now.
Hi,
Thanks for reporting the issue. We are going to investigate it and get back to you.
The workaround has been working fine for some days now. So it is clear that under normal circumstances
Zenarmor maintains a good Cloud Node Status. It just does not recover from a PPPoE interface
reset. (It may recover if the carrier itself disconnects the interface to enforce the IP address renewal.
I didn't test that scenario, but it may behave differently from an interface reset performed by a cron job
to force the IP address renewal at a specific time.)
Upgrading to OPNsense 24.1.9_3-amd64 didn't improve this. I had to re-add the zenarmorctl engine restart to
rc.newwanip.
best regards,
Stephan
I would like to add that I have exactly the same problem.
I configured my modem to reconnect at 5:00 in the early morning, so that no IP change will occur during the day.
Every day the Cloud Nodes Status switches to the offline state.
Thanks @just4fun for posting a temporary fix.
This is a known cosmetic issue; you can check by restarting the Engine. You can follow the updates.
I don't think it is just a cosmetic issue. Zenarmor loses connectivity to its cloud nodes, so it is unable
to query them for reputation information. Restarting the engine re-establishes the connection, which is why I still use my workaround described earlier in this thread: adding a Zenarmor engine restart command to the end of
/usr/local/etc/rc.newwanip.
I have to re-edit the file after every update. While I can live with that, I still think it should be fixed.
It cannot be that difficult to re-establish those connections from scratch when they are lost,
instead of sitting there with known-to-be-offline connections and waiting forever. I'd expect a few lines of code
to do that.
Regards,
Stephan
Hi @just4fun,
You're right, it is a known issue. It will be fixed in the next major version. Thanks for reporting it.
Hi,
After the recent updates to 24.7.8 and Zenarmor 1.18.2, the Cloud Node status shows "Up" again after a
PPPoE interface reset. It may already have been fixed one or two update iterations earlier
(I am quite sure it also worked with Zenarmor 1.18.1, but I did not test every single version).
Great, and thanks for your support.
Greetings,
Stephan