Messages - wstemb

#1
25.7, 25.10 Series / Re: HA syncing issue
November 07, 2025, 08:50:52 AM
Installed 25.7.7_2 directly from 25.7.6. The configuration is now synchronizing to the slave.
Problem solved for me. Obviously my cluster was "the edge case" :-)
Thank you.
#2
Quote from: franco on October 30, 2025, 11:26:20 AM
It's been about 24 hours since the hotfix went live and I haven't heard the issue is still happening. Incidentally, FreeBSD released pkg 2.4.0 which also includes this particular fix. So all of this went as expected.


Cheers,
Franco

This morning I tried it on a third machine, a virtual test OPNsense server. The upgrade from 25.7 (base) to 25.7.6 finished without any problems or messages.
Then I upgraded the two production servers in the cluster, already on 25.7.6. On both I got:

Quote:
...
[1/1] Upgrading pkg from 2.3.1 to 2.3.1_1...
[1/1] Extracting pkg-2.3.1_1: .......... done
...

So I think the "problems" I described were before your intervention.


#3
25.7, 25.10 Series / Re: HA syncing issue
October 29, 2025, 12:11:37 PM
Some of the messages are similar in my cluster. I tried to delete a disabled firewall rule on LAN and synchronize the cluster. The slave remained unchanged.

So there is a syncing issue after some of the last upgrades.
#4
Installed on my cluster and created some blocking rules on the internal and WAN interfaces. Everything is working, at least from outside.

Some questions:

1. Besides Q-Feeds Community, I use maltrail, with the same blocking rule for both aliases. Analyzing some IP addresses in both aliases, I found that maltrail blocks some IP addresses that Q-Feeds does not. I decided to keep both, because this tells me the Q-Feeds Community protection is not complete (as expected from your documents and web site :-) ). Is this OK as a policy, or can there be some "internal conflict"?

2. Is it possible to add an "Export to CSV" or "Download" button in Security/Q-Feeds Connect/Events? There are 50k log entries, which is almost impossible to analyze on this screen alone. I know it is possible to export the whole firewall log, but it is too big to be useful.

3. How do I report false positives, if I find any?

4. In the portal log I found the message: "Rate limit exceeded for company: xxxxxxx's Company on feed malware_ip". I have two firewalls, master and slave, in a cluster. The message is for the master IP address. What is the limit? How can I avoid it?
#5
I saw the same error message on both machines in the cluster, with no web error; the upgrade continued and finished. One server rebooted because of the bigger jump between versions; the other one (slave/backup) did not.
After the upgrade all services came up (except frr on the slave, which is expected). I ran the health check and everything was OK.

The only bad news is that some pages, mostly pages that list something (activity, logs), are now slow and unresponsive, with the browser warning that the page is slowing down the browser (seen very seldom before).

The worst of them was the firewall live log; I applied the first correction as instructed in another thread and now it is working better. The "System/Diagnostics/Activity" screen blocks the page for some seconds (with the browser warning mentioned above), but finishes in the end.
#6
On the next upgrade that needs a reboot, I will be personally present and try to collect (and write down) as much as possible about the problem, working on the console. I could also check whether a reboot from SSH or the web GUI has the same effect; I have not used them in recent months :-).

I usually use Remmina for SSH, so, thinking now about your comment on the last incident, I cannot tell whether SSH rejected my password or the connection itself was dropped (not listening); I just did not get the session.

Next time I will try to ssh manually from my Linux terminal.

But on the VGA console I have seen this situation more than once, so I am very confident in my claims. I usually never touch it, so it logs off automatically. After a failed reboot on upgrade, pressing a key on the console gives you the login prompt, but login is unsuccessful until a power-off/power-on.
#7
Quote from: cookiemonster on April 30, 2025, 11:38:17 AM
You need an administrator user to login to the console. Admin rights allow it to see other user processes.
#ps -aux to list all processes and their owners ?
This is basic knowledge, I learned it decades ago on Unix (and Unix-like) systems: HP-UX, Linux (Yggdrasil was the first distribution), Solaris... :-)
The problem is elsewhere: I have the root user credentials, but in this phase, between the unfinished upgrade and a forced reboot, they are simply rejected on the console or over SSH; in the end I get the "Permission denied, please try again." message.

So:
1. What is blocking the root logon during the upgrade?
2. Which process (and why) is stopping the shutdown process? 

Next time I will try to do the upgrade (one that needs a reboot) with the console logged on (on SSH and on VGA) to gather more data using the method you (also) suggested, if possible.

Until now the problem has not been destructive, but it is very uncomfortable: you have to power-off and power-on a working system, which is the approach to ICT I like least.


#8
After an upgrade with reboot, the reboot failed. On the console I can see "waiting for process to finish:" and a PID, but I can't access the console to check which process it is. If I enter the credentials, I receive "Invalid password" (or something similar). SSH also doesn't connect. Web GUI reboot does not work.

The Web GUI Dashboard after the upgrade displays the exact version of the upgraded system, but if you try to enter "System->Firmware" you get the screen with the small "Waiting the reboot..." window that you usually get at the end of the first phase of the upgrade.

It seems that the main functionality of the firewall is maintained.

The only way I found out of this situation was to kill the machine with the power button. After the reboot everything works again: console, SSH, Web GUI...

It first happened on the first upgrade (from 24.7 to 25.1), on both firewalls (backup and primary); later only on the primary.

How can I find the userid/password valid on the system during the upgrade process, to check what is behind the PID the shutdown is waiting for forever?

Regards, Walter

#9
Resolved.

The error was caused by trailing whitespace in the myfile.prom file.

I killed the node_exporter process and restarted it from the CLI. After the startup lines, I got the error:

Quote:
ts=2025-02-06T11:14:06.216Z caller=textfile.go:245 level=error collector=textfile msg="failed to collect textfile data" file=myfile.prom err="failed to parse textfile data from \"/var/tmp/node_exporter/myfile.prom\": text format parsing error in line 3: expected integer as timestamp, got \"\""

Searching the web, I found https://github.com/prometheus/common/issues/33. In short: the node_exporter textfile parser does not tolerate trailing whitespace.
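
For anyone hitting the same thing, here is a minimal sketch of cleaning such a file before node_exporter reads it, assuming Python 3 is available where the file is produced. The function names are made up for illustration. The file is rewritten through a temporary file and a rename, so the collector never sees a half-written file.

```python
import os
import shutil
import tempfile


def strip_trailing_whitespace(text: str) -> str:
    """Return text with trailing whitespace removed from every line."""
    lines = [line.rstrip() for line in text.splitlines()]
    # Terminate the result with exactly one newline when there is content.
    return "\n".join(lines) + "\n" if lines else ""


def clean_prom_file(path: str) -> None:
    """Rewrite a .prom file in place without trailing whitespace."""
    with open(path, "r", encoding="utf-8") as handle:
        cleaned = strip_trailing_whitespace(handle.read())
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the same directory so the rename is atomic.
    # Its ".tmp" suffix keeps it out of the *.prom files the collector reads.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(cleaned)
    # mkstemp creates the file as 0600; copy the original permission bits.
    # Ownership is NOT copied: the new file belongs to the user running this.
    shutil.copymode(path, tmp_path)
    os.replace(tmp_path, path)
```

Calling clean_prom_file("/var/tmp/node_exporter/myfile.prom") would apply it to the file from this thread; run it as the user that should own the file.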
#10
I installed the os_node_exporter plugin and it is working: it is serving data from OPNsense to Prometheus and Grafana.

The problem (or my lack of knowledge) is that, although the flag "--collector.textfile.directory=/var/tmp/node_exporter" is set (as seen from "ps aux | grep node"), the plugin is not reading or including the content of *.prom files placed in this directory:

Quote:
/var/tmp/node_exporter # ls -al
total 12
drwxr-xr-t  2 nobody nobody 512 Feb  5 13:21 .
drwxrwxrwt  6 root   wheel  512 Feb  5 13:05 ..
-rw-r--r--  1 nobody nobody 521 Feb  5 12:39 myfile.prom

When I look at http://fw_IP_ADDR:9100/metrics, the rows from myfile.prom are not there; instead I find:

Quote:
# HELP node_scrape_collector_duration_seconds node_exporter: Duration of a collector scrape.
# TYPE node_scrape_collector_duration_seconds gauge
...
node_scrape_collector_duration_seconds{collector="textfile"} 0.000197219
...
# HELP node_scrape_collector_success node_exporter: Whether a collector succeeded.
# TYPE node_scrape_collector_success gauge
...
node_scrape_collector_success{collector="textfile"} 1
...
# HELP node_textfile_scrape_error 1 if there was an error opening or reading a file, 0 otherwise
# TYPE node_textfile_scrape_error gauge
node_textfile_scrape_error 1

Same behaviour on a 24.7 or 25.1 firewall.

So there must be some read errors.
As seen in the quote above, the file is inside the /var/tmp/node_exporter directory, it has the .prom extension, and I changed its owner to nobody:nobody since the process runs as this user. The documentation on the plugin's GitHub page is weak on this. Does anybody have some advice?
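
The combination quoted above (collector success 1, but node_textfile_scrape_error 1) can also be checked from a script. This is only a rough sketch: the function names are made up, and the parser handles just simple samples like the quoted ones, with no timestamp after the value.

```python
import urllib.request


def fetch_metrics(url: str) -> str:
    """Download the metrics page, e.g. http://fw_IP_ADDR:9100/metrics."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8")


def read_sample(metrics_text: str, sample: str):
    """Return the value of the sample (name plus labels, if any), or None."""
    for line in metrics_text.splitlines():
        line = line.strip()
        # Skip blank lines and the "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # Assumes "name{labels} value" with nothing after the value.
        name, _, value = line.rpartition(" ")
        if name == sample:
            return float(value)
    return None


def textfile_failed(metrics_text: str) -> bool:
    """True when the textfile collector reports a file it could not read or parse."""
    return read_sample(metrics_text, "node_textfile_scrape_error") == 1.0
```

textfile_failed(fetch_metrics("http://fw_IP_ADDR:9100/metrics")) returning True means at least one file in the textfile directory was rejected, even though the collector itself reports success.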
#11
As proposed in the support documentation link in the previous post, use the "Have Feedback" link at the bottom left of the Zenarmor Web GUI. You will have the opportunity to tick the box to add logs (and to see their location in the filesystem).

If you want to see them, you must go to the CLI (/usr/local/zenarmor/log).
#12
DISCLAIMER: what I write in the next lines is not a solution; it is a brute-force workaround for just one fixed scenario (SMTP server, No security), if you desperately need the mail report.

OPNsense 23.7.12_5
Zenarmor 1.16.2

Edit send.py and comment out these lines, around line 246:

#       if password:
#           smtp.login(username, password)

to avoid bug 2 from the previous post.

Then you MUST choose (if/when possible and applicable):
Mail provider: SMTP Server
Mail server hostname: Hostname or IP of a server without authentication
Connection Security: No Security

The mail server port will change to 25.

You have to enter some dummy data for username and password to avoid bug 1 from the previous post.
#13
I sent a feedback/bug report following the instructions: https://www.zenarmor.com/docs/support/reporting-bug
#14
Yes, the maintenance and upgrade process has some rules.

I changed the script a little (two lines) to make it work just for me in one strict scenario (No security). It is a brute-force approach; my copy will probably be overwritten soon by a Zenarmor upgrade.
#15
I found the same error using the local mail server and investigated it a little:

The script cannot be run just as it is in the post; the command must receive arguments from the caller. Your error is caused by the missing arguments.

usage: send.py [-h] [-b PDF] [-S SERVER] [-R PROVIDER] [-P PORT] [-s SECURED] [-u USERNAME] [-p PASSWORD] [-f SENDER] [-t TO] [-v NOSSLVERIFY]

Bug no. 1: When you use it on plain SMTP with No security (without userid and password), the switches --username and --password are still in the command, without arguments.

I added echo $@ > filename to the script and got:

--provider smtp-server --pdf false --server a.b.c.d --port 25 --secured true --username --password --sender i@am.here --to you@are.there --nosslverify false

producing a send.py error message:

send.py: error: argument -u/--username: expected one argument
or, if you enter a userid and an empty password:
send.py: error: argument -p/--password: expected one argument

and the error in the GUI: Error (200) There was an issue on our end. Sorry about that.


Bug no. 2:

If you stay with plain SMTP on port 25 and No security, and enter some userid/password data to bypass bug no. 1, the script send.sh passes valid data to send.py, which wrongly answers with:

{"successful": false, "message": "Smtp :No suitable authentication method found."}

Which authentication methods? I am using plain SMTP on port 25, No security!

What works?

When I run the script send.sh in the CLI with all switches except --username and --password, the result of the script (send.py called by send.sh) is:

{"successful": true, "message": "Mail has been send successfully!"}

and test mail is sent and received.

Conclusion:
the script send.py has to be rewritten with better argument parsing:
a) permitting empty --username and --password;
b) dropping them as parameters if the Connection security is "No Security". Currently the script wrongly assumes that a password present in the arguments means login; plain SMTP with No security does not need a login.
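
Points a) and b) could look roughly like this. It is only a sketch, not the real send.py: the option names are copied from the usage line and the echoed command above, while build_parser, should_login and the no_security parameter are made up for illustration. How the real script decides that "No Security" is selected is not shown here.

```python
import argparse
import shlex


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="send_sketch.py")
    # Options that always carry a value, named after the quoted usage line.
    for short, long in (("-b", "--pdf"), ("-S", "--server"),
                        ("-R", "--provider"), ("-P", "--port"),
                        ("-s", "--secured"), ("-f", "--sender"),
                        ("-t", "--to"), ("-v", "--nosslverify")):
        parser.add_argument(short, long)
    # a) nargs="?" accepts the switch with or without a value: const=""
    # is stored when the value is missing, default="" when the switch is absent.
    parser.add_argument("-u", "--username", nargs="?", const="", default="")
    parser.add_argument("-p", "--password", nargs="?", const="", default="")
    return parser


def should_login(username: str, password: str, no_security: bool) -> bool:
    """b) Attempt SMTP login only with real credentials on a secured connection."""
    if no_security:
        return False
    return bool(username) and bool(password)


if __name__ == "__main__":
    # The command line echoed from send.sh, as quoted in this post.
    echoed = ("--provider smtp-server --pdf false --server a.b.c.d --port 25 "
              "--secured true --username --password --sender i@am.here "
              "--to you@are.there --nosslverify false")
    args = build_parser().parse_args(shlex.split(echoed))
    print(should_login(args.username, args.password, no_security=True))
```

With this parsing the empty --username --password pair no longer aborts with "expected one argument", and the login call can be guarded by should_login instead of by the mere presence of a password.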

I needed the report to work, so I hard-coded the script send.py a little to make it work in the simplest scenario (No security), but I still have to check it with STARTTLS.

Where is the PDF check in the GUI?