Messages - wstemb

25.1, 25.4 Production Series / Re: Failed reboot on upgrades on 25.1

May 06, 2025, 09:25:57 AM

On the next upgrade that requires a reboot, I will be present in person and will try to collect (and write down) as much as possible about the problem while working on the console. I could also check whether a reboot from SSH or the web GUI has the same effect; I have not used either in recent months :-).

I usually use Remmina for SSH, so, thinking now about your comment on the last incident, I cannot tell whether SSH rejected my password or the connection itself was dropped (not listening); I just did not get the session.

Next time I will try to manually ssh from my Linux terminal.
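
One way to tell a rejected password apart from a port that is not listening at all is a plain TCP probe before trying to log in. The following is only a minimal sketch; the function name probe_ssh and the default port 22 are assumptions for illustration, not something taken from the firewall itself:

```python
import socket


def probe_ssh(host: str, port: int = 22, timeout: float = 5.0) -> str:
    """Classify the state of a TCP port that should be serving SSH."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            try:
                # An SSH server normally sends its version banner first.
                banner = sock.recv(256).decode("ascii", errors="replace").strip()
            except socket.timeout:
                return "connected, but no banner received"
            if banner.startswith("SSH-"):
                return "listening: " + banner
            return "connected, unexpected banner: " + repr(banner)
    except ConnectionRefusedError:
        return "not listening (connection refused)"
    except socket.timeout:
        return "no answer (timed out)"
    except OSError as exc:
        return "connection failed: " + str(exc)
```

Calling probe_ssh with the firewall address during the stuck phase would show whether sshd still answers. If it reports "listening", the later "Permission denied" comes from authentication, not from a dropped connection.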

But on the VGA console I have seen this situation more than once, so I am very confident in my claims. I normally never touch it, so it logs off automatically. After a failed reboot on upgrade, pressing a key on the console brings up the login prompt, but logins fail until a power-off/power-on.

25.1, 25.4 Production Series / Re: Failed reboot on upgrades on 25.1

May 05, 2025, 09:43:58 AM

Quote from: cookiemonster on April 30, 2025, 11:38:17 AM
You need an administrator user to login to the console. Admin rights allow it to see other user processes.
#ps -aux to list all processes and their owners ?

This is basic knowledge; I learned it decades ago on Unix (and Unix-like) systems: HP-UX, Linux (Yggdrasil was the first distribution), Solaris... :-)
The problem is elsewhere: I have the root user credentials, but in this phase, between the unfinished upgrade and a forced reboot, they are simply rejected on the console and over SSH; in the end I got the message "Permission denied, please try again."

So:
1. What is blocking the root logon during the upgrade?
2. Which process (and why) is stopping the shutdown process?

Next time I will try to do the upgrade (one which needs a reboot) with sessions already logged on (over SSH and on the VGA console), to gather more data using the method you (also) suggested, if possible.
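
With a session that is already logged on, the PID shown in the "waiting for process" message can be looked up directly. A minimal sketch, assuming ps is available on the path; the helper names are hypothetical:

```python
import subprocess


def ps_command(pid: int) -> list[str]:
    """Build a ps invocation that shows the owner and command line of one PID."""
    return ["ps", "-o", "pid,user,command", "-p", str(pid)]


def describe_pid(pid: int) -> str:
    """Run ps for the given PID and return whatever it prints."""
    result = subprocess.run(
        ps_command(pid), capture_output=True, text=True, check=False
    )
    return result.stdout.strip()
```

For example, describe_pid(1234) would return the ps output for PID 1234 (a made-up number), which should be enough to see which process the shutdown is waiting for.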

Until now, the problem has not been destructive, but it is very uncomfortable: you have to power-off and power-on a working system, which is the approach to ICT I like least.

25.1, 25.4 Production Series / Failed reboot on upgrades on 25.1

April 30, 2025, 11:34:58 AM

After an upgrade with reboot, the reboot failed. On the console I can see "waiting for process to finish:" and a PID, but I can't log in on the console to check which process it is. If I enter the credentials, I receive "Invalid password" (or something similar). SSH also does not connect. The Web GUI reboot does not work.

After the upgrade, the Web GUI Dashboard displays the exact version of the upgraded system, but if you try to open "System->Firmware" you get the screen with the small window "Waiting the reboot..." that you usually get at the end of the first phase of an upgrade.

It seems that the main functionality of the firewall is maintained.

The only way I found out of this situation was to kill the machine with the power button. After the reboot everything works again: console, SSH, Web GUI...

It first happened on the first upgrade (from 24.7 to 25.1), on both firewalls (backup and primary); later only on the primary.

How can I find the userid/password the system accepts during the upgrade process, so that I can check what is behind the PID the shutdown is waiting for forever?

Regards, Walter

General Discussion / Re: os_node_exporter plugin not reading textfile collector prom files

February 06, 2025, 12:24:12 PM

Resolved.

The error was caused by trailing whitespace in the myfile.prom file.

I killed the node_exporter process and restarted it from CLI. After the starting lines, I got the error:

Quote
ts=2025-02-06T11:14:06.216Z caller=textfile.go:245 level=error collector=textfile msg="failed to collect textfile data" file=myfile.prom err="failed to parse textfile data from \"/var/tmp/node_exporter/myfile.prom\": text format parsing error in line 3: expected integer as timestamp, got \"\""

Searching the web, I found: https://github.com/prometheus/common/issues/33. In short: the node-exporter textfile parser does not tolerate trailing whitespace.
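
To avoid the same trap when the file is generated by a script, the trailing whitespace can be stripped before the file is left for the collector. A minimal sketch; the function names are hypothetical and the path is the one used in this thread:

```python
from pathlib import Path


def strip_trailing_whitespace(text: str) -> str:
    """Remove trailing whitespace from every line and end with one newline."""
    lines = [line.rstrip() for line in text.splitlines()]
    # Drop blank lines left at the end of the file.
    while lines and lines[-1] == "":
        lines.pop()
    return "\n".join(lines) + "\n" if lines else ""


def clean_prom_file(path: str) -> bool:
    """Rewrite the file in place if needed; return True when it was changed."""
    p = Path(path)
    original = p.read_text()
    cleaned = strip_trailing_whitespace(original)
    if cleaned != original:
        p.write_text(cleaned)
        return True
    return False
```

Running clean_prom_file("/var/tmp/node_exporter/myfile.prom") once after generating the file should be enough. Judging from the error text, a space after the value seems to make the parser expect a timestamp, which would explain the "expected integer as timestamp" message.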

General Discussion / SOLVED: os_node_exporter plugin not reading textfile collector prom files

February 05, 2025, 01:26:08 PM

I installed the os_node_exporter plugin and it is working: it is serving data from OPNsense to Prometheus and Grafana.

The problem (or my lack of knowledge) is that, although the flag "--collector.textfile.directory=/var/tmp/node_exporter" is present (as seen from "ps aux | grep node"), the plugin neither reads nor includes the content of the *.prom files placed in this directory:

Quote
/var/tmp/node_exporter # ls -al
total 12
drwxr-xr-t 2 nobody nobody 512 Feb 5 13:21 .
drwxrwxrwt 6 root wheel 512 Feb 5 13:05 ..
-rw-r--r-- 1 nobody nobody 521 Feb 5 12:39 myfile.prom

When I look at http://fw_IP_ADDR:9100/metrics, the rows from myfile.prom are not there; instead I find:

Quote
# HELP node_scrape_collector_duration_seconds node_exporter: Duration of a collector scrape.
# TYPE node_scrape_collector_duration_seconds gauge
...
node_scrape_collector_duration_seconds{collector="textfile"} 0.000197219
...
# HELP node_scrape_collector_success node_exporter: Whether a collector succeeded.
# TYPE node_scrape_collector_success gauge
...
node_scrape_collector_success{collector="textfile"} 1
...
# HELP node_textfile_scrape_error 1 if there was an error opening or reading a file, 0 otherwise
# TYPE node_textfile_scrape_error gauge
node_textfile_scrape_error 1

Same behaviour on 24.7 or 25.1 firewall.

So there must be some read errors.
As seen in the quote above, the file is inside the /var/tmp/node_exporter directory, it has the .prom extension, and I changed its owner to nobody:nobody, since the process runs as this user. The documentation on the plugin's GitHub page is weak on this; does anybody have some advice?
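
The check done by eye above can also be scripted, which helps when testing several versions of the .prom file. A minimal sketch that reads the metrics page and extracts node_textfile_scrape_error; the function names are hypothetical, and fw_IP_ADDR stays a placeholder for the firewall address:

```python
from typing import Optional
from urllib.request import urlopen


def fetch_metrics(url: str, timeout: float = 5.0) -> str:
    """Download the plain-text metrics page from node_exporter."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def metric_value(metrics_text: str, name: str) -> Optional[float]:
    """Return the value of the first unlabelled sample called `name`, or None."""
    for line in metrics_text.splitlines():
        line = line.strip()
        # Skip blank lines and the # HELP / # TYPE comment lines.
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) >= 2 and parts[0] == name:
            return float(parts[1])
    return None
```

For example, metric_value(fetch_metrics("http://fw_IP_ADDR:9100/metrics"), "node_textfile_scrape_error") returning 1.0 means the collector failed to read at least one file, even though node_scrape_collector_success reports 1 for the textfile collector.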

Zenarmor (Sensei) / Re: Zenarmor Scheduled Reports not Sending Emails

April 10, 2024, 12:34:27 PM

As proposed in the support documentation linked in the previous post, use the "Have Feedback" link at the bottom left of the Zenarmor Web GUI. You will have the opportunity to tick the checkbox to attach the logs (and to see their location in the filesystem).

If you want to see them yourself, you must go to the CLI (/usr/local/zenarmor/log).

Zenarmor (Sensei) / Re: Zemarmor Scheduled Reports Email Not Working

February 14, 2024, 09:49:19 AM

DISCLAIMER: what I write in the following lines is not a solution; it is a brute force workaround for just one fixed scenario (SMTP server, No security), for when you desperately need the mail report.

OPNsense 23.7.12_5
Zenarmor 1.16.2

Edit send.py and comment out these lines, around line 246:

# if password:
#     smtp.login(username, password)

This avoids bug 2 from the previous post.

Then you MUST choose (if/when possible and applicable):
Mail provider: SMTP Server
Mail server hostname: Hostname or IP of a server without authentication
Connection Security: No Security

The mail server port will change to port 25.

You have to enter some dummy data for username and password to avoid bug 1 from the previous post.

Zenarmor (Sensei) / Re: Zemarmor Scheduled Reports Email Not Working

February 14, 2024, 08:35:40 AM

I sent a feedback/bug report following the instructions: https://www.zenarmor.com/docs/support/reporting-bug

Zenarmor (Sensei) / Re: Zemarmor Scheduled Reports Email Not Working

February 13, 2024, 08:25:42 PM

Yes, the maintenance and upgrade process has some rules.

I changed the script a little (two lines) to make it work just for me in one strict scenario (No security). It is a brute force approach, and my copy will probably be overwritten soon by a Zenarmor upgrade.

#10

Zenarmor (Sensei) / Re: Zemarmor Scheduled Reports Email Not Working

February 13, 2024, 11:20:13 AM

I found the same error using the local mail server and investigated it a little:

The script cannot be run just as it is shown in the post; the command must receive arguments from the caller, and your error is caused by the missing arguments.

usage: send.py [-h] [-b PDF] [-S SERVER] [-R PROVIDER] [-P PORT] [-s SECURED] [-u USERNAME] [-p PASSWORD] [-f SENDER] [-t TO] [-v NOSSLVERIFY]

Bug no. 1: When you use it on plain SMTP with No security (without userid and password), the switches --username and --password are still in the command, without arguments.

I added echo $@ > filename to the script, which recorded:

--provider smtp-server --pdf false --server a.b.c.d --port 25 --secured true --username --password --sender i@am.here --to you@are.there --nosslverify false

producing a send.py error message:

send.py: error: argument -u/--username: expected one argument
or
send.py: error: argument -p/--password: expected one argument - if you enter a userid and an empty password

and the error in GUI: Error (200) There was an issue on our end. Sorry about that.

Bug no. 2:

If you stay with plain SMTP on port 25 and No security, and enter some userid/password data to bypass bug no. 1, the script send.sh passes valid data to send.py, which wrongly answers with:

{"successful": false, "message": "Smtp :No suitable authentication method found."}

Which authentication methods? - I am using plain SMTP on port 25, No security!

What works?

When I run the script send.sh in the CLI with all switches except --username and --password, the result of the script (send.py called by send.sh) is:

{"successful": true, "message": "Mail has been send successfully!"}

and the test mail is sent and received.

Conclusion:
the script send.py has to be rewritten with better argument parsing:
a) permitting empty --username and --password;
b) dropping them as parameters if the Connection security is "No Security". Currently the script wrongly assumes that a password present in the arguments means a login is required; plain SMTP with No security does not need a login.
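
To illustrate both points, here is a minimal sketch of more tolerant parsing with argparse. It is not the actual send.py code: only a few of the switches from the usage line are reproduced, and should_login with its no_security parameter is a hypothetical helper.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser that accepts --username/--password with or without a value."""
    parser = argparse.ArgumentParser(prog="send_sketch.py")
    # nargs="?" lets the switch appear with no value; const="" is used then.
    parser.add_argument("-u", "--username", nargs="?", const="", default="")
    parser.add_argument("-p", "--password", nargs="?", const="", default="")
    parser.add_argument("-S", "--server", required=True)
    parser.add_argument("-P", "--port", type=int, default=25)
    return parser


def should_login(username: str, password: str, no_security: bool) -> bool:
    """Authenticate only with real credentials on a secured connection."""
    return (not no_security) and bool(username) and bool(password)
```

With this parser, a command line ending in "--username --password" no longer fails with "expected one argument"; both values simply become empty strings, and should_login then decides whether smtp.login is called at all.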

I needed the report to work, so I hard-coded the script send.py a little to make it work in the simplest scenario (No security), but I still have to check it with STARTTLS.

Where is the PDF checkbox in the GUI?

#11

Zenarmor (Sensei) / Re: Local vs Remote confusion

October 18, 2023, 09:11:28 AM

https://forum.opnsense.org/index.php?topic=33270.msg160924#msg160924

Graphs were reverted, but the drill-down filters worked only if manually changed in the right (expected, not offered) way.

Since the issue was visible only in passive mode and disappeared in routed mode, I have not worked on it in recent months.

I have to check whether it is still present in the new versions/releases of Zenarmor.

#12

High availability / Re: migrating from single FW/router to HA setup

June 14, 2023, 05:06:08 PM

Quote from: tessus on June 14, 2023, 05:00:52 AM

I have no way to assign the interface IDs myself, since they are chosen automatically when creating an interface. There is no way to do that manually.

It is possible: a boring process of manually defining the interfaces on the second firewall one by one, following the order OPT1 -> OPT23. But with this number of interfaces it can be done in less than an hour.

Quote

Unless a restore keeps the same assignments, this is impossible. Otherwise a backup and restore should do the trick.

You can try it: back up the main, edit the XML (IP addresses and so on), and restore it on the backup (new) node.

Quote

Here lies the issue. I have N (about 25) VLANs. This means I have to change 2xN interfaces and create N CARP VIP entries.

You have to define a new IP address on every interface on the backup node (something that has to be done in any case), replace the IP address on every interface on the main node, and define new CARP VIPs with the previously used address on both nodes. So 4 actions per interface. Boring manual work again, but it can be done relatively fast. As long as you keep the second firewall disabled, it can be done sequentially, in phases.

But maybe it can be done, at least partially, by editing the backup config XML file and restoring it on main, and by combining part of the interface config from the main into the backup node config XML and restoring it on backup. I preferred the manual work, where everything was under control.

Quote

Then I have to change all firewall rules, because the FW now has to use the virtual interfaces, which are using new interface IDs.

No, I did not touch any rules after building the HA, neither on main nor in the synchronized rules on backup. Everything works if you keep cluster IP addresses = former FW addresses. I only had to change the OpenVPN server interface from WAN to the CARP VIP address of the WAN.

Quote

I also use OpenVPN (out) and Wireguard (in/out). I certainly would have to figure out how to make this work as well.

About OpenVPN in client access mode I will tell you later: I defined everything and the services are working, but I have to check whether failover works (in some maintenance window).

Quote
Quote from: wstemb on June 13, 2023, 09:13:51 AM
3. Defining the High Availability on main and second node, and defining all the synchronization (XMLRPC Sync) you need. This will copy the chosen definitions to the second node.

Yes, this should not be too complicated.

Thanks for the link, but I actually had read that one before I posted this topic.

Unfortunately all this is a moot conversation unless there is an answer to my first question.
I can't be the only one who has a cable modem, can I? Additionally, anyone who uses OPNsense is most likely using the modem in bridged mode, so someone should have an answer to my question.

I have the simplest routing scenario on WAN, with a standalone managed switch connecting the cluster nodes and the ISP router, using a small IP segment on the WAN side and fixed IPs for all nodes on this segment.

Can you specify better how the WAN definition is configured in bridge mode? I have no experience with cable modems. Several years ago I had to use an ISP ADSL bridge/router configured as a bridge, and I moved to a router definition ASAP...

#13

High availability / Re: migrating from single FW/router to HA setup

June 14, 2023, 10:31:42 AM

It can be done and I did it (IPv4 only); it was a smooth, straightforward few hours of manual work.

I have 6 real interfaces (including PFSYNC) and 13 VLANs on some of the real interfaces on every node. The firewall (which was to become the master) had been in production for a few months. I had to work "in place", since I did not have a third machine.

First, I made an IP address plan: 3 addresses per interface. The addresses on the "old", existing firewall have to become the VIP addresses; the other two are for the nodes.
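
Such a plan is easy to check mechanically before touching the firewalls. A minimal sketch with the standard ipaddress module; the function name and the example addresses are hypothetical:

```python
import ipaddress


def plan_addresses(old_address: str, prefix_len: int, main: str, backup: str) -> dict:
    """Return the three-address plan for one interface, checking it for mistakes."""
    network = ipaddress.ip_network(f"{old_address}/{prefix_len}", strict=False)
    # The address used so far becomes the CARP VIP; each node gets a new one.
    plan = {"vip": old_address, "main": main, "backup": backup}
    addresses = [ipaddress.ip_address(a) for a in plan.values()]
    if len(set(addresses)) != 3:
        raise ValueError("the three addresses must be different")
    for addr in addresses:
        if addr not in network:
            raise ValueError(f"{addr} is outside {network}")
    return plan
```

For an interface that used 192.168.10.1/24, plan_addresses("192.168.10.1", 24, "192.168.10.2", "192.168.10.3") keeps .1 as the VIP, so the default gateway of the hosts on that network does not change.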

I manually reconstructed the interfaces on the new firewall (a machine identical to the MASTER): first the real ones, then the VLANs, simply following the order of the OPTxx interfaces. Where there was a gap in the numbering of the OPT interfaces (just one, luckily), I defined a "placeholder", defined the next interface, and then deleted the placeholder. It was a few hours of non-intrusive work that can be done whenever you want. I also had to define manually the Virtual IP definitions of type OTHER on the new firewall, since during the test I did not see them being copied (there were just a few of them, so it was easier to define them than to solve the issue).
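
The gaps that need a temporary placeholder can be listed in advance from the interface names on the existing firewall. A minimal sketch; the function name is hypothetical and the names are assumed to look like opt1, opt2, and so on:

```python
import re


def missing_opt_numbers(opt_names: list[str]) -> list[int]:
    """Return the OPT numbers absent from the sequence 1..highest number seen."""
    numbers = set()
    for name in opt_names:
        match = re.fullmatch(r"opt(\d+)", name.lower())
        if match:
            numbers.add(int(match.group(1)))
    if not numbers:
        return []
    return [n for n in range(1, max(numbers) + 1) if n not in numbers]
```

For example, missing_opt_numbers(["opt1", "opt2", "opt4"]) reports that OPT3 is missing, so a placeholder has to be created before OPT4 is defined and deleted afterwards.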

After that, I defined the VIPs one by one, changing the Master IPv4 address to the one reserved for the node and moving the old address to the VIP. Then I synchronized the backup with the master. I had no need to change any rule or NAT definition, just the OpenVPN server interface address.

All the work was done in two evenings, in the maintenance window: on the first day the backup switch trunk and VLAN definitions, IP address planning, testing, basic functionality and the main interfaces; on the second, everything remaining. In the meantime, the Backup node was disabled. At every step I made backups of the configurations of both firewalls, to be able to step back if needed.

I am working now on two last functions: OpenVPN client access (using internal CA :-( ), and FRR.

There is probably a better way, but I had to do the work and I had deadlines. So I did it manually this way, knowing that "The Better is the Enemy of the Good".

#14

High availability / Re: migrating from single FW/router to HA setup

June 13, 2023, 09:13:51 AM

I cannot answer your first question.

As for adding another node to an already highly configured firewall without defining everything from scratch, the first part (network and firewall topology) is possible:
1. You have to build another node with an exact copy of the interfaces on the first one (exact means the exact OPTx assignment, since the OPTx definitions are used during the synchronization phase, when the rules are copied to the second node).
2. Defining a new set of IP addresses on every pair of interfaces, and defining a CARP VIP on all interfaces with the IP address previously used on the single firewall's interfaces (so you do not have to change the default gateways on the network nodes).
3. Defining the High Availability on main and second node, and defining all the synchronization (XMLRPC Sync) you need. This will copy the chosen definitions to the second node.
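
Whether the second node really has the same OPTx assignment can be checked by comparing the exported configurations of both nodes. A minimal sketch, assuming the exported config.xml has an interfaces element directly under the root with one child per interface (wan, lan, opt1, ...) that contains a descr element; the function names are hypothetical:

```python
import xml.etree.ElementTree as ET


def interface_map(config_xml: str) -> dict:
    """Map each interface key (wan, lan, opt1, ...) to its description."""
    root = ET.fromstring(config_xml)
    interfaces = root.find("interfaces")
    result = {}
    if interfaces is not None:
        for iface in interfaces:
            result[iface.tag] = (iface.findtext("descr") or "").strip()
    return result


def assignment_differences(main_xml: str, backup_xml: str) -> list[str]:
    """List interface keys whose description differs or exists on one node only."""
    main, backup = interface_map(main_xml), interface_map(backup_xml)
    return sorted(
        key for key in main.keys() | backup.keys() if main.get(key) != backup.get(key)
    )
```

An empty list from assignment_differences means every OPTx key carries the same description on both nodes. This compares descriptions only, so it relies on the same interface names being used on both firewalls.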

The guide https://www.thomas-krenn.com/en/wiki/OPNsense_HA_Cluster_configuration is enough for this phase, if you extrapolate it to a more complex situation and keep the OPTx order of the interfaces.

I am currently working on porting OpenVPN to the cluster, so I cannot add anything on that yet.

#15

23.1 Legacy Series / Re: Build a cluster on top of already highly configured FW

June 05, 2023, 03:47:29 PM

Work half done.

I installed a second firewall on identical hardware and upgraded it to the same firmware version.

I defined all the interfaces (I have a lot of them, most of them VLANs). I had to strictly follow the same order of OPTx names when defining them on the second firewall; otherwise the HA "Synchronize states" will copy the definitions to the wrong interfaces.

I defined the corresponding CARP VIPs on both firewalls for all defined interfaces.

In the first tests it seems that everything defined so far is working. But since the work is not finished and important functions still have to be redefined (the most important are the OpenVPN servers and the OSPF definition), I disabled the second firewall for now, so the cluster is running on one node only.

I had to change the OpenVPN server interface to the cluster one on WAN.
