Messages - drivera

#1
Hi!

I have 3 firewalls deployed, all fully updated (per the updater). I just reinstalled one to replace a failing hard drive, and found something interesting: even after full configuration restore and package restore, the package list still doesn't match the other two:


--- fw1 2021-06-29 18:17:06.041170626 -0600
+++ fw2 2021-06-29 18:17:09.829162384 -0600

-p5-Error   0.17029 64.0KiB OPNsense    GPLv1+, ART10   Error/exception handling in object-oriented programming style
-p5-File-Slurp  9999.27 42.2KiB OPNsense    GPLv1+, ART10   Perl module for single call read and write file routines
+p5-Error   0.17029 64.2KiB OPNsense    GPLv1+, ART10   Error/exception handling in object-oriented programming style
+p5-File-Slurp  9999.27 42.3KiB OPNsense    GPLv1+, ART10   Perl module for single call read and write file routines

-py37-openssl   20.0.1  556KiB  OPNsense    APACHE20    Python interface to the OpenSSL library
+py37-openssl   20.0.1  555KiB  OPNsense    APACHE20    Python interface to the OpenSSL library

-py37-pycodestyle   2.7.0   283KiB  OPNsense    MIT Python style guide checker
+py37-pycodestyle   2.7.0   276KiB  OPNsense    MIT Python style guide checker

-py37-six   1.16.0  90.8KiB OPNsense    MIT Python 2 and 3 compatibility utilities
+py37-six   1.16.0  90.6KiB OPNsense    MIT Python 2 and 3 compatibility utilities

-suricata   5.0.6   6.52MiB OPNsense    GPLv2   High Performance Network IDS, IPS and Security Monitoring engine
+suricata   5.0.6   6.42MiB OPNsense    GPLv2   High Performance Network IDS, IPS and Security Monitoring engine


I removed identical packages for briefer output, and added blank lines for easier reading. Notice that the version numbers do match, but the package sizes don't. This seems most odd to me. Any ideas?

Before you ask: no, I haven't installed anything manually - everything has been installed using the in-system installer tools (pkg on the command-line, or the UI firmware/package manager).

Thanks!
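
For anyone wanting to repeat the comparison, here is a minimal sketch of how the two listings could be compared automatically. It assumes each line is whitespace-separated with name, version and size as the first three fields (as in the listing above); the function names are made up for illustration.

```python
def parse_pkg_lines(text):
    """Map package name -> (version, size) from a package listing.

    Assumes name, version and size are the first three
    whitespace-separated fields of each line.
    """
    pkgs = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        name, version, size = parts[0], parts[1], parts[2]
        pkgs[name] = (version, size)
    return pkgs


def size_only_diffs(a, b):
    """Names present in both listings with equal version but different size."""
    return sorted(
        name
        for name in a.keys() & b.keys()
        if a[name][0] == b[name][0] and a[name][1] != b[name][1]
    )


def version_diffs(a, b):
    """Names present in both listings whose versions differ."""
    return sorted(name for name in a.keys() & b.keys() if a[name][0] != b[name][0])
```

Feeding it the saved listings from two firewalls separates "same version, different size" from genuine version mismatches.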
#2
I take that back - those metrics are indeed tracked for health reporting, but now I'd have to mine them for the Prometheus exporter ... at least I now know where to look!

As you were :D
#3
Hi!

I'm on the verge of deploying a monitoring station using Prometheus and Grafana for my home network - just to keep track of overall health and metrics that I'm interested in seeing. As an aside to that, I'm looking to keep track of my internet circuits' (two of them) health, as measured by dpinger: RTT, RTTD, loss %, etc.

However, looking at the Prometheus Exporter, these statistics aren't being tracked. Can these metrics be tracked in any meaningful way for reporting via the Prometheus Exporter? Can y'all think of any hints about what the best way to go about that might be?

Thanks!
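
One possible approach, sketched under assumptions: take a dpinger report line and turn it into Prometheus text-format samples. This assumes the report line ends with three numeric fields (average latency in microseconds, latency standard deviation in microseconds, loss percentage); check that against your dpinger version. The metric names and the function name are hypothetical, not something the exporter provides.

```python
def dpinger_to_prometheus(gateway, report_line):
    """Convert one dpinger report line to Prometheus text-format samples.

    ASSUMPTION: the last three fields of report_line are
    <avg latency us> <latency stddev us> <loss percent>.
    Metric names below are made up for illustration.
    """
    fields = report_line.split()
    if len(fields) < 3:
        raise ValueError("unexpected dpinger report: %r" % report_line)
    rtt_us, rttd_us, loss_pct = (float(x) for x in fields[-3:])
    label = '{gateway="%s"}' % gateway
    return "\n".join([
        "gateway_rtt_seconds%s %.6f" % (label, rtt_us / 1e6),
        "gateway_rttd_seconds%s %.6f" % (label, rttd_us / 1e6),
        "gateway_loss_percent%s %.1f" % (label, loss_pct),
    ]) + "\n"
```

If the exporter in use supports a textfile collector, the returned text could be written to a file in its directory by a periodic job. Otherwise it would need to be served some other way.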
#4
Yeah, I have the sources for OPNsense and I know how to find where it's referenced. However, I don't know under which circumstances it's launched - and this is what I'm looking for.

That said, I enabled PowerD to HiAdaptive and it seems to be eating less CPU now ... so there's that :D

Also, the system seems to have a swapfile that I wasn't aware was there ... I might remove it later on (not that it's generating pressure ... I just generally dislike swapfiles unless I'm 100% certain I need one ... box has 8GB RAM which should be more than enough).

Cheers!
#5
I'll try #1 and #3, though the top output doesn't show me the parent process - just the process itself. I'll figure something out here...

I thought you might suggest #2 and I already tried it, to no avail ... still gets launched. This sounds like a bug that needs reporting as it shouldn't run once disabled, but still does.

Thanks!
#6
Thanks!!  It's off in my system so that's not it.

I also turned off the RRD graphing backend and the Prometheus Exporter for good measure, but flowd_aggregate.py still runs and consumes a fair chunk of CPU ... not as much as before, but enough that I'm annoyed it's consuming any CPU at all...

How can I identify what's launching it and why?
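
One way to approach this is to walk the parent chain of the process. Below is a rough sketch that parses the text output of `ps -axo pid,ppid,command` (saved to a string) and prints the ancestry of a given PID. The helper names are made up for illustration, and it only shows who the parent is at the moment of the snapshot, not why it was started.

```python
def parse_ps(text):
    """Parse `ps -axo pid,ppid,command` output into {pid: (ppid, command)}."""
    procs = {}
    for line in text.splitlines()[1:]:  # first line is the column header
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue  # skip blank or truncated lines
        pid, ppid, command = parts
        procs[int(pid)] = (int(ppid), command)
    return procs


def ancestry(procs, pid):
    """Return [(pid, command), ...] from the given pid up to the top-most known parent."""
    chain = []
    seen = set()
    while pid in procs and pid not in seen:  # 'seen' guards against loops
        seen.add(pid)
        ppid, command = procs[pid]
        chain.append((pid, command))
        pid = ppid
    return chain


def find_pids(procs, needle):
    """PIDs whose command line contains the given substring."""
    return sorted(pid for pid, (_, cmd) in procs.items() if needle in cmd)
```

Typical use would be: capture the `ps` output while the process is running, call `find_pids(procs, "flowd_aggregate.py")`, then `ancestry(procs, pid)` for each hit.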
#7
So how does one...

a) check that PowerD is running?
b) set the performance profile?
#8
I have an mSATA SSD as well, plenty of RAM, and I still see fairly consistent, sustained (though relatively short-lived) CPU usage.  The problem seems to be that the job completes just barely fast enough to keep the load from being fully sustained, b/c each run takes just about as long as the interval between invocations.

Question: if I turn off netflow, can I still get the same metrics from the Prometheus plugin?  This is an option for me - to monitor bandwidth usage externally...
#9
Hi!

I'm running a Protectli FW4B firewall (8GB RAM, 4-core Celeron J3160, mSATA SSD) that runs smoothly for the most part, until a high-bandwidth download is in play. At that point, and apparently inexplicably, flowd_aggregate.py starts to swallow up the CPU (>80% usage, sometimes up to 100%). I found another thread (https://www.analysisman.com/2020/10/opnsense-highcpu.html) where the person found that the fix was to disable IPv6.

My problem is that the fix he prescribes is already applied in my case: I already had IPv6 disabled to begin with.

Is there any way I can help debug why the CPU would be swallowed up in this manner? How can I help diagnose what may be happening here?

Thanks!
#10
Trust me: I tried everything and removing gateway groups was the only solution the devs offered because BSD doesn't support routeback rules for gateway groups.

Sticky connections only work for outgoing traffic. I need routeback to work for incoming traffic.
#11
I used to use Gateway groups, but moved away from them as there is no group-level functionality for backrouting (i.e. respond to packets on the interface they're received on).

This is why I'm doing things the way I am. That bit was already addressed by earlier support tickets.

Thanks!
#12
It seems the fix from this ticket did the trick: https://github.com/opnsense/core/issues/3961

Gateways are now assigned always, and I've worked around the problem of "ineligible" gateways being selected for default gateway duty by marking them as down (i.e. disable monitoring). This isn't ideal, but does the trick.

Cheers!
#13
Right...the different priorities are because I want one service to be preferred over the other. The MAIN circuit is 200/10 while the BKUP circuit is 10/2 (i.e. only for emergencies). Thus, what I want is for the MAIN circuit to be used for internet access whenever it's available, and to only fall back to BKUP if there's no other choice (i.e. some access is better than none).

I've come up with scripts that can monitor the default gateway configuration and I could definitely add code to trigger the gateway (re-)calculation code...if I only knew how to do that (documentation is scarce on this).

I'll keep poking around to see if I can make it work... I'm sure the problem has to do with how routes are recalculated when interfaces are added/removed (i.e. when OpenVPN clients go up/down).

Cheers...
#14
Hi!

I've been having problems with failover for some time, and I think I've more clearly figured out the circumstances (if not the root cause). These problems have carried over to 20.1.2 which is why I'm bringing this thread back up to this forum.

I have two circuits - MAIN and BKUP - which are the only upstream circuits available, and are clearly marked as such, and properly given priority among them (MAIN has a priority of 1, BKUP has a priority of 2). All other gateways - including the ones generated from some OpenVPN clients I have configured - are not marked as upstream, and have a priority of 255 (the default value).

I've disabled any and all routing configuration or customization on those VPN links (i.e. Don't pull routes is checked, and pull-filter ignore "redirect-gateway" is added into the Advanced configuration section), instead opting for policy-based routing rules within the firewall to forward traffic as appropriate. So far so good, and everything works as intended.

So... on to the scenario...

Whenever the MAIN circuit fails, BKUP immediately takes over and the system's default gateway is selected to route over the BKUP Gateway. However, the main circuit crash also causes the OpenVPN clients' connections to die, which means they need to be brought back up. When they are, their interfaces are also taken down and brought back up (which would make sense as this is how OpenVPN works), and I suspect that this triggers a recalculation of the default gateway "somewhere, by someone" (not sure what part of what code does that yet).

This will result in either the default gateway being left blank, or the gateway being erroneously assigned to the one from one of those VPN links. This makes no sense for several reasons, the biggest one being that none of those gateways is marked as upstream, and thus should not be eligible for selection as the default gateway.

Needless to say that when the default gateway is incorrectly configured, traffic will not be forwarded properly and internet service all but grinds to a halt.

There is a fairly simple - albeit manual - solution: log into the firewall's UI, open (edit) any gateway (most commonly the BKUP circuit's gateway), and save it without making any changes. When I apply the changes, this apparently triggers the default gateway computation code and causes the correct default gateway to be selected and configured.

However: the whole point of having failover is so that the system itself can automatically switch between circuits correctly, without human intervention.

I've been struggling with this one for months.

The biggest questions I have are:

  • Why are gateways not marked as upstream being considered as candidates for selection as default gateway?
  • Why would the system prefer a blank default gateway instead of the next available upstream gateway (given the configured priorities)?

So... any ideas? Also: let me know if you think this is more appropriate to be reported as an issue in GitHub.

Thanks!
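
To make the expectation concrete, here is a small sketch of the selection behavior described above. This is NOT OPNsense's actual code, only the rule the post expects: consider only gateways that are marked upstream and currently up, then pick the lowest priority number. The `Gateway` class and `pick_default` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Gateway:
    name: str
    priority: int   # lower number = preferred (MAIN=1, BKUP=2, default=255)
    upstream: bool  # marked as upstream in the gateway settings
    up: bool        # current monitor status


def pick_default(gateways: List[Gateway]) -> Optional[Gateway]:
    """Expected rule: lowest-priority gateway among those that are upstream and up."""
    candidates = [g for g in gateways if g.upstream and g.up]
    if not candidates:
        return None  # nothing eligible; only then would a blank default make sense
    return min(candidates, key=lambda g: g.priority)
```

Under this rule, with MAIN down and BKUP up, the result is always BKUP, no matter how many VPN gateways with priority 255 come and go.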
#15
I had another outage today and I noticed that the MAIN interface's DHCP-provided (by the ISP) address wasn't getting reloaded/reapplied automatically upon recovery. I had to manually go in and click "reload". I realize this may be an ISP issue, though.  Still, maybe it's all related?

Cheers...