Topics - drivera

#1
Hi!

I have 3 firewalls deployed, all fully updated (per the updater). I just reinstalled one to replace a failing hard drive, and found something interesting: even after full configuration restore and package restore, the package list still doesn't match the other two:


--- fw1 2021-06-29 18:17:06.041170626 -0600
+++ fw2 2021-06-29 18:17:09.829162384 -0600

-p5-Error   0.17029 64.0KiB OPNsense    GPLv1+, ART10   Error/exception handling in object-oriented programming style
-p5-File-Slurp  9999.27 42.2KiB OPNsense    GPLv1+, ART10   Perl module for single call read and write file routines
+p5-Error   0.17029 64.2KiB OPNsense    GPLv1+, ART10   Error/exception handling in object-oriented programming style
+p5-File-Slurp  9999.27 42.3KiB OPNsense    GPLv1+, ART10   Perl module for single call read and write file routines

-py37-openssl   20.0.1  556KiB  OPNsense    APACHE20    Python interface to the OpenSSL library
+py37-openssl   20.0.1  555KiB  OPNsense    APACHE20    Python interface to the OpenSSL library

-py37-pycodestyle   2.7.0   283KiB  OPNsense    MIT Python style guide checker
+py37-pycodestyle   2.7.0   276KiB  OPNsense    MIT Python style guide checker

-py37-six   1.16.0  90.8KiB OPNsense    MIT Python 2 and 3 compatibility utilities
+py37-six   1.16.0  90.6KiB OPNsense    MIT Python 2 and 3 compatibility utilities

-suricata   5.0.6   6.52MiB OPNsense    GPLv2   High Performance Network IDS, IPS and Security Monitoring engine
+suricata   5.0.6   6.42MiB OPNsense    GPLv2   High Performance Network IDS, IPS and Security Monitoring engine


I removed identical packages for briefer output, and added blank lines for easier reading. Notice that the version numbers do match, but the package sizes don't. This seems most odd to me. Any ideas?

Before you ask: no, I haven't installed anything manually - everything has been installed using the in-system installer tools (pkg on the command-line, or the UI firmware/package manager).
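
To narrow the comparison down to exactly this case (same version, different size), here is a minimal sketch. It assumes each firewall's package list has been exported as tab-separated lines of name, version and installed size in bytes; that export format and the function names are assumptions made for illustration, not something the updater produces by itself.

```python
def parse_listing(text):
    """Map package name -> (version, size_in_bytes).

    Assumed input: one package per line, 'name<TAB>version<TAB>bytes'.
    """
    packages = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, version, size = line.split("\t")
        packages[name] = (version, int(size))
    return packages


def size_only_differences(left, right):
    """Return (name, version, left_size, right_size) for packages present on
    both hosts whose versions match but whose installed sizes differ."""
    diffs = []
    for name in sorted(left.keys() & right.keys()):
        left_version, left_size = left[name]
        right_version, right_size = right[name]
        if left_version == right_version and left_size != right_size:
            diffs.append((name, left_version, left_size, right_size))
    return diffs
```

Comparing sizes in bytes rather than the rounded KiB/MiB figures shows how large each discrepancy really is.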

Thanks!
#2
Hi!

I'm on the verge of deploying a monitoring station using Prometheus and Grafana for my home network - just to keep track of overall health and metrics that I'm interested in seeing. As an aside to that, I'm looking to keep track of my internet circuits' (two of them) health, as measured by dpinger: RTT, RTTD, loss %, etc.

However, looking at the Prometheus Exporter, these statistics aren't being tracked. Can these metrics be tracked in any meaningful way for reporting via the Prometheus Exporter? Can y'all think of any hints about what the best way to go about that might be?
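
One possible approach, sketched under assumptions: read dpinger's report for each gateway and rewrite it in the Prometheus text exposition format, so that a textfile-style collector (if the exporter in use offers one) can pick it up. The sketch assumes the report line ends with three fields (average latency in microseconds, latency standard deviation in microseconds, loss percentage); the metric names are hypothetical.

```python
def dpinger_to_prometheus(gateway, report):
    """Convert one dpinger-style report line into Prometheus exposition lines.

    Assumed layout: the last three whitespace-separated fields are
    '<avg_latency_us> <stddev_us> <loss_percent>'. Any leading identifier
    field is ignored. The gateway name is used verbatim as a label value,
    so it must not contain quotes or backslashes.
    """
    avg_us, stddev_us, loss_pct = report.split()[-3:]
    label = '{gateway="%s"}' % gateway
    return [
        # Hypothetical metric names; latencies converted from us to seconds.
        "dpinger_rtt_seconds%s %s" % (label, int(avg_us) / 1_000_000),
        "dpinger_rtt_stddev_seconds%s %s" % (label, int(stddev_us) / 1_000_000),
        "dpinger_loss_percent%s %s" % (label, float(loss_pct)),
    ]
```

Writing the returned lines to a file on a short timer would give Grafana one series per circuit for RTT, RTTD and loss.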

Thanks!
#3
Hi!

I'm running a Protectli FW4B firewall (8GB RAM, 4-core Celeron J3160, mSATA SSD) that runs smoothly for the most part, until a high-bandwidth download is in play. At that point, and apparently inexplicably, flowd_aggregate.py starts to swallow up the CPU (>80% usage, sometimes up to 100%). I found another thread (https://www.analysisman.com/2020/10/opnsense-highcpu.html) where the person found that the fix was to disable IPv6.

My problem is that the fix he prescribes is already applied in my case: I already had IPv6 disabled to begin with.

Is there any way I can help debug why the CPU would be swallowed up in this manner? How can I help diagnose what may be happening here?

Thanks!
#4
Hi!

I've been having problems with failover for some time, and I think I've more clearly figured out the circumstances (if not the root cause). These problems have carried over to 20.1.2 which is why I'm bringing this thread back up to this forum.

I have two circuits - MAIN and BKUP - which are the only upstream circuits available, and are clearly marked as such, and properly given priority among them (MAIN has a priority of 1, BKUP has a priority of 2). All other gateways - including the ones generated from some OpenVPN clients I have configured - are not marked as upstream, and have a priority of 255 (the default value).

I've disabled any and all routing configuration or customization on those VPN links (i.e. Don't pull routes is checked, and pull-filter ignore "redirect-gateway" is added into the Advanced configuration section), instead opting for policy-based routing rules within the firewall to forward traffic as appropriate. So far so good, and everything works as intended.

So... on to the scenario...

Whenever the MAIN circuit fails, BKUP immediately takes over and the system's default gateway is selected to route over the BKUP Gateway. However, the main circuit crash also causes the OpenVPN clients' connections to die, which means they need to be brought back up. When they are, their interfaces are also taken down and brought back up (which would make sense as this is how OpenVPN works), and I suspect that this triggers a recalculation of the default gateway "somewhere, by someone" (not sure what part of what code does that yet).

This will result in either the default gateway being left blank, or the gateway being erroneously assigned to the one from one of those VPN links. This makes no sense for several reasons, the biggest one being that none of those gateways is marked as upstream, and thus should not be eligible for selection as the default gateway.

Needless to say that when the default gateway is incorrectly configured, traffic will not be forwarded properly and internet service all but grinds to a halt.

There is a fairly simple, albeit manual, solution: log into the firewall's UI, open (edit) any gateway (most commonly the BKUP circuit's gateway), and save it without making any changes. When I apply the changes, this apparently triggers the default gateway computation code and causes the correct default gateway to be selected and configured.

However: the whole point of having failover is so that the system itself can automatically switch between circuits correctly, without human intervention.

I've been struggling with this one for months.

The biggest questions I have are:

  • Why are gateways not marked as upstream being considered as candidates for selection as default gateway?
  • Why would the system prefer a blank default gateway instead of the next available upstream gateway (given the configured priorities)?
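
To make the expectation behind these two questions concrete, here is a minimal sketch of the selection I would expect. It is not OPNsense's actual code; the `Gateway` type and `pick_default` function are hypothetical names used only to state the rule: consider only gateways marked as upstream that are online, and take the one with the lowest priority value.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Gateway:
    name: str
    priority: int   # lower value = preferred (MAIN is 1, BKUP is 2)
    upstream: bool  # marked as an upstream gateway
    online: bool    # current monitoring result


def pick_default(gateways: List[Gateway]) -> Optional[str]:
    """Expected rule: best-priority gateway among online upstream gateways.

    Returns None only when no upstream gateway is online at all.
    """
    candidates = [g for g in gateways if g.upstream and g.online]
    if not candidates:
        return None
    return min(candidates, key=lambda g: g.priority).name
```

Under this rule a VPN gateway with priority 255 and no upstream flag can never be chosen, and the result is blank only when both MAIN and BKUP are down.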

So... any ideas? Also: let me know if you think this is more appropriate to be reported as an issue in GitHub.

Thanks!
#5
Hi!

I've noticed that during failover, a few minutes after the initial failover, the default gateway configuration will get cleared even though failover had successfully occurred. The result is that routing to the internet no longer works despite there being an active, healthy secondary gateway available. I'm using multiple upstream gateways with differing priorities, and except for this glitch the configuration seems to work as intended.

The only way to recover from this is to log onto the UI, edit one of the gateways (the healthy one, for instance), save it without making any changes, and click on "Apply Changes". This will trigger the code that recalculates the correct gateway and fix the configuration.

Sometimes (very often) this has to be done two or three times for it to take and for normal network functionality to be restored.  If this isn't done, the gateway configuration will remain incorrect until the primary circuit returns. Obviously this defeats the purpose of any failover configuration.

However, once the primary circuit comes back to life everything returns to normal on its own.

Maybe the issue is related to the fact that the primary circuit is still online (still has an IP and the link is still UP), but it's effectively dead because some segment downstream is dead? Thus, the circuit's configured upstream gateway is down (and correctly detected as such) even though the interface isn't dead per-se. Perhaps that's what's confusing the gateway calculation algorithm?

I've written a script I use to monitor the gateway configuration which I could easily enough turn into a monitoring daemon (of sorts) that could trigger the gateway calculation/reconfiguration code when it detects that the default gateway has been left empty.  However: I don't know how to do that from the O/S CLI. Any ideas?

Is there documentation anywhere regarding the scripts/commands that are available at the CLI level to invoke OPNSense functionality?

Perhaps that daemon would only trigger the "repair" when it detects that one of the (higher-priority) upstream gateways is both enabled and "down" (i.e. we're in a failover state) ... this way it would minimize interference with normal operation when everything is OK....?
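
The detection half of such a daemon could look roughly like the sketch below. It assumes the routing table text comes from something like `netstat -rn -f inet` and that the default route row starts with the word `default`. The repair action is deliberately left out, since the command to trigger the recalculation is exactly what is unknown here; the function names are hypothetical.

```python
def has_default_route(routing_table):
    """Return True if the routing table text contains a 'default' row.

    Assumed input: plain-text routing table, one route per line, with the
    destination in the first column and the gateway in the second.
    """
    for line in routing_table.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "default":
            return True
    return False


def should_repair(routing_table, higher_priority_gateway_down):
    """Only ask for a repair while in a failover state AND with no default route."""
    return higher_priority_gateway_down and not has_default_route(routing_table)
```

Gating on the failover state keeps the daemon from touching anything while the primary circuit is healthy.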

Thoughts?

Thanks!
#6
Hi!

I've noticed recently that my Insight graph screen never has any significant data. I tried resetting the RRD and Netflow Data, but to no avail.  I tried manually running the flowd_aggregate service, but it didn't fix anything.

I've been looking through the logs but I'm not sure what to look for. I don't even know if the Insight data is populated from the netflow data or somewhere/thing else entirely.

Importantly: I've tried doing a data reset followed by an immediate reboot. Sometimes that worked and everything started to show as expected, but it has since stopped working.

To clarify: the "Traffic" section does show real-time traffic. It's the historical stuff that is borked, and I'd like to fix.

Can you guys help me figure this out? Is there any way to fully, cleanly, atomically reset the graphing data so the engine starts gathering stuff correctly again as if from a fresh install?

Thanks!
#7
Hi!

In my multi-WAN setup, I have my gateways configured such that their monitoring IP is a well-known, "always up", pingable IP on the general internet.  This is important because occasionally the ISPs will have a link be up, but with no internet connectivity. Thus, monitoring an "internet" address helps me cover for that case and apply failover even though the link appears to be up.

However, I've also found that they have another issue wherein when there's a connectivity hiccup - usually due to a short power outage (< 1 min) - the connection will seem to be up, but connectivity won't be restored.  This seems to be an issue with the CableModem/ISP connection itself since OPNSense is correctly detecting the lack of connectivity and refuses to fail-back to the primary.

The scenario is this:


  • Short power outage (< 1 min), causing connectivity over the primary circuit to disappear
  • Failover happens correctly to the secondary circuit
  • Power returns
  • The IP address/etc is still valid on the primary interface, but connectivity is still borked (OPNSense remains in "failed" state, correctly routing over the secondary circuit)
  • I manually cause a link restart on the primary circuit by power-cycling the cable modem, and everything returns to normal (fail-back to primary, etc)

The question I have regarding all the above is this: is there a way that I could somehow attach a custom script that is executed when a gateway is marked as "DOWN"? i.e. "when this interface's gateway is marked as down, flush the DHCP lease and leave it unconfigured until it comes back up on its own"

The alternative is for me to buy a USB- or Network-controllable power strip - IOT style - and through that custom script, trigger a power cycle of the Cable Modem, which ideally results in fixing everything up.

So...Thoughts? Ideas?
#8
Hi!  I've posted about this before (https://forum.opnsense.org/index.php?topic=11497.msg52045#msg52045).  The issue is still there: on a prolonged outage for the primary circuit (Cable), every so often the firewall's default gateway will simply get nulled out (i.e. set to "nothing") even though the secondary circuit (ADSL) is up and running.

The "workaround" is to log into the UI, open the ADSL gateway's configuration, save it (no changes!!), and then click on "Apply Changes". This causes the ADSL link to be selected as the default gateway.  But then again, a few minutes later, the same thing happens again (default gateway gets de-configured), and off we go again to the workaround...

Here are some configuration tidbits:


  • There are 4 gateways in the system: Cable, ADSL (these are physical interfaces), VPN1 and VPN2 (these are "soft" interfaces, both OpenVPN)
  • I added all gateways to the same group, with Cable as tier 1, ADSL as tier 2, and the VPN gateways as tier 5
  • The VPN interfaces are configured with "Mark Gateway as Down", precisely so they won't be promoted to primary (not that it matters if both Cable & ADSL are down)
  • Both Cable and ADSL have explicit monitoring IPs set, in order to validate if the link is really up, vs just the interface is up (frequent case when Cable goes out is that the interface remains in the UP state, even though the actual link is down)
  • All gateways are set for DHCP on IPv4
  • NONE of the gateways is configured with "Disable Gateway Monitoring" as this will (erroneously, if you ask me) override "Mark Gateway as Down" and cause the gateway to be marked as UP even if you don't want it to

Basically, I have everything configured like the "textbooks" say I should have it, and yet I can't get it to work the way (I think) it should.  The problem seems to be with dpinger (or related processes), since if I change the VPN gateways to "Disable Gateway Monitoring" (i.e. assume they're always UP), then for some inexplicable reason they will be preferred ahead of the ADSL link as gateway, even though the ADSL link is in a higher tier within the same gateway group...!!!

Can someone please help me figure this out?

Thanks!
#9
Hi!

I'm experiencing many sudden and unexplained connection interruptions.  Suddenly, and without explanation, the firewall will even reset an SSH connection I have running on it displaying traffic capture over a LAN link!!!

I'm not sure where to begin to offer up log information.  It feels as though the firewall is resetting states periodically. Is there any logging feature that enters a message in the logs when firewall states are reset (and how do I enable it)? Maybe that can be the first step to figuring this out...

Thanks!
#10
18.7 Legacy Series / Firewall failover not working
February 06, 2019, 03:07:52 PM
Hi, all!

I'm experiencing some issues on the failover.  I had a power outage last night and one of my ISPs (the Cable provider) seems to be down due to some line damage, and they'll be offline for a few hours. This happened in the wee hours of the morning, while I was asleep.

When I woke up, I found that I had no internet service. This means that failover had once again been unsuccessful.

After some poking around, I've discovered that even though I have a failover gateway group set up for both my ISPs (Cable and ADSL), and the group is (apparently) configured properly (Cable is Tier 1, ADSL is Tier 2, Trigger Level is "Member Down"), the failover algorithm will not work as expected.

This is the behavior that I would expect:

* When the Cable link is up, the Cable link is promoted to default gateway
* When the Cable link is down, the ADSL link is promoted to default gateway
* When the Cable link comes back up, the Cable link is promoted to default gateway irrespective of the ADSL link's status

This is the behavior I'm seeing:

* When the Cable link is up, the Cable link is promoted to default gateway
* When the Cable link is down, the ADSL link is promoted to default gateway
* When the Cable link comes back up, the ADSL link remains as default gateway unless I explicitly mark the Cable gateway as the default gateway
* However, when I mark the Cable gateway as the default gateway, and a prolonged outage occurs, it seems that fail-back simply won't happen at all and it will remain as the default gateway regardless of up/down status

I've configured each gateway (Cable + ADSL) to have a Monitor IP setting for an address on the far side of the link, so it can be used to determine if the link really is up vs. appearing to be up. I've noticed that even though there's an outage, the Cable link's status shows as "pending" vs. "down". Perhaps this is the issue? Perhaps the algorithm is assuming that the link is up because it's not explicitly marked as "down"?

Thoughts?

If that's the case, then definitely the algorithm should compare the link's current status (UP, UP+Latency, UP+Packet Loss, UP+Latency+Packet Loss, Pending, Down) vs. the "Trigger Level" condition set up in the gateway group, taking into account that until a link is in UP or UP+* state, it should be considered to be down? (i.e. Down + Pending should be equivalent, I think)...
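
The rule proposed above can be sketched as follows. This is only an illustration of the suggestion, not the existing algorithm: it models just the "Member Down" trigger level, and the status strings and function name are hypothetical. Anything that is not UP or UP+something (so both Pending and Down) is treated as down.

```python
# Statuses that count as usable under the "Member Down" trigger level.
# Pending is intentionally absent, so it is handled exactly like Down.
USABLE_STATUSES = {"up", "up+latency", "up+loss", "up+latency+loss"}


def pick_from_group(members):
    """Pick the gateway from the lowest usable tier.

    members: list of (name, tier, status) tuples, lower tier = preferred.
    Returns the chosen gateway name, or None if no member is usable.
    """
    usable = [(tier, name) for name, tier, status in members
              if status in USABLE_STATUSES]
    if not usable:
        return None
    return min(usable)[1]
```

With Cable at tier 1 in the "pending" state and ADSL at tier 2 in the "up" state, this picks ADSL, and it picks Cable again as soon as Cable reports any UP state.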

Thanks!
#11
18.7 Legacy Series / HOWTO: Outgoing port translation
December 17, 2018, 10:32:58 PM
Hi!

I'm trying to do something that on the surface appears simple enough, but I'm having a hell of a time configuring. I have a VPN through which I want to route all traffic that the firewall detects going to a specific port number (say...8888).  Simple enough, and done easily enough.

The trick is that I also want that outgoing traffic's destination port number to change from "8888" to "7777", without touching the destination address... and this is where I'm having a hell of a time.

The Outbound rules UI allows me to change the source port, the source address, the destination address... but not the destination port.

I also can't use a port forward since I don't know the request's possible destination beforehand and the port forwarding UI (at least) requires a destination IP to forward the traffic to...

I refuse to believe this "simple" (right?) functionality is impossible in OPNSense, and instead I prefer to think that the issue is that I'm uneducated/incompetent in OPNSense management and configuration, since that seems to me the more likely scenario. :D

So.... any ideas?

Thanks!
#12
Hi!

The description for what "Hybrid outbound NAT rule generation" does is as follows: Automatically generated rules are applied after manual rules

However, I added some manual rules that I've confirmed aren't being added accordingly.  Adding and removing the rules has no effect: using pfctl -sa produces the same NAT rule output each time.

I don't want to switch to fully manual rule generation if I can avoid it, so I can leverage the system's automatic rules.

Is this a known issue? Perhaps there's a misconfiguration somewhere else tripping me up?

Thoughts?

Thanks!
#13
Hi, all!

For my setup, I have several OpenVPN links going to-and-fro.  I managed to get everything working in a pretty clean manner, but have found one inconsistency that I thought could bear some discussion.

Out of preference, and b/c the tunnels I have are pretty much constantly up, I decided to assign a static interface to each of the tunnels so their access rules would be easier to manage.  That worked as expected, minus a speedbump when I realized that the OpenVPN tunnels had to be bounced after all the configuration was done. Minor setback, but easily resolved. Moving on...

What I discovered is that the rules for "OpenVPN" are evaluated *before* the rules for each of the individual tunnels. Intuitively, I would have thought that the most specific rules groups are always evaluated first, but this doesn't appear to be the case here.

Is this by design? Is this a defect that needs correcting?

I discovered this when I added what I hoped to be a "catch-all" REJECT rule, for easier debugging, and found that it was the cause of the traffic not flowing. As soon as the rule was disabled (later removed), everything worked as it should.

So... thoughts?
#14
18.7 Legacy Series / OpenVPN DNS data not being sent over
November 01, 2018, 04:11:20 AM
Hi!

I've configured an OpenVPN endpoint to be able to VPN into my home, but have hit a snag: it appears the OpenVPN configuration doesn't send the DNS server IPs that one configures over the wire for the client to consume. The domain setting has the same issue.

I had to manually add the push rules for dhcp-option DNS and dhcp-option DOMAIN to get it working.
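
For reference, the lines I added by hand have the shape produced by this small sketch. The function name is hypothetical, and the addresses and domain passed to it are whatever the server should hand out; only the `push "dhcp-option ..."` syntax itself comes from OpenVPN.

```python
def push_directives(dns_servers, domain=None):
    """Build the OpenVPN server 'push' lines for DNS servers and a domain."""
    lines = ['push "dhcp-option DNS %s"' % ip for ip in dns_servers]
    if domain:
        lines.append('push "dhcp-option DOMAIN %s"' % domain)
    return lines
```

Each returned line goes into the server's advanced/custom options, one per line.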

Looking through /var/etc/openvpn I can see that the server configuration doesn't include those directives (unless I add them manually, of course).

I'll have a look at the plugin code to see if something jumps out at me as wholly amiss - maybe this can be my first contribution? :D

Cheers!
#15
Hi!

I have a Multi-WAN setup, which after some toil (mostly due to my newbness :D ) appears to be (mostly) working the way I want it to (thanks to mimugmail for helping me out!). However, there's one thing not working right now that I can't see my way past.

The WANs are set up for a failover scenario: if the primary fails, the secondary takes over. This works well enough. The problem is that while everything is up (i.e. primary is up), I'm unable to ping the secondary interface from a remote location. Pinging the primary works just fine.  When the primary is down and the secondary is up, then I can ping the secondary (now primary due to the failover) just fine from that same location.

The issue, I believe, has to do with default gateways. I can only set up one default gateway. I had to enable gateway switching to get around other problems (discussed here, there's some more fun hijinks on that topic but I digress).


Using the Packet Capture utility I can see that the traffic does arrive fine at the firewall on the secondary while the primary is up.  The problem is that a response is never sent out. This is because the primary had to be set as the default gateway (see the above link) for gateway switching to work, so the O/S (apparently) doesn't know to give those packets special treatment and bounce them right back out the network interface they came in on.


I know OPNSense isn't Linux, but the way to solve this in Linuxland would be to have a routing rule (using ip route) specifying that packets originating from a given interface's address are to be routed using a special routing table (built for that interface) where the default gateway is that interface's.


I have no clue how to do that on OPNSense-land (*BSD-land)...


Can you guys help me out?
#16
Hi!

I found another thread asking this same question, and this appears to be an unresolved topic. However, it's worth asking again.

I have some special routing needs which necessitate the definition of gateways to handle the traffic. My problem is that the firewall won't select the correct gateway to the internet, and so it'll occasionally become isolated and won't be able to check for updates (for instance).

The rest of the traffic works and flows just fine, though.

There's only a checkbox to select a default gateway, but my problem is I have a multi-wan setup.

Is there a way to configure OPNSense and tell it, specifically, which gateways lead to the internet so that it can rotate through them in the event of circuit failure (i.e. failover)? Alternatively, is there a means to set a gateway group as the default gateway?

Thanks!
#17
Hi!

I have a dual-WAN setup which I've successfully configured OPNSense to handle.  I can pull the plug on the primary WAN and the secondary will kick in.... eventually.

And therein lies the rub: for some reason, OPNSense takes its sweet time to fail over even when it detects that the primary WAN is LINK_DOWN (i.e. cable disconnect).

So my first question is: how do I make OPNSense react more quickly ("decisively"?) to a failover event?

To add insult to injury, the primary WAN is a Cable (DOCSIS 3) ISP which, during bootup, supplies two IPs to the client machine: a "private" (RFC-1918) one and then, subsequently, the final public IP.  My problem is, of course, that OPNSense detects the first IP and thus assumes that the circuit should be brought back online (after all, it's LINK_UP and has an IP... so who can blame it?) when in reality it should wait until the public IP has been assigned.

I don't know if/how OPNSense would be configured for that...
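
The check I have in mind is simple enough to sketch. This is a minimal illustration using Python's standard `ipaddress` module, not an existing OPNSense setting; `wan_ready` is a hypothetical name for "the circuit may be brought back online".

```python
import ipaddress

# The three RFC 1918 private IPv4 ranges.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(address):
    """Return True if the address falls inside an RFC 1918 range."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in RFC1918_NETWORKS)


def wan_ready(link_up, address):
    """Treat the WAN as usable only once it is up and holds a non-private IP."""
    return link_up and address is not None and not is_rfc1918(address)
```

During modem bootup the first (private) lease would fail this check, so fail-back would wait for the public address.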

So.... help?
#18
Hi!

I found a GitHub pull request (https://github.com/opnsense/plugins/pull/458) which suggests that automatic, periodic configuration backup via SCP was implemented and merged into master back in January, but I've found nothing relating to it within the current release's UI (18.7.6).

Also, the plugin itself (scp-backup) seems to have disappeared, further suggesting that this got added to core as the above link suggests.

However, I can't find the configuration page anywhere.  Did this get scrapped? Did it get pushed out to the 19.X release tree?

This is important functionality for those of us who need to keep backups locally (due to company policies, for instance). I'd settle for a backup via a network mount (NFS? Samba isn't ideal...but better than nothing)...

I can code this manually, but I'd just as soon have a UI to administer it, for consistency's sake (also, it's easier to backup a single config XML that has everything, including its "self-backup" configuration).
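
The manual version would be something like the sketch below, run periodically from cron. It assumes the configuration lives at `/conf/config.xml` and that key-based SSH access to the backup host is already set up; the function name and the remote target format are hypothetical.

```python
from datetime import datetime


def build_scp_command(remote, now, config_path="/conf/config.xml"):
    """Build the argv for copying the config to a timestamped remote file.

    remote: scp target directory such as 'user@host:/path' (assumed form).
    now:    datetime used for the timestamp in the remote file name.
    """
    stamp = now.strftime("%Y%m%d-%H%M%S")
    target = "%s/config-%s.xml" % (remote.rstrip("/"), stamp)
    return ["scp", "-q", config_path, target]
```

The returned list can be passed to `subprocess.run(cmd, check=True)`; timestamped names keep older backups instead of overwriting them.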

Thanks!