Messages - bb-mitch

#1
20.7 Legacy Series / understanding packet graph...
January 14, 2021, 11:28:38 PM
Please see the attached image, from Reporting -> Health -> Packets LAN.

I'm trying to track down a problem, but I'm not sure what the problem is. IPv6 is disabled on the interface, and yet inpass6 shows 140m, which I presume means 140 million pps? How is that possible?

On the same graph, LAN inblock ranges from 3.0 to over 350m - but with inpass6 showing values at all, I'm worried I can't trust those numbers.
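One possible reading, offered as an assumption rather than a confirmed answer: if the graph labels use SI prefixes (as RRDtool-style graphs commonly do), a lowercase "m" means milli, so 140m would be 0.14 packets per second and not 140 million. The helper name `parse_si` below is hypothetical and only illustrates the arithmetic.

```python
# Hypothetical helper: interpret a graph label such as "140m" using SI prefixes.
# Assumption: lowercase "m" is milli (1e-3) and uppercase "M" is mega (1e6).
SI_PREFIXES = {"G": 1e9, "M": 1e6, "k": 1e3, "m": 1e-3, "u": 1e-6}


def parse_si(label):
    """Convert a label like '140m' or '3.0' into a plain float."""
    label = label.strip()
    if label and label[-1] in SI_PREFIXES:
        return float(label[:-1]) * SI_PREFIXES[label[-1]]
    return float(label)


for text in ("140m", "350m", "3.0", "140M"):
    print(text, "->", parse_si(text))
```

Read this way, a small non-zero inpass6 value would only mean an occasional IPv6 packet was counted, which is far less alarming than 140 million pps.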

Any ideas what's happening?

Thank you!
#2
20.7 Legacy Series / Re: opnsense / pfctl bug?
January 14, 2021, 12:46:23 AM
Here's something I noticed...
Running pfctl -s labels shows there are stats.

pfctl -s labels
02f4bab031b57d1e30553ce08e0ec131 7366693 7044318 508222235 7044318 508222235 0 0 0
02f4bab031b57d1e30553ce08e0ec131 7366487 18 1264 18 1264 0 0 0
1d245529367b2e34eeaff16086aeafe9 144 0 0 0 0 0 0 0
1d245529367b2e34eeaff16086aeafe9 3 0 0 0 0 0 0 0


When I cat /tmp/rules.debug I see labels are automatically applied...
port {xxxxx} label "4d91ecae57340e4d7495b86fead00729" # : Allow TO 8A

Do these change with each edit? Regardless it will still be possible to do:
pfctl -k label -k 4d91ecae57340e4d7495b86fead00729
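A minimal sketch of how the label output above could be summarized before deciding what to kill. The column names are an assumption taken from the FreeBSD pfctl(8) description of `-s labels` (evaluations, packets, bytes, packets in, bytes in, packets out, bytes out, state creations); the function names are hypothetical, and nothing here actually runs pfctl.

```python
from collections import defaultdict

# Assumed column order for "pfctl -s labels" (per FreeBSD pfctl(8)); verify on your system.
FIELDS = ("evaluations", "packets", "bytes", "packets_in", "bytes_in",
          "packets_out", "bytes_out", "states")


def sum_label_stats(output):
    """Add up the counters of all rules that share the same label."""
    totals = defaultdict(lambda: dict.fromkeys(FIELDS, 0))
    for line in output.splitlines():
        parts = line.split()
        # Skip anything that is not "label + eight numeric counters".
        if len(parts) != len(FIELDS) + 1:
            continue
        if not all(value.isdigit() for value in parts[1:]):
            continue
        for name, value in zip(FIELDS, parts[1:]):
            totals[parts[0]][name] += int(value)
    return dict(totals)


def kill_by_label_command(label):
    """Build (but do not run) the kill-by-label command as an argument list."""
    return ["pfctl", "-k", "label", "-k", label]


SAMPLE = """\
02f4bab031b57d1e30553ce08e0ec131 7366693 7044318 508222235 7044318 508222235 0 0 0
02f4bab031b57d1e30553ce08e0ec131 7366487 18 1264 18 1264 0 0 0
1d245529367b2e34eeaff16086aeafe9 144 0 0 0 0 0 0 0
1d245529367b2e34eeaff16086aeafe9 3 0 0 0 0 0 0 0
"""

stats = sum_label_stats(SAMPLE)
for label, counters in stats.items():
    print(label, counters["packets"], counters["bytes"])
print(" ".join(kill_by_label_command("4d91ecae57340e4d7495b86fead00729")))
```

Returning the command as a list keeps it ready for `subprocess.run` without shell quoting issues, if you decide to execute it.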

That's pretty cool.

#3
20.7 Legacy Series / Re: opnsense / pfctl bug?
January 13, 2021, 10:10:06 PM
Thanks @chemlud - in our case we can't rely on killing all states. Consider today: certain traffic from certain clients needs to be flushed to enable reconnection (switching to an alternate proxy on our side). Relying on killall for this means we'd have to work outside their hours or be unable to manage this, so it's pretty critical for us to fix. That said, I think pfctl itself WORKS - the call from opnsense / pfsense to pfctl seems to be the issue.

Will continue to update my notes on both forums if I can, but I think this man page is relevant.
https://www.freebsd.org/cgi/man.cgi?query=pfctl&sektion=8


     To kill a state with ID 4823e84500000018 created from a backup
     firewall with hostid 00000002 use:

   # pfctl -k id -k 4823e84500000018/2


In the case of opnsense the creator id always seems to be set, and changes as states are updated / replaced.

One other interesting option would be if we were able to kill states by label. To do that, we need to "label" the rule. Is that possible? Then we could kill states as the man page describes:


It is also possible to kill states by rule label or state ID.  In
     this mode the first -k argument is used to specify the type of
     the second argument.  The following command would kill all states
     that have been created from rules carrying the label "foobar":

   # pfctl -k label -k foobar


Any ideas / knowledge appreciated. Even if the bug can't be fixed, that would help us (and others) work around the issue. Thanks everyone!

Mitch


#4
20.7 Legacy Series / opnsense / pfctl bug?
January 13, 2021, 08:59:54 PM
In the "olden days" clicking the X next to a state in opnsense / pfsense worked. The state was gone - of course if the internal host continues to send traffic, a new state will be created (on a different NAT port), which will fail to reach the end host. That's ok... but at least one could kill those states.

Looking for a solution I found a related issue on pfsense... https://forum.netgate.com/topic/107208/pfctl-k-id-not-working/7

Basically what it comes down to is that the states panel doesn't seem to kill states like it should. It USED to work.

We used to use this function regularly to search out and kill states for a particular client to effect changes like a new NAT target, but it hasn't worked for a long while. Only large-scale kills (like filtering on an IP and clicking the KILL button) have worked, which doesn't help us in this case: we might want to drop a SIP registration without dropping a call, so we only want to kill a single mapping or the mappings in a related group.

The pfsense thread seems to identify the issue - although he was using pfctl directly.

In short:
pfctl -s state -vv

produces a list of states like:
all udp 10.x.x.x:ppppp (66.x.x.x:PPPPP) <- 216.x.x.x:RRRRR       NO_TRAFFIC:SINGLE
   age 00:00:04, expires in 00:00:56, 1:0 pkts, 32:0 bytes, rule 104
   id: 010000005cb3317b creatorid: 9171c710

It's these last two numbers that are key. I believe the docs on pfctl make it look like you can kill a state like this:
pfctl -k id -k 010000005cb3317b

But in reality it requires both the id and the creator:
pfctl -k id -k 010000005cb3317b/9171c710
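As a rough sketch of the workaround, the id and creatorid can be pulled out of the `pfctl -s state -vv` output and turned into kill commands of the id/creatorid form shown above. The function name `kill_commands` is hypothetical, the parser assumes the three-line layout pasted above, and it only handles IPv4 addresses written as address:port. It builds the commands without running them.

```python
import re

# Matches the "id: ... creatorid: ..." line of a verbose state entry.
ID_LINE = re.compile(r"id:\s*([0-9a-fA-F]+)\s+creatorid:\s*([0-9a-fA-F]+)")


def kill_commands(state_dump, ip):
    """Return one 'pfctl -k id -k <id>/<creatorid>' command per state that mentions ip."""
    # Only match a whole address followed by ":port", so 10.0.0.1 does not match 10.0.0.10.
    wanted = re.compile(r"(?<![0-9.])" + re.escape(ip) + r":")
    commands = []
    in_wanted_state = False
    for line in state_dump.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # An unindented line starts a new state entry.
            in_wanted_state = bool(wanted.search(line))
            continue
        match = ID_LINE.search(line)
        if match and in_wanted_state:
            state_id, creator_id = match.groups()
            commands.append(["pfctl", "-k", "id", "-k", f"{state_id}/{creator_id}"])
    return commands


SAMPLE = "\n".join([
    "all udp 10.x.x.x:ppppp (66.x.x.x:PPPPP) <- 216.x.x.x:RRRRR       NO_TRAFFIC:SINGLE",
    "   age 00:00:04, expires in 00:00:56, 1:0 pkts, 32:0 bytes, rule 104",
    "   id: 010000005cb3317b creatorid: 9171c710",
])

for command in kill_commands(SAMPLE, "216.x.x.x"):
    print(" ".join(command))
```

This would let us kill a single mapping (or a related group) for one client while leaving its other states alone.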

I think this is likely a bug in both pfsense and opnsense, but people who need it have just been working around it.

Does what I'm suggesting make sense?
#5
I was trying to test out something I've tried before... hoping it had changed. And locked myself out again  ::) :P

I'm wondering if there's a reason things are the way they are, or if any of the powers that be can see a reason NOT to support a simple change I'm requesting. I can find ways to work around it. I just think it might make the feature more widely useful if there was some flexibility in the way the feature is coded.

I'm making this thorough so it's a helpful reference to any who read it later even if nothing changes.

When you want to set maximum limits for TCP connections you have the following field options (from the pf.conf man page, https://www.freebsd.org/cgi/man.cgi?query=pf.conf&sektion=5&n=1):

Quote
max-src-nodes <number>
Limits the maximum number of source addresses which can simultaneously have state table entries.

max-src-states <number>
Limits the maximum number of simultaneous state entries that a single source address can create with this rule.

For stateful TCP connections, limits on established connections (connections which have completed the TCP 3-way handshake) can also be enforced per source IP.

max-src-conn <number>
Limits the maximum number of simultaneous TCP connections which have completed the 3-way handshake that a single host can make.

max-src-conn-rate <number> / <seconds>
Limit the rate of new connections over a time interval.  The connection rate is an approximation calculated as a moving average.

There is a section in the docs:
https://docs.opnsense.org/manual/firewall.html#connection-limits

However, I think it's missing some information / explanation, including a reference to Firewall -> Diagnostics -> pfTables.

If you enable any of those options, you need to know that triggering them results in the source IP being blacklisted.
To manage it:


  • Navigate to: Firewall -> Diagnostics -> pfTables
  • Select the virusprot table
  • Remove any IP you need to unblock


From the posts I've seen, it seems like that keeps catching people and it's not hard to understand why...

At the beginning of the firewall rules there are a couple of important lines:

table <virusprot>
This sets up the table.

block in log quick from {<virusprot>} to {any} label "8e36..." # virusprot overload table
This results in anything listed in that table being blocked in spite of later rules.

When you add some state limits, the rule gets tagged with them like this:
max-src-conn 1 max-src-states 10 tcp.established 120 max-src-conn-rate 1 /1, overload <virusprot> flush global


The tricky part (and what I'm wondering about changing / making allowance for customization) is the part at the end:
overload <virusprot> flush global

If we review the manual for pf again:
Quote
Because the 3-way handshake ensures that the source address is not being spoofed, more aggressive action can be taken based on these limits. With the overload <table> state option, source IP addresses which hit either of the limits on established connections will be added to the named table. This table can be used in the ruleset to block further activity from the offending host, redirect it to a tarpit process, or restrict its bandwidth.

The optional flush keyword kills all states created by the matching rule which originate from the host which exceeds these limits. The global modifier to the flush command kills all states originating from the offending host, regardless of which rule created the state.

For example, the following rules will protect the webserver against hosts making more than 100 connections in 10 seconds.  Any host which connects faster than this rate will have its address added to the <bad_hosts> table and have all states originating from it flushed. Any new packets arriving from this host will be dropped unconditionally by the block rule.

      block quick from <bad_hosts>
      pass in on $ext_if proto tcp to $webserver port www keep state \
         (max-src-conn-rate 100/10, overload <bad_hosts> flush global)

What I'm suggesting is this: under the advanced settings, could there be a list of tables so you could optionally select one? The default could still be virusprot, which would preserve the current behavior.

The flush and global options could be default checked, but allowed to be unchecked.
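To make the request concrete, here is a minimal sketch of how the rule suffix could be generated if the table name and the two keywords were selectable. This is not how OPNsense builds its rules; the function `overload_options` and the table name `provisioning_limit` are hypothetical. The only rule it encodes is the one from the man page quote above: global is a modifier of flush, so it cannot be used on its own.

```python
def overload_options(table="virusprot", flush=True, global_flush=True):
    """Build the 'overload <table> [flush [global]]' part of a pf state option list."""
    if global_flush and not flush:
        # Per pf.conf(5), "global" only modifies "flush".
        raise ValueError("'global' requires 'flush'")
    option = f"overload <{table}>"
    if flush:
        option += " flush"
        if global_flush:
            option += " global"
    return option


# Defaults reproduce what the generated rules contain today.
print(overload_options())
# Hypothetical alternative: separate table, existing states left alone.
print(overload_options("provisioning_limit", flush=False, global_flush=False))
```

With the defaults left checked, nothing would change for existing setups.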

WHY AM I ASKING?

Consider a rule that rate limits access to a webserver, maybe one used for provisioning. A large site recovering from a power failure COULD trip the max https request limit all at once. The ability to change the table name, and to leave out flush and global, would allow the working phones to continue to work and the existing connections to continue downloading their payload.

The way it is, the tripwire is all or nothing - and prevents many people from using the connection rate limiting function unless they are prepared to lose all connectivity with any host that exceeds the limit.

Limiting connections (e.g. to an SMTP service) might be desirable, resulting in a timeout on the sender instead of blacklisting the IP as currently happens.

In short, one suggestion for the docs and one suggestion for a change (improvement?):
1) Update the doc with a reference to how to fix it when you enable those features?
2) Make the table name selectable, and the flush and global options optional.

What do you think? Happy to make a donation to the effort!  ;D

Thanks in advance for your consideration and feedback.

m

#6
Can anyone confirm / deny?
If I kill off a state, it doesn't seem to be removed, but killing all states does work.
Will upgrade another router currently not in an HA set and see if I can duplicate the findings there.
Thanks!
#7

In my configuration, I have two hosts using HA/CARP.

On primary / carp master, go to Firewall -> Diagnostics -> States Dump

Filter on an IP. Press Kill.

Refilter on the same IP, the states do not seem to be cleared.

Pressed X on each state, and then filter on the same IP.

The states do not seem to be cleared.

I took the host in question offline. Repeated the process. I did this to ensure the host was not re-establishing the states before I could see them deleted.

So then I completely reset the states with Firewall -> Diagnostics -> States Reset

Now the states are gone. I haven't had to do this often, but I'm pretty sure this worked properly in 18.1.x - is there something wrong with my procedure?

Thanks!

M
#8
18.7 Legacy Series / Re: Firewall API use
November 05, 2018, 01:46:54 AM
@datiscum: Thanks for the detail - sounds like you are doing exactly what I want to do. I needed something "smarter" and more general than individual systems like fail2ban and other log watchers, which ban locally without context. I wanted to collect that data and filter for false positives (for example, a bad SMTP password from a site with lots of recent GOOD logins shouldn't be banned, just logged / warned).

I was looking for a way to insert / delete table entries, as I thought this might be more efficient than a table alias reload, but I imagine if that's possible you will be looking at that too.

@franco: When you hear back about the apparent omission, can you please post back here? It sounds like if that function was present, @datiscum wouldn't have to patch each time.

Do either of you have any thoughts about the most efficient way to accomplish this sort of thing (i.e. editing tables)? I wouldn't want too-frequent reloads to place an undue load on the router.

If you look at http://www.openbsd.org/faq/pf/tables.html there is an example:

# pfctl -t spammers -T add 203.0.113.0/24
# pfctl -t spammers -T delete 203.0.113.0/24
# pfctl -t spammers -T show


Is there a preferred way to invoke that sort of thing in OPNsense via API? Or is ssh the only way?

We could periodically update the table file (in case of reloads or fail over) on active / backup carp units, and use the add/delete as a way to modify the live rule set instead of constantly reloading.

My assumption is that reloading the table frequently would have much higher load than just editing the live table?
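A small sketch of the "edit the live table" idea: compare what is currently in the table with what should be there, and only issue add / delete for the difference. The function name is hypothetical, the commands are built but not executed, and the comparison is a plain string match, so it assumes both lists use the same notation that `pfctl -T show` prints.

```python
def table_edit_commands(table, live, desired):
    """Return the pfctl add/delete commands needed to turn 'live' into 'desired'."""
    live_set, desired_set = set(live), set(desired)
    to_add = sorted(desired_set - live_set)
    to_delete = sorted(live_set - desired_set)
    commands = []
    if to_add:
        # pfctl accepts several addresses in one "-T add" call.
        commands.append(["pfctl", "-t", table, "-T", "add", *to_add])
    if to_delete:
        commands.append(["pfctl", "-t", table, "-T", "delete", *to_delete])
    return commands


# Example addresses are from the documentation ranges, not real data.
live_entries = ["203.0.113.0/24", "198.51.100.7"]
wanted_entries = ["198.51.100.7", "192.0.2.10"]

for command in table_edit_commands("spammers", live_entries, wanted_entries):
    print(" ".join(command))
```

The table file on disk could still be rewritten periodically so that a reload or a CARP failover starts from the same list.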

Is there an API method to do this (or a generic API for running a command), or would it be better to ssh to the router, run the associated pfctl commands, write the file, and use the API just for periodic reloads?

Thanks again guys :-)
;D
#9
18.7 Legacy Series / Re: Firewall API use
November 02, 2018, 11:46:17 PM
Ugh - I see the difference now on third look. Thanks :-)
#10
ok well thanks - will keep that in mind for the future but as long as there's nothing wrong with what we have I won't rush to change. Have a good weekend - thanks,
Mitch
#11
18.7 Legacy Series / Re: Firewall API use
November 02, 2018, 07:08:05 PM
Hey Franco - I tried just now (running latest) and I had the exact same experience datiscum had...
Before I copied / soft linked the file, I got an error. After I copied it worked.

Datiscum: Where is this documented? I've been meaning to get a start on it for something similar - pushing IP's to block into a table through the API. Basically moving the results of utilities like log monitoring to block on the edge of the network.

Thanks!!!

m
#12
There are some differences though right?

Like an alias won't respond to icmp (i.e. ping, traceroute, etc.).

We wanted the ability to selectively allow ICMP monitoring of the various IPs (filtered by the firewall) instead of just monitoring that the firewall itself was "up". I think with aliases we lose that functionality. And because of that loss, we wondered if it might affect how the upstream router viewed the IP (in terms of speed to detect / return an ICMP unreachable to the remote end), etc.

I wasn't sure what else we might be losing in terms of flexibility - for example, if we wanted to move an IP from one router set to another, wouldn't that process be more complicated without its own virtual MAC?

Is there a reason using multiple VHIDs is "bad"? We've been doing it for years without issue. I wasn't under the impression it used any significant resources, although there are only a limited number of VHIDs within a broadcast domain of course.

Thanks again :-)
#13
Totally willing to accept other opinion and advice on network architecture, but the number of VHID isn't the issue (or doesn't seem to be). If I change the VHID the problem is resolved.

My issue does seem to be an apparent duplicate of the pseudo MAC associated with VHID 2 on a network which should be partitioned to prevent such things (but isn't yet).

For what it's worth / continuing my education, I'm interested to know your advice though...

If you had a /27 on the WAN, and wanted to NAT say 6 to 12 of those addresses through an OPNsense setup with CARP what would you do?

If you wanted to NAT them all, what would you do then?

Thanks :-)
#14
Hi - we currently have VHID 1 on the LAN, and VHID 2 through 7 on the WAN.

I've been doing some thinking and reading.

They are type CARP. Each IP is a single address inside a /27 network.

The setup has been working fine for about 2 years.

We normally see some VERY MINOR ping loss on the IP using VHID 2 (the native WAN IP shows none) - pinging it once per second, we see a regular 1 packet lost every 10 minutes. But since the issue appeared, we are seeing close to 40% packet loss.

We found the broadcast domains at the colo do not seem to be properly separated (we can see other traffic in packet captures on the WAN we should not see) but we do not see any other VRRP or CARP traffic which would directly explain the issue.

If I understand CARP properly, the CARP IP is associated with a kind of virtual MAC - so perhaps this behavior is simply someone asserting that same duplicated MAC through their own configuration of CARP or VRRP - which causes the router or switch to relearn that MAC periodically resulting in the loss of ping responses?

Although that doesn't explain the 1 packet every 10 minutes, I think it does explain the bursts of loss when I use VHID 2. In the images attached, I change from VHID 8 to VHID 2, capture the loss, and then change back to 8.

If that's what I'm seeing, then all I need to do is confirm what the associated MAC address would be for VHID 2 and, presuming that's universal, push the issue back upstream to the colo.
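For reference, a minimal sketch of the mapping: for IPv4, CARP uses the same virtual MAC scheme as VRRP, 00:00:5e:00:01:XX, where XX is the VHID (or VRRP router ID) in hex. The function name is hypothetical. The value is easiest to confirm by looking for that MAC in a packet capture or in the upstream ARP table.

```python
def carp_virtual_mac(vhid):
    """Return the IPv4 CARP/VRRP virtual MAC address for a given VHID (1-255)."""
    if not 1 <= vhid <= 255:
        raise ValueError("VHID must be between 1 and 255")
    return f"00:00:5e:00:01:{vhid:02x}"


for vhid in (2, 8):
    print(vhid, carp_virtual_mac(vhid))
```

Because the mapping depends only on the number, any other CARP or VRRP group using ID 2 in the same broadcast domain would assert the same MAC, which would be consistent with the problem disappearing on VHID 8.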

Does that make sense?

Thanks!

Mitch
#15
We have a pair of opnsense configured with high availability / carp and recently noticed an odd behavior.

One of our Virtual IP's was intermittently not responding to pings. About 10 seconds "on" / 20 seconds "off" - pretty regular. The issue only applied to a single IP. We could not see any CARP traffic on the public network, but we found a way to "fix" the problem - by changing the VHID to a different number, the problem went away.

The virtual IP in question does have a password (long and complex random string).

The base / skew is 1 / 0.

I would have expected any competing broadcasts for this VHID would have not been accepted by our router due to the mismatched password.

And yet something seems to be "stealing" our address - I don't see the CARP mode changing to backup on the primary, but perhaps I'm having trouble catching it?

I did run packet captures on the WAN - and although I couldn't see traffic to indicate that's what was happening, I think the symptoms would indicate that's the cause?

By changing the VHID of that one Virtual IP, I can work around the issue. If I change the VHID back, the issue returns. I'd like to resolve the issue permanently - I'm on the latest release firmware.

Can anyone recommend any next steps?

Thanks in advance :-)