Log NAT Rules

Started by josephcocoa, January 22, 2025, 05:01:06 PM

I'm starting to migrate my systems from pfSense to OPNsense.  Been enjoying the system so far, but one of the hard requirements we are starting to face from our clients is that we need to be able to log NAT rules such that we can identify who made an outbound connection.  Here is the scenario:

We manage the internet access for an apartment complex.  They have about 1500 devices accessing the internet.  We get a letter saying that someone from our IP was doing something illegal.  They give us the source IP (which is our public IP), the source port (on our side), and the timestamp.

I know how to set up logging for knowing who connected out, but it just shows me the information like this:

internal_ip:port -> destination_ip:port

I need to somehow get the logging like this:

internal_ip:port -> nat_ip:port -> destination_ip:port

It looks like https://github.com/italovalcy/pfnattrack might be useful.

So my question is twofold.
1) Is it possible to log NAT rules already in the way I describe, if so, how?
2) If not, is it possible to sponsor this feature for an upcoming release?

Check your logs - I'd expect you to see a pair of messages for every NAT'd session, e.g. ("Outbound NAT" session, from Firewall: Log Files: Plain View):

2025-01-23T01:54:31-06:00    Informational    filterlog    68,,,1232f88e5fac29a32501e3f051020cac,bridge0,match,pass,out,4,0x0,,127,62414,0,DF,17,udp,1278,47.190.83.202,173.194.57.231,25980,443,1258
2025-01-23T01:54:31-06:00    Informational    filterlog    172,,,590701f358a203982edb91d7727e8f3a,bridge1,match,pass,in,4,0x0,,128,62414,0,DF,17,udp,1278,10.101.11.160,173.194.57.231,52918,443,1258

(This is in GUI log format, i.e. latest first.) Personally I'd want the MAC address of the associated IPs as well, but it's (apparently) not available from pf. (Lots of ways to deal with that, starting with ignoring it and ending somewhere around 802.1x.)

It's sort of funny considering we contributed better NAT logging through pflog into FreeBSD even ;)

NAT rules are logged using "rdr", "nat" and "binat" actions in the live log (or plain filter logs).  Some NAT types in the GUI may not offer a logging option yet, which is purely for legacy reasons...


Cheers,
Franco

January 23, 2025, 10:28:15 AM #3 Last Edit: January 23, 2025, 10:32:13 AM by Seimus
>  It's sort of funny considering we contributed better NAT logging through pflog into FreeBSD even ;)
Funny yet useful when troubleshooting ;). Having it in the live view alongside the rule hits makes my life easier.

Regarding the OP's question:
You can troubleshoot NAT-related things like this.

If you have enabled logging on your NAT rules, you will see them in the live view, as NAT is processed before firewall rules, e.g.:


This is always followed by a firewall rule, e.g.:


And you can check the State table to see the actual mapping


Regards,
S.
Networking is love. You may hate it, but in the end, you always come back to it.

OPNSense HW
APU2D2 - deceased
N5105 - i226-V | Patriot 2x8G 3200 DDR4 | L 790 512G - VM HA(SOON)
N100   - i226-V | Crucial 16G  4800 DDR5 | S 980 500G - PROD

Thank you all of you for commenting, I'll try to reply to all of you in one post:

@pfry: I'm trying to interpret the log entries you posted.  Could you dissect them and explain what each column represents?  I want to make sure I'm parsing things correctly on my end.  Looking at what you posted, I couldn't tell which is source_ip:port, nat_ip:port, and destination_ip:port. If I'm writing a parser for myself to use on logs, I want to make sure I don't mess it up. :) Also, with MAC address randomization becoming popular, MAC addresses are less useful for my use case.  Not bad to have, but not always useful when serving the legal response.  Instead, we're setting up personal area networks, so that we can tell based on the IP who to serve it to.

@franco: OPNsense being more receptive to this sort of work is why I want to migrate to it and why I want to contribute. :)  I appreciate all of the efforts.

@seimus: I see the rules along with the state information.  To make it more relevant, the sort of information we'd get from the legal team as part of the information request is "at 10PM there was a connection from your network at 10.66.3.1:305".  We usually get that a week or so after the request is made, so it isn't really possible for us to use the live feed at all.  I'm planning on pushing this off to a remote syslog for storage and analysis if needed, which is why I want to make sure I'm parsing log entries correctly. :) 

It would be really nice to see a way to condense both entries to one log entry automatically to help eliminate human error, but based on what I'm seeing, with a little bit more understanding from me, I should be able to make this work. Either way, I'm fine funding some improvements if the team is receptive to it.  If not, that's ok too, I can try to make this work.

Quote from: josephcocoa on January 23, 2025, 04:19:02 PM[...]
@pfry: I'm trying to interpret the log entries you posted.  Could you dissect them out and explain what each column is indicative of?
[...]

Stealing from Franco: OPNsense log description.
The format would hinge on your chosen log source (e.g. if you wanted to use the pflog interface or pf device).

January 23, 2025, 09:23:57 PM #6 Last Edit: January 23, 2025, 11:10:14 PM by EricPerl
I had started a reply yesterday and then I stopped because I realized it could get tricky real fast.

Removing some fields:
2025-01-23T01:54:31-06:00 ... bridge0 ... out ... udp ... 47.190.83.202,173.194.57.231,25980,443
2025-01-23T01:54:31-06:00 ... bridge1 ... in  ... udp ... 10.101.11.160,173.194.57.231,52918,443

Timestamp, interface, direction, protocol, source IP, destination IP, source port, destination port.
Most recent at the top (GUI order), so chronologically you see the incoming request on the LAN side from the client first (private IP as source), followed nearly immediately by the outgoing packet on the WAN side with the public IP and NAT port. I suspect there could be some interleaving with heavy traffic.
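
For anyone writing their own parser, here is a minimal Python sketch of the dissection above. The field positions are simply counted off the two IPv4/UDP sample lines posted earlier in this thread, so treat them as an assumption for anything else (IPv6, ICMP, other log sources). The names `FilterEntry` and `parse_filterlog` are made up for illustration.

```python
from dataclasses import dataclass

# Zero-based positions counted from the IPv4/UDP sample lines in this thread.
# Assumption: other protocols or IP versions may lay their fields out differently.
IFACE, DIRECTION, IPVER, PROTO = 4, 7, 8, 16
SRC_IP, DST_IP, SRC_PORT, DST_PORT = 18, 19, 20, 21


@dataclass(frozen=True)
class FilterEntry:
    timestamp: str
    interface: str
    direction: str
    protocol: str
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


def parse_filterlog(timestamp, payload):
    """Parse the comma-separated filterlog payload of one log line.

    Returns None for anything that is not an IPv4 TCP/UDP entry, so the
    caller can skip lines this sketch does not understand.
    """
    fields = payload.split(",")
    if len(fields) <= DST_PORT:
        return None
    if fields[IPVER] != "4" or fields[PROTO] not in ("tcp", "udp"):
        return None
    return FilterEntry(
        timestamp=timestamp,
        interface=fields[IFACE],
        direction=fields[DIRECTION],
        protocol=fields[PROTO],
        src_ip=fields[SRC_IP],
        src_port=int(fields[SRC_PORT]),
        dst_ip=fields[DST_IP],
        dst_port=int(fields[DST_PORT]),
    )
```

On the "out" line the source IP and port are the public (post-NAT) side; on the "in" line they are the client's private side.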

So timestamp with public IP and port would yield a destination and port and the first line earlier in the log for that same destination+port is likely your client IP+port.
I'm not sure how you map that back to a customer without correlation to a MAC and registration of MAC addresses...
As pointed out, with randomization of these...
You'd have the DHCP logs for the first part though.

BUT:
Are you getting a timestamp corresponding to a connection establishment, or timestamp of offense?
The former is what you'll find on the OPN side. With the latter, you'll have to search for the last connection established before the timestamp.

Of course, this relies on clocks on both sides to be reasonably in sync.
With 1500 devices, I wonder how long it takes for ports to be reused... You can probably look at your logs and find out.
The faster they are reused, the more critical sync is.
If you're provided with the destination IP, you can remove some doubts.
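
To illustrate the "last connection established before the timestamp" search, a small Python sketch. It assumes you have already filtered the log down to entries for the reported public IP and port and sorted them by time; `last_established_before` and the `max_age` cutoff are hypothetical names and values, not anything OPNsense provides.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def last_established_before(entries, reported, max_age=timedelta(minutes=5)):
    """Return the record of the latest entry at or before `reported`.

    entries:  list of (datetime, record) tuples sorted ascending by datetime,
              already filtered to the reported public IP and port.
    reported: datetime from the complaint (must be timezone-aware if the
              entries are).
    max_age:  assumed cutoff; an older entry is treated as a different,
              unrelated use of the same port.
    """
    times = [t for t, _ in entries]
    i = bisect_right(times, reported)
    if i == 0:
        return None  # nothing was logged before the reported time
    established, record = entries[i - 1]
    if reported - established > max_age:
        return None
    return record
```

The shorter the port reuse interval, the smaller `max_age` has to be, and the more clock skew between the two sides will matter.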



@EricPerl It's really annoying because they don't provide destination information. Also, sometimes their time is approximate, other times it is precise, and other times, it hasn't even been correct at all.  We just have to do our best with it.

I appreciate the dissection of what it all means.  That shouldn't be hard to write a parser for. Still, it would be nice to have something that provides all of it in a conveniently compiled way.

We're not needing to track down the specific offender, just the PAN that it was on.  So each unit has their own PAN and we would deliver the notice to the person in that unit. Recycling of ports can be pretty swift, but it isn't terrible.

@pfry I was planning on setting up a syslog collector that feeds everything into a database and then adding it as a remote log in System->Settings->Logging -> Remote . I would configure the syslog collector to store everything into a DB and then I can just query the DB for the timeframe (probably +- 5 minutes to account for inaccuracy of the requesting party) and then write a tool to look up potential matches. I'm assuming that telling it to log "firewall" for the application would send the logs like @EricPerl and yourself have provided examples for. I might also include DHCP logs just for some additional context, but I don't know that it would be critical, and I already know how to read those. :)
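
Roughly what I have in mind for the database side, as a Python/SQLite sketch. The table layout and the function names (`open_db`, `store`, `find_window`) are my own placeholders, and it assumes the collector has already split each firewall line into fields. Timestamps are stored as epoch seconds so that a request quoted in a different time zone still lands in the right window.

```python
import sqlite3
from datetime import datetime, timedelta


def open_db(path=":memory:"):
    """Open (or create) the log database; the schema is a placeholder."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS fw ("
        "ts REAL, iface TEXT, dir TEXT, proto TEXT, "
        "src_ip TEXT, src_port INTEGER, dst_ip TEXT, dst_port INTEGER)"
    )
    db.execute("CREATE INDEX IF NOT EXISTS fw_lookup ON fw (src_ip, src_port, ts)")
    return db


def store(db, iso_ts, iface, direction, proto, src_ip, src_port, dst_ip, dst_port):
    """Insert one parsed firewall log entry."""
    ts = datetime.fromisoformat(iso_ts).timestamp()
    db.execute(
        "INSERT INTO fw VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        (ts, iface, direction, proto, src_ip, src_port, dst_ip, dst_port),
    )


def find_window(db, src_ip, src_port, iso_ts, slack=timedelta(minutes=5)):
    """Return entries for src_ip:src_port within +/- slack of iso_ts."""
    centre = datetime.fromisoformat(iso_ts).timestamp()
    delta = slack.total_seconds()
    return db.execute(
        "SELECT ts, iface, dir, proto, dst_ip, dst_port FROM fw "
        "WHERE src_ip = ? AND src_port = ? AND ts BETWEEN ? AND ? ORDER BY ts",
        (src_ip, src_port, centre - delta, centre + delta),
    ).fetchall()
```

Querying with our public IP and the reported port gives the outbound entries; the destination from those rows is then what I'd use to look for the matching LAN-side entry.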

Another approach I've thought about was using NetFlow, but that might be overkill for me and I'm not interested in trying to write or manage a NetFlow collector/analyzer.

My last question is, I deploy my boxes with a healthy amount of storage space. I could do local logging and just allow it to have really big files and rotate them out. That would save me having a collector and then I'd just download the appropriate log file for analysis, but I'm curious for everyone's thoughts on that. My gut tells me that would be a bad idea and is potentially problematic since logs could cycle quickly and/or accidentally be cleared.

The precision of the reported timeframe had better be finer than your shortest port recycling interval...

There's no argument from me that a log of the NAT mappings visible in sessions/states FW diagnostics would be more accurate.

The above is heuristics.
Consider this log:
2025-01-23T01:54:31-06:00 ... bridge0 ... out ... udp ... 47.190.83.202,173.194.57.231,25980,443
2025-01-23T01:54:31-06:00 ... bridge0 ... out ... udp ... 47.190.83.202,173.194.57.231,11111,443
2025-01-23T01:54:31-06:00 ... bridge1 ... in  ... udp ... 10.101.11.160,173.194.57.231,55555,443
2025-01-23T01:54:31-06:00 ... bridge1 ... in  ... udp ... 10.101.99.160,173.194.57.231,52918,443

Do they match 1-3 & 2-4 or 1-4 & 2-3?
I suspect there's no guarantee that this kind of interleaving is not possible.

If you build something automated, you'd better take such fuzziness into account (i.e. not simple parsing, multiple answers possible).
That adds to the time uncertainty (another source of multiple answers).
I don't know the kind of volume you're dealing with, but given the stakes, you might have to manually double check your queries for a while...
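
A sketch of what "multiple answers possible" could look like in Python. Entries are plain dicts with keys I picked for illustration (`ts`, `proto`, `src_ip`, `src_port`, `dst_ip`, `dst_port`), and the one-second `tolerance` is an arbitrary assumption. The point is only that the function returns every plausible client instead of picking one.

```python
from datetime import datetime, timedelta


def candidate_clients(out_entry, in_entries, tolerance=timedelta(seconds=1)):
    """Return every LAN-side (src_ip, src_port) that could belong to out_entry.

    A LAN-side entry is a candidate if it has the same protocol, destination
    IP and destination port, and its timestamp is within `tolerance` of the
    WAN-side entry. More than one result means the logs alone cannot decide.
    """
    t_out = datetime.fromisoformat(out_entry["ts"])
    flow = (out_entry["proto"], out_entry["dst_ip"], out_entry["dst_port"])
    matches = []
    for entry in in_entries:
        if (entry["proto"], entry["dst_ip"], entry["dst_port"]) != flow:
            continue
        if abs(datetime.fromisoformat(entry["ts"]) - t_out) <= tolerance:
            matches.append((entry["src_ip"], entry["src_port"]))
    return matches
```

Fed the four-line log above, looking up the entry with NAT port 25980 returns both 10.101.11.160:55555 and 10.101.99.160:52918, which is exactly the ambiguity to flag for a manual check.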

You're probably constrained legally by a retention period for the logs.
Unless it's small, I would deal with the logs on another machine.
I heard running out of disk space on your router is annoying 😉
And I suspect some syslog servers incorporate functionality wrt log correlation and analysis.


I'd love to throw some funds towards improving the nat logging capabilities.  At one point for pfSense, I wrote a package that would collect info from pfSync0 interface and would interpret the state creation/destruction signaling.  That worked pretty well, but was a mess to deal with.

Just to intercept into the ongoing discussion (sorry).

So basically you are looking to have historical data, e.g. who went where at what time.
Have you considered looking into pfelk?
https://github.com/pfelk/pfelk?tab=readme-ov-file

Maybe it's worth checking whether it can interpret NAT hits as well.

Regards,
S.

@Seimus:  That looks like a really interesting project.  It looks like pfelk is great for grabbing the data, but it might not be able to correlate the results for my purposes, that is, finding the internal IP of a NATed connection when only given our external IP, port, and time. I'll kick it around a bit though and see. Thank you for making me aware.