Multiple public IPs: NAT works randomly

Started by lrob, January 08, 2023, 03:46:01 AM

Hi,

I'm somewhat new to OPNsense, so please consider that user error may be the cause at any step of the config.

I have a bridge spanning multiple interfaces for my 192.168.1.0/24 LAN.
I have a WAN configured using PPPoE.

This worked well, and I was able to set up NAT port forwarding without issues.

Problem started when I subscribed to a /29 of public IPs from my internet provider (OVH, FTTH).
The intent is to redirect each public IP to a local IP, in order to have all ports available.

I've tried many configurations, and got it working, but only 50 to 70% of the time.
I've got a test web page on one of the additional IPs, visible at http://109.190.103.26/ (no problem sharing it, it's a public server). If you refresh it (Ctrl + F5) several times, you'll likely encounter timeouts. The same goes for SSH.

I wonder if that is a bug or if I'm doing something wrong. I've been trying everything for a whole day and couldn't find a solution.

Current config:
- One virtual IP on WAN set as /32 and gateway from the /29 as advised by my ISP
- Set up a One-to-one NAT rule: External IP 109.190.103.26/32, Internal IP 192.168.1.16, Destination IP * (Any)
- Firewall allows all incoming connections until I make sure basic config works
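
For anyone following along, the addresses inside a /29 can be listed with Python's standard ipaddress module. This is only a sketch: it assumes the block is the /29 that contains the test address 109.190.103.26, which is not stated explicitly in this thread.

```python
import ipaddress

# Assumption: the additional block is the /29 containing the test address.
# strict=False lets us pass a host address and get its enclosing network.
block = ipaddress.ip_network("109.190.103.26/29", strict=False)

print(block)                # the network in CIDR notation
print(block.num_addresses)  # a /29 always holds 8 addresses

for addr in block:          # every address in the block, network address first
    print(addr)
```

Which of those addresses are actually usable, and which one acts as gateway, depends on how the ISP delivers the block, so the ISP's advice still takes precedence.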

At this point I'm stuck, I don't know how to diagnose this kind of random issue.

Do you have any idea of what to try or to look for?

Thank you

One to one is just that  ;)

How is your firewall to decide what public source IP to give your packets from 192.168.1.16 for the NAT on their way back to the client?

Give your internal host the same number of IP addresses as the number of 1:1 NAT rules you have configured for it and each service/daemon listening to that address will do the right thing and so will the firewall  ;D

Bart...

Hello Bart,

It's an honor to find your answer in person! You can't imagine how many forum posts I've found from you doing research. Unfortunately, I didn't find or understand the exact piece of info that I needed. Thank you for your involvement!

Quote from: bartjsmit on January 08, 2023, 09:08:56 AM
How is your firewall to decide what public source IP to give your packets from 192.168.1.16 for the NAT on their way back to the client?

I have no idea, I assume the One-to-one does it somehow?
From 192.168.1.16, if I curl ifconfig.me, I get this:
root@game:~# curl ifconfig.me
109.190.103.26

So it seems like it's kind of a dedicated IP now.
But I've got the same kind of issue: if I repeat the curl, it sometimes won't answer.

Previously, if I remember correctly, with standard NAT port forwarding (I've spent 10 hours trying everything, so pardon me if I mix things up), curling ifconfig.me gave me the main public IP of my box (so not from the /29 range, but the first one I got, the one assigned by PPPoE).


Quote from: bartjsmit on January 08, 2023, 09:08:56 AM
Give your internal host the same number of IP addresses as the number of 1:1 NAT rules you have configured for it and each service/daemon listening to that address will do the right thing and so will the firewall  ;D

I'm unsure what you mean. By "Internal host", you mean "192.168.1.16" ?
The 1:1 NAT rule has only /32 IPs configured, so there is the same amount of IPs on each side, shouldn't it work?

Just to make sure, here is my current config:
One-to-one


Assignments (note that WAN on igb1 is not set as an interface on its own other than on  PPPoE) :


Virtual IP (here I've tried no gateway, gateway from the main IP, and gateway I'm supposed to use for the /29, the latest being currently defined).


The fact that it works about 67.6% of the time (almost exactly 2/3; does that mean something?) according to an external monitoring tool that I've set up makes me think there is an issue somewhere, either at driver level or at ISP level. But I don't know how to diagnose it.
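
A success rate like this can also be measured by hand, without relying on a monitoring service. The sketch below opens a series of fresh TCP connections and reports the fraction that complete. The function name probe_tcp and the default attempt count and timeout are hypothetical choices, not something taken from the monitoring tool.

```python
import socket

def probe_tcp(host, port, attempts=20, timeout=3.0):
    """Open `attempts` fresh TCP connections; return the fraction that succeed."""
    ok = 0
    for _ in range(attempts):
        try:
            # Each call performs a full three-way handshake, then closes.
            with socket.create_connection((host, port), timeout=timeout):
                ok += 1
        except OSError:
            # Covers timeouts, refused connections and unreachable hosts.
            pass
    return ok / attempts

# Example call, to be run from a host outside the LAN:
#   rate = probe_tcp("109.190.103.26", 80)
#   print(f"{rate:.1%} of handshakes completed")
```

Running it from several external hosts may help tell apart a problem tied to one path from a problem tied to the destination IP.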

But doing an MTR from my monitoring server, everything looks fine:
root@moon:~# mtr -s 100 -r -c 100 109.190.103.26
Start: 2023-01-08T12:43:02+0100
HOST: moon.fihosting.net          Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- 172.31.1.1                 0.0%   100    4.3   3.6   2.4   5.8   0.9
  2.|-- 20870.your-cloud.host      0.0%   100    0.2   0.3   0.2   0.6   0.1
  3.|-- ???                       100.0   100    0.0   0.0   0.0   0.0   0.0
  4.|-- static.73.143.12.49.clien  0.0%   100    1.3   2.3   0.8  27.3   4.2
  5.|-- spine14.cloud1.nbg1.hetzn  0.0%   100    0.9   4.0   0.8  58.5   9.6
  6.|-- static.213-133-112-81.cli  0.0%   100    0.5   1.7   0.3  24.9   2.9
  7.|-- core5.fra.hetzner.com      0.0%   100    3.7   4.8   3.5  43.3   4.8
  8.|-- core9.fra.hetzner.com      0.0%   100    3.8   3.9   3.7   6.0   0.2
  9.|-- fra-1-a9.de.eu             0.0%   100    3.8   3.9   3.7   4.2   0.1
 10.|-- ???                       100.0   100    0.0   0.0   0.0   0.0   0.0
 11.|-- ???                       100.0   100    0.0   0.0   0.0   0.0   0.0
 12.|-- ???                       100.0   100    0.0   0.0   0.0   0.0   0.0
 13.|-- be101.sbg-g2-nc5.fr.eu     0.0%   100    6.7   8.4   6.6  65.9   7.5
 14.|-- be103.par-gsw-sbb1-nc5.fr  0.0%   100   12.6  12.7  12.5  15.1   0.3
 15.|-- be104-202.par-gsw-pb1-nc5  0.0%   100   13.4  41.0  12.7 335.4  57.1
 16.|-- th2-dsl1-a1.fr.eu          0.0%   100   12.4  12.5  12.3  12.9   0.1
 17.|-- 145.239.153.164            0.0%   100   12.4  12.4  12.3  16.2   0.4
 18.|-- game.terageek.org          0.0%   100   28.3  28.0  27.5  28.8   0.3


Ping also works 100% of the time. And it also works 100% of the time locally.

Sorry for the long answer, at least we have all the details. Knowing all that, is there anything I'm doing wrong?

Thank you

I'm messing around in /system_advanced_firewall.php , maybe there are relevant or required settings to add in a bridge configuration here?
At this point I'm considering reinstalling the whole OPNSense and trying without bridging... And if it works, I'll be buying a 10G switch in order to simplify router's config. But that's an expensive solution for an individual.

Quote from: lrob on January 08, 2023, 12:49:11 PM
It's an honor to find your answer in person! You can't imagine how many forum posts I've found from you doing research. Unfortunately, I didn't find or understand the exact piece of info that I needed. Thank you for your involvement!
I'm blushing :) My reputation is far more due to persistence than better insight.

Quote from: lrob on January 08, 2023, 12:49:11 PM
I'm unsure what you mean. By "Internal host", you mean "192.168.1.16" ?
Think of it this way - in "normal" 1:1 scenarios there would be a separate internal host for each service published on the WAN side of the firewall - web, email, files, chat, etc. Each would have its own internal LAN IP and a 1:1 NAT to ensure that the return packets to the client have the same WAN IP that the client sent their request to.

You want to consolidate all these separate servers/services on one internal host (game), which is perfectly fine. That host will need to emulate the many hosts to the firewall by having a separate LAN IP for each 1:1 NAT and by extension, each service.

The tricky part is then to configure game with multiple IPs. For Linux you would use something like 'ip a add 192.168.1.17/24 dev eth0', but that depends on your distro. You may also need to configure each daemon to bind to a separate IP, especially if they listen on the same port.
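
As an illustration of what "bind to a separate IP" means at the socket level, here is a minimal sketch. The helper name listen_on is hypothetical; real daemons normally expose this as a listen or bind address option in their own config.

```python
import socket

def listen_on(bind_ip, port):
    """Return a TCP listener bound to one specific local address only."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    # Binding to a concrete address instead of "0.0.0.0" means only
    # connections addressed to that IP reach this listener.
    sock.bind((bind_ip, port))
    sock.listen()
    return sock

# Example, on a host that owns both addresses (port number is arbitrary):
#   web_a = listen_on("192.168.1.16", 8080)
#   web_b = listen_on("192.168.1.17", 8080)
```

Two listeners can then share the same port number, because each one is tied to a different local address.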

You don't need to do much on OPNsense at all. It can (and should) be blissfully unaware of all the internal config work. From the firewall perspective there are just multiple services that need 1:1 NAT.

If you have a few spare hosts, try the multiple 1:1 NATs with them first to get a feel for the procedure. Something like VirtualBox or Docker can help but they also introduce more complexity  ???

Bart...

Quote from: bartjsmit on January 09, 2023, 08:02:46 AM
I'm blushing :) My reputation is far more due to persistence than better insight.
Well, persistence is very valuable to the community, I sure can tell! :)
I've been helping communities myself for years, so I appreciate it when others do the same!


I believe there is a misunderstanding here.
I want to map additional public IPs (all ports) to only one local IP each, one per (virtual) server. That way it's like every single server has its own public IP. I don't mind doing them one by one because there aren't many (and that way they can have non-consecutive IPs), so I'd be using /32 I guess.
So for example, game.terageek.org has locally 192.168.1.16 and publicly reachable on 109.190.103.26, and sends replies as 109.190.103.26.
game.ficellocube.fr would have 192.168.1.17 and be reachable and answer through 109.190.103.27.
And selfhost.lrob.net would have 192.168.1.254 and be reachable through 109.190.103.28
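
To make the intended behaviour concrete, the mapping above can be written down as a small lookup table that is applied in both directions. This is only a toy model of what a 1:1 NAT rule does to addresses, not how OPNsense or pf implements it; the function names are hypothetical.

```python
# Public-to-private pairs taken from the examples above.
BINAT = {
    "109.190.103.26": "192.168.1.16",   # game.terageek.org
    "109.190.103.27": "192.168.1.17",   # game.ficellocube.fr
    "109.190.103.28": "192.168.1.254",  # selfhost.lrob.net
}
REVERSE = {private: public for public, private in BINAT.items()}

def translate_inbound(dst_ip):
    """WAN -> LAN: rewrite the destination address if a mapping exists."""
    return BINAT.get(dst_ip, dst_ip)

def translate_outbound(src_ip):
    """LAN -> WAN: rewrite the source so traffic leaves from the paired public IP.

    Unmapped addresses are returned unchanged in this sketch; a real
    firewall would apply its default outbound NAT to them instead.
    """
    return REVERSE.get(src_ip, src_ip)

print(translate_inbound("109.190.103.26"))   # 192.168.1.16
print(translate_outbound("192.168.1.16"))    # 109.190.103.26
```

Because each private address appears exactly once, the reverse lookup is unambiguous, which is why one internal IP per public IP is needed.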

Ideally, I would even like to have servers with their public IP set directly on an interface (and a LAN IP set as an additional IP, for easier local access), with OPNsense just letting traffic through to them without altering it.
Just like how you'd set a router in a data-center. I had this kind of config working on Mikrotik equipment in a data-center, but it was like 2 or 3 years ago so I don't recall exactly how I did, and I didn't find the equivalent in OPNSense.
Basically as far as I remember, in Mikrotik, you'd set your networks like v.w.x.y/z and your main upstream gateway (used for all public subnets in my case), and you could assign these IPs on servers' interfaces right away.


But the story of today is... My config works, all my services are accessible... Only there are packet drops. I mean packets that seem to disappear entirely and never go through my WAN or LAN (when a connection fails, I can't find the packet in any of the logs I've checked). As a result, connections randomly fail to establish.

I've conducted many more tests to make sure none of my configs or hardware were the problem. The only thing I didn't try is moving the LAN and/or WAN to the additional Intel NIC instead of the onboard NIC of the Supermicro server that acts as my router. But once again, since there is no issue with regular NAT, only when using an IP from the additional range, I do believe that the issue comes from outside of my home/office network.
I've played with the MTU (set to 1492 for PPPoE) and MSS (set to 1452 for PPPoE), no change.
I've even installed pfSense; the results are exactly the same.
I even tried to install Mikrotik's RouterOS (but didn't manage to get it installed).
Everything works well when used normally with a single IP, but when setting an IP from the additional range, it becomes the mess that I've described.
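
For readers wondering where the MTU and MSS values mentioned above come from, the arithmetic is short. The sketch assumes a standard 1500-byte Ethernet MTU and IPv4 and TCP headers without options.

```python
ETHERNET_MTU = 1500
PPPOE_OVERHEAD = 8   # 6-byte PPPoE header + 2-byte PPP protocol field
IPV4_HEADER = 20     # without options
TCP_HEADER = 20      # without options

def pppoe_mtu(link_mtu=ETHERNET_MTU):
    """Largest IP packet that fits once PPPoE encapsulation is added."""
    return link_mtu - PPPOE_OVERHEAD

def tcp_mss(mtu):
    """Largest TCP payload that fits in one packet of the given MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER

print(pppoe_mtu())           # 1492
print(tcp_mss(pppoe_mtu()))  # 1452
```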


Strange thing is I've noticed when doing an MTR to 109.190.103.26, sometimes the last node appears twice and the second occurrence has packet drops.
You might be able to see it by running this from any Linux PC or server:
mtr -s 1024 -c 128 109.190.103.26 -T
or
mtr -s 1024 -c 128 109.190.103.26 --udp

Also, "funny" thing but kind of the only lead I've got is: Issue is visible in TCP, UDP, but not ICMP (default for MTR).
mtr -s 1024 -c 128 109.190.103.26

I've made a screenshot of the issue in TCP (made from an OVH server) as it's almost impossible to copy/paste otherwise:


I've just contacted my ISP (OVH, FTTH, they do home and office network as well now) because at this point I really believe the issue comes from them, but I'm pretty sure they'll be helpless like "You are using your own router, therefore any issue comes from you and there is no support" as they usually do.

That's weird, and I'm really fed up with this issue.
I'm using all my spare time on this, delaying eating and sleeping... and it has led to nothing so far. I really need to solve this. I would appreciate any help, idea or insight.

Thank you

Quote from: lrob on January 10, 2023, 11:53:27 PM
I want to map additional public IPs (all ports) to only one local IP, one per (virtual) server. That way it's like every single server has its own IP.
game.ficellocube.fr would have 192.168.1.17 and be reachable and answer through 109.190.103.27.
And selfhost.lrob.net would have 192.168.1.254 and be reachable through 109.190.103.28
Where are these 192.168.1.x IP addresses configured? How do you set them on the game server?

Always worthwhile to do some packet captures on the host and the firewall, especially if the problem is repeatable.

Bart...

January 11, 2023, 07:48:49 PM #7 Last Edit: January 11, 2023, 10:02:29 PM by lrob
Well, 192.168.1.0/24 is my local network. So these are machines from my home office, which includes physical and virtual machines.
Just for sanity, to make sure issue wasn't coming from the fact that I'm using virtual machines, I've tested the one-to-one config on a physical host directly attached to the router (so no switch, no interfering equipment) and the issue is the same: Packets seem to disappear.

I've therefore tried packet capture on pf/OPNsense directly, and when packets don't reach the server, it seems like they don't reach the router either.

So you motivated me to do further investigation with packet capture and Wireshark.

Test is: curl ifconfig.me

From LAN perspective:


From WAN perspective:


So it seems like I'm sending a SYN but never receive the SYN-ACK, even after a TCP retransmission.
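
The same check can be automated once a capture has been reduced to a list of packets. The sketch below counts SYNs that never got a matching SYN-ACK. The Packet record and the flag strings "S" and "SA" are a hypothetical, simplified representation, not the output format of tcpdump or Wireshark.

```python
from collections import namedtuple

# Hypothetical, simplified packet record; a real capture has many more fields.
Packet = namedtuple("Packet", "src sport dst dport flags")

def unanswered_syns(packets):
    """Map each client flow with no SYN-ACK to the number of SYNs it sent."""
    pending = {}
    for p in packets:
        if p.flags == "S":
            # Initial SYN or a retransmission of it.
            flow = (p.src, p.sport, p.dst, p.dport)
            pending[flow] = pending.get(flow, 0) + 1
        elif p.flags == "SA":
            # A SYN-ACK travels in the reverse direction of the SYN.
            pending.pop((p.dst, p.dport, p.src, p.sport), None)
    return pending
```

A flow that shows up with a count of 2 or more corresponds to the pattern described above: a SYN, at least one retransmission, and no reply.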

Also, OVH said as expected (quick translation/summary of their message): "Issue is behind your router, so go fy".
So I replied with the tcpdump and the fact that traffic goes out from my network, so the issue is probably after, therefore on their side.

That's where I'm at now.

I'm with you now. It seems like your 1:1 NAT is configured correctly - it's relatively basic so it usually works or doesn't.

Were those traces from OPNsense?

Thanks. I would agree, it either works or doesn't. If it's in between, then issue likely comes from ISP.

Those traces come from pfSense (didn't reinstall to OPNSense yet, but I sure will... In the meantime, 95% of configs are the same, so that's not a big deal I guess), and I've opened the .pcap files with Wireshark.

OVH isn't cooperative at all. They only want to acknowledge mtr tests run with default settings (so only ICMP), while ICMP shows no issue: the problem is with SYNs not reaching the internet, and SYNs don't occur in ICMP, they occur in TCP. So I've asked them to run the test in TCP, to check the nodes that I might not see using mtr (as mtr doesn't show every single gateway), and to check their anti-DDoS SYN flood protections. But so far they haven't cared and even closed my first ticket... I had to call them to get a new one opened... but their "administrators", as they call them, won't give my issue any attention.
Every single time there is an issue with OVH, I have to spend literally 2 to 6 weeks convincing them that the issue really comes from them. And then they fix it, most of the time. But I'm not sure I will be able to do it this time, because the mtr shows packet drops on my IP and not on one of their visible nodes. But one of their IPs shows up after mine in mtr, so that might be a key to solving the issue. I've notified them of that.

The last test I would do is buy a cheap Mikrotik router to try a completely different ecosystem, as pfSense and OPNsense are very similar. If that still doesn't work, then I'll be totally convinced it really comes from OVH.


January 12, 2023, 09:27:37 PM #11 Last Edit: January 12, 2023, 09:37:58 PM by lrob
So, asymmetric routing is when outgoing traffic doesn't take the same route as incoming traffic.


It affects TCP but not UDP or ICMP. That is because unlike TCP, UDP and ICMP are stateless protocols.
That exactly fits my issue.
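
Here is a toy model of why asymmetric routing breaks TCP when a stateful filter sits on each path. It is only an illustration of the principle, with made-up documentation addresses; whether anything like this is actually happening inside OVH's network is not established.

```python
class StatefulFilter:
    """Toy stateful filter: passes replies only for connections it saw start."""

    def __init__(self):
        self.states = set()

    def outbound_syn(self, src, sport, dst, dport):
        # Seeing the SYN creates a state entry for this connection.
        self.states.add((src, sport, dst, dport))

    def inbound_reply_allowed(self, src, sport, dst, dport):
        # A reply matches a state created by the mirrored outbound flow.
        return (dst, dport, src, sport) in self.states

path_a = StatefulFilter()  # device on the outgoing path, sees the SYN
path_b = StatefulFilter()  # device on a different return path, never saw it

path_a.outbound_syn("192.0.2.1", 40000, "198.51.100.7", 80)

print(path_a.inbound_reply_allowed("198.51.100.7", 80, "192.0.2.1", 40000))  # True
print(path_b.inbound_reply_allowed("198.51.100.7", 80, "192.0.2.1", 40000))  # False
```

If replies only sometimes come back through the device that saw the SYN, some connections would succeed and others would silently time out, which is the kind of partial failure described in this thread.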

I have only one gateway configured (set by the PPPoE connection), so it seems impossible to me for such an issue to happen on my router. Which means it would happen after the WAN.
Googling the issue didn't allow me to find anything related to asymmetric routing when using one-to-one NAT.

I found a post explaining this more easily than Netgate's doc:
https://www.auvik.com/franklyit/blog/asymmetric-routing-issue/

I've asked OVH if it would be possible that the /29 was facing asymmetric routing.
Now waiting for the answer...