OPNsense Forum

Archive => 20.7 Legacy Series => Topic started by: teknoadmin on August 13, 2020, 05:29:09 pm

Title: IPSec / OpenVPN PBR
Post by: teknoadmin on August 13, 2020, 05:29:09 pm
Hello everyone,

First of all, I am sorry to bother you all with this long post and these questions, but I have spent many, many nights trying to work this out.

SITE A, dedicated server in cloud
Virtualized OPNsense 20.7 with 2 vNICs.
/29 public subnet, configured as Virtual IPs.
LAN 172.20.10.0/24
Only relevant host is 172.20.10.10/32

SITE B, main office
DEC2650, 8 NICs
3 WAN with 3 public IPs
LAN 172.16.10.0/24
LAN 172.16.9.0/24
LAN 172.16.8.0/24

SITE C, branch office
DEC2610, 3 NICs
2 WAN with 2 public IPs
LAN 172.18.10.0/24

SITE D, branch office
DEC2610, 3 NICs
1 WAN with 1 public IP
LAN 172.19.10.0/24

My goal: a star network topology, with a central user VPN and the following requirements:

(B) 172.16.10.0/24 -> (A) 172.20.10.10/32 via WAN1 GW
(C) 172.18.10.0/24 -> (A) -> (B) 172.16.10.0/24 via WAN2 GW
(D) 172.19.10.0/24 -> (A) -> (B) 172.16.10.0/24 via WAN2 GW

I think this can only be accomplished via policy-based routing (PBR).
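To make the requirement concrete, here is a minimal Python sketch of what "policy-based" means for the three flows above: the gateway is chosen from the (source, destination) pair rather than from the destination alone. The table is copied from the goal lines; the function name `select_gateway` and the gateway labels are only illustrative, not OPNsense configuration.

```python
import ipaddress

# Policy table taken from the three goal lines above.
# The gateway labels are the names used in the post, not real config keys.
POLICIES = [
    ("172.16.10.0/24", "172.20.10.10/32", "WAN1 GW"),
    ("172.18.10.0/24", "172.16.10.0/24", "WAN2 GW"),
    ("172.19.10.0/24", "172.16.10.0/24", "WAN2 GW"),
]


def select_gateway(src, dst):
    """Return the gateway label of the first policy matching src and dst.

    Returns None when no policy matches, meaning the packet would fall
    through to the normal (destination-only) routing table.
    """
    s = ipaddress.ip_address(src)
    d = ipaddress.ip_address(dst)
    for src_net, dst_net, gateway in POLICIES:
        if s in ipaddress.ip_network(src_net) and d in ipaddress.ip_network(dst_net):
            return gateway
    return None


if __name__ == "__main__":
    print(select_gateway("172.16.10.5", "172.20.10.10"))
    print(select_gateway("172.18.10.7", "172.16.10.20"))
```

A plain routing table cannot express this, because it only looks at the destination address.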

So, my first attempt was IPsec.

Site A, 3 IPsec VTIs, public IPs.
Site B, 3 IPsec VTIs, one per WAN.

After playing with outbound NAT a bit on site A, all Phase 1 connections came up.
After locking myself out by leaving "Install policy" ticked on a VTI 0.0.0.0/0 tunnel, I set up the gateways on A and B.

Many hours and attempts later, I realized that IPsec VTI on FreeBSD doesn't support pf reply-to, and apparently there is no way to route traffic to the same subnet through different gateways without losing the source address to NAT.

Also, my experience wasn't smooth at all. Some days I found tunnels stopped that had not reconnected automatically. Sometimes a manual tunnel restart didn't work at all, with one side hanging unresponsive while traffic was still flowing (seen with tcpdump) from one site to the other. Starting one tunnel brought down the others (also with the ipsec up con* command), and the only way to bring the others back up was to restart the IPsec service, and with it all tunnels.

Result: failure.  :-[

And my second attempt: OpenVPN

All OpenVPN servers are on SITE A; SITES B, C and D are clients.

After many hours of sweat and attempts, I realized that you can't assign a WAN interface directly to a client (to use its GW) without setting a static system route, because Pfsense itself can't use PBR.
Initially I solved the problem by using SITE B as the OpenVPN server, but for the sake of elegance, and because I am too stubborn to be satisfied, I assigned 3 different public IPs on SITE A to 3 OpenVPN servers.
The tunnels are now stable, they reconnect after a network failure, and everything works as expected.

To summarize (the OpenVPN tunnels are in the 10.20.X.X range):

1 - (A) 172.20.10.10/32 -> 10.20.50.1 -> (B) 10.20.50.2 -> 172.16.10.0/24 (WAN1)
2 - (A) 172.20.10.10/32 -> 10.20.51.1 -> (B) 10.20.51.2 -> 172.16.9.0/24 (WAN2)
3 - (A) 172.20.10.10/32 -> 10.20.53.1 -> (C) 10.20.53.2 -> 172.18.10.0/24
4 - (A) 172.20.10.10/32 -> 10.20.54.1 -> (D) 10.20.54.2 -> 172.19.10.0/24 

5 - (C) 172.18.10.0/24 -> 10.20.53.2 -> (A) 10.20.53.1 -> 10.20.52.1 -> (B) 10.20.52.2 -> 172.16.8.0/24
6 - (D) 172.19.10.0/24 -> 10.20.54.2 -> (A) 10.20.54.1 -> 10.20.52.1 -> (B) 10.20.52.2 -> 172.16.8.0/24

7 - (C) 172.18.10.0/24 -> 10.20.53.2 -> (A) 10.20.53.1 -> 10.20.51.1 -> (B) 10.20.51.2 -> 172.16.10.0/24
8 - (D) 172.19.10.0/24 -> 10.20.54.2 -> (A) 10.20.54.1 -> 10.20.51.1 -> (B) 10.20.51.2 -> 172.16.10.0/24

9 - (C) 172.18.10.0/24 -> 10.20.53.2 -> (A) 10.20.53.1 -> 10.20.51.1 -> (B) 10.20.51.2 -> 172.16.9.0/24
10 - (D) 172.19.10.0/24 -> 10.20.54.2 -> (A) 10.20.54.1 -> 10.20.51.1 -> (B) 10.20.51.2 -> 172.16.9.0/24
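As a cross-check of the list above, the following Python sketch groups the ten routes by the A-side tunnel address of their final hop pair. The hop lists are copied from the summary; the helper name `group_by_last_tunnel` is hypothetical. It is only a way to see which routes share a tunnel, not a diagnosis.

```python
from collections import defaultdict

# Routes copied from the summary: number -> [source, hops..., destination].
ROUTES = {
    1: ["172.20.10.10/32", "10.20.50.1", "10.20.50.2", "172.16.10.0/24"],
    2: ["172.20.10.10/32", "10.20.51.1", "10.20.51.2", "172.16.9.0/24"],
    3: ["172.20.10.10/32", "10.20.53.1", "10.20.53.2", "172.18.10.0/24"],
    4: ["172.20.10.10/32", "10.20.54.1", "10.20.54.2", "172.19.10.0/24"],
    5: ["172.18.10.0/24", "10.20.53.2", "10.20.53.1", "10.20.52.1", "10.20.52.2", "172.16.8.0/24"],
    6: ["172.19.10.0/24", "10.20.54.2", "10.20.54.1", "10.20.52.1", "10.20.52.2", "172.16.8.0/24"],
    7: ["172.18.10.0/24", "10.20.53.2", "10.20.53.1", "10.20.51.1", "10.20.51.2", "172.16.10.0/24"],
    8: ["172.19.10.0/24", "10.20.54.2", "10.20.54.1", "10.20.51.1", "10.20.51.2", "172.16.10.0/24"],
    9: ["172.18.10.0/24", "10.20.53.2", "10.20.53.1", "10.20.51.1", "10.20.51.2", "172.16.9.0/24"],
    10: ["172.19.10.0/24", "10.20.54.2", "10.20.54.1", "10.20.51.1", "10.20.51.2", "172.16.9.0/24"],
}


def group_by_last_tunnel(routes):
    """Map the A-side address of each route's last tunnel to the route numbers.

    In every route the last element is the destination LAN and the one
    before it is the remote tunnel address, so hops[-3] is the A-side
    address of the tunnel that delivers the traffic.
    """
    groups = defaultdict(list)
    for number in sorted(routes):
        groups[routes[number][-3]].append(number)
    return dict(groups)


if __name__ == "__main__":
    for tunnel, numbers in group_by_last_tunnel(ROUTES).items():
        print(tunnel, numbers)
```

Grouped this way, routes 5 and 6 have the 10.20.52.x tunnel to themselves, while routes 7 to 10 share the 10.20.51.x tunnel with route 2. Whether that matters for the problem described below is an open question.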

Everything is accomplished with PBR.
Everything is working fine EXCEPT FOR ONE THING THAT DRIVES ME CRAZY:

The last 6 routes do not work as expected.
Analyzing the traffic with tcpdump, I saw that traffic from C and D correctly reaches B through A, and that B correctly sends the replies back to A.
However, A fails to route the replies back to the respective OpenVPN GW and sends them out of the default WAN instead.

The incredible thing is that if I set up a static route to the respective GW on A, 5 and 6 start to work, but 7, 8, 9 and 10, hell no, they still end up on the WAN.
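For reference, this is a simplified Python model of a destination-only routing lookup (longest-prefix match), which is what A falls back to when no policy or firewall state steers the reply. It assumes a table with only a default route and the local LAN; the label "default WAN" and the function name `lookup` are illustrative, and the model ignores pf states and reply-to entirely. It shows why adding a static route changes where replies to C go, but it does not explain why routes 7 to 10 still fail.

```python
import ipaddress


def lookup(table, dst):
    """Return the next hop of the most specific route containing dst."""
    addr = ipaddress.ip_address(dst)
    matches = [(net, hop) for net, hop in table if addr in net]
    if not matches:
        return None
    # Longest-prefix match: the route with the largest prefix length wins.
    return max(matches, key=lambda m: m[0].prefixlen)[1]


# Assumed table on A before any static route is added.
BASE_TABLE = [
    (ipaddress.ip_network("0.0.0.0/0"), "default WAN"),
    (ipaddress.ip_network("172.20.10.0/24"), "LAN"),
]

# Same table plus a static route for site C via its tunnel address.
WITH_STATIC = BASE_TABLE + [
    (ipaddress.ip_network("172.18.10.0/24"), "10.20.53.2"),
]

if __name__ == "__main__":
    print(lookup(BASE_TABLE, "172.18.10.5"))
    print(lookup(WITH_STATIC, "172.18.10.5"))
```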

I kindly ask if someone can help me sort out this problem, because I know that I am doing something wrong, but I am honestly too exhausted to keep searching and trying again and again.

Thank you for any help you can give!

Regards  :)