OPNsense Forum

Archive => 20.7 Legacy Series => Topic started by: Ricardo on November 18, 2020, 01:22:58 pm

Title: Ipsec Site-to-Site VPN goes down regularly
Post by: Ricardo on November 18, 2020, 01:22:58 pm
Hi folks,

I have two OPNsense routers, RouterA at SiteA and RouterB at SiteB. Both have dynamic WAN IPs (both WANs are PPPoE), so I used two Dynamic DNS FQDNs as the tunnel endpoints (instead of the temporary WAN IP addresses). I based the config on this guide:
https://docs.opnsense.org/manual/how-tos/ipsec-s2s.html

I followed every step in the guide, and the tunnel comes up. But after a couple of days the tunnel usually breaks and does not come back up. I have to restart the IPsec service on RouterA (I don't have access to RouterB, as it is at a remote site with no qualified staff), and sometimes that restores the tunnel. Sometimes I also have to restart Unbound, as the problem seems to be with the DDNS FQDN<-->WAN IP mapping (as explained above, the WAN IP is dynamic: the ISP changes it after every reconnect, or after 2 weeks of WAN uptime).

The guide did not describe the additional parameters, but I have enabled the following Tunnel parameter:

Dynamic gateway    Allow any remote gateway to connect
Recommended for dynamic IP addresses that can be resolved by DynDNS at IPsec startup or update time.

--> To be honest, I don't understand whether this setting is really needed, or whether it just reduces security by allowing literally ANYBODY to connect to this tunnel. The help text does not make clear whether it is mandatory for dynamic local/remote tunnel endpoints.

The reference guide only gives a short description of this scenario:

Site to site VPNs connect two locations with static public IP addresses and allow traffic to be routed between the two networks. This is most commonly used to connect an organization’s branch offices back to its main office, so branch users can access network resources in the main office.

I understand and accept that while the WAN IP is changing there will be a tunnel outage of about one DNS TTL, but can this setup auto-recover from such an endpoint update, or is that completely impossible?
I see entries like these in the log on RouterA (newest first):

2020-11-18T13:03:03   charon[95260]   12[IKE] <con1|3> received AUTHENTICATION_FAILED notify error
2020-11-18T13:03:03   charon[95260]   12[ENC] <con1|3> parsed IKE_AUTH response 1 [ N(AUTH_FAILED) ]
2020-11-18T13:03:03   charon[95260]   12[NET] <con1|3> received packet: from [ROUTER-B-WAN_IP][4500] to [ROUTER-A-WAN_IP][4500] (80 bytes)
2020-11-18T13:03:03   charon[95260]   12[NET] <con1|3> sending packet: from [ROUTER-A-WAN_IP][4500] to [ROUTER-B-WAN_IP][4500] (320 bytes)
2020-11-18T13:03:03   charon[95260]   12[ENC] <con1|3> generating IKE_AUTH request 1 [ IDi N(INIT_CONTACT) IDr AUTH N(ESP_TFC_PAD_N) SA TSi TSr N(MOBIKE_SUP) N(ADD_4_ADDR) N(MULT_AUTH) N(EAP_ONLY) N(MSG_ID_SYN_SUP) ]
2020-11-18T13:03:03   charon[95260]   12[IKE] <con1|3> establishing CHILD_SA con1{4} reqid 1
2020-11-18T13:03:03   charon[95260]   12[IKE] <con1|3> authentication of '[ROUTER-A-WAN_IP]' (myself) with pre-shared key
2020-11-18T13:03:03   charon[95260]   12[CFG] <con1|3> selected proposal: IKE:AES_CBC_128/HMAC_SHA2_256_128/PRF_HMAC_SHA2_256/MODP_2048
2020-11-18T13:03:03   charon[95260]   12[ENC] <con1|3> parsed IKE_SA_INIT response 0 [ SA KE No N(NATD_S_IP) N(NATD_D_IP) N(FRAG_SUP) N(HASH_ALG) N(CHDLESS_SUP) N(MULT_AUTH) ]
2020-11-18T13:03:03   charon[95260]   12[NET] <con1|3> received packet: from [ROUTER-B-WAN_IP][500] to [ROUTER-A-WAN_IP][500] (472 bytes)
2020-11-18T13:03:02   charon[95260]   12[NET] <con1|3> sending packet: from [ROUTER-A-WAN_IP][500] to [ROUTER-B-WAN_IP][500] (464 bytes)
2020-11-18T13:03:02   charon[95260]   12[ENC] <con1|3> generating IKE_SA_INIT request 0 [ SA KE No N(NATD_S_IP) N(NATD_D_IP) N(FRAG_SUP) N(HASH_ALG) N(REDIR_SUP) ]
2020-11-18T13:03:02   charon[95260]   12[IKE] <con1|3> initiating IKE_SA con1[3] to [ROUTER-B-WAN_IP]
2020-11-18T13:03:02   charon[95260]   14[KNL] creating acquire job for policy [ROUTER-A-WAN_IP]/32 === [ROUTER-B-WAN_IP]/32 with reqid {1}
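
Read from the bottom up, the log shows RouterA initiating, the IKE_SA_INIT exchange completing, and the IKE_AUTH request being answered with AUTH_FAILED, so the rejection is sent by the remote side. A minimal Python sketch for picking such rejections out of a longer log follows; it assumes the line layout shown above (timestamp, `charon[pid]`, thread and subsystem, optional `<conN|id>` tag), and the function name `auth_failures` is made up for illustration.

```python
import re

# Matches lines shaped like the charon entries quoted above.
LINE = re.compile(
    r"^(?P<time>\S+)\s+charon\[\d+\]\s+\d+\[(?P<subsys>[A-Z]+)\]\s+"
    r"(?:<(?P<conn>[^|>]+)\|\d+>\s+)?(?P<msg>.*)$"
)


def auth_failures(lines):
    """Return (time, connection) for each AUTHENTICATION_FAILED notify received."""
    hits = []
    for line in lines:
        m = LINE.match(line.strip())
        if m and "received AUTHENTICATION_FAILED notify" in m.group("msg"):
            hits.append((m.group("time"), m.group("conn")))
    return hits
```

Feeding it the lines of a saved log file, for example `auth_failures(open(path))` with `path` pointing at wherever the log was exported, lists when each connection was rejected by its peer.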

If I try to force the tunnel to establish under IPsec \ Status Overview, I get the same results as in the log. After 1-2 days the issue resolves by itself. But it's difficult to troubleshoot the remote tunnel endpoint while I cannot reach it, so it would be really great if somebody could point out the basic mistake in my config.
Title: Re: Ipsec Site-to-Site VPN goes down regularly
Post by: c-mu on November 18, 2020, 01:42:20 pm
This is not a solution, but maybe it could help:

I had similar problems with a remote site which is under my control. I helped myself by creating a firewall rule that allows the one public IP of my headquarters to access the admin GUI directly via HTTPS on the public IP of the remote site. With that workaround I was able to locate a problem at the remote site even when the VPN tunnel could not be established.
Title: Re: Ipsec Site-to-Site VPN goes down regularly
Post by: Ricardo on November 22, 2020, 07:31:54 pm
Nobody uses dynamic IP with site2site IPSEC VPN?
Title: Re: Ipsec Site-to-Site VPN goes down regularly
Post by: Patrick M. Hausen on November 22, 2020, 08:20:00 pm
Nobody uses dynamic IP with site2site IPSEC VPN?
Nope. At least I don't support any such configuration and would strongly argue to get fixed IP addresses to any customer. And I have built quite a number of IPsec based VPNs in my life.

Doesn't help - sorry.
Patrick
Title: Re: Ipsec Site-to-Site VPN goes down regularly
Post by: chemlud on November 22, 2020, 08:53:12 pm
Nobody uses dynamic IP with site2site IPSEC VPN?

Tried it for some years, went on to openVPN, much more stable, less trouble.
Title: Re: Ipsec Site-to-Site VPN goes down regularly
Post by: Ricardo on November 23, 2020, 11:09:52 am
Nobody uses dynamic IP with site2site IPSEC VPN?
Nope. At least I don't support any such configuration and would strongly argue to get fixed IP addresses to any customer. And I have built quite a number of IPsec based VPNs in my life.

Doesn't help - sorry.
Patrick

I understand: IF this were a money-generating business and I were called at midnight on weekends to fix such an unreliable setup, I would immediately reject it as well. But this is strictly a non-money-generating setup, and I am in no position to demand a fixed IP from either of the 2 ISPs. So my question remains: is this a troublesome architecture (some proof would significantly help me trust the answer, instead of just a one-word yes/no), and does OPNsense have no ready-made built-in logic to handle it with confidence? Or do I need to work around the situation with custom scripts and service restarts?
Title: Re: Ipsec Site-to-Site VPN goes down regularly
Post by: Gauss23 on November 23, 2020, 12:19:17 pm
Or I need to workaround the situation with custom scripts and service restarts?

I would go for this one. As far as I can see the IP for a FQDN is only resolved once during startup.
So you need a script which checks if a tunnel is running and if not it should restart it.

Maybe this thread is a help:
https://forum.opnsense.org/index.php?topic=13543.0
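
A minimal sketch of such a check script, in Python (assuming Python 3 is available on the firewall). The `swanctl` path and the `-t` / `-i --child` flags are the ones from the Monit how-to quoted further down this thread; the function names are hypothetical, and the peer FQDN and connection name (for example `con1`) have to be supplied by the caller. It re-resolves the peer's DDNS name and bounces only that one connection when the tunnel is down or the resolved address has changed. Whether re-initiating alone makes strongSwan pick up the new address is not confirmed in this thread.

```python
import socket
import subprocess

SWANCTL = "/usr/local/sbin/swanctl"  # path as used in the Monit how-to in this thread


def resolve(fqdn):
    """Return the current IPv4 address for fqdn, or None if the lookup fails."""
    try:
        return socket.gethostbyname(fqdn)
    except OSError:
        return None


def needs_restart(current_ip, last_ip, tunnel_up):
    """Decide whether the connection should be re-initiated."""
    if current_ip is None:
        return False  # DNS is failing; a restart could not reach the peer anyway
    if not tunnel_up:
        return True
    return last_ip is not None and current_ip != last_ip


def restart_commands(child):
    """Commands that bounce one connection instead of the whole IPsec service."""
    return [[SWANCTL, "-t", "--child", child], [SWANCTL, "-i", "--child", child]]


def check_once(fqdn, child, last_ip, tunnel_up):
    """Run one watchdog pass; returns the IP to remember for the next pass."""
    current_ip = resolve(fqdn)
    if needs_restart(current_ip, last_ip, tunnel_up):
        for cmd in restart_commands(child):
            subprocess.run(cmd, check=False)
    return current_ip or last_ip
```

How `tunnel_up` is determined is left to the caller; pinging a host behind the remote gateway is one option. The script would be run periodically, e.g. from cron, keeping the returned IP between runs.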
Title: Re: Ipsec Site-to-Site VPN goes down regularly
Post by: chemlud on November 23, 2020, 01:29:30 pm
I had IPsec tunnels to locations with IPs changing every night and I didn't have to restart IPsec every morning. But from time to time the tunnel didn't come up after connection lost (not necessarily only a change in IP).

With wireguard you have the situation that dynDNS name is only resolved once, but not if handshake fails.
Title: Re: Ipsec Site-to-Site VPN goes down regularly
Post by: Gauss23 on November 23, 2020, 01:35:33 pm
I had IPsec tunnels to locations with IPs changing every night and I didn't have to restart IPsec every morning. But from time to time the tunnel didn't come up after connection lost (not necessarily only a change in IP).

With wireguard you have the situation that dynDNS name is only resolved once, but not if handshake fails.

Both endpoints were dynamic?
If at least one endpoint is static, you set that one to respond only, and the dynamic side tries to reconnect as soon as it comes back online. I had problems with both sides being dynamic.
Title: Re: Ipsec Site-to-Site VPN goes down regularly
Post by: chemlud on November 23, 2020, 02:54:26 pm
Yepp, both sides dynamic, but not changing IP at the same time, normally.

It's years ago, I switched to openVPN, site-to-site. Very stable with dynDNS. But Wireguard might be more secure?
Title: Re: Ipsec Site-to-Site VPN goes down regularly
Post by: Gauss23 on November 23, 2020, 04:36:33 pm
Yepp, both sides dynamic, but not changing IP at the same time, normally.

It's years ago, I switched to openVPN, site-to-site. Very stable with dynDNS. But Wireguard might be more secure?

If choosing the right ciphers and algorithms OpenVPN and IPsec are very secure. WireGuard is not yet production ready so I would be careful to call it more secure.
Title: Re: Ipsec Site-to-Site VPN goes down regularly
Post by: Ricardo on November 23, 2020, 08:22:44 pm
Or I need to workaround the situation with custom scripts and service restarts?

I would go for this one. As far as I can see the IP for a FQDN is only resolved once during startup.
So you need a script which checks if a tunnel is running and if not it should restart it.

Maybe this thread is a help:
https://forum.opnsense.org/index.php?topic=13543.0

Much appreciated, I will see the next time if it helps!
Title: Re: Ipsec Site-to-Site VPN goes down regularly
Post by: chemlud on November 24, 2020, 12:46:36 pm
Yepp, both sides dynamic, but not changing IP at the same time, normally.

It's years ago, I switched to openVPN, site-to-site. Very stable with dynDNS. But Wireguard might be more secure?

If choosing the right ciphers and algorithms OpenVPN and IPsec are very secure. WireGuard is not yet production ready so I would be careful to call it more secure.

I saw my openVPN traffic in the past frequently routed via the UK, although that doesn't seem at all the "right" or "shortest" way. Wireguard is newer and a) broken by design or b) not yet on the standard list to-be-broken...
Title: Re: Ipsec Site-to-Site VPN goes down regularly
Post by: Ricardo on November 30, 2020, 12:44:41 pm
In the meantime I found a less destructive strongSwan restart method: restart only the specific VPN connection, not the entire strongSwan service, which would temporarily kill all the other, unaffected VPN tunnels at the same time:

https://forum.opnsense.org/index.php?topic=17350.0
-------------------------------------------------------------------------------------------------------------------

Hi, I've done this by setting up Monit service.

Quick howto:

1. Settings / Monit / Setting / Service Test Settings -> New entry +

Name: It's up to you
Condition:
failed ping4 count 1 address your_opnsense_internal_ip

(this will send 1 ping = 3 retries to the remote IPsec host)
Action: Restart

2. Settings / Monit / Setting / Service Settings -> New entry +

Check Enable

Name: Some name
Type: Remote host
Address: remote_gateway_ip (or the IP of some host inside the remote network that responds to pings)
Start:
/usr/local/sbin/swanctl -i --child conN

(where N is your connection position on the list in VPN/IPSEC/Status Overview, ie con1)
Stop:
/usr/local/sbin/swanctl -t --child conN

(where N is your connection position on the list in VPN/IPSEC/Status Overview, ie con1)
Tests: Select your test name from step 1.
Depends: Nothing depends

General Settings:
Enable service,
I set up polling interval to 60s

This setup sends 3 ping retries to the remote IPsec host every minute. If all 3 pings time out, the Monit service stops/starts this single connection, and repeats that every minute :)

If the connection is up and at least 1 ping succeeds, nothing happens.
If the connection is down and all pings fail, it is restarted.

Good luck
----------------------------------------------------------------------------------------------------
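
The decision logic described in the how-to above can be sketched in Python as follows. This is only an illustration of the described behaviour, not Monit's implementation: the function names are hypothetical, the 3 retries mirror the how-to's description, and the `ping` and `run` parameters exist only so the logic can be exercised without touching the network or the tunnel.

```python
import subprocess


def ping_once(host):
    """Send a single ICMP echo request; True if the host answered."""
    result = subprocess.run(
        ["ping", "-c", "1", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def host_reachable(host, retries=3, ping=ping_once):
    """True as soon as one of `retries` pings succeeds."""
    return any(ping(host) for _ in range(retries))


def monitor_pass(host, child, ping=ping_once, run=subprocess.run):
    """One polling cycle: restart the connection only if every ping failed."""
    if host_reachable(host, ping=ping):
        return "ok"
    # Same stop/start pair as in the how-to: terminate, then re-initiate.
    run(["/usr/local/sbin/swanctl", "-t", "--child", child])
    run(["/usr/local/sbin/swanctl", "-i", "--child", child])
    return "restarted"
```

Calling `monitor_pass(remote_host, "con1")` once per minute, with `remote_host` being an address behind the remote gateway, corresponds to the 60s polling interval used above.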

I have not tried it yet (if I screw up, I have to travel to the remote site to fix the mess).