OPNsense Forum

Archive => 16.7 Legacy Series => Topic started by: mickbee on December 28, 2016, 03:09:30 pm

Title: IPSEC issues latest stable
Post by: mickbee on December 28, 2016, 03:09:30 pm
Hi guys,

Thanks for all the great work you're doing, OPNsense is awesome! I say that after using pfSense for many, many years on all sorts of platforms.

To the point: I migrated some pfSense boxes to OPNsense the other day whilst retaining my IPsec mesh config (around 9 boxes doing network-to-network tunnels as required). Most phase 1 settings are as follows:

IKEv2, default connection method, IPv4, via the WAN interface (or a virtual IP on the WAN interface), main mode, mutual PSK, IP addresses as identifiers, AES128/SHA1, DH group 2, default lifetime, no DPD or NAT-T;

Phase 2 is a LAN-to-LAN IPv4 tunnel: ESP, Blowfish128/MD5, PFS group 2, default lifetime, and the ping host set to the remote gateway's internal IP.

All of this worked fine on pfSense and was super stable. Now i'm getting tunnels dropping at random every few minutes or hours, staying offline for a few minutes and then coming back; some tunnels never come up anymore (always the same ones), but examining the configs on both ends of a given link (dumping the XML and checking what's in it) shows no inconsistencies.
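
For anyone wanting to repeat that comparison mechanically rather than by eye, here is a minimal Python sketch. The XML fragments are made-up, trimmed-down examples, and the element names (`phase1`, `remote-gateway`, `iketype`, `encryption-algorithm`, `hash-algorithm`, `dhgroup`) are assumptions about how the exported config is laid out; check them against a real export before relying on this.

```python
import xml.etree.ElementTree as ET

# Hypothetical phase 1 fragments for the two ends of one tunnel.
# Element names and values are illustrative assumptions, not a real export.
SITE_A = """
<ipsec>
  <phase1>
    <iketype>ikev2</iketype>
    <remote-gateway>203.0.113.20</remote-gateway>
    <encryption-algorithm><name>aes</name><keylen>128</keylen></encryption-algorithm>
    <hash-algorithm>sha1</hash-algorithm>
    <dhgroup>2</dhgroup>
  </phase1>
</ipsec>
"""

SITE_B = """
<ipsec>
  <phase1>
    <iketype>ikev2</iketype>
    <remote-gateway>203.0.113.10</remote-gateway>
    <encryption-algorithm><name>aes</name><keylen>128</keylen></encryption-algorithm>
    <hash-algorithm>sha1</hash-algorithm>
    <dhgroup>5</dhgroup>
  </phase1>
</ipsec>
"""

# Settings that are expected to be identical on both ends of a tunnel.
MUST_MATCH = (
    "iketype",
    "encryption-algorithm/name",
    "encryption-algorithm/keylen",
    "hash-algorithm",
    "dhgroup",
)


def phase1_settings(config_xml, remote_gateway):
    """Return the MUST_MATCH settings of the phase 1 entry for one peer."""
    root = ET.fromstring(config_xml)
    for phase1 in root.iter("phase1"):
        if phase1.findtext("remote-gateway") == remote_gateway:
            return {path: phase1.findtext(path) for path in MUST_MATCH}
    raise LookupError(f"no phase1 entry for {remote_gateway}")


def mismatches(side_a, side_b):
    """Map each differing setting to its (side A, side B) values."""
    return {k: (side_a[k], side_b[k]) for k in MUST_MATCH if side_a[k] != side_b[k]}


if __name__ == "__main__":
    a = phase1_settings(SITE_A, "203.0.113.20")
    b = phase1_settings(SITE_B, "203.0.113.10")
    print(mismatches(a, b))
```

In this invented example only the DH group differs. Identifiers and the pre-shared key are deliberately left out, since identifiers are mirrored rather than equal on the two ends.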

Is there a known bug? I tried looking and found nothing that seems to match, hence my question. Or is there any reason why the above settings would yield poor results on OPNsense?

thanks and happy new year everyone!
Title: Re: IPSEC issues latest stable
Post by: fabian on December 28, 2016, 04:23:04 pm
Nobody can help you without the log messages. The logs should contain a hint as to why the connection breaks. By the way: using dead peer detection will help to detect a broken connection, and it will try to bring the connection up again.
Title: Re: IPSEC issues latest stable
Post by: mickbee on December 31, 2016, 12:40:22 am
thanks for having a read fabian!

logs, obviously; i set the log level to Control for most items on the list; i get a lot of the following messages:
charon: 11[KNL] unable to query SAD entry with SPI XXXXXXXX: No such file or directory (2)
charon: 11[JOB] deleting half open IKE_SA after timeout

one other 17.1b box (thought i'd try to see if the upgrade changes anything but it didn't) also reports:
charon: 13[MGR] checkin and destroy IKE_SA (unnamed)[19]
charon: 13[IKE] IKE_SA (unnamed)[19] state change: CONNECTING => DESTROYING
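
To see which charon subsystems dominate a long log, the lines can be tallied by their bracketed tag. The sketch below is only an illustration: it assumes every line follows the `charon: <thread>[<TAG>] <message>` shape seen in the excerpts above, and the sample text is just those four lines.

```python
import re
from collections import Counter

# Matches lines such as "charon: 11[KNL] unable to query SAD entry ...".
LOG_LINE = re.compile(
    r"charon: (?P<thread>\d+)\[(?P<subsystem>[A-Z]{3})\] (?P<message>.*)"
)

SAMPLE_LOG = """\
charon: 11[KNL] unable to query SAD entry with SPI XXXXXXXX: No such file or directory (2)
charon: 11[JOB] deleting half open IKE_SA after timeout
charon: 13[MGR] checkin and destroy IKE_SA (unnamed)[19]
charon: 13[IKE] IKE_SA (unnamed)[19] state change: CONNECTING => DESTROYING
"""


def count_by_subsystem(log_text):
    """Count log lines per charon subsystem tag (KNL, JOB, IKE, ...)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LOG_LINE.search(line)
        if match:
            counts[match["subsystem"]] += 1
    return counts


def half_open_timeouts(log_text):
    """Return the lines that report a half-open IKE_SA being deleted."""
    return [
        line
        for line in log_text.splitlines()
        if "deleting half open IKE_SA" in line
    ]


if __name__ == "__main__":
    print(count_by_subsystem(SAMPLE_LOG))
    print(len(half_open_timeouts(SAMPLE_LOG)))
```

Feeding it the full log instead of `SAMPLE_LOG` would show whether the half-open timeouts cluster on particular peers or times.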

i understand that without context that's still perhaps not detailed enough; the other messages are the usual ones i'd expect strongswan to generate;

as for DPD, i never had a good experience with it, and for the past years it only made my ipsec tunnels unstable when using pfsense; i gave it a try following your suggestion and was about to say that it's much better now for the tunnels that do come up, but i need more testing to be sure; still, the other ones just won't ever come up and it's not a config mismatch issue... happy to hear your thoughts
Title: Re: IPSEC issues latest stable
Post by: fabian on December 31, 2016, 11:20:55 am
I would suggest that you check the phase 1 entries, as you are having issues building the IKE SA. If the IKE SA fails, no handshake or key exchange for phase 2 (AH or ESP) is possible and the connection breaks. If it still does not work and all settings match exactly, can you try IKEv2?

Regards

Fabian
Title: Re: IPSEC issues latest stable
Post by: mickbee on December 31, 2016, 11:27:42 am
it's actually set for IKEv2 for all tunnels except the mobile one (which does work in the 2 locations where deployed).

All settings are equal; own and peer identifiers are set to the IP addresses that the devices in question actually have, and those aren't behind any NAT (globally routable IPs are assigned to the interfaces which the tunnels should bind to).

I have a total of 9 devices in a few countries. After migrating to OPNsense, seemingly at random, some tunnels came up the second i clicked save/apply and the others never did. All entries were created by hand, so i rule out pfsense config xml parsing issues. Random! :) I hate random when computer systems should be deterministic ;)
Title: Re: IPSEC issues on the latest stable build
Post by: mickbee on December 31, 2016, 01:35:32 pm
so, i finally have a bit more time to dig around and noticed something odd;

first, though, the setup explained; the topology looks as follows:

ISP fibre -> media converter -> WAN on an APU1 board running OPNsense -> two local subnets each with dedicated ETH interface;

now, there's an ESXi box sitting on the local network attached to ETH2, which has a few virtual networks for separation behind yet another OPNsense instance, this time virtual.

so the VM has its WAN on the same switch as the ISP and APU WAN links - hope this isn't confusing

finally the ISP assigns (by DHCP) an IP which the APU WAN port receives; that's a X.Y.A.39;
additionally, the ISP routes an entire X.Y.B.88/28 network towards the same IP

so as before with PFsense, I assign the .89 and .90 IPs as Virtual IP aliases on the same APU WAN port; i use those for IPSEC and traffic whereas the actual DHCP given IP is for pure management

now, the VM instance has a static IP of X.Y.B.91 on its WAN link (again, on the same broadcast domain) and .92 .93 and .94 assigned as virtual IPs on the same WAN port. those are used for port forwarding (web and email hosting) and IPSEC tunnels;
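
A side note on the addressing arithmetic, for readers following along: the sketch below uses Python's `ipaddress` module with the documentation prefix `198.51.100.` standing in for the elided `X.Y.B.` (that substitution is an assumption made purely so the code can run). It checks whether `.88` is a valid network boundary for a given prefix length and lists the usable host range. It says nothing about what the ISP actually routes.

```python
import ipaddress

# Documentation prefix standing in for the elided "X.Y.B." (an assumption).
BASE = "198.51.100."


def is_aligned(last_octet, prefix_len):
    """True if BASE+last_octet is a valid network address for this prefix."""
    try:
        ipaddress.ip_network(f"{BASE}{last_octet}/{prefix_len}", strict=True)
        return True
    except ValueError:
        return False


def describe(last_octet, prefix_len):
    """Return (network, first usable host, last usable host) as strings."""
    net = ipaddress.ip_network(f"{BASE}{last_octet}/{prefix_len}", strict=False)
    hosts = list(net.hosts())
    return str(net), str(hosts[0]), str(hosts[-1])


if __name__ == "__main__":
    for prefix_len in (28, 29):
        print(prefix_len, is_aligned(88, prefix_len), describe(88, prefix_len))
```

With these inputs, `.88` is not on a /28 boundary (the enclosing /28 starts at `.80`), whereas a /29 starting at `.88` has usable hosts `.89` to `.94`, which happens to be exactly the set of addresses used in this setup. Either way, .89 to .94 fall inside the block in both readings.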

obviously, since the APU needs to act as the gateway for the /28 network, i have proper firewall rules in place; there is no NAT of any sort and I see the traffic from the outside (http or email) destined for .91 to .94 passing through it on the WAN interface; that works

Likewise, the VM has a default route via .89 and is able to go out to the internet just fine for upgrades, time sync, etc.

Both OPNsense hosts have the other one's MAC addresses in their arp tables just fine; ping works both ways; all good;

The VM does gateway monitoring (apinger) of the APU's .89 and that shows as online. However, once I configure gateway monitoring on the APU to check the VM's .91, it always shows Offline. Note that there are some subnets behind the VM and behind the APU, and routing between those (with static routes) also works, so I have no idea where this comes from;

Now, the reason why i think it might be related to my non-working IPSEC tunnels is that when i check log entries containing one of the other 7 nodes which are remote geographically and use different ISPs, the VM IPSEC log shows:
charon: 14[KNL] Z.V.X.41 is not a local address or the interface is down
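
One independent way to check whether the operating system itself treats an address as local is to try binding a socket to it. The Python sketch below does only that; it is not how charon performs its own check, and the function name `is_local_address` is made up for this example.

```python
import socket


def is_local_address(ip, family=socket.AF_INET):
    """Return True if the OS allows binding a UDP socket to this address.

    Binding to port 0 lets the OS pick any free port, so the only thing
    being tested is whether the address is assigned to a local interface.
    """
    with socket.socket(family, socket.SOCK_DGRAM) as sock:
        try:
            sock.bind((ip, 0))
        except OSError:
            # Typically "can't assign requested address" for non-local IPs.
            return False
    return True


if __name__ == "__main__":
    # Loopback should always be local; replace with the address from the log.
    print(is_local_address("127.0.0.1"))
```

Running it on the VM with the address from the log line in place of the loopback address would show whether the virtual IP is really configured on an interface that is up at that moment.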

i know that this is a lot, any guidance for what else to check or what to expect out of OPNsense would be helpful though!
Title: Re: IPSEC issues latest stable
Post by: mickbee on January 15, 2017, 03:36:13 pm
guess that there's no fix for this;

bottom line is, i had ipsec tunnels stable for days with latest pfsense and migrating to opnsense broke them even though the same settings are in use; what's funny is that i still have 2 pfsense devices and those are able to keep ipsec tunnels stable with my opnsense boxes

so it seems that opnsense -> opnsense ipsec has issues;

what logs can i provide to have someone much smarter than me look at it? :)
Title: Re: IPSEC issues latest stable
Post by: franco on January 15, 2017, 05:51:26 pm
Hi mickbee,

Happy new year!  :)

Has this always been a problem for you since migrating to OPNsense, or was there a "good" version of OPNsense prior to "latest stable"? We need these data points to see whether something changed, whether the configuration may be off, or whether this is something that wasn't a use case for us until now (we forked from pre-2.2, so you may rely on features/fixes from later pfSense versions which we don't have).

I can't say a lot has changed in our IPsec since early 16.7.x.


Cheers,
Franco
Title: Re: IPSEC issues latest stable
Post by: mickbee on January 15, 2017, 06:13:38 pm
happy new year as well franco! :)

i went from pfsense 2.3.2-RELEASE-p1 to opnsense 16.7.11 and through to .13 but afaicr the changelog mentioned nothing relevant to charon or strongswan;
Title: Re: IPSEC issues latest stable
Post by: franco on January 15, 2017, 06:19:36 pm
Ok, so far so good. Was 16.7.11 working better than 16.7.13 or were they the same?


Cheers,
Franco
Title: Re: IPSEC issues latest stable
Post by: mickbee on January 15, 2017, 07:23:13 pm
no difference really from an end user perspective; inconsistent ipsec tunnel behavior across 16.7.11-13 and 17.1b (on both FreeBSD 10.3 and 11.0)
Title: Re: IPSEC issues latest stable
Post by: mickbee on January 15, 2017, 11:05:46 pm
actually, just to demonstrate how bad this is, have a look at the attached graph; i'm using librenms running on a vmware esxi vm, behind one opnsense box, polling numerous opnsense and pfsense hosts - amongst others;

see how i'm getting frequent vpn tunnel traffic drops over a 24h period; note the recent snmp poll logs, which show the same symptom; there does seem to be a pattern there?

2017-01-17 18:15:04   Device status changed to Down from icmp check.   
2017-01-17 17:35:14   Device status changed to Up from check.   
2017-01-17 17:25:07   Device status changed to Down from icmp check.   
2017-01-17 16:35:05   Device status changed to Up from check.   
2017-01-17 16:30:05   Device status changed to Down from icmp check.   
2017-01-17 13:30:04   Device status changed to Up from check.   
2017-01-17 13:25:05   Device status changed to Down from icmp check.   
2017-01-17 12:00:05   Device status changed to Up from check.   
2017-01-17 11:10:05   Device status changed to Down from icmp check.   
2017-01-17 10:25:05   Device status changed to Up from check.
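
To look for a pattern, the Down/Up pairs can be turned into outage durations. This is a minimal Python sketch over the ten lines above; it assumes each line starts with a `YYYY-MM-DD HH:MM:SS` timestamp and contains either "changed to Down" or "changed to Up".

```python
from datetime import datetime

POLL_LOG = """\
2017-01-17 18:15:04   Device status changed to Down from icmp check.
2017-01-17 17:35:14   Device status changed to Up from check.
2017-01-17 17:25:07   Device status changed to Down from icmp check.
2017-01-17 16:35:05   Device status changed to Up from check.
2017-01-17 16:30:05   Device status changed to Down from icmp check.
2017-01-17 13:30:04   Device status changed to Up from check.
2017-01-17 13:25:05   Device status changed to Down from icmp check.
2017-01-17 12:00:05   Device status changed to Up from check.
2017-01-17 11:10:05   Device status changed to Down from icmp check.
2017-01-17 10:25:05   Device status changed to Up from check.
"""


def outages(log_text):
    """Return (down_time, duration) for every Down event followed by an Up."""
    events = []
    for line in log_text.splitlines():
        if "Device status changed to" not in line:
            continue
        stamp = datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
        state = "Down" if "changed to Down" in line else "Up"
        events.append((stamp, state))
    events.sort()  # the log is newest-first; process it oldest-first

    result = []
    down_since = None
    for stamp, state in events:
        if state == "Down":
            down_since = stamp
        elif down_since is not None:
            result.append((down_since, stamp - down_since))
            down_since = None
    return result


if __name__ == "__main__":
    for started, duration in outages(POLL_LOG):
        print(started, duration)
```

The last Down event at 18:15:04 has no matching Up yet, so it is not counted. The timestamps appear to sit on a five-minute polling grid, so the durations are only as precise as that interval.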

by contrast, have a look at the same librenms host snmp polling another pfsense 2.3.2 host; note that the one break was due to me changing settings and rebooting a few hosts at the same time, with network convergence taking a while;

both use the same setup (v2, main mode, ips as identifiers, pfs groups, ciphers); both have global ips assigned to their interfaces (no nat) and both use a lan to lan phase2 setup; i'm lost :(