IPsec tunnels constantly breaking down (solved)

Started by jeuler, March 13, 2019, 11:06:45 AM

March 13, 2019, 11:06:45 AM Last Edit: March 13, 2019, 05:23:22 PM by jeuler
Our company has a multi-site setup with six locations. In the process of leaving IPcop behind (it was discontinued a while ago), I began switching over to OPNsense about half a year ago.

Of those six locations, four have been migrated to OPNsense (three on Lanner x64 hardware, plus the smallest one on an old i386 PC). One location is served by a Sophos UTM (which I decided to keep running as long as the subscription is valid), and the last one is still running IPcop.

Tunnels are configured on every location's gateway so that everything is accessible from everywhere. We have static IPv4 addresses at every location, which are assigned directly to the respective gateways (i.e. 4x OPNsense, 1x IPcop, 1x Sophos), so NAT-T should not be necessary. The internet access seems to be stable (I'm not aware of any dropped connections).

Every OPNsense is running 19.1.3 at this time.

To make a long story short, the connections where IPcop and/or Sophos is involved seem quite stable, while the OPNsense-only connections drop either immediately or after a few minutes. Some connections are shown as up on the IPsec status overview but as down on the dashboard widget (and technically speaking they ARE down).

The tunnel setup between the OPNsense boxes is basically:

  aggressive = no
  fragmentation = yes
  keyexchange = ikev2
  mobike = yes
  reauth = yes
  rekey = yes
  forceencaps = no
  installpolicy = yes
  type = tunnel
  dpdaction = clear
  dpddelay = 10s
  dpdtimeout = 60s
  left = ***LOCALPUBLICIP***
  right = ***PUBLIC.AVAILABLE.REMOTE.URL***
  leftid = ***LOCALPUBLICIP***
  ikelifetime = 28800s
  lifetime = 3600s
  ike = 3des-sha256-modp2048!
  leftauth = psk
  rightauth = psk
  rightid = ***REMOTEPUBLICIP***
  rightsubnet = 172.17.0.0/23
  leftsubnet = 172.31.0.0/23
  esp = 3des-sha256-modp2048!
  auto = start
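With six gateways in a full mesh, one cheap way to rule out config drift is to check that each pair of conn sections mirrors each other: the leftsubnet/leftid on one gateway has to match the rightsubnet/rightid on its peer. Below is a minimal Python sketch of such a check, under some assumptions: the parameters have been copied out in the key = value form shown above, the addresses 198.51.100.1 and 203.0.113.1 are hypothetical stand-ins for the masked public IPs, and comparing ike/esp as identical strings is a simplification (strongSwan only needs the two sides to share a proposal). It only catches config mismatches, not hardware or cabling problems.

```python
"""Sketch: check that two site-to-site IPsec conn sections mirror each other."""


def parse_conn(text: str) -> dict[str, str]:
    """Parse 'key = value' lines of an ipsec.conf-style conn section into a dict."""
    params: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments and anything that is not a key = value pair
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        params[key.strip()] = value.strip()
    return params


# On a site-to-site tunnel, each left* value on one gateway must equal
# the corresponding right* value on the other gateway.
MIRRORED = (("leftsubnet", "rightsubnet"), ("leftid", "rightid"))
# Simplification: strongSwan only needs overlapping proposals, but this
# sketch expects these settings to be identical strings on both sides.
IDENTICAL = ("keyexchange", "ike", "esp")


def mirror_mismatches(a: dict[str, str], b: dict[str, str]) -> list[str]:
    """Return human-readable problems where gateways A and B do not mirror each other."""
    problems: list[str] = []
    for left_key, right_key in MIRRORED:
        # Check both directions: A.left* vs B.right* and B.left* vs A.right*
        for x, y, xn, yn in ((a, b, "A", "B"), (b, a, "B", "A")):
            if x.get(left_key) != y.get(right_key):
                problems.append(
                    f"{xn}.{left_key}={x.get(left_key)!r} != "
                    f"{yn}.{right_key}={y.get(right_key)!r}"
                )
    for key in IDENTICAL:
        if a.get(key) != b.get(key):
            problems.append(f"{key} differs: A={a.get(key)!r} B={b.get(key)!r}")
    return problems


# Hypothetical example sections (public IPs are documentation addresses).
SITE_A = """
  keyexchange = ikev2
  leftid = 198.51.100.1
  rightid = 203.0.113.1
  leftsubnet = 172.31.0.0/23
  rightsubnet = 172.17.0.0/23
  ike = 3des-sha256-modp2048!
  esp = 3des-sha256-modp2048!
"""

SITE_B = """
  keyexchange = ikev2
  leftid = 203.0.113.1
  rightid = 198.51.100.1
  leftsubnet = 172.17.0.0/23
  rightsubnet = 172.31.0.0/23
  ike = 3des-sha256-modp2048!
  esp = 3des-sha256-modp2048!
"""

if __name__ == "__main__":
    found = mirror_mismatches(parse_conn(SITE_A), parse_conn(SITE_B))
    for problem in found:
        print(problem)
    if not found:
        print("conn sections mirror each other")
```

Run against the two example sections it reports no problems; changing leftsubnet on site B to some other network would produce exactly one mismatch line.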


The log files of the breaking tunnels show a whole bunch of what, as far as I can tell, I'd call "lame excuses".

Any thoughts on what's wrong with my setup? I'm just an inch away from deleting and re-creating every single tunnel :-\

It turned out to be caused by one specific OPNsense. After deleting all of its VPNs, all other tunnels run stable.


In short, what happened is that while the respective location was a "full" member of my "IPsec mesh", every tunnel which involved at least one OPNsense was very prone to breaking down, with various error messages in the log and various symptoms in the GUI (see the long description above) :)

As for the cause, unfortunately I'm not exactly sure (yet). With this location's OPNsense we've been experiencing some other problems, especially Unbound crashing as well (without my having found the time to investigate why exactly it did so).

There are four possible causes which I'll have to investigate:
1. Hardware failure: low probability; it's a Scope7 7525 box, a couple of months old, extensively tested at my home (which I always do before bringing a gateway into production)
2. IP address problem: the fiber connection by Deutsche Telekom is shared with another company. Maybe their firewall, which sits side by side with ours, causes conflicts.
3. Cable problem: the quality of the physical cable from the main switch is unknown to me.
4. Config problem: reinstall OPNsense (or at least run the setup from scratch).

Well, at least I won't get bored during the next days (as if I ever would... ;D )