OPNsense Forum

English Forums => Virtual private networks => Topic started by: pmladenov on March 22, 2021, 01:29:34 pm

Title: IPSec routed mode - "unable to acquire reqid"
Post by: pmladenov on March 22, 2021, 01:29:34 pm
Hello,


I have an IPSec routed mode between 2 opnsense FWs: opnFW1 and opnFW2 running:

OPNsense 20.7.5-amd64
FreeBSD 12.1-RELEASE-p10-HBSD
OpenSSL 1.1.1h 22 Sep 2020

After approximately a week of uptime, without any configuration changes on either end, I'm getting the following error in opnFW1's /var/log/ipsec.log, and of course the IPSec tunnel is not working:

Code: [Select]
Mar 22 21:43:47 opnFW1 charon[16547]: 11[KNL] creating acquire job for policy 192.168.1.10/32 === 192.168.1.1/32 with reqid {1000}
Mar 22 21:43:47 opnFW1 charon[16547]: 07[CFG] trap not found, unable to acquire reqid 1000
Mar 22 21:44:19 opnFW1 charon[16547]: 07[KNL] creating acquire job for policy 192.168.1.10/32 === 192.168.1.1/32 with reqid {1000}
Mar 22 21:44:19 opnFW1 charon[16547]: 11[CFG] trap not found, unable to acquire reqid 1000

The ipsec logical interface on opnFW1 is ipsec1000:
Code: [Select]
root@opnFW1:~ # ifconfig ipsec1000
ipsec1000: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> metric 0 mtu 1400
        tunnel inet 192.168.1.10 --> 192.168.1.1
        inet6 fe80::1a5a:58ff:fe10:13a0%ipsec1000 prefixlen 64 scopeid 0x13
        inet 172.16.1.10 --> 172.16.1.1 netmask 0xffffffff
        groups: ipsec
        reqid: 1000
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>


From opnFW1 I can successfully ping opnFW2's "underlay" IP address (192.168.1.1), but I can't ping its "overlay" IP (172.16.1.1):
Code: [Select]
root@opnFW1:~ # ping -c 2 192.168.1.1
PING 192.168.1.1 (192.168.1.1): 56 data bytes
64 bytes from 192.168.1.1: icmp_seq=0 ttl=64 time=7.266 ms
64 bytes from 192.168.1.1: icmp_seq=1 ttl=64 time=3.638 ms

--- 192.168.1.1 ping statistics ---
2 packets transmitted, 2 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 3.638/5.452/7.266/1.814 ms
root@opnFW1:~ # ping -c 2 172.16.1.1
PING 172.16.1.1 (172.16.1.1): 56 data bytes

--- 172.16.1.1 ping statistics ---
2 packets transmitted, 0 packets received, 100.0% packet loss

The ipsec configuration on opnFW1 is:
Code: [Select]
root@opnFW1:/usr/local/etc # cat ipsec.conf
# This file is automatically generated. Do not edit
config setup
  uniqueids = yes

conn con1
  aggressive = no
  fragmentation = yes
  keyexchange = ikev2
  mobike = yes
  reauth = yes
  rekey = yes
  forceencaps = no
  installpolicy = no

  dpdaction = restart
  dpddelay = 10s
  dpdtimeout = 60s

  left = 192.168.1.10
  right = 192.168.1.1

  leftid = 192.168.1.10
  ikelifetime = 28800s
  lifetime = 3600s
  ike = aes256gcm16-sha512-ecp512bp!
  leftauth = psk
  rightauth = psk
  rightid = 192.168.1.1
  reqid = 1000
  rightsubnet = 0.0.0.0/0
  leftsubnet = 0.0.0.0/0
  esp = aes256gcm16-sha512-ecp512bp!
  auto = start

Judging by the configuration above, the tunnel should rely on DPD (dpdaction = restart) to recover.

On the other side, opnFW2, the logs I'm getting are:

Code: [Select]
root@opnFW2:/var/log # clog ipsec.log | grep 192.168.1.
Mar 22 13:38:37 opnFW2 charon[41296]: 05[KNL] creating acquire job for policy 192.168.1.1/32 === 192.168.1.10/32 with reqid {9000}
Mar 22 13:39:09 opnFW2 charon[41296]: 02[KNL] creating acquire job for policy 192.168.1.1/32 === 192.168.1.10/32 with reqid {9000}
Mar 22 13:41:20 opnFW2 charon[41296]: 07[KNL] creating acquire job for policy 192.168.1.1/32 === 192.168.1.10/32 with reqid {9000}
Mar 22 13:41:52 opnFW2 charon[41296]: 14[KNL] creating acquire job for policy 192.168.1.1/32 === 192.168.1.10/32 with reqid {9000}
Mar 22 13:42:25 opnFW2 charon[41296]: 15[KNL] creating acquire job for policy 192.168.1.1/32 === 192.168.1.10/32 with reqid {9000}

The IPsec logical interface on opnFW2 is ipsec9000:

Code: [Select]
root@opnFW2:/var/log # ifconfig ipsec9000
ipsec9000: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> metric 0 mtu 1400
        tunnel inet 192.168.1.1 --> 192.168.1.10
        inet6 fe80::1e72:1dff:feb6:c703%ipsec9000 prefixlen 64 scopeid 0x25
        inet 172.16.1.1 --> 172.16.1.10 netmask 0xffffffff
        groups: ipsec
        reqid: 9000
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>

The ping tests are identical: from opnFW2 I can ping 192.168.1.10 and cannot ping 172.16.1.10

Code: [Select]
root@opnFW2:/var/log # ping 192.168.1.10
PING 192.168.1.10 (192.168.1.10): 56 data bytes
64 bytes from 192.168.1.10: icmp_seq=0 ttl=64 time=7.893 ms
64 bytes from 192.168.1.10: icmp_seq=1 ttl=64 time=7.310 ms
64 bytes from 192.168.1.10: icmp_seq=2 ttl=64 time=7.990 ms
^C
--- 192.168.1.10 ping statistics ---
3 packets transmitted, 3 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 7.310/7.731/7.990/0.300 ms
root@opnFW2:/var/log # ping -c 2 172.16.1.10
PING 172.16.1.10 (172.16.1.10): 56 data bytes

--- 172.16.1.10 ping statistics ---
2 packets transmitted, 0 packets received, 100.0% packet loss
root@opnFW2:/var/log #

IPSec config on opnFW2 related to that tunnel is:

Code: [Select]
cat /usr/local/etc/ipsec.conf

config setup
  uniqueids = yes

conn con9
  aggressive = no
  fragmentation = yes
  keyexchange = ikev2
  mobike = yes
  reauth = yes
  rekey = yes
  forceencaps = no
  installpolicy = no

  dpdaction = restart
  dpddelay = 10s
  dpdtimeout = 60s

  left = 192.168.1.1
  right = 192.168.1.10

  leftid = 192.168.1.1
  ikelifetime = 28800s
  lifetime = 3600s
  ike = aes256gcm16-sha512-ecp512bp!
  leftauth = psk
  rightauth = psk
  rightid = 192.168.1.10
  reqid = 9000
  rightsubnet = 0.0.0.0/0
  leftsubnet = 0.0.0.0/0
  esp = aes256gcm16-sha512-ecp512bp!
  auto = start


And the funniest thing is that if I restart the strongswan service (/usr/local/etc/rc.d/strongswan onerestart) on opnFW1 (the one logging "unable to acquire reqid"), the issue disappears and everything starts working again... until the next time it stops.

Any ideas or comments are highly appreciated!
I intentionally haven't restored connectivity this time, so I can provide any additional output/logs if required.

Regards,
Plamen
Title: Re: IPSec routed mode - "unable to acquire reqid"
Post by: pmladenov on March 22, 2021, 08:17:16 pm
Another observation - there's no UDP traffic between opnFW1 and opnFW2 on the transport interface.
Neither of them is trying to initiate phase 1.
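
One way to run such a check is a packet capture limited to IKE traffic. This is only a rough sketch: em0 is a hypothetical name for the transport (underlay) interface, and the helper names ike_filter and watch_ike are made up for illustration.

```shell
#!/bin/sh
# Sketch only: IFACE is a hypothetical interface name, adjust to your setup.
IFACE="${IFACE:-em0}"

# Build a capture filter matching IKE (UDP 500) and NAT-T (UDP 4500)
# to or from a single peer address.
ike_filter() {
    printf 'host %s and (udp port 500 or udp port 4500)' "$1"
}

# Print IKE packets exchanged with the given peer (needs root).
watch_ike() {
    tcpdump -ni "$IFACE" "$(ike_filter "$1")"
}

# Example: watch_ike 192.168.1.1
```

If nothing shows up while the tunnel is down, neither side is sending IKE_SA_INIT, which matches what is described above.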
Title: Re: IPSec routed mode - "unable to acquire reqid"
Post by: pmladenov on March 22, 2021, 10:04:54 pm
Nothing in the firewall logs either, which makes me believe that IKE_SA_INIT is not being generated on either end. It's just stuck, although the "Start immediately" option is selected for phase 1 and DPD with restart is configured on both firewalls.

Currently on opnFW2 there are other IPSec VTIs which are working fine (although some of them were in the same stuck state in the past), and I can't find out why it's not generating an IKE_SA_INIT packet for that specific peer.

Any ideas how I should proceed with the troubleshooting? Is there any IPsec debug level worth increasing?

As I wrote in the first post - if I restart the strongswan service the issue is resolved, but it happens again after a few days.
Title: Re: IPSec routed mode - "unable to acquire reqid"
Post by: pmladenov on March 23, 2021, 05:10:07 pm
Any workarounds? I'm starting to think of some kind of ugly script in the crontab, or of using the monit service to restart IPSec when it hangs again.
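
A minimal sketch of what such a script might look like, assuming the overlay peer address 172.16.1.1 and the restart command from the first post. The function names, the "run" argument and the override variables are hypothetical and not something OPNsense ships.

```shell
#!/bin/sh
# Hypothetical watchdog: restart strongSwan when the overlay peer stops answering.
# All three settings can be overridden from the environment.
PEER="${PEER:-172.16.1.1}"
PING_CMD="${PING_CMD:-ping -c 2}"
RESTART_CMD="${RESTART_CMD:-/usr/local/etc/rc.d/strongswan onerestart}"

# Succeeds (exit status 0) when the peer answers at least one echo request.
check_tunnel() {
    $PING_CMD "$PEER" >/dev/null 2>&1
}

# Restart the service only when the overlay address is unreachable.
watchdog() {
    if check_tunnel; then
        echo "tunnel ok"
    else
        echo "tunnel down, restarting strongswan"
        $RESTART_CMD
    fi
}

# Only act when called as: ipsec-watchdog.sh run
if [ "${1:-}" = "run" ]; then
    watchdog
fi
```

It could then be called with the "run" argument from cron every few minutes. Note that this restarts every tunnel handled by strongSwan, not just the broken one.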
Title: Re: IPSec routed mode - "unable to acquire reqid"
Post by: pmladenov on March 24, 2021, 10:35:11 pm
Looks like the IPSec re-connection issue is not caused by "trap not found, unable to acquire reqid 1000".

During my workaround tests in a lab environment I was able to reproduce the issue. As I expected, it happens when underlay connectivity is lost for a longer period of time.

During the connectivity loss, IKE packets are retransmitted 5 times per attempt, and after the third attempt charon stops trying altogether:
Code: [Select]
Mar 24 19:30:10 FW3 charon[90790]: 14[IKE] <con1|1> giving up after 5 retransmits
Mar 24 19:30:10 FW3 charon[90790]: 14[IKE] <con1|1> restarting CHILD_SA con1
....
Mar 24 19:32:55 FW3 charon[90790]: 12[IKE] <con1|2> giving up after 5 retransmits
Mar 24 19:32:55 FW3 charon[90790]: 12[IKE] <con1|2> peer not responding, trying again (2/3)
....
Mar 24 19:35:40 FW3 charon[90790]: 08[IKE] <con1|2> giving up after 5 retransmits
Mar 24 19:35:40 FW3 charon[90790]: 08[IKE] <con1|2> peer not responding, trying again (3/3)
.....
Mar 24 19:38:25 FW3 charon[90790]: 05[IKE] <con1|2> giving up after 5 retransmits
Mar 24 19:38:25 FW3 charon[90790]: 05[IKE] <con1|2> establishing IKE_SA failed, peer not responding



So the question is: how can I change that behavior and force IPSec to keep trying to connect?
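
For reference, the 165-second spacing of the "giving up" lines above is consistent with strongSwan's documented default retransmission settings (retransmit_timeout 4.0 s, retransmit_base 1.8, retransmit_tries 5), assuming they have not been changed in strongswan.conf. A small sketch of the arithmetic, with a made-up helper name:

```shell
#!/bin/sh
# Time charon waits on one IKE exchange before "giving up after N retransmits".
# $1 = retransmit_timeout (seconds), $2 = retransmit_base, $3 = retransmit_tries
retransmit_total() {
    awk -v t="$1" -v b="$2" -v n="$3" 'BEGIN {
        total = 0
        # Wait after the initial send (i = 0) plus the wait after each of the n retransmits.
        for (i = 0; i <= n; i++)
            total += t * b ^ i
        printf "%.0f\n", total
    }'
}

per_try=$(retransmit_total 4.0 1.8 5)
echo "per attempt: ${per_try}s, three attempts: $((per_try * 3))s"
```

With those defaults each attempt lasts about 165 seconds, so three attempts add up to roughly 8 minutes 15 seconds, which matches the gap between 19:30:10 and 19:38:25 in the log.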
Title: Re: IPSec routed mode - "unable to acquire reqid"
Post by: pmladenov on March 24, 2021, 11:03:44 pm
And let me reply to myself again - the missing keyword here is "keyingtries"

https://wiki.strongswan.org/projects/strongswan/wiki/connsection
Quote
keyingtries = 3 | <number> | %forever

how many attempts (a positive integer or %forever) should be made to negotiate a connection, or a replacement
for one, before giving up (default 3). The value %forever means 'never give up'. Relevant only locally, other end need
not agree on it.

And the issue raised back in 2020 -
https://github.com/opnsense/core/issues/4204
Title: Re: IPSec routed mode - "unable to acquire reqid"
Post by: pmladenov on March 25, 2021, 08:45:58 am
Based on https://github.com/opnsense/core/issues/4204 it seems that no one is interested in having a persistent IPsec connection...
Title: Re: IPSec routed mode - "unable to acquire reqid"
Post by: wurmloch on March 25, 2021, 09:29:29 pm
Hi,

One could have that impression. I have been tunneling with Linux/openswan and pfSense for a long time. Now I am digging into OPNsense IPsec and am still frustrated.

First lesson: never use policy-based, choose route-based IPsec (1). I am using a lab infrastructure with several APUs (pcengines) and some Supermicro/Celeron firewalls as test machines. At the moment I am taking advantage of the cold weather to set up, test, discard and start over...

In the end I will see whether I can handle IPsec in a reliable way, switch to OpenVPN, or not use OPNsense for site-to-site tunneling.

Don't give up!
Uwe

(1) https://weberblog.net/route-vs-policy-based-vpn-tunnels/
Title: Re: IPSec routed mode - "unable to acquire reqid"
Post by: mimugmail on March 26, 2021, 06:28:47 am
Quote
Hi,

One could have that impression. I have been tunneling with Linux/openswan and pfSense for a long time. Now I am digging into OPNsense IPsec and am still frustrated.

First lesson: never use policy-based, choose route-based IPsec (1). I am using a lab infrastructure with several APUs (pcengines) and some Supermicro/Celeron firewalls as test machines. At the moment I am taking advantage of the cold weather to set up, test, discard and start over...

In the end I will see whether I can handle IPsec in a reliable way, switch to OpenVPN, or not use OPNsense for site-to-site tunneling.

Don't give up!
Uwe

(1) https://weberblog.net/route-vs-policy-based-vpn-tunnels/

Please note that FreeBSD with pf has a limitation: you can't use NAT with route-based IPsec, regardless of whether you use OPNsense or pfSense.
Title: Re: IPSec routed mode - "unable to acquire reqid"
Post by: mimugmail on March 26, 2021, 06:33:20 am
Quote
And let me reply to myself again - the missing keyword here is "keyingtries"

https://wiki.strongswan.org/projects/strongswan/wiki/connsection
Quote
keyingtries = 3 | <number> | %forever

how many attempts (a positive integer or %forever) should be made to negotiate a connection, or a replacement
for one, before giving up (default 3). The value %forever means 'never give up'. Relevant only locally, other end need
not agree on it.

And the issue raised back in 2020 -
https://github.com/opnsense/core/issues/4204

Yep, it was me complaining, but this only happens on unreliable WANs. For these areas I switched to OpenVPN based IPsec, but I'd also like to diagnose further if you're still interested. When I see a couple of replies in a thread I usually don't look at it, since I assume someone else is already helping out  ;D
Since you have already fiddled with the .conf and CLI, can you grab your generated ipsec.conf, find the affected conn section, add keyingtries=%forever and put this in a .conf file in the include folder? Then remove the IPsec tunnel from the UI and restart IPsec. Is it stable enough then?

I can always reopen the issue, but it needs more voices to make progress, since changing things in such a sensitive area is always risky.

Thanks for hacking on :)
Title: Re: IPSec routed mode - "unable to acquire reqid"
Post by: pmladenov on March 26, 2021, 01:59:37 pm
Quote
Yep, it was me complaining, but this only happens on unreliable WANs. For these areas I switched to OpenVPN based IPsec, but I'd also like to diagnose further if you're still interested. When I see a couple of replies in a thread I usually don't look at it, since I assume someone else is already helping out  ;D
Since you have already fiddled with the .conf and CLI, can you grab your generated ipsec.conf, find the affected conn section, add keyingtries=%forever and put this in a .conf file in the include folder? Then remove the IPsec tunnel from the UI and restart IPsec. Is it stable enough then?

I can always reopen the issue, but it needs more voices to make progress, since changing things in such a sensitive area is always risky.

Thanks for hacking on

Thanks mimugmail,

I did the same and created the following config file:
Code: [Select]
cat /usr/local/etc/ipsec.opnsense.d/never-give-up.conf
conn %default
 keyingtries = %forever

and restarted the service. It works like a charm.
A standard use case: a WAN/Internet outage for a longer period of time (for instance, the link fails during the weekend and is restored on Monday). With the default keyingtries value, a manual service restart will be needed, or a full device restart if the device is completely unreachable remotely and only non-technical people are at the site (which may be even worse if there's no one there).


Quote
Please note that FreeBSD with pf has a limitation: you can't use NAT with route-based IPsec, regardless of whether you use OPNsense or pfSense.
Could you please elaborate a little on that? Is it that you can't use NAT on the ipsecXXXX (VTI) interfaces, or that you can't use it at all?
Another missing feature with VTI IPSec (although not as critical as NAT) is DHCP relay. The DHCP relay daemon simply can't be bound to the VTI interface. It seems that's also the case with pfSense:
https://redmine.pfsense.org/issues/10904


Regards,
Plamen
Title: Re: IPSec routed mode - "unable to acquire reqid"
Post by: mimugmail on March 26, 2021, 08:51:38 pm
I asked for reopening