Messages - pmladenov

#16
21.1 Legacy Series / Re: OPNsense ECMP routing
March 12, 2021, 06:01:26 PM
Quote from: mimugmail on March 11, 2021, 03:11:53 PM
No, AFAIK the radix implementation in FreeBSD is quite error prone. In FreeBSD 13 they have a new and more stable approach, but this will need some time

Thanks for confirming.
#17
That was supposed to be simple, but I still can't get it to work...

I have a very basic setup:

Site1 LAN <-> OPNsense-FW1 <-- VTI ipsec1000 --> OPNsense-FW2 <-> Site2 LAN

OPNSense-FW1 has a route to Site2 LAN via OPNsense-FW2 dev ipsec1000
OPNSense-FW2 has a route to Site1 LAN via OPNsense-FW1 dev ipsec1000

Hosts in Site1 LAN are able to communicate with hosts in Site2 LAN.

All I would like to accomplish is for locally originated traffic from OPNsense-FW1 destined to the Site2 LAN subnet to use its Site1 LAN IP address instead of the IP address of the ipsec1000 interface.
I assume this is some kind of source NAT with the following logic:

SRC_IP=ipsec1000_IP, DST_IP=Site2 LAN
SRC_NAT_IP=Site1 LAN_IP,
outgoing interface ipsec1000

I tried the above with a couple of variations, and none of them worked.
What am I missing here?
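In raw pf.conf terms, the logic I'm after should look roughly like the outbound NAT rule below (a sketch only; the macro values are placeholders, and OPNsense normally generates such rules from Firewall -> NAT -> Outbound in hybrid/manual mode rather than from a hand-edited file):

```
# Placeholders - substitute the real Site2 subnet and Site1 LAN address
site2_lan = "10.2.0.0/24"
site1_lan_ip = "10.1.0.1"

# Source-NAT locally originated traffic leaving via ipsec1000 toward
# Site2's LAN, rewriting the source to the Site1 LAN address
nat on ipsec1000 inet from (self) to $site2_lan -> $site1_lan_ip
```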

Regards,
Plamen
#18
21.1 Legacy Series / OPNsense ECMP routing
March 09, 2021, 09:56:04 PM
Hello!

I'm wondering whether the latest version (21.1.x) of OPNsense supports ECMP (equal-cost multipath) routing with FRR?

With my lab devices (20.7.5) I tried to simulate it using OSPF and was able to see two paths in FRR (vtysh -> sh ip ro ospf); however, netstat -rnl4 showed a completely different story (only one of the paths is actually installed in the BSD routing table).


I've found this thread from almost two years ago - https://forum.opnsense.org/index.php?topic=12815.0 - which was not encouraging at the time...
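For anyone wanting to reproduce the comparison, the two views can be checked side by side on the firewall itself:

```
# Paths FRR has computed (OSPF view inside the FRR daemon)
vtysh -c "show ip route ospf"

# What actually got installed into the FreeBSD kernel routing table
netstat -rn -f inet
```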


Regards,
Plamen
#19
Quote from: mimugmail on February 24, 2021, 05:38:06 PM
This is a known bug/limitation of FreeBSD
Thanks for letting me know mimugmail.

So in that case I guess the workaround will be simply to disable "scrub" on the interfaces AND create a new scrub rule to set the TCP MSS to 1300 (for example).
I hadn't read the last sentence of the help text
Quote
"Disable all default interface scrubing rules, mss clamping will also be disabled when you check this. Detailed settings specified below will still be used."
and was under the wrong impression that disabling scrub would disable it completely :)
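For reference, the replacement rule from that workaround would look something like this in raw pf.conf syntax (a sketch; em0 is a placeholder interface, and on OPNsense this is normally configured via the Normalization GUI rather than edited by hand):

```
# Clamp the TCP MSS on traffic through em0 without the rest of the
# default scrub behavior; 1300 leaves headroom below the 1400 tunnel MTU
scrub on em0 all max-mss 1300
```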

Meanwhile I found the reason for the "asymmetrical" ping behavior in my test lab. During my previous tests I had left a "Passthrough networks" subnet in the advanced VPN config on opnsense-FW2. I'm not sure how this is related, but when I add Host1's subnet as a passthrough network on FW1 and Host2's subnet on FW2, I can successfully send 1500-byte ICMP packets (in both directions) without deactivating interface scrub; however, no more than 1500 bytes..

Regards,
Plamen
#20
And the other strange thing:

When I ping host1 (10.70.10.100) from host2 (10.30.10.100) with 1500-byte packets, only the first one gets a reply:

17:29:30.832068 IP (tos 0x0, ttl 253, id 8022, offset 0, flags [none], proto ICMP (1), length 1500)
    10.30.10.100 > 10.70.10.100: ICMP echo request, id 39, seq 0, length 1480
17:29:30.833003 IP (tos 0x0, ttl 254, id 8022, offset 0, flags [none], proto ICMP (1), length 1500)
    10.70.10.100 > 10.30.10.100: ICMP echo reply, id 39, seq 0, length 1480
17:29:30.834352 IP (tos 0x0, ttl 253, id 8023, offset 0, flags [none], proto ICMP (1), length 1500, bad cksum 5006 (->6f9e)!)
    10.30.10.100 > 10.70.10.100: ICMP echo request, id 39, seq 1, length 1480
17:29:32.835360 IP (tos 0x0, ttl 253, id 8024, offset 0, flags [none], proto ICMP (1), length 1500, bad cksum 5005 (->6f9d)!)
    10.30.10.100 > 10.70.10.100: ICMP echo request, id 39, seq 2, length 1480

This is again captured on opnsense-fw1 port towards host1.
#21
I have a very basic setup:

host1 <-> (em0) opnsense1 (em1) <----- ipsec routed mode -----> (em1) opnsense2 (em0) <-> host2

host1 is able to ping host2 with small packets, but not with a 1500-byte packet.
The MTU is the default everywhere (1500 bytes, and the ipsec interface has an MTU of 1400 by default).
When I disable "interface scrub" (Firewall -> Settings -> Normalization) on the opnsense1 firewall (ONLY!), everything starts working. The strange thing is that I don't touch that setting on the opnsense2 firewall (scrub is enabled there, as per the default).

When host1 sends a 1500-byte packet, it's received by opnsense1, fragmented into 2 packets (1400 bytes and 100 bytes) and sent over the ipsec interface. It's received on opnsense2, reassembled and forwarded to host2 as a 1500-byte packet.
The problem is in the opposite direction: the ICMP echo reply from host2 is received by FW2, fragmented and sent via the ipsec interface to FW1. FW1 receives both fragments on the ipsec interface, combines them into a single 1500-byte packet and sends it to host1. And here is the problem:

According to tcpdump on the em0 interface of FW1, that 1500-byte ICMP packet has a WRONG checksum (and in the end host1 does not receive the replies from host2).
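As a side note, the expected on-wire fragment sizes for a 1500-byte packet crossing the 1400-byte ipsec MTU can be worked out (the "1400 bytes and 100 bytes" mentioned above are approximate); a quick sketch:

```python
def fragment_sizes(total_len, mtu, ihl=20):
    """On-wire sizes of the IPv4 fragments of a total_len-byte packet
    sent over a link with the given MTU. Every fragment payload except
    the last must be a multiple of 8 bytes."""
    payload = total_len - ihl
    max_frag_payload = (mtu - ihl) // 8 * 8  # round down to 8-byte boundary
    frags = []
    while payload > 0:
        chunk = min(payload, max_frag_payload)
        frags.append(chunk + ihl)  # each fragment carries its own IP header
        payload -= chunk
    return frags

print(fragment_sizes(1500, 1400))  # [1396, 124]
```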

I've spent 2 full days troubleshooting this, simplifying the setup down to the one above. Both opnsense1 and opnsense2 were initially 20.7.5. I upgraded opnsense1 to the latest 20.7.x (no luck), then completely reinstalled the opnsense1 VM from scratch (without importing configs), which didn't help at all; after that I deleted it and installed the 21.1 image, but it didn't help either...
I think I'm missing something. It's really strange that in the other direction (host1 -> host2) everything works (no bad checksums for reassembled packets), with no need to disable pf scrub on the opnsense2 fw.

Any idea what I'm missing? I'm completely out of ideas.
Is there anything else I can try to troubleshoot this problem?

If I have to disable scrub entirely, what will happen with the TCP MSS? I don't think it will be clamped to 1360 bytes, which will definitely break many apps.




#22
High availability / Re: IPSec Site to Site Tunnel with HA
February 23, 2021, 08:52:50 PM
Hi ying18,

I saw similar behavior - although I selected the CARP logical interface in Phase 1, during failovers I can see both FWs initially trying to use the physical IP address...
Also keep in mind that Dead Peer Detection takes almost 3 minutes to detect a failure (despite what you've configured).
Probably the better approach here is to have 2 separate tunnels, one to each firewall in the HA setup, and not to rely on any timers.
#23
Hello,

I'm running OpnSense 20.7.5

I've configured Site-to-Site IPSec tunnel with IKEv2 and DPD with 2 seconds interval, 5 retries and action=restart tunnel.

My ipsec.conf:

Quote
root@OPNsense:/tmp # cat /usr/local/etc/ipsec.conf
# This file is automatically generated. Do not edit
config setup
  uniqueids = yes

conn pass
  right=127.0.0.1 # so this connection does not get used for other purposes
  leftsubnet=10.30.0.0/16
  rightsubnet=10.30.0.0/16
  type=passthrough
  auto=route

conn con1
  aggressive = no
  fragmentation = yes
  keyexchange = ikev2
  mobike = yes
  reauth = yes
  rekey = yes
  forceencaps = no
  installpolicy = yes
  type = tunnel
  dpdaction = restart
  dpddelay = 2s
  dpdtimeout = 12s



Based on
https://wiki.strongswan.org/projects/strongswan/wiki/connsection

Quote
dpdaction = none | clear | hold | restart

controls the use of the Dead Peer Detection protocol (DPD, RFC 3706) where R_U_THERE notification messages
(IKEv1) or empty INFORMATIONAL messages (IKEv2) are periodically sent in order to check the liveliness of the
IPsec peer. The values clear, hold, and restart all activate DPD and determine the action to perform on a timeout.
With clear the connection is closed with no further actions taken. hold installs a trap policy, which will catch
matching traffic and tries to re-negotiate the connection on demand. restart will immediately trigger an attempt
to re-negotiate the connection. The default is none which disables the active sending of DPD messages.

dpddelay = 30s | <time>

defines the period time interval with which R_U_THERE messages/INFORMATIONAL exchanges are sent to the peer.
These are only sent if no other traffic is received. In IKEv2, a value of 0 sends no additional INFORMATIONAL
messages and uses only standard messages (such as those to rekey) to detect dead peers.

dpdtimeout = 150s | <time>

defines the timeout interval, after which all connections to a peer are deleted in case of inactivity.
This only applies to IKEv1, in IKEv2 the default retransmission timeout applies, as every exchange is used to
detect dead peers.

And from https://wiki.strongswan.org/projects/strongswan/wiki/Retransmission

Quote
retransmit_tries    Integer    5    Number of retransmissions to send before giving up
retransmit_timeout    Double    4.0    Timeout in seconds
retransmit_base    Double    1.8    Base of exponential backoff

Using the default values, packets are retransmitted as follows:
Retransmission    Formula    Relative timeout    Absolute timeout
1    4 * 1.8 ^ 0    4s    4s
2    4 * 1.8 ^ 1    7s    11s
3    4 * 1.8 ^ 2    13s    24s
4    4 * 1.8 ^ 3    23s    47s
5    4 * 1.8 ^ 4    42s    89s
giving up    4 * 1.8 ^ 5    76s    165s

Apparently that ipsec.conf DPD configuration is not relevant for IKEv2 (the retransmission timeout applies instead), and that's the reason it takes so long to reset the tunnel (in my case ~90+ seconds).
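The ~90 seconds actually lines up with the 5th retransmission in the table above; the schedule is easy to reproduce:

```python
def backoff_schedule(tries=5, timeout=4.0, base=1.8):
    """Cumulative (absolute) timeouts of strongSwan's exponential
    retransmission backoff; the last entry is the give-up time."""
    absolute, total = [], 0.0
    for n in range(tries + 1):
        total += timeout * base ** n  # relative timeout of attempt n
        absolute.append(round(total))
    return absolute

print(backoff_schedule())  # [4, 11, 24, 47, 89, 165]
```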

Is there any easy way I can fix that one?
As stated in the header of /usr/local/etc/ipsec.conf:
root@OPNsense:/tmp # cat /usr/local/etc/ipsec.conf
# This file is automatically generated. Do not edit

so where should I make the modification (considering I'm going to use only IKEv2, not IKEv1, in this setup)?
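If the IKEv2 retransmission settings are what actually matter here, the usual place to override them is strongswan.conf (on OPNsense presumably /usr/local/etc/strongswan.conf; whether OPNsense regenerates that file too is something I'd verify first). A sketch with shortened values:

```
# Shorten the IKEv2 dead-peer window: with these values the peer is
# declared dead after roughly 2*(1 + 1.4 + 1.4^2 + 1.4^3) ~ 14 seconds
charon {
    retransmit_tries = 3
    retransmit_timeout = 2.0
    retransmit_base = 1.4
}
```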

Regards,
Plamen
#24
I have the exact same problem with 20.7.5 and here's what I've found:

https://wiki.strongswan.org/projects/strongswan/wiki/connsection

Quote
dpdaction = none | clear | hold | restart

controls the use of the Dead Peer Detection protocol (DPD, RFC 3706) where R_U_THERE notification messages
(IKEv1) or empty INFORMATIONAL messages (IKEv2) are periodically sent in order to check the liveliness of the
IPsec peer. The values clear, hold, and restart all activate DPD and determine the action to perform on a timeout.
With clear the connection is closed with no further actions taken. hold installs a trap policy, which will catch
matching traffic and tries to re-negotiate the connection on demand. restart will immediately trigger an attempt
to re-negotiate the connection. The default is none which disables the active sending of DPD messages.

dpddelay = 30s | <time>

defines the period time interval with which R_U_THERE messages/INFORMATIONAL exchanges are sent to the peer.
These are only sent if no other traffic is received. In IKEv2, a value of 0 sends no additional INFORMATIONAL
messages and uses only standard messages (such as those to rekey) to detect dead peers.

dpdtimeout = 150s | <time>

defines the timeout interval, after which all connections to a peer are deleted in case of inactivity.
This only applies to IKEv1, in IKEv2 the default retransmission timeout applies, as every exchange is used to
detect dead peers.

And from https://wiki.strongswan.org/projects/strongswan/wiki/Retransmission

Quote
Using the default values, packets are retransmitted as follows:
Retransmission    Formula    Relative timeout    Absolute timeout
1    4 * 1.8 ^ 0    4s    4s
2    4 * 1.8 ^ 1    7s    11s
3    4 * 1.8 ^ 2    13s    24s
4    4 * 1.8 ^ 3    23s    47s
5    4 * 1.8 ^ 4    42s    89s
giving up    4 * 1.8 ^ 5    76s    165s

And guess what I have in the config:

Quote
root@OPNsense:/tmp # cat /usr/local/etc/ipsec.conf
# This file is automatically generated. Do not edit
config setup
  uniqueids = yes

conn pass
  right=127.0.0.1 # so this connection does not get used for other purposes
  leftsubnet=10.30.0.0/16
  rightsubnet=10.30.0.0/16
  type=passthrough
  auto=route

conn con1
  aggressive = no
  fragmentation = yes
  keyexchange = ikev2
  mobike = yes
  reauth = yes
  rekey = yes
  forceencaps = no
  installpolicy = yes
  type = tunnel
  dpdaction = restart
  dpddelay = 2s
  dpdtimeout = 12s



DPD is incorrectly configured here; with IKEv2 it should use:
Quote
retransmit_tries    Integer    5    Number of retransmissions to send before giving up
retransmit_timeout    Double    4.0    Timeout in seconds
retransmit_base    Double    1.8    Base of exponential backoff

Regards,
Plamen
#25
After digging into this for the last 4-5 hours, I've found the following solutions (which may save hours for others with the same problem):

Solution 1:

Move to route-based IPsec (VTI)

Pros:
Good for more complex topologies.

Cons:
MTU and fragmentation issues (interface scrub must be disabled for IP fragmentation to work properly, and I'm not sure what happens with MSS clamping in that case)
More complex (you have to deal with tunnel IPs, new single gateways, complex routing, etc.)

My 2 cents:
VTI mode works really well on Cisco equipment, but not so well here (at least not in my virtual lab, with somewhat outdated software now). If the FRR plugin worked properly (in conjunction with an HA setup with config synchronization and CARP), it would be a perfect fit with VTI (route-based) mode.



Solution 2:

There's an option to have multiple Phase 2 entries and specify multiple subnets (instead of using the 10/8 aggregate).

Pros:
Simply works, no MTU issues, IP fragmentation works w/o any pf modifications.

Cons:
It simply works in small networks, but it becomes a real administrative pain if there are more spokes and you add new ones over time (you have to go to every other spoke and add the additional Phase 2 entry, and do the same on the hub).

 
Solution 3 (the best fit for me):

VPN -> IPSec -> Advanced Settings -> Passthrough networks
I've added the local /16 here on each spoke, which simply tells IPsec not to encrypt any local traffic.

Pros:
Fastest and simplest one

Cons:
I can't think of any disadvantages at this point.
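In ipsec.conf terms, a passthrough network ends up as a conn similar to the auto-generated "conn pass" I posted earlier; a sketch for a spoke whose local aggregate is 10.31.0.0/16 ("pass-local" is an illustrative name):

```
conn pass-local
  right=127.0.0.1          # dummy peer, never actually used
  leftsubnet=10.31.0.0/16  # local /16 of this spoke
  rightsubnet=10.31.0.0/16 # same subnet on both sides = local traffic
  type=passthrough         # bypass the 10/8 tunnel policy for this traffic
  auto=route
```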


Hope that helps.

Regards,
Plamen

#26
20.7 Legacy Series / How "Gateway" concept works
February 04, 2021, 01:47:28 PM
Hello,

I would like to understand the "Gateway" concept in OpnSense (I believe I'm missing something fundamental).
I have 2 static routes for exactly the same destination network (let's say 192.168.0.0/24) via two different single gateways: GW1 (1.1.1.1) with priority 10 and GW2 (2.2.2.2) with priority 20.

Because the priority of GW1 is lower than GW2's, that should mean (as per my understanding) that GW1 is the preferred exit point for 192.168.0.0/24 (which seems to be the case: based on the netstat -rnl4 output, 192.168.0.0/24 points to 1.1.1.1).

I've also enabled the monitoring feature on both single gateways (using the IP address of the next hop: for GW1 - 1.1.1.1, and for GW2 - 2.2.2.2).

What is the correct behavior if for some reason GW1 becomes unreachable (IP address 1.1.1.1 is not reachable)?
My assumption is that it will be displayed as offline (which it is), and because it's offline the entry in the routing table will no longer point to 1.1.1.1 (GW1) but to 2.2.2.2 (GW2), because 2.2.2.2 is still reachable, even though it has a worse priority. However, that's not what I'm seeing. Despite GW1 being offline, 192.168.0.0/24 still points to it (1.1.1.1) and not to GW2 (2.2.2.2).

What am I missing here?

Regards,
Plamen





#27
Hello,

I have an OPNsense Hub-and-Spoke topology with the following IP addressing scheme:

Hub: 10.30.0.0/16
Spoke1: 10.31.0.0/16
Spoke2: 10.32.0.0/16
Spoke3: 10.33.0.0/16
...and so on
SpokeX: 10.X.0.0/16
SpokeY: 10.Y.0.0/16

Please note - all Spokes have a connection to the HUB only; there's no direct Spoke-to-Spoke physical link. The only way for SpokeX to communicate with SpokeY is via the HUB.
All Spokes and the HUB are OPNsense firewalls, and there are no other firewalls/routers in the topology (only pure L2 switches and OPNsense FWs).

Of course, I need encryption between all Spokes and the HUB (including Spoke-to-Spoke traffic).

What I did so far (which seems to work to a certain extent) is:

Each Spoke has a static default gateway pointing to the HUB, and the HUB has a static route for each Spoke's subnet pointing to that Spoke's OPNsense FW.
Without encryption involved, I have full connectivity between each Spoke and the HUB, and between the Spokes themselves (via the HUB).

Each Spoke forms an IPSec tunnel to the HUB and for phase 2 I have:

SpokeX:
Local Subnet: 10.X.0.0/16
Remote Subnet: 10.0.0.0/8

Hub (to Spoke X):
Local Subnet: 10.0.0.0/8
Remote Subnet: 10.X.0.0/16

That works fine for connectivity between Spoke X and the HUB site, as well as between Spoke X and Spoke Y (via the HUB site), but the problem I'm facing and looking for a solution to is local Spoke traffic:

For example - Spoke 1 OpnSense FW has 2 inside logical interfaces:

VLAN100 - 10.31.100.1/24
VLAN200 - 10.31.200.1/24

Local hosts (in the Spoke1 site) living in VLAN100 (10.31.100.0/24) are NOT able to communicate with local hosts (again in the Spoke1 site) living in VLAN200 (10.31.200.0/24). These hosts are not even able to reach their default gateway. I assume their traffic is encrypted following the 10/8 phase 2 policy.

Is there any way I can exclude that local traffic from being encrypted? These 2 interfaces are locally connected to the same firewall, so I would like to have cleartext connectivity between the two local subnets, following the OPNsense routing table (directly connected routes).

Can I have more than 1 subnet in phase 2 (local or remote)?

Any suggestions are highly appreciated!

Regards,
Plamen


#28
Is there any easy way I can run the GUI in debug mode and see all interface-related commands (ifconfig ...) passed to the BSD OS?
#29
For the missing VLAN - I've checked, and although the VLAN interface is not visible in the GUI (neither under Interfaces -> Other Types -> VLAN, nor under Interfaces -> Assignments), the interface has been created at the BSD level, which of course again disrupts the whole LACP group.
#30
A little bit more on the "feature" - I'm starting to believe this is a bug.
I found out why there is a traffic glitch:
When I create a new VLAN on top of the LACP group, all physical interfaces (part of the LACP group) flap.
Moreover, adding a new VLAN from the GUI does not always work - quite often I click the "Apply" button and nothing happens (there's no error message, everything looks good, but the new VLAN is not created, can't be seen from the GUI, and I have to start over doing exactly the same thing in order to get the VLAN created).
Apparently this creates operational issues (no one wants to disrupt production traffic on the existing VLANs while trying to add a new VLAN...). My workaround for the moment is to create the VLAN on the standby unit first (yes, the physical interfaces flap, but at least there's no prod traffic), then force a CARP failover, create the VLAN on the new standby unit, and force a failover again....

I reproduced the same behavior in a virtual lab (VMware Workstation with 20.7.5 running as a VM) as well as on several physical Dell VEP devices. The same thing happens even on a single physical interface, without it being part of a LAGG.
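For reference, what the GUI does at the BSD level roughly corresponds to the following (interface names are illustrative; in my testing it is during this step that the lagg member ports flap):

```
# Create a VLAN interface tagged 100 on top of the LACP group and enable it
ifconfig vlan0 create vlan 100 vlandev lagg0
ifconfig vlan0 up
```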