Messages - nwildner

#1
Quote from: AveryFreeman on October 22, 2020, 04:36:49 AM
You guys got me interested in this subject. I have tested plenty of iperf3 against my VMs in my little 3-host homelab, my 10GbE is just a couple DACs connected between the 10Gbe "backbone" IFs of my Dell Powerconnect 7048P, which is really more of a gigabit switch.

The infrastructure I have at the remote office I've been reporting on so far:

- PowerEdge R630 (2 servers)
- 2-socket Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz with 12 cores each (24 cores per server)
- 3x NetXtreme II BCM57800 10 Gigabit Ethernet (dual-port NICs), meaning 6 physical ports distributed across 3 virtual switches (2 NICs for VMs, 2 NICs for vMotion, 2 NICs for vmkernel)
- 512 GB RAM per server
- Plenty of storage on an external 12 Gbps SAS Dell MD3xxx array (2x 6 Gbps active + 2x 6 Gbps passive paths) with round-robin pathing
- 2x Dell N4032F as core/backbone switches with 10 Gbps ports, stacked over 2x 40 Gbps ports
- 6-port trunks for each server: 3 ports per trunk per stacking member, so each vSwitch NIC touches one stack member
- Stack members on the Dell N series are treated as a single unit, so LACP can be configured across stack members (no MLAG involved)

Even when transferring data between VMs that are not running on the same physical host, I can easily achieve 8 Gbps, except with the vmxnet3 driver on FreeBSD 12.1.

Quote from: mimugmail on October 22, 2020, 07:27:38 AM
Be honest with yourself: would you buy a piece of hardware with only 2 cores if you have the requirement for 10G? The smallest hardware with 10G interfaces has 4 cores minimum.

What is not honest is to pretend that a VM can't push more than 1 Gbps, or can't achieve decent throughput, just because it has only 1 vCPU configured; that simply isn't true. On the contrary, when virtualizing you should always size resources in a way that avoids CPU oversubscription. A 4-vCPU VM that is mostly idle and does not run CPU-intensive workloads will create problems for the other VMs on the same pool/share/physical host. For simple iperf3 and network transfer tests, FreeBSD 13 with 1 vCPU did fine, while OPNsense (FreeBSD 12.1) with 4 vCPUs and high CPU shares, being the only VM with that share configuration, crawled during transfers.

vmxnet3 on FreeBSD 12.1 is garbage. It seems the port to iflib introduced several regressions related to MSI-X, TX/RX queues, iflib leaking MSI-X messages, non-power-of-2 TX/RX queue configurations, and more. I could even find some LRO regressions in the commit log that could explain the retransmissions and the abysmal performance I reported on a previous page while trying to enable LRO as a workaround for that performance issue. https://svnweb.freebsd.org/base/head/sys/dev/vmware/vmxnet3/if_vmx.c?view=log

In the test above with FreeBSD 13-CURRENT I was using only 1 vCPU, 4 GB RAM, pvscsi, and vmxnet3, and the system performed great compared with the state of the vmxnet3 driver in FreeBSD 12.1-RELEASE.
#2
For those interested: I started a FreeBSD 13-CURRENT VM (2020-Oct-08 snapshot) with a vmxnet3 interface, created one 802.1Q VLAN, ran some iperf between it and a Linux VM and, BOOM! Full performance with 4 parallel streams:

[ ID] Interval           Transfer     Bandwidth       Retr
[  5]   0.00-10.23  sec  2.34 GBytes  1.96 Gbits/sec    0             sender
[  5]   0.00-10.23  sec  2.34 GBytes  1.96 Gbits/sec                  receiver
[  7]   0.00-10.23  sec  2.09 GBytes  1.75 Gbits/sec    0             sender
[  7]   0.00-10.23  sec  2.09 GBytes  1.75 Gbits/sec                  receiver
[  9]   0.00-10.23  sec  1.67 GBytes  1.40 Gbits/sec    0             sender
[  9]   0.00-10.23  sec  1.67 GBytes  1.40 Gbits/sec                  receiver
[ 11]   0.00-10.23  sec  1.65 GBytes  1.39 Gbits/sec    0             sender
[ 11]   0.00-10.23  sec  1.65 GBytes  1.39 Gbits/sec                  receiver
[SUM]   0.00-10.23  sec  7.75 GBytes  6.50 Gbits/sec    0             sender
[SUM]   0.00-10.23  sec  7.75 GBytes  6.50 Gbits/sec                  receiver
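
For anyone wanting to reproduce it, the test boiled down to something like this (addresses are placeholders and the VLAN ID is just an example):

# create an 802.1Q vlan on top of the vmxnet3 interface and give it an address
ifconfig vlan117 create vlan 117 vlandev vmx0
ifconfig vlan117 inet 10.254.117.x/24 up
# 4 parallel streams against the Linux VM running "iperf3 -s"
iperf3 -c 10.254.117.linux -P 4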

Maybe this is a regression in 12.1.

#3
Quote from: AdSchellevis on October 20, 2020, 12:53:16 PM
@nwilder: would you be so kind not to keep spreading inaccurate / false information around. We don't use any modifications on the vmx driver, which can do more than 1Gbps at ease on a stock FreeBSD 12.1. LRO shouldn't be used on a router for obvious reasons (also pointed at in my earlier post https://forum.opnsense.org/index.php?topic=18754.msg90576#msg90576).

All right. I've tried LRO/TSO/vlan_hwfilter because I'm running out of options here. I tried all those sysctls from that FreeBSD bug report, and no noticeable performance increase was observed after tuning the TX/RX descriptors. Still the same 800 Mbps cap whenever OPNsense transfers to another host.
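
For context, descriptor tuning on the iflib-based vmx driver goes through the loader tunables documented in iflib(4); a sketch of the kind of /boot/loader.conf entries involved (names per iflib(4), values only illustrative of what I tried; a reboot is required):

# comma-separated list, one entry per ring in the queue set; 0 keeps the driver default
dev.vmx.0.iflib.override_ntxds="0,4096"
dev.vmx.0.iflib.override_nrxds="0,2048,0"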

Other tests I've made:

1 - iperf from one VLAN interface to another, same parent interface (vmx0): I made another test by having iperf listen on one VLAN interface (parent vmx0) while binding the client to another VLAN interface (parent also vmx0) on this OPNsense box, and got pretty good forwarding rates:

iperf3 -c 10.254.117.ip -B 10.254.110.ip -P 8 -w 128k -p 5201
[SUM]   0.00-10.00  sec  8.86 GBytes  7.61 Gbits/sec    0             sender
[SUM]   0.00-10.16  sec  8.86 GBytes  7.49 Gbits/sec                  receiver


I was just trying to test internal forwarding.

2 - Disable IPsec and its passthrough-related configs: Thinking that IPsec could be throttling the connection through its passthrough tunnels for traffic coming in/out of the VLAN interfaces, I disabled all IPsec configs; iperf still topped out at 800 Mbps from the firewall to the Windows/Linux servers.

3 - Disable PF: After disabling the IPsec tunnels I disabled pf entirely, did a fresh boot, and put OPNsense in router mode. No luck (still the same iperf performance).

4 - Add VLAN 117 to a new vmx interface, letting the hypervisor tag it: I presented a new interface with VLAN 117 tagged by the hypervisor and changed the assignment inside OPNsense ONLY for this specific server network. The iperf tests kept showing the same speed.

Additional logs: bug id=237166 threw some light on this issue, and I found that MSI-X vectors aren't being handled correctly under VMware (considering that the MSI-X related issues were already resolved on the FreeBSD side). I'm looking for any documentation that could help me with this case. I'll try to tinker with hw.pci.honor_msi_blacklist=0 in loader.conf to see if I get better performance.
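
For reference, that is just one line in /boot/loader.conf (it's a loader tunable, so a reboot is needed for it to apply):

# stop honoring the PCI MSI/MSI-X blacklist; with it honored, vmx falls back to plain MSI as shown below
hw.pci.honor_msi_blacklist="0"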

vmx0: <VMware VMXNET3 Ethernet Adapter> port 0x5000-0x500f mem 0xfd4fc000-0xfd4fcfff,0xfd4fd000-0xfd4fdfff,0xfd4fe000-0xfd4fffff irq 19 at device 0.0 on pci4
vmx0: Using 4096 TX descriptors and 2048 RX descriptors
vmx0: Using 4 RX queues 4 TX queues
vmx0: failed to allocate 5 MSI-X vectors, err: 6
vmx0: Using an MSI interrupt
vmx0: Ethernet address: 00:50:56:a5:d3:68
vmx0: netmap queues/slots: TX 1/4096, RX 1/4096


Edit: "hw.pci.honor_msi_blacklist: 0" removed the error from the log, but transfer rates remain the same:

vmx0: <VMware VMXNET3 Ethernet Adapter> port 0x5000-0x500f mem 0xfd4fc000-0xfd4fcfff,0xfd4fd000-0xfd4fdfff,0xfd4fe000-0xfd4fffff irq 19 at device 0.0 on pci4
vmx0: Using 4096 TX descriptors and 2048 RX descriptors
vmx0: Using 4 RX queues 4 TX queues
vmx0: Using MSI-X interrupts with 5 vectors
vmx0: Ethernet address: 00:50:56:a5:d3:68
vmx0: netmap queues/slots: TX 4/4096, RX 4/4096

root@fw01adb:~ # sysctl -a | grep blacklis
vm.page_blacklist:
hw.pci.honor_msi_blacklist: 0



I hope some of my tests can shed light on this issue.
#4
Quote from: mimugmail on October 20, 2020, 06:03:29 AM
You wrote vmxnet can't handle more than one gig, which is not true. Now when someone googles a similar problem they might think it's a general limitation. I have no idea about hypervisors, but I don't want wrong facts going around.

Just read my reports again.

vmxnet3 is not handling more than 1 Gbps on FreeBSD (maybe due to OPNsense-specific patches). I never said vmxnet3 itself is garbage, and as you can see, Linux handles the traffic just fine. I have other physical machines in different offices and vmxnet3 works well there with Linux and Windows.

And if you google for solutions you will find plenty of information (which also means plenty of misinformation). Bugs and other fixes (maybe iflib/vmx related) that COULD work:


UPDATE REPORT: I had to disable LRO, TSO, and vlan_hwfilter since they made traffic entering that interface horribly slow (7 Mbps max), and that is a regression we could not live with.

Better to have an interface doing 1 Gbps in both directions than one doing 4.5 Gbps in only one.
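
For the record, backing those offloads out from a shell is just flipping the interface capabilities off (vmx0 assumed; the persistent way is the corresponding checkboxes under Interfaces > Settings):

# disable large receive offload, TCP segmentation offload and VLAN hardware filtering
ifconfig vmx0 -lro -tso -vlanhwfilter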
#5
Quote from: Gauss23 on October 19, 2020, 08:37:01 PM
What about this idea?
https://xenomorph.net/freebsd/performance-esxi/

Well, only enabling LRO didn't change much. The guy who wrote that tutorial is using the same NIC series I am, so it was worth trying to enable LRO, TSO, and vlan_hwfilter, and after that things got a loooot better.
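
The knobs from that tutorial map to these interface capabilities (shell sketch with vmx0 assumed; OPNsense exposes the same toggles under Interfaces > Settings):

# enable large receive offload, TCP segmentation offload and VLAN hardware filtering
ifconfig vmx0 lro tso vlanhwfilter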

Still not hitting 10 Gbps, but I could get almost 5 Gbps, which is pretty good:

Only enabling lro:
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.17  sec   118 MBytes  97.5 Mbits/sec    0             sender
[  5]   0.00-10.17  sec   118 MBytes  97.5 Mbits/sec                  receiver
[  7]   0.00-10.17  sec   120 MBytes  98.9 Mbits/sec    0             sender
[  7]   0.00-10.17  sec   120 MBytes  98.9 Mbits/sec                  receiver
[  9]   0.00-10.17  sec   120 MBytes  98.8 Mbits/sec    0             sender
[  9]   0.00-10.17  sec   120 MBytes  98.8 Mbits/sec                  receiver
[ 11]   0.00-10.17  sec   117 MBytes  96.8 Mbits/sec    0             sender
[ 11]   0.00-10.17  sec   117 MBytes  96.8 Mbits/sec                  receiver
[ 13]   0.00-10.17  sec   118 MBytes  97.4 Mbits/sec    0             sender
[ 13]   0.00-10.17  sec   118 MBytes  97.4 Mbits/sec                  receiver
[ 15]   0.00-10.17  sec   119 MBytes  98.0 Mbits/sec    0             sender
[ 15]   0.00-10.17  sec   119 MBytes  98.0 Mbits/sec                  receiver
[ 17]   0.00-10.17  sec  90.8 MBytes  74.9 Mbits/sec    0             sender
[ 17]   0.00-10.17  sec  90.8 MBytes  74.9 Mbits/sec                  receiver
[ 19]   0.00-10.17  sec  72.2 MBytes  59.6 Mbits/sec    0             sender
[ 19]   0.00-10.17  sec  72.2 MBytes  59.6 Mbits/sec                  receiver
[SUM]   0.00-10.17  sec   875 MBytes   722 Mbits/sec    0             sender
[SUM]   0.00-10.17  sec   875 MBytes   722 Mbits/sec                  receiver

iperf Done.

vmx0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=800428<VLAN_MTU,JUMBO_MTU,LRO>
        ether 00:50:56:a5:d3:68
        inet6 fe80::250:56ff:fea5:d368%vmx0 prefixlen 64 scopeid 0x1
        media: Ethernet autoselect
        status: active
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>


lro, tso and vlan_hwfilter enabled:
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.01  sec  1.08 GBytes   929 Mbits/sec    0             sender
[  5]   0.00-10.01  sec  1.08 GBytes   929 Mbits/sec                  receiver
[  7]   0.00-10.01  sec   510 MBytes   427 Mbits/sec    0             sender
[  7]   0.00-10.01  sec   510 MBytes   427 Mbits/sec                  receiver
[  9]   0.00-10.01  sec  1.05 GBytes   903 Mbits/sec    0             sender
[  9]   0.00-10.01  sec  1.05 GBytes   903 Mbits/sec                  receiver
[ 11]   0.00-10.01  sec   953 MBytes   799 Mbits/sec    0             sender
[ 11]   0.00-10.01  sec   953 MBytes   799 Mbits/sec                  receiver
[ 13]   0.00-10.01  sec   447 MBytes   375 Mbits/sec    0             sender
[ 13]   0.00-10.01  sec   447 MBytes   375 Mbits/sec                  receiver
[ 15]   0.00-10.01  sec   409 MBytes   342 Mbits/sec    0             sender
[ 15]   0.00-10.01  sec   409 MBytes   342 Mbits/sec                  receiver
[ 17]   0.00-10.01  sec   379 MBytes   318 Mbits/sec    0             sender
[ 17]   0.00-10.01  sec   379 MBytes   318 Mbits/sec                  receiver
[ 19]   0.00-10.01  sec   825 MBytes   691 Mbits/sec    0             sender
[ 19]   0.00-10.01  sec   825 MBytes   691 Mbits/sec                  receiver
[SUM]   0.00-10.01  sec  5.57 GBytes  4.78 Gbits/sec    0             sender
[SUM]   0.00-10.01  sec  5.57 GBytes  4.78 Gbits/sec                  receiver

iperf Done.
root@fw01adb:~ # ifconfig vmx0
vmx0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=8507b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWFILTER,VLAN_HWTSO>
        ether 00:50:56:a5:d3:68
        inet6 fe80::250:56ff:fea5:d368%vmx0 prefixlen 64 scopeid 0x1
        media: Ethernet autoselect
        status: active
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>

#6
Quote from: Supermule on October 19, 2020, 09:42:48 PM
Where do you manually edit the rc.conf??

There is an option inside the web administration:

Interfaces > Settings > Hardware LRO > uncheck it to enable LRO
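
To double-check from a shell that LRO really got enabled on the NIC after unchecking that box, it should show up in the interface options (vmx0 here):

ifconfig vmx0 | grep options
# expect LRO to be listed, e.g. options=...<VLAN_MTU,JUMBO_MTU,LRO>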
#7
Quote from: Gauss23 on October 19, 2020, 08:37:01 PM
What about this idea?
https://xenomorph.net/freebsd/performance-esxi/

I'll try as soon as our users stop doing transfers at that remote office :)
Nice catch.
#8
Quote from: mimugmail on October 19, 2020, 07:38:33 PM
I have customers pushing 6Gbit over vmxnet driver.

OK. And what am I supposed to do with this information? Not trying to be rude, but there are plenty of reports in this topic that go against your scenario.

Do you have any idea what I could tune to achieve better performance, then?
#9
Quote from: mimugmail on October 13, 2020, 07:20:17 AM
It's under investigation; 20.7.4 may bring an already fixed kernel.

Just to add more info to this topic: vmxnet3 can't handle more than 1 Gbps when traffic-testing OPNsense to Windows (including reverse-mode iperf) on the same VLAN. It's a big hit, since our users frequently access fileserver and PDM (AutoCAD-like) data sitting on different VLANs (and thus all that traffic is forwarded by OPNsense). The whole network is 10 Gbps, including user workstations and the ESXi 6.7u3 servers.

We noticed a big hit on transfer speeds after switching our firewall vendor to OPNsense at that location, and we believe it is related to this vmxnet3 case.

OPNSense VM Specs:

  • 4 vCPU - Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz
  • 8 GB RAM
  • vmxnet3 attached to a vSwitch with 2x 10 Gbps QLogic NetXtreme II BCM57800 (Broadcom, Dell OEM)
  • 10 VLANs

OPNsense and Windows server, same VLAN, with OPNsense as the gateway of this server VLAN:
OPNSENSE to WINDOWS:
iperf3 -c 10.254.win.ip -P 8 -w 128k -p 5201
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.02  sec   118 MBytes  98.8 Mbits/sec    0             sender
[  5]   0.00-10.02  sec   118 MBytes  98.8 Mbits/sec                  receiver
[  7]   0.00-10.02  sec   116 MBytes  96.8 Mbits/sec    0             sender
[  7]   0.00-10.02  sec   116 MBytes  96.8 Mbits/sec                  receiver
[  9]   0.00-10.02  sec   113 MBytes  94.5 Mbits/sec    0             sender
[  9]   0.00-10.02  sec   113 MBytes  94.5 Mbits/sec                  receiver
[ 11]   0.00-10.02  sec   109 MBytes  91.5 Mbits/sec    0             sender
[ 11]   0.00-10.02  sec   109 MBytes  91.5 Mbits/sec                  receiver
[ 13]   0.00-10.02  sec   107 MBytes  89.7 Mbits/sec    0             sender
[ 13]   0.00-10.02  sec   107 MBytes  89.7 Mbits/sec                  receiver
[ 15]   0.00-10.02  sec  99.8 MBytes  83.5 Mbits/sec    0             sender
[ 15]   0.00-10.02  sec  99.8 MBytes  83.5 Mbits/sec                  receiver
[ 17]   0.00-10.02  sec  82.0 MBytes  68.7 Mbits/sec    0             sender
[ 17]   0.00-10.02  sec  82.0 MBytes  68.7 Mbits/sec                  receiver
[ 19]   0.00-10.02  sec  71.2 MBytes  59.6 Mbits/sec    0             sender
[ 19]   0.00-10.02  sec  71.2 MBytes  59.6 Mbits/sec                  receiver
[SUM]   0.00-10.02  sec   816 MBytes   683 Mbits/sec    0             sender
[SUM]   0.00-10.02  sec   816 MBytes   683 Mbits/sec                  receiver


OPNSENSE to WINDOWS(iperf3 reverse mode):
iperf3 -c 10.254.win.ip -P 8 -R -w 128k -p 5201
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.00  sec  88.4 MBytes  74.1 Mbits/sec                  sender
[  5]   0.00-10.00  sec  88.2 MBytes  74.0 Mbits/sec                  receiver
[  7]   0.00-10.00  sec   118 MBytes  98.7 Mbits/sec                  sender
[  7]   0.00-10.00  sec   117 MBytes  98.5 Mbits/sec                  receiver
[  9]   0.00-10.00  sec  91.9 MBytes  77.1 Mbits/sec                  sender
[  9]   0.00-10.00  sec  91.7 MBytes  76.9 Mbits/sec                  receiver
[ 11]   0.00-10.00  sec  91.6 MBytes  76.9 Mbits/sec                  sender
[ 11]   0.00-10.00  sec  91.5 MBytes  76.7 Mbits/sec                  receiver
[ 13]   0.00-10.00  sec  92.6 MBytes  77.7 Mbits/sec                  sender
[ 13]   0.00-10.00  sec  92.4 MBytes  77.5 Mbits/sec                  receiver
[ 15]   0.00-10.00  sec  94.4 MBytes  79.2 Mbits/sec                  sender
[ 15]   0.00-10.00  sec  94.2 MBytes  79.0 Mbits/sec                  receiver
[ 17]   0.00-10.00  sec   100 MBytes  84.3 Mbits/sec                  sender
[ 17]   0.00-10.00  sec   100 MBytes  84.1 Mbits/sec                  receiver
[ 19]   0.00-10.00  sec  99.9 MBytes  83.8 Mbits/sec                  sender
[ 19]   0.00-10.00  sec  99.6 MBytes  83.6 Mbits/sec                  receiver
[SUM]   0.00-10.00  sec   777 MBytes   652 Mbits/sec                  sender
[SUM]   0.00-10.00  sec   775 MBytes   650 Mbits/sec                  receiver


Linux VM Specs:

  • 1 vCPU - Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz
  • 4 GB RAM
  • vmxnet3 attached to a vSwitch with 2x 10 Gbps QLogic NetXtreme II BCM57800 (Broadcom, Dell OEM)
  • VM network (port group) attached to the VM with no visibility of VLAN tags



Linux server and Windows server, same VLAN, since they are both placed on the "servers VLAN":
LINUX TO WINDOWS:
iperf3 -c 10.254.win.ip -P 8 -w 128k -p 5201
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  1.17 GBytes  1.00 Gbits/sec  128             sender
[  4]   0.00-10.00  sec  1.17 GBytes  1.00 Gbits/sec                  receiver
[  6]   0.00-10.00  sec   275 MBytes   231 Mbits/sec   69             sender
[  6]   0.00-10.00  sec   275 MBytes   231 Mbits/sec                  receiver
[  8]   0.00-10.00  sec  1.12 GBytes   961 Mbits/sec  150             sender
[  8]   0.00-10.00  sec  1.12 GBytes   961 Mbits/sec                  receiver
[ 10]   0.00-10.00  sec  1.13 GBytes   972 Mbits/sec   98             sender
[ 10]   0.00-10.00  sec  1.13 GBytes   972 Mbits/sec                  receiver
[ 12]   0.00-10.00  sec   264 MBytes   222 Mbits/sec   37             sender
[ 12]   0.00-10.00  sec   264 MBytes   222 Mbits/sec                  receiver
[ 14]   0.00-10.00  sec  1.13 GBytes   973 Mbits/sec  109             sender
[ 14]   0.00-10.00  sec  1.13 GBytes   973 Mbits/sec                  receiver
[ 16]   0.00-10.00  sec   280 MBytes   235 Mbits/sec   34             sender
[ 16]   0.00-10.00  sec   280 MBytes   235 Mbits/sec                  receiver
[ 18]   0.00-10.00  sec   246 MBytes   206 Mbits/sec   64             sender
[ 18]   0.00-10.00  sec   246 MBytes   206 Mbits/sec                  receiver
[SUM]   0.00-10.00  sec  5.59 GBytes  4.81 Gbits/sec  689             sender
[SUM]   0.00-10.00  sec  5.59 GBytes  4.80 Gbits/sec                  receiver


LINUX TO WINDOWS (reverse-mode iperf): this is where iperf and vmxnet3 reach their full potential
iperf3 -c 10.254.win.ip -P 8 -R -w 128k -p 5201
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-10.00  sec  3.17 GBytes  2.72 Gbits/sec                  sender
[  4]   0.00-10.00  sec  3.17 GBytes  2.72 Gbits/sec                  receiver
[  6]   0.00-10.00  sec  3.10 GBytes  2.66 Gbits/sec                  sender
[  6]   0.00-10.00  sec  3.10 GBytes  2.66 Gbits/sec                  receiver
[  8]   0.00-10.00  sec  2.91 GBytes  2.50 Gbits/sec                  sender
[  8]   0.00-10.00  sec  2.91 GBytes  2.50 Gbits/sec                  receiver
[ 10]   0.00-10.00  sec  3.00 GBytes  2.58 Gbits/sec                  sender
[ 10]   0.00-10.00  sec  3.00 GBytes  2.58 Gbits/sec                  receiver
[ 12]   0.00-10.00  sec  2.78 GBytes  2.39 Gbits/sec                  sender
[ 12]   0.00-10.00  sec  2.78 GBytes  2.39 Gbits/sec                  receiver
[ 14]   0.00-10.00  sec  2.85 GBytes  2.45 Gbits/sec                  sender
[ 14]   0.00-10.00  sec  2.85 GBytes  2.45 Gbits/sec                  receiver
[ 16]   0.00-10.00  sec  2.68 GBytes  2.31 Gbits/sec                  sender
[ 16]   0.00-10.00  sec  2.68 GBytes  2.31 Gbits/sec                  receiver
[ 18]   0.00-10.00  sec  2.63 GBytes  2.26 Gbits/sec                  sender
[ 18]   0.00-10.00  sec  2.63 GBytes  2.26 Gbits/sec                  receiver
[SUM]   0.00-10.00  sec  23.1 GBytes  19.9 Gbits/sec                  sender
[SUM]   0.00-10.00  sec  23.1 GBytes  19.9 Gbits/sec                  receiver
#10
Sorry to necrobump this post, but this is just feedback on why I've stopped posting about this reconnection method.

Since we moved from Checkpoint to Fortigate to manage our SD-WAN solution at the main site, things got a lot better. Checkpoint's IPsec implementation is GARBAGE and requires a lot of kludges to make it work.

Tunnel isolation, IKEv1 only, no DPD (by the way, dead peer detection only works site-to-site between Checkpoint peers) and other things this proprietary firewall does to cripple third-party integration. It was deliberately designed to integrate badly with other firewalls.

With Fortigate at our main site, it is just a matter of configuring the DPD counters to keep the tunnel reconnecting on WAN outages, with tunnel restart handled on our main site.
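
On the OPNsense/strongSwan side these are the standard DPD options; expressed in ipsec.conf terms it is roughly this (illustrative values, tune them to your outage profile):

dpdaction = restart   # re-initiate the tunnel from our side when the peer goes silent
dpddelay = 10s        # interval between DPD keepalives
dpdtimeout = 60s      # IKEv1 only: declare the peer dead after this long without a reply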

Maybe this info will be useful to others trying to integrate with the garbage IPsec provided by Checkpoint.
#11
Actually, this solution works fine for short link outages (10-15 min).

Today our ISP suffered problems that kept the link down for about 2 hours, and strongSwan seems to hang in a way that configctl isn't able to really stop. This is what happens after a lot of restarts (once a minute, as I've configured):


Aug 28 10:26:33 fw01 monit[75637]: 'IPSEC_RELOAD' ping test failed
Aug 28 10:26:33 fw01 monit[75637]: 'IPSEC_RELOAD' trying to restart
Aug 28 10:26:33 fw01 monit[75637]: 'IPSEC_RELOAD' stop: '/usr/local/sbin/configctl ipsec stop'
Aug 28 10:26:34 fw01 ipsec_starter[5513]: ipsec starter stopped
Aug 28 10:26:34 fw01 monit[75637]: 'IPSEC_RELOAD' start: '/usr/local/sbin/configctl ipsec start'
Aug 28 10:26:35 fw01 ipsec_starter[74021]: Starting strongSwan 5.8.0 IPsec [starter]...
Aug 28 10:26:35 fw01 ipsec_starter[74021]: charon is already running (/var/run/charon.pid exists) -- skipping daemon start


Any clues?
#12
Well, since I was afraid I wouldn't have enough time to investigate (my coworkers weren't happy with this IPsec issue), I did a config export on the current firewall and imported it into a fresh 19.7 VM.

It did the trick.

configctl is restarting the tunnel whenever my monit rule matches again :)

Cheers.
#13
Hi. Weeks ago I managed to create a way to reconnect bogus IPsec tunnels.

However, after upgrading from 19.1.10 to 19.7 (and to 19.7.2 after that), configctl isn't able to stop/kill strongSwan anymore.

Every time I issue /usr/local/sbin/configctl ipsec stop, an "OK" is printed on the screen, but ipsec statusall shows that the tunnel is still running with only the bypass networks connection, creating a situation where no connection to our main office is available. If I try to stop the service again, another "OK" is printed without the service really stopping.
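
A quick way to see the mismatch from a shell (this is essentially what I'm looking at):

/usr/local/sbin/configctl ipsec stop   # prints "OK"
ipsec statusall                        # tunnel still listed, only the bypass connection up
pgrep -lf charon                       # charon is still running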

I have a second OPNsense installation at another remote site that was deployed directly on 19.7, without the major version upgrade, with the same configuration (local network addressing being the only exception), and there this feature works great.

Is there anything else I could do to help investigate this issue?



#14
Quote from: frmoronari on February 27, 2019, 10:47:48 PM
Good evening;

Frédney @frmoronari.

Espírito Santo.

Starting with OPNSense this week; Network / Free Software Solutions Analyst.
20 years of experience in the field, but learning something new every day.

You around here too, frmoronari? ;)

My name is Nícolas Wildner (@nwildner), I'm from Bento Gonçalves/RS and I started with OPNSense a short while ago.

I'm using it as a firewall at remote offices with IPsec + VLANs, and so far I've liked the solution.
#15
Well, this is how we built a solution for this case: ping a host inside our main site that is only reachable through IPsec; if it isn't reachable, restart IPsec.

Steps:

Services > Monit > Settings
General Settings:

Set Polling Interval to 120 and Start Delay to 60. This makes Monit run its checks every 2 minutes. Check Enable Monit.

Service Test Settings > Add New
Name: IPSEC_ICMP_MONITOR
Condition: failed ping4 count 5 address 10.x.x.254
Action: Restart

10.x.x.254 is the IP address of this firewall's LAN interface. This ensures I'm pinging from a source address that can reach that host inside the IPsec tunnel. Keep in mind that I have a specific SPD rule that takes care of delivering this traffic.

Service Settings > Add New
Name: REDIAL_IPSEC
Type: Remote Host
Address: 172.y.y.y
Start: /usr/local/sbin/configctl ipsec start
Stop: /usr/local/sbin/configctl ipsec stop
Tests: IPSEC_ICMP_MONITOR

172.y.y.y is our main AD server. It could be any host of real importance that you know will always be up and running inside your main site.
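
For reference, the resulting Monit configuration should look roughly like this (a sketch; the file OPNsense actually generates may differ in detail):

set daemon 120 with start delay 60

check host REDIAL_IPSEC with address 172.y.y.y
    start program = "/usr/local/sbin/configctl ipsec start"
    stop program = "/usr/local/sbin/configctl ipsec stop"
    if failed ping4 count 5 address 10.x.x.254 then restart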

Done. No more manual intervention on this host.