OPNsense Forum

English Forums => General Discussion => Topic started by: toxic on March 21, 2021, 01:39:09 PM

Title: [SOLVED] Static LAGG 2GB/s to 2 clients 1 Gateway from 1 NAS
Post by: toxic on March 21, 2021, 01:39:09 PM
Hello,

I'm getting desperate. I need help finding a setup where 2 Windows clients can download files from my NAS over SMBv3, each at 1 Gbit/s at the same time, for a total of 2 Gbit/s sent from the NAS...

I've tried a lot of things and got LAGG working between lots of parties, reaching 2 Gbit/s several times, but never end-to-end from laptop to NAS.

What I have :

What I want :

If you have ideas you can stop reading here and propose them ;) If you have time, I'm now going to tell you what I tried that did not work...

What I would like but can compromise on :

I'm at a point where I'm considering the simplest setup, which fulfils almost none of the optional wishes: one flat network for all LAN, with each firewall acting as a bridge to the ISP (WAN) network with a CARP VIP. That would work, but I'd be mostly blind in Splunk as to what my NAS is doing for my LAN...

What I've tried, focusing on the primary router running in proxmox :

In almost all these configs, I do get the full 2 Gbit/s on several legs of the network (almost all):

In all these cases I am able to get 2 Gbit/s, except...
Cases 4 and 5 show where it breaks: opnSense is able to SEND to the NAS at the full 2 Gbit/s, but when running the iperf client with --reverse I don't get the full 2 Gbit/s; the two iperf clients only add up to 1 Gbit/s...

So I never reached the goal at the very top of this post: 2 Gbit/s from clients to NAS through opnSense...

Any help or idea would be greatly appreciated !

Thanks a lot for your reading and help !
Title: Re: [Help] Static LAGG 2GB/s to 2 clients 1 Gateway from 1 NAS
Post by: toxic on March 24, 2021, 07:30:07 PM
Ok, got this working, I had 2 issues in fact :


   1/ Getting the bond working with vlan-aware in proxmox
        - create bond of all phy interfaces you need
        - create vmbrX having this bond as slave
        - attach the opnSense VM to vmbrX to access all VLANs
        - attach VMs to vmbrX.Y with Y being the VLAN
        - all traffic works, from opnSense to vmbrX to the phy devices, with load balancing, no issues !
        - (what I did was attach vmbrY to bond0.Y and that is wrong but can almost work, it fails in subtle ways...)
   2/ Getting the Synology NAS to use more than layer2 hashing for its bond, because with a gateway in between, layer2 hashing will always put all the traffic to the gateway on the same phy device, regardless of the IP of the destination device...
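
To illustrate the second point, here is a rough sketch of a layer2 hash. It loosely follows the formula given in the Linux bonding documentation (last byte of source MAC XOR last byte of destination MAC XOR packet type, modulo the number of links). Whether the NAS uses exactly this formula is an assumption, and all MAC addresses below are made up.

```python
def layer2_link(src_mac, dst_mac, n_links=2, ethertype=0x0800):
    """Simplified layer2 policy: only MAC addresses and packet type go into the hash."""
    src_last = int(src_mac.split(":")[5], 16)
    dst_last = int(dst_mac.split(":")[5], 16)
    return (src_last ^ dst_last ^ ethertype) % n_links


# Hypothetical MAC addresses, for illustration only.
NAS_MAC = "02:00:00:00:00:01"
GATEWAY_MAC = "02:00:00:00:00:fe"
CLIENT_A_MAC = "02:00:00:00:00:10"
CLIENT_B_MAC = "02:00:00:00:00:11"

# Routed case: the NAS sends every frame to the gateway MAC, whatever the
# client IP is, so both downloads get the same link.
routed_a = layer2_link(NAS_MAC, GATEWAY_MAC)
routed_b = layer2_link(NAS_MAC, GATEWAY_MAC)

# Same-subnet case: the destination MAC is the client itself, so the two
# downloads can end up on different links.
direct_a = layer2_link(NAS_MAC, CLIENT_A_MAC)
direct_b = layer2_link(NAS_MAC, CLIENT_B_MAC)

print("routed:", routed_a, routed_b)
print("direct:", direct_a, direct_b)
```

The client IP is not an input to the function at all, which is why layer2 hashing cannot spread routed traffic.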


Right now I'm still facing some issues but I think I understand now :


What happens is that I can almost never get 2 Windows laptops to reach 2 Gbit/s downloading from the NAS, because I changed the bond on the NAS to use layer3+4 for the hash algorithm.

The 2 laptops are on 10.0.30.x while the NAS is on 10.0.11.x, so layer2 is not enough for the hash: all traffic goes to the gateway anyway, so both file downloads have the same layer 2 destination MAC address...

But layer3+4 load-balances many more flows than the 2 in question for this download, and SMB may even be using several TCP connections... So even if layer3+4 gives different hash values, the hash still has to choose between only 2 outgoing phy interfaces on the NAS. Inevitably there will be a lot of collisions and a lot of traffic that has to share the same phy.
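
To put a number on the collisions: a minimal sketch, assuming the hash spreads flows uniformly and independently over the links (real hashes are only approximately like that). It enumerates every possible placement of two flows and counts how often they share a link.

```python
from itertools import product


def prob_two_flows_collide(n_links):
    """Chance that two flows land on the same link under a uniform, independent hash."""
    placements = list(product(range(n_links), repeat=2))
    same_link = sum(1 for a, b in placements if a == b)
    return same_link / len(placements)


print(prob_two_flows_collide(2))  # 0.5
print(prob_two_flows_collide(4))  # 0.25
```

With only 2 links, two big downloads share a phy about half of the time under this assumption, regardless of how many header fields go into the hash.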


With this setup, playing around with VMs downloading from the NAS using iperf, the result depends on the port I choose, but I can reliably avoid the collision. I launch an iperf client downloading on one machine, then launch an iperf client on another machine. If there is a collision (bandwidth drops on the first iperf), I cancel and try again on the same machine with another port on the server, keeping the first iperf running. Rinse and repeat, changing ports; at some point you'll get lucky and see 2 Gbit/s. It's not always the same port combination even for the same IPs, but I always find one that works.
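
The port hunting above can be sketched like this. The hash is a simplified model based on the older formula from the Linux bonding documentation for layer3+4, ((src port XOR dst port) XOR ((src IP XOR dst IP) AND 0xffff)) modulo link count. Newer kernels hash differently and I don't know which one the NAS runs, so treat it as an assumption. The NAS IP, client IPs and ports are hypothetical. In practice the client's source port is usually not known in advance, which is why it takes trial and error.

```python
import ipaddress


def layer34_link(src_ip, dst_ip, src_port, dst_port, n_links=2):
    """Simplified layer3+4 policy (older documented formula, assumed here)."""
    ip_xor = int(ipaddress.IPv4Address(src_ip)) ^ int(ipaddress.IPv4Address(dst_ip))
    return ((src_port ^ dst_port) ^ (ip_xor & 0xFFFF)) % n_links


def find_server_port(busy_link, nas_ip, client_ip, client_port, candidates):
    """Return the first server port whose flow avoids busy_link, or None."""
    for port in candidates:
        if layer34_link(nas_ip, client_ip, port, client_port) != busy_link:
            return port
    return None


NAS_IP = "10.0.11.5"  # hypothetical address in the NAS subnet

# First download, already running: NAS port 5201 -> client 10.0.30.20.
first_link = layer34_link(NAS_IP, "10.0.30.20", 5201, 50000)

# Second download: try server ports until the flow lands on the other link.
second_port = find_server_port(first_link, NAS_IP, "10.0.30.21", 50001, range(5201, 5211))
print(first_link, second_port)
```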


In this manner, it takes some time but I can always find a way to get 2 Gbit/s from the NAS !

(In fact, I even got 2 Gbit/s downloading from the NAS all the while uploading 2 Gbit/s to the NAS ! I was not aware of it, but apparently a 1 Gbit/s NIC can do 1 down and 1 up at the same time, i.e. it is full duplex !)


So I was hoping that I could find a way to more reliably get 2 Gbit/s downloading from the NAS to 2 devices...


In fact, I don't have any single device that can exceed 1 Gbit/s by itself, so layer3+4 does not make sense in this case, I think.

My thinking now is that most of my file downloads will be from devices in 10.0.30.x downloading from the NAS in 10.0.11.x. If I change the hash algorithm to layer2+3 on the NAS, the layer 2 part will always be the same, that is the MAC of the gateway, but the layer 3 part will always be different, because each device on my net has its own IP and my gateway is not doing NAT between my LANs. So I hope there is a higher chance that with layer2+3 I can reach 2 Gbit/s more reliably... But in fact, the layer 2+3 inputs are already different every time right now; I'm just unlucky with hash collisions when downloading with Windows Explorer, where I can't control the port it uses... I got it working once or twice, but I had to go on the NAS and kill the existing connections, hoping that the new one would result in a different NIC being selected given the new srcPort in layer 4...
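
For comparison, a rough sketch of what a layer2+3 hash would do with these addresses. It loosely follows the formula in the Linux bonding documentation (MAC bytes and packet type, XORed with both IPs, then folded down). The exact hash on the NAS may differ, and the NAS IP, client IPs and MAC bytes are hypothetical.

```python
import ipaddress


def layer23_link(src_mac_last, dst_mac_last, src_ip, dst_ip, n_links=2, ethertype=0x0800):
    """Simplified layer2+3 policy: MACs and IPs go into the hash, ports do not."""
    h = src_mac_last ^ dst_mac_last ^ ethertype
    h ^= int(ipaddress.IPv4Address(src_ip)) ^ int(ipaddress.IPv4Address(dst_ip))
    h ^= h >> 16
    h ^= h >> 8
    return h % n_links


NAS_IP = "10.0.11.5"            # hypothetical
NAS_MAC_LAST, GW_MAC_LAST = 0x01, 0xFE  # hypothetical last MAC bytes

link_20 = layer23_link(NAS_MAC_LAST, GW_MAC_LAST, NAS_IP, "10.0.30.20")
link_21 = layer23_link(NAS_MAC_LAST, GW_MAC_LAST, NAS_IP, "10.0.30.21")
link_22 = layer23_link(NAS_MAC_LAST, GW_MAC_LAST, NAS_IP, "10.0.30.22")
print(link_20, link_21, link_22)
```

Under this model the link depends only on the addresses, so a given pair of clients either always collides or never does, whatever ports SMB picks. With 2 links, the .20 and .21 clients end up on different links while .20 and .22 share one. That only holds if the NAS really hashes this way.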


But in the end I also see that with so many IPs and only 2 physical NICs, there will be a lot of "hash collisions", and the logic is apparently not able to see that a specific physical NIC is overloaded and that some TCP connections could benefit from moving to the other NIC...

So I'm not really hopeful about changing to layer2+3 for the NAS bond...

It's disappointing to see 1 NIC on the NAS being overloaded and the other one just hanging around doing nothing while the two Windows laptops compete with each other for frames...


It's quite a disappointment to now understand how LAGG works... I will not be putting more NICs into my devices... Can't wait for 5 Gbit/s to catch up in the home market; sadly, even 2.5G is not there yet...