With Load Balancing: low performance and lost web sessions

Started by DarkCorner, January 30, 2024, 10:21:15 AM

I followed both the official documentation and various online guides to configure load balancing across two lines with identical characteristics.
I also read several posts in the forum.
Nonetheless, performance is poor, and web sessions are often interrupted or pages are unreachable.

The configuration is the standard one, with a few small changes:

  • System: Gateways: Group / Trigger Level = Packet Loss or High Latency
  • System: Settings: General / DNS servers: 1) 9.9.9.9, 2) the IP monitored by WAN1, assigned to GW1, 3) the IP monitored by WAN2, assigned to GW2
  • System: Settings: General / DNS server options = Off
  • Firewall: Rules: LAN: the destination is set to "LAN address" rather than the firewall IP because, in my opinion, it should match.

When the gateway for the "any/any" rule is set to LoadBalancing, the problems appear; setting it back to "default" makes everything work normally again.

I chose load balancing over failover to make better use of the available connectivity, assuming that if one of the two lines goes down, all traffic is still moved to the remaining one.

I don't think these small changes are the cause.
What am I doing wrong?



I'm closing the post with an update.
I waited for the latest version of OPNsense (currently 24.1_1).
For a test lasting about an hour, I used:

  • A Win11 PC on the LAN with a YouTube session and web searches.
  • A Win10 PC on the LAN running several Windows Updates.
  • Two Linux PCs on GuestNet, both running apt update and a YouTube session.
  • A NAS in the DMZ performing a heavy Dropbox storage update.

Load balancing is working: the traffic graph shows activity on both WAN and WAN2.
However, although I found no errors in the logs, traffic was not continuous: there were frequent interruptions, and some web pages could not be opened.
Only when I set the rules back to the default gateway did the problems disappear.
LAN, GuestNet and DMZ each have only two rules: one for DNS (on the default gateway) and the "Pass Any Any" rule (on the LoadBalancing gateway group).
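The two-rule setup relies on rule order: with "Quick" enabled (the default for OPNsense interface rules), the first matching rule wins, so DNS leaves through the default gateway and everything else through the LoadBalancing group. The toy model below illustrates that first-match ordering; it is not OPNsense code, and the `Rule` class, `select_gateway()` and the assumption that the DNS rule matches destination port 53 are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Rule:
    description: str
    dst_port: Optional[int]  # None stands for "any"
    gateway: str             # "default" means the normal routing table is used

    def matches(self, dst_port: int) -> bool:
        return self.dst_port is None or self.dst_port == dst_port


# Hypothetical model of the two rules on LAN, GuestNet and DMZ.
# Assumption: the DNS rule matches destination port 53.
RULES: List[Rule] = [
    Rule("DNS", 53, "default"),
    Rule("Pass Any Any", None, "LoadBalancing"),
]


def select_gateway(dst_port: int, rules: List[Rule] = RULES) -> str:
    """Return the gateway of the first matching rule (first-match semantics)."""
    for rule in rules:
        if rule.matches(dst_port):
            return rule.gateway
    # No rule matched: the interface's default deny would drop the packet.
    return "blocked"


if __name__ == "__main__":
    print(select_gateway(53))   # DNS query
    print(select_gateway(443))  # HTTPS
```

Running it prints `default` and then `LoadBalancing`. Swapping the two entries in `RULES` would send DNS through the gateway group as well, which is why the DNS rule has to sit above "Pass Any Any".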

Having received no support here, not even a suggestion to enable additional logging, I have decided to deactivate Multi WAN.

I'm reopening the post: I reactivated Multi WAN to try one more variation, changing System: Gateways: Group / Trigger Level to "Member Down".
A new short test with just two Windows PCs shows no problems so far.

My reasoning is as follows.
Previously I had no DNS problems, nor were there any errors reported in the logs.
Pages simply didn't load, and you had to refresh to open them. The problem was therefore in reaching a URL that had already been correctly resolved, while simply opening google.com might work without any problem.

So OPNsense was spending too much time, probably deciding which of the two WANs to forward the traffic to.

Changing the Trigger Level from "Packet Loss or High Latency" to "Member Down" probably reduces the forwarding delay, although I don't understand why (assuming this change is really what made the difference).

What is certain is that, with this setting, traffic is no longer shifted to the second line when the first one performs poorly, but only when it stops completely.
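The trade-off between the two trigger levels can be modelled as a rule for when a member leaves the group. The sketch below is a simplified assumption about that behaviour, not OPNsense's actual gateway monitoring code; the trigger names, the `GatewayStatus` class and the threshold values are hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical thresholds for illustration only; in OPNsense they come from
# each gateway's own monitoring settings, not from these constants.
LOSS_THRESHOLD_PCT = 10.0
LATENCY_THRESHOLD_MS = 500.0


@dataclass
class GatewayStatus:
    name: str
    down: bool
    loss_pct: float
    latency_ms: float


def is_excluded(gw: GatewayStatus, trigger: str) -> bool:
    """Decide whether a group member is taken out of rotation."""
    if gw.down:
        return True  # a dead member is dropped under every trigger level
    lossy = gw.loss_pct >= LOSS_THRESHOLD_PCT
    slow = gw.latency_ms >= LATENCY_THRESHOLD_MS
    if trigger == "member_down":
        return False  # degraded but alive: it stays in the group
    if trigger == "packet_loss":
        return lossy
    if trigger == "high_latency":
        return slow
    if trigger == "loss_or_latency":
        return lossy or slow
    raise ValueError(f"unknown trigger level: {trigger}")


def active_members(gateways: List[GatewayStatus], trigger: str) -> List[str]:
    """Names of the members that still receive new connections."""
    return [gw.name for gw in gateways if not is_excluded(gw, trigger)]


if __name__ == "__main__":
    wan1 = GatewayStatus("WAN", down=False, loss_pct=0.0, latency_ms=20.0)
    wan2 = GatewayStatus("WAN2", down=False, loss_pct=15.0, latency_ms=40.0)
    for trigger in ("loss_or_latency", "member_down"):
        print(trigger, active_members([wan1, wan2], trigger))
```

With WAN2 showing 15% loss, the "or" trigger leaves only WAN in rotation, while "member down" keeps both, which matches the point above: a degraded line keeps receiving traffic until it fails completely.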

Having said all this, I'm ending the test at least until I have more information.

Well, the performance degradation certainly comes from states being reset for one reason or another. It's not easy to gather that information by asking text-based questions.

One thing you should try is Firewall: Settings: Advanced: Multi WAN:

Checking "Use sticky connections" should help.

Unchecking "Use shared forwarding between packet filter, traffic shaper and captive portal" can help if the kernel interferes, which could be a side effect of overlapping rules being evaluated in a suboptimal fashion.

Checking "Disable force gateway" may help, but may also be irrelevant to your issue.
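Why "Use sticky connections" can matter: without it, consecutive connections from the same client may leave through different WANs and therefore reach the Internet from different public IP addresses, which some websites treat as a broken or suspicious session (a common explanation for this symptom, not one confirmed in this thread). The toy model below contrasts per-connection round-robin with source-based stickiness; the class names and the LAN address are hypothetical, and it does not reproduce pf's actual implementation.

```python
import itertools
from typing import Dict, List


class RoundRobinBalancer:
    """Send each new connection to the next gateway, whoever the client is."""

    def __init__(self, gateways: List[str]) -> None:
        self._cycle = itertools.cycle(gateways)

    def pick(self, src_ip: str) -> str:
        return next(self._cycle)


class StickyBalancer:
    """Pick round-robin for a client's first connection, then reuse that gateway.

    Simplification: real source tracking expires entries after a timeout;
    this table never forgets.
    """

    def __init__(self, gateways: List[str]) -> None:
        self._rr = RoundRobinBalancer(gateways)
        self._table: Dict[str, str] = {}

    def pick(self, src_ip: str) -> str:
        if src_ip not in self._table:
            self._table[src_ip] = self._rr.pick(src_ip)
        return self._table[src_ip]


if __name__ == "__main__":
    rr = RoundRobinBalancer(["WAN", "WAN2"])
    sticky = StickyBalancer(["WAN", "WAN2"])
    client = "192.168.1.10"  # hypothetical LAN address
    print([rr.pick(client) for _ in range(4)])      # alternates between WANs
    print([sticky.pick(client) for _ in range(4)])  # always the same WAN
```

Different clients are still spread across both lines with stickiness enabled; only the connections of a single client stay on one WAN, so the overall balancing is coarser but sessions keep a stable public IP.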

Finding a simple test case and tracing it through the live view with logging enabled would be a good way to see when, where (and why) connections are reset (at least, that is the working theory here).

It could be a side effect of your network setup too. Not an easy case as said elsewhere.


Cheers,
Franco

February 09, 2024, 06:14:14 AM #4 Last Edit: February 09, 2024, 06:34:50 AM by johnmcallister
Quote from: DarkCorner on January 30, 2024, 10:21:15 AM
I followed both the official documentation and various online guides to configure a Load Balancing on two lines with identical characteristics.
I also read several posts in the forum.
Nonetheless, I have low performance and often a web session is interrupted or a page is not reachable.
....
The changes do not seem decisive to me.
What am I doing wrong?

One thing I see missing from your post is the hardware specifications, particularly the NICs (Ethernet network controller type(s)) of your OPNsense box or instance.

It's my understanding that not all NICs are equally well engineered at the hardware level, nor are they all equally well supported by FreeBSD at the driver level.

I'm not a NIC or OPNsense guru, but it would be a useful piece of additional info.

I suggest installing the "HWprobe" plugin (it takes just 2 MB of disk space once installed and does not run as a service, only on demand).

NOTE: Using this plugin submits a well-anonymized, complete hardware profile of your OPNsense system to a public web server. Look at the examples at https://bsd-hardware.info/?view=timeline before you run it on your machine, so you understand what is gathered and what will be posted. Sensitive info such as IP addresses, network device MAC addresses, storage device serial numbers, etc. is removed from the results automatically.

System --> Firmware --> Plugins -->
Type "os-hw-probe" in the search field, then click the "+" to install the plugin.

Once installed, reload your browser window, and go to:
Services --> HW Probe --> Generate

In about 5 or 10 seconds, without any prompts or status messages, it will generate and publish your machine's anonymized hardware profile. Here's one of mine, for example:

https://bsd-hardware.info/?probe=2741a6da81

After carefully reviewing your HWprobe results URL to be sure it is in fact anonymized (no IP addresses, domain names, usernames, hashes, etc. in it), consider sharing it in your problem reports and help requests.