22.7 Legacy Series / Poor network performance under very specific circumstances
« on: November 10, 2022, 03:00:10 pm »
Hi, I've been trying to track down this problem for months, but because it is extremely intermittent and unpredictable, it has been difficult to pin down. About 95% of the time, my network under OPNsense performs great, even with several people in the household surfing the web, streaming, etc.
It's the remaining 5% of the time where things get iffy. For a bit of background, my primary WAN connection is a VDSL2 connection with 100 Mbps of downstream bandwidth and 20 Mbps of upstream bandwidth. I also have a backup LTE link (similar speeds to the primary link), but for this topic I'm focused on my primary WAN link.
I think I'm at the point where I can reliably reproduce the problem, and one such case where it appears to be reproducible is when files are downloaded from Akamai's CDN servers. In essence, if a file is downloaded from their CDN, latency increases considerably while the download is in progress, and it becomes bad enough that I even start to see some TCP packet loss, and not just ICMP loss. This all occurs even though my download speeds from Akamai's network don't reach much past 24 Mbps, so in spite of the packet loss and the latency increase over idle, I am only using about 25% of my pipe's downstream bandwidth. This is where I'm so confused.
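To put a number on that, here is a minimal sketch of the utilisation arithmetic, using the 100 Mbps downstream and the roughly 24 Mbps peak quoted above. The function name is made up for illustration.

```python
def utilisation(observed_mbps: float, link_mbps: float) -> float:
    """Return the fraction of the link's capacity that is in use."""
    if link_mbps <= 0:
        raise ValueError("link_mbps must be positive")
    return observed_mbps / link_mbps


# Figures from the post: ~24 Mbps peak from Akamai on a 100 Mbps downstream.
print(f"{utilisation(24, 100):.0%} of downstream in use")
```

So the link should have plenty of headroom left while the latency increase and packet loss are happening.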
I should mention I already have AQM and fair queuing rules in place so that everyone connected to my network has a good quality of experience. I first noticed these significant latency increases over idle when I was still using fq_codel as my preferred AQM algorithm, but my current setup is no better (I switched away from fq_codel thinking I may have hit a quirk relating to that algorithm).
What's even stranger to me is that outside of these edge cases, if someone on the network hammers the downstream with, say, a Steam download, other people using the network still have a good experience (I say this as I know too well how easily Steam can cripple a typical home network without any AQM and fair queuing). Most other applications that download lots of data are also fine.
The heavy latency increases when downloading from Akamai just don't make sense to me at all, even with the high latency I seem to get to their CDN:
Code:
traceroute www.crucial.com
traceroute to www.crucial.com (104.84.168.83), 30 hops max, 60 byte packets
1 <redacted> (10.115.101.1) 0.394 ms 0.322 ms 0.306 ms
2 203-219-198-51.tpgi.com.au (203.219.198.51) 8.212 ms 8.083 ms 8.295 ms
3 per-apt-stg-crt1-be100.tpgi.com.au (203.219.57.129) 54.064 ms 54.287 ms 54.156 ms
4 syd-gls-har-crt1-Hu-0-5-0-1.tpg.com.au (202.7.162.133) 55.579 ms 55.604 ms 55.952 ms
5 203-221-3-82.tpgi.com.au (203.221.3.82) 54.010 ms 203-221-3-18.tpgi.com.au (203.221.3.18) 55.510 ms 55.518 ms
6 203-219-106-90.tpgi.com.au (203.219.106.90) 54.171 ms 53.791 ms 53.985 ms
7 ae5.r02.syd01.icn.netarch.akamai.com (23.56.128.38) 55.726 ms 55.216 ms 55.234 ms
8 ae3.r02.per01.icn.netarch.akamai.com (23.214.115.22) 92.233 ms 92.233 ms 92.216 ms
9 ae0.r01.per01.icn.netarch.akamai.com (23.214.115.16) 93.727 ms 93.613 ms 93.992 ms
10 ae4.r02.sin01.icn.netarch.akamai.com (23.214.115.20) 101.614 ms 101.476 ms 100.087 ms
11 ae7.r02.sin02.icn.netarch.akamai.com (23.215.54.215) 101.332 ms 101.198 ms 101.308 ms
12 ae6.r02.hkg02.icn.netarch.akamai.com (23.215.54.138) 193.061 ms 193.147 ms 193.395 ms
13 ae4.r01.hkg03.icn.netarch.akamai.com (23.215.54.153) 133.912 ms 133.269 ms 133.080 ms
14 ae11.r01.hkg03.ien.netarch.akamai.com (23.56.143.43) 154.087 ms ae13.r02.hkg03.ien.netarch.akamai.com (23.56.143.47) 133.296 ms 133.327 ms
15 ae10.cmignc-hkg.netarch.akamai.com (23.56.143.193) 197.989 ms 197.864 ms 197.974 ms
16 * a104-84-168-83.deploy.static.akamaitechnologies.com (104.84.168.83) 191.860 ms 191.969 ms
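For anyone wanting to see where along the path the latency is added, a small sketch like the following could pull the best RTT per hop out of traceroute output and report the largest hop-to-hop increase. The function names are made up for illustration, and the sample is just three hops copied from the output above. Keep in mind that per-hop RTTs in a traceroute reflect how each router handles the probe replies, so they are only a rough guide.

```python
import re

# Matches RTT values such as "193.061 ms" in a traceroute hop line.
RTT_RE = re.compile(r"(\d+(?:\.\d+)?) ms")


def best_rtt_per_hop(traceroute_text: str) -> dict[int, float]:
    """Map hop number -> lowest RTT (ms) seen on that hop's line."""
    hops = {}
    for line in traceroute_text.splitlines():
        parts = line.split()
        # Hop lines start with the hop number; skip headers and blanks.
        if not parts or not parts[0].isdigit():
            continue
        rtts = [float(value) for value in RTT_RE.findall(line)]
        if rtts:  # hops that only returned "*" have no RTTs
            hops[int(parts[0])] = min(rtts)
    return hops


def largest_jump(hops: dict[int, float]) -> tuple[int, float]:
    """Return (hop, increase in ms) for the biggest rise over the previous hop."""
    ordered = sorted(hops)
    if len(ordered) < 2:
        raise ValueError("need at least two hops with RTTs")
    delta, hop = max((hops[b] - hops[a], b) for a, b in zip(ordered, ordered[1:]))
    return hop, delta


sample = """\
10 ae4.r02.sin01.icn.netarch.akamai.com (23.214.115.20) 101.614 ms 101.476 ms 100.087 ms
11 ae7.r02.sin02.icn.netarch.akamai.com (23.215.54.215) 101.332 ms 101.198 ms 101.308 ms
12 ae6.r02.hkg02.icn.netarch.akamai.com (23.215.54.138) 193.061 ms 193.147 ms 193.395 ms
"""
hop, delta = largest_jump(best_rtt_per_hop(sample))
print(f"largest increase: +{delta:.3f} ms at hop {hop}")
```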
Most other high-latency connections that transfer a lot of data don't do the same to my network, either, so I'm really stumped here. Seeing over 40ms of jitter in this extreme case isn't doing any favours for anyone on my network who may be in the middle of a VoIP call, for instance.
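For reference, one simple way to quantify jitter is the mean absolute difference between consecutive RTT samples. This is only a sketch: it is an assumption for illustration and not necessarily how the 40ms figure above was measured, and the sample values below are invented.

```python
from statistics import mean


def mean_jitter_ms(rtts_ms: list[float]) -> float:
    """Mean absolute difference between consecutive RTT samples, in ms."""
    if len(rtts_ms) < 2:
        raise ValueError("need at least two RTT samples")
    return mean(abs(b - a) for a, b in zip(rtts_ms, rtts_ms[1:]))


# Invented samples: RTT swinging between 20 ms and 60 ms.
print(mean_jitter_ms([20.0, 60.0, 20.0, 60.0]))
```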
Edit: should probably mention I'm also running OPNsense version 22.7.7_1 (amd64/OpenSSL). Hardware is a Qotom Q710G4: Intel Celeron J3455E, with 4 x Intel Gigabit NICs, and a 250GB Crucial MX500 SSD.