PPPoE Slow Upload [SOLVED]

Started by d3dl3g, June 27, 2024, 12:16:55 AM

June 27, 2024, 12:16:55 AM Last Edit: July 01, 2024, 02:02:59 PM by d3dl3g
Good evening everyone.

I'm a little bit stumped by this issue.
Since Monday 17 June at 13:00 (GMT) I have had degraded upload speeds. (See below for a rudimentary table, taken from the log export in the speedtest widget; timestamps are in Unix "epoch" format, see https://www.epochconverter.com/ )

I have a 900 Mbit FTTP service and use mimugmail's speed test widget (https://github.com/mimugmail/opn-repo) to monitor speeds every hour, on the hour, from within OPN. This is not an asymmetric connection; it's "900 down/900 up". (For the record, I could regularly speed test at >900 down / >800 up before this issue arose.)

My network HW is ONT > HP ProCurve switch > Dell PowerConnect switch > Proxmox server > OPNsense VM

Proxmox and OPNsense are fully up to date with the latest "stable" releases. Neither is of the "enterprise" variety.
The issue has spanned OPNsense 24.1.8 into OPNsense 24.1.9 (the release date for .9, I believe, was 18/06).

I have confirmed the following:
Shutting down all other VMs/CTs within Proxmox, leaving ONLY the OPN VM running, resulted in no change to the poor upload speeds.
iperf3 at any step in the above chain reaches the maximum 1 Gbit/s throughput. When running iperf3 from the ONT side to the Proxmox server I saw a max of 1300 retries, but the throughput was there.
I further tested speeds with my ISP router at every stage where there was an Ethernet connection. This resulted in >900 down, >800 up.
Given the combination of these two results I think I can safely say my line and HW are good.
I talked with the ISP; they informed me that they do not apply throttling to the line, regardless of usage or attached router. I have no reason not to believe them at this stage, as the testing appears to confirm that the line speed is obtainable, at least with their kit.
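
For anyone repeating the iperf3 check at each hop, here is a minimal sketch of pulling the throughput and retry counts out of iperf3's JSON report (as produced by `iperf3 -c <host> -J`). The function name and the sample numbers are made up for illustration; only the `end` / `sum_sent` / `sum_received` layout is taken from iperf3's TCP output.

```python
import json


def summarize_iperf3(report_json: str) -> dict:
    """Pull sender/receiver throughput and TCP retransmits from an iperf3 -J report."""
    end = json.loads(report_json)["end"]
    sent = end["sum_sent"]
    received = end["sum_received"]
    return {
        "sent_mbps": sent["bits_per_second"] / 1e6,
        "received_mbps": received["bits_per_second"] / 1e6,
        # "retransmits" may be absent (e.g. UDP tests), hence .get()
        "retransmits": sent.get("retransmits"),
    }


# Made-up sample shaped like the "end" section of an iperf3 JSON report.
# The numbers are illustrative only, not measurements from this thread.
SAMPLE_REPORT = json.dumps({
    "end": {
        "sum_sent": {"bits_per_second": 941_000_000.0, "retransmits": 1300},
        "sum_received": {"bits_per_second": 938_000_000.0},
    }
})

summary = summarize_iperf3(SAMPLE_REPORT)
print(summary)
```

Saving one report per hop (ONT side, each switch, Proxmox host, OPNsense VM) and comparing the summaries should make it easier to see whether retries rise at one particular step.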

My suspicion is now firmly with OPN. No settings were changed in relation to WAN. I did set a port in NAT Port Forward towards my reverse proxy, but that's it; since testing I have disabled this port forward and confirmed the attached floating rule has also been disabled.

I'm not really sure what to check/adjust. Any help is greatly appreciated and warmly received.

Timestamp (epoch s)   DlSpeed (Mbps)   UlSpeed (Mbps)   Latency (ms)
1718600429            908.81           406              12.66
1718604028            843.66           361.76           15.37
1718607627            842.27           255.55           12.28
1718611225            899.97           266.33           9.04
1718614824            878.21           298.25           10.83
1718618436            737.92           277.4            12.54
1718622031            843.74           356.48           11.87
1718625635            837.8            404.11           14.78
1718629224            825              140.19           13.62
1718762425            909.74           22.65            12.45
1718766017            873.95           14.52            9.37
1718769621            841.62           15.98            10.1
1718773228            851.58           15.68            10.72
1718776830            904.69           25.16            14.81
1718780419            877.29           15.86            9.58
1718784027            798.76           7.31             16.35
1718787630            790.9            25.49            13.92
1718791227            487.07           21.18            12.8
1718794827            890.61           20.5             9.54
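
To make the table easier to read without an online converter, here is a small sketch that converts the epoch timestamps to UTC and averages the upload speed for each continuous run of hourly samples. The rows are copied from the table above; the helper names and the two-hour gap threshold are assumptions chosen for illustration.

```python
from datetime import datetime, timezone
from statistics import mean

# Rows copied from the table above: (epoch seconds, download Mbps, upload Mbps, latency)
ROWS = [
    (1718600429, 908.81, 406, 12.66),
    (1718604028, 843.66, 361.76, 15.37),
    (1718607627, 842.27, 255.55, 12.28),
    (1718611225, 899.97, 266.33, 9.04),
    (1718614824, 878.21, 298.25, 10.83),
    (1718618436, 737.92, 277.4, 12.54),
    (1718622031, 843.74, 356.48, 11.87),
    (1718625635, 837.8, 404.11, 14.78),
    (1718629224, 825, 140.19, 13.62),
    (1718762425, 909.74, 22.65, 12.45),
    (1718766017, 873.95, 14.52, 9.37),
    (1718769621, 841.62, 15.98, 10.1),
    (1718773228, 851.58, 15.68, 10.72),
    (1718776830, 904.69, 25.16, 14.81),
    (1718780419, 877.29, 15.86, 9.58),
    (1718784027, 798.76, 7.31, 16.35),
    (1718787630, 790.9, 25.49, 13.92),
    (1718791227, 487.07, 21.18, 12.8),
    (1718794827, 890.61, 20.5, 9.54),
]


def to_utc(ts: int) -> str:
    """Format an epoch timestamp as a UTC date and time."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


def split_on_gaps(rows, max_gap_s=2 * 3600):
    """Start a new block whenever consecutive samples are more than max_gap_s apart."""
    blocks = [[rows[0]]]
    for prev, cur in zip(rows, rows[1:]):
        if cur[0] - prev[0] > max_gap_s:
            blocks.append([])
        blocks[-1].append(cur)
    return blocks


for block in split_on_gaps(ROWS):
    avg_up = mean(row[2] for row in block)
    print(f"{to_utc(block[0][0])} to {to_utc(block[-1][0])} UTC: "
          f"{len(block)} samples, average upload {avg_up:.2f} Mbps")
```

The excerpt splits into two runs of hourly samples, which makes the difference in upload speed between them easy to compare at a glance.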

Speed test widget reports
3341 probes (and counting)
Avg Down:- 809.15 Mbps (min: 14.78 Mbps, max: 939.75 Mbps)
Avg Up:-  308.96 Mbps (min: 0.53 Mbps, max: 827.16 Mbps)
* "Avg Up" is skewed slightly due to multiple <30mbit results over the past couple of days, but only by around 50Mbps


################
Fixed just as quickly as it arose with "no intervention"
################

Quote from: d3dl3g on July 01, 2024, 02:00:38 PM
After the 57th (/s) time of turning it off and turning it back on again... it works!!!???

No clue as to what solved it, unless the ISP needs time to figure out the link has gone down and "resets" something.

It had been disconnected for a little over 30 mins.

I'd started to play with OpenWrt as a replacement/test; maybe it was that which put a digital rocket up its backside...

Now have +900 down/+900 up

Beware of the shotgun: http://catb.org/jargon/html/S/shotgun-debugging.html

Reduce your config down to the absolute minimum - bare metal, NAT only, with default rules. Then add features one by one until you see a drop in speed.

Bart...

Since this is not a bare metal configuration: Did you verify that Proxmox is not the culprit? There have been reported problems with some newer 6.8.4 kernels introduced by PVE 8.2.2, which have led to the update to 6.8.8-1, but many support forum posts suggest installing 6.5 and pinning it:

apt-get update
apt install -y pve-kernel-6.5
proxmox-boot-tool kernel pin 6.5.
reboot now

Intel N100, 4 x I226-V, 16 GByte, 256 GByte NVME, ZTE F6005

1100 down / 770 up, Bufferbloat A

Thought I would add my experience seeing as I have a PPPoE connection (1000Mbit down/500Mbit up), also use OPNsense on Proxmox, kernel is v6.8.8-1 although I have an upgrade to 6.8.8-2 waiting on a reboot, plus my WAN NIC is passed as a device, whereas the LAN is a Proxmox bridge device. My speeds always vary but they're pretty much what I would expect, and did not change much, actually improved when I went from bare-metal to VM.


I also do not use the OPNsense widget or the Speedtest website for measurements. I installed the Ookla FreeBSD command-line tester and use it via a script that tests against 5 different providers I choose manually. I have often found that the one it auto-chooses can return very bad results or even be in another country when all the others look good. I revisit the 5 whenever I see large or frequent drops in speed tests. Oh, and I also have a local LibreSpeed VM on my LAN so I can measure the connection there - it's always good.
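
A rough sketch of what such a multi-server script could look like, assuming the Ookla `speedtest` CLI is on the PATH and using its `-s` (server ID) and `-f json` options. This is not the poster's actual script; the server IDs are placeholders and the sample result is made up so the parsing step can be tried without running a test.

```python
import json
import subprocess

# Placeholder server IDs (hypothetical); pick real ones from `speedtest -L`.
SERVER_IDS = [11111, 22222, 33333, 44444, 55555]


def parse_result(raw_json: str) -> dict:
    """Convert one Ookla CLI JSON result into Mbps figures.

    The CLI reports "bandwidth" in bytes per second, so multiply by 8
    and divide by 1e6 to get Mbit/s.
    """
    data = json.loads(raw_json)
    return {
        "server": data["server"]["name"],
        "download_mbps": data["download"]["bandwidth"] * 8 / 1e6,
        "upload_mbps": data["upload"]["bandwidth"] * 8 / 1e6,
        "latency_ms": data["ping"]["latency"],
    }


def run_test(server_id: int) -> dict:
    """Run a single test against a fixed server and parse its JSON output."""
    completed = subprocess.run(
        ["speedtest", "-s", str(server_id), "-f", "json"],
        capture_output=True, text=True, check=True,
    )
    return parse_result(completed.stdout)


# Made-up sample result, trimmed to the fields used above.
SAMPLE = json.dumps({
    "ping": {"latency": 9.5},
    "download": {"bandwidth": 125_000_000},
    "upload": {"bandwidth": 62_500_000},
    "server": {"id": 11111, "name": "Example ISP"},
})

print(parse_result(SAMPLE))
```

Calling `run_test(server_id)` for each entry in `SERVER_IDS` and handing the parsed values to a collector such as Zabbix would pin every measurement to a known server, so a bad auto-selected one cannot skew the results.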


I use Zabbix to collect the results and my graph currently looks like this:



I cannot fault the performance that both OPNsense and Proxmox are providing me, and having it as a VM at least gives me peace of mind that I can recover from problematic updates. Let me know if you want any more information.

June 27, 2024, 10:14:05 AM #4 Last Edit: June 27, 2024, 10:17:42 AM by d3dl3g
Quote from: meyergru on June 27, 2024, 08:22:53 AM
Since this is not a bare metal configuration: Did you verify that Proxmox is not the culprit? There have been reported problems with some newer 6.8.4 kernels introduced by PVE 8.2.2, which have led to the update to 6.8.8-1, but many support forum posts suggest installing 6.5 and pinning it....

Do you have a link for further reading? It's worth checking whether it's right for my setup, given the other PVE services, before copy-pasting your code :)

Quote from: Taomyn on June 27, 2024, 09:08:47 AM
Thought I would add my experience seeing as I have a PPPoE connection (1000Mbit down/500Mbit up), also use OPNsense on Proxmox, kernel is v6.8.8-1 although I have an upgrade to 6.8.8-2 waiting on a reboot, plus my WAN NIC is passed as a device, whereas the LAN is a Proxmox bridge device. My speeds always vary but they're pretty much what I would expect, and did not change much, actually improved when I went from bare-metal to VM...

All my NICs (onboard or PCIe) are passed via bridge. In my particular case WAN is plugged into the onboard NIC and LAN into the PCIe card. I must admit all of my testing has been on the PCIe side and not directly through the onboard NIC. *However*, I migrated to my 2nd PVE and I still see the slow upload, which lends itself to "not a hardware fault".

I'm struggling to understand what "changed" at that particular time. It seems too far away from a PVE update to make sense (to me): my PVE updates at 04:00, so a speed drop 9+ hours after an update "feels" wrong; I'd have expected it to show almost instantly.
I do agree that if it is kernel or update driven then it would apply to both my PVEs.

Quote from: bartjsmit on June 27, 2024, 08:16:08 AM
Beware of the shotgun: http://catb.org/jargon/html/S/shotgun-debugging.html

Reduce your config down to the absolute minimum - bare metal, NAT only, with default rules. Then add features one by one until you see a drop in speed.

Bart...

Not wanting to shotgun, hence my reason for being here ;)
As stated, I reduced CTs and VMs. I may very well spin up bare metal just to test your suggestion; I don't really want to do it on my live build.

Quote from: meyergru on June 27, 2024, 08:22:53 AM
apt-get update
apt install -y pve-kernel-6.5
proxmox-boot-tool kernel pin 6.5.
reboot now


Apologies... I am on Proxmox 8.1.3, so not the latest...
Current kernel:
root@pve:~# uname -r
6.5.11-7-pve

"Updated" to the most recent, 6.5.13-5-pve.

907 Mbit down... 14 Mbit up
:(

VirtIO network interfaces in the OPNsense VM?

Try to set this tunable and reboot: hw.vtnet.csum_disable=1
Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)

June 27, 2024, 11:24:24 PM #9 Last Edit: June 27, 2024, 11:27:43 PM by d3dl3g
Quote from: Patrick M. Hausen on June 27, 2024, 11:03:45 PM
VirtIO network interfaces in the OPNsense VM?

Try to set this tunable and reboot: hw.vtnet.csum_disable=1

Onboard = vmbr0 (vtnet1) "WAN"
PCIe card= vmbr1 (vtnet0) "LAN"
Both accessible via OPN and set "correctly" in Interfaces > Assignments. Yes, the swapping of 0 and 1 is correct; it was an oversight on my part when setting up Prox and OPN. One day I'll change it so they match. I need the Mrs and kids out of the house to do that, though.

Tunable applied
   
Download speed    894.31 Mbps
Upload speed    13.03 Mbps



After the 57th (/s) time of turning it off and turning it back on again... it works!!!???

No clue as to what solved it, unless the ISP needs time to figure out the link has gone down and "resets" something.

It had been disconnected for a little over 30 mins.

I'd started to play with OpenWrt as a replacement/test; maybe it was that which put a digital rocket up its backside...

Now have +900 down/+900 up