Good evening everyone.
I'm a little bit stumped by this issue.
Since Monday 17th at 13:00 (GMT) I have had degraded upload speeds. (See below for a rudimentary table, taken from the log export in the speed test widget; timestamps are in "epoch" format: https://www.epochconverter.com/ )
I have a 900 Mbit FTTP service and use mimugmail's speed test widget (https://github.com/mimugmail/opn-repo) to monitor speeds every hour, on the hour, from within OPN. This is not an asymmetric connection; it's "900 down/900 up" (FTR, I could regularly speed test at >900 down/>800 up before this issue arose).
My network HW is ONT > HP Procurve switch > Dell Power Connect switch > Proxmox Server > OPNsense VM
Proxmox and OPNsense are fully up to date with the latest "stable" releases. Neither is of the "enterprise" variety.
The issue has spanned OPNsense 24.1.8 into OPNsense 24.1.9 (release date for .9, I believe, was 18/06).
I have confirmed the following... I shut down all other VMs/CTs within Proxmox, leaving ONLY the OPN VM running. This resulted in no change to the poor upload speeds... iperf3 at any step in the above chain reaches the maximum 1 Gbit throughput. When running iperf3 from the ONT side to the Proxmox server I saw a max of 1300 retries, but the throughput was there.
I further tested speeds with my ISP router at every stage where there was an ethernet connection... This resulted in >900 down, >800 up.
Given the combination of these 2 results, I think I can safely say my line and HW are good.
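To compare the hops in that chain more systematically, iperf3 can write its results as JSON (`--json` / `-J`), and the throughput and retransmit count can then be pulled out per run. The following is only a minimal sketch, assuming a TCP test and the `end.sum_sent` / `end.sum_received` layout iperf3 produces for TCP; the sample JSON is a hand-written placeholder (not output from this network) and `summarise` is a hypothetical helper name.

```python
import json

# Hand-written placeholder shaped like the tail of `iperf3 -c <host> --json`
# output for a TCP test. The numbers are illustrative, not measured.
SAMPLE_JSON = """
{
  "end": {
    "sum_sent": {"bits_per_second": 941000000.0, "retransmits": 1300},
    "sum_received": {"bits_per_second": 939000000.0}
  }
}
"""


def summarise(iperf_json):
    """Extract sender/receiver throughput (Mbit/s) and TCP retransmits."""
    end = json.loads(iperf_json)["end"]
    return {
        "sent_mbps": end["sum_sent"]["bits_per_second"] / 1e6,
        "received_mbps": end["sum_received"]["bits_per_second"] / 1e6,
        # Retransmits are reported on the sending side of a TCP test;
        # default to 0 if the field is absent.
        "retransmits": end["sum_sent"].get("retransmits", 0),
    }


summary = summarise(SAMPLE_JSON)
print(summary)
```

Running one such test per hop (ONT side, each switch, Proxmox host, OPNsense VM) and lining up the summaries would show whether the retries cluster on one segment.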
I talked with my ISP; they informed me that they do not apply throttling to the line, regardless of usage or attached router. I have no reason not to believe them at this stage, as the testing appears to confirm that the line speed is obtainable, at least with their kit.
My suspicion is now firmly with OPN. No settings were changed in relation to WAN. I did set up a NAT port forward towards my reverse proxy, but that's it; since starting testing I have disabled this port forward and confirmed the attached floating rule has also been disabled.
I'm not really sure what to check/adjust. Any help is greatly appreciated and warmly received.
Timestamp DlSpeed UlSpeed Latency
1718600429 908.81 406 12.66
1718604028 843.66 361.76 15.37
1718607627 842.27 255.55 12.28
1718611225 899.97 266.33 9.04
1718614824 878.21 298.25 10.83
1718618436 737.92 277.4 12.54
1718622031 843.74 356.48 11.87
1718625635 837.8 404.11 14.78
1718629224 825 140.19 13.62
1718762425 909.74 22.65 12.45
1718766017 873.95 14.52 9.37
1718769621 841.62 15.98 10.1
1718773228 851.58 15.68 10.72
1718776830 904.69 25.16 14.81
1718780419 877.29 15.86 9.58
1718784027 798.76 7.31 16.35
1718787630 790.9 25.49 13.92
1718791227 487.07 21.18 12.8
1718794827 890.61 20.5 9.54
Speed test widget reports 3341 probes (and counting)
Avg Down:- 809.15 Mbps (min: 14.78 Mbps, max: 939.75 Mbps)
Avg Up:- 308.96 Mbps (min: 0.53 Mbps, max: 827.16 Mbps)
* "Avg Up" is skewed slightly due to multiple <30mbit results over the past couple of days, but only by around 50Mbps
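For anyone who wants to read the export without pasting each value into an epoch converter, here is a minimal Python sketch. It assumes the log export is whitespace-separated with the four columns shown above; the three sample rows are copied from the table, while the 100 Mbps cut-off (`UPLOAD_FLOOR_MBPS`) and the function names are purely illustrative choices, not anything the widget defines.

```python
from datetime import datetime, timezone

# Three rows copied from the table above (Timestamp DlSpeed UlSpeed Latency).
SAMPLE_LOG = """\
1718600429 908.81 406 12.66
1718629224 825 140.19 13.62
1718762425 909.74 22.65 12.45
"""

# Illustrative threshold only: uploads below this are flagged as degraded.
UPLOAD_FLOOR_MBPS = 100.0


def parse_log(text):
    """Turn whitespace-separated export rows into dicts with a UTC datetime."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4 or not fields[0].isdigit():
            continue  # skip the header line and anything malformed
        ts, down, up, latency = fields
        rows.append({
            "time": datetime.fromtimestamp(int(ts), tz=timezone.utc),
            "down": float(down),
            "up": float(up),
            "latency": float(latency),
        })
    return rows


def degraded(rows, floor=UPLOAD_FLOOR_MBPS):
    """Return only the rows whose upload speed is below the floor."""
    return [r for r in rows if r["up"] < floor]


rows = parse_log(SAMPLE_LOG)
for r in rows:
    flag = "  <-- degraded" if r["up"] < UPLOAD_FLOOR_MBPS else ""
    print(f'{r["time"]:%Y-%m-%d %H:%M:%S} UTC  down={r["down"]:7.2f}  up={r["up"]:7.2f}{flag}')
```

With these rows, timestamp 1718629224 converts to 13:00:24 UTC on 17 June 2024 (140.19 Mbps up), and only the 19 June row falls under the illustrative floor.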
################
Fixed just as quickly as it arose with "no intervention"
################
Quote from: d3dl3g on July 01, 2024, 02:00:38 PM
After the 57th (/s) time of turning it off and turning it back on again... it works!!!???
No clue as to what solved it, unless the ISP needs time to figure out the link has gone down and "resets" something.
It had been disconnected for a little over 30 mins.
I'd started to play with OpenWrt as a replacement/test; maybe it was that that put a digital rocket up its backside...
Now have 900+ down/900+ up
Since this is not a bare metal configuration: Did you verify that Proxmox is not the culprit? There have been reported problems with some newer 6.8.4 kernels introduced by PVE 8.2.2, which led to the update to 6.8.8-1, but many support forum posts suggest installing 6.5 and pinning it:
apt-get update
apt install -y pve-kernel-6.5
proxmox-boot-tool kernel pin 6.5.
reboot now
Thought I would add my experience seeing as I have a PPPoE connection (1000Mbit down/500Mbit up), also use OPNsense on Proxmox, kernel is v6.8.8-1 although I have an upgrade to 6.8.8-2 waiting on a reboot, plus my WAN NIC is passed as a device, whereas the LAN is a Proxmox bridge device. My speeds always vary but they're pretty much what I would expect, and did not change much, actually improved when I went from bare-metal to VM.
I also do not use the OPNsense widget nor the Speedtest website for measurements - I installed the Ookla FreeBSD command-line tester and use it via a script that tests against 5 different providers I choose manually. I have often found that the one it auto-chooses can return very bad results or even be in another country when all the others look good. I revisit the 5 whenever I see large or many drops in speed tests. Oh, and I also have a local LibreSpeed VM on my LAN so I can measure the connection there - it's always good.
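A rough idea of what such a multi-server script could look like is sketched below. This is not the script described above, just an assumption-laden illustration: it presumes the Ookla CLI is installed as `speedtest`, the server IDs are made-up placeholders (real ones can be listed with `speedtest --servers`), and the helper names are hypothetical. The CLI's JSON output reports bandwidth in bytes per second, hence the conversion to Mbit/s.

```python
import json
import subprocess

# Hypothetical server IDs; list real ones with `speedtest --servers`.
SERVER_IDS = [1111, 2222, 3333, 4444, 5555]


def build_command(server_id):
    """Command line for one run of the Ookla CLI against a fixed server."""
    return ["speedtest", f"--server-id={server_id}", "--format=json",
            "--accept-license", "--accept-gdpr"]


def parse_result(json_text):
    """Convert the CLI's JSON (bandwidth in bytes/s) to Mbit/s figures."""
    data = json.loads(json_text)
    return {
        "server": data["server"]["name"],
        "down_mbps": data["download"]["bandwidth"] * 8 / 1e6,
        "up_mbps": data["upload"]["bandwidth"] * 8 / 1e6,
        "latency_ms": data["ping"]["latency"],
    }


def run_all(server_ids=SERVER_IDS):
    """Test each server in turn; a failed run is recorded as None."""
    results = {}
    for sid in server_ids:
        try:
            proc = subprocess.run(build_command(sid), capture_output=True,
                                  text=True, check=True, timeout=120)
            results[sid] = parse_result(proc.stdout)
        except (OSError, subprocess.SubprocessError, ValueError, KeyError):
            results[sid] = None
    return results
```

Calling `run_all()` returns one entry per server ID, so a single bad or far-away server stands out instead of dragging down one combined figure; the dictionary could then be handed to whatever collects the results.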
I use Zabbix to collect the results and my graph currently looks like this:
(https://forum.opnsense.org/index.php?action=dlattach;topic=41283.0;attach=35842;image)
I cannot fault the performance that both OPNsense and Proxmox are providing me, and having it as a VM at least gives me peace of mind that I can recover from problematic updates. Let me know if you want any more information.
Quote from: meyergru on June 27, 2024, 08:22:53 AM
Since this is not a bare metal configuration: Did you verify that Proxmox is not the culprit? There have been reported problems with some newer 6.8.4 kernels introduced by PVE 8.2.2, which led to the update to 6.8.8-1, but many support forum posts suggest installing 6.5 and pinning it....
Do you have a link for further reading? Given the other services on my PVE, I'd like to check whether it's right for my setup before copy-pasting your given code :)
Quote from: Taomyn on June 27, 2024, 09:08:47 AM
Thought I would add my experience seeing as I have a PPPoE connection (1000Mbit down/500Mbit up), also use OPNsense on Proxmox, kernel is v6.8.8-1 although I have an upgrade to 6.8.8-2 waiting on a reboot, plus my WAN NIC is passed as a device, whereas the LAN is a Proxmox bridge device. My speeds always vary but they're pretty much what I would expect, and did not change much, actually improved when I went from bare-metal to VM...
All my NICs (onboard or PCIe) are passed via bridge. In my particular case WAN is plugged into the onboard NIC and LAN is plugged into the PCIe card. I must admit all of my testing has been on the PCIe side, and not directly through the onboard NIC. *However*, I migrated to my 2nd PVE and I still see the slow upload, which points to "not a hardware fault".
I'm struggling to understand what "changed" at that particular time. It seems too far away from a PVE update to make sense (to me): my PVE updates at 04:00, so a speed drop 9+ hours after an update "feels" wrong. I'd have expected it to show almost instantly.
I do agree that if it is kernel or update driven then it would apply to both my PVEs.
Quote from: bartjsmit on June 27, 2024, 08:16:08 AM
Beware of the shotgun: http://catb.org/jargon/html/S/shotgun-debugging.html
Reduce your config down to the absolute minimum - bare metal, NAT only, with default rules. Then add features one by one until you see a drop in speed.
Bart...
Not wanting to shotgun, hence my reason for being here ;)
As stated, I reduced CTs and VMs. I may very well spin up bare metal, just to test your suggestion. I don't really want to do it on my live build.
Quote from: Patrick M. Hausen on June 27, 2024, 11:03:45 PM
VirtIO network interfaces in the OPNsense VM?
Try to set this tunable and reboot: hw.vtnet.csum_disable=1
Onboard = vmbr0 (vtnet1) "WAN"
PCIe card= vmbr1 (vtnet0) "LAN"
Both are accessible via OPN and set "correctly" in Interfaces > Assignments. Yes, the swapping of 0 and 1 is correct; it was an oversight on my part when setting up Prox and OPN. One day I'll change it so they match. I need the Mrs and kids out of the house to do that though.
Tunable applied
Download speed 894.31 Mbps
Upload speed 13.03 Mbps
After the 57th (/s) time of turning it off and turning it back on again... it works!!!???
No clue as to what solved it, unless the ISP needs time to figure out the link has gone down and "resets" something.
It had been disconnected for a little over 30 mins.
I'd started to play with OpenWrt as a replacement/test; maybe it was that that put a digital rocket up its backside...
Now have 900+ down/900+ up