Every couple of days we are experiencing a slowdown in internet speed. Users report websites (e.g. LinkedIn, PayPal, Atlassian/Jira, WordPress uploads, MS365 webmail, ...) not loading or loading only very slowly. It seems that not all websites/connections (/users?) are affected.
I have a smokeping instance running, where I can see an increase from ~200 ms to ~4.5 s for a curl call to google.de. At the same time, a ping with fping to Google stays at ~5 ms (see attachments).
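Since the ping stays fast while the curl call gets slow, it may help to split a single HTTPS connection into its phases and see which one eats the time. The following is only a minimal sketch: `time_phases` is a hypothetical helper (not part of smokeping or OPNsense), and it measures just the DNS lookup, the TCP connect and the TLS handshake, not the HTTP transfer itself.

```python
import socket
import ssl
import time


def time_phases(host, port=443, use_tls=True, timeout=10.0):
    """Return the seconds spent in DNS lookup, TCP connect and TLS handshake."""
    timings = {}

    # Resolve the name first so that DNS delay is measured on its own.
    start = time.perf_counter()
    family, socktype, proto, _, addr = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    )[0]
    timings["dns"] = time.perf_counter() - start

    start = time.perf_counter()
    sock = socket.socket(family, socktype, proto)
    sock.settimeout(timeout)
    try:
        sock.connect(addr)
        timings["tcp_connect"] = time.perf_counter() - start

        if use_tls:
            start = time.perf_counter()
            context = ssl.create_default_context()
            # wrap_socket() performs the TLS handshake before returning.
            sock = context.wrap_socket(sock, server_hostname=host)
            timings["tls_handshake"] = time.perf_counter() - start
    finally:
        sock.close()
    return timings
```

Calling `time_phases("www.google.de")` from an affected client during a slow period and again during a normal period should show whether the extra seconds come from name resolution, connection setup or the handshake.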
During the affected times, I can see lots of the following messages in the squid cache log:
2023-05-18T11:37:03 squid kid1| conn690249 local=162.125.21.3:443 remote=10.63.12.82:53169 FD 323 flags=33: read/write failure: (60) Operation timed out
2023-05-18T11:34:01 squid kid1| conn681351 local=13.88.181.35:443 remote=10.63.13.68:59116 FD 300 flags=33: read/write failure: (60) Operation timed out
2023-05-18T11:33:41 squid kid1| conn467660 local=40.113.103.199:443 remote=10.63.13.68:49576 FD 618 flags=33: read/write failure: (60) Operation timed out
2023-05-18T11:29:37 squid kid1| conn678654 local=35.174.127.31:443 remote=10.63.12.145:58955 FD 785 flags=33: read/write failure: (60) Operation timed out
2023-05-18T11:21:10 squid kid1| conn366702 local=40.115.3.253:443 remote=10.63.19.98:58979 FD 97 flags=33: read/write failure: (60) Operation timed out
2023-05-18T11:04:35 squid kid1| conn637485 local=157.240.253.13:443 remote=10.63.14.104:62702 FD 863 flags=33: read/write failure: (60) Operation timed out
2023-05-18T11:04:05 squid kid1| conn637480 local=157.240.253.13:443 remote=10.63.14.104:62701 FD 829 flags=33: read/write failure: (60) Operation timed out
2023-05-18T11:01:06 squid kid1| conn550254 local=40.115.3.253:443 remote=10.63.10.18:60849 FD 113 flags=33: read/write failure: (60) Operation timed out
2023-05-18T10:52:20 squid kid1| conn578770 local=40.113.110.67:443 remote=10.63.19.18:65516 FD 495 flags=33: read/write failure: (60) Operation timed out
2023-05-18T10:45:37 squid kid1| conn610397 local=40.113.110.67:443 remote=10.63.75.20:51635 FD 633 flags=33: read/write failure: (60) Operation timed out
2023-05-18T10:43:31 squid kid1| conn610286 local=162.125.21.3:443 remote=10.63.75.20:51616 FD 978 flags=33: read/write failure: (60) Operation timed out
2023-05-18T10:43:26 squid kid1| conn610347 local=162.125.21.3:443 remote=10.63.19.133:49822 FD 1061 flags=33: read/write failure: (60) Operation timed out
2023-05-18T10:42:59 squid kid1| conn598172 local=172.217.18.10:443 remote=10.63.19.133:49264 FD 820 flags=33: read/write failure: (60) Operation timed out
as well as some
2023-05-18T11:29:32 squid kid1| conn682058 local=147.160.187.240:443 remote=10.63.14.104:64056 FD 776 flags=33: read/write failure: (32) Broken pipe
2023-05-18T10:37:14 squid kid1| conn604023 local=52.222.214.43:443 remote=10.63.16.66:51656 FD 633 flags=33: read/write failure: (32) Broken pipe
2023-05-18T09:52:11 squid kid1| conn550127 local=139.177.229.129:443 remote=10.63.15.99:34151 FD 440 flags=33: read/write failure: (32) Broken pipe
2023-05-18T07:19:34 squid kid1| conn440033 local=142.250.186.46:443 remote=10.63.11.30:34628 FD 94 flags=33: read/write failure: (32) Broken pipe
2023-05-18T04:21:08 squid kid1| conn358437 local=139.177.229.129:443 remote=10.63.15.117:62679 FD 64 flags=33: read/write failure: (32) Broken pipe
2023-05-18T00:43:12 squid kid1| conn239322 local=146.75.118.248:443 remote=10.63.12.147:63957 FD 316 flags=33: read/write failure: (32) Broken pipe
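To see whether the failures cluster on certain clients or destinations, the messages can be tallied with a few lines of Python. This is a rough sketch only: the regular expression is written against the lines pasted above, `summarize` is a made-up helper name, and it assumes that `remote=` is the LAN client and `local=` is the address the client was trying to reach (which the 10.63.x.x vs. public address pattern suggests, but I have not verified it).

```python
import re
from collections import Counter

# Pattern derived from the log lines quoted in this post.
LINE_RE = re.compile(
    r"^(?P<ts>\S+) squid \S+ conn\d+ "
    r"local=(?P<local>[\d.]+):\d+ remote=(?P<remote>[\d.]+):\d+ "
    r"FD \d+ flags=\d+: read/write failure: \((?P<errno>\d+)\) (?P<msg>.+)$"
)


def summarize(lines):
    """Count read/write failures by error text, client and destination."""
    by_error, by_client, by_dest = Counter(), Counter(), Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip anything that is not a read/write failure line
        by_error[match["msg"]] += 1
        by_client[match["remote"]] += 1
        by_dest[match["local"]] += 1
    return by_error, by_client, by_dest


# Three of the lines from above, used as sample input.
SAMPLE = [
    "2023-05-18T11:37:03 squid kid1| conn690249 local=162.125.21.3:443 "
    "remote=10.63.12.82:53169 FD 323 flags=33: read/write failure: "
    "(60) Operation timed out",
    "2023-05-18T11:34:01 squid kid1| conn681351 local=13.88.181.35:443 "
    "remote=10.63.13.68:59116 FD 300 flags=33: read/write failure: "
    "(60) Operation timed out",
    "2023-05-18T11:29:32 squid kid1| conn682058 local=147.160.187.240:443 "
    "remote=10.63.14.104:64056 FD 776 flags=33: read/write failure: "
    "(32) Broken pipe",
]

errors, clients, destinations = summarize(SAMPLE)
print(errors.most_common())
```

For the real log, pass an open file object for the cache log instead of `SAMPLE`; `clients.most_common(10)` and `destinations.most_common(10)` then show whether a handful of hosts account for most of the timeouts.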
It seems to start out of the blue and to end just as randomly. Once, rebooting the OPNsense machine helped; another time, rebooting it twice did not help, but a later restart seemed to bring it back to normal.
OPNsense 23.1.7_3-amd64
FreeBSD 13.1-RELEASE-p7
OpenSSL 1.1.1t 7 Feb 2023
with the following services:
acme
cicap
clamd
configd
cron
dhcpd
dnsmasq
flowd_aggregate
freshclam
ipfw
login
monit
nginx
ntpd
openssh
openvpn
pf
postfix
redis
routing
rspamd
samplicate
squid
suricata
sysctl
syslog-ng
webgui
edit: squid SSL interception only with SNI (no MITM).
Any idea how to debug the issue? Thx! Simon
*push* anyone having an idea? :)
I have had the same problem for some time, and also the same temporary solution. I thought it had something to do with cache, memory, logs etc. But I think I solved it by turning on PowerD. All settings are set to HiAdaptive. It was turned off by default.
It has been running for 2 weeks now without any issues, no reboots. Something which wasn't possible before. I hope this helps.
Just an idea, because I saw symptoms like these: Did you try lowering your MTU or do MSS clamping?
If you use VLANs, PPPoE or anything else that limits or reduces your real MTU, you will experience packet loss with sites for which PMTU discovery does not work correctly somewhere along the path. Such sites will then be much slower, because the dropped packets only get corrected eventually.
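The arithmetic behind MSS clamping can be sketched as follows. This is only an illustration under stated assumptions: a 1500-byte Ethernet MTU underneath the PPPoE link, IP and TCP headers without options, and `mss_for_mtu` as a made-up helper name.

```python
IPV4_HEADER = 20     # bytes, IPv4 header without options
IPV6_HEADER = 40     # bytes, fixed IPv6 header
TCP_HEADER = 20      # bytes, TCP header without options
PPPOE_OVERHEAD = 8   # bytes: 6-byte PPPoE header + 2-byte PPP protocol field


def mss_for_mtu(mtu, ipv6=False):
    """Largest TCP payload that fits into one packet of the given MTU."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    return mtu - ip_header - TCP_HEADER


ethernet_mtu = 1500  # assumption: standard Ethernet below the PPPoE session
pppoe_mtu = ethernet_mtu - PPPOE_OVERHEAD

print("PPPoE MTU:", pppoe_mtu)
print("IPv4 MSS: ", mss_for_mtu(pppoe_mtu))
print("IPv6 MSS: ", mss_for_mtu(pppoe_mtu, ipv6=True))
```

If a clamp is configured, it would need to be at or below the value computed for the smallest MTU on the path; any additional encapsulation lowers that number further.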
Thanks for the replies!
Quote from: meyergru on May 24, 2023, 08:32:57 PM
Just an idea, because I saw symptoms like these: Did you try lowering your MTU or do MSS clamping?
If you use VLANs, PPPoE or anything else that limits or reduces your real MTU, you will experience packet loss with sites for which PMTU discovery does not work correctly somewhere along the path. Such sites will then be much slower, because the dropped packets only get corrected eventually.
Gateways are indeed PPPoE and internally all interfaces use VLANs. The assigned/default MTU is 1492. I once lowered the MTU and MRU in the settings to 1450, which led to a connection breakdown (but reasons for the breakage might be elsewhere as well).
In the meantime we noticed that suricata was running in IDS and IPS mode, which is apparently supported neither for VLANs nor for PPPoE. One hypothesis is that suricata might have been interfering with the packets as well (leading to MTU issues). We deactivated suricata and will observe things for a couple of weeks to see whether the problem persists.
Quote from: LesterCLL on May 24, 2023, 06:49:59 PM
I have had the same problem for some time, and also the same temporary solution. I thought it had something to do with cache, memory, logs etc. But I think I solved it by turning on PowerD. All settings are set to HiAdaptive. It was turned off by default.
It has been running for 2 weeks now without any issues, no reboots. Something which wasn't possible before. I hope this helps.
It is turned off over here as well. I would not have thought of it as a cause, but by now I will not rule out anything... I'll keep it in the back of my mind. If disabling suricata turns out not to be the solution and another attempt at lowering the MTU does not help either, I'll try it out for sure.
In any case, I'll report back here. Thx! Simon