19.7 Legacy Series / Transparent Proxy: Squid crashes and "sonewconn: Listen queue overflow" problem
« on: December 26, 2019, 08:36:26 am »
Hello everyone,
We started using OPNsense in our office a few months ago. We have 10 people and a couple of servers on our LAN. There are lots of firewall rules, but nothing too fancy or challenging for the hardware. The hardware is a powerful, modern rack server with plenty of cores and 16 GB of RAM. It is a bare-metal installation (no VM). We use many VLANs and firewall rules to control traffic. Outbound NAT is also heavily customized, with automatic rule generation disabled (manual mode).
I set up a transparent HTTP/HTTPS proxy with filters plus antivirus. It worked fine for a few weeks. However, clients then started to lose internet access a couple of minutes after an OPNsense reboot. It got worse, and we decided to disable it.
Now, a few months after that incident, we have decided that we do need a filtering proxy, so I have dived back into the problem.
It looks like Squid stops accepting new connections, and then the listen queues on Squid's ports overflow.
When I look into the logs, I see entries like this:
Code: [Select]
Dec 26 09:26:09 kernel: sonewconn: pcb 0xfffff801744e7cb0: Listen queue overflow: 193 already in queue awaiting acceptance (82 occurrences)
Dec 26 09:25:06 kernel: sonewconn: pcb 0xfffff801744e7cb0: Listen queue overflow: 193 already in queue awaiting acceptance (93 occurrences)
Dec 26 09:24:05 kernel: sonewconn: pcb 0xfffff801744e7cb0: Listen queue overflow: 193 already in queue awaiting acceptance (48 occurrences)
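For context, `sonewconn: Listen queue overflow` is the FreeBSD kernel reporting that more connections are waiting in a socket's accept queue than its backlog allows. The global cap can be inspected and raised with sysctl; a sketch (the tunable name and the value 1024 are assumptions — check your FreeBSD/OPNsense version, and note that a bigger backlog only buys headroom if Squid is still accepting connections at all):

```shell
# Show the current global limit on pending-accept queues (defaults to 128)
sysctl kern.ipc.somaxconn

# Raise it on the running system (example value; tune to your load)
sysctl kern.ipc.somaxconn=1024

# To persist across reboots, add the same tunable under
# System > Settings > Tunables in the OPNsense GUI.
```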
Code: [Select]
root@router:~ # netstat -Lan
Current listen queue sizes (qlen/incqlen/maxqlen)
Proto Listen Local Address
tcp4 0/0/128 192.168.1.1.3128
tcp4 0/0/128 192.168.50.1.3128
tcp4 0/0/128 192.168.60.1.3128
tcp4 0/0/128 192.168.20.1.3128
tcp4 0/0/128 192.168.70.1.3128
tcp4 0/0/128 192.168.0.11.3128
tcp6 0/0/128 ::1.3129
tcp4 193/0/128 127.0.0.1.3129
tcp6 0/0/128 ::1.3128
tcp4 40/0/128 127.0.0.1.3128
tcp6 0/0/128 *.9000
tcp4 0/0/128 *.9000
tcp4 0/0/128 *.8000
tcp4 0/0/128 127.0.0.1.8999
tcp6 0/0/128 *.1344
tcp4 0/0/128 127.0.0.1.3310
tcp4 0/0/128 127.0.0.1.953
tcp4 0/0/128 127.0.0.1.53
tcp4 0/0/128 192.168.50.1.53
tcp4 0/0/128 192.168.1.1.53
tcp4 0/0/128 192.168.20.1.53
tcp4 0/0/128 192.168.0.11.53
tcp6 0/0/128 ::1.80
tcp4 0/0/128 127.0.0.1.80
tcp4 0/0/128 192.168.1.1.80
tcp4 0/0/128 192.168.0.11.80
tcp6 0/0/128 ::1.443
tcp4 0/0/128 127.0.0.1.443
tcp4 0/0/128 192.168.1.1.443
tcp4 0/0/128 192.168.0.11.443
tcp4 0/0/128 192.168.0.11.22
tcp4 0/0/128 127.0.0.1.22
tcp6 0/0/128 ::1.22
unix 0/0/128 /var/run/clamav/clamd.sock
unix 0/0/128 /var/db/syslog-ng.ctl
unix 0/0/5 /var/run/dpinger_IAN_GW.sock
unix 0/0/128 /tmp/php-fastcgi.socket-1
unix 0/0/128 /tmp/php-fastcgi.socket-0
unix 0/0/4 /var/run/devd.pipe
unix 0/0/30 /var/run/configd.socket
unix 0/0/4 /var/run/devd.seqpacket.pipe
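The line to notice in this output is `tcp4 193/0/128 127.0.0.1.3129`: the intercepted-HTTPS listener has 193 connections queued against a backlog of 128, while the other sockets are idle. A small sketch (run here on sample lines copied from the output above; against a live system you would pipe `netstat -Lan` instead of `printf`) of how to flag overflowing listeners:

```shell
# Print listeners whose pending-accept count (qlen) has reached maxqlen.
# Fields: proto  qlen/incqlen/maxqlen  local-address
printf '%s\n' \
  'tcp4 193/0/128 127.0.0.1.3129' \
  'tcp4 40/0/128 127.0.0.1.3128' \
  'tcp4 0/0/128 192.168.1.1.3128' |
awk '{ split($2, q, "/"); if (q[1] + 0 >= q[3] + 0) print "overflowing:", $3 }'
# → overflowing: 127.0.0.1.3129
```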
Then I tried to restart Squid, but it took a long time and never finished. So I logged in to OPNsense over SSH and tried to stop Squid with the service command.
Code: [Select]
root@router:~ # service squid stop
Performing sanity check on squid configuration.
2019/12/26 09:41:09| WARNING OVERRIDE: Capping startup=5 to the defined maximum (2)
Configuration for squid passes.
Stopping squid.
Waiting for PIDS: 2637
I had to kill Squid with:
Code: [Select]
kill -9 2637
After that, I restarted Squid, looked at netstat, and saw lots of dangling ESTABLISHED connections.
Code: [Select]
Active Internet connections
Proto Recv-Q Send-Q Local Address Foreign Address (state)
tcp4 517 0 localhost.3129 192.168.50.11.52834 ESTABLISHED
tcp4 517 0 localhost.3129 192.168.60.10.61437 ESTABLISHED
tcp4 517 0 localhost.3129 192.168.50.11.52833 ESTABLISHED
tcp4 517 0 localhost.3129 192.168.60.10.61436 ESTABLISHED
tcp4 517 0 localhost.3129 192.168.60.10.61435 ESTABLISHED
tcp4 517 0 localhost.3129 192.168.50.11.52832 ESTABLISHED
tcp4 517 0 localhost.3129 192.168.50.11.52831 ESTABLISHED
tcp4 577 0 localhost.3129 192.168.60.10.61434 ESTABLISHED
tcp4 517 0 localhost.3129 192.168.60.10.61433 ESTABLISHED
tcp4 219 0 localhost.3129 192.168.60.12.64834 ESTABLISHED
tcp4 517 0 localhost.3129 192.168.60.10.61432 ESTABLISHED
tcp4 592 0 localhost.3129 192.168.50.11.52830 ESTABLISHED
tcp4 517 0 localhost.3129 192.168.50.11.52829 ESTABLISHED
tcp4 517 0 localhost.3129 192.168.50.11.52828 ESTABLISHED
tcp4 0 0 localhost.3129 192.168.60.12.64833 ESTABLISHED
tcp4 587 0 localhost.3129 192.168.50.11.52827 ESTABLISHED
tcp4 517 0 localhost.3129 192.168.60.12.64832 ESTABLISHED
...
<+200 more connections>
After a while, those connections moved to CLOSED and CLOSE_WAIT.
Code: [Select]
tcp4 517 0 localhost.3129 192.168.60.10.61464 CLOSE_WAIT
tcp4 217 0 localhost.3129 192.168.60.12.64856 CLOSED
tcp4 517 0 localhost.3129 192.168.50.11.52851 CLOSE_WAIT
tcp4 519 0 localhost.3129 192.168.60.10.61462 CLOSE_WAIT
tcp4 519 0 localhost.3129 192.168.60.10.61463 CLOSE_WAIT
tcp4 519 0 localhost.3129 192.168.60.10.61460 CLOSE_WAIT
tcp4 519 0 localhost.3129 192.168.60.10.61459 CLOSE_WAIT
tcp4 519 0 localhost.3129 192.168.60.10.61461 CLOSE_WAIT
tcp4 519 0 localhost.3129 192.168.60.10.61458 CLOSE_WAIT
tcp4 219 0 localhost.3129 192.168.60.12.64855 CLOSED
tcp4 562 0 localhost.3129 192.168.60.12.64854 CLOSE_WAIT
tcp4 562 0 localhost.3129 192.168.60.12.64853 CLOSE_WAIT
tcp4 204 0 localhost.3129 192.168.50.12.50278 CLOSED
tcp4 517 0 localhost.3129 192.168.60.10.61457 CLOSE_WAIT
tcp4 517 0 localhost.3129 192.168.50.12.50277 CLOSE_WAIT
...
<+200 more connections>
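Sockets stuck in CLOSE_WAIT mean the remote side has hung up but the local application (here Squid) never closed its end — consistent with a wedged process that stopped servicing its sockets. A quick way to tally the states (shown on sample lines from above; against the live system you would pipe `netstat -an` instead of `printf`):

```shell
# Count TCP connection states; a pile of CLOSE_WAIT entries points at the
# local application failing to close() sockets the peer already shut down.
printf '%s\n' \
  'tcp4 517 0 localhost.3129 192.168.60.10.61464 CLOSE_WAIT' \
  'tcp4 217 0 localhost.3129 192.168.60.12.64856 CLOSED' \
  'tcp4 517 0 localhost.3129 192.168.50.11.52851 CLOSE_WAIT' |
awk '{ print $6 }' | sort | uniq -c
```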
I cleared out the log files and restarted OPNsense. When I looked into the Squid logs, I saw this (newest entries first):
Code: [Select]
2019/12/26 10:23:15 pinger: ICMPv6 socket opened
2019/12/26 10:23:15 pinger: ICMP socket opened.
2019/12/26 10:23:15 pinger: Initialising ICMP pinger ...
Page faults with physical i/o: 10
Maximum Resident Size: 568160 KB
CPU Usage: 1.245 seconds = 1.229 user + 0.016 sys
2019/12/26 10:23:15 kid1| Squid Cache (Version 4.9): Terminated abnormally.
2019/12/26 10:23:15 kid1| FATAL: The /usr/local/libexec/squid/security_file_certgen -s /var/squid/ssl_crtd -M 1024MB helpers are crashing too rapidly, need help!
2019/12/26 10:23:15 kid1| Took 0.00 seconds ( 0.00 entries/sec).
2019/12/26 10:23:15 kid1| Finished. Wrote 0 entries.
2019/12/26 10:23:15 kid1| storeDirWriteCleanLogs: Starting...
2019/12/26 10:23:15 kid1| Closing HTTP(S) port 192.168.1.1:3128
2019/12/26 10:23:15 kid1| Closing HTTP(S) port 192.168.50.1:3128
2019/12/26 10:23:15 kid1| Closing HTTP(S) port 192.168.60.1:3128
2019/12/26 10:23:15 kid1| Closing HTTP(S) port 192.168.20.1:3128
2019/12/26 10:23:15 kid1| Closing HTTP(S) port 192.168.70.1:3128
2019/12/26 10:23:15 kid1| Closing HTTP(S) port 192.168.0.11:3128
2019/12/26 10:23:15 kid1| Closing HTTP(S) port [::1]:3129
2019/12/26 10:23:15 kid1| Closing HTTP(S) port 127.0.0.1:3129
2019/12/26 10:23:15 kid1| Closing HTTP(S) port [::1]:3128
2019/12/26 10:23:15 kid1| Closing HTTP(S) port 127.0.0.1:3128
2019/12/26 10:23:15 kid1| Too few /usr/local/libexec/squid/security_file_certgen -s /var/squid/ssl_crtd -M 1024MB processes are running (need 1/2)
2019/12/26 10:23:15 kid1| WARNING: /usr/local/libexec/squid/security_file_certgen -s /var/squid/ssl_crtd -M 1024MB #Hlpr1 exited
2019/12/26 10:23:15 kid1| store_swap_size = 0.00 KB
2019/12/26 10:23:15 kid1| Validated 0 Entries
2019/12/26 10:23:15 kid1| Completed Validation Procedure
2019/12/26 10:23:15 kid1| Beginning Validation Procedure
2019/12/26 10:23:15 kid1| Took 0.17 seconds ( 0.00 objects/sec).
2019/12/26 10:23:15 kid1| 0 Swapfile clashes avoided.
2019/12/26 10:23:15 kid1| 0 Duplicate URLs purged.
2019/12/26 10:23:15 kid1| 0 Objects cancelled.
2019/12/26 10:23:15 kid1| 0 Objects expired.
2019/12/26 10:23:15 kid1| 0 Objects loaded.
2019/12/26 10:23:15 kid1| 0 With invalid flags.
2019/12/26 10:23:15 kid1| 0 Invalid entries.
2019/12/26 10:23:15 kid1| 0 Entries scanned
2019/12/26 10:23:15 kid1| Finished rebuilding storage from disk.
2019/12/26 10:23:15 kid1| Store rebuilding is 0.00% complete
2019/12/26 10:23:15 kid1| Done reading /var/squid/cache swaplog (0 entries)
2019/12/26 10:23:15 kid1| Accepting SSL bumped HTTP Socket connections at local=192.168.1.1:3128 remote=[::] FD 31 flags=9
2019/12/26 10:23:15 kid1| Accepting SSL bumped HTTP Socket connections at local=192.168.50.1:3128 remote=[::] FD 30 flags=9
2019/12/26 10:23:15 kid1| Accepting SSL bumped HTTP Socket connections at local=192.168.60.1:3128 remote=[::] FD 29 flags=9
2019/12/26 10:23:15 kid1| Accepting SSL bumped HTTP Socket connections at local=192.168.20.1:3128 remote=[::] FD 28 flags=9
2019/12/26 10:23:15 kid1| Accepting SSL bumped HTTP Socket connections at local=192.168.70.1:3128 remote=[::] FD 27 flags=9
2019/12/26 10:23:15 kid1| Accepting SSL bumped HTTP Socket connections at local=192.168.0.11:3128 remote=[::] FD 26 flags=9
2019/12/26 10:23:15 kid1| Accepting NAT intercepted SSL bumped HTTPS Socket connections at local=[::1]:3129 remote=[::] FD 25 flags=41
2019/12/26 10:23:15 kid1| Accepting NAT intercepted SSL bumped HTTPS Socket connections at local=127.0.0.1:3129 remote=[::] FD 24 flags=41
2019/12/26 10:23:15 kid1| Accepting NAT intercepted SSL bumped HTTP Socket connections at local=[::1]:3128 remote=[::] FD 23 flags=41
2019/12/26 10:23:15 kid1| Accepting NAT intercepted SSL bumped HTTP Socket connections at local=127.0.0.1:3128 remote=[::] FD 22 flags=41
Access Logs
Code: [Select]
Dec 26 10:23:15 (squid-1): FATAL: Received Segment Violation...dying.
Dec 26 10:23:15 (squid-1): FATAL: The /usr/local/libexec/squid/security_file_certgen -s /var/squid/ssl_crtd -M 1024MB helpers are crashing too rapidly, need help!
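The `security_file_certgen ... helpers are crashing too rapidly` FATAL usually means the SSL certificate database Squid uses for bumping is unusable (here plausibly damaged by the disk errors below). A common remediation is to move the database aside and let the helper recreate it; a sketch using the path and size from the log above (the `squid:squid` ownership is an assumption — match whatever user your Squid runs as):

```shell
# Stop Squid, set aside the possibly corrupted certificate DB, recreate it
# with the helper's init option, and restore ownership before restarting.
service squid stop
mv /var/squid/ssl_crtd /var/squid/ssl_crtd.bad
/usr/local/libexec/squid/security_file_certgen -c -s /var/squid/ssl_crtd -M 1024MB
chown -R squid:squid /var/squid/ssl_crtd
service squid start
```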
I saw an I/O problem and a segmentation violation in the logs. When I ran fsck on the disk, I received:
Code: [Select]
root@router:~ # fsck
** /dev/ufs/OPNsense (NO WRITE)
** Last Mounted on /mnt
** Root file system
** Phase 1 - Check Blocks and Sizes
INCORRECT BLOCK COUNT I=67896705 (379520 should be 379456)
CORRECT? no
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
FREE BLK COUNT(S) WRONG IN SUPERBLK
SALVAGE? no
SUMMARY INFORMATION BAD
SALVAGE? no
BLK(S) MISSING IN BIT MAPS
SALVAGE? no
55995 files, 782543 used, 233716500 free (8028 frags, 29213559 blocks, 0.0% fragmentation)
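Note that the fsck run above was in no-write mode (`NO WRITE`, with every repair prompt answered `no`), so nothing was actually fixed. Repairs need the filesystem quiescent; a sketch (device name taken from the output above):

```shell
# From single-user mode (or after booting install media), let fsck
# actually apply the fixes it proposed above:
fsck -y /dev/ufs/OPNsense
```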
Looks like I'm in deep trouble. Any help would be really appreciated.
Thanks in advance and have a nice day.