Transparent Proxy: Squid crashes and "sonewconn: Listen queue overflow" problem

Started by Orcun, December 26, 2019, 08:36:26 AM

Hello everyone,

We started using OPNsense in our office a few months ago. We have 10 people and a couple of servers on our LAN. There are lots of firewall rules, but nothing too fancy or challenging for the hardware: a powerful, modern rack server with plenty of cores and 16 GB of RAM, running a bare-metal installation (no VM). We have lots of VLANs and firewall rules to control the traffic, and outbound NAT is also heavily ruled, with automatic rule generation disabled (manual mode).

I set up a transparent HTTP/HTTPS proxy with filtering + antivirus. It worked fine for a few weeks, but then clients started to lose internet access a couple of minutes after every OPNsense restart. It got worse, and we decided to disable it.

Now, a few months after the incident, we have decided that we do need a filtering proxy, so I have dived back into the problem.

It looks like Squid stops accepting new connections, and the listen queues of the Squid ports then overflow.

When I look into the logs, I see entries like this:

Dec 26 09:26:09 kernel: sonewconn: pcb 0xfffff801744e7cb0: Listen queue overflow: 193 already in queue awaiting acceptance (82 occurrences)
Dec 26 09:25:06 kernel: sonewconn: pcb 0xfffff801744e7cb0: Listen queue overflow: 193 already in queue awaiting acceptance (93 occurrences)
Dec 26 09:24:05 kernel: sonewconn: pcb 0xfffff801744e7cb0: Listen queue overflow: 193 already in queue awaiting acceptance (48 occurrences)
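
For reference, FreeBSD's default cap on a socket's listen queue is 128. It can be checked and raised with sysctl (or persisted via System > Settings > Tunables in the GUI), although I realize a bigger backlog only buys time if Squid has stopped calling accept() altogether:

sysctl kern.ipc.somaxconn          # current cap on listen queue length (default 128)
sysctl kern.ipc.somaxconn=512      # temporary bump; persist it as a tunable to survive reboots

netstat -Lan shows which listener is actually overflowing: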



root@router:~ # netstat -Lan
Current listen queue sizes (qlen/incqlen/maxqlen)
Proto Listen                           Local Address
tcp4  0/0/128                          192.168.1.1.3128
tcp4  0/0/128                          192.168.50.1.3128
tcp4  0/0/128                          192.168.60.1.3128
tcp4  0/0/128                          192.168.20.1.3128
tcp4  0/0/128                          192.168.70.1.3128
tcp4  0/0/128                          192.168.0.11.3128
tcp6  0/0/128                          ::1.3129
tcp4  193/0/128                        127.0.0.1.3129
tcp6  0/0/128                          ::1.3128
tcp4  40/0/128                         127.0.0.1.3128
tcp6  0/0/128                          *.9000
tcp4  0/0/128                          *.9000
tcp4  0/0/128                          *.8000
tcp4  0/0/128                          127.0.0.1.8999
tcp6  0/0/128                          *.1344
tcp4  0/0/128                          127.0.0.1.3310
tcp4  0/0/128                          127.0.0.1.953
tcp4  0/0/128                          127.0.0.1.53
tcp4  0/0/128                          192.168.50.1.53
tcp4  0/0/128                          192.168.1.1.53
tcp4  0/0/128                          192.168.20.1.53
tcp4  0/0/128                          192.168.0.11.53
tcp6  0/0/128                          ::1.80
tcp4  0/0/128                          127.0.0.1.80
tcp4  0/0/128                          192.168.1.1.80
tcp4  0/0/128                          192.168.0.11.80
tcp6  0/0/128                          ::1.443
tcp4  0/0/128                          127.0.0.1.443
tcp4  0/0/128                          192.168.1.1.443
tcp4  0/0/128                          192.168.0.11.443
tcp4  0/0/128                          192.168.0.11.22
tcp4  0/0/128                          127.0.0.1.22
tcp6  0/0/128                          ::1.22
unix  0/0/128                          /var/run/clamav/clamd.sock
unix  0/0/128                          /var/db/syslog-ng.ctl
unix  0/0/5                            /var/run/dpinger_IAN_GW.sock
unix  0/0/128                          /tmp/php-fastcgi.socket-1
unix  0/0/128                          /tmp/php-fastcgi.socket-0
unix  0/0/4                            /var/run/devd.pipe
unix  0/0/30                           /var/run/configd.socket
unix  0/0/4                            /var/run/devd.seqpacket.pipe
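
The 193/0/128 on 127.0.0.1:3129 is the intercepted HTTPS listener, and its 193 queued connections match the kernel's overflow message exactly. To watch the queues build up in real time, I have been using a simple shell loop:

while true; do
    date
    netstat -Lan | egrep '\.(3128|3129)$'    # listen queues of the proxy ports
    sleep 5
done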


Then I tried to restart Squid, but it took a long time and never finished. So I logged in to OPNsense over SSH and tried to stop Squid with the service command:

root@router:~ # service squid stop
Performing sanity check on squid configuration.
2019/12/26 09:41:09| WARNING OVERRIDE: Capping startup=5 to the defined maximum (2)
Configuration for squid passes.
Stopping squid.
Waiting for PIDS: 2637
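
The stop just hung at "Waiting for PIDS". Next time it hangs I want to see what the main process is blocked on; procstat from the FreeBSD base system can show the kernel stack:

pgrep -lf squid       # confirm which Squid PIDs are still running
procstat -kk 2637     # 2637 is the PID from the stop attempt above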


In the end I had to kill Squid by hand:

kill -9 2637


After that I restarted Squid, looked at netstat again, and saw lots of dangling ESTABLISHED connections. Note the non-zero Recv-Q values: bytes are queued in the socket receive buffers, but Squid is not reading them.

Active Internet connections
Proto Recv-Q Send-Q Local Address          Foreign Address        (state)
tcp4     517      0 localhost.3129         192.168.50.11.52834    ESTABLISHED
tcp4     517      0 localhost.3129         192.168.60.10.61437    ESTABLISHED
tcp4     517      0 localhost.3129         192.168.50.11.52833    ESTABLISHED
tcp4     517      0 localhost.3129         192.168.60.10.61436    ESTABLISHED
tcp4     517      0 localhost.3129         192.168.60.10.61435    ESTABLISHED
tcp4     517      0 localhost.3129         192.168.50.11.52832    ESTABLISHED
tcp4     517      0 localhost.3129         192.168.50.11.52831    ESTABLISHED
tcp4     577      0 localhost.3129         192.168.60.10.61434    ESTABLISHED
tcp4     517      0 localhost.3129         192.168.60.10.61433    ESTABLISHED
tcp4     219      0 localhost.3129         192.168.60.12.64834    ESTABLISHED
tcp4     517      0 localhost.3129         192.168.60.10.61432    ESTABLISHED
tcp4     592      0 localhost.3129         192.168.50.11.52830    ESTABLISHED
tcp4     517      0 localhost.3129         192.168.50.11.52829    ESTABLISHED
tcp4     517      0 localhost.3129         192.168.50.11.52828    ESTABLISHED
tcp4       0      0 localhost.3129         192.168.60.12.64833    ESTABLISHED
tcp4     587      0 localhost.3129         192.168.50.11.52827    ESTABLISHED
tcp4     517      0 localhost.3129         192.168.60.12.64832    ESTABLISHED
...
<200+ more connections not shown>


After a while those connections move to CLOSED and CLOSE_WAIT. (CLOSE_WAIT means the client has already closed its end, but Squid has never closed its side of the socket.)

tcp4     517      0 localhost.3129         192.168.60.10.61464    CLOSE_WAIT
tcp4     217      0 localhost.3129         192.168.60.12.64856    CLOSED
tcp4     517      0 localhost.3129         192.168.50.11.52851    CLOSE_WAIT
tcp4     519      0 localhost.3129         192.168.60.10.61462    CLOSE_WAIT
tcp4     519      0 localhost.3129         192.168.60.10.61463    CLOSE_WAIT
tcp4     519      0 localhost.3129         192.168.60.10.61460    CLOSE_WAIT
tcp4     519      0 localhost.3129         192.168.60.10.61459    CLOSE_WAIT
tcp4     519      0 localhost.3129         192.168.60.10.61461    CLOSE_WAIT
tcp4     519      0 localhost.3129         192.168.60.10.61458    CLOSE_WAIT
tcp4     219      0 localhost.3129         192.168.60.12.64855    CLOSED
tcp4     562      0 localhost.3129         192.168.60.12.64854    CLOSE_WAIT
tcp4     562      0 localhost.3129         192.168.60.12.64853    CLOSE_WAIT
tcp4     204      0 localhost.3129         192.168.50.12.50278    CLOSED
tcp4     517      0 localhost.3129         192.168.60.10.61457    CLOSE_WAIT
tcp4     517      0 localhost.3129         192.168.50.12.50277    CLOSE_WAIT
...
<200+ more connections not shown>
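
To keep an eye on how many sockets pile up in each state, a quick one-liner is enough:

netstat -an -p tcp | grep '^tcp' | awk '{ print $6 }' | sort | uniq -c | sort -rn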


I cleared out the log files and restarted OPNsense. When I looked into the Squid logs, I saw the following (the log viewer lists newest entries first, so read it bottom-up):

2019/12/26 10:23:15 pinger: ICMPv6 socket opened
2019/12/26 10:23:15 pinger: ICMP socket opened.
2019/12/26 10:23:15 pinger: Initialising ICMP pinger ...
Page faults with physical i/o: 10
Maximum Resident Size: 568160 KB
CPU Usage: 1.245 seconds = 1.229 user + 0.016 sys
2019/12/26 10:23:15 kid1| Squid Cache (Version 4.9): Terminated abnormally.
2019/12/26 10:23:15 kid1| FATAL: The /usr/local/libexec/squid/security_file_certgen -s /var/squid/ssl_crtd -M 1024MB helpers are crashing too rapidly, need help!
2019/12/26 10:23:15 kid1| Took 0.00 seconds ( 0.00 entries/sec).
2019/12/26 10:23:15 kid1| Finished. Wrote 0 entries.
2019/12/26 10:23:15 kid1| storeDirWriteCleanLogs: Starting...
2019/12/26 10:23:15 kid1| Closing HTTP(S) port 192.168.1.1:3128
2019/12/26 10:23:15 kid1| Closing HTTP(S) port 192.168.50.1:3128
2019/12/26 10:23:15 kid1| Closing HTTP(S) port 192.168.60.1:3128
2019/12/26 10:23:15 kid1| Closing HTTP(S) port 192.168.20.1:3128
2019/12/26 10:23:15 kid1| Closing HTTP(S) port 192.168.70.1:3128
2019/12/26 10:23:15 kid1| Closing HTTP(S) port 192.168.0.11:3128
2019/12/26 10:23:15 kid1| Closing HTTP(S) port [::1]:3129
2019/12/26 10:23:15 kid1| Closing HTTP(S) port 127.0.0.1:3129
2019/12/26 10:23:15 kid1| Closing HTTP(S) port [::1]:3128
2019/12/26 10:23:15 kid1| Closing HTTP(S) port 127.0.0.1:3128
2019/12/26 10:23:15 kid1| Too few /usr/local/libexec/squid/security_file_certgen -s /var/squid/ssl_crtd -M 1024MB processes are running (need 1/2)
2019/12/26 10:23:15 kid1| WARNING: /usr/local/libexec/squid/security_file_certgen -s /var/squid/ssl_crtd -M 1024MB #Hlpr1 exited
2019/12/26 10:23:15 kid1| store_swap_size = 0.00 KB
2019/12/26 10:23:15 kid1| Validated 0 Entries
2019/12/26 10:23:15 kid1| Completed Validation Procedure
2019/12/26 10:23:15 kid1| Beginning Validation Procedure
2019/12/26 10:23:15 kid1| Took 0.17 seconds ( 0.00 objects/sec).
2019/12/26 10:23:15 kid1| 0 Swapfile clashes avoided.
2019/12/26 10:23:15 kid1| 0 Duplicate URLs purged.
2019/12/26 10:23:15 kid1| 0 Objects cancelled.
2019/12/26 10:23:15 kid1| 0 Objects expired.
2019/12/26 10:23:15 kid1| 0 Objects loaded.
2019/12/26 10:23:15 kid1| 0 With invalid flags.
2019/12/26 10:23:15 kid1| 0 Invalid entries.
2019/12/26 10:23:15 kid1| 0 Entries scanned
2019/12/26 10:23:15 kid1| Finished rebuilding storage from disk.
2019/12/26 10:23:15 kid1| Store rebuilding is 0.00% complete
2019/12/26 10:23:15 kid1| Done reading /var/squid/cache swaplog (0 entries)
2019/12/26 10:23:15 kid1| Accepting SSL bumped HTTP Socket connections at local=192.168.1.1:3128 remote=[::] FD 31 flags=9
2019/12/26 10:23:15 kid1| Accepting SSL bumped HTTP Socket connections at local=192.168.50.1:3128 remote=[::] FD 30 flags=9
2019/12/26 10:23:15 kid1| Accepting SSL bumped HTTP Socket connections at local=192.168.60.1:3128 remote=[::] FD 29 flags=9
2019/12/26 10:23:15 kid1| Accepting SSL bumped HTTP Socket connections at local=192.168.20.1:3128 remote=[::] FD 28 flags=9
2019/12/26 10:23:15 kid1| Accepting SSL bumped HTTP Socket connections at local=192.168.70.1:3128 remote=[::] FD 27 flags=9
2019/12/26 10:23:15 kid1| Accepting SSL bumped HTTP Socket connections at local=192.168.0.11:3128 remote=[::] FD 26 flags=9
2019/12/26 10:23:15 kid1| Accepting NAT intercepted SSL bumped HTTPS Socket connections at local=[::1]:3129 remote=[::] FD 25 flags=41
2019/12/26 10:23:15 kid1| Accepting NAT intercepted SSL bumped HTTPS Socket connections at local=127.0.0.1:3129 remote=[::] FD 24 flags=41
2019/12/26 10:23:15 kid1| Accepting NAT intercepted SSL bumped HTTP Socket connections at local=[::1]:3128 remote=[::] FD 23 flags=41
2019/12/26 10:23:15 kid1| Accepting NAT intercepted SSL bumped HTTP Socket connections at local=127.0.0.1:3128 remote=[::] FD 22 flags=41


Access Logs

Dec 26 10:23:15 (squid-1): FATAL: Received Segment Violation...dying.
Dec 26 10:23:15 (squid-1): FATAL: The /usr/local/libexec/squid/security_file_certgen -s /var/squid/ssl_crtd -M 1024MB helpers are crashing too rapidly, need help!
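
The security_file_certgen crashes make me suspect a corrupted SSL certificate database, which would fit the disk problems below. One thing I plan to try is rebuilding it from scratch (the standard Squid procedure; the path and -M size are taken from the log above, and I am assuming squid:squid is the right owner on OPNsense):

service squid stop
mv /var/squid/ssl_crtd /var/squid/ssl_crtd.broken    # keep the old DB just in case
/usr/local/libexec/squid/security_file_certgen -c -s /var/squid/ssl_crtd -M 1024MB
chown -R squid:squid /var/squid/ssl_crtd
service squid start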


So there is both an I/O problem and a segmentation violation in the logs. When I ran fsck on the disk, I received:

root@router:~ # fsck
** /dev/ufs/OPNsense (NO WRITE)
** Last Mounted on /mnt
** Root file system
** Phase 1 - Check Blocks and Sizes
INCORRECT BLOCK COUNT I=67896705 (379520 should be 379456)
CORRECT? no

** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
FREE BLK COUNT(S) WRONG IN SUPERBLK
SALVAGE? no

SUMMARY INFORMATION BAD
SALVAGE? no

BLK(S) MISSING IN BIT MAPS
SALVAGE? no

55995 files, 782543 used, 233716500 free (8028 frags, 29213559 blocks, 0.0% fragmentation)
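
Note that fsck ran in NO WRITE mode because the filesystem is mounted, so it only reported the damage without repairing anything. My understanding is that I need to boot into single-user mode (from the console) and run the repair there:

fsck -y /dev/ufs/OPNsense    # answer yes to all repair prompts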


Looks like I'm in deep trouble. :D Any help would be really appreciated.

Thanks in advance and have a nice day.