OPNsense Forum

Archive => 17.1 Legacy Series => Topic started by: faunsen on May 09, 2017, 10:57:11 am

Title: [SOLVED] Packet loss
Post by: faunsen on May 09, 2017, 10:57:11 am
Hi,

as soon as I put a little more load on my firewall cluster, it loses packets and TCP connections are reset.
The nodes are ProLiant DL380 G7 servers with 32GB RAM, two six-core Xeon X5660s and three quad-port Intel 82580 NICs, so I assume the hardware is not the problem. All interfaces use link aggregation (lagg) in loadbalance mode.
The system is not under stress: about 10k sessions, 1% CPU load, plenty of mbufs, and no errors or drops on either the NICs or the switch ports.

At some unpredictable point, the firewall starts losing packets.
The trouble starts after the webserver acknowledges sequence number 291137. The database server keeps sending packets until the TCP window is full, but these packets never reach the other side, and the webserver's ACKs never reach the database server. Once the retransmissions time out, the database server resets the connection.

The traces were captured on the firewall. I captured on both the physical and the lagg interfaces, with no difference.

Any ideas where to look next?
And why do I see ICMP "Destination unreachable" packets from the firewall (192.168.19.31) on this TCP connection?


Many thanks
Frank

lagg0 - 192.168.19.0/24
Code:
330 299.939233  172.16.6.69 -> 192.168.19.4   TCP 54 55353 > ms-sql-s [ACK] Seq=12642 Ack=283137 Win=45312 Len=0
331 299.939238   192.168.19.4 -> 172.16.6.69  TDS 1514 Unknown Packet Type: 0
332 299.939252  172.16.6.69 -> 192.168.19.4   TCP 54 55353 > ms-sql-s [ACK] Seq=12642 Ack=291137 Win=37376 Len=0
333 299.939397   192.168.19.4 -> 172.16.6.69  TDS 1514 Unknown Packet Type: 0
334 299.939572   192.168.19.4 -> 172.16.6.69  TDS 1514 Unknown Packet Type: 0 (Not last buffer)
335 299.939576   192.168.19.4 -> 172.16.6.69  TDS 1514 Unknown Packet Type: 0
336 299.939579   192.168.19.4 -> 172.16.6.69  TDS 1514 Unknown Packet Type: 0 (Not last buffer)
337 299.939582   192.168.19.4 -> 172.16.6.69  TDS 1514 Unknown Packet Type: 0 (Not last buffer)
338 299.939585   192.168.19.4 -> 172.16.6.69  TDS 1514 Unknown Packet Type: 0 (Not last buffer)
339 299.939588   192.168.19.4 -> 172.16.6.69  TDS 1514 Unknown Packet Type: 0 (Not last buffer)
340 299.939591   192.168.19.4 -> 172.16.6.69  TDS 1514 Unknown Packet Type: 0
341 299.939595   192.168.19.4 -> 172.16.6.69  TDS 1514 Unknown Packet Type: 0
342 299.939599   192.168.19.4 -> 172.16.6.69  TDS 1514 Unknown Packet Type: 0 (Not last buffer)
343 299.939602   192.168.19.4 -> 172.16.6.69  TDS 1514 Unknown Packet Type: 0
344 299.939605   192.168.19.4 -> 172.16.6.69  TDS 1514 Unknown Packet Type: 0
345 299.939608   192.168.19.4 -> 172.16.6.69  TCP 1514 ms-sql-s > 55353 [PSH, ACK] Seq=324657 Ack=12642 Win=65536 Len=1460
346 299.939610   192.168.19.4 -> 172.16.6.69  TDS 1514 Unknown Packet Type: 0
347 300.239719   192.168.19.4 -> 172.16.6.69  TCP 1514 [TCP Retransmission] ms-sql-s > 55353 [ACK] Seq=291137 Ack=12642 Win=65536 Len=1460
348 300.239743    192.168.19.31 -> 192.168.19.4   ICMP 82 Destination unreachable (Host unreachable)
349 300.838833   192.168.19.4 -> 172.16.6.69  TCP 1514 [TCP Retransmission] ms-sql-s > 55353 [ACK] Seq=291137 Ack=12642 Win=65536 Len=1460
350 300.838859    192.168.19.31 -> 192.168.19.4   ICMP 82 Destination unreachable (Host unreachable)
351 302.041479   192.168.19.4 -> 172.16.6.69  TCP 1514 [TCP Retransmission] ms-sql-s > 55353 [ACK] Seq=291137 Ack=12642 Win=65536 Len=1460
352 302.041502    192.168.19.31 -> 192.168.19.4   ICMP 82 Destination unreachable (Host unreachable)
353 304.438934   192.168.19.4 -> 172.16.6.69  TCP 1514 [TCP Retransmission] ms-sql-s > 55353 [ACK] Seq=291137 Ack=12642 Win=65536 Len=1460
354 304.438957    192.168.19.31 -> 192.168.19.4   ICMP 82 Destination unreachable (Host unreachable)
355 309.239126   192.168.19.4 -> 172.16.6.69  TCP 1514 [TCP Retransmission] ms-sql-s > 55353 [ACK] Seq=291137 Ack=12642 Win=65536 Len=1460
356 309.239148    192.168.19.31 -> 192.168.19.4   ICMP 82 Destination unreachable (Host unreachable)
357 318.839481   192.168.19.4 -> 172.16.6.69  TCP 60 ms-sql-s > 55353 [RST, ACK] Seq=292597 Ack=12642 Win=0 Len=0
358 329.939143  172.16.6.69 -> 192.168.19.4   TCP 55 [TCP Keep-Alive] [TCP Window Full] 55353 > ms-sql-s [ACK] Seq=12641 Ack=307137 Win=131328 Len=1
359 329.939261   192.168.19.4 -> 172.16.6.69  TCP 60 ms-sql-s > 55353 [RST] Seq=307137 Win=0 Len=0
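
The lagg0 capture also shows the database server's retransmission timer backing off: the gap between retransmissions of Seq=291137 roughly doubles each time (about 0.6 s, 1.2 s, 2.4 s, 4.8 s), and the RST follows about 9.6 s after the last one. A minimal Python sketch that computes those gaps from the frame times copied from the capture; the variable and function names are illustrative only.

```python
# Frame times (seconds) copied from the lagg0 capture above:
# frames 347, 349, 351, 353, 355 retransmit Seq=291137,
# frame 357 is the RST sent by the database server.
retransmit_times = [300.239719, 300.838833, 302.041479, 304.438934, 309.239126]
rst_time = 318.839481


def gaps(times):
    """Return the interval between each pair of consecutive timestamps."""
    return [round(b - a, 3) for a, b in zip(times, times[1:])]


intervals = gaps(retransmit_times + [rst_time])
# Each gap is roughly twice the previous one (exponential RTO backoff).
ratios = [round(b / a, 2) for a, b in zip(intervals, intervals[1:])]

if __name__ == "__main__":
    print("gaps (s):", intervals)
    print("growth factor:", ratios)
```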

lagg1 - 172.16.6.0/24
Code:
329 299.939251  172.16.6.69 -> 192.168.19.4   TCP 60 55353 > ms-sql-s [ACK] Seq=12642 Ack=283137 Win=45312 Len=0
330 299.939261   192.168.19.4 -> 172.16.6.69  TDS 1514 Unknown Packet Type: 0
331 299.939271  172.16.6.69 -> 192.168.19.4   TCP 60 55353 > ms-sql-s [ACK] Seq=12642 Ack=291137 Win=37376 Len=0
332 299.939273   192.168.19.4 -> 172.16.6.69  TDS 1514 Unknown Packet Type: 0
333 299.939321  172.16.6.69 -> 192.168.19.4   TCP 60 55353 > ms-sql-s [ACK] Seq=12642 Ack=299137 Win=29440 Len=0
334 299.939492  172.16.6.69 -> 192.168.19.4   TCP 60 55353 > ms-sql-s [ACK] Seq=12642 Ack=307137 Win=21504 Len=0
335 299.939636  172.16.6.69 -> 192.168.19.4   TCP 60 [TCP Window Update] 55353 > ms-sql-s [ACK] Seq=12642 Ack=307137 Win=69376 Len=0
336 299.940190  172.16.6.69 -> 192.168.19.4   TCP 60 [TCP Window Update] 55353 > ms-sql-s [ACK] Seq=12642 Ack=307137 Win=131328 Len=0
337 318.839520   192.168.19.4 -> 172.16.6.69  TCP 54 ms-sql-s > 55353 [RST, ACK] Seq=292597 Ack=12642 Win=0 Len=0
338 329.939156  172.16.6.69 -> 192.168.19.4   TCP 60 [TCP Keep-Alive] [TCP Window Full] 55353 > ms-sql-s [ACK] Seq=12641 Ack=307137 Win=131328 Len=1
339 329.939300   192.168.19.4 -> 172.16.6.69  TCP 54 ms-sql-s > 55353 [RST] Seq=307137 Win=0 Len=0
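
Comparing the two captures shows which webserver ACKs were seen on lagg1 but do not appear in the lagg0 capture towards the database server. A small Python sketch, assuming the tshark one-line summary format shown above; only selected lines are embedded, and the helper name `client_acks` is hypothetical.

```python
import re

# Selected lines copied from the two captures above.
LAGG0 = """\
330 299.939233  172.16.6.69 -> 192.168.19.4   TCP 54 55353 > ms-sql-s [ACK] Seq=12642 Ack=283137 Win=45312 Len=0
331 299.939238   192.168.19.4 -> 172.16.6.69  TDS 1514 Unknown Packet Type: 0
332 299.939252  172.16.6.69 -> 192.168.19.4   TCP 54 55353 > ms-sql-s [ACK] Seq=12642 Ack=291137 Win=37376 Len=0
358 329.939143  172.16.6.69 -> 192.168.19.4   TCP 55 [TCP Keep-Alive] [TCP Window Full] 55353 > ms-sql-s [ACK] Seq=12641 Ack=307137 Win=131328 Len=1
"""

LAGG1 = """\
329 299.939251  172.16.6.69 -> 192.168.19.4   TCP 60 55353 > ms-sql-s [ACK] Seq=12642 Ack=283137 Win=45312 Len=0
331 299.939271  172.16.6.69 -> 192.168.19.4   TCP 60 55353 > ms-sql-s [ACK] Seq=12642 Ack=291137 Win=37376 Len=0
333 299.939321  172.16.6.69 -> 192.168.19.4   TCP 60 55353 > ms-sql-s [ACK] Seq=12642 Ack=299137 Win=29440 Len=0
334 299.939492  172.16.6.69 -> 192.168.19.4   TCP 60 55353 > ms-sql-s [ACK] Seq=12642 Ack=307137 Win=21504 Len=0
335 299.939636  172.16.6.69 -> 192.168.19.4   TCP 60 [TCP Window Update] 55353 > ms-sql-s [ACK] Seq=12642 Ack=307137 Win=69376 Len=0
336 299.940190  172.16.6.69 -> 192.168.19.4   TCP 60 [TCP Window Update] 55353 > ms-sql-s [ACK] Seq=12642 Ack=307137 Win=131328 Len=0
338 329.939156  172.16.6.69 -> 192.168.19.4   TCP 60 [TCP Keep-Alive] [TCP Window Full] 55353 > ms-sql-s [ACK] Seq=12641 Ack=307137 Win=131328 Len=1
"""

# Matches webserver (172.16.6.69) -> database server (192.168.19.4) segments only.
CLIENT_SEG = re.compile(
    r"172\.16\.6\.69 -> 192\.168\.19\.4\s.*?Seq=(\d+) Ack=(\d+) Win=(\d+)"
)


def client_acks(capture: str) -> set[tuple[int, ...]]:
    """Collect (Seq, Ack, Win) for every webserver -> database segment."""
    return {tuple(int(g) for g in m.groups()) for m in CLIENT_SEG.finditer(capture)}


# Present in the lagg1 capture but absent from the lagg0 capture.
missing = sorted(client_acks(LAGG1) - client_acks(LAGG0))

if __name__ == "__main__":
    for seq, ack, win in missing:
        print(f"not seen on lagg0: Seq={seq} Ack={ack} Win={win}")
```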

Title: [SOLVED] Packet loss
Post by: faunsen on May 24, 2017, 09:01:34 am
I solved it by increasing the undocumented igb(4) tunable hw.igb.buf_ring_size.
It seems that the HPE NC365T adapter cannot push packets out to the wire fast enough.
But that's anyone's guess.

If anyone else runs OPNsense on a ProLiant, here are my settings.
@franco: These could be the first settings for the network card tweak plugin.  :)

/boot/loader.conf.local
Code:
ipmi_load="YES"
net.link.ifqmaxlen="8192"
hw.igb.buf_ring_size="32768"
hw.igb.max_interrupt_rate="96000"
hw.igb.num_queues="1"
hw.igb.rx_process_limit="4096"
hw.igb.tx_process_limit="4096"
hw.igb.rxd="4096"
hw.igb.txd="4096"
net.pf.states_hashsize="16777216"
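
As far as I know, FreeBSD's buf_ring(9) allocator expects a power-of-two ring size, so 32768 (2^15) fits for hw.igb.buf_ring_size. A minimal Python sketch that parses a loader.conf.local like the one above, checks that value, and gives a rough estimate of the RX buffer memory the larger rxd setting pins. The port count (12 igb ports per node) and the one-2-KiB-cluster-per-descriptor figure are assumptions, and the helper names are hypothetical.

```python
import re

# The /boot/loader.conf.local shown above.
LOADER_CONF = """\
ipmi_load="YES"
net.link.ifqmaxlen="8192"
hw.igb.buf_ring_size="32768"
hw.igb.max_interrupt_rate="96000"
hw.igb.num_queues="1"
hw.igb.rx_process_limit="4096"
hw.igb.tx_process_limit="4096"
hw.igb.rxd="4096"
hw.igb.txd="4096"
net.pf.states_hashsize="16777216"
"""

# name="value" or name=value; comment and blank lines do not match.
SETTING = re.compile(r'^\s*([\w.]+)\s*=\s*"?([^"]*?)"?\s*$')

# ASSUMPTIONS for the memory estimate: 12 igb ports per node (three quad-port
# cards) and one 2 KiB mbuf cluster per RX descriptor at MTU 1500.
IGB_PORTS = 12
CLUSTER_BYTES = 2048


def parse_loader_conf(text: str) -> dict[str, str]:
    """Return a name -> value mapping for every setting line."""
    settings = {}
    for line in text.splitlines():
        match = SETTING.match(line)
        if match:
            settings[match.group(1)] = match.group(2)
    return settings


def is_power_of_two(n: int) -> bool:
    return n > 0 and (n & (n - 1)) == 0


conf = parse_loader_conf(LOADER_CONF)
ring_size = int(conf["hw.igb.buf_ring_size"])
rx_bytes = (int(conf["hw.igb.rxd"]) * int(conf["hw.igb.num_queues"])
            * IGB_PORTS * CLUSTER_BYTES)

if __name__ == "__main__":
    print("buf_ring_size is a power of two:", is_power_of_two(ring_size))
    print(f"approx. RX cluster memory: {rx_bytes / 2**20:.0f} MiB")
```

Under those assumptions, this works out to roughly 96 MiB, which is small next to 32GB of RAM.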

System -> Settings -> Tunables
Code:
kern.ipc.maxsockbuf 8388608
net.inet.tcp.sendbuf_max 16777216
net.inet.tcp.recvbuf_max 16777216
net.inet.tcp.sendspace 131072
net.inet.tcp.recvspace 131072
net.inet.tcp.sendbuf_inc 32768
net.inet.tcp.recvbuf_inc 65536
kern.ipc.soacceptqueue 1024
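
One thing worth double-checking in these tunables: if I read FreeBSD's uipc_sockbuf.c correctly, the kernel refuses to reserve a socket buffer larger than sb_max_adj, which it derives from kern.ipc.maxsockbuf as maxsockbuf × MCLBYTES / (MSIZE + MCLBYTES). With kern.ipc.maxsockbuf at 8388608, the 16 MiB sendbuf_max and recvbuf_max values would then never be reached. A small Python sketch of that arithmetic, assuming the usual MSIZE = 256 and MCLBYTES = 2048; the helper names are made up for illustration.

```python
# ASSUMED FreeBSD amd64 constants.
MSIZE = 256      # size of one mbuf in bytes
MCLBYTES = 2048  # size of one standard mbuf cluster in bytes

# Values from the Tunables list above.
tunables = {
    "kern.ipc.maxsockbuf": 8388608,
    "net.inet.tcp.sendbuf_max": 16777216,
    "net.inet.tcp.recvbuf_max": 16777216,
}


def sb_max_adj(maxsockbuf: int) -> int:
    """Largest socket buffer the kernel will reserve (C-style integer division)."""
    return maxsockbuf * MCLBYTES // (MSIZE + MCLBYTES)


def min_maxsockbuf(target: int) -> int:
    """Smallest kern.ipc.maxsockbuf whose sb_max_adj reaches `target` bytes."""
    return (target * (MSIZE + MCLBYTES) + MCLBYTES - 1) // MCLBYTES


limit = sb_max_adj(tunables["kern.ipc.maxsockbuf"])

if __name__ == "__main__":
    for name in ("net.inet.tcp.sendbuf_max", "net.inet.tcp.recvbuf_max"):
        print(f"{name}: configured {tunables[name]}, capped at {min(tunables[name], limit)}")
    print("maxsockbuf needed for 16 MiB:", min_maxsockbuf(16777216))
```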

Interfaces -> Settings
Code:
uncheck 'Disable hardware CRC, TSO and LRO'

Title: Re: [SOLVED] Packet loss
Post by: weust on May 24, 2017, 09:30:22 am
The HPE NC365T is an add-on card, IIRC? So this is probably not ProLiant-specific?
Title: Re: [SOLVED] Packet loss
Post by: faunsen on May 24, 2017, 10:25:14 am
Correct.
These settings are the result of a lot of trial and error until I got the best stability and performance.
Feel free to test them on other hardware ;)
Title: Re: [SOLVED] Packet loss
Post by: weust on May 24, 2017, 10:58:08 am
No doubt others will. Good find.

I have (or had) the NC364T, and I still have some dual-port cards at home.
I stopped using them when I moved to SFP+ (home use, because I can).