Issue with I340-T4 Intel 82580 Network controller

Started by abel408, March 21, 2017, 07:54:50 PM

Quote from: brononius on May 22, 2017, 09:40:13 AM
My understanding of a trunk: multiple VLANs through 1 cable/port...
No, it's a bundling of physical links: a link aggregation.
VLANs don't have trunks. They are virtual network segments that share the same physical segment.
The term "VLAN trunk" is used by Cisco and means that the port/trunk accepts tagged frames for different VLANs.

Quote from: brononius on May 22, 2017, 09:40:13 AM
And for proxmox:
root@proxmoxus:~# cat /etc/network/interfaces
auto lo
iface lo inet loopback

iface ens1f0 inet manual

iface ens1f1 inet manual

auto bond0
iface bond0 inet manual
slaves ens1f0
bond_miimon 100
bond_mode 802.3ad

A bond (aka trunk ;) ) with only one interface makes no sense.
Consider using either both interfaces in a bond or a single interface without bonding.
Have a look at https://pve.proxmox.com/wiki/Network_Model

In your case I'd configure 2 ports as layer 4 load balanced "Ether Channel" with a "VLAN trunk" on the Switch.
Then on the Proxmox host add ens1f1 to the bond slaves and change the bond_mode to balance-xor.
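As a rough sketch of that change, assuming the interface names and option style from your file: the bond stanza could look like the fragment below. The bond_xmit_hash_policy line is my assumption for matching a layer 4 balanced channel; it is not in your current config. Since balance-xor does not speak LACP, the switch ports would have to be a static channel group.

```
# Sketch only: both ports in one bond, static (non-LACP) channel on the switch.
# Assumption: layer3+4 hashing, to match a layer 4 load-balanced channel.
auto bond0
iface bond0 inet manual
        slaves ens1f0 ens1f1
        bond_miimon 100
        bond_mode balance-xor
        bond_xmit_hash_policy layer3+4
```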

And consider using virtio interfaces instead of E1000.

Quote from: faunsen on May 22, 2017, 10:27:06 AM
In your case I'd configure 2 ports as layer 4 load balanced "Ether Channel" with a "VLAN trunk" on the Switch.
Then on the Proxmox host add ens1f1 to the bond slaves and change the bond_mode to balance-xor.
And consider using virtio interfaces instead of E1000.

The idea was that in the future I would bond both server interfaces. For the moment, I don't have a spare port on my switch. I need to free up my old VMware server before I can do this.
Nevertheless, the idea of bonding is more to have an extra connection in case the first one fails. So it should also work with 1 interface, no?

But can this explain why OPNsense just shut down 1 VLAN?

Yes it should work with one interface. But I'd use the active-backup mode then.
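A minimal sketch of that variant, assuming the same file and option style as in your post (only the mode line differs from what you have now):

```
# Sketch only: single-member bond in active-backup mode.
# ens1f1 can be added to the slaves line later, once a switch port is free.
auto bond0
iface bond0 inet manual
        slaves ens1f0
        bond_miimon 100
        bond_mode active-backup
```

Unlike 802.3ad, active-backup needs no channel configuration on the switch side.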

I have no explanation why only vlan 1 goes down.
But I bet the problem comes from the high ambient temperature.
70 degrees Celsius is by far too much for a disk and other things like network interface cards  ;)

Quote from: faunsen on May 19, 2017, 05:19:49 PM
When the problem occurs again, before you reboot the OPNsense VM please do a
netstat -m
netstat -s
sysctl dev.em
etc.

I had the issue again today when I came home. It seems it had been down for a couple of hours.
And again, it was the same virtual interface. The others seem to stay stable. So if it's caused by the temperature, it's a hardware issue, and the other interfaces should also suffer, no?

Some debug output (I've removed lines with '0', since otherwise the reply was too long :$ ):
netstat -m
1028/6817/7845 mbufs in use (current/cache/total)
1027/3549/4576/379234 mbuf clusters in use (current/cache/total/max)
1027/3527 mbuf+clusters out of packet secondary zone in use (current/cache)
0/62/62/189617 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/56182 9k jumbo clusters in use (current/cache/total/max)
0/0/0/31602 16k jumbo clusters in use (current/cache/total/max)
2311K/9050K/11361K bytes allocated to network (current/cache/total)


netstat -s
tcp:
        22994 packets sent
                18533 data packets (25117361 bytes)
                25 data packets (33645 bytes) retransmitted
                1 data packet unnecessarily retransmitted
                4237 ack-only packets (0 delayed)
                1 window update packet
                198 control packets
        21234 packets received
                17209 acks (for 25106463 bytes)
                56 duplicate acks
                3947 packets (2856419 bytes) received in-sequence
                10 completely duplicate packets (1452 bytes)
                8 out-of-order packets (10903 bytes)
        21 connection requests
        236 connection accepts
        8 ignored RSTs in the windows
        257 connections established (including accepts)
        16451 connections closed (including 1 drop)
                238 connections updated cached RTT on close
                238 connections updated cached RTT variance on close
                4 connections updated cached ssthresh on close
        17122 segments updated rtt (of 7597 attempts)
        1 retransmit timeout
        3523 correct ACK header predictions
        3096 correct data packet header predictions
        236 syncache entries added
                236 completed
        236 cookies sent
        19 hostcache entries added
                0 bucket overflow
        1 SACK recovery episode
        24 segment rexmits in SACK recovery episodes
        33606 byte rexmits in SACK recovery episodes
        124 SACK options (SACK blocks) received
        6 SACK options (SACK blocks) sent
TCP connection count by state:
        6 connections in LISTEN state
        1 connection  in ESTABLISHED state
udp:
        1793582 datagrams received
        5953 with no checksum
        109794 dropped due to no socket
        358136 broadcast/multicast datagrams undelivered
        1325652 delivered
        1437215 datagrams output
ip:
        1654426635 total packets received
        9 with data size < data length
        1825071 packets for this host
        1651951898 packets forwarded (0 packets fast forwarded)
        57445 packets not forwardable
        1526352 packets sent from this host
        280 packets sent with fabricated ip header
icmp:
        56056 calls to icmp_error
        Output histogram:
                echo reply: 10073
                destination unreachable: 55855
                time exceeded: 201
        Input histogram:
                echo reply: 3
                destination unreachable: 178
                echo: 10073
        10073 message responses generated
        ICMP address mask responses are disabled
arp:
        111092 ARP requests sent
        63047 ARP replies sent
        111111 ARP requests received
        5343 ARP replies received
        117077 ARP packets received
        114881 total packets dropped due to no ARP entry
        21749 ARP entrys timed out
        0 Duplicate IPs seen
ip6:
        3936 total packets received
        37 packets sent from this host
        19 output packets discarded due to no route
        Input histogram:
                UDP: 68
                ICMP6: 3868
        Mbuf statistics:
                0 one mbuf
                3936 one ext mbuf
                0 two or more ext mbuf



sysctl dev.em

dev.em.3.mac_stats.tso_ctx_fail: 0
dev.em.3.mac_stats.tso_txd: 0
dev.em.3.mac_stats.tx_frames_1024_1522: 20922846
dev.em.3.mac_stats.tx_frames_512_1023: 132219
dev.em.3.mac_stats.tx_frames_256_511: 262934
dev.em.3.mac_stats.tx_frames_128_255: 408116
dev.em.3.mac_stats.tx_frames_65_127: 58215365
dev.em.3.mac_stats.tx_frames_64: 46
dev.em.3.mac_stats.mcast_pkts_txd: 4
dev.em.3.mac_stats.bcast_pkts_txd: 2515
dev.em.3.mac_stats.good_pkts_txd: 85927270
dev.em.3.mac_stats.total_pkts_txd: 85949019
dev.em.3.mac_stats.good_octets_txd: 36226734433
dev.em.3.mac_stats.good_octets_recvd: 247276921066
dev.em.3.mac_stats.rx_frames_1024_1522: 162410701
dev.em.3.mac_stats.rx_frames_512_1023: 300853
dev.em.3.mac_stats.rx_frames_256_511: 389427
dev.em.3.mac_stats.rx_frames_128_255: 469147
dev.em.3.mac_stats.rx_frames_65_127: 13040706
dev.em.3.mac_stats.rx_frames_64: 1807739
dev.em.3.mac_stats.mcast_pkts_recvd: 7515
dev.em.3.mac_stats.bcast_pkts_recvd: 754
dev.em.3.mac_stats.good_pkts_recvd: 178361149
dev.em.3.mac_stats.total_pkts_recvd: 178418573
dev.em.3.rxd_tail: 143
dev.em.3.rxd_head: 144
dev.em.3.txd_tail: 82
dev.em.3.txd_head: 82
dev.em.3.fifo_reset: 0
dev.em.3.fc_low_water: 45604
dev.em.3.fc_high_water: 47104
dev.em.3.rx_control: 32770
dev.em.3.device_control: 1075053120
dev.em.3.rx_processing_limit: 100
dev.em.3.itr: 488
dev.em.3.tx_abs_int_delay: 66
dev.em.3.rx_abs_int_delay: 66
dev.em.3.tx_int_delay: 66
dev.em.3.rx_int_delay: 0
dev.em.3.nvm: -1
dev.em.3.%parent: pci0
dev.em.3.%pnpinfo: vendor=0x8086 device=0x100e subvendor=0x1af4 subdevice=0x1100 class=0x020000
dev.em.3.%location: slot=21 function=0 dbsf=pci0:0:21:0 handle=\_SB_.PCI0.SA8_
dev.em.3.%driver: em
dev.em.3.%desc: Intel(R) PRO/1000 Legacy Network Connection 1.1.0
dev.em.2.mac_stats.tso_ctx_fail: 0
dev.em.2.mac_stats.tso_txd: 0
dev.em.2.mac_stats.tx_frames_1024_1522: 0
dev.em.2.mac_stats.tx_frames_512_1023: 0
dev.em.2.mac_stats.tx_frames_256_511: 0
dev.em.2.mac_stats.tx_frames_128_255: 2
dev.em.2.mac_stats.tx_frames_65_127: 4
dev.em.2.mac_stats.tx_frames_64: 0
dev.em.2.mac_stats.mcast_pkts_txd: 6
dev.em.2.mac_stats.bcast_pkts_txd: 1
dev.em.2.mac_stats.good_pkts_txd: 7
dev.em.2.mac_stats.total_pkts_txd: 7
dev.em.2.mac_stats.good_octets_txd: 686
dev.em.2.mac_stats.good_octets_recvd: 0
dev.em.2.mac_stats.rx_frames_1024_1522: 0
dev.em.2.mac_stats.rx_frames_512_1023: 0
dev.em.2.mac_stats.rx_frames_256_511: 0
dev.em.2.mac_stats.rx_frames_128_255: 0
dev.em.2.mac_stats.rx_frames_65_127: 0
dev.em.2.mac_stats.rx_frames_64: 0
dev.em.2.rxd_tail: 255
dev.em.2.txd_tail: 25
dev.em.2.txd_head: 25
dev.em.2.fc_low_water: 45604
dev.em.2.fc_high_water: 47104
dev.em.2.rx_control: 32770
dev.em.2.device_control: 1075053120
dev.em.2.flow_control: 3
dev.em.2.rx_processing_limit: 100
dev.em.2.itr: 488
dev.em.2.tx_abs_int_delay: 66
dev.em.2.rx_abs_int_delay: 66
dev.em.2.tx_int_delay: 66
dev.em.2.nvm: -1
dev.em.2.%parent: pci0
dev.em.2.%pnpinfo: vendor=0x8086 device=0x100e subvendor=0x1af4 subdevice=0x1100 class=0x020000
dev.em.2.%location: slot=20 function=0 dbsf=pci0:0:20:0 handle=\_SB_.PCI0.SA0_
dev.em.2.%driver: em
dev.em.2.%desc: Intel(R) PRO/1000 Legacy Network Connection 1.1.0
dev.em.1.mac_stats.tso_ctx_fail: 0
dev.em.1.mac_stats.tso_txd: 0
dev.em.1.mac_stats.tx_frames_1024_1522: 807177625
dev.em.1.mac_stats.tx_frames_512_1023: 61499407
dev.em.1.mac_stats.tx_frames_256_511: 10051401
dev.em.1.mac_stats.tx_frames_128_255: 10380894
dev.em.1.mac_stats.tx_frames_65_127: 168395840
dev.em.1.mac_stats.tx_frames_64: 124
dev.em.1.mac_stats.mcast_pkts_txd: 5
dev.em.1.mac_stats.bcast_pkts_txd: 30975
dev.em.1.mac_stats.good_pkts_txd: 1058410184
dev.em.1.mac_stats.total_pkts_txd: 1058578569
dev.em.1.mac_stats.good_octets_txd: 1274381690472
dev.em.1.mac_stats.good_octets_recvd: 210095791832
dev.em.1.mac_stats.rx_frames_1024_1522: 114080043
dev.em.1.mac_stats.rx_frames_512_1023: 2387087
dev.em.1.mac_stats.rx_frames_256_511: 3081951
dev.em.1.mac_stats.rx_frames_128_255: 21197508
dev.em.1.mac_stats.rx_frames_65_127: 433582216
dev.em.1.mac_stats.rx_frames_64: 5409878
dev.em.1.mac_stats.mcast_pkts_recvd: 3
dev.em.1.mac_stats.bcast_pkts_recvd: 5567
dev.em.1.mac_stats.good_pkts_recvd: 579625210
dev.em.1.mac_stats.total_pkts_recvd: 579738683
dev.em.1.mac_stats.recv_undersize: 823828
dev.em.1.rxd_tail: 59
dev.em.1.rxd_head: 59
dev.em.1.txd_tail: 185
dev.em.1.txd_head: 185
dev.em.1.fc_low_water: 45604
dev.em.1.fc_high_water: 47104
dev.em.1.rx_control: 32770
dev.em.1.device_control: 1075053120
dev.em.1.flow_control: 3
dev.em.1.rx_processing_limit: 100
dev.em.1.itr: 488
dev.em.1.tx_abs_int_delay: 66
dev.em.1.rx_abs_int_delay: 66
dev.em.1.tx_int_delay: 66
dev.em.1.rx_int_delay: 0
dev.em.1.nvm: -1
dev.em.1.%parent: pci0
dev.em.1.%pnpinfo: vendor=0x8086 device=0x100e subvendor=0x1af4 subdevice=0x1100 class=0x020000
dev.em.1.%location: slot=19 function=0 dbsf=pci0:0:19:0 handle=\_SB_.PCI0.S98_
dev.em.1.%driver: em
dev.em.1.%desc: Intel(R) PRO/1000 Legacy Network Connection 1.1.0
dev.em.0.mac_stats.tso_ctx_fail: 0
dev.em.0.mac_stats.tso_txd: 0
dev.em.0.mac_stats.tx_frames_1024_1522: 101369123
dev.em.0.mac_stats.tx_frames_512_1023: 2562105
dev.em.0.mac_stats.tx_frames_256_511: 3160012
dev.em.0.mac_stats.tx_frames_128_255: 21456364
dev.em.0.mac_stats.tx_frames_65_127: 380160410
dev.em.0.mac_stats.tx_frames_64: 57
dev.em.0.mac_stats.mcast_pkts_txd: 1
dev.em.0.mac_stats.bcast_pkts_txd: 77649
dev.em.0.mac_stats.good_pkts_txd: 508954398
dev.em.0.mac_stats.total_pkts_txd: 509045855
dev.em.0.mac_stats.good_octets_txd: 184144261120
dev.em.0.mac_stats.good_octets_recvd: 1044119307954
dev.em.0.mac_stats.rx_frames_1024_1522: 653001254
dev.em.0.mac_stats.rx_frames_512_1023: 61543647
dev.em.0.mac_stats.rx_frames_256_511: 10548611
dev.em.0.mac_stats.rx_frames_128_255: 10566114
dev.em.0.mac_stats.rx_frames_65_127: 160011699
dev.em.0.mac_stats.rx_frames_64: 699332
dev.em.0.mac_stats.mcast_pkts_recvd: 96
dev.em.0.mac_stats.bcast_pkts_recvd: 413446
dev.em.0.mac_stats.good_pkts_recvd: 896238495
dev.em.0.mac_stats.total_pkts_recvd: 896370657
dev.em.0.rxd_tail: 227
dev.em.0.rxd_head: 228
dev.em.0.txd_tail: 250
dev.em.0.txd_head: 250
dev.em.0.fc_low_water: 45604
dev.em.0.fc_high_water: 47104
dev.em.0.rx_control: 32770
dev.em.0.device_control: 1075053120
dev.em.0.flow_control: 3
dev.em.0.rx_processing_limit: 100
dev.em.0.itr: 488
dev.em.0.tx_abs_int_delay: 66
dev.em.0.rx_abs_int_delay: 66
dev.em.0.tx_int_delay: 66
dev.em.0.nvm: -1
dev.em.0.%parent: pci0
dev.em.0.%pnpinfo: vendor=0x8086 device=0x100e subvendor=0x1af4 subdevice=0x1100 class=0x020000
dev.em.0.%location: slot=18 function=0 dbsf=pci0:0:18:0 handle=\_SB_.PCI0.S90_
dev.em.0.%driver: em
dev.em.0.%desc: Intel(R) PRO/1000 Legacy Network Connection 1.1.0
dev.em.%parent:


ifconfig
em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=2088<VLAN_MTU,VLAN_HWCSUM,WOL_MAGIC>
        ether 96:7e:28:e9:06:1c
        inet6 fe80::947e:28ff:fee9:61c%em0 prefixlen 64 scopeid 0x1
        inet 192.168.111.254 netmask 0xffffff00 broadcast 192.168.111.255
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        media: Ethernet 1000baseT (1000baseT <full-duplex>)
        status: active
em1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=2088<VLAN_MTU,VLAN_HWCSUM,WOL_MAGIC>
        ether 1e:2a:a7:75:12:28
        inet6 fe80::1c2a:a7ff:fe75:1228%em1 prefixlen 64 scopeid 0x2
        inet 192.168.222.254 netmask 0xffffff00 broadcast 192.168.222.255
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
em2: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=2088<VLAN_MTU,VLAN_HWCSUM,WOL_MAGIC>
        ether fa:13:81:16:cb:08
        inet6 fe80::f813:81ff:fe16:cb08%em2 prefixlen 64 scopeid 0x3
        inet 192.168.1.254 netmask 0xffffff00 broadcast 192.168.1.255
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        media: Ethernet 1000baseT (1000baseT <full-duplex>)
        status: active
em3: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=2088<VLAN_MTU,VLAN_HWCSUM,WOL_MAGIC>
        ether 22:98:02:e4:30:b6
        inet6 fe80::2098:2ff:fee4:30b6%em3 prefixlen 64 scopeid 0x4
        inet 192.168.0.226 netmask 0xffffff00 broadcast 192.168.0.255
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        media: Ethernet 1000baseT (1000baseT <full-duplex>)
        status: active
enc0: flags=0<> metric 0 mtu 1536
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        groups: enc
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
        options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x6
        inet 127.0.0.1 netmask 0xff000000
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        groups: lo
pflog0: flags=100<PROMISC> metric 0 mtu 33160
        groups: pflog
pfsync0: flags=0<> metric 0 mtu 1500
        groups: pfsync
        syncpeer: 0.0.0.0 maxupd: 128 defer: off
ovpns1: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> metric 0 mtu 1500
        options=80000<LINKSTATE>
        inet6 fe80::4c76:46ba:6695:f5bc%ovpns1 prefixlen 64 scopeid 0x9
        inet 192.168.112.1 --> 192.168.112.2  netmask 0xffffffff
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        groups: tun openvpn
        Opened by PID 57520
em0_vlan666: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        ether 96:7e:28:e9:06:1c
        inet6 fe80::947e:28ff:fee9:61c%em0_vlan666 prefixlen 64 scopeid 0xa
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        media: Ethernet 1000baseT (1000baseT <full-duplex>)
        status: active
        vlan: 666 vlanpcp: 0 parent interface: em0
        groups: vlan
em1_vlan777: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        ether 1e:2a:a7:75:12:28
        inet6 fe80::1c2a:a7ff:fe75:1228%em1_vlan777 prefixlen 64 scopeid 0xb
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        vlan: 777 vlanpcp: 0 parent interface: em1
        groups: vlan
em2_vlan888: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        ether fa:13:81:16:cb:08
        inet6 fe80::f813:81ff:fe16:cb08%em2_vlan888 prefixlen 64 scopeid 0xc
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        media: Ethernet 1000baseT (1000baseT <full-duplex>)
        status: active
        vlan: 888 vlanpcp: 0 parent interface: em2
        groups: vlan
em3_vlan999: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        ether 22:98:02:e4:30:b6
        inet6 fe80::2098:2ff:fee4:30b6%em3_vlan999 prefixlen 64 scopeid 0xd
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        media: Ethernet 1000baseT (1000baseT <full-duplex>)
        status: active
        vlan: 999 vlanpcp: 0 parent interface: em3
        groups: vlan


To solve it, I just did a shut, no-shut of the failing virtual interface...

ping 192.168.222.22
PING 192.168.222.22 (192.168.222.22): 56 data bytes
ping: sendto: Host is down
ping: sendto: Host is down


ifconfig em1 down

ifconfig em1 up

ping 192.168.222.22
PING 192.168.222.22 (192.168.222.22): 56 data bytes
64 bytes from 192.168.222.22: icmp_seq=0 ttl=64 time=1.279 ms
64 bytes from 192.168.222.22: icmp_seq=1 ttl=64 time=0.883 ms


So I guess I could write a small script to check and re-enable the interface. But of course, it would be better to solve the issue instead of working around it. So if anybody has an idea where it goes wrong...
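A minimal sketch of what such a check could look like, in plain sh. The function name check_and_bounce, the three probes and the one-second pause are illustrative choices, not anything OPNsense provides; em1 and 192.168.222.22 are simply the interface and host from the ping test above.

```sh
#!/bin/sh
# Hypothetical watchdog sketch: reset an interface when a known peer stops answering.
# It only works around the symptom; it does not fix the underlying problem.

check_and_bounce() {
    iface="$1"    # interface to reset, e.g. em1
    target="$2"   # host that should always answer on that network

    # Send three probes; ping exits non-zero when no reply comes back.
    if ping -c 3 "$target" >/dev/null 2>&1; then
        return 0
    fi

    echo "$(date): $target unreachable, resetting $iface"
    ifconfig "$iface" down
    sleep 1
    ifconfig "$iface" up
    return 1
}

# Example call, e.g. from a root cron job every minute:
# check_and_bounce em1 192.168.222.22
```

The function returns 0 when the peer answers and 1 after a reset, so a cron wrapper could count or log the resets and show how often the interface really hangs.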



Hi brononius,

nothing suspicious to see here. Not even much traffic.
Have you disabled the 'Energy Efficient Ethernet' setting yet?
If not, add hw.em.eee_setting="0" to your /boot/loader.conf.local and reboot.


Regards,
Frank

Quote from: faunsen on May 30, 2017, 11:22:44 AM
Have you disabled the 'Energy Efficient Ethernet' setting yet?

I've tried that, but it gave me the same results.

For now, I've migrated Proxmox/OPNsense to other hardware, and it has been stable for a week.
So at first sight, the problem was somewhere in the link between the Cisco hardware blade and Proxmox/OPNsense.