WAN link gone sometimes (igb driver, I211 nics), ifconfig d/u fixes it

Started by Werner Fischer, July 17, 2017, 03:54:41 PM

Thank you for the hint. I have downloaded the tool (although I'm not sure whether it should be used with I211-AT chips, as the Intel download site does not list the I211-AT as a valid product for this download). In the doc file bootutil.txt I found this note regarding -WOLD:


POWER MANAGEMENT OPTIONS:
-WOLENABLE or -WOLE
  Enables Wake On LAN (WOL) functionality on the selected port.
-WOLDISABLE or -WOLD
  Disables Wake On LAN (WOL) functionality on the selected port.


The I211 data sheet - see https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/i211-ethernet-controller-datasheet.pdf?asset=9567 - lists 10 different power management features in Table 1-9.

I'm not sure how -WOLD actually affects those 10 power management features. So if you are experiencing link-down issues and are not sure how to fix them, ask your hardware vendor whether a firmware with power management deactivated is available.

Unfortunately, the problem has now occurred once again :(

With the new BIOS running, I changed the setting "System State after Power Failure" from "Always Off" to "Always On". I then saved and exited (using the F4 key) and booted OPNsense. After a while, I unplugged the power cable, so the system was off. I plugged the power in again and noticed during bootup that fsck was run. After the system had been running for a few minutes, the network problem was there again:


wfischer@tpw:~$ ssh root@192.168.1.1
Password for root@OPNsense.test.thomas-krenn.com:
Last login: Thu Dec 21 08:43:11 2017 from 192.168.1.100
----------------------------------------------
|      Hello, this is OPNsense 17.7          |         @@@@@@@@@@@@@@@
|                                            |        @@@@         @@@@
| Website: https://opnsense.org/        |         @@@\\\   ///@@@
| Handbook: https://docs.opnsense.org/   |       ))))))))   ((((((((
| Forums: https://forum.opnsense.org/  |         @@@///   \\\@@@
| Lists: https://lists.opnsense.org/  |        @@@@         @@@@
| Code: https://github.com/opnsense  |         @@@@@@@@@@@@@@@
----------------------------------------------

  0) Logout                              7) Ping host
  1) Assign interfaces                   8) Shell
  2) Set interface IP address            9) pfTop
  3) Reset the root password            10) Firewall log
  4) Reset to factory defaults          11) Reload all services
  5) Power off system                   12) Upgrade from console
  6) Reboot system                      13) Restore a backup

Enter an option: 8

root@OPNsense:~ # ifconfig
igb0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=4400b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWTSO,TXCSUM_IPV6>
ether 00:30:18:cd:e8:54
inet 192.168.1.1 netmask 0xffffff00 broadcast 192.168.1.255
inet6 fe80::1:1%igb0 prefixlen 64 scopeid 0x1
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
media: Ethernet autoselect (1000baseT <full-duplex>)
status: active
igb1: flags=8c02<BROADCAST,OACTIVE,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=6403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
ether 00:30:18:cd:e8:55
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
media: Ethernet autoselect
status: no carrier
igb2: flags=8c02<BROADCAST,OACTIVE,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=6403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
ether 00:30:18:cd:ef:80
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
media: Ethernet autoselect
status: no carrier
igb3: flags=8c02<BROADCAST,OACTIVE,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=6403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
ether 00:30:18:cd:ef:81
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
media: Ethernet autoselect
status: no carrier
igb4: flags=8c02<BROADCAST,OACTIVE,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=6403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
ether 00:30:18:cd:ef:82
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
media: Ethernet autoselect
status: no carrier
igb5: flags=8c02<BROADCAST,OACTIVE,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=6403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
ether 00:30:18:cd:ef:83
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
media: Ethernet autoselect
status: no carrier
igb6: flags=8c02<BROADCAST,OACTIVE,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=6403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
ether 00:30:18:cd:ec:60
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
media: Ethernet autoselect
status: no carrier
igb7: flags=8c02<BROADCAST,OACTIVE,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=6403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
ether 00:30:18:cd:ec:61
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
media: Ethernet autoselect
status: no carrier
igb8: flags=8c02<BROADCAST,OACTIVE,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=6403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
ether 00:30:18:cd:ec:62
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
media: Ethernet autoselect
status: no carrier
igb9: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=4400b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWTSO,TXCSUM_IPV6>
ether 00:30:18:cd:ec:63
inet6 fe80::230:18ff:fecd:ec63%igb9 prefixlen 64 scopeid 0xa
inet 10.1.102.55 netmask 0xffffff00 broadcast 10.1.102.255
nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>
media: Ethernet autoselect (1000baseT <full-duplex>)
status: active
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
inet6 ::1 prefixlen 128
inet6 fe80::1%lo0 prefixlen 64 scopeid 0xb
inet 127.0.0.1 netmask 0xff000000
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
groups: lo
enc0: flags=0<> metric 0 mtu 1536
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
groups: enc
pflog0: flags=100<PROMISC> metric 0 mtu 33160
groups: pflog
pfsync0: flags=0<> metric 0 mtu 1500
groups: pfsync
syncpeer: 0.0.0.0 maxupd: 128 defer: off
root@OPNsense:~ # arp -a
? (10.1.102.1) at 4c:5e:0c:4b:23:30 on igb9 expires in 1011 seconds [ethernet]
OPNsense.test.thomas-krenn.com (10.1.102.55) at 00:30:18:cd:ec:63 on igb9 permanent [ethernet]
OPNsense.test.thomas-krenn.com (192.168.1.1) at 00:30:18:cd:e8:54 on igb0 permanent [ethernet]
? (192.168.1.100) at f0:de:f1:f3:17:88 on igb0 expires in 1088 seconds [ethernet]
root@OPNsense:~ # ping 10.1.102.1
PING 10.1.102.1 (10.1.102.1): 56 data bytes
^C
--- 10.1.102.1 ping statistics ---
10 packets transmitted, 0 packets received, 100.0% packet loss
root@OPNsense:~ # arp -a
? (10.1.102.1) at 4c:5e:0c:4b:23:30 on igb9 expires in 957 seconds [ethernet]
OPNsense.test.thomas-krenn.com (10.1.102.55) at 00:30:18:cd:ec:63 on igb9 permanent [ethernet]
OPNsense.test.thomas-krenn.com (192.168.1.1) at 00:30:18:cd:e8:54 on igb0 permanent [ethernet]
? (192.168.1.100) at f0:de:f1:f3:17:88 on igb0 expires in 1034 seconds [ethernet]
root@OPNsense:~ # date
Thu Dec 21 13:19:23 UTC 2017
root@OPNsense:~ # freebsd-version -ku
11.0-RELEASE-p17
11.0-RELEASE-p17
root@OPNsense:~ # sysctl hw.igb.num_queues
hw.igb.num_queues: 0
root@OPNsense:~ # sysctl hw.pci.enable_msix
hw.pci.enable_msix: 1
root@OPNsense:~ # sysctl hw.igb.enable_msix
hw.igb.enable_msix: 1
root@OPNsense:~ # ping 10.1.102.1
PING 10.1.102.1 (10.1.102.1): 56 data bytes
^C
--- 10.1.102.1 ping statistics ---
3 packets transmitted, 0 packets received, 100.0% packet loss
root@OPNsense:~ # arp -a
? (10.1.102.1) at 4c:5e:0c:4b:23:30 on igb9 expires in 781 seconds [ethernet]
OPNsense.test.thomas-krenn.com (10.1.102.55) at 00:30:18:cd:ec:63 on igb9 permanent [ethernet]
OPNsense.test.thomas-krenn.com (192.168.1.1) at 00:30:18:cd:e8:54 on igb0 permanent [ethernet]
? (192.168.1.100) at f0:de:f1:f3:17:88 on igb0 expires in 858 seconds [ethernet]
root@OPNsense:~ # arp -a && date
? (10.1.102.1) at 4c:5e:0c:4b:23:30 on igb9 expires in 695 seconds [ethernet]
OPNsense.test.thomas-krenn.com (10.1.102.55) at 00:30:18:cd:ec:63 on igb9 permanent [ethernet]
OPNsense.test.thomas-krenn.com (192.168.1.1) at 00:30:18:cd:e8:54 on igb0 permanent [ethernet]
? (192.168.1.100) at f0:de:f1:f3:17:88 on igb0 expires in 772 seconds [ethernet]
Thu Dec 21 13:23:42 UTC 2017
root@OPNsense:~ # ping 10.1.102.1
PING 10.1.102.1 (10.1.102.1): 56 data bytes
^C
--- 10.1.102.1 ping statistics ---
7 packets transmitted, 0 packets received, 100.0% packet loss
root@OPNsense:~ # ifconfig igb9 down
root@OPNsense:~ # ifconfig igb9 up
root@OPNsense:~ # ping 10.1.102.1
PING 10.1.102.1 (10.1.102.1): 56 data bytes
ping: sendto: No route to host
ping: sendto: No route to host
64 bytes from 10.1.102.1: icmp_seq=2 ttl=64 time=0.316 ms
64 bytes from 10.1.102.1: icmp_seq=3 ttl=64 time=0.298 ms
64 bytes from 10.1.102.1: icmp_seq=4 ttl=64 time=0.377 ms
64 bytes from 10.1.102.1: icmp_seq=5 ttl=64 time=0.294 ms
^C
--- 10.1.102.1 ping statistics ---
6 packets transmitted, 4 packets received, 33.3% packet loss
round-trip min/avg/max/stddev = 0.294/0.321/0.377/0.033 ms
root@OPNsense:~ # arp -a && date
? (10.1.102.1) at 4c:5e:0c:4b:23:30 on igb9 expires in 1198 seconds [ethernet]
OPNsense.test.thomas-krenn.com (10.1.102.55) at 00:30:18:cd:ec:63 on igb9 permanent [ethernet]
OPNsense.test.thomas-krenn.com (192.168.1.1) at 00:30:18:cd:e8:54 on igb0 permanent [ethernet]
? (192.168.1.100) at f0:de:f1:f3:17:88 on igb0 expires in 1198 seconds [ethernet]
Thu Dec 21 13:24:22 UTC 2017
root@OPNsense:~ #
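
The manual fix from the session above (ping fails, `ifconfig igb9 down`, `ifconfig igb9 up`, ping works) could be automated until the root cause is found. Note that `ifconfig` still reported igb9 as "active" while the pings failed, so a link-state check alone would not catch this. The following is only a rough sketch, not a tested fix: the function names are made up for illustration, the interface and gateway are the ones from the session, and it assumes the script runs as root on the firewall.

```python
import subprocess


def link_alive(gateway, run=subprocess.run):
    """Send three pings; ping exits with 0 if at least one reply arrived."""
    result = run(
        ["ping", "-c", "3", gateway],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def bounce(iface, run=subprocess.run):
    """Same steps as the manual workaround: interface down, then up."""
    run(["ifconfig", iface, "down"], check=True)
    run(["ifconfig", iface, "up"], check=True)


def check_once(iface, gateway, run=subprocess.run):
    """Bounce iface if the gateway does not answer; return True if bounced."""
    if link_alive(gateway, run):
        return False
    bounce(iface, run)
    return True
```

Something like `check_once("igb9", "10.1.102.1")` could then be called from cron every minute or so. The `run` parameter exists only so the logic can be tried out without touching a real interface.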


Another user of this system (I think he is using pfSense 2.4) switched EEE off via the driver, and at the same time he has set hw.igb.num_queues=1. Up until now, he did not see any issues. I will try this, too. I'll keep you updated.


There are a few items of concern here.
First, the num_queues setting depends on the number of available cores divided by the number of ports. There should never be more ports than cores, or the queues will overrun, which could cause a reset of the port. The value of num_queues should be less than or equal to cores divided by ports. The OS calculates this automatically unless the setting overrides it. For example, with 4 cores and 3 ports, 4/3 = 1.33, so num_queues should be set to 1.
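
As a small illustration of that rule of thumb, here is a sketch of the arithmetic. The function name is made up, and clamping the result to at least 1 is my own assumption: a value of 0 appears to mean "autoconfigure" for hw.igb.num_queues (the session above shows 0 as the current value), so 0 would not pin the queue count.

```python
def igb_num_queues(cores, ports):
    """Rule of thumb from this post: cores / ports, rounded down.

    Clamped to a minimum of 1 (an assumption, see the note above).
    """
    if cores < 1 or ports < 1:
        raise ValueError("cores and ports must be positive")
    return max(1, cores // ports)


# Example from the post: 4 cores, 3 ports, 4/3 = 1.33, rounded down
print(igb_num_queues(4, 3))
```

On a box with more ports than cores, such as the ten igb ports shown earlier in the thread, the division rounds down to 0 and the sketch falls back to 1.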

Secondly, the EEE setting must be made in the tunables section, as it does not work in loader.conf.local. Also, all power management settings in the BIOS should be disabled.
You can use the command 'sysctl -A' in the shell to see the actual settings in use.
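
The full sysctl listing is very long, so it may help to filter it down to the relevant entries. A minimal sketch, assuming the usual "name: value" output format; the function name and the keyword list are only illustrative, and the sample lines are the values from the session earlier in this thread plus one unrelated entry.

```python
def pick_settings(sysctl_output, keywords=("igb", "msix", "eee")):
    """Return {name: value} for sysctl lines whose name contains a keyword."""
    settings = {}
    for line in sysctl_output.splitlines():
        name, sep, value = line.partition(": ")
        if sep and any(k in name.lower() for k in keywords):
            settings[name] = value.strip()
    return settings


# Sample input; on a live system the text would come from the sysctl command.
sample = """hw.igb.num_queues: 0
hw.pci.enable_msix: 1
hw.igb.enable_msix: 1
kern.ostype: FreeBSD
"""
print(pick_settings(sample))
```

Comparing this output before and after changing a tunable is a quick way to check whether the setting actually took effect.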


Quote from: wefinet on December 21, 2017, 02:31:43 PM
Another user of this system (I think he is using pfSense 2.4) switched EEE off via the driver, and at the same time he has set hw.igb.num_queues=1. Up until now, he did not see any issues. I will try this, too. I'll keep you updated.

Hi all,

That sounds good. That is why I also suggested that this should be the default or be adjustable via the GUI.
Happy New Year

cheers till


That Intel download link is for drivers, not firmware. In the FreeBSD environment we have no control over the drivers that are used. The firmware is included as part of the bootutil software. But it would be nice to have a hacked driver with no PM at all that could be compiled into the FreeBSD OS.


That link is for development and simulation tools. It is probably a good place to look if you were going to hack the drivers or firmware.
This is what you want:
https://downloadcenter.intel.com/download/19186

Thanks dcol! I have had many problems with the SFP+ cards over the last few months; with SFP, updating the firmware on the chip is a much easier process.