WAN link drops intermittently (igb driver, I211 NICs); ifconfig down/up fixes it

Started by Werner Fischer, July 17, 2017, 03:54:41 PM

Where do you set these values, under System: Settings: Tunables?

The settings may be applied too late if the boot takes a long time. Maybe they could also be set in /boot/loader.conf.local, but I'm not sure.


Cheers,
Franco

Hi Franco,

Thank you very much for the hint. Yes, I'm setting the variables as tunables, as described here: https://www.thomas-krenn.com/de/wiki/OPNsense_igb_EEE_Funktion_deaktivieren

Regarding configuration files, you mentioned that I could try setting it in /boot/loader.conf.local.
A blog post about network tuning on FreeBSD (https://calomel.org/freebsd_network_tuning.html) describes setting the Intel igb EEE option in /etc/sysctl.conf instead.

My questions:

  • Should I add the settings in both files to be on the safe side?
  • Is there anything else I should test right now (as the error is currently present) before I apply the settings and reboot the system?
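If the error is present right now, one option is to save the current state before applying the settings and rebooting. A minimal sketch, assuming a POSIX sh on the firewall; capture_state and the output directory name are hypothetical, and the commands passed to it are ones that come up elsewhere in this thread:

```shell
#!/bin/sh
# Hypothetical helper: run each given command and save its output
# (stdout and stderr) to its own file in OUTDIR, so the state can be
# attached to a post later. Failing commands do not abort the capture.
capture_state() {
    outdir=$1
    shift
    mkdir -p "$outdir" || return 1
    for cmd in "$@"; do
        # Derive a file name from the command: "sysctl -a" -> sysctl_-a.txt
        file=$(printf '%s' "$cmd" | tr ' /' '__')
        sh -c "$cmd" > "$outdir/$file.txt" 2>&1 || true
    done
}

# Example (run as root while the link is down):
#   capture_state "/root/igb-$(date +%Y%m%d-%H%M%S)" \
#       "ifconfig" "arp -a" "sysctl -a" "dmesg" "pciconf -lvbce"
```

Keeping one file per command makes it easy to diff the same command's output between a "broken" and a "working" capture.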

Thanks again very much for your valuable help.

On my test system (meanwhile running OPNsense 17.7), I have now added the setting to both /boot/loader.conf.local and /etc/sysctl.conf and rebooted the system afterwards:


root@OPNsense:~ # cat /boot/loader.conf.local
dev.igb.0.eee_disabled=1
dev.igb.1.eee_disabled=1
dev.igb.2.eee_disabled=1
dev.igb.3.eee_disabled=1
dev.igb.4.eee_disabled=1
dev.igb.5.eee_disabled=1
dev.igb.6.eee_disabled=1
dev.igb.7.eee_disabled=1
dev.igb.8.eee_disabled=1
dev.igb.9.eee_disabled=1
root@OPNsense:~ # cat /etc/sysctl.conf
# $FreeBSD$
#
#  This file is read when going to multi-user and its contents piped thru
#  ``sysctl'' to adjust kernel values.  ``man 5 sysctl.conf'' for details.
#

# Uncomment this to prevent users from seeing information about processes that
# are being run under another UID.
#security.bsd.see_other_uids=0
dev.igb.0.eee_disabled=1
dev.igb.1.eee_disabled=1
dev.igb.2.eee_disabled=1
dev.igb.3.eee_disabled=1
dev.igb.4.eee_disabled=1
dev.igb.5.eee_disabled=1
dev.igb.6.eee_disabled=1
dev.igb.7.eee_disabled=1
dev.igb.8.eee_disabled=1
dev.igb.9.eee_disabled=1
root@OPNsense:~ #
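Since the same ten lines go into both files (and the same pattern is used for other knobs later), they could be generated instead of typed. A minimal sketch; gen_igb_tunables is a hypothetical helper, and it only prints the lines. It says nothing about whether the loader or sysctl.conf actually applies them:

```shell
#!/bin/sh
# Hypothetical helper: print one "dev.igb.N.<knob>=<value>" line for each
# of COUNT igb(4) ports, numbered from 0, in the format used in both
# /boot/loader.conf.local and /etc/sysctl.conf above.
gen_igb_tunables() {
    knob=$1
    value=$2
    count=$3
    i=0
    while [ "$i" -lt "$count" ]; do
        printf 'dev.igb.%d.%s=%s\n' "$i" "$knob" "$value"
        i=$((i + 1))
    done
}

# Example: append the EEE setting for the 10 ports of this box to both files
#   gen_igb_tunables eee_disabled 1 10 >> /boot/loader.conf.local
#   gen_igb_tunables eee_disabled 1 10 >> /etc/sysctl.conf
```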


I'll continue to use this setup for the next 2 weeks. I'm powering down the firewall before I leave the office, as the problem only occurred 3-5 minutes after the firewall booted (at least in my tests).

By the way: with the current pfSense version I have not been able to reproduce this issue, although its FreeBSD 10.3 base uses the same igb driver version (2.5.3) as FreeBSD 11.0. Could there be other networking changes or tunables between FreeBSD 10.3 and 11.0 that lead to this issue?

I'll keep you updated once I have any news on the issue.

Best regards,
Werner

Unfortunately, the problem has now occurred again, although I added the settings to both /boot/loader.conf.local and /etc/sysctl.conf as described above.

As a next step to narrow down the root cause, I will continue testing with another system. The current system has 10 x Intel i211-AT Gigabit LAN ports; the second one (which I want to test now: http://www.jetwaycomputer.com/JBC385F551.html) has the following NICs:

  • 1 x Intel i219-LM PHY Gigabit LAN (iAMT 11)
  • 1 x Intel i211-AT PCI-E Gigabit LAN
  • 4 x Intel i350-AM4 PCI-E Gigabit LAN

Maybe the problem only affects the Intel i211-AT chip...

I will keep you updated. In case you have any news/ideas, just let me know.

Thanks & best regards,
Werner

As I suspect that the network issues might be related to energy-saving functions, I have now switched back to the JBC390F541AA-19-B system with its 10 Intel i211-AT based NICs.

I have changed the BIOS settings (BIOS version file BAR1NA02, BIOS date 02/25/2016) as follows:

  • [F3] (Load Optimized Defaults)
  • Advanced -> OS Selection -> Android (instead of the default "Windows 7")
  • Advanced -> ACPI Settings -> ACPI Sleep State -> Suspend Disabled (instead of the default "S3 (Suspend to RAM)")
  • Advanced -> CPU Configuration -> EIST -> Disabled (instead of the default "Enabled")
  • Advanced -> CPU Configuration -> Max CPU C State -> C1 (instead of the default "C7")
  • Chipset -> South Bridge -> Audio Controller -> Disabled (instead of the default "Enabled")
  • Chipset -> South Bridge -> Azalia HDMI Codec -> Disabled (instead of the default "Enabled")
  • Chipset -> South Bridge -> System State after Power Failure -> Former State (instead of the default "Always Off")

I have also set the following variables, as suggested in https://www.freebsd.org/cgi/man.cgi?query=pci&sektion=4 and https://calomel.org/freebsd_network_tuning.html:


root@OPNsense:~ # cat /boot/loader.conf.local
dev.igb.0.eee_disabled=1
dev.igb.1.eee_disabled=1
dev.igb.2.eee_disabled=1
dev.igb.3.eee_disabled=1
dev.igb.4.eee_disabled=1
dev.igb.5.eee_disabled=1
dev.igb.6.eee_disabled=1
dev.igb.7.eee_disabled=1
dev.igb.8.eee_disabled=1
dev.igb.9.eee_disabled=1

hw.pci.do_power_suspend=0

dev.igb.0.fc=0
dev.igb.1.fc=0
dev.igb.2.fc=0
dev.igb.3.fc=0
dev.igb.4.fc=0
dev.igb.5.fc=0
dev.igb.6.fc=0
dev.igb.7.fc=0
dev.igb.8.fc=0
dev.igb.9.fc=0
root@OPNsense:~ # cat /etc/sysctl.conf
# $FreeBSD$
#
#  This file is read when going to multi-user and its contents piped thru
#  ``sysctl'' to adjust kernel values.  ``man 5 sysctl.conf'' for details.
#

# Uncomment this to prevent users from seeing information about processes that
# are being run under another UID.
#security.bsd.see_other_uids=0
dev.igb.0.eee_disabled=1
dev.igb.1.eee_disabled=1
dev.igb.2.eee_disabled=1
dev.igb.3.eee_disabled=1
dev.igb.4.eee_disabled=1
dev.igb.5.eee_disabled=1
dev.igb.6.eee_disabled=1
dev.igb.7.eee_disabled=1
dev.igb.8.eee_disabled=1
dev.igb.9.eee_disabled=1

hw.pci.do_power_suspend=0

dev.igb.0.fc=0
dev.igb.1.fc=0
dev.igb.2.fc=0
dev.igb.3.fc=0
dev.igb.4.fc=0
dev.igb.5.fc=0
dev.igb.6.fc=0
dev.igb.7.fc=0
dev.igb.8.fc=0
dev.igb.9.fc=0
root@OPNsense:~ #


I have also set all of these variables as tunables in OPNsense, because some settings (e.g. "dev.igb.0.fc=0") did not take effect: sysctl -a did not show the desired value for them. Adding the variables as tunables in OPNsense fixed this.
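To spot which ports did not pick up a value, the sysctl(8) output can be filtered instead of read through by hand. A sketch, assuming the usual "name: value" lines that "sysctl dev.igb" prints; check_igb_knob is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: read "name: value" lines (as printed by sysctl(8))
# on stdin and print every dev.igb.N.<knob> entry whose value is not WANT.
check_igb_knob() {
    knob=$1
    want=$2
    awk -v knob="$knob" -v want="$want" '
        # $1 is e.g. "dev.igb.0.fc:", $2 is the current value
        $1 ~ ("^dev\\.igb\\.[0-9]+\\." knob ":$") && $2 != want { print $1, $2 }
    '
}

# Example (on the firewall): list ports where flow control is not off
#   sysctl dev.igb | check_igb_knob fc 0
# An empty result means every port reports the wanted value.
```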

I have added the output of "pciconf -lvbce" as an attachment (forum login needed to see it). I wanted to check the Active State Power Management (ASPM) setting: all devices show "ASPM disabled(L0s/L1)". I found a post (although 5 years old) where someone suggests keeping this feature disabled in the BIOS: "Just make sure you keep the Active State Power Management option in the Advanced Chipset Control BIOS screen at the Disabled setting (this is the default), because when I enabled this, my Intel NICs occasionally got stuck in a low power state, needing a full reset to resolve." (see https://forums.freebsd.org/threads/35529/#post-195907). The BIOS of the JBC390F541AA-19-B currently has no ASPM option, but as pciconf reports it as "disabled", I _think_ this should be OK.
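Rather than scanning the whole attachment, the ASPM state can be pulled out of the pciconf output. A sketch, assuming the layout seen here (one "name@pci..." header line per device, and "ASPM <state>(<supported>)" inside the capability lines); aspm_not_disabled is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: read `pciconf -lvbce` output on stdin and print the
# device name for every capability line whose ASPM state is not "disabled".
aspm_not_disabled() {
    awk '
        # Device header lines look like "igb0@pci0:1:0:0: class=..."
        $1 ~ /^[a-z]+[0-9]+@pci/ { dev = $1; sub(/@.*/, "", dev) }
        # Capability lines contain e.g. "ASPM disabled(L0s/L1)"
        /ASPM / && !/ASPM disabled/ {
            line = $0; sub(/^[ \t]+/, "", line)
            print dev ": " line
        }
    '
}

# Example (on the firewall):
#   pciconf -lvbce | aspm_not_disabled
# No output means every device reports "ASPM disabled(...)".
```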

I'll keep you updated on whether the NIC issues come back.

I am not convinced that the settings have helped (although I have not tested over a longer period, and I did not see the issue during my short tests).

Meanwhile, I got feedback from the board manufacturer regarding Active State Power Management (ASPM) for PCIe: there is no option for it in BIOS version BAR1NA02, but it is disabled by default (as "pciconf -lvbce" shows). As ASPM is not enabled, it cannot be causing my issue.

I now want to narrow down whether the problem has to do with FreeBSD 11.0. With pfSense 2.3 (FreeBSD 10.3) we have not observed the issue. Now that pfSense 2.4 RC is out (currently based on 11.0-RELEASE-p12), I'll check whether it runs into this problem as well (I _think/assume_ that the problem could arise there, too).

For my test I went back to the BIOS and set the following options:

  • [F3] (Load Optimized Defaults)
  • Advanced -> OS Selection -> Android (instead of the default "Windows 7")

I have kept the default igb driver settings (so dev.igb.N.eee_disabled is at its default value of "0" for all 10 NICs).

I will keep you updated once I have any new information.

Hi Werner,

Sorry to hear this is still happening, but thank you for keeping on top of it! :)


Cheers,
Franco

With pfSense 2.4 RC I also ran into issues after a few minutes of use. From my client laptop (connected directly to igb0, the firewall's LAN port), I could no longer reach the WAN network. It did not break completely: I could still reach the pfSense web interface, but an SSH session I had opened earlier broke:


[2.4.0-RC][admin@pfSense24.test.thomas-krenn.com]/root: ifconfig
igb0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=6400bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
ether 00:30:18:cd:e8:54
hwaddr 00:30:18:cd:e8:54
inet 192.168.1.1 netmask 0xffffff00 broadcast 192.168.1.255
inet6 fe80::1:1%igb0 prefixlen 64 scopeid 0x1
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
media: Ethernet autoselect (1000baseT <full-duplex>)
status: active
igb1: flags=8c02<BROADCAST,OACTIVE,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=6403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
ether 00:30:18:cd:e8:55
hwaddr 00:30:18:cd:e8:55
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
media: Ethernet autoselect
status: no carrier
igb2: flags=8c02<BROADCAST,OACTIVE,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=6403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
ether 00:30:18:cd:ef:80
hwaddr 00:30:18:cd:ef:80
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
media: Ethernet autoselect
status: no carrier
igb3: flags=8c02<BROADCAST,OACTIVE,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=6403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
ether 00:30:18:cd:ef:81
hwaddr 00:30:18:cd:ef:81
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
media: Ethernet autoselect
status: no carrier
igb4: flags=8c02<BROADCAST,OACTIVE,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=6403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
ether 00:30:18:cd:ef:82
hwaddr 00:30:18:cd:ef:82
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
media: Ethernet autoselect
status: no carrier
igb5: flags=8c02<BROADCAST,OACTIVE,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=6403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
ether 00:30:18:cd:ef:83
hwaddr 00:30:18:cd:ef:83
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
media: Ethernet autoselect
status: no carrier
igb6: flags=8c02<BROADCAST,OACTIVE,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=6403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
ether 00:30:18:cd:ec:60
hwaddr 00:30:18:cd:ec:60
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
media: Ethernet autoselect
status: no carrier
igb7: flags=8c02<BROADCAST,OACTIVE,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=6403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
ether 00:30:18:cd:ec:61
hwaddr 00:30:18:cd:ec:61
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
media: Ethernet autoselect
status: no carrier
igb8: flags=8c02<BROADCAST,OACTIVE,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=6403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
ether 00:30:18:cd:ec:62
hwaddr 00:30:18:cd:ec:62
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
media: Ethernet autoselect
status: no carrier
igb9: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=6400bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
ether 00:30:18:cd:ec:63
hwaddr 00:30:18:cd:ec:63
inet6 fe80::230:18ff:fecd:ec63%igb9 prefixlen 64 scopeid 0xa
inet 10.1.102.55 netmask 0xffffff00 broadcast 10.1.102.255
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
media: Ethernet autoselect (1000baseT <full-duplex>)
status: active
enc0: flags=0<> metric 0 mtu 1536
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
groups: enc
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
inet6 ::1 prefixlen 128
inet6 fe80::1%lo0 prefixlen 64 scopeid 0xc
inet 127.0.0.1 netmask 0xff000000
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
groups: lo
pfsync0: flags=0<> metric 0 mtu 1500
groups: pfsync
syncpeer: 224.0.0.240 maxupd: 128 defer: on
syncok: 1
pflog0: flags=100<PROMISC> metric 0 mtu 33160
groups: pflog
[2.4.0-RC][admin@pfSense24.test.thomas-krenn.com]/root: ps aux
USER      PID  %CPU %MEM    VSZ   RSS TT  STAT STARTED     TIME COMMAND
root       11 398.3  0.0      0    64  -  RL   10:33   78:02.96 [idle]
root        0   0.0  0.0      0   560  -  DLs  10:33    0:00.01 [kernel]
root        1   0.0  0.0   5004   840  -  ILs  10:33    0:00.01 /sbin/init --
root        2   0.0  0.0      0    16  -  DL   10:33    0:00.00 [crypto]
root        3   0.0  0.0      0    16  -  DL   10:33    0:00.00 [crypto returns]
root        4   0.0  0.0      0    32  -  DL   10:33    0:00.00 [cam]
root        5   0.0  0.0      0    16  -  DL   10:33    0:00.00 [sctp_iterator]
root        6   0.0  0.0      0    16  -  DL   10:33    0:00.26 [pf purge]
root        7   0.0  0.0      0    16  -  DL   10:33    0:00.37 [rand_harvestq]
root        8   0.0  0.0      0    16  -  DL   10:33    0:00.00 [soaiod1]
root        9   0.0  0.0      0    16  -  DL   10:33    0:00.00 [soaiod2]
root       10   0.0  0.0      0    16  -  DL   10:33    0:00.00 [audit]
root       12   0.0  0.0      0  1040  -  WL   10:33    0:05.85 [intr]
root       13   0.0  0.0      0    64  -  DL   10:33    0:00.00 [ng_queue]
root       14   0.0  0.0      0    48  -  DL   10:33    0:00.02 [geom]
root       15   0.0  0.0      0    96  -  DL   10:33    0:00.06 [usb]
root       16   0.0  0.0      0    16  -  DL   10:33    0:00.02 [acpi_thermal]
root       17   0.0  0.0      0    16  -  DL   10:33    0:00.00 [soaiod3]
root       18   0.0  0.0      0    16  -  DL   10:33    0:00.00 [soaiod4]
root       19   0.0  0.0      0    32  -  DL   10:33    0:00.02 [pagedaemon]
root       20   0.0  0.0      0    16  -  DL   10:33    0:00.00 [vmdaemon]
root       21   0.0  0.0      0    16  -  DL   10:33    0:00.00 [pagezero]
root       22   0.0  0.0      0    16  -  DL   10:33    0:00.01 [bufspacedaemon]
root       23   0.0  0.0      0    32  -  DL   10:33    0:00.03 [bufdaemon]
root       24   0.0  0.0      0    16  -  DL   10:33    0:00.01 [vnlru]
root       25   0.0  0.0      0    16  -  DL   10:33    0:00.05 [syncer]
root       56   0.0  0.0      0    16  -  DL   10:33    0:00.01 [md0]
root      294   0.0  0.3 269012 27140  -  Ss   10:33    0:00.03 php-fpm: master process (/usr/local/lib/php-fpm.conf) (php-fpm)
root      308   0.0  0.1  19404  4504  -  INs  10:33    0:00.01 /usr/local/sbin/check_reload_status
root      310   0.0  0.1  19404  4300  -  IN   10:33    0:00.00 check_reload_status: Monitoring daemon of check_reload_status
root      322   0.0  0.1   9508  4912  -  Ss   10:33    0:00.01 /sbin/devd -q -f /etc/pfSense-devd.conf
root     6410   0.0  0.0  10496  2304  -  Is   10:33    0:00.00 dhclient: igb9 [priv] (dhclient)
_dhcp   12169   0.0  0.0  10496  2404  -  Is   10:33    0:00.00 dhclient: igb9 (dhclient)
root    15133   0.0  0.0  12636  2344  -  Ss   10:33    0:00.08 /usr/local/sbin/filterlog -i pflog0 -p /var/run/filterlog.pid
root    24528   0.0  0.0  10948  2316  -  Is   10:33    0:00.26 /usr/local/bin/dpinger -S -r 0 -i WAN_DHCP -B 10.1.102.55 -p /v
unbound 25239   0.0  0.3  72792 24572  -  Ss   10:33    0:00.48 /usr/local/sbin/unbound -c /var/unbound/unbound.conf
root    30402   0.0  0.1  35588  6900  -  Is   10:33    0:00.00 nginx: master process /usr/local/sbin/nginx -c /var/etc/nginx-w
root    30627   0.0  0.1  37636  7652  -  I    10:33    0:00.02 nginx: worker process (nginx)
root    30669   0.0  0.1  35588  7492  -  I    10:33    0:00.00 nginx: worker process (nginx)
root    31269   0.0  0.0  12468  2360  -  Is   10:33    0:00.00 /usr/sbin/cron -s
root    31823   0.0  0.2  24564 12396  -  Ss   10:33    0:00.20 /usr/local/sbin/ntpd -g -c /var/etc/ntpd.conf -p /var/run/ntpd.
dhcpd   35871   0.0  0.2  22808 13404  -  Ss   10:33    0:00.07 /usr/local/sbin/dhcpd -user dhcpd -group _dhcp -chroot /var/dhc
root    36272   0.0  0.0  10332  2296  -  S    10:33    0:00.02 /usr/local/sbin/radvd -p /var/run/radvd.pid -C /var/etc/radvd.c
root    39167   0.0  0.4 269012 35036  -  I    10:47    0:00.03 php-fpm: pool nginx (php-fpm)
root    41695   0.0  0.1  53408  7524  -  Is   10:47    0:00.00 /usr/sbin/sshd
root    42294   0.0  0.1  78756  8056  -  Ss   10:47    0:00.06 sshd: admin@pts/0 (sshd)
root    55386   0.0  0.0  10448  2516  -  Ss   10:34    0:00.05 /usr/sbin/syslogd -s -c -c -l /var/dhcpd/var/run/log -P /var/ru
root    56705   0.0  0.0   8200  1984  -  Is   10:34    0:00.00 /usr/local/bin/minicron 240 /var/run/ping_hosts.pid /usr/local/
root    57021   0.0  0.0   8200  2000  -  I    10:34    0:00.00 minicron: helper /usr/local/bin/ping_hosts.sh  (minicron)
root    57155   0.0  0.0   8200  1984  -  Is   10:34    0:00.00 /usr/local/bin/minicron 3600 /var/run/expire_accounts.pid /usr/
root    57320   0.0  0.0   6148  1908  -  IN   10:52    0:00.00 sleep 60
root    57414   0.0  0.0   8200  2000  -  I    10:34    0:00.00 minicron: helper /usr/local/sbin/fcgicli -f /etc/rc.expireaccou
root    57694   0.0  0.0   8200  1984  -  Is   10:34    0:00.00 /usr/local/bin/minicron 86400 /var/run/update_alias_url_data.pi
root    57970   0.0  0.0   8200  2000  -  I    10:34    0:00.00 minicron: helper /usr/local/sbin/fcgicli -f /etc/rc.update_alia
root    89712   0.0  0.0  10552  2296  -  Is   10:34    0:00.00 /usr/local/sbin/sshlockout_pf 15
root    40597   0.0  0.0  13048  2544 v0- IN   10:34    0:00.23 /bin/sh /var/db/rrd/updaterrd.sh
root    88321   0.0  0.0  39404  2816 v0  Is   10:34    0:00.01 login [pam] (login)
root    89819   0.0  0.0  13048  2888 v0  I    10:34    0:00.01 -sh (sh)
root    89857   0.0  0.0  13048  2760 v0  I+   10:34    0:00.00 /bin/sh /etc/rc.initial
root    88339   0.0  0.0  10364  2120 v1  Is+  10:34    0:00.00 /usr/libexec/getty Pc ttyv1
root    88645   0.0  0.0  10364  2120 v2  Is+  10:34    0:00.00 /usr/libexec/getty Pc ttyv2
root    88769   0.0  0.0  10364  2120 v3  Is+  10:34    0:00.00 /usr/libexec/getty Pc ttyv3
root    88815   0.0  0.0  10364  2120 v4  Is+  10:34    0:00.00 /usr/libexec/getty Pc ttyv4
root    88999   0.0  0.0  10364  2120 v5  Is+  10:34    0:00.00 /usr/libexec/getty Pc ttyv5
root    89192   0.0  0.0  10364  2120 v6  Is+  10:34    0:00.00 /usr/libexec/getty Pc ttyv6
root    89470   0.0  0.0  10364  2120 v7  Is+  10:34    0:00.00 /usr/libexec/getty Pc ttyv7
root    42960   0.0  0.0  13048  2760  0  Is   10:47    0:00.01 /bin/sh /etc/rc.initial
root    57901   0.0  0.0  21056  2684  0  R+   10:53    0:00.00 ps aux
root    60110   0.0  0.0  13336  3780  0  S    10:47    0:00.04 /bin/tcsh
[2.4.0-RC][admin@pfSense24.test.thomas-krenn.com]/root: packet_write_wait: Connection to 192.168.1.1 port 22: Broken pipe
wfischer@tpw:/home/wfischer-isos/pfsense$


Running "ifconfig igb0 down" and "ifconfig igb0 up", and re-applying the DHCP client configuration on my laptop (Network Manager on Ubuntu 16.04), fixed it, but I'm not sure for how long.

As these symptoms are not exactly the same as the ones I had with OPNsense, these issues might be specific to pfSense 2.4 RC. But I think that this particular system (JBC390F541AA-19-B) has some networking issues with FreeBSD 11.0 that did not exist with FreeBSD 10.3.

I'll keep you updated once I have any news.

I also have a preliminary kernel for 11.1 (no HardenedBSD additions, no shared forwarding) if you want to try it. Maybe it contains some fixes we are simply missing?

That would be nice, and definitely worth a try!
Can you send me some details on how to get this kernel and apply it to OPNsense?


Thanks a lot for the details in the PM. I'll try this as soon as 17.7.1 is out and will keep you updated once I have any new findings.

I still have pfSense 2.4 RC on the system, and this morning I got exactly the same problem as with OPNsense 17.1/17.7: after a short time of operation (about 1-3 minutes after boot), I ran into the problem. EEE was _not_ disabled, but since I have also seen the issue on OPNsense with EEE disabled, I won't run any tests with EEE disabled on pfSense 2.4 RC for now.

Attached you will find a full set of logs and command output (in case it helps us analyze the root cause of the issue).

One thing I want to show you right here: as you can see, an "ifconfig igb9 down" followed by an "ifconfig igb9 up" fixes the issue:

[2.4.0-RC][admin@pfSense24.test.thomas-krenn.com]/root: ping 10.1.102.1
PING 10.1.102.1 (10.1.102.1): 56 data bytes
ping: sendto: Host is down
ping: sendto: Host is down
ping: sendto: Host is down
ping: sendto: Host is down
^C
--- 10.1.102.1 ping statistics ---
4 packets transmitted, 0 packets received, 100.0% packet loss
[2.4.0-RC][admin@pfSense24.test.thomas-krenn.com]/root: arp -a
? (10.1.102.1) at (incomplete) on igb9 expired [ethernet]
? (10.1.102.55) at 00:30:18:cd:ec:63 on igb9 permanent [ethernet]
pfSense24.test.thomas-krenn.com (192.168.1.1) at 00:30:18:cd:e8:54 on igb0 permanent [ethernet]
? (192.168.1.100) at f0:de:f1:f3:17:88 on igb0 expires in 480 seconds [ethernet]
[2.4.0-RC][admin@pfSense24.test.thomas-krenn.com]/root: date
Thu Aug 31 09:55:44 CEST 2017
[2.4.0-RC][admin@pfSense24.test.thomas-krenn.com]/root: ifconfig igb9 down
[2.4.0-RC][admin@pfSense24.test.thomas-krenn.com]/root: date
Thu Aug 31 09:56:26 CEST 2017
[2.4.0-RC][admin@pfSense24.test.thomas-krenn.com]/root: ifconfig igb9 up
[2.4.0-RC][admin@pfSense24.test.thomas-krenn.com]/root: date
Thu Aug 31 09:56:34 CEST 2017
[2.4.0-RC][admin@pfSense24.test.thomas-krenn.com]/root: date
Thu Aug 31 09:56:41 CEST 2017
[2.4.0-RC][admin@pfSense24.test.thomas-krenn.com]/root: arp -a
? (10.1.102.1) at 4c:5e:0c:4b:23:30 on igb9 expires in 1196 seconds [ethernet]
? (10.1.102.55) at 00:30:18:cd:ec:63 on igb9 permanent [ethernet]
pfSense24.test.thomas-krenn.com (192.168.1.1) at 00:30:18:cd:e8:54 on igb0 permanent [ethernet]
? (192.168.1.100) at f0:de:f1:f3:17:88 on igb0 expires in 1199 seconds [ethernet]
[2.4.0-RC][admin@pfSense24.test.thomas-krenn.com]/root: date
Thu Aug 31 09:56:47 CEST 2017
[2.4.0-RC][admin@pfSense24.test.thomas-krenn.com]/root: ping 10.1.102.1
PING 10.1.102.1 (10.1.102.1): 56 data bytes
64 bytes from 10.1.102.1: icmp_seq=0 ttl=64 time=0.340 ms
64 bytes from 10.1.102.1: icmp_seq=1 ttl=64 time=0.246 ms
64 bytes from 10.1.102.1: icmp_seq=2 ttl=64 time=0.238 ms
64 bytes from 10.1.102.1: icmp_seq=3 ttl=64 time=0.225 ms
^C
--- 10.1.102.1 ping statistics ---
4 packets transmitted, 4 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.225/0.262/0.340/0.046 ms
[2.4.0-RC][admin@pfSense24.test.thomas-krenn.com]/root:
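Until the root cause is found, the manual workaround above could be automated as a stop-gap. This is only a sketch, not a fix: it assumes FreeBSD's ping(8) (where -t is an overall timeout in seconds) and that it is run periodically, e.g. from cron; bounce_if_gateway_dead and the PING/IFCONFIG override variables are hypothetical, and igb9/10.1.102.1 are the WAN interface and gateway from the output above:

```shell
#!/bin/sh
# Hypothetical watchdog: if the gateway stops answering, run the same
# "ifconfig IFACE down" / "ifconfig IFACE up" as in the manual workaround.
# PING and IFCONFIG can be overridden (e.g. for a dry run).
bounce_if_gateway_dead() {
    iface=$1
    gw=$2
    # Default assumes FreeBSD ping: -c = packet count, -t = timeout in seconds
    if ${PING:-ping -c 3 -t 5} "$gw" >/dev/null 2>&1; then
        return 0
    fi
    echo "$(date): $gw unreachable, bouncing $iface"
    ${IFCONFIG:-ifconfig} "$iface" down
    sleep 2
    ${IFCONFIG:-ifconfig} "$iface" up
    return 1
}

# Example (values from this thread):
#   bounce_if_gateway_dead igb9 10.1.102.1
```

Setting PING=false IFCONFIG=echo gives a dry run that prints the interface commands instead of executing them. The log lines also record how often the link actually drops, which may be useful data in itself.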


Before I continue testing with the upcoming 17.7.1 and the preliminary kernel, I think I'll test with pfSense 2.3 (which is based on FreeBSD 10.3) and check whether it really runs rock-solid (to get more evidence that FreeBSD 11.0 causes the issue while FreeBSD 10.3 does not).

If pfSense 2.3 indeed runs solidly as expected, I'll collect all the output for analysis (especially "sysctl -a") and compare it to the output I have attached to this post. Maybe we will find differing settings that could be the reason for this problem.
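One way to do that comparison, assuming each "sysctl -a" output has been saved to a file first (the file names below are placeholders, and diff_sysctl_dumps is a hypothetical name). Whole lines are compared, so a changed value shows up once with "-" and once with "+":

```shell
#!/bin/sh
# Hypothetical helper: compare two saved "sysctl -a" dumps line by line.
# Lines only in the first dump are prefixed with "- ",
# lines only in the second dump with "+ ".
diff_sysctl_dumps() {
    old_sorted=$(mktemp) || return 1
    new_sorted=$(mktemp) || { rm -f "$old_sorted"; return 1; }
    LC_ALL=C sort "$1" > "$old_sorted"
    LC_ALL=C sort "$2" > "$new_sorted"
    LC_ALL=C comm -23 "$old_sorted" "$new_sorted" | sed 's/^/- /'
    LC_ALL=C comm -13 "$old_sorted" "$new_sorted" | sed 's/^/+ /'
    rm -f "$old_sorted" "$new_sorted"
}

# Example: save "sysctl -a" on each system, then look only at NIC- and
# PCI-related entries (counters and uptimes will always differ)
#   sysctl -a > /root/sysctl-pfsense-2.3.4.txt
#   diff_sysctl_dumps sysctl-pfsense-2.3.4.txt sysctl-pfsense-2.4-rc.txt | grep -E 'igb|hw\.pci'
```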

I will keep you updated  ;)

Since my last post on Aug 31st, I've been running pfSense 2.3.4 on the system without any issues. So I'm fairly sure that my problem has to do with FreeBSD 11.0 vs. FreeBSD 10.3. I have attached a ZIP file with the pfSense 2.3.4 logs.

I searched for differences and found the following:

  • On pfSense 2.3.4, sysctl -a shows the item "hw.igb.buf_ring_size: 4096". This item is missing on pfSense 2.4-RC. Could this be causing the issue? (UPDATE: "hw.igb.buf_ring_size: 4096" is present on OPNsense 17.7.2, so I don't think this is the root cause of my problem.)
  • On pfSense 2.4-RC, dmesg shows PCI entries with "[GIANT-LOCKED]". pfSense 2.3.4 does not list such entries (see the dmesg excerpts below). Could this be causing the issue? (UPDATE: OPNsense 17.7.2 also shows "[GIANT-LOCKED]".)

Here are the relevant dmesg excerpts:

pfSense 2.3.4 dmesg:
  ...
  pcib19: <PCI-PCI bridge> irq 19 at device 4.0 on pci15
  pci19: <PCI bus> on pcib19
  igb9: <Intel(R) PRO/1000 Network Connection, Version - 2.5.3-k> port 0x5000-0x501f mem 0xd0600000-0xd061ffff,0xd0620000-0xd0623fff irq 19 at device 0.0 on pci19
  ...

pfSense 2.4.0-RC dmesg:
  ...
  pcib19: <PCI-PCI bridge> irq 19 at device 4.0 on pci12
  pcib19: [GIANT-LOCKED]
  pci16: <PCI bus> on pcib19
  igb9: <Intel(R) PRO/1000 Network Connection, Version - 2.5.3-k> port 0x5000-0x501f mem 0xd0600000-0xd061ffff,0xd0620000-0xd0623fff irq 19 at device 0.0 on pci16
  ...


Tomorrow I will switch back to OPNsense and install the FreeBSD 11.1 kernel. I'll keep you updated.

I have now installed a FreeBSD 11.1 kernel (Franco sent me instructions for that via PM):


root@OPNsense:~ # freebsd-version -k
11.1-RELEASE-p1
root@OPNsense:~ # freebsd-version -u
11.0-RELEASE-p12
root@OPNsense:~ #


With FreeBSD 11.1, the PCI bridges in front of the NICs also show "[GIANT-LOCKED]" in the dmesg output (as with FreeBSD 11.0, but not with pfSense 2.3/FreeBSD 10.3, which does not have the issue):

...
pcib18: <PCI-PCI bridge> irq 18 at device 3.0 on pci12
pcib18: [GIANT-LOCKED]
pci15: <PCI bus> on pcib18
igb8: <Intel(R) PRO/1000 Network Connection, Version - 2.5.3-k> port 0x6000-0x601f mem 0xd0700000-0xd071ffff,0xd0720000-0xd0723fff irq 18 at device 0.0 on pci15
igb8: Using MSIX interrupts with 3 vectors
igb8: Ethernet address: 00:30:18:cd:ec:62
igb8: Bound queue 0 to cpu 0
igb8: Bound queue 1 to cpu 1
igb8: netmap queues/slots: TX 2/1024, RX 2/1024
pcib19: <PCI-PCI bridge> irq 19 at device 4.0 on pci12
pcib19: [GIANT-LOCKED]
pci16: <PCI bus> on pcib19
igb9: <Intel(R) PRO/1000 Network Connection, Version - 2.5.3-k> port 0x5000-0x501f mem 0xd0600000-0xd061ffff,0xd0620000-0xd0623fff irq 19 at device 0.0 on pci16
igb9: Using MSIX interrupts with 3 vectors
igb9: Ethernet address: 00:30:18:cd:ec:63
igb9: Bound queue 0 to cpu 2
igb9: Bound queue 1 to cpu 3
igb9: netmap queues/slots: TX 2/1024, RX 2/1024
...
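In these excerpts "[GIANT-LOCKED]" is printed for the PCI bridges (pcib18, pcib19), not for the igb devices themselves. To list which igb ports sit behind such a bridge, the bridge -> bus -> NIC chain in dmesg can be followed. A sketch, assuming the line formats shown above; giant_locked_igb is a hypothetical name, and whether this flag has anything to do with the link problem is not established here:

```shell
#!/bin/sh
# Hypothetical helper: read dmesg output on stdin and print each igb port
# whose parent PCI bridge was reported as "[GIANT-LOCKED]".
# Chain followed: pcibN flagged -> "pciM: <PCI bus> on pcibN" -> "igbK: ... on pciM"
giant_locked_igb() {
    awk '
        # e.g. "pcib19: [GIANT-LOCKED]"
        $1 ~ /^pcib[0-9]+:$/ && $2 == "[GIANT-LOCKED]" {
            b = $1; sub(/:$/, "", b); locked[b] = 1
        }
        # e.g. "pci16: <PCI bus> on pcib19"
        $1 ~ /^pci[0-9]+:$/ && $2 == "<PCI" && $(NF - 1) == "on" {
            bus = $1; sub(/:$/, "", bus); parent[bus] = $NF
        }
        # e.g. "igb9: <Intel(R) PRO/1000 ...> ... at device 0.0 on pci16"
        $1 ~ /^igb[0-9]+:$/ && $(NF - 1) == "on" {
            nic = $1; sub(/:$/, "", nic)
            if (($NF in parent) && (parent[$NF] in locked))
                print nic " behind " parent[$NF]
        }
    '
}

# Example (on the firewall):
#   dmesg | giant_locked_igb
```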


Anyway, I will stay with this setup for the next few days and watch whether the problem shows up again. I'll keep you updated.