Messages - gwww

#1
An update on this topic...

I just updated from 20.1.x to 22.1 (beta, latest as of ~Dec 28). I really could not stay on 20.1 any longer as security updates are critical (for any network!).

Problem still exists. It has happened twice in about 40 hours.

I'm running a Qotom with four i211 Ethernet ports.

More details in the thread I started here: https://forum.opnsense.org/index.php?topic=20456.0

Example log message:
    2022-01-01T00:33:28-05:00 Error opnsense-devel   /usr/local/etc/rc.linkup: DEVD: Ethernet detached event for static lan(igb1)

It seemed to be related to EEE in 20.1, and disabling EEE kept the network very stable (hundreds of days without interface flapping).

I'm investigating the EEE settings in 22.1. What else can be done to diagnose this?
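
For anyone wanting to quantify how often this happens, here is a minimal Python sketch that counts detach events per interface. It assumes log lines shaped exactly like the single rc.linkup message quoted above; the regular expression and the function names are my own illustration, not part of OPNsense, and other versions may format the message differently.

```python
import re
from collections import Counter
from datetime import datetime

# Pattern modeled on the one rc.linkup line quoted in this post (assumption:
# other log lines follow the same layout).
DETACH_RE = re.compile(
    r"^(?P<ts>\S+)\s.*DEVD: Ethernet detached event for \S+ "
    r"(?P<name>\w+)\((?P<dev>\w+)\)"
)


def parse_detach_events(lines):
    """Return a list of (timestamp, device) tuples, one per detach event."""
    events = []
    for line in lines:
        match = DETACH_RE.search(line.strip())
        if match:
            events.append((datetime.fromisoformat(match["ts"]), match["dev"]))
    return events


def count_by_device(events):
    """Count detach events per device name (e.g. igb1)."""
    return Counter(dev for _, dev in events)


sample = [
    "2022-01-01T00:33:28-05:00 Error opnsense-devel   "
    "/usr/local/etc/rc.linkup: DEVD: Ethernet detached event for static lan(igb1)",
]
events = parse_detach_events(sample)
print(count_by_device(events))
```

Feeding it a full exported log instead of `sample` should give a per-interface tally that can be compared before and after changing the EEE setting.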
#2
Quote from: franco on February 02, 2021, 08:47:58 AM
So 20.1 was working? That would point to a FreeBSD 12 issue, because 20.1 was FreeBSD 11 and 20.7 and 21.1 are FreeBSD 12.

20.1.9 is what I'm running and it is stable, with eee_disable set. Without eee_disable set, the interface was very unstable. Current uptime of 71 days without interface flapping.

20.7 was very unstable, to the point that the interface would go down ~10 times a day.
#3
After trying everything I could think of, 20.7.5 still has many link DOWN events. My best guess is that EEE disable does not work with the new iflib that is in FreeBSD 12.

I have downgraded to 20.1.9, and with EEE disabled the router has been stable for close to 10 days. As soon as I re-enable EEE, link DOWN events start occurring again.

I see in the 21.1 release plan that igb/em driver stability is listed. Is there a list of what those stability fixes are?

Thanks!
#4
Thanks @dinguz. I have considered it but have not done that - yet. I'll give it a try.
#5
So, I have just removed the [RESOLVED] from the title. The Ethernet UP/DOWN is continuing. Snippet of my log...
2020-12-16T23:52:00 kernel igb1: link state changed to UP
2020-12-16T23:51:56 kernel igb1: link state changed to DOWN
2020-12-16T23:49:55 kernel igb1: link state changed to UP
2020-12-16T23:49:52 kernel igb1: link state changed to DOWN
2020-12-16T23:35:25 kernel igb1: link state changed to UP
2020-12-16T23:35:22 kernel igb1: link state changed to DOWN


There is nothing in the log that suggests why the DOWN is occurring. Literally, the previous entries in the log are events from hours before (such as a login or something).

The interfaces were much more stable in 20.1; since the move to 20.7, the link state changes have increased significantly. The series above is in a span of a couple of minutes. However, some days I don't see any link state changes.

Does anyone have any ideas of what to try? Is there more information that can be turned on (I see a debug setting in sysctl for igb interfaces)?
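
One way to get more out of the existing log is to pair each DOWN with the following UP and measure how long the link was gone. Below is a minimal Python sketch, assuming kernel log lines in exactly the format of the snippet above; the function names are hypothetical and the sample lines are copied from that snippet.

```python
import re
from datetime import datetime

# Matches lines such as:
#   2020-12-16T23:52:00 kernel igb1: link state changed to UP
LINK_RE = re.compile(r"^(\S+) kernel (\w+): link state changed to (UP|DOWN)$")


def parse_link_events(lines):
    """Return (timestamp, interface, state) tuples for matching lines."""
    events = []
    for line in lines:
        match = LINK_RE.match(line.strip())
        if match:
            ts, iface, state = match.groups()
            events.append((datetime.fromisoformat(ts), iface, state))
    return events


def outages(events):
    """Pair each DOWN with the next UP on the same interface.

    Returns (interface, down_time, seconds_down) tuples in time order.
    """
    down_since = {}
    result = []
    for ts, iface, state in sorted(events):  # the log is newest-first
        if state == "DOWN":
            down_since[iface] = ts
        elif iface in down_since:
            start = down_since.pop(iface)
            result.append((iface, start, (ts - start).total_seconds()))
    return result


sample = [
    "2020-12-16T23:52:00 kernel igb1: link state changed to UP",
    "2020-12-16T23:51:56 kernel igb1: link state changed to DOWN",
    "2020-12-16T23:49:55 kernel igb1: link state changed to UP",
    "2020-12-16T23:49:52 kernel igb1: link state changed to DOWN",
    "2020-12-16T23:35:25 kernel igb1: link state changed to UP",
    "2020-12-16T23:35:22 kernel igb1: link state changed to DOWN",
]
for iface, start, seconds in outages(parse_link_events(sample)):
    print(iface, start.isoformat(), seconds)
```

Short, similar-length outages like these look more like link renegotiation than a cable being pulled, but that is only a guess; the script just makes the pattern easier to see.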
#6
20.7 Legacy Series / Re: [SOLVED] eee_disabled
December 16, 2020, 08:33:32 PM
Just in case others are interested in disabling Energy Efficient Ethernet (EEE), this is what I've learned by reading the source code for the E1000 driver (it's the driver that supports "em" and "igb"):

In tunables in settings, creating a tunable "hw.em.eee_setting" and setting its value to 1 causes dev.igb.<x>.eee_control to be set to 1 for all interfaces, where <x> is the interface number.

A non-zero value for dev.igb.<x>.eee_control causes the EEE feature to be disabled for the interface. The precise lines of code, in case anyone is interested in this level of depth, are https://github.com/freebsd/freebsd/blob/c33e89621813cc89d67acb01368d9760901bfab7/sys/dev/e1000/if_em.c#L4529 and https://github.com/freebsd/freebsd/blob/c33e89621813cc89d67acb01368d9760901bfab7/sys/dev/e1000/e1000_ich8lan.c#L941.

In my first post in this thread my edit says set hw.em.eee_setting to 0. That is incorrect based on reading the code. In my case with the eee_setting set to 0 I still have multiple Ethernet DOWN/UP events every day. With a setting of 1 the Ethernet is much more stable.
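
To check what each interface ended up with, the `sysctl` output can be interpreted with a small Python sketch like the one below. It relies on my reading of the driver source described above (non-zero eee_control means EEE is disabled), which I have not verified against other driver versions. The function name is made up for illustration, and the sample lines are copied from the sysctl output quoted elsewhere in this thread.

```python
import re

# Matches lines such as "dev.igb.1.eee_control: 0"
EEE_RE = re.compile(r"^dev\.igb\.(\d+)\.eee_control:\s*(\d+)$")


def eee_disabled_by_interface(sysctl_lines):
    """Map interface name to True when eee_control is non-zero.

    Assumption (from reading the driver source): a non-zero eee_control
    means EEE is disabled on that interface.
    """
    status = {}
    for line in sysctl_lines:
        match = EEE_RE.match(line.strip())
        if match:
            unit, value = match.groups()
            status[f"igb{unit}"] = int(value) != 0
    return status


sample = [
    "hw.bxe.autogreeen: 0",
    "hw.em.eee_setting: 0",
    "dev.igb.3.eee_control: 0",
    "dev.igb.2.eee_control: 0",
    "dev.igb.1.eee_control: 0",
    "dev.igb.0.eee_control: 0",
]
for iface, disabled in sorted(eee_disabled_by_interface(sample).items()):
    print(iface, "EEE disabled" if disabled else "EEE still enabled")
```

With the sample values (all 0), every interface is reported as still having EEE enabled, which matches the instability I saw with hw.em.eee_setting set to 0.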
#7
20.7 Legacy Series / Re: eee_disabled
December 15, 2020, 05:10:45 PM
Thank you for your comments.

Without EEE disabled I see multiple link UP/DOWN events every day on both WAN and LAN interfaces.

With EEE disabled I only see the rare link UP/DOWN event limited to the WAN link, and, some are certainly justified as the cable modem reboots. I do not have stability issues.

I'm going to mark this as solved, as setting hw.em.eee_setting=0 in tunables works for me. I opened the thread because the setting changed in the move from 20.1 to 20.7 and initially I could not find it. I still cannot find much documentation on the setting.
#8
I just upgraded from 20.1 latest to 20.7 latest. I'm getting frequent "igb1: link state changed to DOWN" (followed by UP). What worked for me in 20.1 was setting eee_disabled to 1.

The eee_disabled flag does not appear to be supported on 20.7: when I try to set it using sysctl I get "unknown oid". I use "sysctl dev.igb.0.eee_disabled=1" (I also have it set in tunables).

I do see what appears to be a new oid, eee_control, but I cannot find documentation on it.

I also see iflib is being used, but cannot find any docs describing how to disable EEE.

I also see that the em driver supports my Intel card, the i211. I'm unsure if I should be using the em driver over the igb driver, and if so, how to do that.

I appreciate any tips on how to proceed. My router interfaces are quite unstable at the moment.

Edit:
I added the tunable hw.em.eee_setting=0 in the OPNsense GUI. That's my best guess at the fix, but I have not found much in my Googling.

For reference, my dmesg for igb0:
igb0: <Intel(R) PRO/1000 PCI-Express Network Driver> port 0xe000-0xe01f mem 0x91400000-0x9141ffff,0x91420000-0x91423fff irq 22 at device 0.0 on pci1
igb0: Using 1024 TX descriptors and 1024 RX descriptors
igb0: Using 2 RX queues 2 TX queues
igb0: Using MSI-X interrupts with 3 vectors
igb0: Ethernet address: 40:62:31:0b:ca:ce
igb0: netmap queues/slots: TX 2/1024, RX 2/1024


My sysctl -A piped through grep eee (all the 'eee' settings were 1 before I set hw.em.eee_setting to 0):

hw.bxe.autogreeen: 0
hw.em.eee_setting: 0
dev.igb.3.eee_control: 0
dev.igb.2.eee_control: 0
dev.igb.1.eee_control: 0
dev.igb.0.eee_control: 0


My dmesg piped through grep igb0:

igb0: <Intel(R) PRO/1000 PCI-Express Network Driver> port 0xe000-0xe01f mem 0x91400000-0x9141ffff,0x91420000-0x91423fff irq 22 at device 0.0 on pci1
igb0: Using 1024 TX descriptors and 1024 RX descriptors
igb0: Using 2 RX queues 2 TX queues
igb0: Using MSI-X interrupts with 3 vectors
igb0: Ethernet address: 40:62:31:0b:ca:ce
igb0: netmap queues/slots: TX 2/1024, RX 2/1024
igb0: link state changed to UP
igb0: link state changed to DOWN
igb0: link state changed to UP