Messages - dreunion61

#1
My board is a Gigabyte MJ11-EC1. That board is known for its incompatibility with ASPM. The solution is to turn off ASPM. On Linux I disable ASPM with the kernel parameter "pcie_aspm=off".

On OPNsense I have already set the equivalent tunable "hw.pci.enable_aspm = 0", but I can't confirm whether it is working. On Linux I can list PCIe AER errors with timestamps in journalctl. On OPNsense those errors aren't even listed in dmesg.

The only place I have found AER error counters is "pciconf -lbcevV igb1", which shows the counters without timestamps. Unlike on Linux, the counters also don't reset after a reboot.

ecap 0001[100] = AER 2 0 fatal 1 non-fatal 2 corrected

I would like to confirm whether the tunable has fixed the problem, so I need a way to monitor the PCIe error counters.


The devices with errors are as follows:

pcib2@pci0:1:0:0:       class=0x060400 rev=0x04 hdr=0x01 vendor=0x1a03 device=0x1150 subvendor=0x1a03 subdevice=0x1150
    vendor     = 'ASPEED Technology, Inc.'
    device     = 'AST1150 PCI-to-PCI Bridge'
    class      = bridge
    subclass   = PCI-PCI
    cap 05[50] = MSI supports 1 message, 64 bit
    cap 01[78] = powerspec 3  supports D0 D1 D2 D3  current D0
    cap 10[80] = PCI-Express 2 PCI bridge max data 128(256) RO NS
                 max read 512
                 link x1(x1) speed 5.0(5.0) ASPM disabled(L0s/L1)
    cap 0d[c0] = PCI Bridge subvendor=0x1a03 subdevice=0x1150
    ecap 0002[100] = VC 1 max VC0
    ecap 0001[800] = AER 1 0 fatal 1 non-fatal 1 corrected
  PCI-e errors = Correctable Error Detected
                 Unsupported Request Detected
     Non-fatal = Unsupported Request
     Corrected = Advisory Non-Fatal Error

pcib3@pci0:0:1:4:       class=0x060400 rev=0x00 hdr=0x01 vendor=0x1022 device=0x1453 subvendor=0x1458 subdevice=0x1000
    vendor     = 'Advanced Micro Devices, Inc. [AMD]'
    device     = 'Family 17h (Models 00h-0fh) PCIe GPP Bridge'
    class      = bridge
    subclass   = PCI-PCI
    cap 01[50] = powerspec 3  supports D0 D3  current D0
    cap 10[58] = PCI-Express 2 root port max data 256(512) RO NS ARI disabled
                 max read 128
                 link x1(x2) speed 2.5(8.0) ASPM disabled(L1)
                 slot 0 power limit 0 mW
    cap 05[a0] = MSI supports 1 message, 64 bit
    cap 0d[c0] = PCI Bridge subvendor=0x1458 subdevice=0x1000
    cap 08[c8] = HT MSI fixed address window enabled at 0xfee00000
    ecap 000b[100] = Vendor [1] ID 0001 Rev 1 Length 16
    ecap 0019[270] = PCIe Sec 1 lane errors 0x1
    ecap 001e[370] = L1 PM Substates 1
    ecap 001d[380] = Downstream Port Containment 1
    ecap 0023[3c4] = Designated Vendor-Specific 1
  PCI-e errors = Correctable Error Detected

pcib4@pci0:0:1:5:       class=0x060400 rev=0x00 hdr=0x01 vendor=0x1022 device=0x1453 subvendor=0x1458 subdevice=0x1000
    vendor     = 'Advanced Micro Devices, Inc. [AMD]'
    device     = 'Family 17h (Models 00h-0fh) PCIe GPP Bridge'
    class      = bridge
    subclass   = PCI-PCI
    cap 01[50] = powerspec 3  supports D0 D3  current D0
    cap 10[58] = PCI-Express 2 root port max data 128(512) RO NS ARI disabled
                 max read 128
                 link x1(x1) speed 2.5(8.0) ASPM disabled(L1)
                 slot 0 power limit 0 mW
    cap 05[a0] = MSI supports 1 message, 64 bit
    cap 0d[c0] = PCI Bridge subvendor=0x1458 subdevice=0x1000
    cap 08[c8] = HT MSI fixed address window enabled at 0xfee00000
    ecap 000b[100] = Vendor [1] ID 0001 Rev 1 Length 16
    ecap 0019[270] = PCIe Sec 1 lane errors 0x1
    ecap 001e[370] = L1 PM Substates 1
    ecap 001d[380] = Downstream Port Containment 1
    ecap 0023[3c4] = Designated Vendor-Specific 1
  PCI-e errors = Correctable Error Detected

igb0@pci0:3:0:0:        class=0x020000 rev=0x03 hdr=0x00 vendor=0x8086 device=0x1533 subvendor=0x1458 subdevice=0x1000
    vendor     = 'Intel Corporation'
    device     = 'I210 Gigabit Network Connection'
    class      = network
    subclass   = ethernet
    bar   [10] = type Memory, range 32, base 0xee800000, size 524288, enabled
    bar   [18] = type I/O Port, range 32, base 0x4000, size 32, enabled
    bar   [1c] = type Memory, range 32, base 0xee880000, size 16384, enabled
    cap 01[40] = powerspec 3  supports D0 D3  current D0
    cap 05[50] = MSI supports 1 message, 64 bit, vector masks
    cap 11[70] = MSI-X supports 5 messages, enabled
                 Table in map 0x1c[0x0], PBA in map 0x1c[0x2000]
    cap 10[a0] = PCI-Express 2 endpoint max data 256(512) FLR RO NS
                 max read 512
                 link x1(x1) speed 2.5(2.5) ASPM disabled(L0s/L1)
    ecap 0001[100] = AER 2 0 fatal 1 non-fatal 2 corrected
    ecap 0003[140] = Serial 1 18c04dffffbb755a
    ecap 0017[1a0] = TPH Requester 1
  PCI-e errors = Correctable Error Detected
                 Unsupported Request Detected
     Non-fatal = Unsupported Request
     Corrected = Replay Timer Timeout
                 Advisory Non-Fatal Error

igb1@pci0:4:0:0:        class=0x020000 rev=0x03 hdr=0x00 vendor=0x8086 device=0x1533 subvendor=0x1458 subdevice=0x1000
    vendor     = 'Intel Corporation'
    device     = 'I210 Gigabit Network Connection'
    class      = network
    subclass   = ethernet
    bar   [10] = type Memory, range 32, base 0xee700000, size 524288, enabled
    bar   [18] = type I/O Port, range 32, base 0x3000, size 32, enabled
    bar   [1c] = type Memory, range 32, base 0xee780000, size 16384, enabled
    cap 01[40] = powerspec 3  supports D0 D3  current D0
    cap 05[50] = MSI supports 1 message, 64 bit, vector masks
    cap 11[70] = MSI-X supports 5 messages, enabled
                 Table in map 0x1c[0x0], PBA in map 0x1c[0x2000]
    cap 10[a0] = PCI-Express 2 endpoint max data 128(512) FLR RO NS
                 max read 512
                 link x1(x1) speed 2.5(2.5) ASPM disabled(L0s/L1)
    ecap 0001[100] = AER 2 0 fatal 1 non-fatal 2 corrected
    ecap 0003[140] = Serial 1 18c04dffffbb755b
    ecap 0017[1a0] = TPH Requester 1
  PCI-e errors = Correctable Error Detected
                 Unsupported Request Detected
     Non-fatal = Unsupported Request
     Corrected = Replay Timer Timeout