OPNsense Forum

English Forums => Hardware and Performance => Topic started by: NevadaTech on July 01, 2026, 06:29:11 PM

Title: Proxmox crashing, bad RAM?
Post by: NevadaTech on July 01, 2026, 06:29:11 PM
Please review this excerpt from my Proxmox log. Proxmox is 8.4.19, the mobo is an ASRock X470D4U, and the RAM is 128GB (4x32GB) Nemix.

I have 5 other servers similar to this one. Four of them work fine; the exception is one server that shows the same HARDWARE ERROR on its console, and I believe that one also has Nemix RAM. I believe the four healthy servers have Kingston KSM26ED8/16ME RAM and show no errors.

12:37 Hardware Error
13:21 Hardware Error
14:53 Hardware Error
15:45 server reboot
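
A timeline like the one above can be pulled out of the journal text with a short script. This is only a sketch: the sample lines are copied from the log excerpt further down (trimmed to the relevant ones), and the `summarize` helper and its regexes are made up for illustration, not part of any Proxmox tooling.

```python
import re
from collections import Counter

# Sample lines copied from the log excerpt in this post (trimmed).
LOG = """\
Jun 30 12:37:21 virt09b kernel: mce: [Hardware Error]: Machine check events logged
Jun 30 12:37:21 virt09b kernel: [Hardware Error]: Error Addr: 0x0000000bbf588300
Jun 30 13:21:03 virt09b kernel: mce: [Hardware Error]: Machine check events logged
Jun 30 13:21:03 virt09b kernel: [Hardware Error]: Error Addr: 0x0000000bbf588300
Jun 30 14:53:53 virt09b kernel: mce: [Hardware Error]: Machine check events logged
Jun 30 14:53:53 virt09b kernel: [Hardware Error]: Error Addr: 0x0000000bbf520300
Jun 30 15:45:39 virt09b kernel: AMD-Vi: Completion-Wait loop timed out
-- Reboot --
"""

# Hypothetical helper patterns: syslog-style timestamp and the MCE address line.
TIME = re.compile(r"^\w{3}\s+\d+\s+(\d{2}:\d{2}):\d{2}\s")
ADDR = re.compile(r"Error Addr: (0x[0-9a-fA-F]+)")


def summarize(text):
    """Return (timeline, address counts) for machine check events in a log."""
    timeline = []
    addresses = Counter()
    last_time = None
    for line in text.splitlines():
        m = TIME.match(line)
        if m:
            last_time = m.group(1)
        if "Machine check events logged" in line:
            timeline.append((last_time, "Hardware Error"))
        elif line.strip() == "-- Reboot --":
            # The reboot marker has no timestamp; use the last one seen before it.
            timeline.append((last_time, "server reboot"))
        a = ADDR.search(line)
        if a:
            addresses[a.group(1)] += 1
    return timeline, addresses


timeline, addresses = summarize(LOG)
for when, what in timeline:
    print(when, what)
for addr, count in addresses.items():
    print(addr, "x", count)
```

On the excerpt this counts two events at one reported address and one at a second address. The kernel says it cannot decode the normalized address, so these values should not be read as system physical addresses.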

I also see a SMART thermal message but that seems like more of a 'notice'.

Below the Proxmox log is the output of 'dmidecode -t 17'. It lists the specs but not the actual manufacturer or part number. I believe the RAM is rated for 3200 MT/s but is running at a lower 2666 MT/s. I tried 'lshw -C memory', but lshw is not installed.


------------------------------------------ start some Proxmox log dump
Jun 30 12:37:21 virt09b kernel: mce: [Hardware Error]: Machine check events logged
Jun 30 12:37:21 virt09b kernel: [Hardware Error]: Corrected error, no action required.
Jun 30 12:37:21 virt09b kernel: [Hardware Error]: CPU:0 (17:71:0) MC17_STATUS[-|CE|MiscV|AddrV|-|-|SyndV|CECC|-|-|Scrub]: 0x9c2041000000011b
Jun 30 12:37:21 virt09b kernel: [Hardware Error]: Error Addr: 0x0000000bbf588300
Jun 30 12:37:21 virt09b kernel: [Hardware Error]: IPID: 0x0000009600050f00, Syndrome: 0x000000040a801101
Jun 30 12:37:21 virt09b kernel: [Hardware Error]: Unified Memory Controller Ext. Error Code: 0
Jun 30 12:37:21 virt09b kernel: EDAC MC0: 1 CE Cannot decode normalized address on mc#0csrow#1channel#0 (csrow:1 channel:0 page:0x0 offset:0x0 grain:64 syndrome:0x4)
Jun 30 12:37:21 virt09b kernel: [Hardware Error]: cache level: L3/GEN, tx: GEN, mem-tx: RD
Jun 30 12:43:12 virt09b smartd[1543]: Device: /dev/sdb [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 67 to 66
Jun 30 13:04:37 virt09b kernel: AMD-Vi: Completion-Wait loop timed out
Jun 30 13:08:53 virt09b kernel: AMD-Vi: Completion-Wait loop timed out
Jun 30 13:17:01 virt09b CRON[151843]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Jun 30 13:17:01 virt09b CRON[151844]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Jun 30 13:17:01 virt09b CRON[151843]: pam_unix(cron:session): session closed for user root
Jun 30 13:19:41 virt09b kernel: AMD-Vi: Completion-Wait loop timed out
Jun 30 13:21:03 virt09b kernel: mce: [Hardware Error]: Machine check events logged
Jun 30 13:21:03 virt09b kernel: [Hardware Error]: Corrected error, no action required.
Jun 30 13:21:03 virt09b kernel: [Hardware Error]: CPU:0 (17:71:0) MC17_STATUS[-|CE|MiscV|AddrV|-|-|SyndV|CECC|-|-|Scrub]: 0x9c2041000000011b
Jun 30 13:21:03 virt09b kernel: [Hardware Error]: Error Addr: 0x0000000bbf588300
Jun 30 13:21:03 virt09b kernel: [Hardware Error]: IPID: 0x0000009600050f00, Syndrome: 0x000000040a801101
Jun 30 13:21:03 virt09b kernel: [Hardware Error]: Unified Memory Controller Ext. Error Code: 0
Jun 30 13:21:03 virt09b kernel: EDAC MC0: 1 CE Cannot decode normalized address on mc#0csrow#1channel#0 (csrow:1 channel:0 page:0x0 offset:0x0 grain:64 syndrome:0x4)
Jun 30 13:21:03 virt09b kernel: [Hardware Error]: cache level: L3/GEN, tx: GEN, mem-tx: RD
Jun 30 13:43:12 virt09b smartd[1543]: Device: /dev/sda [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 67 to 66
Jun 30 13:45:16 virt09b kernel: AMD-Vi: Completion-Wait loop timed out
Jun 30 13:47:15 virt09b kernel: AMD-Vi: Completion-Wait loop timed out
Jun 30 13:50:08 virt09b kernel: AMD-Vi: Completion-Wait loop timed out
Jun 30 13:54:31 virt09b kernel: AMD-Vi: Completion-Wait loop timed out
Jun 30 13:57:41 virt09b kernel: AMD-Vi: Completion-Wait loop timed out
Jun 30 14:05:23 virt09b kernel: AMD-Vi: Completion-Wait loop timed out
Jun 30 14:06:44 virt09b systemd[1]: Starting systemd-tmpfiles-clean.service - Cleanup of Temporary Directories...
Jun 30 14:06:44 virt09b systemd[1]: systemd-tmpfiles-clean.service: Deactivated successfully.
Jun 30 14:06:44 virt09b systemd[1]: Finished systemd-tmpfiles-clean.service - Cleanup of Temporary Directories.
Jun 30 14:06:44 virt09b systemd[1]: run-credentials-systemd\x2dtmpfiles\x2dclean.service.mount: Deactivated successfully.
Jun 30 14:17:01 virt09b CRON[172546]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Jun 30 14:17:01 virt09b CRON[172547]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Jun 30 14:17:01 virt09b CRON[172546]: pam_unix(cron:session): session closed for user root
Jun 30 14:27:29 virt09b kernel: AMD-Vi: Completion-Wait loop timed out
Jun 30 14:28:37 virt09b kernel: AMD-Vi: Completion-Wait loop timed out
Jun 30 14:35:25 virt09b kernel: AMD-Vi: Completion-Wait loop timed out
Jun 30 14:43:12 virt09b smartd[1543]: Device: /dev/sda [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 66 to 67
Jun 30 14:44:19 virt09b kernel: AMD-Vi: Completion-Wait loop timed out
Jun 30 14:45:10 virt09b kernel: AMD-Vi: Completion-Wait loop timed out
Jun 30 14:53:09 virt09b kernel: AMD-Vi: Completion-Wait loop timed out
Jun 30 14:53:53 virt09b kernel: mce: [Hardware Error]: Machine check events logged
Jun 30 14:53:53 virt09b kernel: [Hardware Error]: Corrected error, no action required.
Jun 30 14:53:53 virt09b kernel: [Hardware Error]: CPU:0 (17:71:0) MC17_STATUS[-|CE|MiscV|AddrV|-|-|SyndV|CECC|-|-|Scrub]: 0x9c2041000000011b
Jun 30 14:53:53 virt09b kernel: [Hardware Error]: Error Addr: 0x0000000bbf520300
Jun 30 14:53:53 virt09b kernel: [Hardware Error]: IPID: 0x0000009600050f00, Syndrome: 0x000000040a801101
Jun 30 14:53:53 virt09b kernel: [Hardware Error]: Unified Memory Controller Ext. Error Code: 0
Jun 30 14:53:53 virt09b kernel: EDAC MC0: 1 CE Cannot decode normalized address on mc#0csrow#1channel#0 (csrow:1 channel:0 page:0x0 offset:0x0 grain:64 syndrome:0x4)
Jun 30 14:53:53 virt09b kernel: [Hardware Error]: cache level: L3/GEN, tx: GEN, mem-tx: RD
Jun 30 14:56:46 virt09b kernel: AMD-Vi: Completion-Wait loop timed out
Jun 30 15:06:35 virt09b kernel: AMD-Vi: Completion-Wait loop timed out
Jun 30 15:07:25 virt09b kernel: AMD-Vi: Completion-Wait loop timed out
Jun 30 15:08:03 virt09b kernel: AMD-Vi: Completion-Wait loop timed out
Jun 30 15:13:12 virt09b smartd[1543]: Device: /dev/sda [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 67 to 66
Jun 30 15:15:21 virt09b kernel: AMD-Vi: Completion-Wait loop timed out
Jun 30 15:17:01 virt09b CRON[193198]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Jun 30 15:17:01 virt09b CRON[193199]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Jun 30 15:17:01 virt09b CRON[193198]: pam_unix(cron:session): session closed for user root
Jun 30 15:38:08 virt09b kernel: AMD-Vi: Completion-Wait loop timed out
Jun 30 15:45:39 virt09b kernel: AMD-Vi: Completion-Wait loop timed out
-- Reboot --
Jun 30 15:47:33 virt09b kernel: Linux version 6.8.12-30-pve (build@proxmox) (gcc (Debian 12.2.0-14+deb12u1) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC PMX 6.8.12-30 (2026-06-11T10:10Z) ()
Jun 30 15:47:33 virt09b kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-6.8.12-30-pve root=/dev/mapper/pve-root ro quiet
Jun 30 15:47:33 virt09b kernel: KERNEL supported cpus:
Jun 30 15:47:33 virt09b kernel:   Intel GenuineIntel

------------------------------------------ end some Proxmox log dump

------------------------------------------
Output of 'dmidecode -t 17':

Memory Device
        Array Handle: 0x0014
        Error Information Handle: 0x0021
        Total Width: 72 bits
        Data Width: 64 bits
        Size: 32 GB
        Form Factor: DIMM
        Set: None
        Locator: DIMM 0
        Bank Locator: P0 CHANNEL B
        Type: DDR4
        Type Detail: Synchronous Unbuffered (Unregistered)
        Speed: 2666 MT/s
        Manufacturer: Unknown
        Serial Number: 5D270016
        Asset Tag: Not Specified
        Part Number: Unknown
        Rank: 2
        Configured Memory Speed: 2666 MT/s
        Minimum Voltage: 1.2 V
        Maximum Voltage: 1.2 V
        Configured Voltage: 1.2 V
        Memory Technology: DRAM
        Memory Operating Mode Capability: Volatile memory
        Firmware Version: Unknown
        Module Manufacturer ID: Unknown
        Module Product ID: Unknown
        Memory Subsystem Controller Manufacturer ID: Unknown
        Memory Subsystem Controller Product ID: Unknown
        Non-Volatile Size: None
        Volatile Size: 32 GB
        Cache Size: None
        Logical Size: None
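
With four DIMMs, it may be easier to compare the slots side by side than to read the raw dump. A minimal sketch, assuming the output has been saved as text: the sample below is a trimmed copy of the block above, and `parse_memory_devices` and `missing_identity` are hypothetical helper names, not part of dmidecode.

```python
# Trimmed copy of the 'dmidecode -t 17' block above.
SAMPLE = """\
Memory Device
        Size: 32 GB
        Locator: DIMM 0
        Bank Locator: P0 CHANNEL B
        Speed: 2666 MT/s
        Manufacturer: Unknown
        Part Number: Unknown
        Configured Memory Speed: 2666 MT/s
"""


def parse_memory_devices(text):
    """Split the text into one dict of 'Key: Value' fields per Memory Device."""
    devices = []
    current = None
    for line in text.splitlines():
        if line.strip() == "Memory Device":
            current = {}
            devices.append(current)
        elif current is not None and ":" in line:
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return devices


def missing_identity(device):
    """List identity fields the firmware did not fill in for this DIMM."""
    blank = ("Unknown", "Not Specified", "")
    return [k for k in ("Manufacturer", "Part Number")
            if device.get(k, "Unknown") in blank]


devices = parse_memory_devices(SAMPLE)
for dev in devices:
    print(dev["Locator"], dev["Size"], dev["Configured Memory Speed"],
          "missing:", missing_identity(dev))
```

For the DIMM shown, both identity fields come back as missing, which matches the problem described: the firmware reports the speed but not who made the module.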

------------------------------------------
Title: Re: Proxmox crashing, bad RAM?
Post by: pfry on July 01, 2026, 07:01:50 PM
Yep, looks like a bad stick. You could test it with memtest86 (https://www.memtest86.com/) just to verify, but I doubt it would give a different result.
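
For anyone who wants to check what the MC17_STATUS value in the log says, here is a rough sketch of decoding it. The bit positions are an assumption based on my reading of the Linux kernel's MCI_STATUS_* definitions for AMD CPUs, and the `decode_status` helper is made up for illustration. The result can be cross-checked against the flag string the kernel already printed next to the value.

```python
# Assumed bit positions (from the kernel's MCI_STATUS_* definitions for AMD).
FLAGS = {
    63: "Val", 62: "Over", 61: "UC", 60: "En",
    59: "MiscV", 58: "AddrV", 57: "PCC",
    53: "SyndV", 46: "CECC", 45: "UECC",
    44: "Deferred", 43: "Poison", 40: "Scrub",
}


def decode_status(status):
    """Return the names of the flag bits set in an MCA status word, high bit first."""
    return [name for bit, name in sorted(FLAGS.items(), reverse=True)
            if (status >> bit) & 1]


# Value taken from the MC17_STATUS lines in the log excerpt.
status = 0x9C2041000000011B
flags = decode_status(status)
print(flags)

# Corrected ECC error: CECC set, and neither UC nor UECC set.
corrected = "CECC" in flags and "UC" not in flags and "UECC" not in flags
print("corrected ECC error:", corrected)
```

The decoded flags line up with the kernel's own summary (CE, MiscV, AddrV, SyndV, CECC, Scrub): ECC caught and corrected these errors. That does not by itself explain the reboot, but repeated corrected errors on the same channel point to the module on that channel as the first thing to test.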

The only Nemix DIMMs I've seen used re-marked chips - not for me. YMMV. Heck, it might even have a warranty... but the manufacturer has an incentive to reject claims at the moment.

Quote from: NevadaTech on July 01, 2026, 06:29:11 PM
[...] I also see a SMART thermal message but that seems like more of a 'notice'. [...]

What temperature? The controller shouldn't exceed its limits, but higher temperatures are generally detrimental. Might be worth addressing while you're in there.

Title: Re: Proxmox crashing, bad RAM?
Post by: BrandyWine on July 01, 2026, 09:24:50 PM
Does the BIOS/UEFI have built-in hardware tests? If so, I would use its memory test first, just to verify things.

Title: Re: Proxmox crashing, bad RAM?
Post by: meyergru on July 01, 2026, 09:29:16 PM
X470 chipsets in general, and the ASRock mainboards based on this consumer platform, are notorious for failing early, too.