Hardware and Performance / OPNSense and SSDs, expected wear with normal use
« on: June 29, 2022, 08:57:04 pm »
I just recently happened to look at the SMART data of the SSD in my OPNSense machine and noticed total writes and life left values were a bit surprising considering how long the machine has been operational.
The machine has been in use for almost a year now as my primary home firewall, so no extravagant use cases. This is what the drive's SMART data reports:
Code:
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x0032 100 100 000 Old_age Always - 100
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 8272
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 22
148 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 0
149 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 0
167 Write_Protect_Mode 0x0000 100 100 000 Old_age Offline - 0
168 SATA_Phy_Error_Count 0x0012 100 100 000 Old_age Always - 0
169 Bad_Block_Rate 0x0000 100 100 000 Old_age Offline - 0
170 Bad_Blk_Ct_Erl/Lat 0x0000 100 100 010 Old_age Offline - 0/0
172 Erase_Fail_Count 0x0032 100 100 000 Old_age Always - 0
173 MaxAvgErase_Ct 0x0000 100 100 000 Old_age Offline - 0
181 Program_Fail_Count 0x0032 100 100 000 Old_age Always - 0
182 Erase_Fail_Count 0x0000 100 100 000 Old_age Offline - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
192 Unsafe_Shutdown_Count 0x0012 100 100 000 Old_age Always - 17
194 Temperature_Celsius 0x0022 037 056 000 Old_age Always - 37 (Min/Max 16/56)
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 0
199 SATA_CRC_Error_Count 0x0032 100 100 000 Old_age Always - 0
218 CRC_Error_Count 0x0032 100 100 000 Old_age Always - 0
231 SSD_Life_Left 0x0000 088 088 000 Old_age Offline - 88
233 Flash_Writes_GiB 0x0032 100 100 000 Old_age Always - 1248
241 Lifetime_Writes_GiB 0x0032 100 100 000 Old_age Always - 9423
242 Lifetime_Reads_GiB 0x0032 100 100 000 Old_age Always - 54
244 Average_Erase_Count 0x0000 100 100 000 Old_age Offline - 124
245 Max_Erase_Count 0x0000 100 100 000 Old_age Offline - 171
246 Total_Erase_Count 0x0000 100 100 000 Old_age Offline - 168104
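For anyone wanting to track the same figures over time: the table above is the output of smartctl -A, and the raw values can be pulled out of it with a short script. A minimal sketch, using only the attribute IDs shown above (9, 231, 241); on other drives the attribute names, and which IDs matter, may differ.

```python
# Sketch: pull raw values for a few SMART attributes out of `smartctl -A` output.
# IDs 9 (Power_On_Hours), 231 (SSD_Life_Left) and 241 (Lifetime_Writes_GiB)
# are the ones discussed here; this assumes their raw value is a plain integer
# in the last column, as in the Kingston table above.

def smart_raw_values(text, ids=(9, 231, 241)):
    """Return {attribute_id: raw_value} for the selected IDs."""
    wanted = set(ids)
    values = {}
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0].isdigit() and int(fields[0]) in wanted:
            values[int(fields[0])] = int(fields[-1])
    return values

# Usage (device path /dev/ada0 is an assumption, adjust for your system):
#   smartctl -A /dev/ada0 > smart.txt
#   smart_raw_values(open("smart.txt").read())
```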
I was surprised both by the amount of writes and by how far the wear has progressed in just a year. Now I'm wondering whether these numbers are in line with what can be expected, or whether something is wrong with my setup.
About a week ago, when I first noticed it, the life-left reading was at 89. I didn't think to take note of the other figures at the time, but after I updated OPNsense last weekend, I wrote the numbers down:
Code:
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x0032 100 100 000 Old_age Always - 100
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 8167
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 22
148 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 0
149 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 0
167 Write_Protect_Mode 0x0000 100 100 000 Old_age Offline - 0
168 SATA_Phy_Error_Count 0x0012 100 100 000 Old_age Always - 0
169 Bad_Block_Rate 0x0000 100 100 000 Old_age Offline - 0
170 Bad_Blk_Ct_Erl/Lat 0x0000 100 100 010 Old_age Offline - 0/0
172 Erase_Fail_Count 0x0032 100 100 000 Old_age Always - 0
173 MaxAvgErase_Ct 0x0000 100 100 000 Old_age Offline - 0
181 Program_Fail_Count 0x0032 100 100 000 Old_age Always - 0
182 Erase_Fail_Count 0x0000 100 100 000 Old_age Offline - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
192 Unsafe_Shutdown_Count 0x0012 100 100 000 Old_age Always - 17
194 Temperature_Celsius 0x0022 034 056 000 Old_age Always - 34 (Min/Max 16/56)
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 0
199 SATA_CRC_Error_Count 0x0032 100 100 000 Old_age Always - 0
218 CRC_Error_Count 0x0032 100 100 000 Old_age Always - 0
231 SSD_Life_Left 0x0000 088 088 000 Old_age Offline - 88
233 Flash_Writes_GiB 0x0032 100 100 000 Old_age Always - 1233
241 Lifetime_Writes_GiB 0x0032 100 100 000 Old_age Always - 9343
242 Lifetime_Reads_GiB 0x0032 100 100 000 Old_age Always - 54
244 Average_Erase_Count 0x0000 100 100 000 Old_age Offline - 123
245 Max_Erase_Count 0x0000 100 100 000 Old_age Offline - 171
246 Total_Erase_Count 0x0000 100 100 000 Old_age Offline - 166475
Comparing these with the readings from today shown above, a bit over four days has seen 80 GiB written, which sounds rather high to me.
From what I can tell, flowd_aggregate.py is almost solely responsible for the writes:
Code:
# top -m io -o write -b
last pid: 40420; load averages: 0.06, 0.11, 0.08 up 4+21:53:06 21:17:36
53 processes: 1 running, 52 sleeping
CPU: 1.0% user, 0.0% nice, 0.6% system, 0.3% interrupt, 98.1% idle
Mem: 150M Active, 2887M Inact, 782M Wired, 430M Buf, 4200M Free
Swap: 8192M Total, 8192M Free
PID USERNAME VCSW IVCSW READ WRITE FAULT TOTAL PERCENT COMMAND
699 root 1393772 458763 840 1273223 0 1274063 99.58% python3.8
35311 dhcpd 853298 1329 33 2949 33 3015 0.24% dhcpd
96277 dhcpd 861423 2637 0 588 0 588 0.05% dhcpd
99676 root 3756 82 70 416 51 537 0.04% radiusd
72210 root 1400080 16412 1 229 0 230 0.02% syslog-ng
1549 root 1741 16 0 227 0 227 0.02% dhcpleases6
5872 _flowd 200493 1613 4 159 0 163 0.01% flowd
90495 _dhcp 54483 115 0 14 0 14 0.00% dhclient
49557 root 8415 649 0 8 0 8 0.00% radvd
73189 root 825674 3873 0 0 0 0 0.00% python3.8
422 root 4 3 0 0 0 0 0.00% python3.8
96103 root 6 0 0 0 0 0 0.00% rtsold
424 root 171305 3810 215 0 6 221 0.02% python3.8
# ps -a -p 699
PID TT STAT TIME COMMAND
699 - Ss 143:34.61 /usr/local/bin/python3 /usr/local/opnsense/scripts/netflow/flowd_aggregate.py (python3.8)
Is this to be expected, or is something wrong with my config, or even with flowd_aggregate.py itself? Could anything be done to reduce the writes, apart from disabling NetFlow completely?
Admittedly this is not a high-quality SSD (a KINGSTON SA400S37240G, not my choice), which might be a contributing factor to the quickly diminishing life-left value.
Any thoughts?