Hardware and Performance / OPNSense and SSDs, expected wear with normal use
« on: June 29, 2022, 08:57:04 pm »
I just recently happened to look at the SMART data of the SSD in my OPNSense machine and noticed total writes and life left values were a bit surprising considering how long the machine has been operational.
The machine has been in use for almost a year now as my primary home firewall, so no extravagant use cases. This is what the drive's SMART data reports:
Code:
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x0032 100 100 000 Old_age Always - 100
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 8272
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 22
148 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 0
149 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 0
167 Write_Protect_Mode 0x0000 100 100 000 Old_age Offline - 0
168 SATA_Phy_Error_Count 0x0012 100 100 000 Old_age Always - 0
169 Bad_Block_Rate 0x0000 100 100 000 Old_age Offline - 0
170 Bad_Blk_Ct_Erl/Lat 0x0000 100 100 010 Old_age Offline - 0/0
172 Erase_Fail_Count 0x0032 100 100 000 Old_age Always - 0
173 MaxAvgErase_Ct 0x0000 100 100 000 Old_age Offline - 0
181 Program_Fail_Count 0x0032 100 100 000 Old_age Always - 0
182 Erase_Fail_Count 0x0000 100 100 000 Old_age Offline - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
192 Unsafe_Shutdown_Count 0x0012 100 100 000 Old_age Always - 17
194 Temperature_Celsius 0x0022 037 056 000 Old_age Always - 37 (Min/Max 16/56)
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 0
199 SATA_CRC_Error_Count 0x0032 100 100 000 Old_age Always - 0
218 CRC_Error_Count 0x0032 100 100 000 Old_age Always - 0
231 SSD_Life_Left 0x0000 088 088 000 Old_age Offline - 88
233 Flash_Writes_GiB 0x0032 100 100 000 Old_age Always - 1248
241 Lifetime_Writes_GiB 0x0032 100 100 000 Old_age Always - 9423
242 Lifetime_Reads_GiB 0x0032 100 100 000 Old_age Always - 54
244 Average_Erase_Count 0x0000 100 100 000 Old_age Offline - 124
245 Max_Erase_Count 0x0000 100 100 000 Old_age Offline - 171
246 Total_Erase_Count 0x0000 100 100 000 Old_age Offline - 168104
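For anyone wanting to track the same figures over time: the table above is the output of smartctl -A, and the raw values can be pulled out of it with a short script. A minimal sketch, using only the attribute IDs shown above (9, 231, 241); on other drives the attribute names, and which IDs matter, may differ.

```python
# Sketch: pull raw values for a few SMART attributes out of `smartctl -A` output.
# IDs 9 (Power_On_Hours), 231 (SSD_Life_Left) and 241 (Lifetime_Writes_GiB)
# are the ones discussed here; this assumes their raw value is a plain integer
# in the last column, as in the Kingston table above.

def smart_raw_values(text, ids=(9, 231, 241)):
    """Return {attribute_id: raw_value} for the selected IDs."""
    wanted = set(ids)
    values = {}
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0].isdigit() and int(fields[0]) in wanted:
            values[int(fields[0])] = int(fields[-1])
    return values

# Usage (device path /dev/ada0 is an assumption, adjust for your system):
#   smartctl -A /dev/ada0 > smart.txt
#   smart_raw_values(open("smart.txt").read())
```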
I was surprised both by the amount of writes and by how far the wear has progressed in just a year. Now I'm wondering whether these numbers are in line with what can be expected, or whether something is wrong with my setup.
About a week ago, when I first noticed it, the life-left reading was at 89. I didn't think to take note of the other figures at the time, but after I updated OPNsense last weekend, I wrote the numbers down:
Code:
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x0032 100 100 000 Old_age Always - 100
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 8167
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 22
148 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 0
149 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 0
167 Write_Protect_Mode 0x0000 100 100 000 Old_age Offline - 0
168 SATA_Phy_Error_Count 0x0012 100 100 000 Old_age Always - 0
169 Bad_Block_Rate 0x0000 100 100 000 Old_age Offline - 0
170 Bad_Blk_Ct_Erl/Lat 0x0000 100 100 010 Old_age Offline - 0/0
172 Erase_Fail_Count 0x0032 100 100 000 Old_age Always - 0
173 MaxAvgErase_Ct 0x0000 100 100 000 Old_age Offline - 0
181 Program_Fail_Count 0x0032 100 100 000 Old_age Always - 0
182 Erase_Fail_Count 0x0000 100 100 000 Old_age Offline - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
192 Unsafe_Shutdown_Count 0x0012 100 100 000 Old_age Always - 17
194 Temperature_Celsius 0x0022 034 056 000 Old_age Always - 34 (Min/Max 16/56)
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 0
199 SATA_CRC_Error_Count 0x0032 100 100 000 Old_age Always - 0
218 CRC_Error_Count 0x0032 100 100 000 Old_age Always - 0
231 SSD_Life_Left 0x0000 088 088 000 Old_age Offline - 88
233 Flash_Writes_GiB 0x0032 100 100 000 Old_age Always - 1233
241 Lifetime_Writes_GiB 0x0032 100 100 000 Old_age Always - 9343
242 Lifetime_Reads_GiB 0x0032 100 100 000 Old_age Always - 54
244 Average_Erase_Count 0x0000 100 100 000 Old_age Offline - 123
245 Max_Erase_Count 0x0000 100 100 000 Old_age Offline - 171
246 Total_Erase_Count 0x0000 100 100 000 Old_age Offline - 166475
Comparing these with the readings from today shown above, a bit over four days has seen 80 GiB written, which sounds rather high to me.
From what I can tell, flowd_aggregate.py is almost solely responsible for the writes:
Code:
# top -m io -o write -b
last pid: 40420; load averages: 0.06, 0.11, 0.08 up 4+21:53:06 21:17:36
53 processes: 1 running, 52 sleeping
CPU: 1.0% user, 0.0% nice, 0.6% system, 0.3% interrupt, 98.1% idle
Mem: 150M Active, 2887M Inact, 782M Wired, 430M Buf, 4200M Free
Swap: 8192M Total, 8192M Free
PID USERNAME VCSW IVCSW READ WRITE FAULT TOTAL PERCENT COMMAND
699 root 1393772 458763 840 1273223 0 1274063 99.58% python3.8
35311 dhcpd 853298 1329 33 2949 33 3015 0.24% dhcpd
96277 dhcpd 861423 2637 0 588 0 588 0.05% dhcpd
99676 root 3756 82 70 416 51 537 0.04% radiusd
72210 root 1400080 16412 1 229 0 230 0.02% syslog-ng
1549 root 1741 16 0 227 0 227 0.02% dhcpleases6
5872 _flowd 200493 1613 4 159 0 163 0.01% flowd
90495 _dhcp 54483 115 0 14 0 14 0.00% dhclient
49557 root 8415 649 0 8 0 8 0.00% radvd
73189 root 825674 3873 0 0 0 0 0.00% python3.8
422 root 4 3 0 0 0 0 0.00% python3.8
96103 root 6 0 0 0 0 0 0.00% rtsold
424 root 171305 3810 215 0 6 221 0.02% python3.8
# ps -a -p 699
PID TT STAT TIME COMMAND
699 - Ss 143:34.61 /usr/local/bin/python3 /usr/local/opnsense/scripts/netflow/flowd_aggregate.py (python3.8)
Is this to be expected, or is something wrong with my config, or even with flowd_aggregate.py itself? Could anything be done to reduce the writes, apart from disabling NetFlow completely?
Admittedly this is not a high-quality SSD (a KINGSTON SA400S37240G, not my choice), which might be a contributing factor to the quickly diminishing life-left value.
Any thoughts?