Messages - nichiren

#1
I was planning on looking into the /var usage further this weekend, now that I might actually have time on my hands, but it sounds like you've already done the legwork.

For what it's worth, that sounds like a good idea to me. Keeping the most write-intensive part on a RAM drive, synced to permanent storage on boot (or maybe also periodically, at a configurable interval?), would probably reduce the SSD wear dramatically. I'd second that feature request.

The simplest and quickest solution would probably be to get a separate HDD, as suggested, and just mount it as /var. That's not the most elegant solution though, and in my case I'm not sure the computer (an oldish Dell USFF desktop) can even fit an additional drive. And I know many people who use even smaller computers.
#2
Yes, obviously the data needs to be written. What I was after was more whether this amount of writing is to be expected (judging from the answer, yes?) and, if it is, whether it can be reduced somehow without disabling NetFlow and without using external storage, as no such thing is available.

Of course, I could perhaps add an HDD as a second drive for this and for logging in general, but I have no clue whether OPNsense supports this without resorting to sorcery.
#3
I recently happened to look at the SMART data of the SSD in my OPNsense machine and noticed that the total writes and life left values were a bit surprising considering how long the machine has been operational.

The machine has been in use for almost a year now as my primary home firewall. So no extravagant use cases. This is what the drive SMART reports:

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0032   100   100   000    Old_age   Always       -       100
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       8272
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       22
148 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       0
149 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       0
167 Write_Protect_Mode      0x0000   100   100   000    Old_age   Offline      -       0
168 SATA_Phy_Error_Count    0x0012   100   100   000    Old_age   Always       -       0
169 Bad_Block_Rate          0x0000   100   100   000    Old_age   Offline      -       0
170 Bad_Blk_Ct_Erl/Lat      0x0000   100   100   010    Old_age   Offline      -       0/0
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
173 MaxAvgErase_Ct          0x0000   100   100   000    Old_age   Offline      -       0
181 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0
182 Erase_Fail_Count        0x0000   100   100   000    Old_age   Offline      -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
192 Unsafe_Shutdown_Count   0x0012   100   100   000    Old_age   Always       -       17
194 Temperature_Celsius     0x0022   037   056   000    Old_age   Always       -       37 (Min/Max 16/56)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
199 SATA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0
218 CRC_Error_Count         0x0032   100   100   000    Old_age   Always       -       0
231 SSD_Life_Left           0x0000   088   088   000    Old_age   Offline      -       88
233 Flash_Writes_GiB        0x0032   100   100   000    Old_age   Always       -       1248
241 Lifetime_Writes_GiB     0x0032   100   100   000    Old_age   Always       -       9423
242 Lifetime_Reads_GiB      0x0032   100   100   000    Old_age   Always       -       54
244 Average_Erase_Count     0x0000   100   100   000    Old_age   Offline      -       124
245 Max_Erase_Count         0x0000   100   100   000    Old_age   Offline      -       171
246 Total_Erase_Count       0x0000   100   100   000    Old_age   Offline      -       168104


I was surprised at the amount of writes, as well as how far the wear has progressed in just a year. Now I'm wondering if these numbers are in line with what can be expected, or if there is something wrong with my setup.

About a week ago when I first noticed it, the life left reading was at 89. I didn't think of taking notes of the other figures, but after I updated OPNsense last weekend, I took the numbers down:
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0032   100   100   000    Old_age   Always       -       100
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       8167
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       22
148 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       0
149 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       0
167 Write_Protect_Mode      0x0000   100   100   000    Old_age   Offline      -       0
168 SATA_Phy_Error_Count    0x0012   100   100   000    Old_age   Always       -       0
169 Bad_Block_Rate          0x0000   100   100   000    Old_age   Offline      -       0
170 Bad_Blk_Ct_Erl/Lat      0x0000   100   100   010    Old_age   Offline      -       0/0
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
173 MaxAvgErase_Ct          0x0000   100   100   000    Old_age   Offline      -       0
181 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0
182 Erase_Fail_Count        0x0000   100   100   000    Old_age   Offline      -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
192 Unsafe_Shutdown_Count   0x0012   100   100   000    Old_age   Always       -       17
194 Temperature_Celsius     0x0022   034   056   000    Old_age   Always       -       34 (Min/Max 16/56)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
199 SATA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0
218 CRC_Error_Count         0x0032   100   100   000    Old_age   Always       -       0
231 SSD_Life_Left           0x0000   088   088   000    Old_age   Offline      -       88
233 Flash_Writes_GiB        0x0032   100   100   000    Old_age   Always       -       1233
241 Lifetime_Writes_GiB     0x0032   100   100   000    Old_age   Always       -       9343
242 Lifetime_Reads_GiB      0x0032   100   100   000    Old_age   Always       -       54
244 Average_Erase_Count     0x0000   100   100   000    Old_age   Offline      -       123
245 Max_Erase_Count         0x0000   100   100   000    Old_age   Offline      -       171
246 Total_Erase_Count       0x0000   100   100   000    Old_age   Offline      -       166475


Comparing with the readings from today above, 80 GiB have been written in a bit over four days, which sounds a bit on the high side to me.

From what I can tell, flowd_aggregate.py is almost solely responsible for the writes:
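For anyone who wants to check the arithmetic, here is a minimal sketch of how that figure falls out of the two SMART readouts. The dictionary keys and the function name are just illustrative; the numbers are the raw values of attributes 9 and 241 from the two tables above.

```python
# Raw values copied from the two SMART readouts in this post
# (attribute 9 = Power_On_Hours, attribute 241 = Lifetime_Writes_GiB).
earlier = {"power_on_hours": 8167, "lifetime_writes_gib": 9343}
later = {"power_on_hours": 8272, "lifetime_writes_gib": 9423}


def write_rate(first, second):
    """Return (elapsed hours, GiB written, GiB per day) between two snapshots."""
    hours = second["power_on_hours"] - first["power_on_hours"]
    written = second["lifetime_writes_gib"] - first["lifetime_writes_gib"]
    return hours, written, written / hours * 24


hours, written_gib, gib_per_day = write_rate(earlier, later)
print(f"{written_gib} GiB in {hours} h ({hours / 24:.1f} days), "
      f"about {gib_per_day:.1f} GiB/day")
```

That works out to 80 GiB over 105 power-on hours, or roughly 18 GiB per day.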

# top -m io -o write -b
last pid: 40420;  load averages:  0.06,  0.11,  0.08  up 4+21:53:06    21:17:36
53 processes:  1 running, 52 sleeping
CPU:  1.0% user,  0.0% nice,  0.6% system,  0.3% interrupt, 98.1% idle
Mem: 150M Active, 2887M Inact, 782M Wired, 430M Buf, 4200M Free
Swap: 8192M Total, 8192M Free

  PID USERNAME     VCSW  IVCSW   READ  WRITE  FAULT  TOTAL PERCENT COMMAND
  699 root      1393772 458763    840 1273223      0 1274063  99.58% python3.8
35311 dhcpd     853298   1329     33   2949     33   3015   0.24% dhcpd
96277 dhcpd     861423   2637      0    588      0    588   0.05% dhcpd
99676 root        3756     82     70    416     51    537   0.04% radiusd
72210 root      1400080  16412      1    229      0    230   0.02% syslog-ng
 1549 root        1741     16      0    227      0    227   0.02% dhcpleases6
 5872 _flowd    200493   1613      4    159      0    163   0.01% flowd
90495 _dhcp      54483    115      0     14      0     14   0.00% dhclient
49557 root        8415    649      0      8      0      8   0.00% radvd
73189 root      825674   3873      0      0      0      0   0.00% python3.8
  422 root           4      3      0      0      0      0   0.00% python3.8
96103 root           6      0      0      0      0      0   0.00% rtsold
  424 root      171305   3810    215      0      6    221   0.02% python3.8

# ps -a -p 699
  PID TT  STAT      TIME COMMAND
  699  -  Ss   143:34.61 /usr/local/bin/python3 /usr/local/opnsense/scripts/netflow/flowd_aggregate.py (python3.8)


Is this something to be expected? Or something wrong with my config, or even in flowd_aggregate.py itself? Could something be done to reduce the writing, apart from disabling NetFlow completely?

Admittedly this is not a high-quality SSD (KINGSTON SA400S37240G, not my choice), which might be a contributing factor to the quickly diminishing life left number.
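To put the life left number into perspective, here is a rough back-of-the-envelope sketch. It rests on assumptions that may not hold: that attribute 231 is a plain percentage of remaining life, that wear continues linearly at the rate seen so far, and that power-on hours are a fair stand-in for time in service. The function name is made up for illustration.

```python
# Values from today's SMART readout (attributes 9 and 231).
power_on_hours = 8272
life_left_pct = 88


def projected_remaining_hours(hours_used, life_left):
    """Linear extrapolation: hours until life left reaches 0 (assumption: constant wear rate)."""
    consumed = 100 - life_left
    if consumed <= 0:
        raise ValueError("no wear recorded yet, cannot extrapolate")
    hours_per_percent = hours_used / consumed
    return hours_per_percent * life_left


remaining = projected_remaining_hours(power_on_hours, life_left_pct)
print(f"roughly {remaining / 24 / 365:.1f} more years at the current rate")
```

Under those assumptions the drive would last a bit under seven more years, so this is an estimate of the trend, not a prediction of when the drive fails.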

Any thoughts?
#4
I looked around the section and noted that there have been some issues with Freeradius after upgrading to 21.1.7, but I did not see anything like what I stumbled upon, so I'm not sure if this is connected.

I've had Freeradius set up in OPNsense to act as an authentication service for a wireless AP. It's using the fairly usual self-signed "root" CA -> intermediate CA -> client/server certificate chain, generated outside OPNsense and imported.

This setup has been working without a hitch for quite a while, but now, after upgrading to 21.1.7 (from 21.1.6), my wireless clients are no longer authenticated and are unable to connect. Nothing has been changed in the configuration on the clients, the AP, or even on the OPNsense side, yet for some reason Freeradius is now apparently unable to find the issuer certificate:
Sun Jun 27 08:34:47 2021 : ERROR: (64) eap_tls:   ERROR: SSL says error 2 : unable to get issuer certificate
Sun Jun 27 08:34:47 2021 : ERROR: (64) eap_tls: ERROR: (TLS) Alert write:fatal:unknown CA
Sun Jun 27 08:34:47 2021 : ERROR: (64) eap_tls: ERROR: (TLS) Server : Error in error


Freeradius is configured to use a server certificate signed by the intermediate CA. This server certificate can be seen in System -> Trust -> Certificates and is recognized as issued by the intermediate CA.

The intermediate CA in turn is listed in System -> Trust -> Authorities, shown as issued by the root CA, which is also present there and shown as self-issued, as it should be.

After reverting the freeradius3 package using opnsense-revert -r 21.1.6 freeradius3, authentication works again.