SQLITE3 constantly writing to disk after upgrade to 26.1

Started by buedi, February 07, 2026, 08:51:29 PM

I posted here (https://forum.opnsense.org/index.php?topic=50771.0) earlier while still on 25.7, but I figured I could upgrade to 26.1 and maybe things would change.
And they did... it got worse.

I upgraded from 25.7 to 26.1 and now sqlite3 is constantly writing to the SSD, and I cannot figure out why. It does not look like more space is being consumed (I did not notice a change in the last 3 hours), but the constant writing is worrying.

A top -S -m io -o total shows:
  PID USERNAME     VCSW  IVCSW   READ  WRITE  FAULT  TOTAL PERCENT COMMAND
83231 root         481      0      2   2396      0   2398  97.44% sqlite3
35108 hostd         22      0      0     63      0     63   2.56% hostwatch
73985 root           0      0      0      0      0      0   0.00% php-cgi
70657 root           0      0      0      0      0      0   0.00% openvpn
59009 root           0      0      0      0      0      0   0.00% cron

So it is clearly sqlite3 writing all the time. But I have no idea which OPNsense function / service this could be.

I have read that Neighbor Discovery can cause load, so I disabled that. No change.
Then I disabled the Wazuh Agent I installed recently. No change.
I also disabled Netflow. No change.
The Captive Portal seems to use sqlite3 too, but it is turned off for me.
I also disabled Suricata to see if it changes anything... no change.

This has been happening since I upgraded to 26.1, which was about 5 or 6 hours ago. Even if this were some internal database migration, I suspect it should have finished long ago.

How can I track down what sqlite3 is doing here?

Any help is appreciated very much by me and my SSD ;-)

Disable hostwatch. Interfaces: Neighbours: Auto discovery.
Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)

Thank you for trying to help. But as I wrote above, I disabled that one already (I had seen the thread about it) and even rebooted after disabling, just to be sure. Unfortunately, it changes nothing. sqlite3 is still hammering the disk after all those hours. My disk usage does not even change (at least not from what I can see), and I wonder if some process is writing to the same table entry 24/7 here.

I just cannot figure out how to look into sqlite3 to see which PID the SQL queries are coming from :-(

New day, fresh ideas... I found procstat and played around with it a bit.

root@OPNsense:/dev/fd # procstat -f 88990
  PID COMM                FD T V FLAGS    REF  OFFSET PRO NAME
88990 sqlite3           text v r r-------   -       - -   /usr/local/bin/sqlite3
88990 sqlite3            cwd v d r-------   -       - -   /
88990 sqlite3           root v d r-------   -       - -   /
88990 sqlite3              0 v r r-------   1 8519680 -   /var/netflow/src_addr_086400.sqlite.clean.sql
88990 sqlite3              1 v c rw------   6       0 -   /dev/null
88990 sqlite3              2 v c rw------   6       0 -   /dev/null
88990 sqlite3              3 v r rw------   1       0 -   /var/netflow/src_addr_086400.sqlite.fix
88990 sqlite3              4 v r rw------   2       0 -   /var/netflow/src_addr_086400.sqlite.fix-journal

So it seems to have something to do with Netflow, maybe? I am not sure about the file names yet... .clean and .fix sound like something that would happen after an upgrade or a crash. I will keep investigating.
It did not stop when I removed all interfaces from Netflow (apart from WAN, which seems to be the minimum you need to select). But I will try to stop Netflow entirely to see if the writing goes away. That will not be a fix, but at least I would know which process is causing the I/O.
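
One thing that might help to see which service actually spawned the sqlite3 process is looking up its parent PID (the placeholder below needs the real PPID, this is just a sketch):
ps -axo pid,ppid,command | grep sqlite3    # note the PPID column
ps -p <ppid_from_above> -o pid,command     # shows the parent process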

So after configuring Netflow like this:
- Listening interfaces: Nothing selected
- WAN interfaces: Nothing selected

A new sqlite3 process was spawned, but the writing continues... still on the Netflow database files?

top -S -m io -o total
  PID USERNAME     VCSW  IVCSW   READ  WRITE  FAULT  TOTAL PERCENT COMMAND
 4640 root         484      1      0   2525      0   2525 100.00% sqlite3
47616 root           2      0      0      0      0      0   0.00% lighttpd
    1 root           0      0      0      0      0      0   0.00% init

and
root@OPNsense:/dev/fd # procstat -f 4640
  PID COMM                FD T V FLAGS    REF  OFFSET PRO NAME
 4640 sqlite3           text v r r-------   -       - -   /usr/local/bin/sqlite3
 4640 sqlite3            cwd v d r-------   -       - -   /
 4640 sqlite3           root v d r-------   -       - -   /
 4640 sqlite3              0 v r r-------   1 1572864 -   /var/netflow/src_addr_086400.sqlite.clean.sql
 4640 sqlite3              1 v c rw------   6       0 -   /dev/null
 4640 sqlite3              2 v c rw------   6       0 -   /dev/null
 4640 sqlite3              3 v r rw------   2       0 -   /var/netflow/src_addr_086400.sqlite.fix
 4640 sqlite3              4 v r rw------   1       0 -   /var/netflow/src_addr_086400.sqlite.fix-journal

So I would assume it is not Netflow itself causing the I/O, but something that works with the data in these database files / tables?
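
To get a rough idea what these files contain (file names taken from the procstat output above), peeking at the start of the .clean.sql dump should show the SQL statements sqlite3 is replaying, and the .fix files can be watched for growth:
head -n 20 /var/netflow/src_addr_086400.sqlite.clean.sql
ls -lh /var/netflow/*.fix*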

More testing... I tried Reporting --> Settings --> Repair Netflow Data.
After that, my /var/netflow folder grew from around 500MB to 650MB, and now I have two sqlite3 processes writing to the disk. Space usage has stayed about the same for over 30 minutes now.
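
A simple loop like this (plain sh, nothing OPNsense-specific) is enough to watch whether the folder keeps growing:
while true; do du -sh /var/netflow; sleep 60; done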

Next try: Reporting --> Settings --> Reset Netflow Data.
This made the folder shrink from 650MB to around 300MB, but both sqlite3 processes are still running and writing to the database files:

root@OPNsense:/var/netflow # procstat -f 7155
  PID COMM                FD T V FLAGS    REF  OFFSET PRO NAME
 7155 sqlite3           text v r r-------   -       - -   /usr/local/bin/sqlite3
 7155 sqlite3            cwd v d r-------   -       - -   /usr/local/opnsense/service
 7155 sqlite3           root v d r-------   -       - -   /
 7155 sqlite3              0 v r r-------   1 13631488 -   /var/netflow/src_addr_details_086400.sqlite.clean.sql
 7155 sqlite3              1 v c rw------  15       0 -   /dev/null
 7155 sqlite3              2 v c rw------  15       0 -   /dev/null
 7155 sqlite3              3 v r rw------   2       0 -   /var/netflow/src_addr_details_086400.sqlite.fix
 7155 sqlite3              4 v r rw------   1       0 -   /var/netflow/src_addr_details_086400.sqlite.fix-journal
root@OPNsense:/var/netflow # procstat -f 79089
  PID COMM                FD T V FLAGS    REF  OFFSET PRO NAME
79089 sqlite3           text v r r-------   -       - -   /usr/local/bin/sqlite3
79089 sqlite3            cwd v d r-------   -       - -   /
79089 sqlite3           root v d r-------   -       - -   /
79089 sqlite3              0 v r r-------   1 12451840 -   /var/netflow/src_addr_086400.sqlite.clean.sql
79089 sqlite3              1 v c rw------   2       0 -   /dev/null
79089 sqlite3              2 v c rw------   2       0 -   /dev/null
79089 sqlite3              3 v r rw------   2       0 -   /var/netflow/src_addr_086400.sqlite.fix
79089 sqlite3              4 v r rw------   1       0 -   /var/netflow/src_addr_086400.sqlite.fix-journal

The folder before I reset the Netflow data:
root@OPNsense:/var/netflow # ls -lsa
total 634961
     9 drwxr-x---   2 root wheel        25 Feb  8 09:08 .
     9 drwxr-xr-x  31 root wheel        31 Feb  3 14:04 ..
    33 -rw-r-----   1 root wheel    102400 Feb  6 23:59 dst_port_000300.sqlite
    13 -rw-r-----   1 root wheel     29240 Feb  7 14:33 dst_port_000300.sqlite-journal
    29 -rw-r-----   1 root wheel    376832 Feb  6 23:59 dst_port_003600.sqlite
     9 -rw-r-----   1 root wheel     21032 Feb  7 14:33 dst_port_003600.sqlite-journal
 24705 -rw-r-----   1 root wheel  73572352 Feb  6 23:59 dst_port_086400.sqlite
    13 -rw-r-----   1 root wheel     21032 Feb  7 14:33 dst_port_086400.sqlite-journal
   105 -rw-r-----   1 root wheel   2527232 Feb  6 23:59 interface_000030.sqlite
   905 -rw-r-----   1 root wheel   2355200 Feb  6 23:59 interface_000300.sqlite
   453 -rw-r-----   1 root wheel    983040 Feb  6 23:59 interface_003600.sqlite
   269 -rw-r-----   1 root wheel    585728 Feb  8 09:05 interface_086400.sqlite
     5 -rw-r-----   1 root wheel     12288 Feb  6 23:59 metadata.sqlite
   169 -rw-r-----   1 root wheel    495616 Feb  6 23:59 src_addr_000300.sqlite
   109 -rw-r-----   1 root wheel    286720 Feb  8 09:03 src_addr_003600.sqlite
174105 -rw-r-----   1 root wheel 486756352 Feb  7 16:00 src_addr_086400.sqlite
 92393 -rw-r-----   1 root wheel 431340383 Feb  8 09:01 src_addr_086400.sqlite.clean.sql
   817 -rw-r-----   1 root wheel   2871296 Feb  8 09:08 src_addr_086400.sqlite.fix
     1 -rw-r-----   1 root wheel     12824 Feb  8 09:08 src_addr_086400.sqlite.fix-journal
 92401 -rw-r-----   1 root wheel 431340429 Feb  8 09:01 src_addr_086400.sqlite.sql
124177 -rw-r-----   1 root wheel 433369088 Feb  8 09:01 src_addr_details_086400.sqlite
 62001 -rw-r-----   1 root wheel 353618428 Feb  8 09:05 src_addr_details_086400.sqlite.clean.sql
   221 -rw-r-----   1 root wheel   1363968 Feb  8 09:08 src_addr_details_086400.sqlite.fix
     1 -rw-r-----   1 root wheel     12824 Feb  8 09:08 src_addr_details_086400.sqlite.fix-journal
 62021 -rw-r-----   1 root wheel 353618447 Feb  8 09:05 src_addr_details_086400.sqlite.sql

And the folder after I reset the Netflow Data:
root@OPNsense:/var/netflow # ls -lsa
total 319059
    9 drwxr-x---   2 root wheel        13 Feb  8 09:37 .
    9 drwxr-xr-x  31 root wheel        31 Feb  3 14:04 ..
   13 -rw-r-----   1 root wheel     29240 Feb  7 14:33 dst_port_000300.sqlite-journal
    9 -rw-r-----   1 root wheel     21032 Feb  7 14:33 dst_port_003600.sqlite-journal
   13 -rw-r-----   1 root wheel     21032 Feb  7 14:33 dst_port_086400.sqlite-journal
92393 -rw-r-----   1 root wheel 431340383 Feb  8 09:01 src_addr_086400.sqlite.clean.sql
 5029 -rw-r-----   1 root wheel  15032320 Feb  8 09:37 src_addr_086400.sqlite.fix
ls: ./src_addr_086400.sqlite.fix-journal: No such file or directory
    1 -rw-r-----   1 root wheel     12824 Feb  8 09:37 src_addr_086400.sqlite.fix-journal
92401 -rw-r-----   1 root wheel 431340429 Feb  8 09:01 src_addr_086400.sqlite.sql
62001 -rw-r-----   1 root wheel 353618428 Feb  8 09:05 src_addr_details_086400.sqlite.clean.sql
 5165 -rw-r-----   1 root wheel  17735680 Feb  8 09:37 src_addr_details_086400.sqlite.fix
ls: ./src_addr_details_086400.sqlite.fix-journal: No such file or directory
    1 -rw-r-----   1 root wheel     12824 Feb  8 09:37 src_addr_details_086400.sqlite.fix-journal
62021 -rw-r-----   1 root wheel 353618447 Feb  8 09:05 src_addr_details_086400.sqlite.sql
The "No such file or directory" only comes up sometimes when doin an ls -lsa, and I suppose that those exist only for a fraction of a second. Maybe these are the files that sqlite3 writes to constantly? Like create the file, write, delete, repeat...?

So after rebooting, no more writing to the disk!
I think the reset of the Netflow database helped. Although there's still around 300MB of data left in the /var/netflow directory, the reboot killed both sqlite3 processes.

I see close to 0 write access on the disk now for several minutes, so the issue seems to be fixed.
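
To double-check beyond top that the writes really stopped, the standard FreeBSD disk tools give a per-device view:
gstat -p          # live per-provider disk load
iostat -x -w 5    # extended per-device stats every 5 seconds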

Questions that are open:
1. I do not see any sqlite3 processes anymore at all. Is this right, or is something broken after I reset the Netflow database?
2. I still wonder what triggered the constant writing to the database. Losing my Netflow history is not a big deal for me, but knowing the real root cause and being able to fix it without losing the history would be nice to have.

Edit: A nice side effect is that my CPU temps are back to sub-50°C instead of around 57-60°C.

Since I think it is fixed now, I want to sum it up for everyone else who might encounter the same problem and is looking for a fix.

Issue
As the subject of this thread says, after upgrading to 26.1 I noticed constant disk activity. My OPNsense box has an LED indicator for disk access, otherwise I might not have noticed, because it did not impact the performance of OPNsense at all and there was close to no CPU utilization. If you want to check for yourself, you can do so with
top -S -m io -o total

In my case, sqlite3 was constantly sitting at the top of the list with 100% of the write transactions.


Investigation
Knowing it was sqlite3 causing the disk I/O (even worse: write transactions on an SSD), I narrowed down where the writes might be coming from by using
procstat -f <pid_of_top_process_of_previous_command>
This showed me that sqlite3 was writing to the files in /var/netflow.
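
If more than one sqlite3 process shows up, looping over all of them saves some typing (pgrep and procstat are standard FreeBSD tools):
for pid in $(pgrep -x sqlite3); do echo "== $pid =="; procstat -f "$pid"; done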

Not a solution
Trying to disable Netflow did not help. I removed all interfaces from Reporting --> Netflow --> Listening interfaces and WAN interfaces. Even after a reboot, sqlite3 was still writing to the files.

Using Reporting --> Settings --> Repair Netflow data made it worse, spawning a second sqlite3 process that just shared the maximum I/O the SSD was capable of. So no solution either.

Solution
Using the Reporting --> Settings --> Reset Netflow data function did reduce the size of the /var/netflow folder from over 600MB to around 300MB, but the writing still occurred because the two sqlite3 processes were still running.
A reboot of OPNsense finally killed the two sqlite3 processes, and they did not come back.
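
Presumably, killing the stuck processes and removing the leftover repair files by hand would have had the same effect as the reboot, but I have not tested this, so treat it as a guess and double-check which files you delete:
pkill -x sqlite3
rm -f /var/netflow/*.clean.sql /var/netflow/*.sql /var/netflow/*.fix /var/netflow/*.fix-journal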

Disk I/O now happens so rarely that I can barely notice it on the disk activity LED. System temps went down by about 7-10°C.
After resetting the Netflow database, it takes a while for Reporting --> Insight to show data again; let it sit for about 10 minutes and you should see new data.