Hostwatch - high disk writes

Started by GreenMatter, January 16, 2026, 08:51:04 PM

After upgrading to 25.7.11, hostwatch (v1.0.2) causes high disk writes (60M) and increased CPU utilisation.
Is there a fix for it?
OPNsense on:
Intel(R) Xeon(R) E-2278G CPU @ 3.40GHz (4 cores)
8 GB RAM
50 GB HDD
and plenty of vlans ;-)


Quote from: franco on January 16, 2026, 09:02:37 PM
https://github.com/opnsense/changelog/blob/efe03ef435b5abfff641262fd69e02efd926be5a/community/25.7/25.7.11#L10-L12

Interfaces: Neighbors: Automatic Discovery.


Cheers,
Franco
Thanks, I've seen it. But it's still causing really high disk writes. For the time being I've stopped the service...
OPNsense on:
Intel(R) Xeon(R) E-2278G CPU @ 3.40GHz (4 cores)
8 GB RAM
50 GB HDD
and plenty of vlans ;-)

Well, it's either enabled or not. There may be a bug that doesn't stop it but I haven't seen it. Worst case a reboot would take care of it (when properly disabled).


Cheers,
Franco
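
For anyone who wants to verify from the shell that the service actually stopped, a minimal sketch (the rc script name hostwatch is an assumption on my part; check /usr/local/etc/rc.d/ for the real name):

# list enabled rc services and look for hostwatch
service -e | grep -i hostwatch
# check whether the daemon process is still around
ps ax | grep '[h]ostwatch'
# stop it by hand if it is still running
service hostwatch onestop

If the process survives the stop, that would point at the "bug that doesn't stop it" mentioned above, and a reboot is the fallback.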

Quote from: franco on January 16, 2026, 09:26:15 PMWell, it's either enabled or not. There may be a bug that doesn't stop it but I haven't seen it. Worst case a reboot would take care of it (when properly disabled).


Cheers,
Franco
Is hostwatch supposed to create disk writes like this?
OPNsense on:
Intel(R) Xeon(R) E-2278G CPU @ 3.40GHz (4 cores)
8 GB RAM
50 GB HDD
and plenty of vlans ;-)

January 16, 2026, 09:57:32 PM #5 Last Edit: January 16, 2026, 10:00:14 PM by OPNenthu
Mine is using only ~56k so far since the upgrade, but my home network is small.

root@firewall:/var/log/hostwatch # ls -l
total 56
-rw-------  1 root wheel 56388 Jan 16 14:35 hostwatch_20260116.log
lrwxr-x---  1 root wheel    41 Jan 16 15:01 latest.log -> /var/log/hostwatch/hostwatch_20260116.log


Is your hostwatch log being flooded with error messages, or is most of your log filled with host discoveries?
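
A quick way to tell the two apart, using the paths from the listing above (the ERROR string matches the log format quoted later in this thread):

# error lines vs. total lines in today's log
grep -c ERROR /var/log/hostwatch/latest.log
wc -l /var/log/hostwatch/latest.log
# eyeball the newest entries
tail -n 20 /var/log/hostwatch/latest.log

If the counts are close, it's log flooding rather than genuine discoveries.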

Looking at iostat in the console, I am seeing high disk writes, too.

root@OPNsense:~ # iostat -x
                        extended device statistics
device       r/s     w/s     kr/s     kw/s  ms/r  ms/w  ms/o  ms/t qlen  %b
ada0           1     107     42.0   2570.7     1     1     0     1    0   8

You can see the instantaneous rate by issuing iostat -x 2.  When I disable Automatic Discovery, the writes drop back to near zero.

This is on a small home network.  I have disabled the feature for now.
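
If you want to pin the writes on a specific process rather than the whole disk, FreeBSD's top has an I/O mode; something along these lines:

# refresh every 2 seconds, sort processes by total I/O
top -m io -o total -s 2

While the problem is active, hostwatch (or syslog-ng flushing its output) should sit at the top of the write column.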

It's supposed to log hardware address movements, but if it sees them constantly, that much logging is probably undesirable. The issue is clear and we'll find a solution for it soon.


Cheers,
Franco

Out of curiosity, what counts as a "movement", and what sort of errors would it log? I'm just trying to get a handle on the high writes. I don't see any notable change in write frequency on my own system, and mine is the wacky four-bridge setup where "Interfaces: Neighbors: Automatic Discovery", at its defaults, picks up every MAC on multiple interfaces. (I fired it up with default settings just to see if I could trigger the issue, since I don't normally use it; the actual ARP mappings rarely move here, as I seldom re-plug machines and I keep static ARP entries to tame my ISP's unlimited proxy.)

Today at 12:23:05 AM #9 Last Edit: Today at 12:58:25 AM by OPNenthu
@franco I know you said the issue is clear, but I just hit it as well.  Adding a data point.

I don't know the trigger, but I can tell you what I did before it happened.  My site-to-site WireGuard tunnel wasn't working because an Unbound blocklist was blocking my DDNS provider, so I added the subdomains I needed to the whitelist and got the tunnel working again.  Shortly after, system memory spiked, and top is showing the issue.

last pid: 27749;  load averages:  2.25,  2.23,  1.85                                                              up 0+16:09:17  18:10:59
92 processes:  1 running, 91 sleeping
CPU: 39.2% user,  0.0% nice, 11.7% system,  0.0% interrupt, 49.1% idle
Mem: 4752M Active, 614M Inact, 26M Laundry, 1946M Wired, 401M Free
ARC: 1148M Total, 196M MFU, 754M MRU, 12M Anon, 20M Header, 163M Other
     835M Compressed, 2203M Uncompressed, 2.64:1 Ratio
Swap: 8192M Total, 8192M Free

  PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
96386 hostd        20  20    0    94M    11M uwait    0  24:44  99.90% hostwatch
17017 root          5  21    0    65M    32M kqread   2  23:58  99.71% syslog-ng
92246 root          1  20    0    15M  3932K CPU1     1   0:00   0.14% top
58889 root          1  24    0    59M    33M nanslp   2   0:00   0.10% php
48741 nobody        1  20    0    17M  5504K select   0   0:02   0.06% dnsmasq
  520 root          4  68    0   121M    46M accept   3   0:47   0.05% python3.11
97534 root          1  20    0    23M    10M kqread   1   0:08   0.04% lighttpd

In addition, the symlink 'latest.log' disappeared, but the hostwatch service is not dead.


root@firewall:/var/log/hostwatch # ls -l
total 4032364
-rw-------  1 root wheel 4129140649 Jan 16 18:09 hostwatch_20260116.log


Note: I have two WireGuard VPN gateways that were operational the whole time and didn't trigger this.
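
For reference, a minimal cleanup sketch for the runaway log, assuming the rc script really is named hostwatch (verify in /usr/local/etc/rc.d/ before running):

# stop the daemon so nothing keeps appending
service hostwatch onestop
# truncate instead of rm, in case another process still holds the file open
truncate -s 0 /var/log/hostwatch/hostwatch_20260116.log
# start it again and check that the latest.log symlink reappears
service hostwatch onestart
ls -l /var/log/hostwatch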

---

Resolution attempts:

1. Disable the WG instance & peer for the site-to-site tunnel
- No effect

2. Restart Unbound service
- No effect

3. Restart Host discovery service
- Restored the CPU to idle, but did not free used memory.  No change in iostat.

last pid: 93333;  load averages:  1.43,  2.06,  1.94                                                              up 0+16:17:01  18:18:43
98 processes:  1 running, 97 sleeping
CPU:  0.0% user,  0.0% nice,  0.8% system,  0.0% interrupt, 99.2% idle
Mem: 4851M Active, 552M Inact, 29M Laundry, 1911M Wired, 396M Free
ARC: 1108M Total, 198M MFU, 725M MRU, 760K Anon, 18M Header, 164M Other
     791M Compressed, 2139M Uncompressed, 2.70:1 Ratio
Swap: 8192M Total, 8192M Free

  PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
57652 root          1  20    0    15M  3980K CPU0     0   0:00   0.16% top
85230 root          1  21    0    59M    33M nanslp   0   0:00   0.10% php
16847 root          1  20    0    24M  8112K select   3   0:07   0.07% ntpd
60180 root          1  20    0    14M  2976K bpf      1   0:00   0.06% filterlog
17017 root          3  20    0    67M    36M kqread   1  30:45   0.06% syslog-ng
  520 root          4  68    0   129M    45M accept   2   0:50   0.06% python3.11
97534 root          1  20    0    23M  9948K kqread   0   0:08   0.05% lighttpd
48741 nobody        1  20    0    17M  5624K select   2   0:03   0.03% dnsmasq

root@firewall:~ # iostat -x
                        extended device statistics 
device       r/s     w/s     kr/s     kw/s  ms/r  ms/w  ms/o  ms/t qlen  %b 
mmcsd0         0       0      0.0      0.0     0     0     0     0    0   0
mmcsd0bo       0       0      0.0      0.0     0     0     0     0    0   0
mmcsd0bo       0       0      0.0      0.0     0     0     0     0    0   0
nda0           2       9     20.5    238.7     0     0     2     1    0   1
pass0          0       0      0.0      0.0     0     0     0     0    0   0

UPDATE:  the memory usage reported by the Dashboard chart has come back down to baseline.  It was over 72% earlier and is now back to a normal (for me) ~21%.

Nevermind :)  It's the same as reported below:

https://forum.opnsense.org/index.php?topic=50393.0
https://github.com/opnsense/hostwatch/issues/8

logspam:

<11>1 2026-01-16T17:48:18-05:00 firewall.h1.home.arpa hostwatch 96386 - [meta sequenceId="2850265"]   2026-01-16T22:48:18.250468Z ERROR hostwatch: Error reading from capture: libpcap error: The interface disappeared
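
To gauge how fast the flood grows before deciding you're hitting the same bug, a rough check (log path from the listing above):

# count the capture errors, wait a minute, count again
grep -c 'Error reading from capture' /var/log/hostwatch/hostwatch_20260116.log
sleep 60
grep -c 'Error reading from capture' /var/log/hostwatch/hostwatch_20260116.log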

What wasn't mentioned there is the high memory allocation in addition to the CPU spike, and the loss of the logfile symlink even after restarting the service.