High availability / Realtek failures
« on: August 20, 2023, 08:15:00 pm »
I have a few OPNsense HA installs on very cheap hardware to test HA in my homelab (some Beelinks and other small, cheap computers with at least two network ports). At some point all of them, some more often than others, lose networking: they go null route or something similarly peculiar. If I log in to the machine directly with a keyboard and monitor, the system itself seems to be fine, but the networking is dead. The one thing they all have in common is the Realtek hardware. I have not seen this on more expensive hardware, though I would be interested to hear whether anyone has seen this sort of issue on other hardware. To mitigate this, I wrote a small script that pings three different IPs and reboots the machine if all three fail three times in a row.
Code:
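#!/usr/bin/env bash
# Reboot if none of the target IPs answers a ping in three consecutive rounds.
ips='ip1 ip2 ip3'   # replace with three real addresses

check_ip () {
    # FreeBSD ping: -t 5 gives up after 5 seconds, -c 1 sends a single packet
    if ping -t 5 -c 1 "$1"; then
        echo good
        # any successful ping means networking is still working
        exit 0
    fi
}

count=0
while [[ $count -lt 3 ]]; do
    for i in $ips; do
        check_ip "$i"
    done
    # reaching this point means every address failed in this round
    ((++count))
done

# only reboot if all pings fail
reboot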
Ugly, I know. I'd be happy to hear about more elegant solutions. For now I have paired this with a configd action:
Code:
# cat /usr/local/opnsense/service/conf/actions.d/actions_testnet.conf
[check]
command:/root/testnet.sh
parameters:
type:script
message:testing network
description:network test
I have that running as a cron job every 5 minutes.
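For anyone who wants to try the "all three fail three times in a row" rule without actually rebooting a box, here is a minimal dry-run sketch. It is not the script above, just the same idea under some assumptions: `network_is_down` and `PING_CMD` are names made up for this example, and the probe command is swappable so that `true` and `false` can stand in for a working and a dead network.

```shell
#!/bin/sh
# Dry-run sketch of the watchdog logic. network_is_down and PING_CMD are
# hypothetical names used only in this example.

# network_is_down ROUNDS IP...
# Returns 0 (true) only if every IP fails in each of ROUNDS consecutive rounds.
# Returns 1 as soon as any single probe succeeds.
network_is_down () {
    rounds=$1
    shift
    # Default probe is FreeBSD ping (-t 5: 5 second timeout, -c 1: one packet).
    # Set PING_CMD to another command to simulate results.
    probe=${PING_CMD:-ping -t 5 -c 1}
    round=0
    while [ "$round" -lt "$rounds" ]; do
        for ip in "$@"; do
            # $probe is left unquoted on purpose so its options split into words
            if $probe "$ip" > /dev/null 2>&1; then
                return 1
            fi
        done
        round=$((round + 1))
    done
    return 0
}

# Simulate both outcomes with stand-in commands instead of real pings.
PING_CMD=false
network_is_down 3 ip1 ip2 ip3 && echo "would reboot"
PING_CMD=true
network_is_down 3 ip1 ip2 ip3 || echo "network ok, no reboot"
```

Once the logic behaves as expected, the `echo "would reboot"` would be swapped for the real `reboot`, with `PING_CMD` left unset so the actual ping is used.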