"SOLVED" IPS Crashes During Upload Portion of Speedtest

Started by phantomsfbw, January 15, 2022, 09:22:00 PM

January 15, 2022, 09:22:00 PM Last Edit: January 30, 2022, 03:19:11 PM by phantomsfbw
--See the last phantomsfbw post in this thread for the solution-- I just moved over from the last stable version of OPNsense to this RC. The RC takes the network down during the upload portion of a Speedtest run. I can still access the wired LAN, but Internet access is lost, and I must reboot OPNsense to get WAN service back. I have narrowed it down to the IPS being turned on; I found this by doing a complete reinstall and enabling one capability after another. I am running Unbound with DNS over TLS. The WAN IP is assigned via DHCP. I am using Cloudflare DNS 1.1.1.2 and 1.0.0.2.

1G symmetrical service through Verizon FIOS. The WAN NIC is an Intel 1G on the motherboard. No issues in the past or under the last stable OPNsense version.

Intel(R) Core(TM) i5-10400 CPU @ 2.90GHz (12 cores) and 16G Ram

January 16, 2022, 01:17:37 AM #1 Last Edit: January 17, 2022, 09:54:17 AM by MenschAergereDichNicht
I have a similar problem. When I run the test at https://www.breitbandmessung.de/test, my WAN connection degrades badly.

If I execute top in a router shell, I see, for example, the following:


83.05% /usr/local/opnsense/scripts/filter/update_tables.py
47.39% /sbin/sysctl -WaN
40.26% /usr/local/bin/php /usr/local/etc/rc.filter_configure


If I wait long enough, it seems to settle down and the WAN works again.

I currently have a very basic configuration on an APU4D4: only some rules and DNS over TLS.
WAN is connected using a static IP for IPv4 and DHCP for IPv6.
I have monitoring of the WAN connection enabled in the gateways for IPv4 and IPv6. Additionally, I selected those gateways in the System DNS settings.
I use Unbound as a DNS server, and some block lists are enabled.

Update:
Maybe it was just a coincidence that it happened while running the speedtest. It also seems to happen on its own. After some time of normal behaviour, I suddenly see very high CPU usage, dominated by update_tables.py and rc.filter_configure, and the WAN connection starts dropping packets.

Unbound also needs a lot of resources sometimes:

96.36% /usr/local/sbin/unbound -c /var/unbound/unbound.conf{unbound}


There are lots of entries like the following in the system log:

2022-01-16T16:19:34 Error opnsense /usr/local/etc/rc.newwanipv6: Removing static route for monitor 1.0.0.1 via 192.168.0.1
2022-01-16T16:19:34 Error opnsense /usr/local/etc/rc.newwanipv6: Adding static route for monitor 1.1.1.1 via 192.168.69.1
2022-01-16T16:19:34 Error opnsense /usr/local/etc/rc.newwanipv6: Removing static route for monitor 1.1.1.1 via 192.168.69.1
2022-01-16T16:19:34 Error opnsense /usr/local/etc/rc.newwanipv6: Adding static route for monitor 2606:4700:4700::1111 via fe80::eadf:70ff:fe7a:23da%igb3
2022-01-16T16:19:34 Error opnsense /usr/local/etc/rc.newwanipv6: Removing static route for monitor 2606:4700:4700::1111 via fe80::eadf:70ff:fe7a:23da%igb3
2022-01-16T16:19:34 Error opnsense /usr/local/etc/rc.newwanipv6: ROUTING: keeping current default gateway 'fe80::eadf:70ff:fe7a:23da%igb3'
2022-01-16T16:19:34 Error opnsense /usr/local/etc/rc.newwanipv6: ROUTING: setting IPv6 default route to fe80::eadf:70ff:fe7a:23da
2022-01-16T16:19:34 Error opnsense /usr/local/etc/rc.newwanipv6: ROUTING: IPv6 default gateway set to wan
2022-01-16T16:19:34 Error opnsense /usr/local/etc/rc.newwanipv6: ROUTING: keeping current default gateway '192.168.69.1'
2022-01-16T16:19:34 Error opnsense /usr/local/etc/rc.newwanipv6: ROUTING: setting IPv4 default route to 192.168.69.1
2022-01-16T16:19:34 Error opnsense /usr/local/etc/rc.newwanipv6: ROUTING: IPv4 default gateway set to wan
2022-01-16T16:19:34 Error opnsense /usr/local/etc/rc.newwanipv6: ROUTING: entering configure using 'wan'
2022-01-16T16:19:34 Error opnsense /usr/local/etc/rc.newwanipv6: The command '/sbin/route add -host -'inet6' '2606:4700:4700::1111' 'fe80::eadf:70ff:fe7a:23da%'' returned exit code '71', the output was 'route: fe80::eadf:70ff:fe7a:23da%: Name does not resolve'
2022-01-16T16:19:33 Error opnsense /usr/local/etc/rc.newwanipv6: On (IP address: <IPv6-Address>) (interface: WAN[wan]) (real interface: igb3).
2022-01-16T16:19:33 Error opnsense /usr/local/etc/rc.newwanipv6: IPv6 renewal is starting on 'igb3'
2022-01-16T16:19:31 Error opnsense /usr/local/etc/rc.linkup: Warning! dhcpd_radvd_configure(auto) found no suitable IPv6 address on igb1_vlan13
2022-01-16T16:19:30 Error opnsense /usr/local/etc/rc.linkup: ROUTING: skipping IPv6 default route
2022-01-16T16:19:30 Error opnsense /usr/local/etc/rc.linkup: ROUTING: IPv6 default gateway set to wan
2022-01-16T16:19:30 Error opnsense /usr/local/etc/rc.linkup: ROUTING: creating /tmp/igb3_defaultgw using '192.168.69.1'
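
For anyone sifting through a log like this, here is a minimal Python sketch (not part of OPNsense; the function and pattern names are made up for illustration). It assumes each line has the form "timestamp severity host script: message" as in the excerpt above. It counts entries per script and second, and it collects link-local IPv6 addresses whose zone index after the '%' is empty, which appears to be what the failed route command in the excerpt was given.

```python
import re
from collections import Counter

# Assumed line layout, taken from the excerpt above:
#   <timestamp> <severity> <host> <script>: <message>
LINE_RE = re.compile(r"^(\S+)\s+(\S+)\s+(\S+)\s+(\S+):\s+(.*)$")

# A link-local IPv6 address followed by '%' with no interface name after it.
EMPTY_ZONE_RE = re.compile(r"(fe80:[0-9a-fA-F:]*)%(?![0-9A-Za-z])")


def summarize(lines):
    """Return (entries per (timestamp, script), addresses with an empty zone)."""
    per_script = Counter()
    empty_zone = set()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip lines that do not follow the assumed layout
        timestamp, _severity, _host, script, message = match.groups()
        per_script[(timestamp, script)] += 1
        empty_zone.update(EMPTY_ZONE_RE.findall(message))
    return per_script, empty_zone
```

Feeding it the system log lines (for example, read from an exported log file) shows at a glance which script floods the log in a given second and whether a gateway address lost its interface suffix.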


The logger also needs a lot of CPU:

99.52% logger: zygote (logger)


Maybe that is because of the error reporting above...

I saw the same thing. I am not sure if the IPS crashed, but the WAN stopped getting an IP address and the dashboard showed WAN IP 0.0.0.0. This was after running a speed test.
I installed 21.7, and after some updates I am at 21.7.7. I will stay here until things get better.

Does issuing the following fix it until it happens again?

# configctl ids restart


Cheers,
Franco

January 17, 2022, 12:29:05 PM #4 Last Edit: January 17, 2022, 12:38:30 PM by MenschAergereDichNicht
Currently Unbound seems to be the culprit:

unbound 103 0 563M 446M CPU3 3 0:43 95.46% /usr/local/sbin/unbound -c /var/unbound/unbound.conf{unbound}
unbound 52 0 563M 446M kqread 1 0:00 82.67% /usr/local/sbin/unbound -c /var/unbound/unbound.conf{unbound}
unbound 52 0 563M 446M kqread 0 0:00 82.57% /usr/local/sbin/unbound -c /var/unbound/unbound.conf{unbound}
unbound 52 0 563M 446M kqread 3 0:00 82.57% /usr/local/sbin/unbound -c /var/unbound/unbound.conf{unbound}
91701 unbound 37 0 71M 59M CPU0 0 0:04 64.78% /usr/local/sbin/unbound-control -c /var/unbound/unbound.conf list_local_data


I am still trying to replicate the "old" problem without unbound consuming most of the CPU resources.

I should probably open a separate thread for that...

No need, there's already a ticket with linked forum posts: https://github.com/opnsense/core/issues/5367. The matter hasn't been settled so far by community members...

January 17, 2022, 12:59:46 PM #6 Last Edit: January 17, 2022, 01:01:43 PM by MenschAergereDichNicht
Sorry. I had already opened a new thread before I read your answer (I thought the "ids" part in the command referred to the IPS that the topic author mentioned).

I did try the command in such a situation. It does not help in my case.

OK, the biggest question then is whether you have Firewall: Settings: Advanced: "Dynamic state reset" checked. If you have it checked, try again with it unchecked...

January 17, 2022, 01:12:42 PM #8 Last Edit: January 17, 2022, 01:51:16 PM by MenschAergereDichNicht
It is unchecked. Seems to be the default.

I did a clean install without importing an old configuration and re-created everything from scratch. The only thing I changed in the firewall settings was setting "Firewall Optimization" to "conservative".

January 17, 2022, 01:29:53 PM #9 Last Edit: January 17, 2022, 02:17:23 PM by MenschAergereDichNicht
Regarding Unbound, I see lots of messages like

Error unbound [31602:0] error: could not SSL_write crypto error:00000000:lib(0):func(0):reason(0)
2022-01-17T12:58:36 Error unbound [70967:0] error: remote control failed ssl crypto error:00000000:lib(0):func(0):reason(0)


inside the unbound log.

Because I use DNS over TLS, maybe it is a problem with the DNS servers I entered (Cloudflare and dns.digitale-gesellschaft.ch)?

I also use a port forward NAT for DNS and NTP to ensure that the local DNS/NTP server is used inside my network.
For DNS I forward ports 53 and 853 to the local DNS server.

To be clear, and so this does not digress into an Unbound discussion: my Unbound with DNS over TLS works without issue. I can set Intrusion Detection to Enabled and things are also fine, but when I switch IPS mode on, Speedtest crashes the WAN and I have to reboot. I also get the WAN IP zeroed, as in 0.0.0.0, as some of the earlier posts mentioned.

22.1 RC2 is a little better in the sense that the system no longer crashes when running a speedtest with IPS enabled. However, during the upload part of the test with IPS enabled, the test hangs in the 300-400MB range, and then the system recovers. The NIC for the WAN is an Intel i219v, if that helps. CPU and RAM are taxed at most 25% when running the test on a symmetrical 1GB FIOS line. As mentioned earlier, there were no issues with this setup on the 21.7 software.

I had the same issue back with the betas, except that the whole system froze until I rebooted it.

https://www.reddit.com/r/OPNsenseFirewall/comments/rgxd0h/multi_wan_and_suricata_freezes_opnsense_when/

The issue also happened with 21.7.6 and was fixed by the Suricata rollback in 21.7.7.

January 30, 2022, 03:16:54 PM #13 Last Edit: January 30, 2022, 03:23:23 PM by phantomsfbw
Well, dummy me. I finally solved the problem by fixing the "Home networks" setting. To see this setting, you must enable advanced mode under the Administration menu; you will see it on the Settings tab as a small toggle. If you are only using Suricata on the WAN interface, you need to delete the default LAN entries from the Home networks setting and then add your WAN IP address to it.

Also keep in mind that if your ISP uses DHCP to provide your WAN IP address, the address could change from time to time, so you will need to adjust the Home networks setting accordingly.
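
Since the WAN address can change, a quick check like the following may help spot when the setting has gone stale. This is only a sketch in Python, not anything shipped with OPNsense: the function name is made up, the addresses in the usage note are placeholder documentation and private ranges, and how you obtain the current WAN IP and the configured Home networks list is left to you.

```python
import ipaddress


def wan_ip_covered(wan_ip, home_networks):
    """Return True if wan_ip falls inside any of the home_networks entries."""
    addr = ipaddress.ip_address(wan_ip)
    for entry in home_networks:
        # strict=False accepts entries written with host bits set,
        # and a bare address is treated as a single-host network.
        net = ipaddress.ip_network(entry, strict=False)
        if addr.version == net.version and addr in net:
            return True
    return False
```

For example, wan_ip_covered("203.0.113.25", ["192.168.0.0/16", "10.0.0.0/8"]) is False, which would suggest the Home networks list needs the WAN address added, while wan_ip_covered("203.0.113.25", ["203.0.113.25/32"]) is True.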

Consider this post solved!