OPNsense Forum

Archive => 20.7 Legacy Series => Topic started by: Archanfel80 on December 20, 2020, 09:50:40 pm

Title: 20.7.7_1 randomly killing tcp active states
Post by: Archanfel80 on December 20, 2020, 09:50:40 pm
Hi!

After the upgrade I am seeing strange behaviour. The firewall randomly kills every TCP connection every 15-20 minutes. The states are killed rather than the connections being reset, so an SSH terminal just freezes instead of reporting a broken pipe.
This is a single-gateway machine, but disabling state killing on gateway failure and enabling sticky connections helps. The gateway is stable and I did not notice any gateway failure, yet OPNsense sometimes declares it down.
This happens even if I disable gateway monitoring! So this feature is pretty much garbage currently: whether you use it or not, it is not reliable. As a result, every firewall with multiple gateways and load balancing is acting really weird now. I had to disable load balancing completely.
Is there a way to roll back the whole system without reinstalling? 20.7.6 was fine.

Thx!
Title: Re: 20.7.7_1 randomly killing tcp active states
Post by: mimugmail on December 20, 2020, 11:09:29 pm
Please post a screenshot of Firewall : Settings : Advanced and the gateway log.
Title: Re: 20.7.7_1 randomly killing tcp active states
Post by: Archanfel80 on December 21, 2020, 11:19:13 am
The gateway log is full of this now:
(The gateway is fine. I run a ping test all the time from my control machine, with not a single lost packet or high-latency ping.)
This did not happen with 20.7.6.
It affects multiple OPNsense installations in my company. The affected systems are a VMware VM with vmxnet and a PC Engines APU box with igb.
I attached the screenshot. Everything was unchecked except IPv6; I have now checked sticky connections and the option to disable state killing. With gateway monitoring disabled and these two options checked, the firewall works fine now. The problematic systems are the ones with multiple gateways and load balancing. These are broken.

2020-11-23T14:25:02   dpinger[8525]   GW_WAN <redacted>: Clear latency 304us stddev 71us loss 5%   
2020-11-23T14:23:53   dpinger[57575]   GATEWAY ALARM: GW_WAN (Addr: <redacted> Alarm: 1 RTT: 271ms RTTd: 53ms Loss: 22%)   
2020-11-23T14:23:53   dpinger[8525]   GW_WAN <redacted>: Alarm latency 271us stddev 53us loss 22%   
2020-11-23T14:22:18   dpinger[29789]   GATEWAY ALARM: GW_WAN (Addr: <redacted> Alarm: 0 RTT: 2538ms RTTd: 16750ms Loss: 5%)   
2020-11-23T14:22:18   dpinger[8525]   GW_WAN <redacted>: Clear latency 2538us stddev 16750us loss 5%   
2020-11-23T14:21:21   dpinger[20904]   GATEWAY ALARM: GW_WAN (Addr: <redacted> Alarm: 1 RTT: 288ms RTTd: 53ms Loss: 22%)   
2020-11-23T14:21:21   dpinger[8525]   GW_WAN <redacted>: Alarm latency 288us stddev 53us loss 22%   
2020-11-05T17:14:27   dpinger[73947]   GATEWAY ALARM: GW_WAN (Addr: <redacted> Alarm: 0 RTT: 343ms RTTd: 274ms Loss: 5%)   
2020-11-05T17:14:27   dpinger[8525]   

In the meantime I reinstalled a plain 20.7 and skipped any further updates. I re-enabled gateway monitoring and reverted all settings to what they were. No issues. I upgraded to 20.7.7_1 and the issues returned. So something is wrong with either dpinger or the Ethernet kernel module. Hope this helps.
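
To quantify how often the gateway flaps, the dpinger status lines from the log excerpt above can be parsed and each Alarm paired with the following Clear. This is only a minimal sketch: the regular expression is inferred from the lines pasted in this thread and is not an official dpinger format specification, and the function names are made up for illustration. The "GATEWAY ALARM" lines and the truncated last line are skipped on purpose.

```python
import re
from datetime import datetime

# Pattern inferred from the log lines pasted above (an assumption, not a spec), e.g.
# "2020-11-23T14:23:53   dpinger[8525]   GW_WAN <redacted>: Alarm latency 271us stddev 53us loss 22%"
STATUS_RE = re.compile(
    r"^(?P<ts>\S+)\s+dpinger\[\d+\]\s+(?P<gw>\S+)\s+\S+:\s+"
    r"(?P<state>Alarm|Clear)\s+latency\s+(?P<latency_us>\d+)us\s+"
    r"stddev\s+(?P<stddev_us>\d+)us\s+loss\s+(?P<loss_pct>\d+)%"
)

# Lines copied from the log excerpt in this thread.
SAMPLE = """\
2020-11-23T14:25:02   dpinger[8525]   GW_WAN <redacted>: Clear latency 304us stddev 71us loss 5%
2020-11-23T14:23:53   dpinger[57575]   GATEWAY ALARM: GW_WAN (Addr: <redacted> Alarm: 1 RTT: 271ms RTTd: 53ms Loss: 22%)
2020-11-23T14:23:53   dpinger[8525]   GW_WAN <redacted>: Alarm latency 271us stddev 53us loss 22%
2020-11-23T14:22:18   dpinger[8525]   GW_WAN <redacted>: Clear latency 2538us stddev 16750us loss 5%
2020-11-23T14:21:21   dpinger[8525]   GW_WAN <redacted>: Alarm latency 288us stddev 53us loss 22%
2020-11-05T17:14:27   dpinger[8525]
""".splitlines()


def parse_status_lines(lines):
    """Return the dpinger Alarm/Clear events found in lines, oldest first."""
    events = []
    for line in lines:
        match = STATUS_RE.match(line.strip())
        if match is None:
            continue  # "GATEWAY ALARM" lines and truncated lines are ignored
        events.append({
            "time": datetime.fromisoformat(match["ts"]),
            "gateway": match["gw"],
            "state": match["state"],
            "latency_us": int(match["latency_us"]),
            "stddev_us": int(match["stddev_us"]),
            "loss_pct": int(match["loss_pct"]),
        })
    events.sort(key=lambda event: event["time"])
    return events


def alarm_durations(events):
    """Pair each Alarm with the next Clear for the same gateway.

    Returns a list of (gateway, alarm_start, seconds_in_alarm) tuples.
    """
    open_alarms = {}
    durations = []
    for event in events:
        gateway = event["gateway"]
        if event["state"] == "Alarm":
            open_alarms.setdefault(gateway, event["time"])
        elif gateway in open_alarms:
            start = open_alarms.pop(gateway)
            seconds = (event["time"] - start).total_seconds()
            durations.append((gateway, start, seconds))
    return durations


if __name__ == "__main__":
    for gateway, start, seconds in alarm_durations(parse_status_lines(SAMPLE)):
        print(f"{gateway}: alarm at {start.isoformat()} lasted {seconds:.0f}s")
```

On the lines above this reports two alarm periods for GW_WAN, lasting 57 and 69 seconds. The latency values are kept in the unit dpinger prints (us); the sketch does not try to reconcile them with the "ms" label on the GATEWAY ALARM lines.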
Title: Re: 20.7.7_1 randomly killing tcp active states
Post by: mimugmail on December 21, 2020, 02:01:02 pm
I had this with a provider's Fritzbox too; in the end I had them replace the unit. Maybe better to ping an outside address?
Title: Re: 20.7.7_1 randomly killing tcp active states
Post by: Archanfel80 on December 21, 2020, 07:54:07 pm
Hi!

No ping latency problem from any computer behind the firewall. I set up a container to monitor the gateway and, for example, 1.1.1.1. Ping values are around 10-30 ms all the time, with no loss or higher values. Only the OPNsense firewall sees high latency values.
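
For anyone who wants to repeat this kind of comparison from a host behind the firewall, here is a rough sketch that runs the system `ping` and pulls the loss and round-trip figures out of its summary. It assumes a `ping` that accepts `-c` and prints the usual Linux ("rtt min/avg/max/mdev") or FreeBSD ("round-trip min/avg/max/stddev") summary line; the function names are hypothetical, and this is not the monitoring setup used in the container mentioned above.

```python
import re
import subprocess

# Summary formats assumed here: iputils ping on Linux and ping on FreeBSD.
LOSS_RE = re.compile(r"(\d+(?:\.\d+)?)% packet loss")
RTT_RE = re.compile(
    r"(?:rtt|round-trip) min/avg/max/(?:mdev|stddev) = "
    r"([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms"
)


def parse_ping_summary(text):
    """Extract loss and RTT statistics from ping output.

    Values that cannot be found are returned as None.
    """
    result = {"loss_pct": None, "min_ms": None, "avg_ms": None,
              "max_ms": None, "dev_ms": None}
    loss = LOSS_RE.search(text)
    if loss:
        result["loss_pct"] = float(loss.group(1))
    rtt = RTT_RE.search(text)
    if rtt:
        keys = ("min_ms", "avg_ms", "max_ms", "dev_ms")
        for key, value in zip(keys, rtt.groups()):
            result[key] = float(value)
    return result


def run_ping(host, count=10):
    """Run the system ping against host and return the parsed summary."""
    completed = subprocess.run(
        ["ping", "-c", str(count), host],
        capture_output=True, text=True, check=False,
    )
    return parse_ping_summary(completed.stdout)


if __name__ == "__main__":
    # 1.1.1.1 is the outside address mentioned in the post above.
    print(run_ping("1.1.1.1", count=5))
```

Running the same check against both the gateway address and an outside address gives numbers that can be set next to what dpinger logs on the firewall itself.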