Unbound won't start from GUI after 21.7.1 upgrade (from 21.7)

Started by Wendo, August 06, 2021, 11:05:53 PM

thanks) I hope this helps with this strange unbound-control behavior and then i can make a pr

Quote from: Fright on October 07, 2021, 01:45:45 PM
@glasi
sounds logical imho
can you please test one possible solution?

Thanks for the patch. But, please don't do that. It's just another workaround. We should solve it by just resetting states of the WAN interface when the dynamic ip address changes. The topic was already discussed here.

Currently, I am experimenting with resetting states of expired/outdated PPPoE WAN IP only. This needs some more time and testing.
OPNsense 24.7.11_2-amd64

@schnipp
can't agree. imho this is not really a workaround. this, in principle, preserves localhost connections when manipulating states.

off-topic: if you want to reset the states by a specific interface, then imho this can be done with the command:
pfctl -i <ifname> -Fs
or by ip as it is already done in rc.newwanip by default
but @glasi mentioned a "Reset all states when a dynamic IP address changes" option that is explicitly described as resetting all states. so I just suggest excluding localhost from "all" )

Quote from: Fright on October 08, 2021, 12:36:58 PM
@schnipp
[...] imho this is not really a workaround.
[...]

Of course it is. You are trying to solve a problem which does not necessarily exist. The state reset function was implemented in the past due to my discussion referenced in #16. The original problem was that when the dynamic WAN IP changed, the NAPT table was not updated and the source IPs of known connections were still translated to the address that had become invalid. This kind of traffic was correctly filtered by the ISP and resulted in broken communication to the internet for as long as a corresponding NAPT table entry existed. This problem can be solved by deleting all entries in the NAPT table belonging to the expired public IP address.

Currently, the state reset is improperly implemented because it also resets the states of internal connections, which is not needed.
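
To make the three options under discussion easier to compare, here is a rough Python model. It is not pf or OPNsense code: the `State` record, its fields, the interface names, and the addresses are all made up for illustration, and a real state table carries far more detail.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class State:
    """Simplified stand-in for one state table entry (hypothetical fields)."""
    iface: str               # interface the state was created on
    src: str                 # original source address
    dst: str                 # destination address
    nat_src: Optional[str]   # translated source address, None if no NAT


def reset_all(states: List[State]) -> List[State]:
    """Model of 'reset all states': nothing survives."""
    return []


def reset_all_except_loopback(states: List[State]) -> List[State]:
    """Model of excluding localhost from 'all': only lo0 states survive."""
    return [s for s in states if s.iface == "lo0"]


def kill_expired_ip(states: List[State], expired_ip: str) -> List[State]:
    """Model of removing only the states tied to the expired public IP."""
    return [
        s for s in states
        if expired_ip not in (s.src, s.dst, s.nat_src)
    ]


OLD_WAN_IP = "198.51.100.7"  # documentation address, assumed old PPPoE IP

states = [
    State("lo0", "127.0.0.1", "127.0.0.1", None),                # local control socket
    State("lan", "192.168.1.10", "192.168.1.1", None),           # LAN client to firewall
    State("pppoe0", "192.168.1.10", "203.0.113.5", OLD_WAN_IP),  # NATed outbound flow
]

survivors = kill_expired_ip(states, OLD_WAN_IP)
for s in survivors:
    print(s.iface, s.src, "->", s.dst)
```

In this model only the third approach keeps both the loopback and the LAN state while still dropping the entry that translates to the expired address.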

Let's please discuss patch vs. patch, not patch vs. opinion.


Thanks,
Franco

@franco
thanks)
@schnipp
(sorry, i'm not even sure (or rather convinced) that playing with states is the right choice for solving sip problems. perhaps it would be enough to adapt the rules parameters and pbx trunk settings. we could continue the conversation on https://forum.opnsense.org/index.php?topic=8766.0, but my reasoning will remain at the level of theory, since i have nowhere to test them in practice (there is no suitable environment))

I've tested the patch. So far the patch works and unbound no longer hangs on the cache-load command.
However, I'm not sure if we get any kind of side effects with this patch.

Interestingly, during my testing I figured out that I don't need the setting "Reset all states when a dynamic IP address changes" any longer. Historically, I had enabled this option to avoid any stale states which would lead to problems with my VoIP setup. I completely missed that since OPNsense 21.1 WAN IP address changes are detected by the rc.newwanip script and that states of the outdated IP are removed from the state table.

Regarding state killing I would like to add some more findings and suggestions in this thread https://forum.opnsense.org/index.php?topic=8766.0.

Quote: I've tested the patch. So far the patch works and unbound does not hang on cache-load command any longer.
glad it works
Quote: However, I'm not sure if we get any kind of side effects with this patch
actually i would try to replace this rule with:
set skip on { lo0 }
can't think of any side effects from this yet.
it just saves resources on rule evaluation and state lookups, and preserves internal communications when "all" states are reset.
@franco, what do you say to that?)
Quote: Regarding state killing I would like to add some more findings
it would be interesting

Quote from: Fright on October 08, 2021, 07:20:17 PM
(sorry, i'm not even sure (or rather convinced) that playing with states is the right choice for solving sip problems. perhaps it would be enough to adapt the rules parameters and pbx trunk settings.

State reset has nothing to do with SIP. The connection problems regarding SIP reported by my Fritzbox were only the trigger to start the investigation. The issue itself resides at OSI layers 3 and 4, thus all protocols on top of the transport layer are affected. The impact on classic HTTP connections mostly goes unnoticed because such connections are often short-lived due to omitted keep-alive during request-reply communication. Furthermore, in case of a timeout, web browsers initiate new TCP connections which do not hit an invalid NAPT table entry because of dynamic source port selection.


Quote from: Fright on October 08, 2021, 07:20:17 PM
we could continue the conversation on https://forum.opnsense.org/index.php?topic=8766.0, but my reasoning will remain at the level of theory, since i have nowhere to test them in practice (there is no suitable environment))

In my eyes we should start a new thread regarding the state reset discussion, but I'll have a look at it.


Quote from: franco on October 08, 2021, 04:11:06 PM
Let's please discuss patch vs. patch, not patch vs. opinion.

Of course, I fully agree.  :)

The loopback rule was added for Squid, which does IPv6 loopback communication with itself, and this was broken by the IPv6 block rule setting. Whether state is tracked or not is hardly relevant. It might only indicate that unbound-control is not fully capable of recovering...


Cheers,
Franco

@franco
Quote: unbound-control is not fully capable of recovering
looks like that. I just can't figure out why this should hang unbound itself...

so what's the verdict?)
-move system_hosts_generate() call to the end of rc.newwanip?
-make "pass loopback" stateless?
-switch to "set skip on { lo0 }"?
-get rid of "ip_change_kill_states"?  ;)

Quote from: Fright on October 10, 2021, 07:21:57 PM
@franco
Quote: unbound-control is not fully capable of recovering
looks like that. I just can't figure out why this should hang unbound itself...

It's only my assumption. It looks like unbound-control is faulty and has, as a side effect, disclosed a bug in OPNsense (as we are discussing). Normally, the TCP socket of unbound-control should run into a timeout. In this case the calling function either returns with an error code or, if it is a blocking one, the process gets signaled by the kernel. But unbound-control seems to hang indefinitely.
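
As an illustration of the expected behavior only (this is not unbound-control's code, which is written in C; the helper name `read_with_timeout` and the silent test listener are made up), a Python sketch of a client whose blocking read gives up after a timeout instead of hanging:

```python
import socket
from typing import Optional


def read_with_timeout(host: str, port: int, timeout_s: float) -> Optional[bytes]:
    """Connect, wait for data, and return None if the peer stays silent."""
    # The timeout passed here also applies to later recv() calls on the socket.
    with socket.create_connection((host, port), timeout=timeout_s) as sock:
        try:
            return sock.recv(1024)
        except socket.timeout:
            # The blocking call returned with an error instead of waiting forever.
            return None


# A listener that completes the TCP handshake (via the kernel backlog)
# but never accepts or answers, standing in for a peer that went silent.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
server.listen(1)
port = server.getsockname()[1]

result = read_with_timeout("127.0.0.1", port, 0.2)
server.close()

print("timed out" if result is None else "got data")
```

A client written this way can report the failure and exit; a client with no timeout on its socket would stay blocked in the read for as long as the peer stays silent.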

Quote from: schnipp on October 10, 2021, 11:19:06 AM
State reset has nothing to do with SIP. The connection problems regarding SIP reported by my Fritzbox were only the trigger to start the investigation. The issue itself resides at OSI layers 3 and 4, thus all protocols on top of the transport layer are affected. The impact on classic HTTP connections mostly goes unnoticed because such connections are often short-lived due to omitted keep-alive during request-reply communication.
The last sentence is wrong.
According to the HTTP RFC, there is no need to specify "keep-alive" for HTTP/1.1 clients. Connections are always "keep-alive" unless marked with "close".
See https://datatracker.ietf.org/doc/html/rfc7230#section-6.3
HTTP/2 and later use another technique, but the result is the same: connections are persistent.
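
A minimal sketch of the HTTP/1.x rule from that RFC section, as I read it. The function name `is_persistent` is made up, and the sketch ignores the proxy-specific caveat for HTTP/1.0 "keep-alive" as well as HTTP/2, which does not use the Connection header at all:

```python
def is_persistent(http_version: str, connection_header: str = "") -> bool:
    """Decide whether an HTTP/1.x connection persists after the current message.

    http_version is a string such as "HTTP/1.1"; connection_header is the raw
    value of the Connection header field ("" if the field is absent).
    """
    # Connection options are a comma-separated, case-insensitive list.
    options = {
        opt.strip().lower()
        for opt in connection_header.split(",")
        if opt.strip()
    }

    # An explicit "close" always wins.
    if "close" in options:
        return False

    major, minor = (int(part) for part in http_version.split("/")[1].split("."))

    # HTTP/1.1 or later: persistent by default.
    if (major, minor) >= (1, 1):
        return True

    # HTTP/1.0: persistent only if "keep-alive" was requested explicitly.
    if (major, minor) == (1, 0) and "keep-alive" in options:
        return True

    return False


for version, header in [
    ("HTTP/1.1", ""),
    ("HTTP/1.1", "close"),
    ("HTTP/1.0", ""),
    ("HTTP/1.0", "keep-alive"),
]:
    print(version, repr(header), is_persistent(version, header))
```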

Quote from: karlson2k on October 10, 2021, 09:00:44 PM
The last sentence is wrong.
According to the HTTP RFC, there is no need to specify "keep-alive" for HTTP/1.1 clients. Connections are always "keep-alive" unless marked with "close".

Yes, it looks like there has been a change in the default value of this option. Thanks for this information. But your argument is a little petty. The server still determines how long the connection is kept open. Modern Apache servers have a default value of 5 seconds. I did some tests (excluding the big CDNs), and most of these sites closed the connection after 5 to 25 seconds of inactivity. Thus, this is still "short lived"  :)

Quote from: schnipp on October 12, 2021, 06:27:08 PM
Quote from: karlson2k on October 10, 2021, 09:00:44 PM
The last sentence is wrong.
According to the HTTP RFC, there is no need to specify "keep-alive" for HTTP/1.1 clients. Connections are always "keep-alive" unless marked with "close".

Yes, it looks like there has been a change in the default value of this option. Thanks for this information. But your argument is a little petty. The server still determines how long the connection is kept open. Modern Apache servers have a default value of 5 seconds. I did some tests (excluding the big CDNs), and most of these sites closed the connection after 5 to 25 seconds of inactivity. Thus, this is still "short lived"  :)
My note was only about HTTP defaults.
HTTP connections can be long-lived (if you are downloading something huge) or short-lived (if you just open a web page).