OPNsense Forum

Archive => 17.7 Legacy Series => Topic started by: magic31 on August 02, 2017, 10:40:00 pm

Title: Strange issue with DNS (resolving fails after 30~60 mins uptime)
Post by: magic31 on August 02, 2017, 10:40:00 pm
Hi All,

First post here, so please be gentle :)

Have been running OPNsense for a year now. Very pleased with it and all of the developments going on to further improve and enhance.

An issue I'm running into is with DNS resolving. I'm primarily still using the Dnsmasq DNS service, but I've also tried running Unbound DNS instead, which has the same issue.

What happens is that all is fine and dandy after booting up, and everything runs as it should for 30 to 60 minutes.
After that, DNS resolving stops working on all interfaces (I have three interfaces set up for WAN, LAN and DMZ).

Strange thing is that I can still make a connection to the DNS port (nc from a client machine returns a successful connection), but trying to resolve an address results in a timeout.. even for locally defined/overridden records.
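
For anyone wanting to reproduce the two checks separately, here is a minimal sketch in Python that tests the TCP connection to port 53 and then sends a real query over UDP. The server address 192.168.1.1 and the name example.com are placeholders, not values from this setup, and the function names are made up for illustration.

```python
import socket
import struct


def build_query(name, query_id=0x1234):
    """Build a minimal DNS query for an A record (RFC 1035 wire format)."""
    # Header: ID, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: each label prefixed by its length, ended by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=A (1), QCLASS=IN (1)
    return header + qname + struct.pack("!HH", 1, 1)


def tcp_port_open(server, port=53, timeout=3.0):
    """Return True if a TCP connection to the DNS port can be opened."""
    try:
        with socket.create_connection((server, port), timeout=timeout):
            return True
    except OSError:
        return False


def udp_query_answered(server, name, port=53, timeout=3.0):
    """Return True if the server sends back a DNS response to a UDP query."""
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server, port))
            reply, _ = sock.recvfrom(512)
        except OSError:  # includes timeouts
            return False
    # A valid reply has a full header, echoes our ID and has the QR bit set.
    return len(reply) >= 12 and reply[:2] == query[:2] and bool(reply[2] & 0x80)


if __name__ == "__main__":
    server = "192.168.1.1"  # placeholder: the firewall's LAN address
    print("TCP port 53 open:  ", tcp_port_open(server))
    print("UDP query answered:", udp_query_answered(server, "example.com"))
```

If the first check succeeds and the second one times out, that matches the behaviour described above: the service is listening but queries get no answer.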

Running a lookup from the Interfaces diagnostics section in the OPNsense web interface then also fails.

Restarting the DNS service does not fix it... restarting the box does.

Pinging from DMZ to LAN (from server/client devices) still works (I have allowed ICMP traffic in the rules to troubleshoot)... so network flow seems OK.

Have looked in different logs... but have not found any errors or messages there that are related.

Also, the management interface shows all services as running.



How can I best troubleshoot this?

Thanks,
 Willem
Title: Re: Strange issue with DNS (resolving fails after 30~60 mins uptime)
Post by: franco on August 03, 2017, 07:25:12 am
Hi Willem,

Are you using IDS+IPS?

If yes, I think you are seeing the issue below. We are still a bit puzzled about what happens, but a downgrade to Suricata 3.2.2 may be a viable workaround in the meantime:

https://forum.opnsense.org/index.php?topic=5605.0


Cheers,
Franco
Title: Re: Strange issue with DNS (resolving fails after 30~60 mins uptime)
Post by: magic31 on August 03, 2017, 09:31:35 am
Hi Franco,

Thanks for your quick reply!

Yes, IDS+IPS are enabled.  I'll give that downgrade a try (and if that does not help see what happens when IDS+IPS are disabled altogether) and let you know.

Thanks,
  Willem
Title: Re: Strange issue with DNS (resolving fails after 30~60 mins uptime)
Post by: magic31 on August 03, 2017, 09:08:42 pm
Yep... definitely IPS/IDS that was causing this here too. Disabled that for now as I could not quickly find a way to downgrade. I'll have a look at that later but wanted to confirm here first.


UPDATE: Ok sorry... I concluded that too soon.  IPS and IDS are disabled, firewall rebooted.... and after some time running, DNS resolving still stops, whilst the service reports being up.

Restarting DNS(forwarder) does not resolve it... reboot is needed.

So there seems to be something else going on here...
Title: Re: Strange issue with DNS (resolving fails after 30~60 mins uptime)
Post by: magic31 on August 04, 2017, 11:43:18 am
Still puzzled why this is happening. For now I have put a very ugly workaround in place (snapshot reset+boot of the VM every 30 mins, which causes a 20 second hiccup).

I'll install a fresh 17.7 and rebuild the config to see if the problem also appears there.

In the meantime, to further troubleshoot this... are there any specific logs that might give hints as to what's causing this on my machine? Can't see anything particular when looking through the logs in /var/log.
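
One thing that may help when digging through logs is knowing exactly when resolving stopped. Below is a rough sketch of a watcher that could run on a client behind the firewall and print a timestamped line each time resolving changes between working and failing. The name example.com, the 30 second interval and the function names are assumptions for illustration. It uses the client's system resolver, so local caching on the client could hide a failure for a while.

```python
import socket
import time
from datetime import datetime


def resolves(name):
    """Return True if the system resolver can resolve the name."""
    try:
        socket.getaddrinfo(name, None)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


def watch(name, interval=30.0, probe=resolves, checks=None,
          log=print, sleep=time.sleep):
    """Probe the name repeatedly and log every change of state.

    Runs forever when checks is None, otherwise stops after that many
    probes. Returns the list of (timestamp, ok) transitions seen.
    """
    transitions = []
    last = None
    done = 0
    while checks is None or done < checks:
        ok = probe(name)
        if ok != last:
            stamp = datetime.now().isoformat(timespec="seconds")
            log(f"{stamp} {name} {'OK' if ok else 'FAILED'}")
            transitions.append((stamp, ok))
            last = ok
        done += 1
        if checks is None or done < checks:
            sleep(interval)
    return transitions


if __name__ == "__main__":
    watch("example.com")  # placeholder name; stop with Ctrl-C
```

The timestamp of the first FAILED line can then be compared against the entries in /var/log on the firewall around that time.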

Open to tools, tips, suggestions :)

Thanks,
  Willem
Title: Re: Strange issue with DNS (resolving fails after 30~60 mins uptime)
Post by: Solaris17 on August 05, 2017, 04:09:21 am
 /sub

This is happening to me too, EXACTLY the same, and I also run in Hyper-V.
Title: Re: Strange issue with DNS (resolving fails after 30~60 mins uptime)
Post by: franco on August 06, 2017, 12:00:15 am
Hi guys,

Does this patch[1] help?

# opnsense-patch 051e44ca


Cheers,
Franco

--
[1] https://github.com/opnsense/core/commit/051e44ca
Title: Re: Strange issue with DNS (resolving fails after 30~60 mins uptime)
Post by: magic31 on August 07, 2017, 06:45:53 pm
Hi Franco,

Will try the patch and let you know.

Haven't had too much time to track the issue down... but from what I've been able to troubleshoot so far it looks as if OPNsense loses its route to the gateway it should use to get to the internet.

On the WAN side, this setup has two gateways... OPNsense is one (inbound) and the internet modem that's in that same subnet is the second.

When DNS resolving fails, there are also connectivity issues to the outside world.
Traffic flow within LAN/WAN and the two DMZs that are defined seems fine.

Have not had the time to test and diagnose extensively... but maybe it could also be related to the gateway online check?  I've disabled that for now...
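
To check whether the default route really disappears, something like the following sketch could be run on the firewall itself when resolving fails. It assumes a FreeBSD-style `netstat -rn -f inet` routing table where the default route is listed on a line starting with `default`; the function names are made up for illustration.

```python
import subprocess


def default_gateway(netstat_output):
    """Return the gateway of the default route, or None if there is none."""
    for line in netstat_output.splitlines():
        fields = line.split()
        # The default route line looks like: default <gateway> <flags> <netif>
        if len(fields) >= 2 and fields[0] == "default":
            return fields[1]
    return None


def current_default_gateway():
    """Read the IPv4 routing table (FreeBSD netstat syntax assumed)."""
    result = subprocess.run(
        ["netstat", "-rn", "-f", "inet"],
        capture_output=True, text=True, check=True,
    )
    return default_gateway(result.stdout)


if __name__ == "__main__":
    gateway = current_default_gateway()
    if gateway is None:
        print("no IPv4 default route present")
    else:
        print("default route via", gateway)
```

Running it once while everything works and once after resolving has failed would show whether the gateway changed or the route went missing.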

I'll let you guys know the outcome of that patch.

Title: Re: Strange issue with DNS (resolving fails after 30~60 mins uptime)
Post by: magic31 on August 07, 2017, 07:42:59 pm
One more thing (have not applied the patch yet)....


...never mind.  thought I had found something, but no.  Going to apply the patch now.
Title: Re: Strange issue with DNS (resolving fails after 30~60 mins uptime)
Post by: magic31 on August 09, 2017, 12:47:40 pm

Guess there are some different issues going on with the 17.7 release (meaning the IPS+IDS thing and the posts on DNS resolving).

I think I've made some progress. From https://forum.opnsense.org/index.php?topic=5615.0, I saw a suggestion to try to disable reply-to.

Quote
Try setting Firewall: Settings: Advanced: [ x ] disable reply-to. 

Also, I have disabled IPv6 altogether as I'm not using that in any of the subnets that OPNsense is connected to.

So far those two changes seem to have stabilized matters. Uptime of the firewall is currently 1:48 and still working as it should...

Will be keeping an eye on this ofc... and will let you know.

Thanks,
 Willem
Title: Re: Strange issue with DNS (resolving fails after 30~60 mins uptime)
Post by: magic31 on August 09, 2017, 03:59:10 pm
Hi Franco,

Quote
Hi guys,

Does this patch[1] help?

# opnsense-patch 051e44ca


Just to let you know... that patch had no effect on the issue I was (and maybe still am) seeing.

Cheers,
  Willem
Title: Re: Strange issue with DNS (resolving fails after 30~60 mins uptime)
Post by: Noctur on August 30, 2017, 05:27:00 pm
Seeing similar issues - all normal for a few hours, then loss of connectivity. Rebooting or restarting Unbound DNS restores connectivity. This apparently has been going on for several versions - having to intermittently reboot to reestablish connection. Note that the dashboard is green across all services when this happens. It seems also to have become more frequent with VPN enabled full-time.

Trying the 'reply-to' option noted above. Have not tried the patch above yet.

System 17.7, Suricata 4 with IPS/IDS enabled, Unbound DNS, Nord VPN with US Nord DNS server only, Comcast ISP.
Title: Re: Strange issue with DNS (resolving fails after 30~60 mins uptime)
Post by: franco on August 30, 2017, 06:23:43 pm
Try the older suricata (3.2.3 and 4.0.0 seem to cause this in IPS mode only), I think that was the current consensus, but we haven't pinned down the actual issue:

# pkg add -f https://pkg.opnsense.org/snapshots/suricata-3.2.2.txz


Cheers,
Franco
Title: Re: Strange issue with DNS (resolving fails after 30~60 mins uptime)
Post by: Noctur on August 31, 2017, 05:03:53 am
Thank you... trying now.

edit: upgraded to 17.7.1_2 and reverted to Suricata 3.2.2

Testing...