Loss of DNS capabilities after powercycle

Started by 9axqe, January 09, 2025, 02:30:38 PM



Today I had a power outage, and after the router restarted my internet connection was unusable. This was the chain of events:

  • Symptom: from a client on the LAN, only ping to the Internet was working; DNS, for example, was not. I could ping 1.1.1.1 from any client on the LAN.
  • The local time on the opnsense router was wrong (multiple years off somehow), NTP was not synchronised.
  • NTP could not synchronise because it could not resolve DNS names.
  • DNS names could not be resolved because AdGuard could not resolve anything public (local DNS entries, "DNS rewrites", were working).
  • The upstream server for AdGuard was Unbound, and Unbound seemed unable to resolve anything: there were no DNS lookups in the packet capture on the WAN interface, so the requests didn't even leave OPNsense. I wonder if that was because the time was completely wrong on the router (it was suddenly back in May 2022 somehow...).
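
For anyone who wants to check the same chain from a LAN client, here is a minimal Python sketch. It is only an illustration, not something from the original setup: the function names, the 2025-01-01 cutoff, the test name `example.com` and the use of a TCP connection to port 53 as a stand-in for ping are all assumptions. The hint about DNSSEC in the comments is one possible reason a wrong clock could hurt a validating resolver, and it is not confirmed as the cause here.

```python
import socket
from datetime import datetime, timezone

# Assumption: any date before this cutoff means the clock is obviously wrong.
EARLIEST_PLAUSIBLE = datetime(2025, 1, 1, tzinfo=timezone.utc)


def clock_is_plausible(now, earliest=EARLIEST_PLAUSIBLE):
    """Return True if the given (timezone-aware) time is not before the cutoff."""
    return now >= earliest


def can_reach_ip(ip, port=53, timeout=2.0):
    """Check plain IP connectivity with a TCP connect (a stand-in for ping)."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def can_resolve(name):
    """Check name resolution through whatever resolver this host is using."""
    try:
        socket.getaddrinfo(name, None)
        return True
    except OSError:
        return False


def diagnose(clock_ok, ip_ok, dns_ok):
    """Turn the three check results into a short hint about where to look."""
    if not ip_ok:
        return "no IP connectivity: check the WAN link and routing first"
    if dns_ok:
        return "IP and DNS both work"
    if not clock_ok:
        # A wrong clock can break time-sensitive checks such as DNSSEC validation.
        return "IP works, DNS fails, clock implausible: fix the time first"
    return "IP works, DNS fails, clock looks fine: check the resolver chain"


def run_checks(test_ip="1.1.1.1", test_name="example.com"):
    """Run all three checks against the live network and return the hint."""
    now = datetime.now(timezone.utc)
    return diagnose(clock_is_plausible(now), can_reach_ip(test_ip), can_resolve(test_name))
```

Calling `print(run_checks())` on a client during the outage described above would be expected to land in the "IP works, DNS fails" branches; the clock check only says something about the machine the script runs on, so it would need to run on the router itself to test the router's time.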

I did a DNS lookup from the WAN interface (Interfaces > Diagnostics) and that worked fine. But DNS lookups from any client or from any service (AdGuardHome, NTP) all seemed to fail.

I added a public DNS server under System > Settings > General, and the issue went away once NTP had resynchronised.
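
The circular dependency here (NTP needs DNS, and DNS seemingly needs a sane clock) can also be avoided by asking a time server by IP address, with no name lookup involved. The Python sketch below illustrates that idea with a bare-bones SNTP query. It is an assumption-laden example, not what was configured on the router: the function names are made up, and `192.0.2.1` in the usage note is a documentation placeholder that has to be replaced with the address of a real NTP server.

```python
import socket
import struct
from datetime import datetime, timezone

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_EPOCH_OFFSET = 2208988800


def build_sntp_request():
    """Build a 48-byte client request: LI=0, version=4, mode=3 (client)."""
    return b"\x23" + 47 * b"\x00"


def parse_transmit_time(packet):
    """Extract the server's transmit timestamp (bytes 40-47) as a UTC datetime."""
    if len(packet) < 48:
        raise ValueError("NTP packet is shorter than 48 bytes")
    seconds, fraction = struct.unpack("!II", packet[40:48])
    unix_time = seconds - NTP_EPOCH_OFFSET + fraction / 2**32
    return datetime.fromtimestamp(unix_time, tz=timezone.utc)


def query_time_by_ip(server_ip, timeout=2.0):
    """Ask an NTP server for the time using its IP address, so no DNS is needed."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_sntp_request(), (server_ip, 123))
        packet, _ = sock.recvfrom(512)
    return parse_transmit_time(packet)
```

With a real server address in place of the placeholder, `query_time_by_ip("192.0.2.1")` returns the server's idea of the current time, which can then be compared with the local clock. It ignores network delay and does not set the clock, so it is only a sanity check.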

I am certain this issue did not exist in the past. Has anyone else observed this?

The worst part was that a lot of services (including DHCP) were very unstable, probably due to the time being totally wrong.

Does the fact that it was a power outage and not a controlled reboot have any implications? I regularly restart for updates, but I had not done a real cold power cycle in many months, maybe a year, hence the question.

In hindsight, relying only on the local DNS service might not have been the smartest idea in terms of resilient design, but I would still like to understand what happened, if anyone has a theory.

> The local time on the opnsense router was wrong (multiple years off somehow), NTP was not synchronised.
You might want to test the CMOS battery. The chain of events might have been prevented if the clock had kept time while the router was powered off.