Home Assistant websocket not working

Started by instantdreams, October 10, 2025, 07:10:13 PM

I upgraded from a Netgear Orbi RBR850 to opnsense. I'm running 25.7.5 with the following plugins:
  • IGMP Proxy
  • mDNS Repeater
  • UDP Broadcast Relay
  • Universal Plug and Play

In an effort to incrementally introduce complexity whilst focusing on stabilising my configuration, I am trying to replicate my previous network setup.

I have two interfaces:
  • WAN - connects to a fibre ONT
  • LAN - connects to my unmanaged switch

My LAN contains a few servers with static IPs including my services server which hosts Home Assistant. Prior to moving to opnsense, Home Assistant was working with no issues. Since the upgrade the service is restarting with the same errors:

WARNING (MainThread) [zigpy.application] Watchdog failure
WARNING (MainThread) [zigpy.backups] Failed to create a network backup
ERROR (MainThread) [homeassistant] Error doing job: Task exception was never retrieved (None)
WARNING (MainThread) [bellows.thread] Attempted to use a closed event loop
ERROR (MainThread) [homeassistant.components.websocket_api.http.connection] [139864590888416] Unexpected exception
WARNING (MainThread) [py.warnings] /usr/local/lib/python3.13/asyncio/base_events.py:2035: RuntimeWarning: coroutine 'ClusterHandler.async_initialize' was never awaited
WARNING (MainThread) [py.warnings] /usr/local/lib/python3.13/asyncio/base_events.py:2051: RuntimeWarning: coroutine 'Device.async_initialize' was never awaited
ERROR (MainThread) [homeassistant.components.websocket_api.http.connection] [139864590888416] Error during service call to light.turn_on: Failed to send request: ApplicationController is not running
WARNING (MainThread) [homeassistant.components.mqtt.client] Error returned from MQTT server: The connection was lost.
ERROR (MainThread) [homeassistant.components.websocket_api.http.connection] [139864590888416] Error during service call to light.turn_on: Failed to send request: ApplicationController is not running
WARNING (MainThread) [zha.decorators] [<Task pending name='sensor_state_poller_00:0d:6f:00:05:42:89:88-1-2820_PolledElectricalMeasurement' coro=<periodic.<locals>.scheduler.<locals>.wrapper() running at /usr/local/lib/python3.13/site-packages/zha/decorators.py:92> cb=[set.remove()]>] Failed to poll using method [zha.application.platforms.sensor::PollableSensor._refresh]

These appear to indicate that the service can't see the mqtt server or the websocket, which is causing other components to fail.

I'd like to troubleshoot this but I'm not sure of the best way to start. I've started Packet Capture on the server IP and port for Home Assistant, but the capture doesn't seem to contain anything.

OpnSense is not involved in the intra-LAN communication at all. Unless you use separated VLANs, that is.

What seems to happen here is that a connection to your MQTT server cannot be established. More often than not, the MQTT server has to be specified in configurations. If you have HomeAssistant configured via DHCP and now its IP is different, you will have to reconfigure the MQTT clients.
 
Intel N100, 4* I226-V, 2* 82559, 16 GByte, 500 GByte NVME, ZTE F6005

1100 down / 800 up, Bufferbloat A+

Most people with HA get an MQTT server by installing the Mosquitto add-on. Be sure it's started, check that it's visible, make sure you can ping from the HA instance to the Mosquitto IP (it should be the same, but check that it pings), and see if the port is up. As mentioned, if it's connected via a static IP, make sure it didn't change.

MQTT Explorer can be helpful as you can connect independently to the MQTT server to make sure it's up.
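
The "is the port up" check can also be scripted with a short TCP connect test. The following is a minimal sketch, not part of anyone's setup here: `port_is_open` is a hypothetical helper name, and the demo probes a throwaway listener on 127.0.0.1 so that it runs anywhere. Point it at the broker's IP address and port (1883 is the standard unencrypted MQTT port) to test the real thing.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Demo against a temporary local listener so the example runs anywhere.
    # Replace host/port with the MQTT broker's address to test a real service.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    listener.listen(1)
    host, port = listener.getsockname()
    print(f"{host}:{port} open -> {port_is_open(host, port)}")
    listener.close()
    print(f"{host}:{port} open after close -> {port_is_open(host, port)}")
```

A successful connect only shows that something is listening on that port. It says nothing about whether the listener is actually an MQTT broker or whether it accepts your credentials.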


If the MQTT service and HA are in the same VM or container, and if said "thing" changed its IP address when you switched firewalls, that might explain it.

Combining @meyergru's and @Linwood's comments 🙂
Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)

Prior to migrating from my Netgear router to opnsense, my configuration was solid and tested. Here's how it is set up:
  • 192.168.1.93:1883 - MQTT
  • 192.168.1.93:5000 - MQTT UI
  • 192.168.1.93:8123 - Home Assistant
  • 192.168.1.93:1880 - Node-Red

Home Assistant has an integration to MQTT using 192.168.1.93:1883

It's been stable for over 3 years. Shifting to opnsense has caused the errors in Node-Red and Home Assistant. The ip addresses have not changed. MQTT UI shows traffic from zwave-js-ui, frigate, teslamate, doubletake, and homeassistant.

Any suggestions on how I could troubleshoot the issue using the diagnostic tools in opnsense?

services on same ip but different port. OK.
So have you verified the same server definitively has the same address and can be reached? I'm thinking along the lines of the dhcp server setup on OPN. Usual dig / ping / telnet (to open session) would be a start.

Quote from: instantdreams on October 13, 2025, 05:17:23 PM
Any suggestions on how I could troubleshoot the issue using the diagnostic tools in opnsense?

At first glance, data between HA, NR and Mosquitto would appear to be all local, on-net (i.e. same subnet, same VLAN) and so does not even pass through OPNsense.
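
The "on-net" argument can be checked with a little arithmetic on the addresses. The sketch below is illustrative only: `classify_path` is a hypothetical helper, and the /24 prefix length is an assumption, since the thread never states the netmask. It reports whether traffic between two addresses stays on the host, stays on the local segment, or has to go through the gateway.

```python
import ipaddress


def classify_path(src: str, dst: str, prefix_len: int = 24) -> str:
    """Classify how traffic from src to dst is delivered.

    prefix_len is the ASSUMED netmask of the source's LAN interface.
    """
    src_ip = ipaddress.ip_address(src)
    dst_ip = ipaddress.ip_address(dst)
    if src_ip == dst_ip:
        # A host talking to its own address is handled inside its own stack.
        return "same host (never on the wire)"
    lan = ipaddress.ip_network(f"{src}/{prefix_len}", strict=False)
    if dst_ip in lan:
        # Same subnet: delivered directly via the switch, no router involved.
        return "on-link (switch only)"
    return "via gateway (router/firewall)"


if __name__ == "__main__":
    # 192.168.1.93 is the services host; 1.1.1.1 is just an off-LAN example.
    for dst in ("192.168.1.93", "192.168.1.95", "1.1.1.1"):
        print(f"192.168.1.93 -> {dst}: {classify_path('192.168.1.93', dst)}")
```

Only the last case would ever be evaluated by firewall rules on the router, which is why the replies keep asking where OPNsense enters the picture.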

@cookiemonster's question is good, but unless I have just missed a clue, everything you are saying implies OPNsense is not involved.  Can you go through your setup and think about it and what you changed and share any theory of even how OPNsense plays any role in the setup described?

For example, you have IP addresses -- is ANYTHING using DNS names, and maybe OPNsense is now the DNS server and different?

Is there any chance that something changed on the host around the same time -- is this HAOS?  Or if it's housed in your own Linux box, did something like AppArmor change?

Please don't take this wrongly, but so far it's kind of like saying "I turned on the back yard light and my toilet overflowed, what's wrong with my light".  :)

You have to find the logical connection between the two, then I think people can help debug what's wrong with that aspect.

Yes I get the local-only traffic and therein lies the question. Verify it still goes and gets where it is -supposed- to be.
My thinking is that the OP had his host with a static IP on the old router. Now he has OPN as the new router, and although traffic does not go through it, the host(s) may still rely on the new router to hand out their IP addresses, whether as a static reservation or dynamically.
Now, with the recent ISC DHCP to dnsmasq transition, DHCP reservations might not be set up correctly yet. Hence I am suggesting checking that basic first.

@cookiemonster Services on the same server but different port is relatively standard and I am confirming that the same ip address and ports are accessible from the network prior to moving to opnsense and after stabilising with opnsense.

@Linwood I made sure everything uses ip addresses, but I will check that to confirm. Everything is in a docker container, and exposed via host ip and port. I appreciate your thought process, when you are inside a bug it's hard to step back.

I have 5 servers running Debian, each of which uses a static IP:
  • edge1 - 192.168.1.91
  • edge2 - 192.168.1.92
  • services - 192.168.1.93
  • security - 192.168.1.94
  • media - 192.168.1.95

The routers (Netgear or opnsense) have never needed to issue an ip address, and do recognise the servers on the network.

opnsense is using unbound and dnsmasq for DNS and DHCP. I've tried to change this to using my two pi-holes but it fails each time, that's a separate issue.

opnsense is configured to:
  • use domain home.arpa
  • have no specified dns servers
  • serve the web gui on port 8443

Unbound is configured to:
  • Be Enabled
  • Override example.com to 192.168.1.91 where traefik will reverse proxy requests

dnsmasq is configured to:
  • Be Enabled
  • Listen on LAN
  • DHCP FQDN
  • DHCP local domain
  • DHCP register firewall rules
  • Register the 5 servers under Hosts
  • Serve IP addresses in the range 192.168.1.100 to 192.168.1.245

Here's how dig resolves for the hostnames:

$ docker exec homeassistant dig +noall +answer homeassistant.example.com
homeassistant.example.com. 3600 IN A      192.168.1.91

$ docker exec homeassistant dig +noall +answer mqtt.example.com
mqtt.example.com. 3600    IN      A       192.168.1.91

$ dig +noall +answer homeassistant.example.com
homeassistant.example.com. 3600 IN A      192.168.1.91

$ dig +noall +answer mqtt.example.com
mqtt.example.com. 3600    IN      A       192.168.1.91

Here's how nc feels about the ports:

$ nc -vz 192.168.1.93 1883
services.example.com [192.168.1.93] 1883 (?) open

$ nc -vz 192.168.1.93 8123
services.example.com [192.168.1.93] 8123 (?) open

$ nc -vz 192.168.1.93 5000
services.example.com [192.168.1.93] 5000 (?) open

$ nc -vz 192.168.1.93 1880
services.example.com [192.168.1.93] 1880 (?) open

$ docker exec homeassistant nc -vz 192.168.1.93 1883
192.168.1.93 (192.168.1.93:1883) open

$ docker exec homeassistant nc -vz 192.168.1.93 8123
192.168.1.93 (192.168.1.93:8123) open

$ docker exec homeassistant nc -vz 192.168.1.93 5000
192.168.1.93 (192.168.1.93:5000) open

$ docker exec homeassistant nc -vz 192.168.1.93 1880
192.168.1.93 (192.168.1.93:1880) open

This is a real puzzler.

October 14, 2025, 10:51:33 PM #9 Last Edit: October 14, 2025, 10:55:03 PM by meyergru
It now sure looks like a DNS problem. If you configured the MQTT clients with DNS names instead of IPs, that would explain it.

Also: why do all of the server names resolve to 192.168.1.91 but then you check for the open ports on another server (192.168.1.93)?

Keep in mind that resolution of local, unqualified names has its quirks, like if some clients add a search domain to names and others do not. Thus, a request for "mqtt" might result in "mqtt.example.com" on one machine, but just "mqtt" on another (and one might fail). Also, you seem to mix example.com and home.arpa within your network.
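
To make the search-domain quirk concrete, here is a simplified model of how a stub resolver expands a name before querying. It sketches the common `resolv.conf`-style behaviour (a search list plus `ndots`), not the exact algorithm of any particular client, and `candidate_names` is a hypothetical helper.

```python
def candidate_names(name: str, search_domains: list[str], ndots: int = 1) -> list[str]:
    """Return the names a resolver would try, in order (simplified model)."""
    if name.endswith("."):
        # A trailing dot marks the name as fully qualified: no search list.
        return [name.rstrip(".")]
    expanded = [f"{name}.{domain}" for domain in search_domains]
    if name.count(".") >= ndots:
        # Enough dots: try the name as given first, then the search list.
        return [name] + expanded
    # Too few dots: try the search list first, then the bare name.
    return expanded + [name]


if __name__ == "__main__":
    # The same short name on clients with different search settings.
    print(candidate_names("mqtt", ["example.com"]))
    print(candidate_names("mqtt", ["home.arpa"]))
    print(candidate_names("mqtt", []))
```

Three clients asking for "mqtt" can therefore send three different queries, and only some of them will match a record the DNS server actually has.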

Quote
@cookiemonster Services on the same server but different port is relatively standard and I am confirming that the same ip address and ports are accessible from the network prior to moving to opnsense and after stabilising with opnsense.
Yes it is pretty standard. I wasn't saying otherwise ;)
Network connectivity at IP level seems OK then. And it has been established that they are on the same network segment (and same host). DNS is another matter, so we might need to diagnose that. No routing required, of course. You have now done the basic tests I had in mind, so I'm leaning towards the application side.

BTW if you have a flat network, may I ask why you are using those plugins, which are normally used to relay broadcast traffic between networks? Unrelated of course, just in case it shines some strange light.

I apologize for being dense but do you have an example of the actual problem occurring?

For example, if Home Assistant is supposed to connect to the MQTT server, do you have a log of that failing that shows HOW it connects?    Like from the integration page, showing it uses IP (vs name)?

If MQTT connection is the problem AND you are using static IP, we can stop talking about DNS.

Alternatively, run MQTT Explorer from a PC on that same network, enter the broker's explicit IP address and credentials, and see if it can connect.  If it can't -- easy to debug.  If it can, see what's different about HA.

I really suggest using explicit IP addresses and not names if this is all internal and on the same network, as that takes mDNS and DNS out of the picture.
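
If you would rather script that check than use a GUI, the broker can be probed with a bare MQTT 3.1.1 CONNECT over a plain socket. This is a minimal sketch under stated assumptions: no TLS and no username or password (a broker that requires credentials will answer with return code 4 or 5, which still proves it is reachable and speaking MQTT). `build_connect`, `parse_connack`, and `probe_broker` are hypothetical helper names, not part of any library.

```python
import socket
import struct

# CONNACK return codes defined by MQTT 3.1.1.
CONNACK_CODES = {
    0: "accepted",
    1: "unacceptable protocol version",
    2: "identifier rejected",
    3: "server unavailable",
    4: "bad user name or password",
    5: "not authorized",
}


def build_connect(client_id: str, keepalive: int = 60) -> bytes:
    """Build an MQTT 3.1.1 CONNECT packet (clean session, no credentials)."""
    cid = client_id.encode("utf-8")
    # Protocol name "MQTT", protocol level 4, connect flags 0x02 (clean session).
    variable_header = b"\x00\x04MQTT" + bytes([4, 0x02]) + struct.pack("!H", keepalive)
    payload = struct.pack("!H", len(cid)) + cid
    remaining = len(variable_header) + len(payload)
    if remaining > 127:
        raise ValueError("client_id too long for this single-byte length sketch")
    return bytes([0x10, remaining]) + variable_header + payload


def parse_connack(data: bytes) -> str:
    """Translate a 4-byte CONNACK into a readable result."""
    if len(data) < 4 or data[0] != 0x20 or data[1] != 0x02:
        return "not a CONNACK"
    return CONNACK_CODES.get(data[3], f"unknown return code {data[3]}")


def probe_broker(host: str, port: int = 1883, timeout: float = 5.0) -> str:
    """Connect, send CONNECT, and report what the broker answered."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(build_connect("probe-client"))
            return parse_connack(sock.recv(4))
    except OSError as exc:
        return f"connection failed: {exc}"
```

Running `print(probe_broker("192.168.1.93"))` from another machine on the LAN would separate "the broker is unreachable" from "the broker answers but Home Assistant still loses its connection".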




@meyergru All the endpoints are using ip addresses to avoid any DNS confusion, which leads me to assume the issue is a firewall one rather than a DNS one. The hostnames resolve to 192.168.1.91 because of the override in Unbound DNS which forwards all requests for example.com to my Traefik reverse proxy. To be clear, all inter-machine communication uses ip addresses and ports. There are no FQDNs or hostnames.

@cookiemonster I am using all multicast DNS plugins because without them Sonos wasn't working. I plan to slowly remove them and test to confirm what I actually need. I currently have:
  • A set of floating rules to allow SSDP, mDNS, GDM, Plex, Sonos, Spotify, and Windows Sharing
  • IGMP Proxy between WAN and LAN for all internal ip ranges
  • mDNS Repeater between WAN and LAN
  • UDP Broadcast Relay for SSDP, mDNS, GDM, and Sonos
  • Universal Plug and Play allowing all UPnP IGD and NAT-PMP mappings by default

My goal is to replicate what I had with my commercial router before hardening things.

@Linwood A very good question, let me define the behaviour I am experiencing and hopefully it'll add clarity.

Issue
Web connection to Home Assistant fails every 40-60 seconds. Logs indicate a websocket issue.

Host
Intel NUC7CJYHN 16GB RAM 500GB SSD

Services
Service       | IP and Port       | MQTT Topic                      | HA Connection               | Log Issues        | Purpose
homeassistant | 192.168.1.93:8123 | homeassistant 192.168.1.93:1883 | n/a                         | websocket related | home automation hub
mqtt          | 192.168.1.93:1883 | n/a                             | any 192.168.1.93:1883       | none              | message broker
node-red      | 192.168.1.93:1880 | 192.168.1.93:1883               | 192.168.1.93:8123           | none              | visual automation engine
zigbee2mqtt   | 192.168.1.93:8321 | zigbee2mqtt 192.168.1.93:1883   | mqtt discovery              | none              | zigbee coordinator
zwave-js-ui   | 192.168.1.93:8091 | zwave 192.168.1.93:1883         | websocket 192.168.1.93:3000 | none              | zwave coordinator

All services are run in docker containers. All connections use host IP addresses and ports.

Behaviour
Since migrating from my Netgear router to opnsense, home assistant has been unstable. The ui will become unresponsive and restart multiple times.

Initial errors in the log files indicated timeouts, usually with the Zigbee Home Automation service.

homeassistant  | 2025-10-13T14:15:59.106199110Z bellows.ash.NcpFailure: NcpResetCode.ERROR_EXCEEDED_MAXIMUM_ACK_TIMEOUT_COUNT
homeassistant  | 2025-10-13T14:15:59.229743107Z 2025-10-13 08:15:59.184 ERROR (MainThread) [homeassistant.components.websocket_api.http.connection] [140318389399968] Unexpected exception
homeassistant  | 2025-10-13T14:15:59.231156264Z bellows.ash.NcpFailure: NcpResetCode.ERROR_EXCEEDED_MAXIMUM_ACK_TIMEOUT_COUNT
homeassistant  | 2025-10-13T14:15:59.465955299Z 2025-10-13 08:15:59.464 ERROR (MainThread) [homeassistant.components.websocket_api.http.connection] [140318389399968] Error during service call to light.turn_on: Failed to send request: ApplicationController is not running
homeassistant  | 2025-10-13T14:17:52.757669048Z 2025-10-13 08:17:52.751 WARNING (MainThread) [homeassistant.components.media_player] Updating webostv media_player took longer than the scheduled update interval 0:00:10
homeassistant  | 2025-10-13T14:17:52.769933176Z 2025-10-13 08:17:52.751 WARNING (MainThread) [homeassistant.helpers.entity] Update of media_player.lg_webos_tv_bedroom1 is taking over 10 seconds
homeassistant  | 2025-10-13T14:17:52.919683463Z 2025-10-13 08:17:52.910 ERROR (MainThread) [homeassistant.components.tautulli] Error fetching tautulli data: Request timeout for 'http://192.168.1.95:8282/api/v2?apikey=[REDACTED_API_TOKEN]&cmd=get_home_stats'
homeassistant  | 2025-10-13T14:17:52.937803502Z 2025-10-13 08:17:52.931 WARNING (MainThread) [zigpy.application] Watchdog failure
homeassistant  | 2025-10-13T14:17:53.302102073Z 2025-10-13 08:17:53.301 ERROR (MainThread) [homeassistant.components.websocket_api.http.connection] [140318389399968] Error during service call to light.turn_on: Failed to send request: ApplicationController is not running


Troubleshooting
In an effort to resolve this, the following steps were taken:
  • Add Firewall Floating Rule for ports 8123 and 1883 - no obvious impact
  • Migrate from Zigbee Home Automation (ZHA) to zigbee2mqtt - reduced errors in Home Assistant but did not change behaviour
  • Change version of Home Assistant from latest to 2025.9 - no obvious impact


Current State
I am running Home Assistant 2025.9 in host networking mode. The web ui will work for 30-50 seconds then become unresponsive. The message "Connection lost. Reconnecting..." appears on the UI and 20-30 seconds later the page is responsive again.
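
Since the drops follow a rough rhythm, it can help to timestamp them from a second machine and compare against the Home Assistant log. The following is a sketch only: `probe_once`, `find_outages`, and `monitor` are hypothetical helpers, a TCP connect to port 8123 is used as a crude stand-in for "the UI is reachable" (it does not exercise the websocket itself), and the sample data in the demo is synthetic, there purely to show the output format.

```python
import socket
import time


def probe_once(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def find_outages(samples: list[tuple[float, bool]]) -> list[tuple[float, float]]:
    """Collapse (timestamp, ok) samples into (start, end) outage intervals."""
    outages = []
    start = None
    last_ts = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts  # first failed probe of a new outage
        elif ok and start is not None:
            outages.append((start, ts))  # outage ends at the first good probe
            start = None
        last_ts = ts
    if start is not None:
        outages.append((start, last_ts))  # still down when sampling stopped
    return outages


def monitor(host: str, port: int, duration: float, interval: float = 1.0):
    """Probe every `interval` seconds for `duration` seconds, return outages."""
    samples = []
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        samples.append((time.time(), probe_once(host, port)))
        time.sleep(interval)
    return find_outages(samples)


if __name__ == "__main__":
    # Synthetic samples (seconds, reachable?) just to show the output shape.
    demo = [(0, True), (1, True), (2, False), (3, False), (4, True), (5, False)]
    print(find_outages(demo))
```

Something like `monitor("192.168.1.93", 8123, duration=300)` run from another LAN machine would show whether the port itself goes away. If the TCP port stays reachable while the UI reports "Connection lost", that would suggest looking at the application rather than the network.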

But how can it be a firewall issue when the traffic is local on the LAN and never passes OpnSense?

I'm a network engineer. When I am out of ideas I pull the big guns. I.e. a packet trace.

Do a packet trace on OPNsense and try to find evidence for any of that traffic even leaving your HA host. If that evidence is found, then try to find out *why*.

Traffic from a host to its own IP address - even if that IP address is bound to a physical interface - is routed through the loopback IF. It should never be seen on the wire. So watch the wire. Proceed from what you find (or don't find).