Some Chinese sites not returning packets when accessed via OPNsense [WAN].

Started by raybies, April 23, 2025, 07:55:58 AM

During the 2nd part, the machine (same IP as in part 1) was added to the alias containing hosts that are directed to your external WGD VPN provider?
Apart from validating that this machine can access that site, I'm not sure what I can do with that.
It would have been more straightforward to connect that machine to your edge router (bypassing OPN).

The capture from part 1 indicates a complete connectivity failure. That connection attempt is either outright blocked, or maybe it gets routed into a void.
I can't reconcile this with the previous capture (reply #8) on WAN.
The WAN side indicated several roundtrips to the target website (3-way handshake, client then server hello, key exchange and a HTTPS roundtrip, connection close).

I don't really care to see the DNS lookups. We know the IP by now.
In an attempt to connect the dots, for the part 1 use case (not going thru the VPN provider), please provide a WAN and LAN capture filtered to the target IP for that curl command.

Can you pls give me a full list of tests you would like me to perform, how you would like them filtered and the expected format?

One packet capture on OPN, with WAN and LAN selected, filtered by target 'host address' IP.
While that's going on, a single curl invocation. Then download the capture (it should contain 2 cap files and a json).
It's similar to what you did in reply #8, apart from the fact that we'd see both interfaces for the exact same exchange, not just WAN.
The filtering is for your privacy and reduces the size of the file. We don't need to see other traffic.

Again, the capture in #8 looked normal to me, but it makes no sense when compared with the Wireshark cap on the host.
Having the LAN side too, for a single exchange, would eliminate discrepancies due to looking at discrete exchanges that may have happened in different conditions.
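
For anyone who wants a quick summary of such a capture without opening Wireshark, here is a minimal Python sketch. It assumes the downloaded files are classic pcap with Ethernet framing and untagged IPv4 (pcapng and VLAN-tagged frames are not handled), and the function names are made up for illustration. It counts client SYNs sent to the target IP and SYN-ACKs coming back, which is the comparison that matters between the LAN and WAN files.

```python
import socket
import struct


def tcp_flags_in_pcap(data: bytes, target_ip: str):
    """Yield (src, dst, tcp_flags) for IPv4/TCP packets to or from target_ip.

    Assumption: classic pcap, Ethernet link type, no VLAN tags.
    """
    magic = data[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # file written on a little-endian machine
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"
    else:
        raise ValueError("not a classic pcap file")
    linktype = struct.unpack(endian + "I", data[20:24])[0]
    if linktype != 1:
        raise ValueError("only Ethernet captures are handled")

    offset = 24  # skip the global header
    while offset + 16 <= len(data):
        _, _, incl_len, _ = struct.unpack(endian + "IIII", data[offset:offset + 16])
        frame = data[offset + 16:offset + 16 + incl_len]
        offset += 16 + incl_len
        # Need Ethernet (14) + minimal IPv4 (20) and an IPv4 ethertype
        if len(frame) < 34 or frame[12:14] != b"\x08\x00":
            continue
        ip = frame[14:]
        ihl = (ip[0] & 0x0F) * 4
        if ip[9] != 6 or len(ip) < ihl + 14:  # protocol 6 = TCP
            continue
        src = socket.inet_ntoa(ip[12:16])
        dst = socket.inet_ntoa(ip[16:20])
        if target_ip not in (src, dst):
            continue
        yield src, dst, ip[ihl + 13]  # byte 13 of the TCP header holds the flags


def summarize_handshakes(data: bytes, target_ip: str) -> dict:
    """Count SYNs sent to target_ip and SYN-ACKs received from it."""
    syn = syn_ack = 0
    for src, dst, flags in tcp_flags_in_pcap(data, target_ip):
        is_syn = bool(flags & 0x02)
        is_ack = bool(flags & 0x10)
        if is_syn and not is_ack and dst == target_ip:
            syn += 1
        elif is_syn and is_ack and src == target_ip:
            syn_ack += 1
    return {"syn": syn, "syn_ack": syn_ack}
```

Calling `summarize_handshakes(open(path, "rb").read(), "8.210.123.202")` once on the LAN file and once on the WAN file (with `path` being wherever each capture was saved) gives two small dicts to compare. The WAN source address is rewritten by NAT, but filtering on the target IP sidesteps that.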

Sorry for the late reply—life got in the way.
I have some more info and the requested capture.
I have a Home Assistant integration that connects to dessmonitor. It used to work, then it stopped working; in this capture it's working. The inability to get HA to connect to dessmonitor is what pushed me to reinstall OPNsense on a new machine. I disabled the integration, but the clients still can't connect to dessmonitor.

In the attached OPNsense capture, you can see the HA server 10.0.5.100, which is running as a VM on a Proxmox host, and the Windows client (10.0.1.15) from which I ran...
C:\Users\Admin>curl -v https://www.dessmonitor.com
* Host www.dessmonitor.com:443 was resolved.
* IPv6: (none)
* IPv4: 8.210.123.202
*  Trying 8.210.123.202:443...
* connect to 8.210.123.202 port 443 from 0.0.0.0 port 61689 failed: Timed out
* Failed to connect to www.dessmonitor.com port 443 after 21611 ms: Couldn't connect to server
* Closing connection
curl: (28) Failed to connect to www.dessmonitor.com port 443 after 21611 ms: Couldn't connect to server

There are no aliases, rules or anything in opnsense to distinguish the 2 IPs.
Not captured; these were run after disabling the dessmonitor integration.

The proxmox server running HA:
root@proxmox:~# curl -v https://www.dessmonitor.com
*   Trying 8.210.123.202:443...
* connect to 8.210.123.202 port 443 failed: Connection timed out
* Failed to connect to www.dessmonitor.com port 443 after 136454 ms: Couldn't connect to server
* Closing connection 0
curl: (28) Failed to connect to www.dessmonitor.com port 443 after 136454 ms: Couldn't connect to server


An Ubuntu 24.04 server:
root@a51m:~# curl -v https://www.dessmonitor.com
* Host www.dessmonitor.com:443 was resolved.
* IPv6: (none)
* IPv4: 8.210.123.202
*   Trying 8.210.123.202:443...
* connect to 8.210.123.202 port 443 from 10.0.1.51 port 56914 failed: Connection timed out
* Failed to connect to www.dessmonitor.com port 443 after 133075 ms: Couldn't connect to server
* Closing connection
curl: (28) Failed to connect to www.dessmonitor.com port 443 after 133075 ms: Couldn't connect to server

I'm going to paraphrase to make sure I get it.
Something in HA is making background requests to dessmonitor (about 4 in the 30 seconds span of the capture).
They succeeded here but they don't always.
FWIW, the requests appear to be targeted at web.dessmonitor.com (not www) although both FQDNs resolve to the same IP in your case.

While this is ongoing, you invoked curl on the www address (twice apparently, 4 seconds apart) from another machine on the same LAN interface.
Both failed miserably.
Question 1: what does the interface IP configuration look like?

Question 2: what do you know about this HA "integration" with dessmonitor?
I ask because I observe 2 small differences between the TCP options of the two SYN packets:
* Window Scale is 7 coming from HA, 8 coming from Windows
* The HA frame features a Timestamps option and is thus 8 bytes larger

This is way too low level for me to tell if it is significant.
Maybe I'll do some research at some point (but definitely not tonight).
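
To make the two differences concrete, here is a small Python sketch that walks the options area of a TCP header. The two byte strings are typical layouts for a Windows SYN and a Linux SYN, written by hand as an assumption; they are not extracted from the capture in this thread.

```python
def parse_tcp_options(raw: bytes) -> dict:
    """Decode the TCP options area into {name_or_kind: payload_bytes}."""
    names = {2: "mss", 3: "window_scale", 4: "sack_permitted", 8: "timestamps"}
    options = {}
    i = 0
    while i < len(raw):
        kind = raw[i]
        if kind == 0:        # End of Option List
            break
        if kind == 1:        # NOP, a single padding byte
            i += 1
            continue
        if i + 1 >= len(raw):
            break            # truncated option
        length = raw[i + 1]  # length covers kind + length + payload
        if length < 2 or i + length > len(raw):
            break            # malformed option
        options[names.get(kind, kind)] = raw[i + 2:i + length]
        i += length
    return options


# Hand-written examples (assumed layouts, not taken from the thread's capture)
windows_like = bytes.fromhex("020405b4" "01" "030308" "0101" "0402")
linux_like = bytes.fromhex("020405b4" "0402" "080a" "0000000100000000" "01" "030307")

for label, raw in (("windows-like", windows_like), ("linux-like", linux_like)):
    opts = parse_tcp_options(raw)
    print(label, len(raw), "bytes, wscale =", opts["window_scale"][0],
          ", timestamps =", "timestamps" in opts)
```

The Timestamps option itself is 10 bytes; with the different padding in each layout, the net difference in these two examples comes out at 8 bytes, which is consistent with the size difference noted above.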

Another thought, given the usage pattern here, is that the server may be limiting connections per IP.
That would be from the server's perspective, so even stopping the client may not be sufficient if the connections are not closed properly (even though they should eventually time out).
And you're behind another router too, so even that router could do something unbecoming with too many connections from the same source (OPN) to the same destination (dess).


Quote from: raybies on April 23, 2025, 07:55:58 AM
I'm running into a frustrating issue where certain websites, primarily smaller sites hosted in China (e.g., www.dessmonitor.com IP: 8.210.123.202), are inaccessible when routing through my OPNsense firewall, while major international sites work fine.

Are they accessible if you use something different for a router? For example, pfSense (not that it would be different) or any Linux router or even a Windows connection-sharing box?

Are they accessible from the router itself if you log into SSH console and curl/wget/lynx from there?
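
If Python happens to be available on the box being tested, the same reachability check can be reduced to a bare TCP connect, which separates "no answer to the SYN" from TLS or HTTP problems. This is only a sketch; `tcp_probe` is a made-up helper name, and the default timeout is an arbitrary choice.

```python
import socket


def tcp_probe(host: str, port: int, timeout: float = 5.0) -> str:
    """Attempt a plain TCP handshake and report how it ended."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"      # three-way handshake completed
    except socket.timeout:
        return "timeout"            # SYN sent, nothing came back in time
    except ConnectionRefusedError:
        return "refused"            # the peer answered with a RST
    except OSError as exc:
        return f"error: {exc}"      # e.g. no route, DNS failure
```

For example, `tcp_probe("8.210.123.202", 443)` run from a LAN client and then from the router would show whether the two behave differently at the TCP level.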

I considered the endpoint throttling or even blocking IPs via some DDoS algo, but I've changed ISP (completely different IP range), and I've never had an issue with anything upstream of opnsense.
I've disabled all the security features of the primary router with no change.
I don't think I've changed anything on either WAN or LAN since the last reinstall.
Here's LAN.


The integration is open source, so I know *everything*. Code: https://github.com/Antoxa1081/home-assistant-dess-monitor/blob/main/custom_components/dess_monitor/api/__init__.py


From the OPNsense shell, yes, it is accessible. Also note that it was briefly accessible from LAN via the OPNsense WAN. I *think* it started failing after importing DHCP addresses, but I'm not sure; that's the first thing I do, or I can't access any of my devices. It still works fine from LAN via WireGuard. I haven't tried pfSense recently, but I wouldn't go back.
Also note that for about a year previously (probably all on 24.x up to 24.7), lots of Chinese websites (machinery product sites, for example), and only Chinese websites, were exhibiting this behavior. I didn't investigate too deeply and assumed it was just the Unbound blocklists, but then I disabled the blocklists and the sites still weren't accessible, so resolving this climbed my to-do list.

root@OPNsense:~ # curl -v https://www.dessmonitor.com
* Host www.dessmonitor.com:443 was resolved.
* IPv6: (none)
* IPv4: 8.210.123.202
*   Trying 8.210.123.202:443...
* ALPN: curl offers h2,http/1.1
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-AES256-GCM-SHA384 / prime256v1 / rsaEncryption
* ALPN: server accepted http/1.1
* Server certificate:
*  subject: CN=*.dessmonitor.com
*  start date: Jun  2 00:00:00 2024 GMT
*  expire date: Jul  3 23:59:59 2025 GMT
*  subjectAltName: host "www.dessmonitor.com" matched cert's "*.dessmonitor.com"
*  issuer: C=US; O=DigiCert, Inc.; CN=RapidSSL Global TLS RSA4096 SHA256 2022 CA1
*  SSL certificate verify ok.
*   Certificate level 0: Public key type RSA (2048/112 Bits/secBits), signed using sha256WithRSAEncryption
*   Certificate level 1: Public key type RSA (4096/152 Bits/secBits), signed using sha256WithRSAEncryption
*   Certificate level 2: Public key type RSA (2048/112 Bits/secBits), signed using sha1WithRSAEncryption
* Connected to www.dessmonitor.com (8.210.123.202) port 443
* using HTTP/1.x
> GET / HTTP/1.1
> Host: www.dessmonitor.com
> User-Agent: curl/8.12.1
> Accept: */*
>
* Request completely sent off
< HTTP/1.1 200 OK
< Server: nginx/1.9.6

Nothing to add on the LAN side. Looks good.

I took a quick look at the TCP options that differ.
They revolve around some high-performance extensions.
Likely harmless, unless the edge router chokes on one of them (the Wikipedia article about TCP window scaling mentions this possible side effect):
https://en.wikipedia.org/wiki/TCP_window_scale_option
OTOH, you might have indicated that curl from W11 connected directly to the edge router works (too lazy to re-read).

In any case, I don't see how OPN is at fault here.
The LAN-WAN captures indicate a faithful reproduction of the traffic coming from the client.
Clearly, in some cases, the SYN frame is unanswered. In both cases (source HA or W11), the WAN frame is sent to an ASUS device (edge router I guess).
What happens from there on is unknown.
The frame could be dropped, not retransmitted from edge router WAN, lost on its way to the server, dropped by the server, ...

You might be able to follow that traffic a little further in your infrastructure if your gear has that capability.
The edge router is the most likely source of issues. It's fully up to date, right?

When using WireGuard (to a 3rd party server), all the traffic is encapsulated from OPN to the VPN server, so your edge router is not seeing the connection...
It's forwarded in UDP packets all the way to the VPN server and the connection is established from there.

I took a look at that code. They are not shy with the number of roundtrips.
I don't know how many devices you have but I now have an explanation for the repeated calls (from HA).
They weren't from chasing links in a web page, but merely a slew of async requests (get device list, then 4 requests per device).
It makes it unlikely there's throttling on the server IMO.
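
A rough sketch of that request pattern, assuming nothing about the real API: the request names and `fake_request` are placeholders, not the integration's actual calls. It only illustrates why one polling cycle produces 1 + 4 x N requests for N devices.

```python
import asyncio


async def fake_request(name: str, log: list) -> str:
    """Stand-in for one HTTPS roundtrip; it only records that it happened."""
    await asyncio.sleep(0)  # yield to the event loop like real I/O would
    log.append(name)
    return name


async def poll_once(device_ids: list, log: list) -> int:
    """One polling cycle: the device list first, then 4 queries per device."""
    await fake_request("device_list", log)
    per_device = ("query_a", "query_b", "query_c", "query_d")  # hypothetical names
    await asyncio.gather(
        *(fake_request(f"{query}:{dev}", log)
          for dev in device_ids
          for query in per_device)
    )
    return len(log)


if __name__ == "__main__":
    calls = []
    total = asyncio.run(poll_once(["dev1", "dev2"], calls))
    print(total, "requests in one cycle")
```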

Still have no idea how to resolve this apart from bypassing opnsense [LAN] > [WAN].

The only thing left for you to investigate is what happens to that SYN frame (from W11) when it reaches the edge router.

I don't know if your edge router (ASUS?) allows packet capture.
If it can't, maybe you have a switch that supports port mirroring (to mirror the router WAN port to a PC running Wireshark).
Or maybe you have a spare machine on which you can run OPN as a filtering bridge (only long enough to capture).

If you don't see the frame out on edge_router WAN, you know where the problem is...
Otherwise, you would look for a SYN-ACK reply. If you don't get it, the problem is between your edge router and the web server.
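
The two checks above boil down to a small decision table, sketched here in Python. The function name and wording of the results are made up for illustration, and the third outcome (SYN-ACK seen on the edge router WAN) is an inference, not something stated in this thread.

```python
def locate_drop(syn_seen_on_edge_wan: bool, syn_ack_seen_on_edge_wan: bool) -> str:
    """Map what a capture on the edge router's WAN side shows to a likely culprit."""
    if not syn_seen_on_edge_wan:
        # The SYN reached the edge router from OPN but never left it
        return "edge router: the SYN never left its WAN port"
    if not syn_ack_seen_on_edge_wan:
        # The SYN went out but no reply came back
        return "beyond the edge router: between its WAN port and the web server"
    # Inference: the reply arrived, so it must be lost on the way back in
    return "return path: the SYN-ACK reached the edge router but not the client"


for observation in ((False, False), (True, False), (True, True)):
    print(observation, "->", locate_drop(*observation))
```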

In any case, the top-level page appears to be the default nginx page.
The integration code shows a proprietary authentication scheme (some application-level with a hardcoded string, some user-level based on creds you probably supplied to HA).
Is there more to this web site/service?
What are you actually trying to do?