Some Chinese sites not returning packets when accessed via OPNsense.

Started by raybies, April 23, 2025, 07:55:58 AM

Hi all,

I'm running into a frustrating issue where certain websites, primarily smaller sites hosted in China (e.g., www.dessmonitor.com IP: 8.210.123.202), are inaccessible when routing through my OPNsense firewall, while major international sites work fine.

Symptoms:

DNS Resolution Works: nslookup from a LAN client correctly resolves the domain names to their public IP addresses via OPNsense Unbound.
Standard Ping Works: ping 8.210.123.202 from the LAN client receives replies successfully.
Browser Fails: Accessing the site via a browser (Chrome, Edge) results in a timeout error (e.g., ERR_CONNECTION_TIMED_OUT).
curl Confirms Timeout: Using curl -v https://www.dessmonitor.com from the client successfully resolves the DNS name but fails during the TCP connection attempt:
*   Trying 8.210.123.202:443...
* connect to 8.210.123.202 port 443 from 0.0.0.0 port [port] failed: Timed out
* Failed to connect to www.dessmonitor.com port 443 after [time] ms: Couldn't connect to server
curl: (28) Failed to connect to www.dessmonitor.com port 443 after [time] ms: Couldn't connect to server
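
For anyone who wants to reproduce the same check without curl, here is a minimal Python sketch of a bare TCP connect test. The function name tcp_connect_check and the 5-second default timeout are my own choices for illustration; it only distinguishes "connected" from "timeout" from other socket errors and does nothing with TLS.

```python
import socket


def tcp_connect_check(host, port, timeout=5.0):
    """Try one TCP connection and report how it ended.

    Returns "connected", "timeout", or "error: <reason>".
    The name and the default timeout are illustrative assumptions.
    """
    try:
        # create_connection performs the TCP three-way handshake only
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except socket.timeout:
        # No SYN-ACK reached this host before the timeout expired
        return "timeout"
    except OSError as exc:
        # e.g. connection refused, network unreachable
        return f"error: {exc}"


# Example call (needs network access):
# print(tcp_connect_check("8.210.123.202", 443))
```

Running it from a LAN client and again from a machine upstream of OPNsense should show whether the handshake itself fails only behind the firewall.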

My Setup:
OPNsense Version: currently 25.1.5_5; previously tried 24.7.1 through 24.7.12
Basic Setup: WAN (DHCP/Static, ISP Gateway 10.1.1.1, 2 x NAT), LAN (10.0.0.0/16).
WireGuard is configured for selective routing (using alias) but the client experiencing this issue is intended to use the standard WAN gateway. WireGuard hosts can access the site.
Bypassing OPNsense and connecting directly to WAN router loads https://www.dessmonitor.com.

Unbound DNS is used for LAN clients. DNSBL is enabled but doesn't seem related: nslookup works, and the connections were failing even before I enabled Unbound.
Client OS: Windows 10/11

Troubleshooting Steps Taken:
DNS: Confirmed resolution works via nslookup and curl. Disabled Chrome's Secure DNS.
Basic Connectivity: Standard ping to the destination IP works fine. Access to major international sites (Google, Cloudflare, etc.) works fine.

Firewall Rules: Reviewed Firewall -> Rules -> LAN. Temporarily disabled all rules except the default "allow LAN to any" rule. The issue persisted. No specific rules appear to be blocking this destination IP/port. Checked Live Logs while attempting connection - no blocks shown for the destination IP.

Outbound NAT: Using Hybrid mode with a specific rule for the WireGuard alias. Tested switching to Automatic Outbound NAT - issue persisted.
Gateway Status: WAN gateway is online and functioning for other traffic.

MTU/MSS:
Performed ping <dest_ip> -f -l <size> tests from the client.
Determined Path MTU is 1488 (largest successful payload -l 1460).
Enabled MSS clamping under Firewall -> Settings -> Advanced -> Enable Maximum MSS and set the value explicitly to 1448 (1488 - 40).
Reset firewall states (Firewall -> Diagnostics -> States -> Reset).

Result: This did NOT resolve the timeout issue.
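
To make the arithmetic above explicit, here is a small Python sketch. It assumes plain IPv4 with a 20-byte IP header, an 8-byte ICMP header, and a 20-byte TCP header (no IP or TCP options); the function names are just for illustration.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes, ICMP echo header
TCP_HEADER = 20   # bytes, assuming no TCP options


def path_mtu_from_ping_payload(payload_size):
    """Path MTU implied by the largest unfragmented ping payload (ping -f -l <size>)."""
    return payload_size + IPV4_HEADER + ICMP_HEADER


def mss_for_mtu(mtu):
    """Largest TCP segment payload that fits into one packet of the given MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER


print(path_mtu_from_ping_payload(1460))  # 1488
print(mss_for_mtu(1488))                 # 1448
print(mss_for_mtu(1500))                 # 1460
```

So a 1460-byte ping payload corresponds to a 1488-byte path MTU, and 1448 is the matching MSS, which is the value I entered.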

Normalization: Reviewed Firewall -> Settings -> Normalization. Default scrub is enabled. A specific max-mss rule exists only for the WG1 (WireGuard) interface, not WAN. Tested disabling normalization globally - issue persisted.

IDS/IPS: Intrusion Detection (Suricata) is currently disabled.
States: Reset firewall states multiple times after configuration changes.

Request for Help:

Given that DNS resolution works, standard ping works, and MSS clamping based on tested Path MTU didn't resolve the TCP connection timeout specifically for these sites, what else should I investigate within OPNsense?

Are there specific logs (beyond Unbound and basic filtered live view) I should enable or examine?

Could this be related to state handling, specific NAT behavior, or upstream routing issues that OPNsense might interact with differently for these paths?

How can I effectively trace or debug the failing TCP connection as it passes through OPNsense? For example, what are the recommended steps/filters for using Packet Capture (Interfaces -> Diagnostics -> Packet Capture) on the LAN and WAN interfaces simultaneously to see where the SYN packet goes and if a SYN-ACK returns?

Any pointers on what specific settings or diagnostics to try next would be greatly appreciated.

Remember, this ONLY affects some smaller Chinese sites, not just https://www.dessmonitor.com, and the issue shows up even on a new install with very little configured. Usually the site fails to load as if it were down, but when I connect through a browser proxy or from upstream of the OPNsense box, it loads fine.

Thanks!

Quote from: raybies on April 23, 2025, 07:55:58 AM
Bypassing OPNsense and connecting directly to WAN router loads https://www.dessmonitor.com.

Run a Wireshark capture while accessing the site via the WAN router, then another through OPNsense, and see what is different.

Is Zenarmor or Crowdsec enabled? MTU might still be a problem: I can confirm that the real maximum achievable MTU to www.dessmonitor.com is indeed 1500, so if you see 1488, it is on your side. To fix that, see this.

Thanks for the responses.
Zenarmor or Crowdsec NOT enabled.



Capture Analysis - Tracking the Connection:

Client Sends SYN:

capture_client_23-4_2.csv Packet 126 (Time 8.449): 10.0.1.15 -> 8.210.123.202 [SYN] (Src Port 58821)

capture_client_23-4_2.csv Packet 128 (Time 8.584): 10.0.1.15 -> 8.210.123.202 [SYN] (Src Port 58822)

(Client capture shows many subsequent SYN retransmissions)

OPNsense Forwards SYN (and performs NAT):

capture_opnsense_23-4_2.csv Packet 11 (Time 1.239): 10.1.1.100 (OPNsense WAN IP) -> 8.210.123.202 [SYN] (Src Port 33933 - Note the NATed source port)

(This confirms OPNsense received the client's SYN on LAN and sent it out the WAN)

Server Responds with SYN-ACK:

capture_opnsense_23-4_2.csv Packet 12 (Time 1.450): 8.210.123.202 -> 10.1.1.100 (OPNsense WAN IP) [SYN, ACK] (Dest Port 33933)

(This is crucial! The server did respond, and OPNsense received the SYN-ACK on its WAN interface.)

OPNsense Completes Handshake (WAN side):

capture_opnsense_23-4_2.csv Packet 13 (Time 1.451): 10.1.1.100 -> 8.210.123.202 [ACK]

(OPNsense sends the final ACK for the handshake out the WAN)

The Failure Point - SYN-ACK Not Reaching Client:

Compare Packet 12 from the OPNsense capture (SYN-ACK arriving at OPNsense WAN) with the entire client capture.

There is NO corresponding SYN-ACK packet arriving at the client 10.0.1.15 from 8.210.123.202 in capture_client_23-4_2.csv.
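
The comparison above can be sketched in Python roughly as follows. This assumes the CSV files are Wireshark packet-list exports with the default "Source", "Destination" and "Info" columns, and that the Info text contains the flag strings "[SYN]" and "[SYN, ACK]"; the function names are hypothetical, and this is only a rough counter, not a full flow tracker.

```python
import csv


def load_rows(path):
    """Read a Wireshark CSV export into a list of dicts (column names assumed)."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def handshake_summary(rows, server_ip):
    """Count SYNs sent to the server and SYN-ACKs received from it."""
    syn_sent = 0
    syn_ack_received = 0
    for row in rows:
        info = row.get("Info", "")
        if "[SYN, ACK]" in info and row.get("Source") == server_ip:
            syn_ack_received += 1
        elif "[SYN]" in info and row.get("Destination") == server_ip:
            syn_sent += 1
    return {"syn_sent": syn_sent, "syn_ack_received": syn_ack_received}


# Example usage with the capture files named above:
# print(handshake_summary(load_rows("capture_client_23-4_2.csv"), "8.210.123.202"))
# print(handshake_summary(load_rows("capture_opnsense_23-4_2.csv"), "8.210.123.202"))
```

If the WAN capture reports SYN-ACKs while the client capture reports none, the reply is being lost somewhere between the OPNsense WAN interface and the client. Note that both captures need to cover the same connection attempt for the counts to be comparable.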


Dozens of times, I have:
Reset states.
Rebooted OPNsense.
Reviewed firewall logs (filtered): all green.

I don't get it (could be my problem).
I ran a quick capture for some random website, downloaded it, and opened the captures in Wireshark.
Both look identical apart from the NAT rewrites.

Everything OPN does is a reflection of what the client does.
In particular, I don't expect OPN to ACK anything on its own.
The ACK from OPN is the NAT equivalent of the same packet from the client, because the client got the [SYN, ACK].
This said, the time offsets don't necessarily align across both captures so it could be confusing.

The whole issue is confounding, which is why I am here. Why does it affect only some Chinese sites? Why did it work for a very short period just after reinstalling v25.1 (after setting up the interfaces and IP ranges, but before accessing the UI)?
I've reinstalled OPNsense several times and, as a last resort, even changed the entire machine, thinking it could be a HW issue.

Also note that this doesn't just affect my Windows test client; it affects every client (e.g., Ubuntu servers, Proxmox, etc.). The only commonality is the IP range 10.0.0.0/16.

Can you share the network captures from OPN for the case that fails?
OPN's WAN IP seems to be a private network IP so I don't expect the capture to contain anything private/sensitive.

Can you put your WAN router in modem mode? NAT strictly speaking breaks HTTP and double NAT doubly so.

OPNSense [WAN] capture: https://drive.google.com/file/d/13e1IGOWSGeyIsPOLwRwv8LTwHQQl6RVT/view?usp=sharing
I'm not sure what modem mode means. It's an Asus Wi-Fi router running Merlin firmware; it performs PPPoE with my ISP and does the first NAT.

I know 2xNAT is bad, but it's way less bad than having the Mrs go nuts on me because something weird has happened on the opnsense side, so she connects directly to Asus Merlin fw wifi, bypassing "my LAN". It also allows me to quickly troubleshoot an issue's location, among other benefits. I'm debating IPv6.

It's a little weird.
The capture refers to a web.dess... that resolves to the IP in the OP (apparently a Hong Kong version of the site).

For me, www.dess... refers to the Americas version of the site.
Interestingly, curl https://web.dess... (same IP as OP) returns something that looks like the nginx default page.
curl https://www.dess... returns a more complete page (and actually leaves the connection intact)

There's nothing obviously wrong in the capture.
The returned content is smaller than what I get (675 versus 1137), though the request could be different; then the connection is torn down.
Then new connections are established. The sizes aren't similar so likely not retries. Chasing links in a browser (not curl)?
I obviously can't peek in the encrypted content...

That's my interpretation... I could be wrong.

I think they're just using Host Headers/Virtual Hosts to load different branded sites on the same server/IP.
So how do I resolve the OPNsense issue? It only fails when accessing through my OPNsense.