Can no longer access smb or ssh on one device in VLAN

Started by verlenord, April 23, 2025, 10:54:50 AM

Hello,

I need your help :-)
First, I'd like to point out that I'm a beginner and that I built my setup with the help of various tutorials on the internet. Please forgive me if I use terms that are incorrect or imprecise. I'm a fast learner, but I still have a lot of gaps...

For about 10 days now, I've been unable to access the various internal services on my NAS (SMB, SSH, etc.). I can't pinpoint the exact change that led to this problem, but I suspect it started with the latest OPNsense 25.1.5 upgrade.
All services exposed via reverse proxy are accessible without problems, but I can no longer mount shared volumes locally or connect via SSH to my NAS when I'm on the current VLAN.

I have 4 VLANs in my network. The main NAS (Synology DS920+), another NAS and a Raspberry Pi are on the DMZ VLAN. All other laptop-type devices are on a USER VLAN, and the various firewall rules for accessing devices in the DMZ VLAN have always worked well so far. I can still access the other NAS and the Raspberry Pi via SSH without any problems.

I've looked at the problem from every angle and suspected the LAGG, the switch, and the settings on the Synology, but I can't get anywhere. I can access the NAS without problems when I'm connected to the LAN for testing, but not from the VLAN. Surprisingly, I can also connect to the NAS using SSH or SMB when I'm on the WireGuard VPN (I have a firewall rule that allows this).

Anyway, if anyone could help me find the problem with methodology, I'd be infinitely grateful :-)

I suspect that you have an asymmetric routing issue due to layer 2 leakage.

For troubleshooting something like that, Interfaces: Diagnostics: Packet Capture is your best friend.
E.g. to verify proper routing of SSH packets, select the incoming and outgoing interfaces and enter 22 (or whatever port your SSH server is listening on) for the port filter and start the capture.
Then try to access the NAS via SSH.
Then stop the capture and check the result. Make sure that you see the packets on both interfaces.

ssh is pretty simple and by the time you get a login prompt, you should have enough data.
From USER.client to DMZ.NAS1:22, you should see a request packet in on USER, then out on DMZ (destination port = 22). The reply packet comes in on DMZ then out on USER (source port = 22).
"In" and "out" are from the perspective of the firewall.
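
To make the expected flow concrete, here is a minimal Python sketch (not an OPNsense feature) that takes packets noted from the two captures and reports which of the four legs were seen. The `Packet` class, the `check_flow` function and the addresses in the example are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    interface: str  # capture interface the packet was seen on, e.g. "USER"
    src: str
    dst: str
    sport: int
    dport: int


def check_flow(packets, client, server, port, client_if="USER", server_if="DMZ"):
    """Report which of the four expected legs of a client -> server flow appear."""

    def seen(interface, src, dst, is_request):
        # Requests are matched on destination port, replies on source port.
        return any(
            p.interface == interface
            and p.src == src
            and p.dst == dst
            and (p.dport == port if is_request else p.sport == port)
            for p in packets
        )

    return {
        "request on client side": seen(client_if, client, server, True),
        "request on server side": seen(server_if, client, server, True),
        "reply on server side": seen(server_if, server, client, False),
        "reply on client side": seen(client_if, server, client, False),
    }


# Illustrative capture: the request crosses the firewall, no reply comes back.
example = [
    Packet("USER", "10.0.2.5", "10.0.1.10", 50000, 22),
    Packet("DMZ", "10.0.2.5", "10.0.1.10", 50000, 22),
]
for leg, ok in check_flow(example, "10.0.2.5", "10.0.1.10", 22).items():
    print(f"{leg}: {'seen' if ok else 'MISSING'}")
```

If both request legs are seen but both reply legs are missing, the firewall forwarded the request and the reply never reached it.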

If you try to ssh from another host in DMZ, you should see no logging.

FWIW, the DMZ term is typically used for the subnet containing the external-facing services of an org (web, ftp, ...), separate from the more protected LAN.

Thanks for your feedback :-)

Here is what I got with Packet Capture:

On the vlan01.DMZ

No.   Time   Source   Destination   Protocol   Length   Info
1   0.000000   192.168.20.45   192.168.10.10   TCP   78   60299 → 441 [SYN] Seq=0 Win=65535 Len=0 MSS=1460 WS=64 TSval=2164839612 TSecr=0 SACK_PERM
2   1.002296   192.168.20.45   192.168.10.10   TCP   78   [TCP Retransmission] 60299 → 441 [SYN] Seq=0 Win=65535 Len=0 MSS=1460 WS=64 TSval=2164840612 TSecr=0 SACK_PERM
3   2.000752   192.168.20.45   192.168.10.10   TCP   78   [TCP Retransmission] 60299 → 441 [SYN] Seq=0 Win=65535 Len=0 MSS=1460 WS=64 TSval=2164841612 TSecr=0 SACK_PERM
4   3.002084   192.168.20.45   192.168.10.10   TCP   78   [TCP Retransmission] 60299 → 441 [SYN] Seq=0 Win=65535 Len=0 MSS=1460 WS=64 TSval=2164842613 TSecr=0 SACK_PERM
5   4.000046   192.168.20.45   192.168.10.10   TCP   78   [TCP Retransmission] 60299 → 441 [SYN] Seq=0 Win=65535 Len=0 MSS=1460 WS=64 TSval=2164843613 TSecr=0 SACK_PERM
6   5.000820   192.168.20.45   192.168.10.10   TCP   78   [TCP Retransmission] 60299 → 441 [SYN] Seq=0 Win=65535 Len=0 MSS=1460 WS=64 TSval=2164844614 TSecr=0 SACK_PERM
7   7.003733   192.168.20.45   192.168.10.10   TCP   78   [TCP Retransmission] 60299 → 441 [SYN] Seq=0 Win=65535 Len=0 MSS=1460 WS=64 TSval=2164846614 TSecr=0 SACK_PERM


And on the vlan02.USER

No.   Time   Source   Destination   Protocol   Length   Info
1   0.000000   192.168.20.45   192.168.10.10   TCP   78   60299 → 441 [SYN] Seq=0 Win=65535 Len=0 MSS=1460 WS=64 TSval=2164839612 TSecr=0 SACK_PERM
2   1.002349   192.168.20.45   192.168.10.10   TCP   78   [TCP Retransmission] 60299 → 441 [SYN] Seq=0 Win=65535 Len=0 MSS=1460 WS=64 TSval=2164840612 TSecr=0 SACK_PERM
3   2.000814   192.168.20.45   192.168.10.10   TCP   78   [TCP Retransmission] 60299 → 441 [SYN] Seq=0 Win=65535 Len=0 MSS=1460 WS=64 TSval=2164841612 TSecr=0 SACK_PERM
4   3.002150   192.168.20.45   192.168.10.10   TCP   78   [TCP Retransmission] 60299 → 441 [SYN] Seq=0 Win=65535 Len=0 MSS=1460 WS=64 TSval=2164842613 TSecr=0 SACK_PERM
5   4.000108   192.168.20.45   192.168.10.10   TCP   78   [TCP Retransmission] 60299 → 441 [SYN] Seq=0 Win=65535 Len=0 MSS=1460 WS=64 TSval=2164843613 TSecr=0 SACK_PERM
6   5.000877   192.168.20.45   192.168.10.10   TCP   78   [TCP Retransmission] 60299 → 441 [SYN] Seq=0 Win=65535 Len=0 MSS=1460 WS=64 TSval=2164844614 TSecr=0 SACK_PERM
7   7.003805   192.168.20.45   192.168.10.10   TCP   78   [TCP Retransmission] 60299 → 441 [SYN] Seq=0 Win=65535 Len=0 MSS=1460 WS=64 TSval=2164846614 TSecr=0 SACK_PERM


I don't know how to interpret the results, but it seems to talk both ways...

And the live view: (attachment not available)


Quote from: EricPerl on April 23, 2025, 09:08:57 PM
FWIW, the DMZ term is typically used for the subnet containing the external-facing services of an org (web, ftp, ...), separate from the more protected LAN.

It seems to me that this is precisely the idea in this situation because all the services exposed to the Internet run on my NAS, which is in the DMZ? Or am I mistaken in my understanding?

The packet capture shows only SYN packets from 192.168.20.45 to 192.168.10.10, but there is no response from 192.168.10.10 at all.

Two possible reasons come to mind for this:
- the response packets take a different path
- the request or response packets are sent out on the wrong interface

The root cause of the first one could be layer 2 leakage on another device (the switch), as already mentioned.

The second could be caused by wrong network settings on one of the devices involved. Check that the network mask and the gateway are set correctly on every device.

Note that this issue might not have anything to do with firewall rules. You just have to ensure that the connection from 192.168.20.45 to 192.168.10.10 is allowed.
Do you see any related blocks in the firewall log?
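
If the capture gets long, counting rows by hand is tedious. Here is a rough Python sketch that counts client SYNs and any packets coming back from the server. It assumes the rows were copied as text in the column layout shown in this thread, with columns separated by two or more spaces; `parse_row` and `summarize` are hypothetical helper names.

```python
import re

# Matches a flag list such as "[SYN]" or "[SYN, ACK]",
# but not annotations like "[TCP Retransmission]".
FLAGS = re.compile(r"\[([A-Z]+(?:, [A-Z]+)*)\]")


def parse_row(line):
    """Split one exported capture row into (src, dst, flags), or None."""
    fields = re.split(r"\s{2,}", line.strip(), maxsplit=6)
    if len(fields) < 7:
        return None
    _, _, src, dst, _, _, info = fields
    match = FLAGS.search(info)
    flags = frozenset(match.group(1).split(", ")) if match else frozenset()
    return src, dst, flags


def summarize(lines, client, server):
    """Return (SYNs sent by client, packets of any kind sent back by server)."""
    syns = replies = 0
    for line in lines:
        row = parse_row(line)
        if row is None:
            continue
        src, dst, flags = row
        if src == client and dst == server and flags == {"SYN"}:
            syns += 1
        elif src == server and dst == client:
            replies += 1
    return syns, replies


rows = [
    "1   0.000000   192.168.20.45   192.168.10.10   TCP   78   60299 → 441 [SYN] Seq=0 Win=65535 Len=0",
    "2   1.002296   192.168.20.45   192.168.10.10   TCP   78   [TCP Retransmission] 60299 → 441 [SYN] Seq=0 Win=65535 Len=0",
]
print(summarize(rows, "192.168.20.45", "192.168.10.10"))  # (2, 0)
```

A non-zero SYN count with zero replies on the DMZ interface means the server side never answered through the firewall.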

April 24, 2025, 01:39:50 PM #5 Last Edit: April 24, 2025, 01:59:23 PM by verlenord
So, to give a little more context on the network configuration, OPNsense has been configured following this tutorial: https://homenetworkguy.com/how-to/set-up-a-fully-functioning-home-network-using-opnsense/

This means that all VLANs pass through the LAGG. This is presumably configured correctly on the switch (Cisco SG300-10). No recent changes have been made to the switch and it was working well until recently.
The NAS is also connected to the switch with a LAGG, in VLAN 10 access mode.

Among the things I've tried :
- reboot the NAS, switch and router
- connect the NAS with a single port and remove the lagg
- change the NAS ip
- check the firewall settings on the NAS
- try SSH from other laptops in vlan.USER to the NAS, without success

I'd like to point out that in this vlan.DMZ I have 2 other devices to which I connect via ssh from vlan.USER without any problems.
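
For anyone who wants to repeat this comparison quickly from a laptop, here is a small sketch using only the Python standard library. `probe` is a hypothetical helper name. It distinguishes a port that answers, a port that actively refuses, and a connection attempt that simply times out (which is what unanswered SYNs look like from the client side).

```python
import socket


def probe(host, port, timeout=3.0):
    """Try a TCP connection and classify the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered with a reset: reachable, but nothing listens there.
        return "refused"
    except socket.timeout:
        # No answer at all within the timeout: packets dropped or replies lost.
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. no route to host.
        return f"error: {exc}"
```

For example, `probe("192.168.10.10", 22)` could be run against the NAS and then against the other two DMZ devices to compare the results.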

On the NAS in question :
Fixed IP: 192.168.10.10
Gateway: 192.168.10.1
Network mask: 255.255.255.0
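
As a sanity check on these values, here is a short sketch with Python's standard `ipaddress` module. The helper names `check_host_config` and `same_subnet` are made up for this example. It confirms that the gateway lies inside the NAS subnet and that the laptop (192.168.20.45) does not, so replies to the laptop have to be routed rather than delivered directly.

```python
import ipaddress


def check_host_config(ip, netmask, gateway):
    """Return the host's network and whether the gateway is directly reachable."""
    iface = ipaddress.ip_interface(f"{ip}/{netmask}")
    gw = ipaddress.ip_address(gateway)
    return {"network": str(iface.network), "gateway_on_link": gw in iface.network}


def same_subnet(ip, netmask, other):
    """True if `other` is in the same subnet as the host ip/netmask."""
    network = ipaddress.ip_interface(f"{ip}/{netmask}").network
    return ipaddress.ip_address(other) in network


print(check_host_config("192.168.10.10", "255.255.255.0", "192.168.10.1"))
# {'network': '192.168.10.0/24', 'gateway_on_link': True}
print(same_subnet("192.168.10.10", "255.255.255.0", "192.168.20.45"))
# False
```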

On my laptop, there is no particular configuration other than a fixed IP in my network.

On the firewall, I have a rule in USER that allows 192.168.20.45 to talk to DMZ net on any port.

One of the things I've been testing over the past month is enabling Layer4 Proxy in Caddy for SSH access on my NAS. This works fine. However, I'm not at home right now to test disabling this.

So if you can connect to other devices in the DMZ but not to the NAS, remember that the destination device itself runs its own firewall, which is possibly blocking the access.

Yes, of course. This is why I tried disabling the firewall on the NAS side, but it didn't change anything.
And it was working flawlessly until about a month ago.

So there must be something else wrong.

Sniff the traffic on the interface of the NAS to see if the request packets arrive and if it sends out responses.

On OPNsense you see the packets going out, but nothing is coming back. Hence further searching for the mistake on OPNsense is a waste of time.

OK, makes sense.

I'll run a tcpdump on the NAS side this evening and get back to you with the results.

So the problem was indeed with the NAS.
On Synology, in Network > General > Advanced settings, the "enable multiple gateways" option was unchecked.

I had disabled this function because it was causing me problems with my Docker networks, which could no longer communicate with each other.

I ended up leaving this function disabled but added a static route to 192.168.20.0.
I don't know if this is the right way to do it, but it works.
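
For readers wondering why a static route can change where replies go, here is a minimal sketch of longest-prefix-match route selection in Python. It is only an illustration and not the real NAS routing table. The default gateway 198.51.100.1 is invented, and the /24 prefix and the next hop 192.168.10.1 for the static route are assumptions, since only the destination 192.168.20.0 was stated.

```python
import ipaddress


def pick_route(routes, destination):
    """Return the gateway of the most specific route matching the destination."""
    dest = ipaddress.ip_address(destination)
    matching = [
        (ipaddress.ip_network(prefix), gateway)
        for prefix, gateway in routes
        if dest in ipaddress.ip_network(prefix)
    ]
    if not matching:
        return None
    # The longest prefix (most specific network) wins.
    return max(matching, key=lambda item: item[0].prefixlen)[1]


# Hypothetical table: replies would follow some other default gateway.
routes = [("0.0.0.0/0", "198.51.100.1")]
print(pick_route(routes, "192.168.20.45"))  # 198.51.100.1

# Assumed static route: replies to the USER VLAN now go via the DMZ gateway.
routes.append(("192.168.20.0/24", "192.168.10.1"))
print(pick_route(routes, "192.168.20.45"))  # 192.168.10.1
```

The point is only that a more specific route overrides the default for that one destination network, which matches the observed behaviour of the fix.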

Anyway, thanks a lot for the help! I've learned more how to use tcpdump and other tools, and I'm very grateful :-)

Quote from: verlenord on April 25, 2025, 11:48:19 AM
"enable multiple gateways"
I don't know what this setting does on the NAS. But to work around routing issues, you can also masquerade the traffic to it with an outbound NAT rule on OPNsense, so that the NAS only sees the OPNsense DMZ IP as the source of the access.