[SOLVED] Routing table not used for LAN clients (eg. for remote OpenVPN site)

Started by CDuv, July 16, 2018, 12:29:12 PM

I have a routing issue: when trying to access a remote OpenVPN-accessible network, traffic leaves via the WAN (Internet) instead of following the router's routing table.

Context:
I have 3 sites, all running OPNsense as their router: Site 1 with LAN_1 (192.168.1.0/24), Site 2 with LAN_2 (192.168.2.0/24) and Site 3 with LAN_3 (192.168.3.0/24).
The OpenVPN server is on site 1's router.
Sites 2 and 3 have OpenVPN clients configured.

Problem:
The OpenVPN tunnel between sites 1 and 2 is fine: LAN_1 can reach LAN_2 and LAN_2 can reach LAN_1.
But the tunnel between sites 1 and 3 has issues: LAN_3 can reach LAN_1 (pinging 192.168.1.x from LAN_3 is OK) but LAN_1 cannot reach LAN_3 (pinging 192.168.3.x from LAN_1 fails).

It looks like a routing issue because, from LAN_1, a traceroute to a LAN_3 IP shows traffic goes toward my WAN/Internet:

user@machine_at_site_1:~# traceroute 192.168.3.1
traceroute to 192.168.3.1 (192.168.3.1), 30 hops max, 60 byte packets
1  80.10.x.y (80.10.x.y)  0.127 ms  0.100 ms  0.112 ms
2  * * *
3  * * *
[...]


(80.10.x.y is the IP address of site 1's WAN gateway)

In the opposite direction (from LAN_3 to LAN_1) it seems to work fine:

user@machine_at_site_3:~# traceroute 192.168.1.1
traceroute to 192.168.1.1 (192.168.1.1), 30 hops max, 60 byte packets
1  router.site3.example.com (192.168.3.254)  0.127 ms  0.100 ms  0.112 ms
2  192.168.254.17 (192.168.254.17)  62.583 ms  63.738 ms  60.224 ms
3  192.168.1.1 (192.168.1.1)  62.612 ms  63.870 ms  62.489 ms


(192.168.3.254 is site 3's OPNsense, and 192.168.254.17 the OpenVPN interface for the Site 1-Site 3 tunnel)

Meanwhile, a traceroute from LAN_1 to LAN_2 (the tunnel that works) shows traffic going through the OPNsense VPN server, as expected:

user@machine_at_site_1:~# traceroute 192.168.2.1
traceroute to 192.168.2.1 (192.168.2.1), 30 hops max, 60 byte packets
1  router.site1.example.com (192.168.1.254)  0.127 ms  0.100 ms  0.112 ms
2  192.168.254.14 (192.168.254.14)  62.583 ms  63.738 ms  60.224 ms
3  192.168.2.1 (192.168.2.1)  62.612 ms  63.870 ms  62.489 ms


(192.168.1.254 is site 1's OPNsense, and 192.168.254.14 the OpenVPN interface for the Site 1-Site 2 tunnel)

Also, a traceroute executed directly from site 1's OPNsense (to either Site 2 or Site 3) works fine.

I cannot understand why the routing table works for one site and not the other, or why traffic from LAN_1 machines does not use the routing table...

Quote from: CDuv on July 16, 2018, 12:29:12 PM
user@machine_at_site_1:~# traceroute 192.168.3.1
traceroute to 192.168.3.1 (192.168.3.1), 30 hops max, 60 byte packets
1  80.10.x.y (80.10.x.y)  0.127 ms  0.100 ms  0.112 ms
2  * * *
3  * * *
[...]


In my opinion it's unusual that the WAN IP address is the first hop that answers your traceroute, isn't it?
I mean, when you traceroute site 2, you see the IP of your OPNsense, 192.168.1.254, as the first hop. That means OPNsense answers ICMP requests in general. So why doesn't it when you traceroute site 3?
Duck, Duck, Duck, Duck, Duck, Duck, Duck, Duck, Goose

You are right, this is also part of the mystery: not having OPNsense as the first hop is the same behavior as when pinging the Internet (eg. "google.com").

I tested from several machines of LAN_1: same result.

Can this be caused by misconfiguration at:

  • the OpenVPN level?
  • the firewall level?

Does your openvpn tunnel use a tap or a tun device?

Bart...

They all (I have 6 OpenVPN servers defined on LAN_1's OPNsense) use tun, over UDP, with the exact same settings except for (obviously):

  • Local port
  • Shared Key
  • IPv4 Tunnel Network
  • IPv4 Local Network
  • IPv4 Remote Network

I just upgraded both OPNsense to v18.1.12: issue remains...
Why would it work for 192.168.1.x and 192.168.2.x but not for 192.168.3.x ?

Have you checked if there's a typo in the subnet mask of one or more interfaces?
Could you post your routing table?

Strange problem... but I'd like to solve it  ;)


EDIT: I'm still thinking about the traceroute and the WAN IP that appears as the first address. Is there an active NAT rule on the LAN interface which could be the reason for this?

With or without route to LAN_2 on OPNsense LAN_1, my first hop when tracerouting from LAN_1 to LAN_2 (working tunnel) is still the LAN_1's OPNsense.

I'll obfuscate my routing table and post it...

I have a NAT port forward on the LAN interface for Squid (on a specific port) and the "Anti-Lockout Rule", and nothing more on the LAN interface.

I have double checked and cannot find any typo in network address or mask.

Here is my routing table of Site 1's OPNsense:

root@opnsense.site1.example.com:~ # netstat -rn
Routing tables

Internet:
Destination        Gateway                Flags     Netif Expire
default            80.10.10.9             UGS        igb1         # Default Internet access via WAN_1
8.8.4.4            80.10.10.9             UGHS       igb1         # WAN_1 Monitoring
8.8.8.8            192.168.202.1          UGHS       igb2         # WAN_2 Monitoring
80.67.169.12       192.168.203.1          UGHS   igb4_vla         # WAN_3 Monitoring

80.10.10.8/30      link#2                 U          igb1         # WAN_1
80.10.10.10        link#2                 UHS         lo0         # WAN_1

127.0.0.1          link#7                 UH          lo0

192.168.1.0/24     link#1                 U          igb0         # LAN_1
192.168.1.250      link#1                 UHS         lo0         # LAN_1 OPNsense virtual IP
192.168.1.254      link#1                 UHS         lo0         # LAN_1 OPNsense IP

192.168.2.0/24     192.168.254.2          UGS      ovpns3         # LAN_2 via OpenVPN tunnel
192.168.3.0/24     192.168.254.6          UGS      ovpns2         # LAN_3 via OpenVPN tunnel
192.168.4.0/24     192.168.254.10         UGS      ovpns4         # LAN_4 via OpenVPN tunnel
192.168.5.0/24     192.168.254.14         UGS      ovpns5         # LAN_5 via OpenVPN tunnel
192.168.6.0/24     192.168.254.18         UGS      ovpns6         # LAN_6 via OpenVPN tunnel

192.168.66.0/24    link#17                U      igb4_vla         # Guest_1 network
192.168.66.1       link#17                UHS         lo0         # Guest_1 OPNsense virtual IP
192.168.66.4       link#17                UHS         lo0         # Guest_1 OPNsense IP
192.168.67.0/24    link#18                U      igb4_vla         # Guest_2 network
192.168.67.1       link#18                UHS         lo0         # Guest_2 OPNsense virtual IP
192.168.67.4       link#18                UHS         lo0         # Guest_2 OPNsense IP

192.168.202.0/24   link#3                 U          igb2         # WAN_2 network
192.168.202.5      link#3                 UHS         lo0         # WAN_2 OPNsense virtual IP
192.168.202.6      link#3                 UHS         lo0         # WAN_2 OPNsense IP

192.168.203.0/24   link#19                U      igb4_vla         # WAN_3 network
192.168.203.2      link#19                UHS         lo0         # WAN_3 OPNsense IP

192.168.252.0/30   link#6                 U          igb5         # Inter-OPNsense synchronization interface (not used)
192.168.252.1      link#6                 UHS         lo0         # Inter-OPNsense synchronization interface (not used)

192.168.253.0/24   192.168.253.2          UGS      ovpns1         # Mobile OpenVPN clients network
192.168.253.1      link#11                UHS         lo0
192.168.253.2      link#11                UH       ovpns1

192.168.254.0&0xac1e0102 192.168.254.2    UGS      ovpns3         # Site 1-Site 2 OpenVPN tunnel
192.168.254.1      link#13                UHS         lo0
192.168.254.2      link#13                UH       ovpns3
192.168.254.4&0xac1e0106 192.168.254.6    UGS      ovpns2         # Site 1-Site 3 OpenVPN tunnel
192.168.254.5      link#12                UHS         lo0
192.168.254.6      link#12                UH       ovpns2
192.168.254.8&0xac1e010a 192.168.254.10   UGS      ovpns4         # Site 1-Site 4 OpenVPN tunnel
192.168.254.9      link#14                UHS         lo0
192.168.254.10     link#14                UH       ovpns4
192.168.254.12&0xac1e010e 192.168.254.14  UGS      ovpns5         # Site 1-Site 5 OpenVPN tunnel
192.168.254.13     link#15                UHS         lo0
192.168.254.14     link#15                UH       ovpns5
192.168.254.16&0xac1e0112 192.168.254.18  UGS      ovpns6         # Site 1-Site 6 OpenVPN tunnel
192.168.254.17     link#16                UHS         lo0
192.168.254.18     link#16                UH       ovpns6


I added comments but here are some details:

192.168.1.0/24 = LAN_1 (OPNsense = 192.168.1.254, virtual IP = 192.168.1.250)
192.168.2.0/24 = LAN_2 (OPNsense = 192.168.2.254)
192.168.3.0/24 = LAN_3 (OPNsense = 192.168.3.254)
192.168.4.0/24 = LAN_4 (OPNsense = 192.168.4.254)
192.168.5.0/24 = LAN_5 (OPNsense = 192.168.5.254)
192.168.6.0/24 = LAN_6 (OPNsense = 192.168.6.254)

192.168.66.0/24 = Guest1 LAN (OPNsense = 192.168.66.4, virtual IP = 192.168.66.1)
192.168.67.0/24 = Guest2 LAN (OPNsense = 192.168.67.4, virtual IP = 192.168.67.1)

192.168.253.0/24 = Mobile OpenVPN clients network
192.168.254.0/30 = Site 1-Site 2 OpenVPN tunnel
192.168.254.4/30 = Site 1-Site 3 OpenVPN tunnel
192.168.254.8/30 = Site 1-Site 4 OpenVPN tunnel
192.168.254.12/30 = Site 1-Site 5 OpenVPN tunnel
192.168.254.16/30 = Site 1-Site 6 OpenVPN tunnel

80.10.10.9 = WAN_1 gateway
80.10.10.10 = OPNsense WAN_1 interface

192.168.202.0/24 = WAN_2 network (ISP box has its own /24 network)
192.168.202.1 = WAN_2 gateway
192.168.202.5 = OPNsense Virtual IP
192.168.202.6 = OPNsense WAN_2 interface

192.168.203.0/24 = WAN_3 network (ISP box has its own /24 network)
192.168.203.1 = WAN_3 gateway
192.168.203.2 = OPNsense WAN_3 interface

192.168.252.0/30 = Inter-OPNsense synchronization interface (not used)


For testing purposes I added a route to 10.0.0.0/8 via one of my working OpenVPN tunnels, and still none of my LAN_1 clients use OPNsense as first hop.

I wonder if what franco said here:
Quote
Maybe this is due to anti-spoof, or maybe due to a forced catch-all gateway multi-wan rule that slurps your local traffic and pushes it to the gateway on said interface.

The latter is more likely, but there was no statement about it in the OP.
could explain the fact that some networks are getting out via my gateway. Or maybe this network address is known to my Internet provider and somehow the gateway and OPNsense shared this information via some route table sharing protocol (which I didn't enable and could not find any settings for, though).

I do have multi-WAN enabled on this OPNsense box but don't have any specific rule that could explain why it fails specifically with 192.168.3.x.

The routing table looks fine. If any routing protocol were active and sharing a route for 192.168.3.x/yy between the provider's router and your OPNsense, you should see this route in the table.


  • Run a traceroute from the GUI (Interfaces: Diagnostics: Trace Route) on OPNsense_Site1 to LAN_Site3 interface or client with source interface LAN_Site1 (192.168.1.254).

    • Is the result fine? Then I guess the problem is not the OPNsense. It must be something in the LAN, or the clients' routing tables are misconfigured.
    • Runs it to the WAN interface as the traceroute from the clients?
  • Run a traceroute from a client of LAN_Site1 to the OpenVPN interface of Site3 (192.168.254.6).

    • Is the result fine?
    • Runs it to the WAN interface as the traceroute to LAN_3?
  • In your first post the traceroute to LAN_2 shows the IP 192.168.254.14. You wrote that this is the OpenVPN interface for the Site_1 <-> Site_2 tunnel. But according to your routing table this is the OpenVPN interface for LAN_5. Is this just another subnet on Site_2, or do you have additional tunnels and you only mistook/obfuscated the IP? I think this is not really important for troubleshooting, but it helps to understand your topology.
  • I'm not very familiar with the Multi WAN function, but maybe this is the root of your issue. Can you disable this option or at least the additional lines? Then test again.
  • Are you allowing communications between LAN_2 and LAN_3? If yes, can you reach clients from LAN_2 in LAN_3 and vice versa?
  • Check your Firewall rules, especially those for LAN_3. Have you defined a gateway which forces 'policy based routing'?

First, let me thank you for the time you take to help me sort this out: much appreciated.


  • The traceroute from OPNsense_Site1 to some IP of LAN_Site3 fails (including to OPNsense_Site3's LAN_Site3 IP):
    traceroute to 192.168.3.9 (192.168.3.9) from 192.168.1.254, 8 hops max, 40 byte packets
    1  80.10.10.9  0.396 ms  0.342 ms  0.278 ms
    2  * * *
    3  * * *

    [...]
    I didn't understand the "Runs it to the WAN interface as the traceroute from the clients?" test.
  • The traceroute from OPNsense_Site1 to OpenVPN interface for LAN_Site3 (192.168.254.6) fails (same for IP 192.168.254.5 as destination):
    traceroute to 192.168.254.6 (192.168.254.6) from 192.168.1.254, 3 hops max, 40 byte packets
    1  80.10.10.9  0.491 ms  0.285 ms  0.259 ms
    2  * * *
    3  * * *
  • You are right, in my first post I simplified the setup: I only took 2 tunnels (3 sites) as an example but when giving the full routing table I gave all 5 tunnels (and had some conflict on the IP indeed).
  • In fact, I have already disabled multi-WAN load balancing: My OPNsense_Site1 LAN interface has a "src=* dest=* gw=gateway_group_of_only_one_gateway" final firewall rule
  • I don't block traffic from LAN_2 to LAN_3.
    I tested and a ping from LAN_3 to LAN_2 works (tested with ping on one side and tcpdump on the other), but a traceroute fails (nothing captured by tcpdump) and has LAN_3's gateway as first hop.
    But a ping from LAN_2 to LAN_3 fails: ping fails and tcpdump captures nothing.
    So: that made me think... and I noticed I had an "allow IPv4 ICMP" rule on OPNsense_Site3 but not on OPNsense_Site2 (nor OPNsense_Site1); once enabled, ping works both ways.
    I thought traceroute used ICMP.
  • As all my sites use multi-WAN (with failover), the last firewall rule of the LAN interface is: action=pass, proto=*, src=LAN, dst=*, gw=gateway_group_with_one_primary_gw_and_one_failover_gw
    So yes, the firewall always applies "policy based routing".
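
One possible reading of the ping/traceroute difference, sketched below in Python under stated assumptions: classic Unix traceroute sends UDP probes by default (only the replies are ICMP), so an ICMP-only pass rule does not match them. The rule list is hypothetical; in particular, it assumes the "allow IPv4 ICMP" rule has no gateway set and sits above the catch-all rule, which the posts above do not confirm.

```python
# Hypothetical LAN rule list, evaluated top-down with first match winning.
# Each entry is (protocol, gateway); "*" matches any protocol.
# "default" stands for "no gateway set, follow the routing table".
RULES = [
    ("icmp", "default"),                 # assumed: "allow IPv4 ICMP", no gateway
    ("*", "multiwan_gateway_group"),     # final catch-all rule with a gateway set
]

def gateway_for(proto):
    """Return the gateway chosen by the first rule matching the protocol."""
    for rule_proto, gateway in RULES:
        if rule_proto in ("*", proto):
            return gateway
    return None

print("ping (ICMP echo):      ", gateway_for("icmp"))
print("traceroute (UDP probe):", gateway_for("udp"))
```

Under these assumptions ping follows the routing table into the tunnel, while the UDP probes fall through to the catch-all rule and are pushed to the WAN gateway.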

These last 3 questions/tests of yours were very pertinent: first, I had a misconfiguration that allowed ping in one direction but not the other. And I had also forgotten about the "policy based routing".
Still, I don't understand yet why it worked with some networks/tunnels and not the others: it must be some underlying misconfiguration that never caused a visible issue until now.

I'm now following the firewall "policy based routing" rule lead :)

I think I found the root cause: it might be a bug with firewall aliases in OPNsense. An alias fails to be completely expanded (it's a network alias whose members are also network aliases), so the firewall rule that should tell OPNsense to use "gw=default" for this destination does not match; the final "gw=multiwan_gateway_group" rule then matches, ignoring my routing table.
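
To illustrate the mechanism described here, a minimal Python sketch follows. It is not a reproduction of the actual OPNsense behavior: the alias names (`vpn_sites`, `site3_net`), the `max_depth` knob used to mimic an incomplete expansion, and the first-match rule evaluation are all assumptions made for the example.

```python
import ipaddress

# Hypothetical aliases: one literal network plus one nested alias.
ALIASES = {
    "vpn_sites": ["192.168.2.0/24", "site3_net"],
    "site3_net": ["192.168.3.0/24"],
}

def expand(alias, max_depth):
    """Expand an alias to networks, following nested aliases up to max_depth levels."""
    networks = []
    for member in ALIASES[alias]:
        if member in ALIASES:
            if max_depth > 0:
                networks.extend(expand(member, max_depth - 1))
            # Otherwise the nested alias is silently dropped (incomplete expansion).
        else:
            networks.append(ipaddress.ip_network(member))
    return networks

def build_rules(max_depth):
    """Bypass rule for VPN destinations first, then the catch-all gateway rule."""
    return [
        (expand("vpn_sites", max_depth), "default"),
        ([ipaddress.ip_network("0.0.0.0/0")], "multiwan_gateway_group"),
    ]

def pick_gateway(dst, rules):
    """Return the gateway of the first rule whose destination contains dst."""
    addr = ipaddress.ip_address(dst)
    for networks, gateway in rules:
        if any(addr in net for net in networks):
            return gateway
    return None

print(pick_gateway("192.168.3.9", build_rules(max_depth=0)))  # nested alias lost
print(pick_gateway("192.168.3.9", build_rules(max_depth=1)))  # alias fully expanded
```

In this sketch 192.168.2.x keeps matching the bypass rule either way, while 192.168.3.x only matches once the nested alias is expanded, which is consistent with one tunnel working and the other not.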

Quote from: CDuv on July 30, 2018, 07:16:02 PM
I think I found the root cause: it might be a bug with firewall aliases in OPNsense. An alias fails to be completely expanded (it's a network alias whose members are also network aliases), so the firewall rule that should tell OPNsense to use "gw=default" for this destination does not match; the final "gw=multiwan_gateway_group" rule then matches, ignoring my routing table.

That sounds like a good explanation for this crazy behaviour.  :)
If the alias is not loading properly please check the system logs first.

Just some hints:
Another user in this forum has or had problems with a GeoIP alias which could not be loaded, and therefore none of the following aliases were loaded either.

Another reason could be the max. table entries for the firewall. If the max. count is reached, no more aliases can be loaded. You can raise the count in the firewall settings ('Maximum Table Entries').


Got it working, it was indeed a failing alias (see https://github.com/opnsense/core/issues/2590).
I managed to get it populated correctly and now it works.

Thanks for your help JasMan: I wouldn't have looked in this direction without it.