OPNSense on Azure messing up routes?

Started by wrobelda, March 10, 2022, 12:40:43 AM

I migrated from pfSense to OPNsense on my (self-provisioned) Azure appliance and everything is mostly fine, except for one issue: the routes it sets up via DHCP render 168.63.129.16 (https://docs.microsoft.com/en-us/azure/virtual-network/what-is-ip-address-168-63-129-16) unusable.

Have a look at the dump first:

root@gw:~ # netstat -nr
Routing tables

Internet:
Destination        Gateway            Flags     Netif Expire
default            172.32.0.1         UGS         hn0
10.0.0.0/22        87.99.46.47        US          hn0
10.1.0.0/23        link#6             U           hn1
10.1.0.4           link#6             UHS         lo0
127.0.0.1          link#1             UH          lo0
168.63.129.16      link#6             UHS         hn1
169.254.169.254    172.32.0.1         UGHS        hn0
172.32.0.0/24      link#5             U           hn0
172.32.0.7         link#5             UHS         lo0

root@gw:~ # ifconfig
hn0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
description: WAN
options=80018<VLAN_MTU,VLAN_HWTAGGING,LINKSTATE>
ether 00:22:48:4f:b5:c7
inet 172.32.0.7 netmask 0xffffff00 broadcast 172.32.0.255
media: Ethernet autoselect (10Gbase-T <full-duplex>)
status: active
nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
hn1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
description: LAN
options=80018<VLAN_MTU,VLAN_HWTAGGING,LINKSTATE>
ether 00:22:48:4f:bd:3f
inet 10.1.0.4 netmask 0xfffffe00 broadcast 10.1.1.255
media: Ethernet autoselect (10Gbase-T <full-duplex>)
status: active
nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>


Now, compare it to the output from the old pfSense appliance:


root@gw:~ # netstat -nr
Routing tables

Internet:
Destination        Gateway            Flags     Netif Expire
default            172.32.0.1         UGS         hn0
10.1.0.0/23        link#6             U           hn1
10.1.0.14          link#6             UHS         lo0
127.0.0.1          link#2             UH          lo0
168.63.129.16      00:0d:3a:7d:83:47  UHS         hn0
168.63.129.16/32   172.32.0.1         UGS         hn0
169.254.169.254/32 172.32.0.1         UGS         hn0
172.32.0.0/24      link#5             U           hn0
172.32.0.4         link#5             UHS         lo0



root@gw:~ # ifconfig
hn0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=48001b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,LINKSTATE,TXCSUM_IPV6>
ether 00:0d:3a:7d:83:47
inet 172.32.0.4 netmask 0xffffff00 broadcast 172.32.0.255
media: Ethernet autoselect (10Gbase-T <full-duplex>)
status: active
nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>
hn1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
description: LAN
options=48001b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,LINKSTATE,TXCSUM_IPV6>
ether 00:0d:3a:01:7f:5f
inet 10.1.0.14 netmask 0xfffffe00 broadcast 10.1.1.255
media: Ethernet autoselect (10Gbase-T <full-duplex>)
status: active
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>



Notice that pfSense routes both 168.63.129.16 and 168.63.129.16/32 via the WAN interface (hn0), whereas OPNsense somehow assigns 168.63.129.16 to LAN (hn1). Manually removing that LAN route restores communication.
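For anyone comparing the two dumps, a small helper makes the difference easy to spot. This is only a sketch: `route_iface` is a name made up here, and it assumes the `netstat -nr` column layout shown above (Destination, Gateway, Flags, Netif).

```sh
# Hypothetical helper: read `netstat -nr` output on stdin and print the
# interface(s) that carry the route for the host given as $1.
# Assumes the column order Destination, Gateway, Flags, Netif.
route_iface() {
    awk -v dest="$1" '
        # Match the bare host entry as well as the explicit /32 form.
        ($1 == dest || $1 == (dest "/32")) && NF >= 4 { print $4 }
    ' | sort -u
}
```

On the box it would be used as `netstat -nr | route_iface 168.63.129.16`. Fed the dumps above, it prints hn1 for the OPNsense table and hn0 for the pfSense one. `route -n get 168.63.129.16` asks the kernel the same question directly.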

What's happening here? Any ideas?

This is fairly crucial, since 168.63.129.16 handles communication with the Agent, including backing up.


Did you use the appliance or did you bootstrap it yourself? To me it seems the assignments need to be flipped.

March 10, 2022, 10:12:43 AM #2 Last Edit: March 10, 2022, 10:14:53 AM by wrobelda
I bootstrapped myself, using https://github.com/dmauser/opnazure

Not sure what you mean by assignments, though?


There is no general rule for whether Azure's 168.63.129.16 should be routed via hn0 or hn1; it depends on your underlying Azure infrastructure deployment and the use case(s) involving 168.63.129.16.

One pitfall is that OPNsense tries to talk to 168.63.129.16 on both interfaces.

Example:

- Your OPNsense is using Azure DNS on LAN/hn0 (route 168.63.129.16 -> LAN/hn0). This is an outbound connection from OPNsense to Azure.
- You have an incoming connection from the Azure Load Balancer health probe (also 168.63.129.16) on WAN/hn1.

=> The load balancer probe will always fail because the answer is routed via LAN.

So first make sure that you see all packets with source OR destination address = 168.63.129.16 in the logs. Then check whether they are all on the same interface. If not, it won't work even if you flip the LAN/WAN assignments to hn0/hn1.
There are solutions for this problem, but they depend on the use case and the underlying Azure infrastructure.
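The same-interface check can be sketched roughly as follows. Both the function name `azure_ip_ifaces` and the input format are assumptions: it expects lines of the form "<iface> <src> <dst>" that you have already extracted from your logs or a capture, not the raw firewall log.

```sh
# Hypothetical helper: read "<iface> <src> <dst>" lines on stdin and print
# each interface that saw traffic to or from the address given as $1,
# once per interface, in order of first appearance.
azure_ip_ifaces() {
    awk -v ip="$1" '
        # Keep a line if the address is the source OR the destination,
        # and print its interface only the first time it shows up.
        ($2 == ip || $3 == ip) && !seen[$1]++ { print $1 }
    '
}
```

If this prints more than one interface for 168.63.129.16, you are in the split situation described above. To watch the traffic live per interface, something like `tcpdump -ni hn0 host 168.63.129.16` (and the same for hn1) shows it directly.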

If connections to/from 168.63.129.16 are all on the same interface, I think it is a problem with your underlying infrastructure deployment or with the configuration under System->Gateways->Single.

But again, without more details I cannot offer a solution.