Responses to NTP requests from my firewall are lost at NAT

Started by mosterb, July 23, 2020, 01:04:22 AM

Hi!

What I am trying to achieve is to send a simple NTP request from my firewall to some online NTP pool and get a response back. I once had it up and running but somewhere along the line, while configuring other parts of the firewall, it seems to have stopped working.
Currently I'm tearing my hair out trying to understand what is going wrong!

In order to investigate the problem I'm following these steps:

  • I'm sending an NTP request from the firewall itself to an outside Internet IP address: 193.182.111.12 / ntp1.flashdance.cx. I'm using the IP address to keep DNS complexity out of this investigation.
  • I'm running tcpdump on the firewall WAN interface and can see the request being sent to 193.182.111.12 and the response coming back to my WAN interface.
  • I have a firewall rule for allowing the responses. It passes outgoing traffic on WAN with source port 123 to any IP/port. According to the logs, this rule successfully passes the response.
  • I also see a third packet being captured by tcpdump. It contains the same response and still has 193.182.111.12 as its source IP, but this time the source MAC address is that of my own WAN interface. This packet seems really weird to me.

In order to explain it better, I of course have some logs :) (where I've anonymized my IP)

The request is sent with a cool piped command I found on the internet, which lets me choose the source port:
root@OPNsense:~ # printf c%47s | nc -vuw1 -s <WAN-IP> -p 12321 193.182.111.12 123
Connection to 193.182.111.12 123 port [udp/ntp] succeeded!
root@OPNsense:~ #


The "succeeded" might look good, but in a non-broken environment a response payload is also printed, and that is missing here.
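For anyone wondering what that printf actually puts on the wire, here is a small Python sketch of the same 48-byte probe. The function names are made up for illustration. It only builds and decodes the bytes locally and does not send anything. Note that the padding is spaces rather than zeros, because `%47s` pads an empty string with spaces.

```python
from typing import Tuple


def build_probe() -> bytes:
    """Return the same bytes that `printf c%47s` emits: "c" plus 47 spaces."""
    return b"c" + b" " * 47


def decode_first_byte(packet: bytes) -> Tuple[int, int, int]:
    """Split the first NTP header byte into (leap indicator, version, mode)."""
    first = packet[0]
    leap = first >> 6               # top 2 bits
    version = (first >> 3) & 0b111  # middle 3 bits
    mode = first & 0b111            # low 3 bits, 3 means "client"
    return leap, version, mode


probe = build_probe()
print(len(probe), decode_first_byte(probe))
```

The letter "c" is 0x63, which decodes to version 4 and mode 3 (client). That matches the "NTPv4, Client, length 48" that tcpdump reports for the request.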

The tcpdump:
root@OPNsense:~ # tcpdump -n -e -i igb0 port 123
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on igb0, link-type EN10MB (Ethernet), capture size 262144 bytes
23:46:56.122676 40:62:31:09:f5:a9 > 00:00:5e:00:01:05, ethertype IPv4 (0x0800), length 90: <WAN-IP>.48786 > 193.182.111.12.123: NTPv4, Client, length 48
23:46:56.135402 00:00:5e:00:01:05 > 40:62:31:09:f5:a9, ethertype IPv4 (0x0800), length 90: 193.182.111.12.123 > <WAN-IP>.48786: NTPv4, Server, length 48
23:46:56.135456 40:62:31:09:f5:a9 > 00:00:5e:00:01:05, ethertype IPv4 (0x0800), length 90: 193.182.111.12.123 > <WAN-IP>.12321: NTPv4, Server, length 48


In the above dump "40:62:31:09:f5:a9" is the MAC of my WAN interface and I believe "00:00:5e:00:01:05" is the MAC of my external gateway.

Why do I believe that? Here's why:
root@OPNsense:~ # arp -i igb0 -a
? (100.65.0.1) at 00:00:5e:00:01:05 on igb0 expires in 546 seconds [ethernet]
? (<WAN-IP>) at 40:62:31:09:f5:a9 on igb0 permanent [ethernet]


...where 100.65.0.1 is the IP of my ISP gateway.
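To make the odd third packet easier to spot, the three captured lines can be classified by source MAC and source IP. This is only a rough sketch: the regular expression is written for the `tcpdump -n -e` lines shown in this post and is an assumption, not a general tcpdump parser, and the labels ("request", "response", "reflected") are names I made up.

```python
import re

WAN_MAC = "40:62:31:09:f5:a9"  # my WAN interface, per the arp output
SERVER_IP = "193.182.111.12"   # the NTP server used in the test

CAPTURE = [
    "23:46:56.122676 40:62:31:09:f5:a9 > 00:00:5e:00:01:05, ethertype IPv4 (0x0800), length 90: <WAN-IP>.48786 > 193.182.111.12.123: NTPv4, Client, length 48",
    "23:46:56.135402 00:00:5e:00:01:05 > 40:62:31:09:f5:a9, ethertype IPv4 (0x0800), length 90: 193.182.111.12.123 > <WAN-IP>.48786: NTPv4, Server, length 48",
    "23:46:56.135456 40:62:31:09:f5:a9 > 00:00:5e:00:01:05, ethertype IPv4 (0x0800), length 90: 193.182.111.12.123 > <WAN-IP>.12321: NTPv4, Server, length 48",
]

# Matches only the line layout seen above (timestamp, MACs, "length N:", endpoints).
LINE_RE = re.compile(
    r"^(?P<time>\S+) (?P<src_mac>[0-9a-f:]{17}) > (?P<dst_mac>[0-9a-f:]{17}), "
    r".*?length \d+: (?P<src>\S+) > (?P<dst>\S+): "
)


def split_endpoint(endpoint):
    """Split 'host.port' at the last dot into (host, port)."""
    host, _, port = endpoint.rpartition(".")
    return host, int(port)


def classify(line):
    """Label a captured line by who put it on the wire and whose IP it claims."""
    m = LINE_RE.match(line)
    if m is None:
        return "unparsed"
    src_ip, _ = split_endpoint(m["src"])
    sent_by_us = m["src_mac"] == WAN_MAC
    if sent_by_us and src_ip == SERVER_IP:
        return "reflected"  # server's IP as source, but transmitted by my own NIC
    if sent_by_us:
        return "request"
    if src_ip == SERVER_IP:
        return "response"
    return "other"


for line in CAPTURE:
    print(classify(line), line.split(" ", 1)[0])
```

The third line comes out as "reflected": it carries the server's source IP but was transmitted by my own WAN interface towards the gateway MAC.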

Finally here's the route for my WAN IP:
root@OPNsense:~ # route show <WAN-IP>
   route to: <WAN-IP>
destination: <WAN-IP>
        fib: 0
  interface: lo0
      flags: <UP,HOST,DONE,STATIC,PINNED>
recvpipe  sendpipe  ssthresh  rtt,msec    mtu        weight    expire
       0         0         0         0     16384         1         0


So, my hypothesis is that my request is being NATed on its way out (source port 12321 becomes 48786), but while NATing the response back to port 12321, the firewall goes bananas and uses the MAC address of the gateway as the destination MAC (instead of my WAN MAC).

Does anyone have any idea of what can cause this? I really can't figure out how to proceed :'(

Regards,
Mattias

I'll respond to my own thread with a solution in case someone stumbles upon this while searching for the same issue.

It is solved by enabling "Disable reply-to on WAN rules" under Firewall -> Settings -> Advanced.
My setup has the OPNsense WAN interface connected to my ISP network. It gets its IP through DHCP, which means it automatically gets an upstream gateway. This, in turn, makes the firewall automatically route all traffic produced by the WAN interface via the ISP gateway. In my case that was completely wrong and had to be disabled.

More information can be found in this thread, where others have had similar problems:
https://forum.opnsense.org/index.php?topic=15900.0