Avaya IP Phone IPSec - Not Registering

Started by jb113, December 09, 2020, 08:52:01 PM

Greetings.  This is my first post on the forum and I hope I have posted it to the right topic.

I was given an Avaya IP phone for work from home (model 9608G).  It uses IPSec directly from the phone to connect to the office.  I can confirm it is using UDP 4500 for NAT-T.

This worked with pfSense 2.4.5p1 and with an old Zyxel USG 50. With OPNsense 20.7.5, however, the phone establishes the tunnel but will not communicate further (no registration).

OPNsense is configured with automatic outbound NAT and default LAN and WAN rules. There is no IDS/IPS, and IPsec and OpenVPN are disabled. Single WAN connection through Spectrum cable.

I found some old forum posts in German referring to problems with NAT-T, but they may have been referring to VPNs originating/terminating on opnsense itself.

Figured it was worth a shot posting here and seeing if anyone had experience with getting Avaya's phone VPN working.

January 26, 2021, 02:59:05 AM #1 Last Edit: January 26, 2021, 03:03:21 AM by siliconsoliloquy
I think I'm working on the exact same issue. I was just about to create a new thread for this, but I'll try to use this one.

I'm working on setting up Avaya IP phones for our office to work from home using the built in IPsec VPN on the phones. I have the VPN working properly but I'm running into a really difficult issue where it appears my home OPNsense router is not applying outbound NAT properly to some of the ESP packets between the phone and the office.

In short:

  • The IP Phone is successfully connecting over the VPN and is able to pull configuration files through it. However, the display on the phone is mostly blank and just shows "Press <> for Feature List". It's probably not getting certain data it's expecting.
  • The IP Phone seems to send fragmented ESP packets all the time, even when the MTU would not require it. By default my home OPNsense was reassembling these and sending a single packet out on the WAN, but with a bad checksum. This packet never makes it to the WAN side of the office router. (Not sure why)
  • If I disable scrubbing on my home router, the packets are no longer reassembled and they are transmitted out the WAN as they were received on the LAN. However, outbound NAT appears to not be applied and thus the packets are dropped and never make it to the Office WAN.
  • This outbound NAT issue only seems to apply to these fragmented packets. All the other packets from the phone to the office are NAT'd properly.
  • I started with the default automatic outbound NAT but I've even manually configured an outbound NAT rule that also doesn't seem to change anything. (photo attached)

Photos of Wireshark captures showing what I think is happening (216.*.*.* is the office public IP, 104.*.*.* is the home public IP): https://imgur.com/a/G7Bpd6k
(The screenshots aren't necessarily time synchronized but they make my point.)

I've also isolated the issue to my home OPNsense router by plugging the IP Phone in without a router at all, effectively connecting it directly to the hand-off from my ISP. The phone works perfectly when I do this.

I'm way out of my depth here, I've spent several days on this at this point, and would very much appreciate any thoughts or suggestions.

Did you try to set "static port" in outbound NAT?
"The S in IoT stands for Security!" :)

@siliconsoliloquy
Quote
I disable scrubbing on my home router, the packets are no longer reassembled and they are transmitted out the WAN as they were received on the LAN. However, outbound NAT appears to not be applied
I think that's how it should be: if fragments are not buffered and reassembled, pf cannot associate a fragment with an entry in the state table (or create a new entry), so NAT is not possible without reassembly.
Quote
but with a bad checksum
This is possible because of hardware checksum offload on the NIC (not an error).
(https://wiki.wireshark.org/TCP_Checksum_Verification)
Quote
By default my home OPNSense was reassembling these and sending a single packet out on the WAN but with a bad checksum. This packet never makes it to the WAN side of the office router. (Not sure why)
Most interesting part  ;)
Can you try to figure out why? Do the packets leave the OPN WAN? Do they fail to reach the office WAN, or are they discarded by the office WAN?

Thanks for the responses!

Quote from: Gauss23 on January 26, 2021, 08:02:26 AM
Did you try to set "static port" in outbound NAT?

I just tried turning on the static port setting for the outbound NAT. I can definitely see it being applied in the packet traces, but it made no improvement. This is probably because of what Fright said: outbound NAT isn't working when packet scrubbing is disabled, because scrubbing would normally reassemble the fragmented packets before NAT is applied. I guess most scenarios don't involve both fragmented packets and outbound NAT.

Quote from: Fright on January 26, 2021, 04:04:57 PM
@siliconsoliloquy
Quote
I disable scrubbing on my home router, the packets are no longer reassembled and they are transmitted out the WAN as they were received on the LAN. However, outbound NAT appears to not be applied
I think that's how it should be: if fragments are not buffered and reassembled, pf cannot associate a fragment with an entry in the state table (or create a new entry), so NAT is not possible without reassembly.

It makes sense that the router has to reassemble the fragments in order to properly process them. I guess since most packets aren't fragmented, that's why it isn't obvious in reading online that disabling packet scrubbing will cause fragmented packets to not have outbound NAT applied. Seems like my outbound NAT issue/theory was a red herring.
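
To make the reassembly point concrete, here is a minimal Python sketch (my own illustration, not anything taken from pf) that reads the flags and fragment-offset field of a raw IPv4 header. With NAT-T the ESP data rides inside UDP 4500, and only the fragment at offset 0 carries that UDP header, so a later fragment on its own has no ports to match against a state entry. The `make_header` helper and the addresses in it are made up for the demo.

```python
import struct

def describe_fragment(ip_header: bytes) -> dict:
    """Summarize the fragmentation fields of a raw IPv4 header."""
    if len(ip_header) < 20:
        raise ValueError("need at least 20 bytes of IPv4 header")
    # Bytes 6-7: 3 flag bits, then a 13-bit fragment offset in 8-byte units.
    (flags_frag,) = struct.unpack("!H", ip_header[6:8])
    more_fragments = bool(flags_frag & 0x2000)
    offset_bytes = (flags_frag & 0x1FFF) * 8
    return {
        "is_fragment": more_fragments or offset_bytes > 0,
        "offset_bytes": offset_bytes,
        # Only the fragment at offset 0 holds the start of the payload,
        # i.e. the UDP header with the port numbers.
        "has_l4_header": offset_bytes == 0,
    }

def make_header(flags_frag: int) -> bytes:
    """Hypothetical helper: dummy 20-byte IPv4/UDP header (checksum left at 0)."""
    return struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, 20, 1, flags_frag, 64, 17, 0,
        bytes([192, 168, 1, 50]), bytes([203, 0, 113, 10]),
    )

first = describe_fragment(make_header(0x2000))  # MF set, offset 0
later = describe_fragment(make_header(0x0001))  # MF clear, offset 8 bytes
print(first)
print(later)
```

The second result reports `has_l4_header` as false, which is the situation the router is in when it sees a non-first fragment without having reassembled the datagram.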

Quote
Quote
but with a bad checksum
This is possible because of hardware checksum offload on the NIC (not an error).
(https://wiki.wireshark.org/TCP_Checksum_Verification)

I have hardware checksum offloading already turned off on the NIC in OPNsense. (screenshot attached)
I also saw (albeit only once) that when checksums were being offloaded, the packets in Wireshark had a checksum of 0x0000, I'm guessing because they are not calculated until the NIC actually sends the packets. What I'm seeing now are actually invalid checksums, "Header checksum: 0x95d9 incorrect, should be 0xb595", not just null checksums (screenshot attached). Not sure if this means anything, though. I'd also think that if checksum offloading were the issue, most or all of the outbound ESP packets would show as bad in Wireshark, not just these reassembled packets of length 1310.
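
For anyone who wants to check a header by hand, this is a small Python sketch of the standard IPv4 header checksum (the one's-complement sum from RFC 1071), which is the calculation behind Wireshark's "incorrect, should be" message. The sample header is a generic textbook example, not one of my captured packets, and the `report` helper only imitates Wireshark's wording.

```python
def ipv4_header_checksum(header: bytes) -> int:
    """Return the checksum an IPv4 header should carry (RFC 1071)."""
    if len(header) < 20 or len(header) % 2:
        raise ValueError("expected a complete IPv4 header")
    total = 0
    for i in range(0, len(header), 2):
        if i == 10:  # skip the checksum field itself
            continue
        total += (header[i] << 8) | header[i + 1]
    # Fold the carries back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def report(header: bytes) -> str:
    """Imitate Wireshark's verdict for the stored header checksum."""
    stored = (header[10] << 8) | header[11]
    expected = ipv4_header_checksum(header)
    if stored == expected:
        return f"Header checksum: 0x{stored:04x} correct"
    return f"Header checksum: 0x{stored:04x} incorrect, should be 0x{expected:04x}"

# Generic 20-byte example header (192.168.0.1 -> 192.168.0.199, UDP).
sample = bytes.fromhex("45000073" "00004000" "4011b861" "c0a80001" "c0a800c7")
print(report(sample))

# Same header with a deliberately wrong checksum written into bytes 10-11.
bad = sample[:10] + b"\x95\xd9" + sample[12:]
print(report(bad))
```

A checksum of 0x0000 in a capture taken on the sending host usually just means the field has not been filled in yet, whereas a non-zero value that fails this calculation is a genuinely wrong checksum.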


Quote
Quote
By default my home OPNSense was reassembling these and sending a single packet out on the WAN but with a bad checksum. This packet never makes it to the WAN side of the office router. (Not sure why)
Most interesting part  ;)
Can you try to figure out why? Do the packets leave the OPN WAN? Do they fail to reach the office WAN, or are they discarded by the office WAN?

My theory is that these packets exiting the home WAN actually have bad checksums and are thus being dropped by the next router on the ISP side on the way to the office WAN. I've ordered a cheap smart switch with port mirroring so I can capture the WAN side of the home router with external hardware and rule out any NIC offloading concerns. This should at least confirm whether the outgoing WAN packets actually have bad checksums. I also just realized that if I replay one of the packets with a bad checksum after fixing the checksum, and it arrives, that would narrow down the issue.

Other than that, I have no idea how to determine why packets exiting the home WAN never show up on a packet capture of the office WAN.


(Sorry for the poor image quality. The 256KB limit doesn't leave much room)

Quote
I'd also think that if checksum offloading were the issue, most or all of the outbound ESP packets would show as bad in Wireshark, not just these reassembled packets of length 1310
Agreed, that sounds very reasonable.
So it looks like an upstream issue (pf, the driver, or something similar).
I want to check something; I need a little time.

January 27, 2021, 07:34:15 PM #6 Last Edit: January 28, 2021, 07:10:28 AM by Fright
@siliconsoliloquy
I don't have a chance to test this with ESP right now, but the reassembly behavior was easy to reproduce with ICMP.
I sent ICMP pings of various lengths from a client with a reduced MTU. If the ping length and the MTU on the OPN allow the fragments to be reassembled into a single packet for sending, an incorrect IP header checksum is set on the outgoing packets. Such packets are dropped by the next router.
(More precisely, the first reassembled datagram gets the correct checksum, but all subsequent ones get an incorrect one. In your case, I think the first packet, which creates the state, is sent without reassembly, so this is less noticeable.)
If the ping is long enough that the reassembled datagram does not fit into one packet on the outgoing interface, then all the new fragments (sized to the MTU of the OPN's outgoing interface) are sent with the correct checksum.
This is very similar to the bug described here:
https://redmine.pfsense.org/issues/10189
(In my opinion, the checksum difference may vary with conditions and does not have to equal the one mentioned under "Interesting observation" there; in my case it is different.)

In my opinion this is a PF bug that is not yet fixed in HBSD.

In your case, it seems to me you could try adjusting the MTU_SIZE value on the phone (by the way, what is it set to now?), choosing it to suit the OPN MTU. Fragmentation would then become less frequent, and it would be less likely that all fragments get reassembled into a single packet. Perhaps this will allow the phone to complete registration. (It needs to be tested to see how much fragmentation occurs during normal operation and whether it affects the phone.)

Hm... some upstream pf fixes are mentioned in the 21.1 notes:
https://forum.opnsense.org/index.php?topic=21147.0

I will check the reassembly behavior on 21.1 in a test VM when I can.

And no: the behavior has not changed. The IP header checksum is still incorrect.
(In addition, the update changed the NIC driver from hn to de  ;D)
So I would try adjusting MTU_SIZE on the phone.

Thanks for all your input.

I ran a packet capture using a cheap smart switch with port mirroring placed between my home ISP side and the OPNSense WAN. Like you already said, the OPNSense router is definitely sending out the reassembled packets with bad checksums, which are being dropped by the next router down the chain. I edited and corrected the checksum on one of these packets and when replaying it, it was transmitted successfully to the Office WAN and showed up on a WAN packet capture there.
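
In case anyone wants to repeat the replay test, the checksum correction can be sketched in a few lines of Python. This is only an outline of the idea and assumes the bytes start at the IPv4 header (no Ethernet header in front); the function name and the sample bytes are made up for illustration and are not my actual capture.

```python
def fix_ipv4_checksum(packet: bytes) -> bytes:
    """Return a copy of an IPv4 packet with a recomputed header checksum."""
    ihl = (packet[0] & 0x0F) * 4  # header length in bytes
    if ihl < 20 or len(packet) < ihl:
        raise ValueError("truncated or malformed IPv4 header")
    header = bytearray(packet[:ihl])
    header[10:12] = b"\x00\x00"  # zero the checksum field before summing
    total = sum((header[i] << 8) | header[i + 1] for i in range(0, ihl, 2))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    header[10:12] = (~total & 0xFFFF).to_bytes(2, "big")
    # The payload (here: the UDP-encapsulated ESP data) is left untouched.
    return bytes(header) + packet[ihl:]

# Illustrative header with a wrong checksum (0x95d9) plus a dummy payload.
broken = bytes.fromhex("45000073" "00004000" "401195d9" "c0a80001" "c0a800c7")
fixed = fix_ipv4_checksum(broken + b"esp-payload")
print(fixed[10:12].hex())
```

Only bytes 10 and 11 of the header change, so if the corrected packet gets through where the original did not, the header checksum is the only difference that can explain it.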

Quote
If the ping is long enough that the reassembled datagram does not fit into one packet on the outgoing interface, then all the new fragments (sized to the MTU of the OPN's outgoing interface) are sent with the correct checksum.

That's a really key piece of info. With that I have developed the world's dirtiest workaround: set the OPNsense WAN MTU to 1250. It works beautifully.
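
A rough way to see why this works is the fragmentation arithmetic, sketched here in Python. I'm assuming the 1310 I saw in Wireshark is the IP total length and that the header is a plain 20 bytes without options; `fragment_sizes` is just an illustrative helper, not anything from OPNsense.

```python
def fragment_sizes(total_length: int, mtu: int, header_len: int = 20) -> list:
    """IP total lengths of the packets needed to send one datagram over a link."""
    if total_length <= mtu:
        return [total_length]  # fits as a single packet, no fragmentation
    payload = total_length - header_len
    # Every fragment except the last must carry a multiple of 8 payload bytes.
    per_fragment = (mtu - header_len) // 8 * 8
    if per_fragment <= 0:
        raise ValueError("MTU too small to carry any payload")
    sizes = []
    while payload > 0:
        chunk = min(per_fragment, payload)
        sizes.append(chunk + header_len)
        payload -= chunk
    return sizes

print(fragment_sizes(1310, 1500))  # [1310]
print(fragment_sizes(1310, 1250))  # [1244, 86]
```

With a 1500-byte WAN MTU the reassembled datagram goes out as one packet, which is the case that gets the bad checksum. At 1250 it no longer fits, so it is fragmented again on the way out, and per Fright's test those new fragments are sent with correct checksums.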

As far as I can tell, these phones don't provide any way to control or even check the MTU. I've tried changing the MTU_SIZE parameter in the 46xxsettings.txt config before without success, and I'm not even sure it's supported in the H.323 mode of these phones. It's only actually mentioned in the SIP documentation from Avaya.

Trying to get OPNsense to handle the fragmented packets properly was the workaround for my not being able to get the MTU to change on the phones. They are mostly default settings so I have no idea why the packets are fragmented in the first place.

I'll keep playing with this and see if I can figure out a better workaround or some way to control the MTU on the phones. Otherwise, I guess the stupid hack of lowering the MTU on the WAN of my home OPNsense might be okay since I'm quite certainly the only one of our users with OPNsense running at home.

Nice work!
Another possible solution might be to establish the IPsec tunnel from the OPN itself and connect the phone through the already working tunnel, if that is possible.

Maybe @franco will notice this topic and comment on the possible PF bug.
If not, in my opinion the problem is absolutely worth a ticket on GitHub.

Can you share the phone model number? Maybe I'll get lucky looking for the settings.

The phone is an Avaya 9608 IP Phone (6.8304 firmware) and the phone system is an Avaya IP Office 500 v2 (11.1.0.1.0 build 95 firmware).

I know I could run the VPN from home router to office router, rather than using the phone's built-in VPN client, and that would be a valid workaround given I'm the only user that would be using OPNsense. However, at this point it feels like more of a matter of principle.  ;)

I've never really dug under the hood of OPNsense into things like PF but I'm kind of curious now how deeply I can recreate the issue. I might see if I can recreate it in a VM with just FreeBSD and PF. If so, then I could submit a bug report with them.

Quote
VM with just FreeBSD and PF
Hm. HardenedBSD.
And I'm not even sure that @franco doesn't do some extra magic with it  ;)