
Topics - greg124816


19.7 Legacy Series / Backup CARP member using CARP IPv6 address as source for ping6

« on: December 02, 2019, 03:29:28 pm »

I have a HA firewall setup that has been working in production with IPv4 for many years.

I'm adding IPv6 now and ran into an issue with source address selection on the Backup CARP interface.

ping6 to any IPv6 address on the subnet works correctly from either firewall except for one case:

ping6 to the CARP IP from the Backup firewall fails.

For some reason the Backup firewall uses the CARP IP as the source address, even though it is in BACKUP state. If I force ping6 to use the permanent IP assigned to the backup firewall it works fine.

I can see with tcpdump that the frames arrive with both src and dst IP set to the CARP IP. Even the Neighbor Solicitation has the incorrect src IP, and it is also not sent to ff02::1:ff00:1 (the Solicited-node Multicast address).
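For reference, the solicited-node multicast address that a Neighbor Solicitation for the CARP IP would normally be sent to is formed from the low 24 bits of the target address appended to ff02::1:ff00:0/104. A minimal Python sketch using only the standard ipaddress module (the function name solicited_node is made up for illustration):

```python
import ipaddress

def solicited_node(addr: str) -> ipaddress.IPv6Address:
    """Return the solicited-node multicast address for an IPv6 unicast address."""
    # Keep only the low 24 bits of the unicast address.
    low24 = int(ipaddress.IPv6Address(addr)) & 0xFFFFFF
    # Append them to the fixed prefix ff02::1:ff00:0/104.
    base = int(ipaddress.IPv6Address("ff02::1:ff00:0"))
    return ipaddress.IPv6Address(base | low24)

print(solicited_node("2001:db8:d::1"))  # ff02::1:ff00:1
```

This matches the destination seen in the working capture, where the source address is forced with -S.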

Here is tcpdump on the master firewall during a regular ping6 to the CARP IP from the backup firewall (ping6 2001:db8:d::1):

Code: [Select]

root@dmzfwa:~ # tcpdump -ni igb2_vlan4 ip6 and not proto 112
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on igb2_vlan4, link-type EN10MB (Ethernet), capture size 262144 bytes
05:54:09.722690 IP6 2001:db8:d::1 > 2001:db8:d::1: ICMP6, echo request, seq 0, length 16
05:54:10.784383 IP6 2001:db8:d::1 > 2001:db8:d::1: ICMP6, echo request, seq 1, length 16
05:54:11.821819 IP6 2001:db8:d::1 > 2001:db8:d::1: ICMP6, echo request, seq 2, length 16
05:54:12.831525 IP6 2001:db8:d::1 > 2001:db8:d::1: ICMP6, echo request, seq 3, length 16
05:54:13.845976 IP6 2001:db8:d::1 > 2001:db8:d::1: ICMP6, echo request, seq 4, length 16
05:54:14.768000 IP6 2001:db8:d::1 > 2001:db8:d::1: ICMP6, neighbor solicitation, who has 2001:db8:d::1, length 32
05:54:14.909059 IP6 2001:db8:d::1 > 2001:db8:d::1: ICMP6, echo request, seq 5, length 16
05:54:15.768636 IP6 2001:db8:d::1 > 2001:db8:d::1: ICMP6, neighbor solicitation, who has 2001:db8:d::1, length 32
05:54:15.963281 IP6 2001:db8:d::1 > 2001:db8:d::1: ICMP6, echo request, seq 6, length 16
05:54:16.768648 IP6 2001:db8:d::1 > 2001:db8:d::1: ICMP6, neighbor solicitation, who has 2001:db8:d::1, length 32
05:54:17.026216 IP6 2001:db8:d::1 > 2001:db8:d::1: ICMP6, echo request, seq 7, length 16

Things work if I force the source IP selection of ping6 (ping6 -S 2001:db8:d::3 2001:db8:d::1)

Code: [Select]

root@dmzfwa:~ # tcpdump -ni igb2_vlan4 ip6 and not proto 112
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on igb2_vlan4, link-type EN10MB (Ethernet), capture size 262144 bytes
06:03:25.427573 IP6 2001:db8:d::3 > ff02::1:ff00:1: ICMP6, neighbor solicitation, who has 2001:db8:d::1, length 32
06:03:25.427666 IP6 2001:db8:d::2 > 2001:db8:d::3: ICMP6, neighbor advertisement, tgt is 2001:db8:d::1, length 32
06:03:25.427740 IP6 2001:db8:d::3 > 2001:db8:d::1: ICMP6, echo request, seq 0, length 16
06:03:25.427776 IP6 2001:db8:d::1 > 2001:db8:d::3: ICMP6, echo reply, seq 0, length 16

Pinging the IPv4 CARP master IP still works fine. It's also not just ping6 having issues: ssh to the CARP master IPv6 IP has the same symptoms (tcpdump looks the same, with both src and dst as the CARP IP).

Has anyone seen anything like this? I've rebooted multiple times, and built and rebuilt the IPv6 CARP both as its own CARP item in opnsense with a different VHID and as an IP alias on the same VHID. I get the same results both ways. Never any problems with IPv4.

With tcpdump -e option I did verify that the ping6 and NS frames had the proper SRC MAC of the backup firewall interface.

Here are ifconfig details for the interface on both firewalls:

Master firewall (edited to change the first part of the IPv6 addresses to 2001:db8, like the rest):

Code: [Select]

root@dmzfwa:~ # ifconfig igb2_vlan4
igb2_vlan4: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        ether ac:1f:6b:67:01:b0
        inet6 fe80::ae1f:6bff:fe67:1b0%igb2_vlan4 prefixlen 64 scopeid 0xd
        inet6 2001:db8:d::2 prefixlen 64
        inet6 2001:db8:d::1 prefixlen 64 vhid 1
        inet 10.10.144.2 netmask 0xffffffc0 broadcast 10.10.144.63
        inet 10.10.144.1 netmask 0xffffffc0 broadcast 10.10.144.63 vhid 1
        inet 10.10.144.58 netmask 0xffffffc0 broadcast 10.10.144.63 vhid 1
        inet 10.10.144.54 netmask 0xffffffc0 broadcast 10.10.144.63 vhid 1
        inet 10.10.144.55 netmask 0xffffffc0 broadcast 10.10.144.63 vhid 1
        inet 10.10.144.56 netmask 0xffffffc0 broadcast 10.10.144.63 vhid 1
        inet 10.10.144.57 netmask 0xffffffc0 broadcast 10.10.144.63 vhid 1
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        vlan: 4 vlanpcp: 0 parent interface: igb2
        carp: MASTER vhid 1 advbase 1 advskew 0
        groups: vlan
root@dmzfwa:~ #

Backup firewall:

Code: [Select]

root@dmzfwb:~ # ifconfig igb2_vlan4
igb2_vlan4: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        ether ac:1f:6b:67:01:fe
        inet6 fe80::ae1f:6bff:fe67:1fe%igb2_vlan4 prefixlen 64 scopeid 0xd
        inet6 2001:db8:d::3 prefixlen 64
        inet6 2001:db8:d::1 prefixlen 64 vhid 1
        inet 10.10.144.3 netmask 0xffffffc0 broadcast 10.10.144.63
        inet 10.10.144.1 netmask 0xffffffc0 broadcast 10.10.144.63 vhid 1
        inet 10.10.144.58 netmask 0xffffffc0 broadcast 10.10.144.63 vhid 1
        inet 10.10.144.54 netmask 0xffffffc0 broadcast 10.10.144.63 vhid 1
        inet 10.10.144.55 netmask 0xffffffc0 broadcast 10.10.144.63 vhid 1
        inet 10.10.144.56 netmask 0xffffffc0 broadcast 10.10.144.63 vhid 1
        inet 10.10.144.57 netmask 0xffffffc0 broadcast 10.10.144.63 vhid 1
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        vlan: 4 vlanpcp: 0 parent interface: igb2
        carp: BACKUP vhid 1 advbase 1 advskew 100
        groups: vlan
root@dmzfwb:~ #

Development and Code Review / CARP replay protection counter on *sense distros

« on: February 14, 2019, 10:49:28 pm »

I was migrating an old pair of redundant firewalls running pfsense to new hardware running opnsense.

What I discovered is that the CARP implementations are NOT compatible and both become MASTER (while trying to swap in the new B firewall while leaving the A firewall running).

After much confusion, I found the reason: opnsense has a properly incrementing replay protection counter in each CARP advertisement, while pfsense's counter is static and unchanging. Since the counters never match, the packets get ignored by each firewall, and both stay MASTER and continue to send their advertisements. There are 2 counter fields, and I'm not sure which one tcpdump is showing, but whichever one it is, it definitely does not match between opnsense and pfsense.

I verified the same incrementing replay protection counter behavior on several uCARP (https://www.pureftpd.org/project/ucarp) systems we have running.

Here are some example tcpdump traces, all taken from a single pfsense host looking out two different interfaces.

Looking at an opnsense 19.1 host (the initial post had the wrong trace for the opnsense host; it's corrected now):

Code: [Select]

[2.3.4-RELEASE][root@localhost]/root: tcpdump -T carp -ni em1 vrrp and host 192.168.100.2
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on em1, link-type EN10MB (Ethernet), capture size 65535 bytes
13:05:27.216731 IP 192.168.100.2 > 224.0.0.18: CARPv2-advertise 36: vhid=1 advbase=1 advskew=0 authlen=7 counter=6491304834018196506
13:05:28.218369 IP 192.168.100.2 > 224.0.0.18: CARPv2-advertise 36: vhid=1 advbase=1 advskew=0 authlen=7 counter=6491304834018196507
13:05:29.220678 IP 192.168.100.2 > 224.0.0.18: CARPv2-advertise 36: vhid=1 advbase=1 advskew=0 authlen=7 counter=6491304834018196508
13:05:30.222307 IP 192.168.100.2 > 224.0.0.18: CARPv2-advertise 36: vhid=1 advbase=1 advskew=0 authlen=7 counter=6491304834018196509

Looking at a Linux CentOS host running ucarp 1.5.1:

Code: [Select]

[2.3.4-RELEASE][root@localhost]/root: tcpdump -T carp -ni em1 vrrp and host 192.168.100.7
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on em1, link-type EN10MB (Ethernet), capture size 65535 bytes
12:57:37.999181 IP 192.168.100.7 > 224.0.0.18: CARPv2-advertise 36: vhid=5 advbase=1 advskew=120 authlen=7 counter=5446066235882559562
12:57:39.112282 IP 192.168.100.7 > 224.0.0.18: CARPv2-advertise 36: vhid=5 advbase=1 advskew=120 authlen=7 counter=5446066235882559563
12:57:40.999080 IP 192.168.100.7 > 224.0.0.18: CARPv2-advertise 36: vhid=5 advbase=1 advskew=120 authlen=7 counter=5446066235882559564
12:57:42.115312 IP 192.168.100.7 > 224.0.0.18: CARPv2-advertise 36: vhid=5 advbase=1 advskew=120 authlen=7 counter=5446066235882559565

Looking at its own CARP adverts (pfsense 2.3.4):

Code: [Select]

[2.3.4-RELEASE][root@localhost]/root: tcpdump -T carp -ni em0 vrrp and host 192.168.101.2
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on em0, link-type EN10MB (Ethernet), capture size 65535 bytes
12:59:24.369135 IP 192.168.101.2 > 224.0.0.18: CARPv2-advertise 36: vhid=1 advbase=1 advskew=0 authlen=7 counter=16106432045254150054
12:59:25.370135 IP 192.168.101.2 > 224.0.0.18: CARPv2-advertise 36: vhid=1 advbase=1 advskew=0 authlen=7 counter=16106432045254150054
12:59:26.371129 IP 192.168.101.2 > 224.0.0.18: CARPv2-advertise 36: vhid=1 advbase=1 advskew=0 authlen=7 counter=16106432045254150054
12:59:27.372131 IP 192.168.101.2 > 224.0.0.18: CARPv2-advertise 36: vhid=1 advbase=1 advskew=0 authlen=7 counter=16106432045254150054

I can see from a 2004 presentation on OpenBSD CARP that the replay detection counters were "not implemented yet" at that time.
See page 10 in this pdf: https://cyber-defense.sans.org/resources/papers/gsec/carp-free-fail-over-protocol-106433

The FreeBSD initial commit of CARP code seems to have been in 2005 and to have included the incrementing counter (sc_counter++) here:
https://github.com/freebsd/freebsd/blob/e1d22638d0a8257ed01b7f95d1b6d5cef74ebd07/sys/netinet/ip_carp.c#L747

The above code with sc_counter++ is what is present in opnsense github source.

Finally, this pfsense code on github is supposed to be from Freebsd with their changes:
https://github.com/pfsense/FreeBSD-src/blob/RELENG_2_2/sys/netinet/ip_carp.c#L715

Does anyone have any comments on the history of this?
Am I somehow confused or totally out of the loop, and this is a known incompatibility?
Did opnsense switch to the FreeBSD version of ip_carp.c at some point, perhaps when the fork occurred?

I haven't looked at the tcpdump CARP parsing code yet, but it's possible it is parsing one of the two counters and pfsense is using the other one. Either way, it does not seem to be compatible with opnsense, FreeBSD, or uCARP.

I have a couple more pairs to migrate, and I don't see where even the latest version of pfsense has the incrementing counter (based on the code shown on github). I'm not sure if upgrading to the latest pfsense would get me an incrementing counter so I could migrate more easily. I guess I'll have to bring one up in a VM and check.

If I'm right, this CARP incompatibility is an important thing to know if you are planning to "smoothly" migrate a redundant pair of firewalls from pfsense to opnsense. They will not cooperate on CARP, so it's a hard cut from one to the other rather than a graceful one.
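To check a capture like the ones above without eyeballing the numbers, the counter values can be pulled out of the tcpdump text and compared. This is only a rough sketch: the function names are made up for illustration, and it assumes the "counter=" field appears exactly as printed in the traces in this post.

```python
import re

COUNTER_RE = re.compile(r"counter=(\d+)")

def carp_counters(lines):
    """Extract the counter value from each tcpdump CARP advertisement line."""
    counters = []
    for line in lines:
        match = COUNTER_RE.search(line)
        if match:
            counters.append(int(match.group(1)))
    return counters

def counter_behaviour(counters):
    """Classify a sequence of counters as static, incrementing, or mixed."""
    if len(counters) < 2:
        return "unknown"
    deltas = {b - a for a, b in zip(counters, counters[1:])}
    if deltas == {0}:
        return "static"
    if all(d > 0 for d in deltas):
        return "incrementing"
    return "mixed"

# Two lines each from the opnsense and pfsense traces in this post.
opnsense_trace = [
    "13:05:27.216731 IP 192.168.100.2 > 224.0.0.18: CARPv2-advertise 36: vhid=1 advbase=1 advskew=0 authlen=7 counter=6491304834018196506",
    "13:05:28.218369 IP 192.168.100.2 > 224.0.0.18: CARPv2-advertise 36: vhid=1 advbase=1 advskew=0 authlen=7 counter=6491304834018196507",
]
pfsense_trace = [
    "12:59:24.369135 IP 192.168.101.2 > 224.0.0.18: CARPv2-advertise 36: vhid=1 advbase=1 advskew=0 authlen=7 counter=16106432045254150054",
    "12:59:25.370135 IP 192.168.101.2 > 224.0.0.18: CARPv2-advertise 36: vhid=1 advbase=1 advskew=0 authlen=7 counter=16106432045254150054",
]

print(counter_behaviour(carp_counters(opnsense_trace)))  # incrementing
print(counter_behaviour(carp_counters(pfsense_trace)))   # static
```

Running it against a capture from a newer pfsense VM would be a quick way to see whether the counter behaviour has changed there.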

18.7 Legacy Series / Firewall Rules- single click enable/disable no longer possible for Reject rules

« on: August 09, 2018, 02:57:48 pm »

I searched a little bit on the forum and did not see this issue mentioned.

After upgrading to 18.7, my firewall "reject" rules still work and can be edited and enabled/disabled if I click the Edit icon (pencil) for the rule. Then on the Edit Firewall Rule page I can check/uncheck the "Disabled: X Disable this Rule" check box and everything works as expected.

The issue I'm seeing is with the "single click to enable/disable" rules from the Firewall: Rules: LAN page (the list of all rules for LAN).

If the Reject rule is currently enabled, I can click the red circle with white X icon and the rule is disabled, I can then Apply the change and the rule is actually changed to disabled.

But, after disabling a reject rule, the Firewall: Rules: LAN page has no icon for that disabled reject rule. Normally (before my upgrade) it would have a grayed out circle with white X in it which you could click to Enable the rule, and then Apply/save the change.

As things are now, since the upgrade I have to click the Edit icon (pencil) on the far right of the disabled Reject rule to load the edit page for that rule, then uncheck the Disable checkbox and save/apply to re-enable the rule.

All the Accept/Pass rules I have still work as they did before the upgrade.... I can enable/disable with a single click from the Firewall: Rules: LAN page and they show a grayed out triangle or green triangle indicating the enable/disable state.

I have a redundant pair of firewalls running CARP and both act this same way. I have 4 or 5 reject rules and they all operate this same way now.

Anyone else seeing this?

General Discussion / HA sync functionality question

« on: August 03, 2017, 09:27:22 pm »

Ok, I haven't really looked and definitely haven't dug into the code to find the answer myself but, I'm curious about the actual behind the scenes process of changing a simple firewall rule in a HA pair (changing from the master of course).

Ultimately I'm searching for the reason I see a 30-60 second delay in "applying" a rule change.

In the course of double and triple-checking things, I have a simple pass rule where I click on the green/gray triangle to enable/disable it, and of course it produces the "Apply" button up top. But what I've found is that after I click the green triangle to either enable or disable the rule, the change propagates to the GUI on the HA peer (without my ever clicking Apply).

I have monitored the active pf ruleset before/after enable/disable and before/after clicking the Apply button. I need to redo the testing because I am not sure what I saw. I think sometimes I wasn't waiting long enough between checks and a previous click of the Apply button took effect on the active pf rules.

Anyway, what I'm most confused by is that when I click to disable/enable the rule, the change propagates to the web GUI on the peer in a couple of seconds and the local page refresh completes. When I click Apply, it takes over 30 seconds for the page refresh to finish, but I see all the xmlsync traffic (via tcpdump) occur within a couple of seconds.

If I disable the HA Sync, of course there is no delay in page refresh after Apply.

I've gone over my entire config and compared it to other setups and online examples. I'm not new to the opnsense/pfsense HA setup; I've had one running at home and a couple at work since the pfsense v1.x days.

I haven't had time to dig into the web GUI code to figure out what's supposed to happen as far as the xmlrpcsync and filter reload etc. on local and peer. I'm hoping someone knows and has the time and patience to tell me what's supposed to happen.

Thanks!

Development and Code Review / [SOLVED] parse failure in list_arp.py causes empty Intf/Diag/ARP list

« on: December 13, 2016, 08:00:42 pm »

Just registered to report this error I ran into after migrating to OPNsense.

I am pretty sure I have the issue pinned down to the lease.find and dhcp_ipv4_address assignment in:

Code: [Select]

leases = open(dhcp_leases_filename, 'r').read()
        for lease in leases.split('}'):
            if lease.find('{') > -1:
                dhcp_ipv4_address = lease.split('{')[0].split('lease')[1].strip()
                if lease.find('client-hostname') > -1:
                    dhcp_leases[dhcp_ipv4_address] = {'hostname': lease.split('client-hostname')[1].strip()[1:-2]}

Things work well when only "lease" objects appear in the dhcpd.leases file. They still work when the "failover peer" object appears as the very first object, just after the ISC boilerplate comments at the top.

When the "failover peer" object appears further down after other "lease" object(s), like this:

Code: [Select]

lease 192.168.99.242 {
  starts 2 2016/12/13 18:05:23;
  tstp 2 2016/12/13 18:05:23;
  tsfp 2 2016/12/13 18:05:23;
  atsfp 2 2016/12/13 18:05:23;
  binding state backup;
}
failover peer "dhcp_lan" state {
  my state normal at 2 2016/12/13 16:30:14;
  partner state normal at 2 2016/12/13 17:05:23;
}

I'm guessing that lease.split('{')[0].split('lease') returns a list with a single item (index 0 only) because the text before the "{" does not contain "lease", so indexing [1] then causes the error:

Code: [Select]

root@fwa:/usr/local/opnsense/scripts/interfaces # ./list_arp.py
Traceback (most recent call last):
  File "./list_arp.py", line 51, in <module>
    dhcp_ipv4_address = lease.split('{')[0].split('lease')[1].strip()
IndexError: list index out of range
root@fwa:/usr/local/opnsense/scripts/interfaces #

The reason it works normally is that the failover peer object appears first after the ISC comments as shown here:

Code: [Select]

# The format of this file is documented in the dhcpd.leases(5) manual page.
# This lease file was written by isc-dhcp-4.3.5

# authoring-byte-order entry is generated, DO NOT DELETE
authoring-byte-order little-endian;


failover peer "dhcp_lan" state {
  my state normal at 2 2016/12/13 16:30:14;
  partner state normal at 2 2016/12/13 17:05:23;
}

The comments above the failover peer object contain the token "lease" (twice!), so split('lease')[0] and [1] both exist.

So in a failover setup, one item in the leases file gets parsed to a non-IP address with a blank hostname. But this does not corrupt the ARP table output of the script, since split('lease')[1] contains comment text and not an IP address, and no client-hostname is found for this iteration, so it never gets used to provide hostname detail for any actual arp -a table IP looked up later in list_arp.py. At least that is my assumption; I didn't read any further down.
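A small check of the two cases, as a sketch only. The strings are copied from the leases file excerpts in this post, and the variable names are made up for illustration:

```python
# ISC boilerplate comments from the top of dhcpd.leases (copied from above).
header = (
    "# The format of this file is documented in the dhcpd.leases(5) manual page.\n"
    "# This lease file was written by isc-dhcp-4.3.5\n"
)
# Text before the "{" of the failover peer object.
peer_head = 'failover peer "dhcp_lan" state '

# Failover peer further down the file: no "lease" token before the "{",
# so split() returns a single item and [1] raises IndexError.
parts_late = peer_head.split('lease')

# Failover peer first in the file: the comments contain "lease" twice,
# so split() returns three items and [1] exists (but is comment text).
parts_first = (header + peer_head).split('lease')

print(len(parts_late), len(parts_first))  # 1 3
print(repr(parts_first[1].strip()))
```

The second print shows that index 1 in the "working" case is a fragment of the comments, not an IP address.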

I'm not sure why my failover peer object appeared further down in the leases file. I just know that it's written at the discretion of dhcpd. While I was troubleshooting things the second failover peer object was cleared by the running DHCP server and only appeared at the top again... and my WebGui ARP Table output was working.

I don't know Python well, but it seems that if you used whatever is needed to perform the following:

Code: [Select]

# needs "import re" at the top of the script
if lease.find('{') > -1 and re.search(r'^lease\s', lease, re.MULTILINE):

it might fix the problem.
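A slightly fuller sketch of that idea, in case it helps. This is not the actual list_arp.py code: the function name parse_leases is made up, the sample data is invented apart from the failover peer lines, and it assumes client-hostname lines look like client-hostname "name"; as the original slicing suggests.

```python
import re

# Matches a block header such as: lease 192.168.99.242
LEASE_HEAD_RE = re.compile(r'^\s*lease\s+(\S+)\s*$')
HOSTNAME_RE = re.compile(r'client-hostname\s+"([^"]*)"')

def parse_leases(text):
    """Map lease IPs to hostnames, skipping any block that is not a lease."""
    dhcp_leases = {}
    for block in text.split('}'):
        if '{' not in block:
            continue
        head, body = block.split('{', 1)
        # Only the last non-empty line before "{" names the object;
        # earlier lines may be comments that happen to contain "lease".
        head_lines = [line for line in head.splitlines() if line.strip()]
        if not head_lines:
            continue
        match = LEASE_HEAD_RE.match(head_lines[-1])
        if match is None:
            continue  # e.g. a "failover peer" object
        host = HOSTNAME_RE.search(body)
        if host:
            dhcp_leases[match.group(1)] = {'hostname': host.group(1)}
    return dhcp_leases

# Made-up sample: a lease, then a failover peer object, then another lease.
sample = '''# This lease file was written by isc-dhcp-4.3.5
lease 192.168.99.242 {
  binding state backup;
}
failover peer "dhcp_lan" state {
  my state normal at 2 2016/12/13 16:30:14;
}
lease 192.168.99.10 {
  client-hostname "laptop";
}
'''

print(parse_leases(sample))  # {'192.168.99.10': {'hostname': 'laptop'}}
```

Checking only the last line before the "{" also avoids picking up the word "lease" from the comments at the top of the file.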

Thanks!
