6RD Default Route Issue (19.7.5, 20.1, 20.7, 21.1.a_201)

Started by cdine, November 28, 2020, 12:47:23 AM

I'm hoping this can be a place folks can work on the current 6RD default gateway issue on OPNsense that has apparently been present since around version 20.1 from what I've read.

I think the best background and most complete set of information so far is in the commentary of the following GitHub issue. The issue is currently closed because nobody claimed it to do the work. Because of that, I thought starting a fresh forum thread may be best for continued conversation until a concrete problem and fix is determined (I have no interest in discussing or arguing about the project's choice to close issues in the way they do, which caused some spiciness on that issue).

GitHub Issue:
https://github.com/opnsense/core/issues/3903

Other forum posts about this issue & version mentioned:
https://forum.opnsense.org/index.php?topic=14789.0 - 19.7.5
https://forum.opnsense.org/index.php?topic=18571.0 - 20.7
https://forum.opnsense.org/index.php?topic=19691.0 - 20.1 (mentioned that 6RD worked in 19.7)

There may be others, these are just the ones I have come across in my search.

I have not done any development on OPNsense, so I don't quite know where to start in terms of making my own test builds and the like, but I'd be willing to put in effort to get a local dev/test environment in place to be able to work on this.

For now, I can offer up logs of my own setup and even offer access to a dedicated OPNsense install for this issue, which is not in production (i.e. it can be messed with/broken and that's fine). My setup is CenturyLink Small Business Fiber (GPON), and their PPPoE / 6rd setup is identical between residential and small business.

I've run into this issue on 20.1, 20.7, and currently 21.1.a_201 (the latest development version as of this posting). I am currently using the OpenSSL variant.

To add some of my own debug info, here are sections of system.log (obtained via `clog -f /var/log/system.log`) when I apply my WAN interface configuration:

IPv4 Configuration Type = PPPoE; IPv6 Configuration Type = None


  • PPPoE Username/Password: Filled out w/ CenturyLink details
  • Everything else empty/default

I'm including this to show that the messages about the pppoe0 interface not existing (order of operations maybe?) are present regardless - initially I thought it could be important, but I doubt it.


Nov 27 15:28:34 OPNsense opnsense-devel[86324]: plugins_configure openvpn_prepare (,pppoe0)
Nov 27 15:28:34 OPNsense opnsense-devel[86324]: plugins_configure openvpn_prepare (execute task : openvpn_prepare(,pppoe0))
Nov 27 15:28:34 OPNsense opnsense-devel[86324]: /interfaces.php: The command '/sbin/ifconfig 'pppoe0' inet6 -accept_rtadv' returned exit code '1', the output was 'ifconfig: interface pppoe0 does not exist'
Nov 27 15:28:34 OPNsense opnsense-devel[86324]: /interfaces.php: The command `/sbin/ifconfig 'pppoe0' up' failed to execute
Nov 27 15:28:34 OPNsense opnsense-devel[86324]: /interfaces.php: The command '/usr/sbin/ngctl msg 'pppoe0': setautosrc 1' returned exit code '71', the output was 'ngctl: send msg: No such file or directory'
Nov 27 15:28:34 OPNsense kernel: ng0: changing name to 'pppoe0'
Nov 27 15:28:34 OPNsense opnsense-devel[86324]: /interfaces.php: ROUTING: entering configure using 'opt2'
Nov 27 15:28:34 OPNsense opnsense-devel[86324]: plugins_configure ipsec (,opt2)
Nov 27 15:28:34 OPNsense opnsense-devel[86324]: plugins_configure ipsec (execute task : ipsec_configure_do(,opt2))
Nov 27 15:28:34 OPNsense opnsense-devel[86324]: plugins_configure dhcp ()
Nov 27 15:28:34 OPNsense opnsense-devel[86324]: plugins_configure dhcp (execute task : dhcpd_dhcp_configure())
Nov 27 15:28:34 OPNsense opnsense-devel[86324]: plugins_configure dns ()
Nov 27 15:28:34 OPNsense opnsense-devel[86324]: plugins_configure dns (execute task : dnsmasq_configure_do())
Nov 27 15:28:34 OPNsense opnsense-devel[86324]: plugins_configure dns (execute task : unbound_configure_do())
Nov 27 15:28:35 OPNsense opnsense-devel[86324]: /interfaces.php: ROUTING: entering configure using defaults
Nov 27 15:28:35 OPNsense opnsense-devel[86324]: plugins_configure monitor ()
Nov 27 15:28:35 OPNsense opnsense-devel[86324]: plugins_configure monitor (execute task : dpinger_configure_do())
Nov 27 15:28:35 OPNsense opnsense-devel[86324]: plugins_configure newwanip (,opt2)
Nov 27 15:28:35 OPNsense kernel: pflog0: promiscuous mode disabled
Nov 27 15:28:35 OPNsense kernel: pflog0: promiscuous mode enabled
Nov 27 15:28:35 OPNsense opnsense-devel[86324]: plugins_configure newwanip (execute task : dyndns_configure_do(,opt2))
Nov 27 15:28:35 OPNsense opnsense-devel[86324]: plugins_configure newwanip (execute task : ntpd_configure_defer())
Nov 27 15:28:35 OPNsense opnsense-devel[86324]: plugins_configure newwanip (execute task : opendns_configure_do())
Nov 27 15:28:35 OPNsense opnsense-devel[86324]: plugins_configure newwanip (execute task : openssh_configure_do(,opt2))
Nov 27 15:28:35 OPNsense opnsense-devel[86324]: plugins_configure newwanip (execute task : unbound_configure_do(,opt2))
Nov 27 15:28:35 OPNsense opnsense-devel[86324]: plugins_configure newwanip (execute task : vxlan_configure_interface())
Nov 27 15:28:35 OPNsense opnsense-devel[86324]: plugins_configure newwanip (execute task : webgui_configure_do(,opt2))


IPv4 Configuration Type = PPPoE; IPv6 Configuration Type = 6rd Tunnel


  • PPPoE Username/Password: Filled out w/ CenturyLink details
  • 6RD prefix: 205.171.2.64
  • 6RD Border Relay: 205.171.2.64
  • 6RD IPv4 Prefix length: 0 bits (default)
  • 6RD IPv4 Prefix address: Auto-detect (default)
  • Everything else empty/default


Nov 27 15:36:21 OPNsense opnsense-devel[7331]: plugins_configure openvpn_prepare (,pppoe0)
Nov 27 15:36:21 OPNsense opnsense-devel[7331]: plugins_configure openvpn_prepare (execute task : openvpn_prepare(,pppoe0))
Nov 27 15:36:21 OPNsense opnsense-devel[7331]: /interfaces.php: The command '/sbin/ifconfig 'pppoe0' inet6 -accept_rtadv' returned exit code '1', the output was 'ifconfig: interface pppoe0 does not exist'
Nov 27 15:36:21 OPNsense opnsense-devel[7331]: /interfaces.php: The command `/sbin/ifconfig 'pppoe0' up' failed to execute
Nov 27 15:36:21 OPNsense opnsense-devel[7331]: /interfaces.php: The command '/usr/sbin/ngctl msg 'pppoe0': setautosrc 1' returned exit code '71', the output was 'ngctl: send msg: No such file or directory'
Nov 27 15:36:21 OPNsense kernel: ng0: changing name to 'pppoe0'
Nov 27 15:36:21 OPNsense opnsense-devel[7331]: /interfaces.php: The interface IPv4 address '' on interface 'pppoe0' is invalid, not configuring 6RD tunnel
Nov 27 15:36:21 OPNsense opnsense-devel[7331]: /interfaces.php: ROUTING: entering configure using 'opt2'
Nov 27 15:36:21 OPNsense opnsense-devel[7331]: /interfaces.php: ROUTING: IPv6 default gateway set to opt2
Nov 27 15:36:21 OPNsense opnsense-devel[7331]: /interfaces.php: ROUTING: skipping IPv6 default route
Nov 27 15:36:21 OPNsense opnsense-devel[7331]: plugins_configure ipsec (,opt2)
Nov 27 15:36:21 OPNsense opnsense-devel[7331]: plugins_configure ipsec (execute task : ipsec_configure_do(,opt2))
Nov 27 15:36:21 OPNsense opnsense-devel[7331]: plugins_configure dhcp ()
Nov 27 15:36:21 OPNsense opnsense-devel[7331]: plugins_configure dhcp (execute task : dhcpd_dhcp_configure())
Nov 27 15:36:21 OPNsense opnsense-devel[7331]: plugins_configure dns ()
Nov 27 15:36:21 OPNsense opnsense-devel[7331]: plugins_configure dns (execute task : dnsmasq_configure_do())
Nov 27 15:36:21 OPNsense opnsense-devel[7331]: plugins_configure dns (execute task : unbound_configure_do())
Nov 27 15:36:21 OPNsense opnsense-devel[7331]: /interfaces.php: ROUTING: entering configure using defaults
Nov 27 15:36:21 OPNsense opnsense-devel[7331]: /interfaces.php: ROUTING: IPv6 default gateway set to opt2
Nov 27 15:36:21 OPNsense opnsense-devel[7331]: /interfaces.php: ROUTING: skipping IPv6 default route
Nov 27 15:36:21 OPNsense opnsense-devel[7331]: plugins_configure monitor ()
Nov 27 15:36:21 OPNsense opnsense-devel[7331]: plugins_configure monitor (execute task : dpinger_configure_do())
Nov 27 15:36:21 OPNsense opnsense-devel[7331]: /interfaces.php: The WAN_CL_6RD monitor address is empty, skipping.
Nov 27 15:36:21 OPNsense opnsense-devel[7331]: plugins_configure newwanip (,opt2)
Nov 27 15:36:21 OPNsense kernel: pflog0: promiscuous mode disabled
Nov 27 15:36:21 OPNsense kernel: pflog0: promiscuous mode enabled
Nov 27 15:36:21 OPNsense opnsense-devel[7331]: plugins_configure newwanip (execute task : dyndns_configure_do(,opt2))
Nov 27 15:36:21 OPNsense opnsense-devel[7331]: plugins_configure newwanip (execute task : ntpd_configure_defer())
Nov 27 15:36:21 OPNsense opnsense-devel[7331]: plugins_configure newwanip (execute task : opendns_configure_do())
Nov 27 15:36:21 OPNsense opnsense-devel[7331]: plugins_configure newwanip (execute task : openssh_configure_do(,opt2))
Nov 27 15:36:21 OPNsense opnsense-devel[7331]: plugins_configure newwanip (execute task : unbound_configure_do(,opt2))
Nov 27 15:36:21 OPNsense opnsense-devel[7331]: plugins_configure newwanip (execute task : vxlan_configure_interface())
Nov 27 15:36:21 OPNsense opnsense-devel[7331]: plugins_configure newwanip (execute task : webgui_configure_do(,opt2))



As others have noted, my correct IPv6 gateway does get written to both /tmp/opt2_stf_defaultgwv6 and /tmp/opt2_stf_routerv6.
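
For anyone who wants to check the same thing on their own box, here is a minimal sketch that prints what those two files contain. It assumes each file holds a single address (that is an assumption, not something verified across versions) and that the interface is opt2 as in the config above. `read_gw_file` is a helper name made up for this sketch.

```shell
#!/bin/sh
# Print the address stored in a gateway temp file, with whitespace stripped.
# Returns non-zero if the file is missing or empty.
# read_gw_file is a hypothetical helper, not part of OPNsense.
read_gw_file() {
    [ -s "$1" ] || return 1
    tr -d ' \t\r\n' < "$1"
    echo
}

# File names are taken from this setup ("opt2" is the WAN interface here).
for f in /tmp/opt2_stf_defaultgwv6 /tmp/opt2_stf_routerv6; do
    if gw=$(read_gw_file "$f"); then
        echo "$f: $gw"
    else
        echo "$f: missing or empty"
    fi
done
```

Comparing that output with `netstat -nr -f inet6 | grep default` shows whether the address in the files ever reached the routing table.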


Possibly/probably relevant code locations for this:


So, that's what I have so far. Does anyone else have suggestions on next steps, or additional debug information they have gathered on the issue? As I mentioned, I'm happy to do whatever I can to help here, including providing access to this test OPNsense install and the like.

So I've spent some more time on this today. I don't have a major win to report yet, but thought I'd share my progress (and minor findings... unless I'm just totally misunderstanding the internal gateway model architecture, which is also possible!).

First off, it's a lot easier to get started with development than I expected. Kudos to the OPNsense folks for making it so easy to jump in: just follow the single page here and you can start messing around with things in the core repository in under 5 minutes.

https://wiki.opnsense.org/development/workflow.html#packages

tl;dr: git clone https://github.com/opnsense/core ; cd core ; make mount  8)

Ok, so once I did that, I basically started adding log_error() calls everywhere that I thought might be relevant, particularly around any handling of gateway objects.

In doing so, I found out the following. Again, while it seems interesting to me, I may be totally off-base in my understanding of how the internal gateway models are supposed to work, so consider this purely exploratory so far. All configuration is still the same as in the post above.

$gateway['gateway'] is empty/null when this conditional checks if it's present: https://github.com/opnsense/core/blob/21.1.a/src/etc/inc/system.inc#L494

This is due to $gateways not containing a gateway item for the 6RD interface in this object: https://github.com/opnsense/core/blob/21.1.a/src/etc/inc/system.inc#L486

I verified that by examining the full set of gateway objects at that point, adding a new debug variable holding the result of $gateways->getGateways(). Here's the entry for the 6rd interface when I checked it:


    [000020000000004] => Array
        (
            [interface] => opt2
            [weight] => 1
            [ipprotocol] => inet6
            [name] => WAN_CL_6RD
            [descr] => Interface WAN_CL_6RD Gateway
            [monitor_disable] => 1
            [if] => pppoe0
            [dynamic] => 1
            [virtual] => 1
            [priority] => 254
        )




It also seems possibly worth noting that the Gateways.php code uses the temp files /tmp/[IF]_router[FSUFFIX] - but that code doesn't ever seem to do anything with the /tmp/[IF]_defaultgw[FSUFFIX] files. I'm not yet positive what code is responsible for writing those files for the 6rd interfaces.


I tried to work backwards through releases to reproduce the known-good condition, believed to be somewhere between 19.0 and 19.5, but I didn't succeed in doing so. I tried the following, for each release (git tag) of the core repository, live-mounted on my dev system:


Example command to grab a branch:

git checkout tags/19.7.4 -b 19.7.4-branch

tags/19.7
tags/19.7.1
tags/19.7.2
tags/19.7.3
tags/19.7.4
tags/19.7.5
tags/21.1.a

Also master as of tonight (d70a1aae03a716508aca9439a7db0f1abbe34957)
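
A minimal sketch of how that per-tag walk could be scripted, assuming it is run inside a clone of the core repository with a clean working tree. The tag list is the one above; `checkout_cmd` and the `RUN` switch are names invented for this sketch. By default it only prints the commands.

```shell
#!/bin/sh
# Dry run by default: prints the git command for each tag.
# Set RUN=1 to actually check the tags out (inside a core clone).
TAGS="19.7 19.7.1 19.7.2 19.7.3 19.7.4 19.7.5 21.1.a"

# Build the checkout command in the same form as the example command above.
checkout_cmd() {
    echo "git checkout tags/$1 -b $1-branch"
}

for tag in $TAGS; do
    cmd=$(checkout_cmd "$tag")
    if [ "${RUN:-0}" = "1" ]; then
        $cmd || break
        # Re-apply the interface config and check the routing table here.
    else
        echo "$cmd"
    fi
done
```

The loop stops at the first failed checkout so a dirty tree or an already existing branch does not silently skew the results.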

Sadly, while the interface/gateway code definitely changed whenever I checked out a different tag (verified by different log messages, as those have changed quite a bit over the 19.x releases), I was unable to get to a state that configured a default inet6 gateway at all upon applying the interface configuration. I wonder if there may be some limitation of the live mounted development/debug process here, vs needing to use a complete system image - there's probably something else that's different which my above methodology just isn't catching.

So, that's where I'm at right now. Happy to continue on this, and I welcome suggestions from anybody with ideas for what to try next :)

I have no idea about 6rd and therefore can't reproduce this. All I can say is that the gateway code got a big rewrite with the 19.7 release to add gateway priorities. Just check the major release notes.

I will follow your quest and am happy to help where possible.

Thanks @mimugmail! Based on the files I've identified (and others in core), would you expect any need to do a full system build to test, vs. just making changes in a live-mounted source tree? I'm going to try some more hunting with that method, but if there's something I may be missing with that method it'd be good to know.

Perhaps another thing: do you know a bit more about the /tmp/[IF]_router[FSUFFIX] and /tmp/[IF]_defaultgw[FSUFFIX] files in general, i.e. their origin and use, and how they differ? I imagine they may just be duplicated for legacy reasons, but I'm not quite sure.

Thanks again.

You don't even need to make mount. You can edit files directly in /usr/local/opnsense or /usr/local/www.

To revert, just force a reinstall with pkg install -f opnsense

The difference between router and defaultgw COULD be the option to mark a gateway as upstream, so that it is chosen as the default when it has the best priority, or not used at all if it's an internal gateway like a core switch.

If you live in CEST you can reach core devs in IRC, way easier for such quick questions

Are you still working on this? Easiest thing would be to test against 19.1

Hi, I hadn't after hitting a wall testing all the 19.7 versions, but I still have the test VM up so I'll see about giving it a try with 19.1 versions soon. Thanks for the suggestion.

So I gave this a shot again with tags/19.1 and even back to 18.7, both with no luck. To be honest, I'm not sure my testing strategy is good here - there's just a lot about the system that I'm not familiar with unfortunately.

Here's the script that I've been using to test (but I've also been checking the routing table manually and inspecting the web UI): https://gist.github.com/craSH/5f3996f04387522f3daaf8ee214d8754

But did you check manually whether the interface config is correct? Usually restoring the config and a couple of reboots should show whether everything is working or not.

After @mimugmail suggested trying a clean 19.1.x install, I did so this morning and can report that the bug is NOT present in that version. Simply adding the 6rd config to my WAN interface does result in it electing the proper default inet6 gateway, and this is reflected by both netstat -nr -f inet6 | grep default and the OPNsense gateway UI.

I may try my previous approach of live-mounting the git repo for different versions from this one onward to see if I can get it to break that way, in which case I'd be closer to actually identifying the code change that introduced the bug! It's a bit of time/effort though and I'm not sure I'll get to it this weekend, but I wanted to share my status so folks know it's still being worked on.

Here are some screenshots of the various configs + outputs with 19.1.4:


root@OPNsense:~ # netstat -nr -f inet6 | grep default
default                           2602:61:7181:4c00::cdab:240   UGS     wan_stf
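
The low 32 bits of that gateway, cdab:240, are the border relay address 205.171.2.64 written in hex. That suggests the gateway is derived from the border relay, although this is an inference from this one output and not something confirmed in the code. A small sketch to do the conversion, with `ipv4_to_hex_groups` being a helper name made up here:

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address into the two 16-bit hex groups it
# occupies at the end of an IPv6 address.
# The function body runs in a subshell so the IFS change stays local.
ipv4_to_hex_groups() (
    IFS=.
    set -- $1
    printf '%02x%02x:%02x%02x\n' "$1" "$2" "$3" "$4"
)

# Prints cdab:0240 (netstat drops the leading zero and shows cdab:240).
ipv4_to_hex_groups 205.171.2.64
```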





You can update to the latest 19.1 and it will work. After 19.7 it will break.

Is there currently a fix or workaround?
Setting the default route manually after every reboot is a bit... let's call it inefficient.

I'm a software dev (limited knowledge of systems programming though...) and I'd love to help ;)

The biggest problem is that none of the devs with enough knowledge have 6RD available to test.

Hi, I also have this problem so I did some research today.

The logs show that the IPv6 default route is just ignored:

Jan 23 20:24:05 gate1 opnsense[75480]: /interfaces.php: ROUTING: entering configure using defaults
Jan 23 20:24:05 gate1 opnsense[75480]: /interfaces.php: ROUTING: IPv4 default gateway set to opt2
Jan 23 20:24:05 gate1 opnsense[75480]: /interfaces.php: ROUTING: setting IPv4 default route to 144.2.xxx.xxx
Jan 23 20:24:05 gate1 opnsense[75480]: /interfaces.php: ROUTING: keeping current default gateway '144.2.xxx.xxx'
Jan 23 20:24:05 gate1 opnsense[75480]: /interfaces.php: ROUTING: IPv6 default gateway set to opt2
Jan 23 20:24:05 gate1 opnsense[75480]: /interfaces.php: ROUTING: skipping IPv6 default route
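
To spot this quickly in a longer log, a filter along these lines may help. It is only a sketch: it assumes the message text matches the lines above exactly, and `routing_summary` is a made-up helper name.

```shell
#!/bin/sh
# Print only the ROUTING decisions from syslog-style input on stdin.
# Returns non-zero if an IPv6 default route was skipped.
# routing_summary is a hypothetical helper for this sketch.
routing_summary() {
    awk '
        /ROUTING: / {
            sub(/^.*ROUTING: /, "")   # drop timestamp, host and script prefix
            print
            if ($0 == "skipping IPv6 default route") skipped = 1
        }
        END { exit (skipped ? 1 : 0) }
    '
}

# Sample input copied from the log lines above.
if routing_summary <<'EOF'
Jan 23 20:24:05 gate1 opnsense[75480]: /interfaces.php: ROUTING: IPv6 default gateway set to opt2
Jan 23 20:24:05 gate1 opnsense[75480]: /interfaces.php: ROUTING: skipping IPv6 default route
EOF
then
    echo "no skipped IPv6 default route"
else
    echo "IPv6 default route was skipped"
fi
```

On a live system the input would come from the system log instead of the here-document.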


I believe the issue is that the following lines were missed during the gateway handling rewrite in release 19.7:
https://github.com/opnsense/core/blob/stable/19.1/src/etc/inc/gwlb.inc#L879-L889

As a dirty workaround I did the following:

diff --git a/src/opnsense/mvc/app/library/OPNsense/Routing/Gateways.php b/src/opnsense/mvc/app/library/OPNsense/Routing/Gateways.php
index b1efd58d5..bc733d472 100644
--- a/src/opnsense/mvc/app/library/OPNsense/Routing/Gateways.php
+++ b/src/opnsense/mvc/app/library/OPNsense/Routing/Gateways.php
@@ -261,6 +261,9 @@ class Gateways
                         $gwkey = $this->newKey($thisconf['priority'], !empty($thisconf['defaultgw']));
                         // gateway should only contain a valid address, make sure its empty
                         unset($thisconf['gateway']);
+                        if ($ifcfg['ipaddrv6'] == '6rd' && file_exists("/tmp/{$thisconf['interface']}_stf_routerv6")) {
+                            $thisconf['gateway'] = trim(@file_get_contents("/tmp/{$thisconf['interface']}_stf_routerv6"));
+                        }
                         $this->cached_gateways[$gwkey] = $thisconf;
                     } elseif (empty($thisconf['virtual'])) {
                         // skipped dynamic gateway from config, add to $dynamic_gw to handle defunct


Now the default gateway is added correctly again for the 6rd setup.

Unfortunately this is still not enough for my connection, as my new ISP only gives me a single /64, which leads to the following unsolved issue:
https://github.com/opnsense/core/issues/4025
I was unable to solve that one in a proper way - it may be FreeBSD related...

Hope someone else can use this information.

Regards

Wow, nice work. The code diverged a lot, but I think I'm seeing the actual issue with Gateways.php: it tries to read the wrong _routerv6 file, because the interface for 6RD is "xxx_stf" and not "xxx".
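
To illustrate the naming mismatch (this is a shell sketch of the lookup order only, not the actual change to Gateways.php): a lookup that tries the "xxx_stf" name before the plain "xxx" name finds the file that the 6RD setup writes. `find_routerv6_file` and its optional directory argument are hypothetical names for this example.

```shell
#!/bin/sh
# Print the first existing <name>_routerv6 file for an interface, trying the
# 6RD-style "<if>_stf" name before the plain "<if>" name.
# Usage: find_routerv6_file <interface> [directory, defaults to /tmp]
# find_routerv6_file is a hypothetical helper, not OPNsense code.
find_routerv6_file() {
    dir=${2:-/tmp}
    for name in "$1_stf" "$1"; do
        if [ -f "$dir/${name}_routerv6" ]; then
            echo "$dir/${name}_routerv6"
            return 0
        fi
    done
    return 1
}

find_routerv6_file opt2 || echo "no router file for opt2"
```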

Let me try to fix this. :)


Cheers,
Franco