[SOLVED] HE.NET GIF tunnel never comes up.

Started by 5SpeedFun, February 02, 2022, 05:01:01 AM

February 02, 2022, 05:01:01 AM Last Edit: April 22, 2022, 04:12:30 PM by 5SpeedFun
The tunnel not coming up at startup has always been normal for me. I have to go into Interfaces -> Other Types -> GIF -> Edit/Save and it comes up immediately.

However, since the 22.1 upgrade, I couldn't get it to come up even that way. This is my 6in4 tunnel to HE.net.

Does anyone know if this is expected, or if something else changed in 22.1 that may keep it from coming up? The only thing I see in the release notes is:

"interfaces: align GIF configuration with base system options".

This doesn't have any practical meaning to me.  Am I supposed to be configuring it differently?

Well, it sounds like some interface rework in 22.1 could be the cause of this, but I suspect it also means your WAN comes up later than expected, leaving the tunnel deactivated when the system first attempts to bring it up. As far as I understand, such timing issues could have existed in 21.7 as well but were not triggered as easily.
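If the parent interface really does come up late, one stopgap is to wait for it to have an IPv4 address before touching the tunnel. Below is a minimal POSIX sh sketch, assuming a hypothetical helper named `retry_until` (not part of OPNsense). It only papers over the timing and does not change the configuration order.

```sh
#!/bin/sh
# Hypothetical helper, not part of OPNsense: run a command up to
# <tries> times, sleeping one second between attempts, until it succeeds.
retry_until() {
    tries=$1
    shift
    attempt=1
    while ! "$@"; do
        if [ "$attempt" -ge "$tries" ]; then
            return 1    # gave up: the command never succeeded
        fi
        attempt=$((attempt + 1))
        sleep 1
    done
    return 0
}
```

For example, `retry_until 30 sh -c 'ifconfig sfxge0_vlan10 | grep -q "inet "'` waits up to roughly 30 seconds for an IPv4 address on the VLAN interface. The trailing space in `"inet "` keeps `inet6` lines from matching.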

Can you share the system log? It should have a message or two relevant to the attempts to bring up GIF and failing.


Thanks,
Franco

February 02, 2022, 03:23:38 PM #2 Last Edit: February 20, 2022, 12:06:27 AM by 5SpeedFun
The GIF tunnel never came up on boot in older versions either. IIRC I've posted about similar issues on GitHub and was told to tie services to a loopback interface, but I don't think that's possible when the public IP I need has to be assigned to a specific interface.

In any case, after boot I could click "edit" and then "save" and that would cause it to come up, but that is no longer the case. Either way, I've rolled back to 21.7.8. I won't be able to look at this again for a few days; at that point I'll snapshot the VM this is running on, upgrade, and then pull logs.

Edit: Re-read your post and saw that I can pull this from the system logs.

Here is a log (21.7.8) which doesn't come up at boot:

/usr/local/etc/rc.bootup: The command '/sbin/ifconfig 'gif0' tunnel '' '184.105.253.14'' returned exit code '1', the output was 'ifconfig: error in parsing address string: Name does not resolve'

This is odd, as the GIF config doesn't contain any hostnames; the command above only has an IP address in it.
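The log line suggests the local (source) address came out empty (`''`), and ifconfig then appears to try resolving the empty string as a hostname, hence "Name does not resolve". A defensive sketch follows, with a hypothetical `gif_tunnel_cmd` helper and the documentation address 203.0.113.10 standing in for the real /29 address. It only prints the command once a local address exists.

```sh
#!/bin/sh
# Hypothetical helper: print the ifconfig command for a GIF tunnel,
# refusing to do so while the local (source) address is still empty.
gif_tunnel_cmd() {
    ifname=$1
    local_addr=$2
    remote_addr=$3
    if [ -z "$local_addr" ]; then
        echo "skip $ifname: parent has no IPv4 address yet" >&2
        return 1
    fi
    printf '/sbin/ifconfig %s tunnel %s %s\n' "$ifname" "$local_addr" "$remote_addr"
}
```

With an empty local address, `gif_tunnel_cmd gif0 '' 184.105.253.14` prints a warning and returns 1 instead of handing `''` to ifconfig.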

Thanks! I appreciate all your work on OPNsense. Overall I have no regrets moving to OPNsense from pfSense, and the developers should be proud of all the work that was put into it.

The issue seems to be here:

https://github.com/opnsense/core/blob/master/src/etc/inc/interfaces.inc#L795

It's not returning an address, probably because the one it's looking for is set via DHCP later.
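A rough shell analogue of that lookup (not the PHP code in interfaces.inc) is to take the first `inet` line from the parent's `ifconfig` output. When nothing is assigned yet, the result is empty, which is consistent with the `''` in the failing command. The helper name `first_inet4` is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: read ifconfig output on stdin and print the first
# IPv4 address found; prints nothing if the interface has no IPv4 address.
first_inet4() {
    awk '$1 == "inet" { print $2; exit }'
}
```

Typical use would be `ifconfig sfxge0_vlan10 | first_inet4`.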

Can you share your GIF config?

In theory it should rerun the tunnel when DHCP kicks in. What is the IPv4 mode of the interface the GIF is running on?


Cheers,
Franco

February 02, 2022, 06:04:49 PM #4 Last Edit: February 02, 2022, 06:53:02 PM by 5SpeedFun
I have a static /29, so I'm using an address from that static block. The actual interface is sfxge0_vlan10; the IP isn't on a physical interface, it's on the VLAN interface. I wonder if that matters?

Parent interface: my internet connection interface, as mentioned above.
Remote IPv4 address: also static, as it's HE.net.
Then the two peer IP addresses for each side of the tunnel, both also static. Nothing complicated.

If the actual addresses matter let me know.

February 03, 2022, 08:24:36 AM #5 Last Edit: February 03, 2022, 08:27:11 AM by franco
VLANs are created before GIFs and if they do have a static address then that is added immediately at least as long as it doesn't have any complications like DHCPv6 tracking or bridging.

Can you provide the output of

# /usr/local/etc/rc.reload_all

which shows us the configuration order. I suspect that GIF is created before sfxge0_vlan10 assigned interface which could cause this to happen.

Also, is sfxge0_vlan10 assigned as an interface with a static configuration, or do you use a VIP on it? What exactly is selected in the GIF parent interface drop-down? I don't need the address if one is in there, just the type of string that's in there.


Cheers,
Franco

February 03, 2022, 03:21:16 PM #6 Last Edit: February 03, 2022, 04:01:26 PM by 5SpeedFun
Unfortunately I have both DHCPv6-PD tracking on this interface and a VIP or two as well.


/usr/local/etc/rc.reload_all
Writing firmware setting...done.
Writing trust files...done.
Configuring login behaviour...done.
Configuring CRON...done.
Setting timezone...done.
Setting hostname: edge01.xxxxxxxx.net
Generating /etc/hosts...done.
Generating /etc/resolv.conf...done.
Configuring loopback interface...done.
Creating wireless clone interfaces...done.
Configuring VLAN interfaces...done.
Configuring Loopback interfaces...Creating OpenVPN instances...done.
Configuring DMZ_Zimbra interface...done.
Configuring DMZ_mail interface...done.
Configuring DMZ_pihole interface...done.
Configuring DMZ_plex interface...done.
Configuring DMZ_www interface...done.
Configuring LAN interface...done.
Configuring LAN_NET_MGMT interface...done.
Configuring Lo1 interface...done.
Configuring Lo2 interface...done.
Configuring Lo3 interface...done.
Configuring TRANSIT interface...done.
Configuring WG_xxxxx interface...done.
Configuring WG_Josh interface...done.
Configuring WG_Parents interface...done.
Configuring GIF interfaces (1)...done.
Configuring GIF interfaces (2)...done.
Configuring HE_Chicago interface...done.
Configuring LAN_Xbox interface...done.
Configuring InternetBiz interface...done.
Creating IPsec VTI instances...done.
Setting up routes...done.
Configuring firewall........done.
Starting DHCPv4 service...done.
Starting router advertisement service...done.
Starting NTP service...done.
Configuring OpenSSH...done.
Starting Unbound DNS...done.
Starting web GUI...done.
Syncing OpenVPN settings...done.
Generating RRD graphs...done.
Stopping named.
Waiting for PIDS: 59873.
Stopping php_fpm.
Waiting for PIDS: 90780.
Stopping zebra.
Waiting for PIDS: 51609.
[#] rm -f /var/run/wireguard/wg0.sock
[#] rm -f /var/run/wireguard/wg1.sock
[#] rm -f /var/run/wireguard/wg2.sock
[#] ifconfig wg create name wg0
[!] Missing WireGuard kernel support (ifconfig: SIOCIFCREATE2: Invalid argument). Falling back to slow userspace implementation.
[#] wireguard-go wg0
┌──────────────────────────────────────────────────────┐
│                                                      │
│   Running wireguard-go is not required because this  │
│   kernel has first class support for WireGuard. For  │
│   information on installing the kernel module,       │
│   please visit:                                      │
│         https://www.wireguard.com/install/           │
│                                                      │
└──────────────────────────────────────────────────────┘
[#] wg setconf wg0 /dev/stdin
[#] ifconfig wg0 inet 192.168.30.1/27 alias
[#] ifconfig wg0 inet6 xxxx:xxxx:xxxx:f1c9::1/64 alias
[#] ifconfig wg0 mtu 1420
[#] ifconfig wg0 up
[#] route -q -n add -inet 192.168.30.7/32 -interface wg0
[#] route -q -n add -inet 192.168.30.6/32 -interface wg0
[#] route -q -n add -inet 192.168.30.5/32 -interface wg0
[#] route -q -n add -inet 192.168.30.4/32 -interface wg0
[#] route -q -n add -inet 192.168.30.3/32 -interface wg0
[#] route -q -n add -inet 192.168.30.2/32 -interface wg0
[+] Backgrounding route monitor
[#] ifconfig wg create name wg1
[!] Missing WireGuard kernel support (ifconfig: SIOCIFCREATE2: Invalid argument). Falling back to slow userspace implementation.
[#] wireguard-go wg1
┌──────────────────────────────────────────────────────┐
│                                                      │
│   Running wireguard-go is not required because this  │
│   kernel has first class support for WireGuard. For  │
│   information on installing the kernel module,       │
│   please visit:                                      │
│         https://www.wireguard.com/install/           │
│                                                      │
└──────────────────────────────────────────────────────┘
[#] wg setconf wg1 /dev/stdin
[#] ifconfig wg1 inet 192.168.30.33/29 alias
[#] ifconfig wg1 inet6 xxxx:xxxx:xxxx:ac09::1/64 alias
[#] ifconfig wg1 mtu 1420
[#] ifconfig wg1 up
[#] route -q -n add -inet6 xxxx:xxxx:xxxx::c01a::35/128 -interface wg1
[#] route -q -n add -inet 192.168.30.35/32 -interface wg1
[#] route -q -n add -inet 192.168.30.34/32 -interface wg1
[+] Backgrounding route monitor
[#] ifconfig wg create name wg2
[!] Missing WireGuard kernel support (ifconfig: SIOCIFCREATE2: Invalid argument). Falling back to slow userspace implementation.
[#] wireguard-go wg2
┌──────────────────────────────────────────────────────┐
│                                                      │
│   Running wireguard-go is not required because this  │
│   kernel has first class support for WireGuard. For  │
│   information on installing the kernel module,       │
│   please visit:                                      │
│         https://www.wireguard.com/install/           │
│                                                      │
└──────────────────────────────────────────────────────┘
[#] wg setconf wg2 /dev/stdin
[#] ifconfig wg2 inet 192.168.30.41/29 alias
[#] ifconfig wg2 inet6 xxxx:xxxx:xxxx:xxxx::1/64 alias
[#] ifconfig wg2 mtu 1420
[#] ifconfig wg2 up
[#] route -q -n add -inet 192.168.30.42/32 -interface wg2
[+] Backgrounding route monitor
Checking zebra.conf
2022/02/03 08:57:33 ZEBRA: [EC 4043309111] Disabling MPLS support (no kernel support)
OK
Starting zebra.
2022/02/03 08:57:33 ZEBRA: [EC 4043309111] Disabling MPLS support (no kernel support)
Performing sanity check on php-fpm configuration:
[03-Feb-2022 08:57:33] NOTICE: configuration file /usr/local/etc/php-fpm.conf test is successful

Starting php_fpm.
setup sfxge0_vlan10 [egress only]
setup sfxge0_vlan100
setup sfxge0_vlan99
setup sfxge0_vlan120
setup lo1
ngctl: send msg: No such file or directory
error lo1: cannot create netflow node for lo1
setup wg0
ngctl: send msg: No such file or directory
error wg0: cannot create netflow node for wg0
Starting named.
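The ordering question can be checked mechanically from saved `rc.reload_all` output. In the run above, "Configuring GIF interfaces" is printed before "Configuring InternetBiz interface". If InternetBiz is the interface assigned to sfxge0_vlan10 (an assumption; the thread doesn't say), that matches the suspicion that the GIF is created before its parent. A small sketch with a hypothetical `appears_before` helper:

```sh
#!/bin/sh
# Hypothetical helper: succeed if the first line containing string $1
# appears before the first line containing string $2 on stdin.
# Fails if either string is missing. Matching is literal (no regex).
appears_before() {
    awk -v a="$1" -v b="$2" '
        index($0, a) && !ia { ia = NR }
        index($0, b) && !ib { ib = NR }
        END { exit !(ia && ib && ia < ib) }'
}
```

For example, after saving the output with `/usr/local/etc/rc.reload_all > /tmp/reload_all.log 2>&1` (this reconfigures everything, as it did above), `appears_before 'Configuring GIF interfaces' 'Configuring InternetBiz interface' < /tmp/reload_all.log` succeeds when the GIF is configured first.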



OK, the good news is that I understand the problem. The other good news is that I always wanted to clean this up, but the bad news is that this is rather complex and might break things while testing it. Would you be up for it?


Cheers,
Franco

I'd love to help! 

You guys in the OPNsense project have done so much for the community, this is the least I could do!

This is all in a VM with a passthrough NIC, so it's easy for me to snapshot, try things and roll back if necessary.

Just let me know what you need me to do/try.

Thanks!

Rob


Hi Rob,

Great, I added a feature ticket here for reference https://github.com/opnsense/core/issues/5540 and will report back in a bit.


Cheers,
Franco

As a HE.net IPv6 tunnelbroker user myself, currently doing a proof of concept (PoC) on OPNsense 22.1, this thread got my attention.

I've got to say how impressed I am that it so quickly led to a feature request being opened.

Thumbs up!

OK, so... I've worked through most of the boot sequence once now, and here is the plan:

Since the changes are large, there is no use throwing opnsense-patch commands around, so we are going to ask anyone willing to try the development release bundled with the upcoming 22.1.1 first.

This is to ensure we haven't made broad cleanup errors; it does not specifically address the boot order problem just yet. The plan with the current code rework is to simplify things for the next step, which is reordering according to the interface requirement chain. That will be a lot easier with all the side effects and failsafe code removed.
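Reordering according to a requirement chain is essentially a topological sort. As an illustration only (not how OPNsense implements it), the standard `tsort` utility can order a hypothetical set of "prerequisite dependent" pairs so that each device comes after what it needs:

```sh
#!/bin/sh
# Illustration only: each input line is "prerequisite dependent".
# tsort prints every node after all of its prerequisites; nodes with
# no ordering constraint between them may appear in either order.
config_order() {
    tsort
}

# Hypothetical requirement chain based on the devices in this thread:
# the physical NIC, then its VLANs, then the GIF tunnel on sfxge0_vlan10.
printf '%s\n' \
    'sfxge0 sfxge0_vlan10' \
    'sfxge0 sfxge0_vlan100' \
    'sfxge0_vlan10 gif0' |
    config_order
```

Each device only has to declare its direct parent; the sort derives the full order, so gif0 is always placed after sfxge0_vlan10.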

I'll follow up again once 22.1.1 is out next week.


Cheers,
Franco

Excellent news!  When it's ready let me know and I will go from 21.7.8 -> 22.1 -> 22.1.1 and do some testing.

Thanks so much!

Robert

Small report. We are switching to plan B. :)

The 22.1.1 development version should fix the class of problem reported by Robert. While the change was conceptually simple, the challenge was to "record" which devices (scope is GIF, GRE and bridges for now) need to be configured and to configure them only once, when that becomes possible. I think prior to this the code would try to start things every time it thought it needed to resolve a problem proactively. The boot sequence also differed in that regard from the full reconfiguration cycle after boot (the rc.reload_all script via console option 11), which didn't help with boot-bound issues.
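The "record which devices need configuring and configure them once when possible" idea can be illustrated with a tiny deferred queue. This is a hypothetical POSIX sh sketch, not OPNsense code; `defer_device`, `parent_ready` and `configure_device` are made-up names, and `configure_device` only prints instead of touching any interface.

```sh
#!/bin/sh
# Hypothetical sketch of "configure once, when it becomes possible".
PENDING=""      # space-separated "device:parent" entries still waiting
CONFIGURED=""   # devices already configured (each at most once)

defer_device() {        # defer_device <device> <parent>
    PENDING="$PENDING $1:$2"
}

configure_device() {    # stand-in for the real work: record and print
    case " $CONFIGURED " in
        *" $1 "*) return 0 ;;   # already done, never configure twice
    esac
    CONFIGURED="$CONFIGURED $1"
    echo "configure $1"
}

parent_ready() {        # parent_ready <parent>: run devices waiting on it
    remaining=""
    for entry in $PENDING; do
        dev=${entry%%:*}
        parent=${entry#*:}
        if [ "$parent" = "$1" ]; then
            configure_device "$dev"
        else
            remaining="$remaining $entry"
        fi
    done
    PENDING=$remaining
}
```

For example, after `defer_device gif0 sfxge0_vlan10` and `defer_device gre0 sfxge0_vlan10`, the first `parent_ready sfxge0_vlan10` prints `configure gif0` and `configure gre0`. A second call prints nothing because those entries have been drained from the queue.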

GUI-only cleanups related to this change will already be available in the community version of 22.1.1 to reduce code differences; they are also easier to test than the actual reordering changes in said development release.

All things considered, this is good news for other network device types (VLAN, LAGG, OpenVPN, IPsec etc.), which can benefit from this, and eventually the integration of new network types should be possible via standalone plugins. It will also help with the GUI-side MVC migration of these network devices and move QinQ support along, which is currently planned for 22.7.

22.1.1 is scheduled for next week as mentioned earlier.


Cheers,
Franco

22.1.1 is out now. There is no plan C, so please just try the bundled development version and check whether tunnels come up correctly after boot. It looks promising from our end.


Cheers,
Franco