Messages - debrucer

#1
It's always the keys, isn't it?

Problem solved. I'm closer to understanding why it wasn't working; perhaps it was a number of things, but clearly I was not doing things correctly. It's still not where I want it to be, but it's closer today than yesterday.
#2
I am getting the following error in my Windows 10/11 peer interface:

[TUN] [wg1] Handshake for peer 1 (172.17.1.204:51820) did not complete after 5 seconds, retrying (try 6)

I can get rid of the error by creating a network bridge, connecting my interface to a NIC in my laptop. The message goes away, but if you follow the ipconfig output, the interface no longer has the address used for the tunnel, or the destinations. The negative message is gone, but there is still no handshake.

Is my client interface supposed to have internet or not? After doing tracert a few dozen times, it appears that it should not. (This is not to say that my laptop itself doesn't have internet, just that the WG interface does not.)

The second attachment has my remote server (in black) and my local client (laptop) in blue. These are the definitions with client connected and throwing the error messages.

When you add the bridge, in addition to the change mentioned (missing IPs), the definition on the server changes to just include the interface (clients are gone from the server).

The most frequent answer that I see is a post where the person changes the file name from wg1 to wg0. This may have been "a" reason. It is not "the" reason.
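
For reference, here is roughly the shape of the two configs I am working with. The keys, the client address and the interface name below are placeholders rather than my real values; the endpoint is the one from the error above. This is just to show which fields have to line up.

Client (Windows, wg1):

[Interface]
PrivateKey = <client private key>
Address = 172.17.1.10/32

[Peer]
PublicKey = <server public key>
Endpoint = 172.17.1.204:51820
AllowedIPs = 172.17.0.0/16
PersistentKeepalive = 25

Server (OPNsense):

[Interface]
PrivateKey = <server private key>
ListenPort = 51820

[Peer]
PublicKey = <client public key>
AllowedIPs = 172.17.1.10/32

As I understand it, the handshake itself only needs the keys to match (each side's PublicKey derived from the other side's PrivateKey) and the Endpoint to be reachable from the client; AllowedIPs then controls what traffic each side will route through the tunnel once the handshake completes.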
#3
When I hear "road warrior" I now envision a product that is used for a particular type of user: a road warrior. Nothing wrong with that; but the implementation details are still wrapped in a commercial-ish setup. The rules, the software, the servers are not all in my control. It's been set up to do a job, and apparently it does it well. I'm not doing a very good job of explaining my position here. Even wg and wg-quick don't work perfectly together: wg-quick puts parameters in the configs that wg can't interpret. I have attached a now slightly outdated drawing of what I am trying to achieve. I have VPCs in four AWS regions, with a copy of OPNsense residing on the public network of each. Each VPC uses two AZs within the region. Each AZ has the one public network and two private subnets: one for applications, one for databases. There are app servers and database servers on each subnet (in only one AZ; the second AZ is not currently used and will be set up for failover).

My goal is to tunnel everything per the attached drawing, and then to eliminate the tunnels and use the AWS capability to "share VPCs across regions": a very similar setup, without tunnels. Okay, I will still need the tunnels from home; but after that, share the resources between VPCs as if they were all in the same location.

That's what I'm trying to do. It's a pretty short explanation, but you should be able to grasp what I'm up to here.
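
P.S. On the wg / wg-quick mismatch I mentioned: the trick I have been leaning on (straight from the wg-quick man page, if I'm reading it right) is

wg syncconf wg1 <(wg-quick strip wg1)

where wg-quick strip drops the wg-quick-only fields (Address, DNS, MTU, the PostUp/PostDown hooks and so on) so that plain wg will accept what is left. The wg1 interface name is just my example.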

#4
Thank you. That in itself makes perfect sense. It still remains confusing, though. On Linux I'd be tempted to turn iptables off to start with. I guess I don't know what masquerading is, perhaps not even how to spell it. I'm having a heck of a time with os-wireguard. I previously had wireguard-go working to some extent, but it is my intention not to use Go going forward.

I just wrote a diatribe and deleted it. So frustrated that I don't know where to begin to describe it and don't want to take it out on the world here.

I have not installed the Go components (this time) and have WG disabled in the console tabs. No interface, no peers. My plan is to rebuild a tunnel (by hand, outside the console) and three peers, run them through wg-quick, and pay little attention to them in the console. They do not seem to register status and handshake correctly.
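
In case it clarifies what I mean by "by hand", the rough sequence I have in mind is the standard one from the man pages. The wg1 name is my choice, and I believe the config directory on FreeBSD/OPNsense is /usr/local/etc/wireguard, but correct me if that's wrong:

umask 077
wg genkey | tee /usr/local/etc/wireguard/wg1.key | wg pubkey > /usr/local/etc/wireguard/wg1.pub
# write /usr/local/etc/wireguard/wg1.conf with the [Interface] section and the three [Peer] sections
wg-quick up wg1
wg show wg1    # watch for a latest-handshake line on each peer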

This has been a full-time obsession since mid-November: learning pfSense, then OPNsense, and a bit about networking. I hope to pay strict attention to the man pages on wg and wg-quick, and avoid everything I read elsewhere. There seem to be commercial instructions mixed into every Google result.

I agree that I should be able to do it all from the console. I'm not sure it's all my fault it's not working today.

Thanks again.
#5
Almost every tutorial and all the Wireguard configuration script sites use iptables as part of the Wireguard installation, yet iptables is not part of the VM on AWS. The same is true for pfSense, no iptables. While I had paid support for pfSense, I asked the techs why this was so, and I got some song and dance about people installing packages they don't need. Not much of an answer.

Is iptables required? What is an equivalent set of rules in OPNsense to replace the lack of iptables?

Can I do the required masquerading without iptables?
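
To make the question concrete, the rule that shows up in nearly every tutorial is something along the lines of

iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE

(the interface and source subnet vary by guide). What I am really asking is whether the equivalent on OPNsense is simply an outbound NAT rule under Firewall > NAT > Outbound, i.e. something like pf's

nat on xn0 from 172.17.1.0/24 to any -> (xn0)

where xn0 and 172.17.1.0/24 stand in for my WAN interface and tunnel network, or whether something more is needed.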
#6
I would recommend starting with "system/firmware/settings" and paying attention to the "flavour" and "type" settings in particular. "OpenSSL" and "Community" are the two that I am currently using, but I experimented with the settings before getting there. Try different mirrors, first near your location, and then near the countries where there is more going on - like, well, I don't know. Do not rely on "default" or "custom"... Once we got over the hump, these were the only settings we needed (clip attached).

Good luck!
David
#7
Not sure I know how to help here. Physical disconnection and reconnection of interfaces isn't necessarily required. If you can get into the system either at the console or through an SSH connection, you can switch interfaces around. Since I am using AWS, I can take snapshots at various stages, particularly right before doing something that I've found dangerous. There are two ways to roll back to the snapshot. #1 is to do it while the instance is running, by performing the "replace root device" operation. #2 is to shut down the instance, detach the drive, create a new drive from my snapshot, and finally restart the host.
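
(If it helps, the first option can also be driven from the AWS CLI; the IDs below are placeholders for illustration, not real ones:

aws ec2 create-snapshot --volume-id vol-0123456789abcdef0 --description "pre-upgrade"
aws ec2 create-replace-root-volume-task --instance-id i-0123456789abcdef0 --snapshot-id snap-0123456789abcdef0

The second option is the usual stop / detach volume / create volume from snapshot / attach / start sequence in the console.)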

I tell you that to explain where and how I came up with a fix. I probably went through that process 20 times until I got the fix. I'd start up fresh, do something to fix it, and when it broke again, go back and start over. All the time, I was watching the logs and "live view" for any information.

Immediately before the attempt that worked, I had found that the upgrade was failing with a missing library. (Sorry, I totally forget the name of the library, but I couldn't find it.) Then I found a message indicating that one component was failing because this library was missing.

In the packages menu, I reinstalled "rrdtool" and ran the upgrade process again. That apparently replaced the missing library.
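
(From the shell, I believe the equivalent of that reinstall would be roughly

pkg install -f rrdtool
opnsense-update

though I did it through the GUI, and the package name is simply what it shows up as on my box; yours may differ.)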

No failures. It upgraded.

Now, why this was missing, I do not know. And why it wasn't missing on three other instances I upgraded, I don't know either. I do know that after reinstalling rrdtool, the process worked. 23.1 upgraded. This may or may not be a solution for you. My processes were repeatable because of snapshots. In the old days, one would have to go through the entire process to get back to the point where you could try this. That may be the situation for you on a physical host.

My theory was that getting the ports right fixed my problem. Still, having that library present was a requirement.

Hope this helps.
David
#8
From the documentation:

By default, the system will be configured with 2 interfaces LAN & WAN. The first network port found will be configured as LAN and the second will be WAN.

NIC-0 became LAN and NIC-1 became WAN. My mistake was in attaching the public IP to NIC-0 (the LAN).

At various stages of the upgrade process the software apparently tried to correct my mistakes. Sometimes it goofed. Sometimes it worked. Once this was sorted out there was very little problem.

Everything is accessible now. I'm not sure where or when I went wrong on this, almost total operator error.
#9
Each of our servers has two NICs: NIC-0 = 172.xx.0.135 and NIC-1 = 172.xx.0.251. This setup has worked since mid-November; however, very early in the upgrade process, something switched their definitions. Both NIC-0 and NIC-1 now use the single private IP address of 172.xx.0.251.

In order to connect to these servers after the switch happened and the upgrade failed, all I had to do was go out to the AWS management console, disassociate the public IP address from NIC-0, and assign it to NIC-1. Then I was able to connect through both the GUI and SSH (PuTTY).
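
(The same move from the AWS CLI would look roughly like the following, with the association, allocation and interface IDs as placeholders:

aws ec2 disassociate-address --association-id eipassoc-0123456789abcdef0
aws ec2 associate-address --allocation-id eipalloc-0123456789abcdef0 --network-interface-id eni-0123456789abcdef0

I did it through the console, but it is the same disassociate/associate pair.)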

Upgrades seem to be progressing properly now. Everything seems to be accessible; however, I haven't worked out in my head yet what the possible repercussions are. I'll be back if further comment is required.

David
#10
We have one OPNsense server in each of four AWS Regions. Each server is on the public network in that region's AWS Availability Zone (AZ). There are two private subnets in each AZ (AppServer & DBServer subnets). The four VPCs are set up identically except that their IP addresses are chosen not to conflict (172.17.x.x, 172.18.x.x, 172.19.x.x and 172.20.x.x).

Prior to upgrading all four from 22.1.11_1, or whatever the latest release was, to the new 23.1, we noticed that while our tunnels still worked, we could not successfully add any new tunnels or peers. We attributed this to a confused configuration, with the former os-wireguard, wireguard-go and wg-quick having been used previously, so we elected to upgrade all four servers from the latest available AWS releases.

The entire upgrade process was done using only the SSH interface (as ec2-user, then sudo su -) and the menu 12 option to upgrade. There are many steps between the starting version, around 21.7, and 23.1. Each step on each server took a number of tries, with the required reboots (often failing to restart) requiring another shutdown and startup to get going. We were somewhere near the end; at least a couple of the servers were in the process of getting the final 23.1 fix. Then they started failing, and failing to recover after several restart attempts. Log files were scrutinized, and the last logs of three out of the four servers contained the following failure message:

Initializing...
Fatal error: Uncaught Error: Class "Phalcon\Di\FactoryDefault" not found in /usr/local/opnsense/mvc/script/load_phalcon.php:31
Stack trace:
#0 /usr/local/etc/inc/legacy_bindings.inc(29): require_once()
#1 /usr/local/etc/inc/config.inc(84): require_once('/usr/local/etc/...')
#2 /usr/local/etc/rc.bootup(51): require_once('/usr/local/etc/...')
#3 {main}
  thrown in /usr/local/opnsense/mvc/script/load_phal


Anyway, that is the message from a day's work here. No servers running. No tunnels rebuilt.

The error message comes up in Google in 2015, 2020, 2021... nothing current.
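
For what it's worth, the next thing I plan to check from the shell, on the theory that the PHP Phalcon extension package was lost or left half-installed during one of the failed steps, is roughly

pkg info | grep -i phalcon
pkg check -d

and, if the package shows up, forcing a reinstall of it with pkg install -f <that package name>. I am not certain that is the right lead to chase, hence the question.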

Can anyone help me please? Have we seen this error recently?

Thank you,
David