Messages - rkubes

#1
25.1, 25.4 Series / Re: sftp backup error
July 16, 2025, 08:31:17 AM
I figured out the issue by adding some debug logging to Sftp.php

If it's okay, I can create a pull request tomorrow - unless it's urgent and someone else needs to get to it sooner.

So, the issue is when the backup files are created remotely, they're named in all lowercase.

However, when the "fileprefix" is made for finding files based on the hostname, the search pattern is not made to lowercase.

Long story short, with this issue, if anyone's hostname has capital letters in it (which maybe isn't common) AND they are configured to include the hostname in the backup config filename, then it will never find the files when it runs the "ls" command in sftp.

Edit:
I should clarify that I didn't read the full post. The above is the solution to the "remote backup returned no files" error, not to the identity key/permission issue.
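
A minimal sketch of the case mismatch described above, in Python rather than the PHP of Sftp.php. The file naming scheme, the hostname, and the function name are hypothetical and only illustrate why a prefix built from a mixed-case hostname never matches lowercase remote names; this is not the plugin's actual code.

```python
def find_backups(remote_files, hostname, lowercase_prefix):
    """Return remote files whose name starts with the hostname-based prefix."""
    # Hypothetical naming scheme: "config-<hostname>-<timestamp>.xml"
    prefix = f"config-{hostname}-"
    if lowercase_prefix:
        # The proposed fix: normalise the search prefix the same way
        # the remote file names are normalised.
        prefix = prefix.lower()
    return [name for name in remote_files if name.startswith(prefix)]


# Remote names are all lowercase, as observed on the SFTP server.
remote_files = ["config-fw01.example.lan-20250716.xml"]
hostname = "FW01.example.lan"  # hypothetical hostname with capital letters

unpatched = find_backups(remote_files, hostname, lowercase_prefix=False)
patched = find_backups(remote_files, hostname, lowercase_prefix=True)
print(unpatched)  # []
print(patched)    # ['config-fw01.example.lan-20250716.xml']
```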
#2
25.1, 25.4 Series / Re: sftp backup error
July 16, 2025, 08:13:20 AM
Did you get a solution to this?

I just recently rebuilt the server that was hosting the SFTP, and now I'm getting this error when I try to test the SFTP configuration.

I don't know whether it was working or not before. I know it was successfully saving files before (as does this test), but I don't know if it was ever not getting a list of files on the remote server.
#3
I have two gateways configured on my instance of OPNsense. The primary is connected directly to my primary ISP's equipment and will pull a public IP. This gateway works as expected.

I have a secondary gateway configured for "failover" so that if my primary Internet drops, I can run off this secondary equipment. For reasons beyond my control, there is another router between the OPNsense interface for this gateway, and the actual Internet. So, the DHCP on this interface will pull a private IP in the 10.x.x.x range.

My challenge is I don't keep this secondary Internet up all the time, but the router is up all the time. (It's easier for me to remotely manage the upstream device to make it available when backup is needed). So, if I leave the gateway monitoring on the default, it assumes this gateway is always up, since it can ping that router all day.

However, if I change the Monitor IP of that gateway to 8.8.8.8, it will assume the gateway is down. I can even SSH into the OPNsense instance and run the ping command against 8.8.8.8 using that interface's IP as the source, and it gets responses with no issue. However, the OPNsense UI still shows that the interface is "down." All other devices that are connected to that router (I've done this for testing) are able to get out to the Internet without issue and can ping 8.8.8.8.

I've tried both with "Disable Host Route" and without it enabled, and the result is the same. I'm not sure what else I could try.

I don't think it's a firewall rule issue, since this is outbound and I have rules to allow all outbound traffic from my LANs through the Failover gateway group. Moreover, I assume the "ping" is coming from the interface itself, so my LAN firewall rules probably don't even come into play.

As mentioned, all other devices are able to connect just fine through that router. Running ping from the OPNsense SSH shell to 8.8.8.8 is successful. And also if I do something to "force" the gateway up - like disabling monitoring or allowing it to monitor the gateway IP itself (default) - then all connectivity works as expected. It just seems like I must be doing something wrong or not understanding enough how to properly configure to use a "public" monitor IP to track when that interface truly has Internet access.

Any assistance will be appreciated!

Edit:
I apologize, I was able to solve this on my own right after posting. I figured I'd leave this here for reference in case anyone else comes across the same issue.

I noticed the one difference was the "ping" command was sending something like 50+ bytes by default, but under the "advanced" settings for the gateway there was a configuration for the "Data Length" that defaults to just "1". I tried the "ping" command with a packet size of just "1" and then 8.8.8.8 would no longer respond.

I tried then with a packet size of "10" - just picking an arbitrarily larger number and 8.8.8.8 started responding again. I didn't dive deeper to find "what is the minimum data length for a ping that 8.8.8.8 will respond to." However, 10 seems to work. Since I put 10 in for the Data Length, everything is working exactly as expected.
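
To make the difference concrete, here is a minimal Python sketch that builds ICMP echo requests with different payload sizes. It assumes the "Data Length" setting corresponds to the number of ICMP payload bytes, and the identifier and sequence values are arbitrary. It only constructs the packets; sending them would need a raw socket and root privileges, and it says nothing about why 8.8.8.8 ignores the smallest size.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum used by ICMP (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad to an even number of bytes
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(ident: int, seq: int, data_len: int) -> bytes:
    """Build an ICMP echo request (type 8, code 0) with data_len payload bytes."""
    payload = bytes(data_len)  # zero-filled payload
    header = struct.pack("!BBHHH", 8, 0, 0, ident, seq)
    checksum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, checksum, ident, seq) + payload


# 1 = the gateway default mentioned above, 10 = the value that worked,
# 56 = the usual default payload of the ping command.
for data_len in (1, 10, 56):
    packet = build_echo_request(ident=0x1234, seq=1, data_len=data_len)
    print(data_len, len(packet))  # ICMP length = 8-byte header + payload
```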
#4
I was able to find a solution.

In my hosts configuration I had the host name, MAC address and an IP. This was enough before as I assumed leaving the domain of the host entry blank would have it automatically inherit the system domain.

Now that I've also populated my local domain in the domain field for all of the hosts, they are resolving now.

Edit:
I should note that before making that change I had tried applying multiple times and even restarted the service. The dnsmasq-hosts file was being populated, but only with the host name and not the FQDN. Now that I've added the domain, the dnsmasq-hosts file has both the FQDN and the host name.

Edit2:
Enabling "DHCP FQDN" in the dnsmasq general settings also fixed the issue and I no longer have to specify the domain in the hosts entry for the reservation. I prefer this approach as it automatically inherits the system configured domain. Using this setting seems obvious now, but I don't know why it was working under 25.1.6 without this.
#5
When I upgraded to 25.1.6 I followed the instructions to forward requests on my local domain from unbound to dnsmasq. I have a number of servers for which I have dhcp reservations configured in dnsmasq.

On 25.1.6 this worked perfectly. If I did an nslookup of host1.my.home.arpa unbound would forward it to dnsmasq which would then respond with the IP from my hosts configuration.

Since I've upgraded to 25.1.7_2 this no longer works. Unbound forwards the request to dnsmasq, then dnsmasq forwards it to the system configured public dns servers (such as google or cloudflare).

I believe this is due to the change to make the reservations not write to the hosts file. Maybe for the static reservations I can put overrides in unbound - but I want the hosts that get dynamic ips to also be resolvable.

Is there a configuration change I'm possibly missing that I need to make now for this to work with 25.1.7_2?
#6
Perfect. With that patch in place, I verified the configuration file generated properly and then the clients booted correctly as well.

Thank you.
#7
Monviech,

Unfortunately the fixes did not work for me. I was puzzled, at first, because it looked correct. But then I figured out the issue.

So, on my system my interfaces are named as "igc0" through "igc4", and I have the boot configurations set for the two interfaces that are named: igc2 and igc3.

With that said, with this latest update, everything in the "dnsmasq.conf" file that is autogenerated uses the "igc" names correctly - no issue... except for the new "dhcp-boot" section. For some reason, it's trying to call them "opt1" and "opt2". I remember that opt1 and opt2 were their original names when the interfaces were first discovered, so, I'm assuming this is just a mapping issue and the code needs to be updated to pull the correct interface name.
#8
Quote from: Monviech (Cedrik) on May 16, 2025, 09:31:04 PM
Thanks for the further investigation.

Why is one tag with ":", and the other with "="?

Sorry that was a copy paste error. I had that syntax error at first and then corrected it after I had already copied it over.

I edited and fixed my original post in case anyone else copies it from there.
#9
I finally got it working.

It's a combination of the mentioned defect (8624 - where Boot settings do not go to the dnsmasq.conf file), as well as the fact that the interface tags aren't listed as an option.

I was able to confirm that the dhcp-boot directive does indeed support multiple tags. I experimented first with the "tag-if" directive, but wound up not needing it.

Below is the separate config file that I dropped into /usr/local/etc/dnsmasq.conf.d/  (named it 20-pxe.conf)

dhcp-match=set:IsBIOS,93,0
dhcp-match=set:IsEFI,93,7

dhcp-boot=tag:igc3,tag:IsBIOS,undionly.kpxe,10.0.64.10,10.0.64.10
dhcp-boot=tag:igc3,tag:IsEFI,snponly.efi,10.0.64.10,10.0.64.10
dhcp-boot=tag:igc2,tag:IsBIOS,undionly.kpxe,10.0.64.10,10.0.64.10
dhcp-boot=tag:igc2,tag:IsEFI,snponly.efi,10.0.64.10,10.0.64.10

I was able to confirm that a BIOS based client on igc3 got the BIOS boot file, and a UEFI based client on igc3 got the EFI boot file.
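
For anyone with more interfaces, the same file can be generated rather than typed by hand. The following Python sketch reproduces the lines above from the values in this post; the function and variable names are made up for illustration, and it only prints the text, it does not write to /usr/local/etc/dnsmasq.conf.d/.

```python
TFTP_SERVER = "10.0.64.10"
INTERFACES = ["igc3", "igc2"]
# Tag name -> DHCP option 93 (client architecture) value, as used above.
ARCH_CODES = {"IsBIOS": 0, "IsEFI": 7}
# Tag name -> boot file served to clients carrying that tag.
BOOT_FILES = {"IsBIOS": "undionly.kpxe", "IsEFI": "snponly.efi"}


def render_pxe_config() -> str:
    """Render dhcp-match and dhcp-boot lines for every interface/arch pair."""
    lines = [f"dhcp-match=set:{tag},93,{code}" for tag, code in ARCH_CODES.items()]
    lines.append("")
    for iface in INTERFACES:
        for tag, filename in BOOT_FILES.items():
            # Both tags must match for the line to apply.
            lines.append(
                f"dhcp-boot=tag:{iface},tag:{tag},{filename},"
                f"{TFTP_SERVER},{TFTP_SERVER}"
            )
    return "\n".join(lines) + "\n"


print(render_pxe_config(), end="")
```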

For all the "DHCP Options" that I configured in the GUI to try to fix this manually, I just created a new "tag" called "Disabled" that just never gets set and added that tag to all of them to disable them without having to fully delete them. It might be "nice to have" for the UI to offer an enable/disable function similar to firewall rules so that the options can be toggled without having to completely delete them.
#10
Thanks. I actually just ran into that issue when I started to give up on the approach of setting 66 and 67 directly.

The interesting thing is when I use a Set option for 66 and 67, what I'm seeing when I test the DHCP response is that 67 (filename) gets populated exactly as I've set it. No issue there. However 66 gets sent to the client as an empty string (See edit, this is not actually true). That must be why it times out trying to connect and also why I don't see any activity or attempts to reach the actual TFTP server.

I've also tried setting option 150 for the TFTP server but still get the same issue. The client just gets an empty string instead of the IP I put in.

Looking at the dnsmasq config file, I don't see anything that stands out as "incorrect" for the dhcp-option directives. I don't know if maybe dnsmasq prevents you from setting these values in the option method and instead relies on the boot options specifically.

Edit:
Well, that was a bit frustrating. I was using a PowerShell script provided by 2Pint to do the DHCP Test. But there was actually an error in their script where they referenced the wrong variable name. Once I corrected their script, I do see that the TFTP IP address is actually getting populated correctly.
I might need to figure out how I can do a TCPDump or WireShark capture of the DHCP packets at boot time to see if I can figure out what's going on or missing. I'll probably need to get another PC on that network that I can set up as the listener for that, since the PC I usually have that kind of access for is the one that I need to capture the boot time packets.

Edit2:
I see the difference in the DHCP packets now. I don't understand the DHCP spec enough to speak intelligently to it. However, when I have ISC DHCP (working), the TFTP IP address and boot file name are in a separate section of the DHCP offer packet. The way WireShark decodes it, it looks like they're in fixed-width spots within the packet, as there's no identifier beforehand. Additionally, the IP address gets stored as a 4-byte value. However, when I use my configuration on dnsmasq, instead of those same fields getting populated, the values appear later in the list of DHCP options returned, as options 66 and 67. Moreover, the IP address there (if it matters) is written as a null-terminated ASCII string rather than a 4-byte IP value.

I probably will need to get dnsmasq's actual "boot" config to work for it to "properly" format the DHCP offer with the bootfile and TFTP server populated in the "correct" spot. Even though options 66 and 67 exist, it doesn't seem like setting them like normal options really works. Probably due to the boot response being a different kind of packet.

It's unfortunate that you can't configure dnsmasq to only offer the boot options on specific interfaces; but I can just manage that with firewall rules.
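
The two locations described in this edit can be shown with a small Python sketch. The field offsets are the fixed BOOTP header layout from RFC 2131 (siaddr at bytes 20-23, file at bytes 108-235, options after the magic cookie at byte 240). The packets are synthetic ones built by a made-up helper, not captures from ISC DHCP or dnsmasq, so this only illustrates the layout and does not settle which location a given PXE client reads.

```python
import socket

MAGIC_COOKIE = b"\x63\x82\x53\x63"  # marks the start of the DHCP options


def build_reply(siaddr="0.0.0.0", boot_file="", options=()) -> bytes:
    """Build a synthetic DHCP reply (illustrative helper, mostly zeroed)."""
    fixed = bytearray(236)
    fixed[0] = 2  # op = BOOTREPLY
    fixed[20:24] = socket.inet_aton(siaddr)          # next-server, 4 bytes
    encoded = boot_file.encode("ascii")
    fixed[108:108 + len(encoded)] = encoded          # boot file name field
    body = b"".join(bytes([code, len(val)]) + val for code, val in options)
    return bytes(fixed) + MAGIC_COOKIE + body + b"\xff"


def parse_boot_info(packet: bytes) -> dict:
    """Extract boot values from the fixed fields and from options 66/67."""
    if packet[236:240] != MAGIC_COOKIE:
        raise ValueError("not a DHCP packet")
    options = {}
    pos = 240
    while pos < len(packet) and packet[pos] != 255:  # 255 = end option
        if packet[pos] == 0:                         # 0 = pad option
            pos += 1
            continue
        code, length = packet[pos], packet[pos + 1]
        options[code] = packet[pos + 2:pos + 2 + length]
        pos += 2 + length

    def text(raw: bytes) -> str:
        return raw.split(b"\x00", 1)[0].decode("ascii")

    return {
        "siaddr": socket.inet_ntoa(packet[20:24]),
        "file": text(packet[108:236]),
        "option_66": text(options.get(66, b"")),
        "option_67": text(options.get(67, b"")),
    }


# Boot values in the fixed header fields (what the ISC capture looked like).
fixed_style = parse_boot_info(
    build_reply(siaddr="10.0.64.10", boot_file="undionly.kpxe"))
# Boot values only as options 66/67, with the IP as an ASCII string.
option_style = parse_boot_info(
    build_reply(options=[(66, b"10.0.64.10\x00"), (67, b"undionly.kpxe")]))
print(fixed_style)
print(option_style)
```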

Edit3:
I looked at the manpage for dnsmasq, and I see the dhcp-boot option described as follows:
-M, --dhcp-boot=[tag:<tag>,]<filename>,[<servername>[,<server address>|<tftp_servername>]]
(IPv4 only) Set BOOTP options to be returned by the DHCP server. Server name and address are optional: if not provided, the name is left empty, and the address set to the address of the machine running dnsmasq. If dnsmasq is providing a TFTP service (see --enable-tftp ) then only the filename is required here to enable network booting. If the optional tag(s) are given, they must match for this configuration to be sent. Instead of an IP address, the TFTP server address can be given as a domain name which is looked up in /etc/hosts. This name can be associated in /etc/hosts with multiple IP addresses, which are used round-robin. This facility can be used to load balance the tftp load among a set of servers.

What is interesting here is that the first line makes it seem like only one tag can be on a dhcp-boot option, but the description seems to indicate that there can be multiple tags. Later I'll try adding a custom config file on the router with dhcp-boot set with an interface tag and my BIOS/EFI tag to differentiate the file type, and report back if it works. If it does, I'll probably do a separate reply rather than an edit, since it will be a substantial enough update.
#11
I created an option entry for each of my two interfaces that should support network booting to populate DHCP option 66 with the IP of the TFTP server. Then I created tags to identify BIOS or EFI based on option 93, and an option entry for option 67 to populate the correct boot program based on that.

I didn't want to use the built-in boot option UI since it doesn't seem like you can restrict it to specific interfaces. There's nowhere to set an interface tag.

Unfortunately PXE still isn't working for me with dnsmasq, but does if I fall back to ISC.

The client gets as far as getting a DHCP response, then gets to the TFTP stage but times out. The firewall logging doesn't show it trying to connect to the TFTP server (neither allowed nor blocked).

Tomorrow I'm going to try a DHCP test tool to see what options dnsmasq is actually sending out. I might do the same with ISC and see what the difference is.

In the meantime any other feedback is appreciated, otherwise I'll look to post my final working configuration in case it helps others.
#12
I know of course there are tons of threads this release on the transition from ISC DHCP. With that said, I've tried searching and couldn't find this specific answer.

For my use case, it is important that I be able to still support PXE booting a mix of BIOS and UEFI clients on my network before I can transition from ISC DHCP to Dnsmasq. I unfortunately don't have a "test environment" where I can comfortably use trial and error to figure out the right approach on something as important as DHCP.

What is not clear is whether the "match" option for setting tags supports wildcards - either implicitly or explicitly. Typically to send the right file to the client, I have whatever DHCP server I'm using do a "partial match" of Option 60 (Vendor Class ID). In the OPNsense ISC DHCP settings, this is done transparently, as there's just separate fields for the BIOS vs. UEFI boot program file.

I've seen examples online for dnsmasq specifically that use config entries like: "dhcp-vendorclass=BIOS,PXEClient:Arch:00000" to tag the DHCP entry.
Of course, those familiar with Option 60 and the PXE spec know the above is a partial match, as after that last 0 there are other irrelevant values that can't necessarily be known ahead of time. So that first sub-string of Option 60 is really what's important to identify.
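
As a rough illustration of that partial match, the Python sketch below pulls the architecture number out of an Option 60 string and ignores whatever follows it. The sample strings and function names are hypothetical, and the architecture values (0 for BIOS; 6, 7 and 9 for EFI variants) are taken from RFC 4578. It shows the matching logic only and is not how dnsmasq implements it.

```python
PREFIX = "PXEClient:Arch:"
UEFI_ARCHES = {6, 7, 9}  # EFI IA32, EFI BC, EFI x86-64 per RFC 4578


def pxe_arch(vendor_class: str):
    """Return the architecture number from Option 60, or None if absent."""
    if not vendor_class.startswith(PREFIX):
        return None
    digits = vendor_class[len(PREFIX):len(PREFIX) + 5]
    if len(digits) != 5 or not digits.isdecimal():
        return None
    return int(digits)


def firmware_type(vendor_class: str) -> str:
    """Classify a client as BIOS or UEFI from the leading part of Option 60."""
    arch = pxe_arch(vendor_class)
    if arch == 0:
        return "BIOS"
    if arch in UEFI_ARCHES:
        return "UEFI"
    return "unknown"


# Hypothetical Option 60 values; only the leading sub-string matters.
samples = [
    "PXEClient:Arch:00000:UNDI:002001",
    "PXEClient:Arch:00007:UNDI:003016",
    "MSFT 5.0",
]
for value in samples:
    print(value, "->", firmware_type(value))
```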

Unfortunately, I don't see a clear path in the UI to specify a "vendor class" match directly.

I considered, of course, using the "match" option that is available in the UI and selecting Option 60. However, as noted above, I'm not clear on the wildcard capability to handle that partial string matching.

Lastly, I considered Option 93, as I've read that machines are supposed to set it to the architecture value that I'd need. However, I'm not familiar with how widely used this is. It's been a while since I read the PXE spec document, but I don't recall Option 93 being specifically called out.

Any assistance will be greatly appreciated!

Edit:
I found the dnsmasq manpage, which shows the match directive does indeed support the * character as a wildcard, so I should be able to try that.
I also found that RFC 4578 calls out that option 93 is indeed required for PXE clients to populate. I reviewed the original Intel spec document and it wasn't immediately clear: it doesn't flag option 93 as required in the main chart, but a footnote explains that it is. So I may try that route first, and if I have any issues with clients not in compliance I can fall back to partial string matching against option 60.
#13
Hello

Today I noticed a number of my services were in a stopped state, including Acme. I was able to manually restart them all except Acme. I had to go to its settings and hit apply again to get it working.

I then saw some threads on here about certain services "not auto starting" after a reboot. So I decided to try to reboot to see if that was my problem. The UI had the spinning wheel like it was rebooting, then after some time just displayed the dashboard again (not even requiring a new login). I knew this was strange so I looked at my uptime, which is about 25 days. I always reboot after upgrades, so I know for sure I also attempted a reboot last week as well.

Reading about the past CrowdSec issue, I know this is probably indicating some service is not cleanly stopping and is holding up the reboot. (Hence why some services stop but it never actually reboots.) With that said, I'm not sure how to identify which service is holding up the reboot.

Edit/Update: I was looking at the commands people ran for the CrowdSec issue, but couldn't find a process named crowdsec. So I just tried the reboot command from SSH to see if it would output any errors, but the system actually rebooted immediately. After it booted, I'm now able to also reboot from the GUI again. I'm not sure what I did to fix it.
#14
25.1, 25.4 Series / Re: SFTP Backup Interval
March 17, 2025, 01:42:48 PM
Thanks for the info. I just did another test overnight and that one worked. It's possible the other times I tested it that I happened to also update the SFTP settings and the "manual" backup covered what the scheduled backup would have done. I'll keep an eye on it going forward.
#15
25.1, 25.4 Series / SFTP Backup Interval
March 17, 2025, 06:10:19 AM
Are there any details on the update interval for the SFTP backup?

Since I started using it, it has only created backups when I save the configuration of the SFTP plugin itself and it does its self-test. When I later change some other configuration, no backup follows. I had assumed it would work like the Google Drive backup, creating a backup overnight after a configuration change, but it doesn't seem to be doing that in my case.

I believe I can create a cron job for the Remote Backup action to schedule a daily backup, but I really only want backups created after there have been configuration changes.

Does anyone know if the SFTP backup plugin is designed to work that way? If so, what can I look at to see why it may not be triggering? And what time is it supposed to trigger (the documentation for the Google Drive backup just says "early in the morning"). Or if it's not meant to work like that, is there any advice on how I can create a script to look for configuration changes? Just take a hash of the configuration file? Or is there a "last modified" timestamp (I guess from the file system)?

Thanks.
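
For reference, the hash idea from the last question could look something like the Python sketch below. It is a sketch under assumptions: the state file name and the function names are made up, the demo runs against a temporary file, and on a real system the configuration path (assumed to be /conf/config.xml) would be passed in instead. It only detects a change and does not trigger any backup itself.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of the file contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def config_changed(config_path: Path, state_path: Path) -> bool:
    """Return True (and record the new digest) if the config differs
    from the digest stored by the previous run."""
    current = file_digest(config_path)
    previous = state_path.read_text().strip() if state_path.exists() else None
    if current == previous:
        return False
    state_path.write_text(current + "\n")
    return True


# Demo against a temporary file standing in for the real configuration.
with tempfile.TemporaryDirectory() as tmp:
    config = Path(tmp) / "config.xml"
    state = Path(tmp) / "last-backup.sha256"  # hypothetical state file
    config.write_text("<opnsense><hostname>fw01</hostname></opnsense>")
    first = config_changed(config, state)    # no digest recorded yet
    second = config_changed(config, state)   # unchanged since first call
    config.write_text("<opnsense><hostname>fw02</hostname></opnsense>")
    third = config_changed(config, state)    # content changed

print(first, second, third)  # True False True
```

A daily cron job could call something like config_changed and only run the backup action when it returns True.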