Messages - rkubes

#1
General Discussion / Handling TFTP Responses
September 25, 2025, 04:39:43 PM
For those unfamiliar, as a quick recap, TFTP has clients reach in to the TFTP server on UDP port 69 (that part is easy). The clients can use any ephemeral UDP port that they're sending from. In most rules, this isn't an issue because you don't care about the source port, and a firewall state is created for the connection.

However, for TFTP, the server does not reply from port 69. It picks a new ephemeral port on the server side and sends from there to the same ephemeral port the client originally sent from, so the reply does not match the state created by the request.

Example: Client uses 1234 to hit Server 69 to ask for file XYZ; Server then picks port 5678 to respond to Client on port 1234 (the port the client originally sent from)

I had a rule that worked well in my network for a while, covering the ranges 1024:5000 and 32768:61000. It worked because my devices were primarily either hardware NICs that tended to stay in that upper range, or iPXE software that I had configured for the 1024:5000 range. However, after adding some VMs, I'm now seeing those clients try ports in the 14xxx and 19xxx ranges.
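To make the gap concrete, here is a minimal Python sketch that checks a client port against the two ranges from the rule. The function name and the sample ports are only illustrative; this is not how the firewall evaluates anything internally.

```python
# Port ranges taken from the rule described above (inclusive on both ends).
ALLOWED_RANGES = [(1024, 5000), (32768, 61000)]


def port_allowed(port: int, ranges=ALLOWED_RANGES) -> bool:
    """Return True if the port falls inside any inclusive (low, high) range."""
    return any(low <= port <= high for low, high in ranges)


if __name__ == "__main__":
    # 14321 and 19876 stand in for the "14xxx" and "19xxx" ports the VMs use.
    for port in (1234, 40000, 14321, 19876):
        print(port, port_allowed(port))
```

Ports in the 14xxx and 19xxx ranges fall between the two ranges, which is why the VM clients stop getting replies.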

I know that one solution is I can just open UDP of "all ports" from my TFTP server to the devices that I expect to PXE boot.

With that said, I wanted to see if there's any "magic" that exists in the Rules or Filters that may help me link "You can allow this packet to go out to port XX, if there was a packet that originally came in from that same port." I'm skeptical, but figured I'd ask to be sure.
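For anyone trying to picture the "magic" being asked about, the matching logic can be sketched in a few lines of Python. This is a hypothetical model only: the class and method names are made up for illustration and do not correspond to any OPNsense or pf setting.

```python
class TftpReplyTracker:
    """Toy model: remember TFTP requests and match replies from any server port."""

    def __init__(self):
        # Each entry is (client_ip, client_port, server_ip).
        self._pending = set()

    def note_request(self, client_ip, client_port, server_ip, server_port=69):
        """Record a request, but only if it was sent to the TFTP port (69)."""
        if server_port == 69:
            self._pending.add((client_ip, client_port, server_ip))

    def reply_allowed(self, server_ip, server_port, client_ip, client_port):
        """Allow a reply if it targets a port that a recorded request came from.

        server_port is deliberately ignored, because the TFTP server picks a
        fresh ephemeral port for its reply.
        """
        return (client_ip, client_port, server_ip) in self._pending


if __name__ == "__main__":
    tracker = TftpReplyTracker()
    tracker.note_request("10.0.64.50", 1234, "10.0.64.10")
    print(tracker.reply_allowed("10.0.64.10", 5678, "10.0.64.50", 1234))
```

The addresses in the usage lines are placeholders. The point is that the reply is keyed on the client's address and port plus the server's address, not on the server's source port.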

The other approach I'm going to take is see if I can configure the TFTP server that I'm using to only respond "FROM" a certain port range that I can then predict as the "source port". I know one of the TFTP servers I use supports this, I just need to confirm if the other one can - then I'd feel more comfortable with not wanting to get as deep as trying to match request and responses directly.
#2
Thanks for the follow up.

I did originally test the floating rule to "block all to private networks (via alias)" and not quick, but then found my per-interface "allow all to any" was basically overriding it. So I already adjusted my "allow all" rules to "allow all to !Private_Networks".

I agree, to your point, that's making the floating block (not quick) rule redundant at this point - since it would then just fall to the default block anyway. I may just disable it after I'm done reviewing logs for a few days to make sure everything's working as intended. For now, the specific logging is just helping me identify if there's any potential "allows" I missed.

In the future, I may rework my rules to all be floating, if that is indeed considered efficient. It would just be a bit of a migration effort initially. I can see how that would be easier to manage, long-term, though.
#3
The examples that document shows are for "pass" rules, where "quick" makes sense, since a pass rule isn't a default/catch-all type of rule.

I believe the order of operations is floating rules first, then interface group rules, and finally the individual interface rules.

Would it be 'correct' for me to make a "block" rule for the interface group of interfaces that I generally don't want having cross communication - but mark that rule as not "quick." And then within the individual interfaces, I'd still then be able to use "pass" rules to allow specific IPs/ports between those interfaces?

I want to avoid the "block all" rule being hit first and then not being able to get to the 'pass' rules within the individual interface.
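The behaviour in question can be modelled with a small Python sketch: rules are walked in order, the last match wins, and a "quick" match stops evaluation immediately. This is a simplified illustration under those assumptions, not the firewall's actual implementation; the Rule fields and the "lan2" label are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    action: str                 # "pass" or "block"
    dst: str                    # destination label, or "any"
    port: Optional[int] = None  # None means any port
    quick: bool = False


def evaluate(rules, dst, port, default="block"):
    """Last matching rule wins, unless a quick rule matches first."""
    decision = default
    for rule in rules:
        dst_ok = rule.dst in ("any", dst)
        port_ok = rule.port is None or rule.port == port
        if dst_ok and port_ok:
            decision = rule.action
            if rule.quick:
                break  # quick: stop at the first match
    return decision


if __name__ == "__main__":
    rules = [
        Rule("block", "lan2"),                      # group rule, not quick
        Rule("pass", "lan2", port=69, quick=True),  # per-interface pass rule
    ]
    print(evaluate(rules, "lan2", 69), evaluate(rules, "lan2", 22))
```

In this model a non-quick block placed ahead of the interface rules still lets a later pass rule take effect, while marking the block as quick would cut the pass rule off.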

Edit:
I realize a more efficient approach might be to edit my "allow all outbound" rule on each interface to exclude the Group's net - but I'm trying to avoid having to set the default block rules on a per interface basis to avoid forgetting.
#4
I have multiple LAN networks spread between different physical interfaces, VPNs, and VLANs.

I should also call out that I have WAN Failover configured, so all my "allow all outbound" (after all my block rules) are configured to use that Failover Gateway which only goes to either WAN device.

When I first started configuring my OPNsense device several years ago, I would go to each LAN and make a list of rules on each LAN that was "block all This LAN to That LAN." Then as I'd add new LANs over time, I'd need to remember to add another rule to every LAN to default block access to the other LANs. I'm realizing this is becoming a maintenance hassle to keep accurate.


Part of me is thinking, since my default "allow out" rules use the WAN gateway, they probably can't talk to the other LANs by default anyway (other than my explicit allow rules). But I don't know if it's "safe" to rely on that.

I'm also thinking I could make an alias that includes all IPs in the private ranges (192.168.0.0/16, 10.0.0.0/8, and 172.16.0.0/12). Then I could just make a block rule on each LAN to catch anything that wasn't explicitly allowed, and no longer have to remember to manually add each new interface/net as I add more LAN segments.
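As a sanity check of what such an alias would cover, here is a short Python sketch using the standard ipaddress module and the three RFC 1918 ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16). The function name is only illustrative.

```python
import ipaddress

# The three RFC 1918 private IPv4 ranges.
PRIVATE_NETWORKS = [
    ipaddress.ip_network(net)
    for net in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def is_private_destination(address: str) -> bool:
    """Return True if the address falls inside any of the private ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in net for net in PRIVATE_NETWORKS)


if __name__ == "__main__":
    for addr in ("10.0.64.10", "172.31.255.255", "172.32.0.1", "8.8.8.8"):
        print(addr, is_private_destination(addr))
```

Any new LAN segment that uses private addressing is covered automatically, which is the maintenance benefit of the alias approach.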

I tried searching this topic, but a lot of the results were people wanting to figure out a way to filter traffic between devices WITHIN the same LAN (which doesn't make it past the switches to the OPNsense instance anyway).

Still before I potentially send myself down another "bad" path, I wanted to understand what others are doing and what recommendations there are.

Thanks
#5
25.1, 25.4 Series / Re: sftp backup error
July 16, 2025, 08:31:17 AM
I figured out the issue by adding some debug logging to Sftp.php

If it's okay, I can create a pull request tomorrow - unless it's urgent and someone else needs to get to it sooner.

So, the issue is when the backup files are created remotely, they're named in all lowercase.

However, when the "fileprefix" is made for finding files based on the hostname, the search pattern is not made to lowercase.

Long story short, with this issue, if anyone's hostname has capital letters in it (which maybe isn't common) AND they are configured to include the hostname in the backup config filename, then it will never find the files when it runs the "ls" command in sftp.
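The mismatch is easy to reproduce outside of OPNsense. Below is a minimal Python sketch of the idea (the real code is PHP in Sftp.php); the "config-" naming pattern, the file name, and the function name are hypothetical and only stand in for whatever the backup code actually uses.

```python
def matching_backups(remote_files, hostname, normalize=True):
    """Return remote files whose name starts with the hostname-based prefix.

    The "config-<hostname>" pattern is a made-up stand-in for the real one.
    """
    prefix = f"config-{hostname}"
    if normalize:
        # Remote files are written in lowercase, so the prefix must match that.
        prefix = prefix.lower()
    return [name for name in remote_files if name.startswith(prefix)]


if __name__ == "__main__":
    files = ["config-myrouter-2025.xml"]
    print(matching_backups(files, "MyRouter", normalize=False))
    print(matching_backups(files, "MyRouter", normalize=True))
```

Without lowercasing the prefix, a hostname such as "MyRouter" never matches the lowercase file names, which lines up with the "remote backup returned no files" error.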

Edit:
I should clarify that I didn't read the full post. The above is the solution to the "remote backup returned no files" error, not to the identity key/permission issue.
#6
25.1, 25.4 Series / Re: sftp backup error
July 16, 2025, 08:13:20 AM
Did you get a solution to this?

I just recently rebuilt the server that was hosting the SFTP, and now I'm getting this error when I try to test the SFTP configuration.

I don't know whether it was working or not before. I know it was successfully saving files before (as does this test), but I don't know if it was ever not getting a list of files on the remote server.
#7
I have two gateways configured on my instance of OPNsense. The primary is connected directly to my primary ISP's equipment and will pull a public IP. This gateway works as expected.

I have a secondary gateway configured for "failover" so that if my primary Internet drops, I can run off this secondary equipment. For reasons beyond my control, there is another router between the OPNsense interface for this gateway, and the actual Internet. So, the DHCP on this interface will pull a private IP in the 10.x.x.x range.

My challenge is I don't keep this secondary Internet up all the time, but the router is up all the time. (It's easier for me to remotely manage the upstream device to make it available when backup is needed). So, if I leave the gateway monitoring on the default, it assumes this gateway is always up, since it can ping that router all day.

However, if I change the Monitor IP of that gateway to 8.8.8.8; it will assume the gateway is down. I can even SSH into the OPNsense instance, and I can run the ping command to hit 8.8.8.8 using that Interface's IP as the source, and it will get responses with no issue. However, OPNsense UI still shows that the interface is "down." All other devices that are connected to that router (I've done this for testing) are able to get out to the Internet without issue and can ping 8.8.8.8.

I've tried both with "Disable Host Route" and without it enabled, and the result is the same. I'm not sure what else I could try.

I don't think it's a firewall rule issue, since this is outbound and I have rules to allow all outbound traffic from my LANs through the Failover gateway group. Moreover, I assume the "ping" is coming from the interface itself, so my LAN firewall rules probably don't even come into play.

As mentioned, all other devices are able to connect just fine through that router. Running ping from the OPNsense SSH shell to 8.8.8.8 is successful. And also if I do something to "force" the gateway up - like disabling monitoring or allowing it to monitor the gateway IP itself (default) - then all connectivity works as expected. It just seems like I must be doing something wrong or not understanding enough how to properly configure to use a "public" monitor IP to track when that interface truly has Internet access.

Any assistance will be appreciated!

Edit:
I apologize; I was able to solve this on my own right after posting. I figured I'd leave this here for reference in case anyone else comes across the same issue.

I noticed the one difference was the "ping" command was sending something like 50+ bytes by default, but under the "advanced" settings for the gateway there was a configuration for the "Data Length" that defaults to just "1". I tried the "ping" command with a packet size of just "1" and then 8.8.8.8 would no longer respond.

I tried then with a packet size of "10" - just picking an arbitrarily larger number and 8.8.8.8 started responding again. I didn't dive deeper to find "what is the minimum data length for a ping that 8.8.8.8 will respond to." However, 10 seems to work. Since I put 10 in for the Data Length, everything is working exactly as expected.
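To show what the "Data Length" value changes on the wire, here is a small Python sketch that builds an ICMP echo request with a given payload size. It assumes the setting maps to the number of payload bytes after the 8-byte ICMP header, and it only constructs the bytes without sending anything. The function names are illustrative.

```python
import struct


def icmp_checksum(data: bytes) -> int:
    """Standard Internet checksum: one's complement of the 16-bit word sum."""
    if len(data) % 2:
        data += b"\x00"  # pad to an even length
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


def build_echo_request(ident: int, seq: int, data_length: int) -> bytes:
    """Build an ICMP echo request (type 8, code 0) with a zero-filled payload."""
    payload = bytes(data_length)
    header = struct.pack("!BBHHH", 8, 0, 0, ident, seq)
    checksum = icmp_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, checksum, ident, seq) + payload


if __name__ == "__main__":
    for length in (1, 10, 56):
        print(length, len(build_echo_request(1, 1, length)))
```

A data length of 1 yields a 9-byte ICMP message and a data length of 10 yields 18 bytes. Why a particular host ignores the smaller probe is not something this sketch can answer.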
#8
I was able to find a solution.

In my hosts configuration I had the host name, MAC address and an IP. This was enough before as I assumed leaving the domain of the host entry blank would have it automatically inherit the system domain.

Now that I've also populated my local domain in the domain field for all of the hosts, they are resolving now.

Edit:
I should note that before making that change I had tried "Apply" multiple times and even restarted the service. The dnsmasq-hosts file was being populated, but only with the host name and not the FQDN. Now, with the domain added, the dnsmasq-hosts file has both the FQDN and the host name.

Edit2:
Enabling "DHCP FQDN" in the dnsmasq general settings also fixed the issue and I no longer have to specify the domain in the hosts entry for the reservation. I prefer this approach as it automatically inherits the system configured domain. Using this setting seems obvious now, but I don't know why it was working under 25.1.6 without this.
#9
When I upgraded to 25.1.6 I followed the instructions to forward requests on my local domain from unbound to dnsmasq. I have a number of servers for which I have dhcp reservations configured in dnsmasq.

On 25.1.6 this worked perfectly. If I did an nslookup of host1.my.home.arpa unbound would forward it to dnsmasq which would then respond with the IP from my hosts configuration.

Since I've upgraded to 25.1.7_2 this no longer works. Unbound forwards the request to dnsmasq, then dnsmasq forwards it to the system configured public dns servers (such as google or cloudflare).

I believe this is due to the change to make the reservations not write to the hosts file. Maybe for the static reservations I can put overrides in unbound - but I want the hosts that get dynamic ips to also be resolvable.

Is there a configuration change I'm possibly missing that I need to make now for this to work with 25.1.7_2?
#10
Perfect. With that patch in place, I verified the configuration file generated properly and then the clients booted correctly as well.

Thank you.
#11
Monviech,

Unfortunately the fixes did not work for me. I was puzzled, at first, because it looked correct. But then I figured out the issue.

So, on my system my interfaces are named as "igc0" through "igc4", and I have the boot configurations set for the two interfaces that are named: igc2 and igc3.

With that said, with this latest update, everything in the "dnsmasq.conf" file that is autogenerated uses the "igc" names correctly - no issue... except for the new "dhcp-boot" section. For some reason, it's trying to call them "opt1" and "opt2". I remember that opt1 and opt2 were their original names when the interfaces were first discovered, so, I'm assuming this is just a mapping issue and the code needs to be updated to pull the correct interface name.
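As a rough illustration of the kind of lookup that seems to be missing, here is a Python sketch that translates a logical interface name into its device name before it is used as a tag. The mapping below is an assumption for illustration only (the post does not say which of opt1/opt2 corresponds to igc2 or igc3), and the function is hypothetical rather than anything from the OPNsense code.

```python
# Assumed mapping for illustration; the real pairing is not stated above.
LOGICAL_TO_DEVICE = {"opt1": "igc2", "opt2": "igc3"}


def device_tag(interface: str, mapping=LOGICAL_TO_DEVICE) -> str:
    """Return the device name for a logical name, or the input if unmapped."""
    return mapping.get(interface, interface)


if __name__ == "__main__":
    for name in ("opt1", "opt2", "igc0"):
        print(name, "->", device_tag(name))
```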
#12
Quote from: Monviech (Cedrik) on May 16, 2025, 09:31:04 PM
Thanks for the further investigation.

Why is one tag with ":", and the other with "="?

Sorry that was a copy paste error. I had that syntax error at first and then corrected it after I had already copied it over.

I edited and fixed my original post in case anyone else copies it from there.
#13
I finally got it working.

It's a combination of the mentioned defect (8624 - where Boot settings do not go to the dnsmasq.conf file), as well as the fact that the interface tags aren't listed as an option.

I was able to confirm that the dhcp-boot directive does indeed support multiple tags. I experimented first with the "tag-if" directive, but wound up not needing it.

Below is the separate config file that I dropped into /usr/local/etc/dnsmasq.conf.d/  (named it 20-pxe.conf)

dhcp-match=set:IsBIOS,93,0
dhcp-match=set:IsEFI,93,7

dhcp-boot=tag:igc3,tag:IsBIOS,undionly.kpxe,10.0.64.10,10.0.64.10
dhcp-boot=tag:igc3,tag:IsEFI,snponly.efi,10.0.64.10,10.0.64.10
dhcp-boot=tag:igc2,tag:IsBIOS,undionly.kpxe,10.0.64.10,10.0.64.10
dhcp-boot=tag:igc2,tag:IsEFI,snponly.efi,10.0.64.10,10.0.64.10

I was able to confirm that a BIOS based client on igc3 got the BIOS boot file, and a UEFI based client on igc3 got the EFI boot file.
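If the list of interfaces grows, the same four lines can be generated rather than typed by hand. This is a minimal Python sketch that reproduces the dhcp-boot lines from the config above; the function name is illustrative, and writing the result to a file is left out on purpose.

```python
# Tag names and boot files as used in the 20-pxe.conf shown above.
ARCH_BOOT_FILES = {"IsBIOS": "undionly.kpxe", "IsEFI": "snponly.efi"}


def dhcp_boot_lines(interfaces, tftp_server, arch_files=ARCH_BOOT_FILES):
    """Build one dhcp-boot line per (interface tag, architecture tag) pair."""
    lines = []
    for iface in interfaces:
        for arch_tag, boot_file in arch_files.items():
            lines.append(
                f"dhcp-boot=tag:{iface},tag:{arch_tag},"
                f"{boot_file},{tftp_server},{tftp_server}"
            )
    return lines


if __name__ == "__main__":
    print("\n".join(dhcp_boot_lines(["igc3", "igc2"], "10.0.64.10")))
```

Both tags must match for a line to apply, so each interface needs its own BIOS and EFI entry.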

For all the "DHCP Options" that I configured in the GUI to try to fix this manually, I just created a new "tag" called "Disabled" that just never gets set and added that tag to all of them to disable them without having to fully delete them. It might be "nice to have" for the UI to offer an enable/disable function similar to firewall rules so that the options can be toggled without having to completely delete them.
#14
Thanks. I actually just ran into that issue when I started to give up on the approach of setting 66 and 67 directly.

The interesting thing is when I use a Set option for 66 and 67, what I'm seeing when I test the DHCP response is that 67 (filename) gets populated exactly as I've set it. No issue there. However 66 gets sent to the client as an empty string (See edit, this is not actually true). That must be why it times out trying to connect and also why I don't see any activity or attempts to reach the actual TFTP server.

I've also tried setting option 150 for the TFTP server but still get the same issue. The client just gets an empty string instead of the IP I put in.

Looking at the dnsmasq config file, I don't see anything that stands out as "incorrect" for the dhcp-option directives. I don't know if maybe dnsmasq prevents you from setting these values in the option method and instead relies on the boot options specifically.

Edit:
Well, that was a bit frustrating. I was using a PowerShell script provided by 2Pint to do the DHCP Test. But there was actually an error in their script where they referenced the wrong variable name. Once I corrected their script, I do see that the TFTP IP address is actually getting populated correctly.
I might need to figure out how I can do a TCPDump or WireShark capture of the DHCP packets at boot time to see if I can figure out what's going on or missing. I'll probably need to get another PC on that network that I can set up as the listener for that, since the PC I usually have that kind of access for is the one that I need to capture the boot time packets.

Edit2:
I see the difference in the DHCP packets now. I don't understand the DHCP spec well enough to speak intelligently to it. However, when I use ISC DHCP (working), the TFTP IP address and boot file name are in a separate section of the DHCP offer packet. The way WireShark decodes it, they appear to sit in fixed-width fields within the packet, as there's no identifier beforehand. Additionally, the IP address is stored as a 4-byte value. With my dnsmasq configuration, those fields are not populated; instead the values appear later, in the list of DHCP options, as options 66 and 67. Moreover, the IP address there (if it matters) is written as a null-terminated ASCII string rather than a 4-byte IP value.
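The two encodings can be compared side by side with a short Python sketch. It assumes the standard BOOTP layout, where the next-server address (siaddr) is a 4-byte field and the boot file name is a 128-byte zero-padded field, and the usual code/length/value layout for DHCP options. The function names are illustrative, and the sketch does not add the trailing null byte seen in the capture.

```python
import socket


def fixed_boot_fields(server_ip: str, boot_file: str):
    """Encode the BOOTP header fields: 4-byte siaddr and 128-byte file name."""
    siaddr = socket.inet_aton(server_ip)                        # 4 raw bytes
    file_field = boot_file.encode("ascii").ljust(128, b"\x00")  # zero padded
    return siaddr, file_field


def option_tlv(code: int, value: bytes) -> bytes:
    """Encode a DHCP option as code byte, length byte, then the value."""
    return bytes([code, len(value)]) + value


if __name__ == "__main__":
    siaddr, file_field = fixed_boot_fields("10.0.64.10", "undionly.kpxe")
    print(siaddr.hex(), len(file_field))
    print(option_tlv(66, b"10.0.64.10").hex())
    print(option_tlv(67, b"undionly.kpxe").hex())
```

The same server address is 4 bytes in the header field but a 10-character text string inside option 66, which matches the difference visible in the capture.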

I probably will need to get dnsmasq's actual "boot" config to work for it to "properly" format the DHCP offer with the bootfile and TFTP server populated in the "correct" spot. Even though options 66 and 67 exist, it doesn't seem like setting them like normal options really works. Probably due to the boot response being a different kind of packet.

It's unfortunate that you can't configure dnsmasq to only offer the boot options on specific interfaces; but I can just manage that with firewall rules.

Edit3:
I looked at the manpage for dnsmasq, and I see the dhcp-boot option described as follows:
-M, --dhcp-boot=[tag:<tag>,]<filename>,[<servername>[,<server address>|<tftp_servername>]]
(IPv4 only) Set BOOTP options to be returned by the DHCP server. Server name and address are optional: if not provided, the name is left empty, and the address set to the address of the machine running dnsmasq. If dnsmasq is providing a TFTP service (see --enable-tftp ) then only the filename is required here to enable network booting. If the optional tag(s) are given, they must match for this configuration to be sent. Instead of an IP address, the TFTP server address can be given as a domain name which is looked up in /etc/hosts. This name can be associated in /etc/hosts with multiple IP addresses, which are used round-robin. This facility can be used to load balance the tftp load among a set of servers.

What is interesting here, is the first line does make it seem like only one tag can be on a dhcp-boot option; but then the description seems to indicate that there can be multiple tags. I'll go ahead and try later adding a custom config file on the router with dhcp-boot set with an interface tag, and my BIOS/EFI tag to differentiate the filetype and report back if it works. If that does, I'll probably do a separate reply rather than an edit, since it will be a substantial enough update.
#15
I created an option entry for each of my two interfaces that should support network booting to populate DHCP option 66 with the IP of the TFTP server. Then I created tags to identify BIOS or EFI based on option 93, and an option entry for option 67 to populate the correct boot program based on that.
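The BIOS/EFI selection can be sketched in Python as follows. It assumes option 93 (client system architecture) carries a 16-bit big-endian value and uses only the values 0 (BIOS) and 7 (EFI). The file names are the iPXE binaries commonly used for this and are placeholders here, as is the function name.

```python
import struct

# Architecture value -> boot file. Only the two cases discussed are covered.
BOOT_FILE_BY_ARCH = {0: "undionly.kpxe", 7: "snponly.efi"}


def boot_file_for(option_93: bytes):
    """Pick a boot file from the first 16-bit architecture value in option 93."""
    (arch,) = struct.unpack("!H", option_93[:2])
    return BOOT_FILE_BY_ARCH.get(arch)  # None for architectures not listed


if __name__ == "__main__":
    print(boot_file_for(b"\x00\x00"), boot_file_for(b"\x00\x07"))
```

Architectures other than 0 and 7 return None in this sketch, so such a client would not be offered a boot file.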

I didn't want to use the built-in boot option UI since it doesn't seem like you can restrict it to specific interfaces. There's nowhere to set an interface tag.

Unfortunately PXE still isn't working for me with dnsmasq, but does if I fall back to ISC.

The client gets as far as getting a DHCP response, then gets to the TFTP stage but times out. The firewall logging doesn't show it trying to connect to the TFTP server (neither allowed nor blocked).

Tomorrow I'm going to try a DHCP test tool to see what options dnsmasq is actually sending out. I might do the same with ISC and see what the difference is.

In the meantime any other feedback is appreciated, otherwise I'll look to post my final working configuration in case it helps others.