Messages - drosophila

#1
Hm. Maybe that is exactly the problem, that VLAN7 is not correct in this case? Can you look that up somewhere in the Fritzbox, or do you have to sniff it?
#2
1und1 resells everything that is lying around anywhere. :) Here, for example, Deutsche Glasfaser laid the fiber and 1und1 markets it. AFAICS, DG has the exact same or at least an extremely similar system to the Lünecom one that was discussed here. At least I saw the same thing when I looked into it, including the activation via a server in the 10.0.0.0/8 network and so on. So it could well be something like that, although in that case the Fritzbox would also have had to be activated first.
#3
Good detective work! This seems to indicate that there is some cruft stored in the config files, possibly leftovers from old configuration changes, now masked by their wrapper services being disabled. IDK why sense would store ARP things in the config, but it might be old entries for IP assignments on some obscure sub-service page, or maybe even MAC-based rules for the firewall that have a side-effect. Fixed ARP entries are possible at least in the config files (you can manually add ARP entries from the console, so scripts can also do it). The fact that it comes back only after some time hints that it might coincide with provider-induced WAN IP reassignments, which result in the respective services being refreshed.
Of course, it is possible to assign arbitrary MACs to interfaces in most OS/drivers, but that would also have been set up manually, just like custom scripts.

It might work to "clean up" the config by exporting it, clearing the current one, then re-importing the previously exported one. That would get rid of everything that is in auto-generated config files but not in the actual config. You could however grep through the generated xml and see if those MACs appear in it, and if so, where.
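If grep output gets too noisy, a few lines of Python can do the same search and also tell you under which section of the export each MAC sits. This is only a rough sketch: the sample XML and its element names are made up for illustration and are not the real layout of an exported config, and attributes are not searched, only element text.

```python
import re
import xml.etree.ElementTree as ET

# Matches MACs written as six hex pairs separated by ":" or "-".
MAC_RE = re.compile(r"\b(?:[0-9A-Fa-f]{2}[:-]){5}[0-9A-Fa-f]{2}\b")

def find_macs(xml_text):
    """Return (element path, MAC) pairs for every MAC-looking string in element text."""
    hits = []

    def walk(elem, path):
        here = f"{path}/{elem.tag}"
        for mac in MAC_RE.findall(elem.text or ""):
            hits.append((here, mac.lower()))
        for child in elem:
            walk(child, here)

    walk(ET.fromstring(xml_text), "")
    return hits

# Made-up sample, NOT the real structure of an exported config.
SAMPLE = """<opnsense>
  <dhcpd><lan><staticmap><mac>00:11:22:AA:BB:CC</mac><ipaddr>192.168.1.10</ipaddr></staticmap></lan></dhcpd>
  <system><hostname>fw</hostname></system>
</opnsense>"""

for path, mac in find_macs(SAMPLE):
    print(path, mac)
```

To use it on a real export, read the file into a string and pass that to find_macs(); the paths it prints show which part of the config still carries the stale entries.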
#4
Quote from: mrThirsty on May 06, 2026, 11:56:30 AM
all devices both wired and wireless can't seem to do anything during this
First thing to check: what do the lights do? Do switch / NIC indicate heavy traffic? No traffic at all? Normal traffic?
Second thing to check: can they communicate among themselves? IOW, on their segment / switch?
Quote from: mrThirsty on May 06, 2026, 11:56:30 AM
I am leaning toward it being a WAN-related issue as on the odd occasion when the freeze up has been long enough, I am still able to log into the Admin portal on my router.
So the LAN part is perfectly fine (unless you're using a dedicated admin port?), and the devices on your LAN can probably communicate normally among themselves. And the Protectli isn't frozen either, nor is the network stack. Keep the dashboard open and observe it for CPU / RAM / whatever spikes that shouldn't be there when the lockup happens. You probably need to add a couple of useful widgets first, like "CPU", "Traffic Graphs" and "Thermal Sensors".
Quote from: mrThirsty on May 06, 2026, 11:56:30 AM
I have determined the issue is my OpnSense router as I have removed it from my network and then ran each of the ISP modem and Amplify-HD as the router for a day each and during those two days I did not have any of the freezes. I have also taken the extreme move of completely wiping my router and just having it run as it comes out of the box, just as a DHCP server, no ZenArmour or OpenVPN etc. and I still get the freezing. No matter what configuration I run my network in, as soon as OpnSense is the router, the freezing happens.
That rules out updates for ZA or other blocklists clogging up the machine. I'd look for WAN-related events like IP address changes, possibly interface-related if we assume that the WAN interface might simply have a defect. Would it be possible to reassign the WAN and one of the other interfaces (the box has 4 AFAICS) to see if the issue persists unchanged (so, on WAN) or sticks to the interface?

Also, you could observe the "Live view" on the Firewall. You don't need to interpret the individual lines, just look for changes in pattern (note that to do so, you'd need to stare at those logs for a while when everything is normal to learn what the normal patterns look like: not every "wall of red lines" indicates something unusual). You might need to enable logging for all rules first. Familiarize yourself with the settings and buttons on that page so you can hit the "stop" button in time, and possibly increase the "Table size" to 100 or even more for that.

Since that is an Intel, you should install the "os-cpu-microcode-intel" plugin, even though this doesn't seem to be CPU related.

I would set up a machine that is normally on anyway (like your desktop / work machine) to do a continuous ping (not floodping, just a friendly once-per-second endless ping (/t in Windows, I believe)) to one of your other "always on" LAN devices (printer, TV, smartbulb, home automation, toaster, ...) in one terminal, and another such ping to something on the internet that won't be going anywhere, say, www.microsoft.com. Possibly a third terminal pinging the LAN interface of your Sensebox for good measure.

Just keep them running until the event and see which one starts failing / changes behavior, and how, once it occurs. Or even if at all: if it is a DNS issue, running pings won't be affected.
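If you'd rather not babysit three terminals, the same idea fits into one small script that only prints a line when a target changes state, so you get a timestamped record of what failed first. A minimal sketch, assuming the system `ping` command is on the PATH; the labels and addresses in the usage comment are placeholders, and on Windows the exit code of ping is not always a reliable "got a reply" signal, so treat the result there with some suspicion.

```python
import platform
import subprocess
import time
from datetime import datetime

def ping_once(host, timeout_s=2.0):
    """Send a single echo request via the system ping; True if ping reports success."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    try:
        result = subprocess.run(
            ["ping", count_flag, "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,  # our own timeout, so no OS-specific wait flag is needed
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0

def watch(targets, probe=ping_once, rounds=None, interval_s=1.0, log=print):
    """Probe every target once per round and log only state changes."""
    state = {}
    done = 0
    while rounds is None or done < rounds:
        for label, host in targets.items():
            ok = probe(host)
            if state.get(label) != ok:
                stamp = datetime.now().strftime("%H:%M:%S")
                log(f"{stamp} {label} ({host}): {'reachable' if ok else 'NOT reachable'}")
                state[label] = ok
        done += 1
        time.sleep(interval_s)
    return state

# Usage (placeholder addresses, runs until interrupted with Ctrl+C):
# watch({"lan-device": "192.168.1.50", "firewall": "192.168.1.1", "internet": "www.microsoft.com"})
```

Pinging the internet target by name means every probe also does a DNS lookup, so unlike an already running ping this one would notice a DNS failure too; use an IP address there if you want to keep the two apart.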
#5
Noticed that yesterday too; luckily I hadn't touched the Sensebox in days, so I immediately assumed an external outage and only had a general look in here. Then your thread popped up here and I had something to read. :)
#6
Yup, that works, thanks for the heads-up! :) That way I can filter by EFI / BIOS like it's supposed to be done through the GUI; the filenames are less readable, but that is much better than manual file alterations with unconventional characters. This way, at least it's clear what's going on. I'll settle for this. :)
Quote from: Monviech (Cedrik) on May 03, 2026, 07:12:35 AM
Since you are already way into hex anyway, why not send hex directly?
Because I had assumed that would only take a single byte or somesuch. :)
The format could be explained a little better in the help and error texts, to make clear that it expects something like "2f707865626f6f7466696c650056" (which ends up verbatim in the config file). ;) The help text reads:
Quote
payload in hexadecimal byte pairs.
The error text reads:
Quote
Hex value must contain valid hexadecimal byte pairs.
Having to input a series of un-delimited and un-prefixed hex characters was unexpected, given that most (all?) other tools delimit them by a space, save for MAC addresses where they use a colon. Maybe the text could read "payload in hexadecimal byte-wise representation, without delimiters and prefixes."?
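For anyone else who has to produce that format: Python's built-in bytes methods emit exactly this kind of un-delimited, un-prefixed hex, so there is no need to type byte pairs by hand. A small sketch; the two helper names are my own invention.

```python
def to_hex_payload(text, suffix=b""):
    """Encode text plus optional raw suffix bytes as undelimited, unprefixed hex."""
    return (text.encode("ascii") + suffix).hex()

def strip_delimiters(dump):
    """Turn a dump like '2f 70 78' or '2f:70:78' into '2f7078'."""
    return bytes.fromhex(dump.replace(":", " ")).hex()

# The filename followed by a zero byte and a "V", as in the value quoted above.
print(to_hex_payload("/pxebootfile", suffix=b"\x00V"))

# Converting a space-delimited dump copied from a packet capture.
print(strip_delimiters("2f 70 78 65"))
```

bytes.fromhex() skips whitespace between byte pairs on its own; the replace() call only adds colon-separated input on top of that.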

Would be neat to have a "comment" field per option where I could put the reminder "For nVidia to work, must always append "0056" to hex value!" or somesuch. ATM, I made an option that has this in its description to serve this purpose. :)
#7
To me this somehow looks as if it can't decide which interface is the right one. Is this a completely new configuration, or were the interface assignments possibly swapped at some point, so that the wrong interface is now stored in some part of the configuration that is currently not visible, causing a race condition? Could you, for example, swap the two interface assignments between LAN and WAN (and re-plug the cables, of course) to see whether the problem then occurs as well?
#8
Alright, this has not been the final word. Instead I tested some more with appending a true zero. Doing so through the GUI seems to be impossible because it always escapes the escape character "\" into "\\" and therefore destroys the escaping, but from the config file itself it works with an appended "\u0000". But for some reason this works only if at least one character follows the true zero, otherwise KEA just ends there and doesn't send the zero at all. Adding any character (or multiple) makes it send the zero. Like this:

43 0e 2f 70 78 65 62 6f 6f 74 66 69 6c 65 00 56 ff 23 94
That translates to "/pxebootfile<zero>V", note how the length field now is 0e, correctly counting the length including the zero and the appended "V". This is handled correctly by both Realtek and nVidia, both end up requesting "/pxebootfile", which is correct. So is KEA wrong in not sending the final zero even though it correctly specifies the length? Did nVidia work off an outdated spec or did they simply commit the basic blunder of counting from zero instead of one? Does Realtek have a more stringent sanitizer that converts all non-ASCII characters into end-of-string?
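To make the two readings concrete, here is a small Python sketch that runs both byte sequences from my captures through a strict reader and through a sloppy one. As far as I know from RFC 2132, code 0x43 (decimal 67) is the "Bootfile name" option and 0xff (decimal 255) is the "End" option that closes the option list. The sloppy reader, which copies one byte more than the length field says and then treats the result as a C string, is purely my guess at what the nVidia agent does; it merely matches what I observed and is not confirmed firmware behavior.

```python
def read_option(buf, extra=0):
    """Read one code/length/value option; extra > 0 models a reader that copies too much."""
    code, length = buf[0], buf[1]
    return code, buf[2:2 + length + extra]

def as_c_string(raw):
    """Cut at the first zero byte, like C string handling would."""
    return raw.split(b"\x00", 1)[0]

# Byte sequences taken from the packet dumps in this thread.
KEA_PLAIN = bytes.fromhex("43 0b 2f 70 78 65 6c 69 6e 75 78 2e 30 ff 63")
KEA_PADDED = bytes.fromhex("43 0e 2f 70 78 65 62 6f 6f 74 66 69 6c 65 00 56 ff 23 94")

for name, packet in (("plain", KEA_PLAIN), ("zero + V appended", KEA_PADDED)):
    _, strict = read_option(packet)            # honors the length field exactly
    _, sloppy = read_option(packet, extra=1)   # hypothetical off-by-one reader
    print(name, "| strict:", as_c_string(strict), "| sloppy:", as_c_string(sloppy))
```

With the plain packet the sloppy reader drags the 0xff into the file name, with the padded one both readers stop at the zero and end up with the same name, which is exactly the pattern seen with the two boot agents.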

Sadly, the "specification" in RFC 4578 only says "The format and contents of these options are NOT defined by the PXE specification.", so this must be defined outside of an RFC.
IBM has a document here that lists "UINT8 bootfile[128];   Bootfile: Boot file name. Null terminated string." on page 50. So it seems like the error is indeed on KEA's part, but what about the difference in parameters (noted above: "file" vs. "BF")? Is there yet another spec that supersedes the 1999 IBM document?
So it seems there is one from UEFI here. They don't want to let me look at their precious document through my secured browser, and I'm not willing to compromise my anonymity, especially not to satisfy some random site's self-importance. So my research ends here for now. Maybe I'll get back to it on the computer of someone who doesn't care about security and privacy.

The final question is: does the GUI have some means to escape escape characters so that it does not escape escape characters? IOW: how do I get the "\" into the config file through the GUI without it being mangled into "\\"? So that the resulting config file reads:
"data": "\/pxebootfile\u0000V"
Not relying on evil filesystem hacks is nice, but managing KEA through files while a perfectly capable GUI is there for everything else also detracts from maintainability.
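What the doubled backslash does can be shown with any JSON parser. The sketch below uses Python's json module on two one-line documents, one as I want it in the file and one as the GUI writes it. It only covers the JSON layer; what KEA does with the decoded string afterwards is not modeled here.

```python
import json

# As it should appear in the file: "\/" is a legal JSON escape for "/",
# and "\u0000" decodes to a single zero character.
GOOD = r'{"data": "\/pxebootfile\u0000V"}'

# As the GUI writes it: every backslash doubled, so the escapes become literal text.
MANGLED = r'{"data": "\\/pxebootfile\\u0000V"}'

good = json.loads(GOOD)["data"]
mangled = json.loads(MANGLED)["data"]

print(len(good), repr(good))        # short string containing a real zero character
print(len(mangled), repr(mangled))  # longer string containing backslashes and "u0000" as text

# Going the other way: the serializer produces the \u0000 escape by itself.
print(json.dumps(good))
```

So the doubled backslash is not a cosmetic difference: the parser hands over a literal backslash followed by "u0000" instead of a zero character.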
#9
Just as (final?) info: the evil hack works: I created a symlink with a final character of 0xff, that points to another symlink with the same name except for the appended 0xff, which in turn points to the intended bootfile (so I can simply change this intermediate symlink, and PXE loaders without this bug will use that, so all end up loading the same file). Then I set KEA to serve the filename without the appended 0xff. The nVidia PXE boot agent boots just fine now, so that really is the entire problem.
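For the record, the symlink chain can be scripted, which at least makes the hack reproducible. A sketch in Python, assuming a filesystem that treats names as plain bytes (typical on Linux and the BSDs; others may refuse a name containing 0xff). The file names are examples, not my actual setup.

```python
import os
import tempfile

def build_chain(tftp_root, served_name, real_bootfile):
    """Create served_name -> real_bootfile and served_name+0xff -> served_name."""
    root = os.fsencode(tftp_root)
    plain = os.path.join(root, os.fsencode(served_name))
    broken = plain + b"\xff"  # the name a client requests when it appends the stray byte
    # Relative targets, so the chain survives moving the TFTP root.
    os.symlink(os.fsencode(real_bootfile), plain)
    os.symlink(os.path.basename(plain), broken)
    return plain, broken

# Demo in a throwaway directory with example names.
with tempfile.TemporaryDirectory() as demo_root:
    with open(os.path.join(demo_root, "pxelinux.real"), "wb") as fh:
        fh.write(b"bootfile contents")
    plain, broken = build_chain(demo_root, "pxelinux.0", "pxelinux.real")
    with open(broken, "rb") as fh:
        print(broken, "->", fh.read())
```

Changing the boot file later only means re-pointing the intermediate symlink; the one with the 0xff name never needs to be touched again.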

Now the only thing I could do is to try to find out whether the 0xff that KEA sends after the end of the BF parameter is somehow necessary or not, and if not, try to convince the KEA developers to change that into a 0x00, hoping that the nVidia PXE boot agent would then view this as a null-terminated string and handle it properly. Or just keep the evil hack, which of course stinks in terms of maintainability. :(
#10
Quote from: nero355 on April 25, 2026, 03:15:13 PM
But I can remember that at some point it was no longer compatible with UEFI Servers so double check if the software you eventually go for can also do that!
That would likely be a reason to keep trying to get KEA to work; possibly I'll attempt to install a symlink to the actual install file with an appended 0xff in its name. Should be possible but it's certainly an evil hack. KEA can easily filter by boot agent type and therefore serve different stuff for BIOS vs. EFI.

Quote from: https://punkt.de/de/blog/2017/automatisierte-installation-von-servern-mit-freebsd-und-zfs.html
The environment already provided via TFTP must now additionally be exported via NFS.
That likely was the issue that kept my attempt from succeeding. It did load the netboot-capable boot but since I expected this to be fully handled by TFTP I didn't export the boot FS via NFS (which would have been easy since NFS already serves selected datasets). A pity that one can't do away with TFTP entirely, NFS is less hacky and much faster. And if you have TFTP running, then things must already be secured against nefarious clients so that NFS won't be an additional security issue. For installers, one can just set all respective underlying filesystem permissions to be readonly for everyone since they're not sensitive and thus only manipulation must be prevented. The automated install might need sensitive things like passwords or certificates to be served ("We then set the root password so that we can log in via the console: (...) ROOT_PW_HASH must of course be replaced by the real hash of the root password."), but a basic interactive install doesn't have these problems.

Thanks for the heads-up!
#11
I see, either way makes sense. It seems like I could implant the CPU type in the PLUGIN_VARIANT variable by deducing it from dmesg output, but that would probably be brittle, even if it can run early enough.
#12
Quote from: pfry on April 07, 2026, 05:03:58 PM
Taxes = vaporized (yet somehow still alive and aware), but hey, those are inevitable.
That's because you are paying the taxes that the likes of Musk, Zuckerberg and Trump do not pay.
#13
After looking at some packet dumps it seems like there are at least two methods of sending the boot file name. Obviously, KEA and udhcpd use different ones, apart from ordering packet options differently.
KEA sends: ... 43 0b 2f 70 78 65 6c 69 6e 75 78 2e 30 ff 63 ...
which translates to "/pxelinux.0" but as you can see, there is an "ff" at the end. This is not a null-terminated string, instead the string length is prefixed, "0b" (decimal 11) in this case.
The decoded packet reads:
e.e.e.e.67 > 255.255.255.255.68:  xid:<censored> flags:0x8000 Y:c.c.c.c S:b.b.b.b ether <censored> vend-rfc1048
DHCP:OFFER SM:255.240.0.0 DG:g.g.g.g NS:e.e.e.e DN:"local" LT:3116 SID:e.e.e.e BF:"/pxelinux.0" (DF) [tos 0x10]
udhcpd sends: ... 00 00 2f 70 78 65 6c 69 6e 75 78 2e 30 00 00 ...
which, again, translates to "/pxelinux.0" but is surrounded by all zeros so it is, intentionally or not, a null-terminated string.
The decoded packet reads:
e.e.e.e.67 > 255.255.255.255.68:  xid:<censored> flags:0x8000 Y:c.c.c.c S:b.b.b.b ether <censored> sname "<censored>" file "/pxelinux.0" vend-rfc1048
DHCP:OFFER SID:e.e.e.e LT:864000 SM:255.240.0.0 DG:g.g.g.g NS:e.e.e.e DN:"local"

So while KEA passes the filename in a "BF" parameter, udhcpd supplies it in a "file" parameter.
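For reference, these two places are structurally different: "file" is a fixed 128-byte field in the BOOTP header, padded with zeros, while "BF" is a length-prefixed entry (option 67) in the variable options area behind the magic cookie. The Python sketch below builds a stripped-down reply either way and reads both places back. The header values are dummies and option overload (option 52) is ignored, so this illustrates the layout and is not a usable DHCP implementation.

```python
import struct

# Fixed BOOTP header: op, htype, hlen, hops, xid, secs, flags,
# ciaddr, yiaddr, siaddr, giaddr, chaddr, sname, file (236 bytes in total).
BOOTP_FMT = "!BBBBIHH4s4s4s4s16s64s128s"
BOOTP_LEN = struct.calcsize(BOOTP_FMT)
MAGIC_COOKIE = bytes([99, 130, 83, 99])
OPT_PAD, OPT_BOOTFILE, OPT_END = 0, 67, 255

def build_offer(header_file=b"", option_file=None):
    """Build a minimal reply carrying the boot file in the header, in option 67, or both."""
    header = struct.pack(
        BOOTP_FMT, 2, 1, 6, 0, 0x12345678, 0, 0x8000,   # dummy values
        bytes(4), bytes(4), bytes(4), bytes(4), bytes(16),
        b"", header_file,   # struct pads sname and file with zero bytes
    )
    options = b""
    if option_file is not None:
        options += bytes([OPT_BOOTFILE, len(option_file)]) + option_file
    return header + MAGIC_COOKIE + options + bytes([OPT_END])

def boot_file_sources(packet):
    """Return the boot file name found in the header field and in option 67."""
    fields = struct.unpack(BOOTP_FMT, packet[:BOOTP_LEN])
    found = {"file": fields[-1].split(b"\x00", 1)[0], "BF": None}
    if packet[BOOTP_LEN:BOOTP_LEN + 4] != MAGIC_COOKIE:
        return found
    opts = packet[BOOTP_LEN + 4:]
    i = 0
    while i < len(opts) and opts[i] != OPT_END:
        if opts[i] == OPT_PAD:
            i += 1
            continue
        if i + 1 >= len(opts):
            break
        length = opts[i + 1]
        if opts[i] == OPT_BOOTFILE:
            found["BF"] = opts[i + 2:i + 2 + length]
        i += 2 + length
    return found

print("KEA-like:   ", boot_file_sources(build_offer(option_file=b"/pxelinux.0")))
print("udhcpd-like:", boot_file_sources(build_offer(header_file=b"/pxelinux.0")))
```

The header field is followed by zeros simply because of the padding, which would explain why the udhcpd variant looks null-terminated whether that was intended or not.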

So, given the correct string length passed by KEA it seems like the bug is in the nVidia boot agent, which, of course, is the worst possible outcome because that means that these will never work with KEA.
I might find a way to make KEA send a "file" parameter instead of the "BF", hoping that it will also surround it by zeros or that it will just be implemented properly. More likely I'll ditch KEA for Dnsmasq because naturally I want a solution that works in all cases. The only upside of this is that now I know it.
Quote from: nero355 on April 24, 2026, 03:39:20 PM
Would it be an option to have a dedicated PXE Boot VLAN on your network ?

In the past I have worked for a company that had this and the software doing the PXE Boot stuff was some kind of dedicated Linux distro at the time.
So in this case OPNsense would be just the Router providing internet access for all those NetBoot images :)
That would be an excellent option for actual thin clients, as this kind of boot is quite insecure and should ideally be confined even on the LAN.
However, I use PXE for all sorts of things except thin clients. :) (My infrastructure is not robust enough; to depend on it, I'd need HA first.) Like initial / post-modification memtest86+ tests and serving the OS installers (debian netinstall), and that can potentially be every machine (except the TFTP server itself, of course). That way, I don't have to fiddle with USB sticks and can apply the same procedure to other people's machines if need be. :) That's why I need it to "always work", as the type of client is not foreseeable. One thing I still need to figure out is how to chainload the BSD bootloader from pxelinux (which provides the menu). Is there some sort of "GRUB" for PXE?
#14
Hehe, looks like you didn't buy the Venus to show it off, given that I can barely make it out behind all those fans and heatsinks! :D Was that a NAS with all those harddrives?
Quote from: nero355 on April 24, 2026, 12:57:09 AM
No idea : Sold it for big €€€ to a collector in the U.K. a couple of years ago ^_^
Good call, plus it's good that at least these special editions get preserved that way. Win-win, it seems! :)
Quote from: nero355 on April 24, 2026, 12:57:09 AM
But I think it would because of the brand of the NICs : Broadcom or Marvell
I think those did not have any Nvidia parts involved.
Hmm, Marvell was one but the other was a Vitesse PHY that was fed from the nForce4 and thus would likely also have used the nVidia PXE boot... I think it used the "forcedeth" driver, anyway. Oh well. :)

BTT: I believe I've found the problem: the nVidia boot agent seems to terminate the boot file name with some illegal character, or rather, with something while there shouldn't be anything there. The server log viewer would show just two blank lines (which is a bug in the server GUI...), but its syslog shows this:
RRQ from x.x.x.x filename /pxelinux.0<FF>
The working Realtek boot agents send the same request but without the postfixed "<FF>" (which probably translates to a single byte of all ones). Obviously that should not be there but now the question is: why does it work with udhcpd and Dnsmasq, just not with KEA? Is it possible that KEA incorrectly sends this but Realtek just discards it while nVidia keeps it? Or could this be a config problem?
At first I thought 0xff might be some kind of UTF-8 marker, but it is not even a valid byte in UTF-8 (continuation bytes are 0x80 to 0xbf), so that doesn't explain why it's there either.
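To see why the stray byte ends up in the file name at all: a TFTP read request (RFC 1350) is just a two-byte opcode followed by the file name and the transfer mode, each terminated by a zero byte, so anything the client puts before that first zero counts as part of the name. A small Python sketch of that framing; the printable() helper merely imitates the "<FF>" notation from my syslog and is not taken from any server's code.

```python
def build_rrq(filename, mode=b"octet"):
    """Build a TFTP read request: opcode 1, then filename and mode, each zero-terminated."""
    return b"\x00\x01" + filename + b"\x00" + mode + b"\x00"

def parse_rrq(packet):
    """Return (filename, mode) from a read request, ignoring any trailing options."""
    if int.from_bytes(packet[:2], "big") != 1:
        raise ValueError("not a read request")
    filename, mode, _rest = packet[2:].split(b"\x00", 2)
    return filename, mode

def printable(raw):
    """Render non-printable bytes as <XX>, similar to how the syslog line shows them."""
    return "".join(chr(b) if 0x20 <= b < 0x7F else f"<{b:02X}>" for b in raw)

good = parse_rrq(build_rrq(b"/pxelinux.0"))
bad = parse_rrq(build_rrq(b"/pxelinux.0\xff"))
print(printable(good[0]))
print(printable(bad[0]))
```

From the server's point of view the second request simply asks for a different file, one that does not exist, which matches the failed boot.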
#15
Didn't someone just have a similar problem, only behind a Fritzbox? Maybe refreshing several times helps here too? You don't happen to have radvd running there as well...?