OPNsense Forum

English Forums => 25.1, 25.4 Production Series => Topic started by: ripdog on February 22, 2025, 01:45:28 AM

Title: VLAN deletion fails silently
Post by: ripdog on February 22, 2025, 01:45:28 AM
Hi all,

I've been running Opnsense for a number of years on an ISP which required all traffic to be tagged with VLAN 10. I've now changed ISPs, and no longer have to tag my traffic. So, I unassigned the VLAN device in Interfaces>Assignments, and went into Interfaces>Devices>VLAN to delete the VLAN. This doesn't work.

In the browser devtools, the delItem call returns http 200 "not found", always a good sign. Oddly, the UUID which is sent in the delItem call is different from the UUID returned in the immediately following searchItem call. I tried to delItem the UUID manually, but that didn't work either.

The cause was immediately obvious. I re-ran the searchItem call, and the reported UUID for the VLAN changed. It is being re-generated every time it's requested.
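
A minimal sketch of that comparison, assuming each searchItem response is JSON with a "rows" list whose entries carry "uuid" and "vlanif" fields. Those field names and the function name are assumptions for illustration, not taken from the API documentation.

```python
import json


def unstable_uuids(first_response: str, second_response: str) -> list[str]:
    """Return the vlanif names whose UUID differs between two search responses."""

    def index(raw: str) -> dict[str, str]:
        # Map each VLAN device name to the UUID reported for it.
        rows = json.loads(raw).get("rows", [])
        return {row["vlanif"]: row["uuid"] for row in rows}

    first, second = index(first_response), index(second_response)
    # Only compare VLANs present in both responses.
    return sorted(
        name for name in first if name in second and first[name] != second[name]
    )
```

An empty result would mean the UUIDs are stable; any name in the list is a VLAN whose UUID was regenerated between the two calls.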

For this reason, editing the VLAN fails, as the getItem calls fail with the error code http 200 and empty body.

(You can tell this is quality, secure software with the way they very securely return 200 for all http reqs, and don't report errors to the user.)

The VLAN certainly still exists, is available for assigning and appears in ifconfig output.

I've tried forcing a deletion by putting the ifname in /tmp/.vlans.removed and running configctl interface vlan configure as root, which correctly reports a failure to delete the interface by printing... 'OK'. (I'm noticing a pattern here.) It does clear the file in /tmp/.

Any ideas?
Title: Re: VLAN deletion fails silently
Post by: patient0 on February 22, 2025, 08:05:42 AM
Quote:
and went into Interfaces>Devices>VLAN to delete the VLAN. This doesn't work
How did it not work, error message in the GUI?

Quote:
In the browser devtools, the delItem call returns http 200 "not found" ...
...
You can tell this is quality, secure software with the way they very securely return 200 for all http reqs, and don't report errors to the user.
Let's not jump into the web programming as the first step, let's see if it can be replicated in the GUI.

And I'm sure you are aware what the HTTP return codes refer to. And since you do, you know that 200 means the HTTP request was successful; which it was. The API call on the other hand returned a sensible answer too, "not found".
Title: Re: VLAN deletion fails silently
Post by: ripdog on February 22, 2025, 09:07:49 PM
Quote:
How did it not work, error message in the GUI?

Sorry, I should have been more clear. As the thread title said, it fails silently. No feedback whatsoever. That's why I delved into the devtools, as I had no idea what was happening.

Quote:
And I'm sure you are aware what the HTTP return codes refer to. And since you do, you know that 200 means the HTTP request was successful; which it was. The API call on the other hand returned a sensible answer too, "not found".

That's a very... interesting... opinion as to the purpose of http status codes. In this case, the correct code would have been 404, entity not found. Using them in this way can dramatically simplify error handling on the client side, allowing client side code to confidently assume the operation succeeded on a return of 200 and confidently assume failure on a non-2xx return.
Title: Re: VLAN deletion fails silently
Post by: Patrick M. Hausen on February 22, 2025, 09:20:37 PM
Quote from: ripdog on February 22, 2025, 09:07:49 PM
In this case, the correct code would have been 404, entity not found.

Just no. 404 is when the web server cannot find anything mapped to a particular URL not when an application cannot find a managed entity inside that application. HTTP error codes are concerned with the HTTP protocol only, not with anything you drive over HTTP.

Could you outline the steps you took? IMHO they should be (for 25.1):

- Interfaces > YOUR_VLAN - uncheck "prevent interface removal"
- Interfaces > Assignments - delete the interface assignment
- Interfaces > Devices > VLAN - delete the VLAN

For 24.x, IIRC, replace "Devices" in the last step with "Other types" or similar.

HTH,
Patrick
Title: Re: VLAN deletion fails silently
Post by: ripdog on February 23, 2025, 02:06:05 AM
Quote:
- Interfaces > YOUR_VLAN - uncheck "prevent interface removal"
- Interfaces > Assignments - delete the interface assignment
- Interfaces > Devices > VLAN - delete the VLAN

Thanks for the reply, but I did state in the first post that I did unassign the interface first. I didn't touch a checkbox called 'prevent interface removal', but presumably that no longer applies if the interface is unassigned?

Quote:
Just no. 404 is when the web server cannot find anything mapped to a particular URL not when an application cannot find a managed entity inside that application. HTTP error codes are concerned with the HTTP protocol only, not with anything you drive over HTTP.

We're going to have to agree to disagree. In a REST protocol, both HTTP verbs and status codes are for the use of the application programmer, and are very useful in the creation of a predictable and easy-to-use API. Without use of status codes, the client-side has to do a bunch of complex validation logic on all replies from the server to ensure that the operation succeeded, ensuring that valid JSON and the correct 'success' message were passed.

With use of the 404 code, the client side could simply set a default error handler for a set of status codes which displays the error to the user. This can be done globally for every request in the application with a single line of code.
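
A rough sketch of such a default handler, written in Python for illustration rather than in the frontend's own JavaScript. The Response type and the handle function are hypothetical names, not part of OPNsense.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Response:
    status: int  # HTTP status code returned by the server
    body: str    # response body, e.g. an error message


def handle(response: Response, on_error: Callable[[str], None]) -> bool:
    """Return True on a 2xx status; otherwise report the error and return False."""
    if 200 <= response.status < 300:
        return True
    # One shared code path for every failed request.
    on_error(f"Request failed (HTTP {response.status}): {response.body}")
    return False
```

With something like this wired in once, every request gets the same error reporting without per-call validation of the response body.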

And this thread proves my point - the Opnsense frontend does not validate the server response, and thus when the operation fails unexpectedly, no error is presented and I have to dig into the devtools to find out what went wrong.

Besides, if the application doesn't make use of status codes, literally why do they even exist? Apart from the main document request, where the browser displays HTTP errors, any failures to requests initiated by the page or JS are completely silent - unless the user looks in the devtools. So why do they exist? They exist for the application to use, to report errors from the backend to the frontend in a simple and standardized way.
Title: Re: VLAN deletion fails silently
Post by: EricPerl on February 23, 2025, 02:44:33 AM
Just looking at the config file, VLANs are identified by UUIDs.
It would make sense for low-level operations on VLANs (search, enumerations, edit, delete...) to use these UUIDs.

I've encountered weirdness like this before but after creation (IIRC assignment fails after creation).
I think I missed an apply. I suspect the same could happen the other way (deletion fails because the assignment is still present).
Looking at the configuration history can help in such cases.

404 is not the correct status code if the resource exists. That's not controversial.
200 is probably not the correct status code either if the operation did not effectively succeed.
Given the symptoms here, 409 - Conflict might have been better.
It's difficult to judge without details about the exact underlying condition.
Title: Re: VLAN deletion fails silently
Post by: patient0 on February 23, 2025, 08:43:30 AM
Quote from: ripdog on February 23, 2025, 02:06:05 AM
We're going to have to agree to disagree. In a REST protocol, both HTTP verbs and status codes are for the use of the application programmer, and are very useful in the creation of a predictable and easy-to-use API.
Well, paint me blue and call me Tracy. I did read up on it and you are correct!

I wrongly assumed that HTTP is only the transport layer, and that since an app has to evaluate the answer anyway, the HTTP status code didn't matter. I could not have been more wrong.

I read up on: https://restfulapi.net/http-status-codes/
Title: Re: VLAN deletion fails silently
Post by: Patrick M. Hausen on February 23, 2025, 10:16:28 AM
But your browser is not talking to an API. It's talking to a web application which is running just fine. Even when there was no bug and it would correctly display "VLAN not found" or some such, you would see a perfectly well rendered UI with an error message box in it. That's HTTP 200 in my book.
Title: Re: VLAN deletion fails silently
Post by: ripdog on February 23, 2025, 10:21:51 AM
Quote from: Patrick M. Hausen on February 23, 2025, 10:16:28 AM
But your browser is not talking to an API. It's talking to a web application which is running just fine. Even when there was no bug and it would correctly display "VLAN not found" or some such, you would see a perfectly well rendered UI with an error message box in it. That's HTTP 200 in my book.

It's not the browser in question, but rather the application running on the browser. The browser is just a platform, like the Java runtime - it doesn't do anything on its own, and it doesn't interpret http status codes (except for rendering errors when the main document fails to load). The codes exist for the convenience of the application, typically making error handling easier and more general.

After all, you said it yourself - the application should show errors from the server as an error message. But Opnsense is *not showing the error*! It's failing silently! Which is why I had to break out the devtools to find out what went wrong!

So if the Opnsense server-side reported errors in the standardised way, using http status codes, then the Opnsense front-end could be programmed to automatically show error messages whenever an API request returns a failure code. That's what they're for!

Anyway, this is getting off-topic. The primary issue is that I am stuck with an unremovable VLAN, and all the error messages in the world won't rid me of a useless VLAN.
Title: Re: VLAN deletion fails silently
Post by: Patrick M. Hausen on February 23, 2025, 11:04:24 AM
Got it now. I thought you wanted a 404 delivered to the browser.


For the problem and probably bug at hand:

Download a configuration backup, carefully remove the VLAN from the XML, reimport.

Then please create an issue in github with the relevant part of the config XML attached.
Title: Re: VLAN deletion fails silently
Post by: Patrick M. Hausen on February 23, 2025, 11:10:53 AM
Quote from: ripdog on February 23, 2025, 02:06:05 AM
Thanks for the reply, but I did state in the first post that I did unassign the interface first. I didn't touch a checkbox called 'prevent interface removal', but presumably that no longer applies if the interface is unassigned?

Correct. That option prevents removal of the assignment. I just outlined all the steps for completeness and possibly other readers.
Title: Re: VLAN deletion fails silently
Post by: patient0 on February 23, 2025, 11:17:24 AM
Quote from: ripdog on February 23, 2025, 10:21:51 AM
Anyway, this is getting off-topic. The primary issue is that I am stuck with an unremovable VLAN, and all the error messages in the world won't rid me of a useless VLAN.
If you're comfortable with the SSH shell, after making a backup remove it from the config.xml and reload the services "(11) Reload all services".

Search for "vlans version" in the file; that is where the VLAN section starts. Remove the complete <vlan>...</vlan> section:
  <vlans version="1.0.0">
    <vlan uuid="73eeed76-87ad-4f32-858e-4ccf887004e1">
      <if>xn1</if>
      <tag>200</tag>
      <pcp>0</pcp>
      <proto/>
      <descr>LAB VLAN200</descr>
      <vlanif>vlan01</vlanif>
    </vlan>
  </vlans>
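
The same removal can be sketched in Python, assuming the VLAN is identified by its <vlanif> value and that you work on a copy of the file rather than the live /conf/config.xml. remove_vlan is a hypothetical helper, and ElementTree may alter the file's formatting, so diff the result before using it.

```python
import xml.etree.ElementTree as ET


def remove_vlan(config_xml: str, vlanif: str) -> str:
    """Return the config with every <vlan> whose <vlanif> matches removed."""
    root = ET.fromstring(config_xml)
    for vlans in root.iter("vlans"):
        # findall() returns a list, so removing while looping is safe.
        for vlan in vlans.findall("vlan"):
            if vlan.findtext("vlanif") == vlanif:
                vlans.remove(vlan)
    return ET.tostring(root, encoding="unicode")
```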

Maybe it's an issue because the rules for vlanif naming seem to have changed. I can only name it vlan0.xx now. Will test it, but it has to wait until Monday.

Edit: Maybe you could share only that VLAN config part? I could then add it manually to my config and see what happens.
Title: Re: VLAN deletion fails silently
Post by: ripdog on February 23, 2025, 01:09:56 PM
It's funny, I had the same idea to reimport the xml, modified - but I was a little scared that importing a settings backup would clobber other settings.

Here's my VLANs from my config backup. It looks a little different :)

Quote:
  <vlans version="1.0.0">
    <vlan>
      <if>em0</if>
      <tag>10</tag>
      <vlanif>em0_vlan10</vlanif>
    </vlan>
  </vlans>

Presumably the lack of stored UUID explains the UUID changing every time the VLAN list is requested.
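
A quick way to confirm this on a config backup, sketched in Python. It assumes VLAN entries live under a <vlans> element as in the snippets in this thread; vlans_without_uuid is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def vlans_without_uuid(config_xml: str) -> list[str]:
    """Return the <vlanif> names of VLAN entries that have no uuid attribute."""
    root = ET.fromstring(config_xml)
    missing = []
    for vlans in root.iter("vlans"):
        for vlan in vlans.findall("vlan"):
            if not vlan.get("uuid"):
                # Fall back to a placeholder if the entry has no <vlanif>.
                missing.append(vlan.findtext("vlanif") or "(unnamed)")
    return missing
```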

Quote:
If you're comfortable with the SSH shell, after making a backup remove it from the config.xml and reload the services "(11) Reload all services".

Actually, that's the option I would prefer, if it's safe. If I make the changes to the configuration manually, I can be more assured that whatever backup restoration logic won't clobber anything unrelated. Where exactly is the config.xml?
Title: Re: VLAN deletion fails silently
Post by: patient0 on February 23, 2025, 01:21:06 PM
Quote from: ripdog on February 23, 2025, 01:09:56 PM
Quote:
If you're comfortable with the SSH shell, after making a backup remove it from the config.xml and reload the services "(11) Reload all services".

Actually, that's the option I would prefer, if it's safe. If I make the changes to the configuration manually, I can be more assured that whatever backup restoration logic won't clobber anything unrelated. Where exactly is the config.xml?
The file is /conf/config.xml (the backups are in /conf/backup).

About the "if it's safe" part: you're changing the config file directly. I wouldn't do it unless you can afford (worst-case) downtime and have a clear way to restore a backup (involving a monitor or serial connection).
Personally I never had an issue (touch wood) but don't do it often and have everything ready for a restore.
Title: Re: VLAN deletion fails silently
Post by: patient0 on February 23, 2025, 03:38:00 PM
Quote from: ripdog on February 23, 2025, 01:09:56 PM
Quote:
  <vlans version="1.0.0">
    <vlan>
      <if>em0</if>
      <tag>10</tag>
      <vlanif>em0_vlan10</vlanif>
    </vlan>
  </vlans>
Just a quick test by adding your vlan config to my config (in the lab of course :) and I can not remove it either. And if I try to edit it, all the fields are empty.

My second step was to add a uuid (uuidgen is helpful for that) to that vlan tag and then it worked. Editing was possible and removing too.
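
That second step can be sketched in Python with the standard uuid module instead of uuidgen. This is only an illustration, assuming any valid UUID is acceptable (the thread only shows that a uuidgen-generated one worked). add_missing_vlan_uuids is a hypothetical helper, meant to be run on a copy of the config.

```python
import uuid
import xml.etree.ElementTree as ET


def add_missing_vlan_uuids(config_xml: str) -> str:
    """Return the config with a fresh uuid attribute on every <vlan> lacking one."""
    root = ET.fromstring(config_xml)
    for vlans in root.iter("vlans"):
        for vlan in vlans.findall("vlan"):
            if not vlan.get("uuid"):
                # Existing UUIDs are left untouched; only gaps are filled.
                vlan.set("uuid", str(uuid.uuid4()))
    return ET.tostring(root, encoding="unicode")
```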

Btw, what version OPNsense are you running, my test was on 25.7.a_36.
Title: Re: VLAN deletion fails silently
Post by: EricPerl on February 23, 2025, 09:39:02 PM
I feel ignored 😉

The absence of the UUID in the config is clearly a bug.
The creation of a UUID on the fly might self-correct the issue but only IF the UUID is added to the config (dicey on an enumeration or search).
Circling back to status codes, 404 was likely the appropriate code because whatever ID was used obviously did not match any resource.

You should be able to find the operation that "corrupted" the configuration by looking at the history.
Hopefully you didn't mess that up when you edited the config file by hand.
The backup folder contains previous versions of the config file, apparently ordered by timestamps.
In the GUI, you can compare versions in System > Configuration > History.
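
A sketch of that search in Python, assuming the backups have been read into (filename, XML text) pairs and that the filenames sort chronologically (an assumption based on the timestamp ordering mentioned above). first_backup_missing_uuid is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Optional


def first_backup_missing_uuid(backups: Iterable[tuple[str, str]]) -> Optional[str]:
    """Return the name of the earliest backup holding a VLAN without a uuid."""
    for name, xml_text in sorted(backups):
        root = ET.fromstring(xml_text)
        for vlans in root.iter("vlans"):
            if any(not vlan.get("uuid") for vlan in vlans.findall("vlan")):
                return name
    # Every backup had UUIDs on all of its VLAN entries.
    return None
```

If the very first backup is already returned, the entry was broken before the history begins and the originating change cannot be found this way.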
Title: Re: VLAN deletion fails silently
Post by: patient0 on February 24, 2025, 09:46:31 AM
Quote from: EricPerl on February 23, 2025, 09:39:02 PM
I feel ignored 😉

The absence of the UUID in the config is clearly a bug.
...
Circling back to status codes, 404 was likely the appropriate code because whatever ID was used obviously did not match any resource.
In the GUI, you can compare versions in System > Configuration > History.
You're now acknowledged, young Padawan :)

Of course I'm with you in that it is a bug. For it to be reported I guess it would be helpful to know the current OPNsense version and if possible get an idea of the upgrade path taken (e.g. "started with OPNsense 1 in 1889 and went from there" :) ) and best the full config file (minus all privacy sensitive data). Somewhere along the upgrades the UUIDs should have been added but weren't.
Title: Re: VLAN deletion fails silently
Post by: ripdog on February 24, 2025, 09:56:58 AM
Thanks for the help, all. I can confirm that adding a UUID to the VLAN in config.xml and reconfiguring fixed my issue.

I can only assume that the age of my VLAN, remaining untouched for multiple main version upgrades of Opnsense caused it to never receive a UUID at all. I suppose I should file a bug, now... sigh.


Quote:
I feel ignored 😉

I'm sorry, but I didn't see anything in your post which required a reply. It was obvious from the beginning that UUIDs were used to identify the VLANs, as all API methods were referencing UUIDs - the issue being that the UUID for my VLAN changed every time it was retrieved.

Quote:
I think I missed an apply. I suspect the same could happen the other way (deletion fails because the assignment is still present).

I had no reason to believe any of this applied to me. I had a fully functioning VLAN for years, I obviously didn't miss clicking apply. And my VLAN was not assigned.

Quote:
You should be able to find the operation that "corrupted" the configuration by looking at the history.
Hopefully you didn't mess that up when you edited the config file by hand.

I have never edited my config file before today. Hell, I had to ask in this thread where it was!

My oldest backup has my vlan in its current state - that is, without a UUID. It was from 6 April 2023.

Quote:
Btw, what version OPNsense are you running, my test was on 25.7.a_36.

OPNsense 25.1.1-amd64.

Quote:
For it to be reported I guess it would be helpful to know the current OPNsense version and if possible get an idea of the upgrade path taken (e.g. "started with OPNsense 1 in 1889 and went from there" :) ) and best the full config file (minus all privacy sensitive data). Somewhere along the upgrades the UUIDs should have been added but weren't.

I'm afraid I haven't the foggiest when I installed Opnsense. I'd guess 2022...? But I've at least run this install through the last 3-4 major versions.

The config file is huge, I wouldn't feel comfortable sharing it publicly even if I had tried to clean it up.
Title: Re: VLAN deletion fails silently
Post by: patient0 on February 24, 2025, 10:07:40 AM
Quote from: ripdog on February 24, 2025, 09:56:58 AM
The config file is huge, I wouldn't feel comfortable sharing it publicly even if I had tried to clean it up.
That is fair enough, I can understand.

When I get time this week I'll try to install a version from 3 or 4 years back. I assume it to be reproducible that way, as long as a VLAN without a UUID is created.
Title: Re: VLAN deletion fails silently
Post by: patient0 on February 24, 2025, 02:01:08 PM
Quote from: ripdog on February 24, 2025, 09:56:58 AM
I'm afraid I haven't the foggiest when I installed Opnsense. I'd guess 2022...? But I've at least run this install through the last 3-4 major versions.
Btw, what is usually your upgrade path? Upgrading through OPNsense itself, or reinstalling from scratch and importing the config file?
Title: Re: VLAN deletion fails silently
Post by: EricPerl on February 24, 2025, 09:04:31 PM
The wink emoji was a clue that this was kind of a joke (the being ignored part).

This said,
Quote:
Just looking at the config file, VLANs are identified by UUIDs.
It would make sense for low-level operations on VLANs (search, enumerations, edit, delete...) to use these UUIDs.
Was my way of enticing the OP to look at the config file and compare UUIDs there with the ones used in the API calls.
The absence of UUID was fatal.

Some of the rest was conjecture based on lack of context...

If the oldest backup already has a corrupt entry, I'm not entirely sure where a bug report about that part would go.
It's clearly the source of the issue.

A bug to handle the resulting situation more gracefully might have a better chance for a fix (more resilience).
It's easy to reproduce!
Title: Re: VLAN deletion fails silently
Post by: franco on February 24, 2025, 09:14:39 PM
Sorry, I didn't see this earlier but I think most has been said.

To explain how this can happen: normally it doesn't, but I think I know when this may have occurred:

> community/22.1/22.1.4:o interfaces: VLAN MVC conversion with API and QinQ support

22.1.4 introduced UUIDs as part of the MVC conversion, but for a while there was a bug in the console that did not add the UUID for a newly created VLAN item until:

> community/22.7/22.7.7:o console: store UUID for VLAN device

That means that 22.7 images in particular exhibited this problem, which would be my guess for when this install was first set up?


Cheers,
Franco
Title: Re: VLAN deletion fails silently
Post by: patient0 on March 01, 2025, 09:00:10 AM
I wasn't able to reproduce the issue, tried a few things starting with 21.1.

Regular upgrading worked until 22.1.10_4 (from 22.1), where UUIDs were introduced. The VLAN was OK (UUID added), but WAN was stuck and I had to assign WAN to the VLAN parent interface, apply, and switch back to the VLAN to get it working again. Comparing the configs, the assign-and-reassign just added some empty tags for bridges, gifs and gres.

And importing the non-UUID config into 22.1 worked too; the UUID was added. Not much help, but that's that.

Strangely, the OP's VLAN entry is missing the descr and pcp tags completely. They got added the first time I added the VLAN in 21.1, even after removing them manually from the XML beforehand.