VLAN deletion fails silently

Started by ripdog, February 22, 2025, 01:45:28 AM

Hi all,

I've been running OPNsense for a number of years with an ISP that required all traffic to be tagged with VLAN 10. I've now changed ISPs, and no longer have to tag my traffic. So, I unassigned the VLAN device in Interfaces>Assignments, and went into Interfaces>Devices>VLAN to delete the VLAN. This doesn't work.

In the browser devtools, the delItem call returns http 200 "not found", always a good sign. Oddly, the UUID which is sent in the delItem call is different from the UUID returned in the immediately following searchItem call. I tried to delItem the UUID manually, but that didn't work either.

The cause was immediately obvious. I re-ran the searchItem call, and the reported UUID for the VLAN changed. It is being re-generated every time it's requested.

For this reason, editing the VLAN fails, as the getItem calls fail with the error code http 200 and empty body.

(You can tell this is quality, secure software with the way they very securely return 200 for all http reqs, and don't report errors to the user.)

The VLAN certainly still exists, is available for assigning and appears in ifconfig output.

I've tried forcing a deletion by putting the ifname in /tmp/.vlans.removed and running # configctl interface vlan configure, which correctly reports a failure to delete the interface by printing... 'OK'. (I'm noticing a pattern here.) It does clear the file in /tmp/.

Any ideas?

Quote: and went into Interfaces>Devices>VLAN to delete the VLAN. This doesn't work
How did it not work, error message in the GUI?

Quote: In the browser devtools, the delItem call returns http 200 "not found" ...
...
You can tell this is quality, secure software with the way they very securely return 200 for all http reqs, and don't report errors to the user.
Let's not jump into web programming as the first step; let's see if it can be replicated in the GUI.

And I'm sure you are aware of what the HTTP return codes refer to. Since you are, you know that 200 means the HTTP request was successful, which it was. The API call, on the other hand, returned a sensible answer too: "not found".
Deciso DEC740

Quote: How did it not work, error message in the GUI?

Sorry, I should have been more clear. As the thread title said, it fails silently. No feedback whatsoever. That's why I delved into the devtools, as I had no idea what was happening.

Quote: And I'm sure you are aware of what the HTTP return codes refer to. Since you are, you know that 200 means the HTTP request was successful, which it was. The API call, on the other hand, returned a sensible answer too: "not found".

That's a very... interesting... opinion as to the purpose of http status codes. In this case, the correct code would have been 404, entity not found. Using them in this way can dramatically simplify error handling on the client side, allowing client side code to confidently assume the operation succeeded on a return of 200 and confidently assume failure on a non-2xx return.

Quote from: ripdog on February 22, 2025, 09:07:49 PM
In this case, the correct code would have been 404, entity not found.

Just no. 404 is when the web server cannot find anything mapped to a particular URL, not when an application cannot find a managed entity inside that application. HTTP error codes are concerned with the HTTP protocol only, not with anything you drive over HTTP.

Could you outline the steps you took? IMHO they should be (for 25.1):

- Interfaces > YOUR_VLAN - uncheck "prevent interface removal"
- Interfaces > Assignments - delete the interface assignment
- Interfaces > Devices > VLAN - delete the VLAN

For 24.x, IIRC, replace "Devices" in the last step with "Other types" or similar.

HTH,
Patrick
Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)

Quote:
- Interfaces > YOUR_VLAN - uncheck "prevent interface removal"
- Interfaces > Assignments - delete the interface assignment
- Interfaces > Devices > VLAN - delete the VLAN

Thanks for the reply, but I did state in the first post that I did unassign the interface first. I didn't touch a checkbox called 'prevent interface removal', but presumably that no longer applies if the interface is unassigned?

Quote: Just no. 404 is when the web server cannot find anything mapped to a particular URL, not when an application cannot find a managed entity inside that application. HTTP error codes are concerned with the HTTP protocol only, not with anything you drive over HTTP.

We're going to have to agree to disagree. In a REST protocol, both HTTP verbs and status codes are for the use of the application programmer, and are very useful in the creation of a predictable and easy-to-use API. Without use of status codes, the client-side has to do a bunch of complex validation logic on all replies from the server to ensure that the operation succeeded, ensuring that valid JSON and the correct 'success' message were passed.

With use of the 404 code, the client side could simply set a default error handler for a set of status codes which displays the error to the user. This can be done globally for every request in the application with a single line of code.
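
As a rough illustration of that argument (a sketch only, not OPNsense's actual frontend code): with meaningful status codes, a client needs a single generic check, whereas with 200 for everything it has to parse and inspect each reply body. The `result` field name and the `deleted` success value are assumptions made for this example.

```python
import json
from http import HTTPStatus


def failed_by_status(status):
    """With meaningful status codes, one generic check covers every request."""
    return not (200 <= status < 300)


def failed_by_body(body):
    """With 200 for everything, each reply body must be parsed and inspected."""
    if not body:
        return True  # empty body, as described for the getItem calls
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return True
    if not isinstance(payload, dict):
        return True
    # Assumption: "result" carries the outcome and "deleted" means success.
    return payload.get("result") != "deleted"


print(failed_by_status(HTTPStatus.NOT_FOUND))
print(failed_by_status(HTTPStatus.OK))
print(failed_by_body('{"result": "not found"}'))
```

The first check is the same for every endpoint; the second has to know what a successful body looks like for each call.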

And this thread proves my point: the OPNsense frontend does not validate the server response, so when the operation fails unexpectedly, no error is presented and I have to dig into the devtools to find out what went wrong.

Besides, if the application doesn't make use of status codes, literally why do they even exist? Apart from the main document request, where the browser displays HTTP errors, any failures to requests initiated by the page or JS are completely silent - unless the user looks in the devtools. So why do they exist? They exist for the application to use, to report errors from the backend to the frontend in a simple and standardized way.

Just looking at the config file, VLANs are identified by UUIDs.
It would make sense for low-level operations on VLANs (search, enumerations, edit, delete...) to use these UUIDs.

I've encountered weirdness like this before, but after creation (IIRC, assignment failed after creation).
I think I had missed an Apply. I suspect the same could happen the other way around (deletion fails because the assignment is still present).
Looking at the configuration history can help in such cases.

404 is not the correct status code if the resource exists. That's not controversial.
200 is probably not the correct status code either if the operation did not effectively succeed.
Given the symptoms here, 409 - Conflict might have been better.
It's difficult to judge without details about the exact underlying condition.

Quote from: ripdog on February 23, 2025, 02:06:05 AM
We're going to have to agree to disagree. In a REST protocol, both HTTP verbs and status codes are for the use of the application programmer, and are very useful in the creation of a predictable and easy-to-use API.
Well, paint me blue and call me Tracy. I did read up on it and you are correct!

I wrongly assumed that HTTP is only the transport layer and that, since an app has to evaluate the answer anyway, the HTTP status code didn't matter. I could not have been more wrong.

I read up on: https://restfulapi.net/http-status-codes/
Deciso DEC740

But your browser is not talking to an API. It's talking to a web application which is running just fine. Even if there were no bug and it correctly displayed "VLAN not found" or some such, you would see a perfectly well rendered UI with an error message box in it. That's HTTP 200 in my book.
Deciso DEC750

Quote from: Patrick M. Hausen on February 23, 2025, 10:16:28 AM
But your browser is not talking to an API. It's talking to a web application which is running just fine. Even if there were no bug and it correctly displayed "VLAN not found" or some such, you would see a perfectly well rendered UI with an error message box in it. That's HTTP 200 in my book.

It's not the browser in question, but rather the application running in the browser. The browser is just a platform, like the Java runtime: it doesn't do anything on its own, and it doesn't interpret HTTP status codes (except for rendering errors when the main document fails to load). The codes exist for the convenience of the application, typically making error handling easier and more general.

After all, you said it yourself - the application should show errors from the server as an error message. But OPNsense is *not showing the error*! It's failing silently! Which is why I had to break out the devtools to find out what went wrong!

So if the OPNsense server-side reported errors in the standardised way, using HTTP status codes, then the OPNsense front-end could be programmed to automatically show error messages whenever an API request returns a failure code. That's what they're for!

Anyway, this is getting off-topic. The primary issue is that I am stuck with an unremovable VLAN, and all the error messages in the world won't rid me of a useless VLAN.

February 23, 2025, 11:04:24 AM #9 Last Edit: February 23, 2025, 11:38:35 AM by Patrick M. Hausen
Got it now. I thought you wanted a 404 delivered to the browser.


For the problem and probably bug at hand:

Download a configuration backup, carefully remove the VLAN from the XML, reimport.

Then please create an issue on GitHub with the relevant part of the config XML attached.
Deciso DEC750

Quote from: ripdog on February 23, 2025, 02:06:05 AM
Thanks for the reply, but I did state in the first post that I did unassign the interface first. I didn't touch a checkbox called 'prevent interface removal', but presumably that no longer applies if the interface is unassigned?

Correct. That option prevents removal of the assignment. I just outlined all the steps for completeness and possibly other readers.
Deciso DEC750

February 23, 2025, 11:17:24 AM #11 Last Edit: February 23, 2025, 11:23:32 AM by patient0
Quote from: ripdog on February 23, 2025, 10:21:51 AM
Anyway, this is getting off-topic. The primary issue is that I am stuck with an unremovable VLAN, and all the error messages in the world won't rid me of a useless VLAN.
If you're comfortable with the SSH shell, after making a backup remove it from the config.xml and reload the services "(11) Reload all services".

Search for "vlans version" in the file; that is where the VLAN section starts. Remove the complete <vlan>...</vlan> section of the VLAN in question:
  <vlans version="1.0.0">
    <vlan uuid="73eeed76-87ad-4f32-858e-4ccf887004e1">
      <if>xn1</if>
      <tag>200</tag>
      <pcp>0</pcp>
      <proto/>
      <descr>LAB VLAN200</descr>
      <vlanif>vlan01</vlanif>
    </vlan>
  </vlans>
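
For anyone who would rather script the removal than hand-edit, here is a minimal sketch using Python's standard xml.etree.ElementTree. It runs against an inline sample rather than the real file; the surrounding <opnsense> root element and the helper name remove_vlan are assumptions for illustration. Re-serialising with ElementTree can change whitespace and drops comments, so anything like this should be tried on a copy and diffed before it goes near the live config.

```python
import xml.etree.ElementTree as ET

# Inline sample; the <opnsense> wrapper is assumed for illustration.
SAMPLE = """<opnsense>
  <vlans version="1.0.0">
    <vlan uuid="73eeed76-87ad-4f32-858e-4ccf887004e1">
      <if>xn1</if>
      <tag>200</tag>
      <pcp>0</pcp>
      <proto/>
      <descr>LAB VLAN200</descr>
      <vlanif>vlan01</vlanif>
    </vlan>
  </vlans>
</opnsense>"""


def remove_vlan(root, vlanif):
    """Remove every <vlan> whose <vlanif> matches; return how many were removed."""
    removed = 0
    # Materialise both iterations first so removal does not disturb them.
    for vlans in list(root.iter("vlans")):
        for vlan in list(vlans.findall("vlan")):
            if vlan.findtext("vlanif") == vlanif:
                vlans.remove(vlan)
                removed += 1
    return removed


root = ET.fromstring(SAMPLE)
count = remove_vlan(root, "vlan01")
print(count, "entry removed")
print(ET.tostring(root, encoding="unicode"))
```

Matching on <vlanif> rather than on the uuid attribute is deliberate here, since it also works for entries that have no uuid stored.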

Maybe it's an issue because the rules for vlanif naming now seem to have changed. I can only name it vlan0.xx. Will test it but it has to wait for Monday.

Edit: Maybe you could share only that VLAN config part? I could then add it manually to my config and see what happens.
Deciso DEC740

Funny, I had the same idea of reimporting a modified XML, but I was a little scared that importing a settings backup would clobber other settings.

Here are the VLANs from my config backup. It looks a little different :)

Quote:
  <vlans version="1.0.0">
    <vlan>
      <if>em0</if>
      <tag>10</tag>
      <vlanif>em0_vlan10</vlanif>
    </vlan>
  </vlans>

Presumably the lack of a stored UUID explains the UUID changing every time the VLAN list is requested.
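
That hypothesis can be modelled in a few lines of Python. This is a toy model of the suspected behaviour, not OPNsense's actual code: search_items and del_item are hypothetical stand-ins for the searchItem and delItem calls, and the assumption is that listing invents a UUID when none is stored while deletion only matches stored ones.

```python
import uuid
import xml.etree.ElementTree as ET

# VLAN section as posted above: the <vlan> element has no uuid attribute.
CONFIG = """<vlans version="1.0.0">
  <vlan>
    <if>em0</if>
    <tag>10</tag>
    <vlanif>em0_vlan10</vlanif>
  </vlan>
</vlans>"""


def search_items(root):
    """Hypothetical searchItem: invent a UUID when none is stored."""
    return [
        (vlan.get("uuid") or str(uuid.uuid4()), vlan.findtext("vlanif"))
        for vlan in root.findall("vlan")
    ]


def del_item(root, wanted):
    """Hypothetical delItem: match on the stored uuid attribute only."""
    for vlan in root.findall("vlan"):
        if vlan.get("uuid") == wanted:
            root.remove(vlan)
            return "deleted"
    return "not found"


root = ET.fromstring(CONFIG)
first_id, _ = search_items(root)[0]
second_id, _ = search_items(root)[0]
print(first_id != second_id)     # the id differs between two listings
print(del_item(root, first_id))  # no stored uuid matches the transient id
```

Under these assumptions, every listing hands out an id that nothing in the config can ever match, which would be consistent with both the changing UUIDs and the "not found" reply.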

Quote: If you're comfortable with the SSH shell, after making a backup remove it from the config.xml and reload the services "(11) Reload all services".

Actually, that's the option I would prefer, if it's safe. If I make the changes to the configuration manually, I can be more assured that whatever backup restoration logic won't clobber anything unrelated. Where exactly is the config.xml?

Quote from: ripdog on February 23, 2025, 01:09:56 PM
Quote: If you're comfortable with the SSH shell, after making a backup remove it from the config.xml and reload the services "(11) Reload all services".

Actually, that's the option I would prefer, if it's safe. If I make the changes to the configuration manually, I can be more assured that whatever backup restoration logic won't clobber anything unrelated. Where exactly is the config.xml?
The file is /conf/config.xml (the backups are in /conf/backup).

About the "if it's safe" part: you're changing the config file directly. I wouldn't do it unless you can afford (worst-case) downtime and have a clear way to restore a backup (involving a monitor or serial connection).
Personally, I've never had an issue (touch wood), but I don't do it often and I have everything ready for a restore.
Deciso DEC740

Quote from: ripdog on February 23, 2025, 01:09:56 PM
Quote:
  <vlans version="1.0.0">
    <vlan>
      <if>em0</if>
      <tag>10</tag>
      <vlanif>em0_vlan10</vlanif>
    </vlan>
  </vlans>
I just ran a quick test by adding your VLAN config to my config (in the lab, of course :) and I cannot remove it either. And if I try to edit it, all the fields are empty.

My second step was to add a uuid attribute (uuidgen is helpful for that) to that <vlan> tag, and then it worked. Editing was possible, and removing too.
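
The fix described above (giving the <vlan> element a uuid attribute) can also be sketched in Python, as an alternative to pasting in the output of uuidgen by hand. This is an illustration against the snippet from the thread, not a tested migration tool; add_missing_uuids is a hypothetical helper name.

```python
import uuid
import xml.etree.ElementTree as ET

# The VLAN section from the thread, without a uuid attribute.
SNIPPET = """<vlans version="1.0.0">
  <vlan>
    <if>em0</if>
    <tag>10</tag>
    <vlanif>em0_vlan10</vlanif>
  </vlan>
</vlans>"""


def add_missing_uuids(vlans):
    """Give every <vlan> without a uuid attribute a freshly generated one."""
    patched = []
    for vlan in vlans.findall("vlan"):
        if not vlan.get("uuid"):
            vlan.set("uuid", str(uuid.uuid4()))
            patched.append(vlan.findtext("vlanif"))
    return patched


vlans = ET.fromstring(SNIPPET)
patched = add_missing_uuids(vlans)
print(patched)
print(ET.tostring(vlans, encoding="unicode"))
```

Entries that already carry a uuid are left alone, so running it twice does not change anything the second time.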

Btw, what version of OPNsense are you running? My test was on 25.7.a_36.
Deciso DEC740