OPNsense Fatal Design Bug

Started by forum111, January 24, 2024, 01:00:42 PM

January 24, 2024, 01:00:42 PM Last Edit: January 24, 2024, 01:04:28 PM by forum111
I will try to explain why OPNsense has a general architectural design problem.

Let's say you are an administrator in Microsoft Azure. Everything is cloud-based and virtual.
On the main server we have one virtual machine: OPNsense.
The Azure virtual machines work on an internal network, 192.168.10.0/24. You cannot change the virtual server network.
The reason is that 100 existing virtual machines already run on that server.

Now, how will you bridge OPNsense, which is deployed on that one virtual machine?

You cannot!!!

The reason is the whole save-and-apply feature.
OpenWrt can track all changes and apply everything in one pass at the end.
OPNsense has many operations that are not tracked, added to a queue, and prepared for apply.

For example, it is impossible to do this in one simple apply:
1. LAN 192.168.10.1 is static with DHCP attached.
2. Then we add a new bridge with static 192.168.20.1 and DHCP attached.
3. Now apply everything, and it is OK.
4. Then set LAN to none and remove DHCP.
5. Change the bridge to 192.168.10.1 and update the attached DHCP.
6. Apply all (this cannot be done in OPNsense).
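
To make the ordering problem concrete, here is a minimal Python sketch of the difference between validating every single change and validating once when everything is applied. It is a toy model only: the dictionary layout, the `validate` rule, and the function names are made up for illustration and do not reflect how OPNsense or OpenWrt are actually implemented.

```python
from copy import deepcopy


def validate(config):
    """Return a list of problems found in a complete configuration (toy rules)."""
    problems = []
    seen = {}
    for name, iface in config.items():
        ip = iface.get("ip")
        if ip is None:
            if iface.get("dhcp"):
                problems.append(f"{name}: DHCP server enabled without an address")
            continue
        if ip in seen:
            problems.append(f"{name}: address {ip} already used by {seen[ip]}")
        seen[ip] = name
    return problems


def apply_each_step(running, changes):
    """Validate after every single change (validation at save)."""
    config = deepcopy(running)
    for name, settings in changes:
        config.setdefault(name, {}).update(settings)
        problems = validate(config)
        if problems:
            return running, problems  # rejected, running config untouched
    return config, []


def apply_batch(running, changes):
    """Queue all changes and validate once on the final state (validation at apply)."""
    config = deepcopy(running)
    for name, settings in changes:
        config.setdefault(name, {}).update(settings)
    problems = validate(config)
    return (running, problems) if problems else (config, [])


# State after step 3 of the list above.
running = {
    "lan": {"ip": "192.168.10.1", "dhcp": True},
    "bridge0": {"ip": "192.168.20.1", "dhcp": True},
}
# Steps 4 and 5, entered in the "wrong" order: bridge first, then LAN.
changes = [
    ("bridge0", {"ip": "192.168.10.1"}),
    ("lan", {"ip": None, "dhcp": False}),
]

_, step_problems = apply_each_step(running, changes)
final, batch_problems = apply_batch(running, changes)
print(step_problems)
print(batch_problems)
print(final["bridge0"]["ip"])
```

With per-change validation the first queued change is rejected because LAN still holds 192.168.10.1. With a single validation at apply time the same two changes go through, since only the final state is checked.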


Update the network subnet for a minute before deleting LAN, and then switch to 192.168.10.0, but from the bridge.

For example, to change network from

Just take a look at OpenWrt to see how each step of a change is tracked and can be applied, even if we delete an interface.
Validation must not happen at save! It must happen at apply. For example, OPNsense validates DHCP interface problems at save, not when applying changes.

Even worse, many times apply is missing and the save is in fact the apply, which is wrong. Take a delete operation, for example.


Have you tried using the VM console instead of the UI?
Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)

OK. I promise to explain everything.
First off, I tried the console. It is the only way I can get in after step 3 of the how-to here: https://docs.opnsense.org/manual/how-tos/lan_bridge.html

Now the problem is related to the tutorial: how to reassign the LAN0 interface from a physical device (let's say vmx0) to the bridge. There is not a single piece of help on that topic.

Apart from that: why? Why do we associate LAN with the bridge? This is wrong.
The correct way is:
1. Disable this on-the-fly automatic firewall when changing/adding/deleting interfaces.
2. Create the bridge, set it up with a static IP, and assign a DHCP server to it.
3. Then everything should work once we stop the DHCP server for LAN0. Before that, LAN0 needs to be assigned to the bridge. And that is all. This ought to be OK, but in OPNsense it is not.


Bug 1. The DHCP server on the bridge does not give IPs to clients.
Bug 2. I cannot reach the static bridge interface, even if the firewall rules are set correctly.

I will try once again to explain why OPNsense will never be a professional router.

Let's make this test.
Interfaces                    | IPv4 type                 | IP           | DHCP server
------------------------------+---------------------------+--------------+-----------------
em0  - WAN                    | DHCP                      | 200.20.20.20 |
vmx0 - LAN0                   | Static IP                 | 192.168.1.1  | ACTIVE for lease
vmx1 - OPT1                   | NONE (act as switch only) |              |
vmx2 - OPT2                   | NONE (act as switch only) |              |
vmx3 - OPT3                   | NONE (act as switch only) |              |
vmx4 - OPT4                   | NONE (act as switch only) |              |
bridge0 (OPT1,OPT2,OPT3,OPT4) | Static IP                 | 192.168.10.1 | ACTIVE for lease

bridge0 is added to the interface assignments and activated. After that, all interfaces are up and running, and both DHCP services are running, for LAN0 and for bridge0.

Now let's make this change.
Interfaces                    | IPv4 type                 | IP           | DHCP server
------------------------------+---------------------------+--------------+--------------------
vmx0 - LAN0                   | NONE (act as switch only) |              | service DEACTIVATED


Then we add LAN0 to the bridge0 assignments:
bridge0 (OPT1,OPT2,OPT3,OPT4,LAN0)

Then we change the IP of the bridge and the DHCP server range:
bridge0 (OPT1,OPT2,OPT3,OPT4) | Static IP | 192.168.1.1 | ACTIVE for lease


These last steps are impossible to perform with OPNsense.
First, you will lose your connection. Then you cannot reach the bridge ports, even though firewall rules are in place for each interface. The only way in is the virtual console, but there is no command there for something like this:
set the interface IPv4 type from static to NONE. Help for the CLI commands is also missing. Doing the firewall from the CLI is madness. For example, do we need some command just to pass everything? I tried for 2 weeks. I often come across admins who have been fighting with this for months.

Yes, for a home router it may be OK. But big infrastructure? No!!!



January 25, 2024, 12:05:19 PM #3 Last Edit: January 25, 2024, 12:06:59 PM by Patrick M. Hausen
Big infrastructure should not use OPNsense bridging to create a "switch", because packet forwarding will be done by the main CPU in software. Use a real switch and create e.g. a "router on a stick" setup.

But if you insist, depending on the bandwidth requirements, of course that can be done. But you need to set two tunables as documented in the LAN bridge document I already linked:

https://docs.opnsense.org/manual/how-tos/lan_bridge.html

Don't overlook/skip step 6. This is mandatory for the firewall rules to be applied at the logical interface (the bridge) instead of the individual members.

Also: this looks like a virtualised environment, specifically VMware ESXi. Why not do the switching in ESXi as recommended? Why do you need N bridged vmxnet interfaces? Just hook a single interface to a vswitch and you are done.

I am continually amazed at the number of people that want to use OPNsense for switching instead of buying/configuring a separate piece of equipment.

Quote from: CJ on January 25, 2024, 02:58:53 PM
I am continually amazed at the number of people that want to use OPNsense for switching instead of buying/configuring a separate piece of equipment.
I think it does make sense for a home or small office setup and e.g. a 6-port Protectli appliance. People rightfully (IMHO) expect to be able to use all ports just like with any consumer router.

And with the complete rework of the bridging code 1 Gbit/s can easily be achieved.

The OP insists they are running "big infrastructure" which is a different use case altogether.

see below... it doesn't change....
kind regards
chemlud
____
"The price of reliability is the pursuit of the utmost simplicity."
C.A.R. Hoare

Felix Eichhorn's premium cat food with the extra portion of energy

A router is not a switch - A router is not a switch - A router is not a switch - A rou....

Quote from: Patrick M. Hausen on January 25, 2024, 03:02:00 PM
I think it does make sense for a home or small office setup and e.g. a 6-port Protectli appliance. People rightfully (IMHO) expect to be able to use all ports just like with any consumer router.

And with the complete rework of the bridging code 1 Gbit/s can easily be achieved.

Sure, but it can also easily be handled with less fuss by purchasing a $20 switch. :)

Quote from: Patrick M. Hausen on January 25, 2024, 03:02:00 PM
The OP insists they are running "big infrastructure" which is a different use case altogether.

Agreed.  This thread just spurred my comment on things in general.

Quote from: chemlud on January 25, 2024, 03:03:22 PM
see below... it doesn't change....

I have had so many arguments with coworkers about how it's okay to have something be slightly less efficient in order to vastly reduce its complexity, etc.

Put the OP's particular scenario aside for a moment:
In a more generic sense, the OP has a point regarding the way OPNsense saves configuration changes.
I also noticed that it seems to save more often than I expect and would like.

The ideal save mechanism would be:
1. make as many configuration changes to any part of the entire system configuration as you like - NOTHING is APPLIED and NOTHING is SAVED unless you manually click "SAVE".
- there could be a global Configuration Setting called "Save configuration changes automatically" that could override the requirement to manually click "SAVE" each time.
2. Upon manually clicking SAVE, all changes since the most recent "SAVE" operation are saved to a "Candidate Configuration" - this is the complete set of all changes compiled together since the last "APPLY/COMMIT" operation (i.e. the last time changes were applied to the running/live/operating configuration).
3. Validate Configuration - this could be automatic or manual, and it is simply a function that validates all configuration changes for consistency and completion - that is, it will check to see if any incompatible changes have been saved, or whether any of the new configuration changes require other changes that are still missing.
4. Manually click "APPLY/COMMIT" (or schedule an "APPLY/COMMIT" job at a specified date-time) - this is when the "Candidate Configuration" is applied to the running system configuration and the operation of the device takes on the new behaviour according to the new configuration settings applied.

If this was built-in, this would be an AMAZING feature!
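
As a rough illustration of the four steps above, here is a minimal Python sketch of a store that keeps unsaved edits, a candidate configuration, and a running configuration apart. Everything in it (the `ConfigStore` class, the setting names, the single validation rule) is hypothetical and only meant to show the idea, not any existing OPNsense mechanism.

```python
from copy import deepcopy


class ConfigStore:
    def __init__(self, running):
        self.running = deepcopy(running)    # what the device is doing now
        self.candidate = deepcopy(running)  # saved, but not yet applied
        self.unsaved = {}                   # edits since the last SAVE

    def edit(self, key, value):
        """Step 1: record a change; nothing is saved or applied yet."""
        self.unsaved[key] = value

    def save(self):
        """Step 2: fold the unsaved edits into the candidate configuration."""
        self.candidate.update(self.unsaved)
        self.unsaved.clear()

    def validate(self):
        """Step 3: check the candidate as a whole (one toy rule)."""
        problems = []
        if self.candidate.get("dhcp_enabled") and not self.candidate.get("lan_ip"):
            problems.append("DHCP is enabled but LAN has no address")
        return problems

    def commit(self):
        """Step 4: make the candidate the running configuration, if it is valid."""
        problems = self.validate()
        if problems:
            raise ValueError("; ".join(problems))
        self.running = deepcopy(self.candidate)


store = ConfigStore({"lan_ip": "192.168.1.1", "dhcp_enabled": True})

store.edit("lan_ip", None)
store.save()
before_commit = store.running["lan_ip"]  # running config is still unchanged
problems = store.validate()              # candidate is inconsistent so far

store.edit("dhcp_enabled", False)
store.save()
store.commit()                           # now the candidate is consistent

print(before_commit)
print(problems)
print(store.running)
```

The half-finished candidate can sit there as long as needed; only `commit` touches the running configuration, and it refuses to do so while the candidate is inconsistent.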

The cherry on top would be to be able to do "Partial Commit/Apply" operations - that is, the ability to commit only some changes to the running configuration PROVIDED THAT the sub-set of changes being applied are a complete set of functionality settings that do not need all the other SAVED changes to be applied at the same time in order to work.

For example:
1. Let's say you have updated the System NTP servers - this is ONE atomic change;
2. Now let's say you have created a new OpenVPN Client tunnel and interface assignment.
3. You want, and are allowed, to APPLY the NTP changes during Business hours - this would be the first COMMIT/APPLY;
4. However, you must schedule a job to APPLY the new OpenVPN changes Out of Hours under a separate Commit operation.
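
The NTP/OpenVPN example could be sketched like this in Python. The `partial_commit` function, the `requires` dependency map, and all setting names and hostnames are assumptions invented for the sketch; a real system would need a much richer notion of which changes depend on which.

```python
def partial_commit(running, pending, selected, requires):
    """Apply only the `selected` pending changes.

    Refuses if a selected change depends on another pending change
    that was not selected as well.
    """
    missing = {
        dep
        for name in selected
        for dep in requires.get(name, ())
        if dep in pending and dep not in selected
    }
    if missing:
        raise ValueError(f"selection is incomplete, also needs: {sorted(missing)}")
    new_running = dict(running)
    for name in selected:
        new_running[name] = pending[name]
    remaining = {k: v for k, v in pending.items() if k not in selected}
    return new_running, remaining


running = {"ntp_servers": ["ntp1.example.net"]}
pending = {
    "ntp_servers": ["ntp2.example.net"],              # one atomic change
    "openvpn_client": {"remote": "vpn.example.net"},  # new tunnel ...
    "openvpn_interface": "opt5",                      # ... and its interface assignment
}
# The interface assignment makes no sense without the tunnel itself.
requires = {"openvpn_interface": ["openvpn_client"]}

# First commit, during business hours: NTP only.
running, pending = partial_commit(running, pending, {"ntp_servers"}, requires)
print(running["ntp_servers"])
print(sorted(pending))

# Committing only half of the OpenVPN change is refused.
rejected = None
try:
    partial_commit(running, pending, {"openvpn_interface"}, requires)
except ValueError as err:
    rejected = str(err)
print(rejected)
```

The two OpenVPN entries stay pending after the first commit and can be applied together later, for example by a scheduled out-of-hours job.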

(But I know I am venturing deep into Enterprise System territory here...)

Actually, I've run into the issue where I didn't hit Apply/Save more often than I have where OPNsense saved something I wasn't expecting it to. :D

What you're talking about is like how a lot of Enterprise gear works, so that you can make all the changes in memory but they're not actually saved until you tell them to.  This way if you mess up, a quick power interruption puts things back.  But that brings its own troubles as there have been many cases where admins never saved things and the first power outage takes down the entire network because it resets to the old config and no one has any idea what the current config was.

Additionally, every single change in OPNsense is tracked and fairly easily reversed.  If you do something you didn't mean to, you can go to the config history and revert back.  Admittedly, sometimes this requires a reboot to get all of the various services to see the correct config, but it's pretty painless.
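
For anyone unfamiliar with the idea, a history-with-revert model can be sketched in a few lines of Python. This is a generic illustration under my own assumptions (the `ConfigHistory` class and its methods are hypothetical) and says nothing about how OPNsense stores its config history internally.

```python
from copy import deepcopy


class ConfigHistory:
    def __init__(self, initial):
        # Each entry is (description, full copy of the configuration).
        self.snapshots = [("initial", deepcopy(initial))]

    @property
    def current(self):
        return deepcopy(self.snapshots[-1][1])

    def save(self, description, config):
        """Every saved change becomes a new snapshot."""
        self.snapshots.append((description, deepcopy(config)))

    def revert(self, index):
        """Restore an older snapshot by appending it as a new history entry."""
        description, config = self.snapshots[index]
        self.save(f"revert to: {description}", config)
        return self.current


history = ConfigHistory({"lan_ip": "192.168.1.1"})

cfg = history.current
cfg["lan_ip"] = None
history.save("set LAN to none", cfg)  # the change we did not mean to make

restored = history.revert(0)          # go back to the first snapshot
print(restored)
print([description for description, _ in history.snapshots])
```

Reverting adds a new entry instead of deleting the mistaken one, so the revert itself can be undone the same way.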

I have spent more than the last 22 years working on enterprise firewall and security systems, so...yeah, you are correct.