Automatically generated rules - is the reason I stopped migrating to OPNSense

Started by newjohn, September 26, 2023, 06:12:07 AM

Did you create the VLANs in OPNsense? It looks like it. How is the parent interface passed into the VM? If this is all virtual, the common approach is to define all VLANs on ESXi and pass a matching number of virtual interfaces into the firewall VM. Possibly all your VLANs are not really VLANs at all?
Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)

Quote from: newjohn on September 26, 2023, 11:11:09 PM
Therefore, although it's always a possibility due to misconfiguration, I think it's unlikely?

Only one statement CAN be true: either you discovered a major security bug/flaw in PF and/or OPNsense, OR you have issues in your virtual environment, network topology, firewall config, etc.

Because your issue is quite simple, allowing ICMP echo request/reply between two interfaces, I suggest you create a simpler setup to debug and report on, with a basic and easy to understand topology and ruleset. You're virtual, so deploying an extra VM should be an easy task for such a potentially high-impact security bug/flaw (not only impacting OPNsense).

So: a default setup with LAN + WAN, and after that create an extra OPT interface to introduce a second LAN segment. Keep EVERYTHING default, just configure the interface IPs, and DO NOTHING else. To make it super transparent, don't mess with VLAN interfaces YET. If you need to, just create static mappings between your OPNsense interfaces and your existing VLANs by creating VLAN port groups in vSphere VDS, Open vSwitch on KVM, or whatever you are using.

Now, if you can confirm that in this DEFAULT setup you can ping from a host in the OPT network to a host in the LAN network, it's time to get scared. The other way around, pinging from the _DEFAULT_ LAN network to the OPT network, will work as explained in previous posts.
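For context, that expected asymmetry comes from OPNsense's default ruleset: a fresh install ships a "Default allow LAN to any" pass rule on LAN, while a new OPT interface has no pass rules at all and falls through to pf's default block. A rough illustrative sketch (placeholder macro names, not the exact generated ruleset):

```
# Illustrative sketch only, not OPNsense's literal generated rules
pass in on $LAN_IF from $LAN_NET to any keep state  # "Default allow LAN to any":
                                                    # LAN-initiated pings create a state,
                                                    # so replies from OPT are passed back
# OPT has no pass rules, so OPT-initiated traffic
# hits the implicit default block
```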

I couldn't be bothered to plow through your ruleset (too much noise), but I did check your statement on a system here with tens of raw interfaces, bridges and VLANs (both LAGG and non-LAGG) and couldn't reproduce it on any.

From my perspective a misconfiguration is VERY likely, but I'm eager to hear about the issues you find in the above "test" setup.

Quote from: newjohn on September 26, 2023, 11:11:09 PM
Quote from: IsaacFL on September 26, 2023, 10:30:25 PM
Are you are running opnsense virtualized?

I think you have your virtual host configured incorrectly to support VLANs. Either that, or your external smart switch is incorrectly set up. The symptoms you are describing are exactly what happens when VLANs are not configured correctly on the external switch and they are getting combined. This is external to OPNsense.

Yes, it's virtualised. When I first read your input it did seem to make sense, but as I thought it through I came to a different conclusion. Let me explain why I think it's not the case.

Suppose I misconfigured ESXi and/or the switch. OPNsense should still block the ping when it passes through it, and we know it passes through it because if I shut it down, both VMs stop being able to ping each other. Therefore, although it's always a possibility due to misconfiguration, I think it's unlikely?

No, you definitely have the virtualization misconfigured. Basically you have cross-connected multiple layer 3 IP subnets onto the same layer 2 Ethernet segment, either virtually or via your external switch.

Quote from: netnut on September 27, 2023, 12:01:03 AM
Quote from: newjohn on September 26, 2023, 11:11:09 PM
Therefore, although it's always a possibility due to misconfiguration, I think it's unlikely?

Only one statement CAN be true: either you discovered a major security bug/flaw in PF and/or OPNsense, OR you have issues in your virtual environment, network topology, firewall config, etc.

Because your issue is quite simple, allowing ICMP echo request/reply between two interfaces, I suggest you create a simpler setup to debug and report on, with a basic and easy to understand topology and ruleset. You're virtual, so deploying an extra VM should be an easy task for such a potentially high-impact security bug/flaw (not only impacting OPNsense).

So: a default setup with LAN + WAN, and after that create an extra OPT interface to introduce a second LAN segment. Keep EVERYTHING default, just configure the interface IPs, and DO NOTHING else. To make it super transparent, don't mess with VLAN interfaces YET. If you need to, just create static mappings between your OPNsense interfaces and your existing VLANs by creating VLAN port groups in vSphere VDS, Open vSwitch on KVM, or whatever you are using.

Now, if you can confirm that in this DEFAULT setup you can ping from a host in the OPT network to a host in the LAN network, it's time to get scared. The other way around, pinging from the _DEFAULT_ LAN network to the OPT network, will work as explained in previous posts.

I couldn't be bothered to plow through your ruleset (too much noise), but I did check your statement on a system here with tens of raw interfaces, bridges and VLANs (both LAGG and non-LAGG) and couldn't reproduce it on any.

From my perspective a misconfiguration is VERY likely, but I'm eager to hear about the issues you find in the above "test" setup.

I agree it would be good to get to the bottom of this.

Really busy day at work, so I have not had the chance to do the test you asked for yet. But I did some tinkering and think I figured it out. I need another pair of eyes to confirm if this is the case.

VLAN190 has a permit-all rule.
VLAN160 has NO rules.

If I start the ping from vlan190 to vlan160 (vlan190 has the permit-all statement), it works OK (as expected).

If I reboot both PCs and the firewall, so there is no established connection as far as the firewall is concerned, and try to ping from vlan160 (which does not have any firewall statements), it fails (as expected).

So far the firewall does what's expected.

However, the point where it gets unexpected is as follows:

After the reboot, all connection states are cleared, so if I start the ping from vlan160 to vlan190, it fails as expected.
However, I noticed that the second I start pinging from vlan190 back to vlan160, vlan190 naturally works OK, but because the firewall has now established an open state between these two IP addresses, it also allows vlan160 to ping vlan190 back.

If I stop the ping from vlan190 and reset the state table, the pings stop, which supports my findings?
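If it helps anyone verify this, pf's state table can be inspected and flushed from the OPNsense shell with standard pfctl commands (a sketch; run as root, and note that flushing kills ALL active connections):

```
# List current states and look for the ICMP entries between the two test hosts
pfctl -ss | grep icmp
# Flush the entire state table; if the vlan160 -> vlan190 ping dies immediately,
# it was an existing state, not a rule, that was passing it
pfctl -F states
```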

What is your take on this, please?
Please see the attached screenshot.

Quote from: Patrick M. Hausen on September 26, 2023, 11:48:35 PM
Did you create the VLANs in OPNsense? It looks like it. How is the parent interface passed into the VM? If this is all virtual, the common approach is to define all VLANs on ESXi and pass a matching number of virtual interfaces into the firewall VM. Possibly all your VLANs are not really VLANs at all?

How is the parent interface passed into the VM?
I use ESXi and its virtual interfaces; nothing is passed through.

If this is all virtual, the common approach is to define all VLANs on ESXi and pass a matching number of virtual interfaces into the firewall VM.
I think that would not be possible, as ESXi has a limit of 10 NICs max per VM. So if you want more than 10 VLANs, which I do, that approach won't be possible.

Possibly alle your VLANs are not really VLANs at all?
The same infrastructure works OK with pfSense, but pfSense has a bug I hate, hence the adventure to OPNsense :)

Did you configure a vSwitch and add a port group with the VLAN ID of 4095? And connect this port group with 4095 as an E1000 or vmxnet3 NIC to the OPNsense so it's a trunk port (accepting all VLAN tags)?
And then create port groups on that vSwitch with all the additional VLAN IDs (100-190...) you have and connect those to your VMs?

I only configure OPNsenses on ESXi with PCIe passthrough when they need VLANs. I tested it and they don't have problems with states as described above; I can't replicate it there. I also run a few OPNsenses that have one vNIC per port group for different networks. It also doesn't happen there.

So, if anybody has a setup that uses portgroups with VLANs...
Hardware:
DEC740

Quote from: Monviech on September 28, 2023, 08:01:40 AM
Did you configure a vSwitch and add a port group with the VLAN ID of 4095? And connect this port group with 4095 as an E1000 or vmxnet3 NIC to the OPNsense so it's a trunk port (accepting all VLAN tags)?
And then create port groups on that vSwitch with all the additional VLAN IDs (100-190...) you have and connect those to your VMs?

I only configure OPNsenses on ESXi with PCIe passthrough when they need VLANs. I tested it and they don't have problems with states as described above; I can't replicate it there. I also run a few OPNsenses that have one vNIC per port group for different networks. It also doesn't happen there.

So, if anybody has a setup that uses portgroups with VLANs...

Did you configure a vSwitch and add a port group with the VLAN ID of 4095?
Yes.
And connect this port group with 4095 as an E1000 or vmxnet3 NIC to the OPNsense so it's a trunk port (accepting all VLAN tags)?
Yes.
And then create port groups on that vSwitch with all the additional VLAN IDs (100-190...) you have and connect those to your VMs?
Yes.
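For reference, a trunk port group like the one confirmed above can be created from the ESXi shell roughly like this (a sketch for a standard vSwitch; the port group and vSwitch names are placeholders):

```
# Create a port group on the standard vSwitch and make it a 4095 trunk
# (VLAN 4095 on a standard vSwitch means "pass all tags" / VGT mode)
esxcli network vswitch standard portgroup add --portgroup-name=Trunk-All --vswitch-name=vSwitch0
esxcli network vswitch standard portgroup set --portgroup-name=Trunk-All --vlan-id=4095
```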

=======================
However, as some people raised concerns about the setup being virtual and possibly causing the issue, I dug up a mini PC, did a fresh bare-metal OPNsense install, and ran the tests. To my surprise I got the same results.

So virtual or bare metal does not make a difference: as soon as you start pinging from vlan190 to vlan160, vlan160 can start to ping vlan190 back.

Just so it does not get lost in the conversation: vlan160 only gets responses to its pings if I start pinging from vlan190. If I start the ping on vlan160 first, the pings do not work; it only gets responses once I start the ping from vlan190.

Therefore, if anyone wants to test this scenario, I suggest you clear the firewall states (or better, reboot if possible), start pinging from the VLAN that has NO config and watch the result for 10-15 seconds, and only then start pinging from the VLAN that has the permit statement.

As for your tests, I am surprised, in a sense, that you did not get the same results. In your setup, do both VLANs have rules configured on them? Perhaps this behaviour only happens when there is no config on one of the VLANs?

Just for information purpose:

I am using Proxmox and multiple VLAN interfaces configured like this:

LAGG Interface0 -> VLAN0 -> Bridge NOT VLAN aware
to
LAGG Interface0 -> VLANx -> Bridge NOT VLAN aware

Single Interface -> Bridge VLAN aware (I am thinking of changing that)


And for OPNsense I added all the VLAN bridges to the VM and do not create any VLANs inside OPNsense (except pppoe/wan, which I might change today).
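A layout like the one described above corresponds roughly to this /etc/network/interfaces fragment on the Proxmox host (interface names and the VLAN ID 100 are placeholders, and the bond members are assumed):

```
# LAGG (bond) -> VLAN subinterface -> plain (non-VLAN-aware) bridge per VLAN
auto bond0
iface bond0 inet manual
    bond-slaves eno1 eno2
    bond-mode 802.3ad

auto bond0.100
iface bond0.100 inet manual

auto vmbr100
iface vmbr100 inet manual
    bridge-ports bond0.100
    bridge-stp off
    bridge-fd 0
    # bridge-vlan-aware is deliberately NOT set
```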


With this setup I had another issue (multiple RAs reaching a Windows client, due to Realtek driver behavior in Windows*), but I can't confirm the issue the OP has.


* I configured a VLAN ID in the driver, but because the port was a trunk port with a default VLAN, some packets from the unconfigured default VLAN were still reaching my PC. The same was not visible under macOS or Linux...

Here's my final test with all tcpdumps and all pf rules attached:

Opnsense:
hn5 10.16.1.254/24 - PF rules: none, only the "Automatically generated rules"
hn7 10.0.0.254/24 - PF rules: @418 pass in quick on hn7 inet proto icmp all keep state label "420257620b64c28d62b138a1f0bb8329"

VM1:
Ubuntu 20.04 LTS - 10.0.0.203

VM2:
Windows 10 - 10.16.1.101

Packet Captures:

Here you can see that the Windows VM2 has already been sending pings to the Ubuntu VM1 for a while (id 1, seq 256... id 1, seq 257), but there is no ICMP echo reply with that id.

Then I start a ping from the Ubuntu VM1 to the Windows VM2, which works (ICMP echo request, id 13... ICMP echo reply, id 13).

But there are still no ICMP echo replies with id 1 after the ICMP echo requests and replies with id 13 start.

Opnsense hn5:
root@opn01:~ # tcpdump -i hn5 proto ICMP and host 10.16.1.101 or proto ICMP and host 10.0.0.203 -n
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on hn5, link-type EN10MB (Ethernet), capture size 262144 bytes
09:19:51.937928 IP 10.16.1.101 > 10.0.0.203: ICMP echo request, id 1, seq 256, length 40
09:19:56.931815 IP 10.16.1.101 > 10.0.0.203: ICMP echo request, id 1, seq 257, length 40
09:19:57.017238 IP 10.0.0.203 > 10.16.1.101: ICMP echo request, id 13, seq 1, length 64
09:19:57.017385 IP 10.0.0.203 > 10.16.1.101: ICMP echo request, id 13, seq 1, length 64
09:19:57.017807 IP 10.16.1.101 > 10.0.0.203: ICMP echo reply, id 13, seq 1, length 64
09:19:58.018376 IP 10.0.0.203 > 10.16.1.101: ICMP echo request, id 13, seq 2, length 64
09:19:58.018559 IP 10.0.0.203 > 10.16.1.101: ICMP echo request, id 13, seq 2, length 64
09:19:58.019178 IP 10.16.1.101 > 10.0.0.203: ICMP echo reply, id 13, seq 2, length 64


OPNsense hn7:
root@opn01:~ # tcpdump -i hn7 proto ICMP and host 10.16.1.101 or proto ICMP and host 10.0.0.203
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on hn7, link-type EN10MB (Ethernet), capture size 262144 bytes
09:19:57.017180 IP 10.0.0.203 > 10.16.1.101: ICMP echo request, id 13, seq 1, length 64
09:19:57.017841 IP 10.16.1.101 > 10.0.0.203: ICMP echo reply, id 13, seq 1, length 64
09:19:57.017927 IP 10.16.1.101 > 10.0.0.203: ICMP echo reply, id 13, seq 1, length 64
09:19:58.018352 IP 10.0.0.203 > 10.16.1.101: ICMP echo request, id 13, seq 2, length 64
09:19:58.019209 IP 10.16.1.101 > 10.0.0.203: ICMP echo reply, id 13, seq 2, length 64
09:19:58.019321 IP 10.16.1.101 > 10.0.0.203: ICMP echo reply, id 13, seq 2, length 64


VM1:
administrator@vm1:~$ sudo tcpdump -i any proto ICMP and host 10.16.1.101 or proto ICMP and host 10.0.0.203 -n
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked v1), capture size 262144 bytes
09:19:57.044829 IP 10.0.0.203 > 10.16.1.101: ICMP echo request, id 13, seq 1, length 64
09:19:57.045816 IP 10.16.1.101 > 10.0.0.203: ICMP echo reply, id 13, seq 1, length 64
09:19:58.045987 IP 10.0.0.203 > 10.16.1.101: ICMP echo request, id 13, seq 2, length 64
09:19:58.047248 IP 10.16.1.101 > 10.0.0.203: ICMP echo reply, id 13, seq 2, length 64


You can see at every step that only ICMP echo request id 13 and ICMP echo reply id 13 come through, and ICMP echo request id 1 gets filtered out by the firewall, even while the id 13 states are established.
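The capture analysis above hinges on how pf tracks ICMP: echo states are keyed on the endpoint addresses plus the ICMP echo id, so a state opened by VM1's id-13 requests never matches VM2's independent id-1 pings. A toy shell model of that matching logic (purely illustrative; this is not pf's actual implementation):

```shell
#!/bin/sh
# Toy model of pf's ICMP echo state matching: a passing *request* creates a
# state keyed as (src, dst, icmp-id); a reply is only passed if a state with
# the reversed addresses and the SAME id exists.
STATES=""
add_state() { STATES="$STATES $1>$2#$3"; }   # args: request src, dst, id
check_reply() {                              # args: reply src, dst, id
  case " $STATES " in
    *" $2>$1#$3 "*) echo pass ;;             # reply matches a request's state
    *)              echo block ;;
  esac
}
# VM1 (10.0.0.203) pings VM2 (10.16.1.101) with id 13; the hn7 rule creates a state
add_state 10.0.0.203 10.16.1.101 13
R1=$(check_reply 10.16.1.101 10.0.0.203 13)  # reply to the id-13 request
R2=$(check_reply 10.16.1.101 10.0.0.203 1)   # VM2's id-1 pings never created a state
echo "id 13 reply: $R1"   # -> pass
echo "id 1 reply:  $R2"   # -> block
```

This matches the captures: id-13 traffic flows both ways, while id-1 requests from the unprivileged side keep getting dropped despite the open id-13 states.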

I can't replicate the behavior. Maybe somebody else can? I can't spend more time on this.
Hardware:
DEC740


This is not an OPNsense issue.

It is an issue with the external network, misconfigured either on the external switch or in his virtualization.

It would probably be better if the OP went to a support group on how to set up VLANs on his smart switch, since he said he saw it on his bare-metal install too.




Quote from: newjohn on September 28, 2023, 12:55:44 AM
What is your take on this, please?
Please see the attached screenshot.

I explained my take extensively in a previous post:

- Go back to the drawing board
- Simplify your setup
- From there, prove your initial statement that the "Automatically generated rules" allow traffic that isn't expected to be allowed (ICMP or whatever).

So: a default install, 1 WAN, 1 LAN and 1 OPT interface, no VLANs, no manual config except for 3 interface IP configs. If you can prove your statement in this setup, many people are willing to look into your issue. It should take you less time than posting screenshots of state reset buttons.

As for me, I'm too old to look into virtualised infrastructures with VLAN trunk ports without a detailed low-level design and without knowing if basic network skills are in place.

Below is what I tested so far, and to be honest I am concerned about the results.

I realise some users have spent valuable time on this and I appreciate it; however, for the benefit of everyone who is using OPNsense, I believe we need to get to the bottom of this.

If my tests are not flawed somehow, and I don't see how that's possible as it's a simple ping test with everything on bare metal now, then we are looking at a bug or a bigger issue.

I know some will dismiss this without even looking into it, but I am concerned. Something is not right with OPNsense.

To remove any possibility that this is a VMware issue, I used two bare-metal Win11 PCs for testing and also moved the OPNsense to a bare-metal PC.

The current test setup:
PC-S1 - source PC - Win 11 (bare metal) - VLAN140
PC-S2 - source PC - Ubuntu (VM) - VLAN210
PC-D1 - destination PC - Win 11 (bare metal) - VLAN190
OPNsense (bare metal)

Test results:

VLAN140: disabled all the rules (NO rules on floating or on vlan140 - all disabled).
Test results:
Started pinging from PC-S1 - the ping did NOT work. However, as soon as I started pinging back from PC-D1, both source and destination could ping each other.

Then I thought maybe this is a Windows issue, so I tested with Ubuntu (VM) and got the same results.

Following that, I thought maybe some config on VLAN140 was causing the issue even though it's disabled. So I moved the test to VLAN210, which has no config at all. And again the results are the same.

Some of you requested more basic testing, but I already (I know, not the brightest move) switched to OPNsense on bare metal, thinking this was happening due to the firewall being virtual and that I wouldn't get the same problem on bare metal. So it's not easy to remove all the config again, as the household will probably not be happy. Therefore I did the testing while also trying to keep the home internet working.






Quote from: CJ on September 30, 2023, 02:32:46 PM
Are you still using LAN as the parent of all of the VLANs?

Yes.

However, I have come across this question a couple of times now. Is that not how we are supposed to do it?