Messages - surfrock66

#1
I've been on a network learning journey and have built a network with an L3 switch and multiple VLANs in my house.  Opnsense is acting as the firewall for the WAN connection.  The architecture: I have a 99 VLAN for network devices and 7 VLANs in the house, all of which use my L3 switch as their gateway.  All the VLANs with external access go through opnsense, which only has a LAN interface on my 99 (networking) VLAN, a WAN interface, and a wg0 interface.

Primarily, I tried to adapt this guide, though I think my separate L3 setup deviates heavily from it: https://www.zenarmor.com/docs/network-security-tutorials/how-to-setup-wireguard-on-opnsense

I want to create a WireGuard VPN into the house (I had one working in a prior iteration of my network and have restarted, wiping that out).  I'm having difficulty wrapping my head around the architecture of this.  Ultimately, Wireguard clients would come in, I assume, on their own VLAN/subnet (I'm designating this 6).  My opnsense box is connected to my L3 switch with a 2-port LACP trunk currently carrying VLANS 6 and 99. 

My L3 switch (a brocade icx-6610) has a virtual interface on VLAN6, as at one point I assumed this would be my gateway device but maybe that is not necessary?  Also, I'm assuming the wireguard network does NOT need DHCP, but it will need DNS (as I have both internal DNS resolution, and then upstream to a family-filter DNS provider for the kids) which is already on my LAN and easy enough to configure.  Internal communication would require wireguard clients to go through the L3 switch then to their destination, and my assumption is WAN traffic would go directly back out the WAN interface of opnsense (after LAN dns resolution).

Everything I've done to try to make this work has been unsuccessful, so I'm willing to start this part of the system from scratch.  I've set up the wireguard instance, I have a tunnel address, and my endpoints can successfully connect.  I have a successful handshake from my phone from the WAN, and I can see it in the Wireguard status.  Past this point I'm lost, and I think it's because I'm so turned around in my routes/rules that I need to rethink that part from scratch.

Per the above guide, I have a firewall rule passing all traffic from the wg0 interface net to any destination.

My connected client can ping 8.8.8.8 and can ping the opnsense box at the wg0 IP address, but CANNOT ping my LAN DNS server or any other LAN resources, so at this point it appears no traffic is being routed from the tunnel to the LAN VLANs at all.

My instinct is that I need a second interface on the 6 VLAN that connects back to the L3 switch?  At one point I had added a gateway called "LAN_GW_VLAN_6" on the wireguard interface but that broke things in a way that confused me and I just disabled it.
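
One thing that may be worth checking here is the return path from the L3 switch to a WireGuard client.  The sketch below is only an illustration under stated assumptions: the tunnel address 10.6.0.2 and the 10.6.0.0/16 subnet are hypothetical (the post does not give the tunnel subnet; this just follows the 10.<vlan>.0.0/16 scheme described in these posts), and `next_hop` is a made-up helper that does a longest-prefix lookup, not anything from opnsense or the switch.  It shows that if the switch has a virtual interface in the same subnet as the tunnel, it would treat tunnel clients as directly connected on VLAN 6 instead of sending replies back to opnsense at 10.99.1.40.

```python
import ipaddress

def next_hop(routes, dst):
    """Return the next hop for dst using longest-prefix match, or None."""
    addr = ipaddress.ip_address(dst)
    best = None
    for prefix, hop in routes:
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, hop)
    return best[1] if best else None

# ASSUMPTION: hypothetical tunnel client address, not taken from the post.
TUNNEL_CLIENT = "10.6.0.2"

# Simplified switch routing table WITH a virtual interface on VLAN 6.
SWITCH_WITH_VE6 = [
    ("10.2.0.0/16", "connected"),
    ("10.6.0.0/16", "connected"),   # ASSUMPTION: VLAN 6 uses 10.6.0.0/16
    ("10.99.0.0/16", "connected"),
    ("0.0.0.0/0", "10.99.1.40"),    # default route to opnsense (from the posts)
]

# Same table WITHOUT the VLAN 6 virtual interface.
SWITCH_WITHOUT_VE6 = [r for r in SWITCH_WITH_VE6 if r[0] != "10.6.0.0/16"]

with_ve6 = next_hop(SWITCH_WITH_VE6, TUNNEL_CLIENT)
without_ve6 = next_hop(SWITCH_WITHOUT_VE6, TUNNEL_CLIENT)

print("reply path with VE on VLAN 6:   ", with_ve6)     # connected
print("reply path without VE on VLAN 6:", without_ve6)  # 10.99.1.40
```

In this toy model, replies only head back to opnsense when the switch has no interface of its own in the tunnel subnet; whether that matches the real setup depends on the actual tunnel address.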

Any advice on what the interfaces, gateways, and routing/firewall rules would need to look like would be appreciated.  High level is ok, as I'm very much learning.
#2
The bigger issue is just that she does her tele-medicine at times when I usually want to tinker, so getting a time where I'm home and available and she's not working is rough :/
#3
Ok, good news, I re-imaged and after about an hour of tinkering it's working.  (My wife is a doctor who does tele-medicine from home so it was tricky to get a downtime, and even riskier if I couldn't get back to a working state; usually she works when the kids are in bed, and that's usually my window for these kinds of projects).  I still have my old config backup; I have a lot of firewall rules and services to put back in (I had redirects for Chromecasts trying to reach Google's DNS, sending them to my pihole instead; I had a zabbix client pointing to my zabbix server; I had wireguard working and want to see if I can restore the existing key exchanges; it was tied to my LDAP server; etc).  I really want to compare my old backup with a new one when this is done and see if I can figure out what was broken.  I want to document what I did because I found a bunch of people with similar questions that only had incomplete answers:

1) From the CLI, the WAN interface was DHCP, I set up the lagg between my 2 ports (lagg0), created a vlan 99 interface off of it (lagg0_vlan99) and made that the LAN interface with a static IP and no gateway.
2) I made a gateway for my 10.99.1.254 LAN gateway, had to assign it to the LAN interface when I made it.  It is not tagged as upstream.  One thing I noticed, WAN_GW is priority 255; it was 254 before.  Just a difference I noticed.
3) I made an alias for each of my VLANS that might need internet access
4) In Outbound NAT, I switched it to Hybrid and made rules to allow traffic through to each VLAN.
5) Under Firewall->Rules->LAN I created a pass rule for each VLAN (This will get tuned later)

With this, LAN clients access the WAN, after putting in a port forward WAN clients can access things on the LAN, the firewall can ping both LAN and WAN.
#4
I have the installer to re-install tonight after the family goes to bed, but I had a minute to try a configuration.  I have a static route for 10.0.0.0/8, and it has a gateway (because there's no way to create a static route without one unless I'm missing something).  The LAN interface has no gateway attached; it's set to auto-detect (the only 2 options are auto-detect and the LAN_GW).  The only active gateways are LAN_GW, WAN_GW (set as upstream) and WAN_GWv6.  I set that, as in the screenshots, and I get the following.  Can ping WAN from opnsense, can't ping LAN from opnsense.  Exit, can ping LAN from LAN, can't ping WAN from LAN.  I go back and tag the LAN_GW as upstream, internet comes back on.

surfrock66@sr66-opnsense-1:~ $ ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 56 data bytes
64 bytes from 8.8.8.8: icmp_seq=0 ttl=115 time=28.971 ms
^C
--- 8.8.8.8 ping statistics ---
1 packets transmitted, 1 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 28.971/28.971/28.971/0.000 ms
surfrock66@sr66-opnsense-1:~ $ ping -S 10.99.1.40 10.2.2.213
PING 10.2.2.213 (10.2.2.213) from 10.99.1.40: 56 data bytes

^C
--- 10.2.2.213 ping statistics ---
5 packets transmitted, 0 packets received, 100.0% packet loss
surfrock66@sr66-opnsense-1:~ $ exit
Connection to 10.99.1.40 closed.
surfrock66@sr66-thelio:~/.scripts$ ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.


^C
--- 8.8.8.8 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2055ms

surfrock66@sr66-thelio:~/.scripts$ ping 10.2.2.213
PING 10.2.2.213 (10.2.2.213) 56(84) bytes of data.
64 bytes from 10.2.2.213: icmp_seq=1 ttl=63 time=0.462 ms
64 bytes from 10.2.2.213: icmp_seq=2 ttl=63 time=0.339 ms
^C
--- 10.2.2.213 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1020ms
rtt min/avg/max/mdev = 0.339/0.400/0.462/0.061 ms


I worry that if I start from scratch, I am gonna end up in the exact same place since I got here and don't understand how.
#5
I might be missing something here, but don't I need to have a gateway defined in order to assign a static route to it?  I have the LAN_GW in there, and then in static routes, when I define it, it points to the subnet 10.0.0.0/8 and then I select LAN_GW from the drop down list, right?

So, if I take "upstream gateway" off the LAN_GW, that's fine, my LAN loses connection to the internet.  What then confuses me is if I go to the LAN interface and scroll to the bottom, I have a static IPv4, then below that a drop down to choose an ipv4 default gateway.  It has "LAN_GW" selected, and the only other option is "Auto-Detect."  If I try to choose "Auto-Detect" I get an error saying something like it "conflicts with a static route."  I think I have the order I did that correct, it was from memory last night so I don't have the exact orange/red error popup and I'm going to try again tonight.

It seems I can't disable the LAN_GW without losing the static route, and I can't detach it from the interface?  I'm leaning toward starting from scratch but I want to keep iterating until I can get a downtime, in case it's still solvable.  If I'm misunderstanding something though I am totally open to that.
#6
Yes but when I disabled that, I lost all WAN access from LAN clients, so I re-enabled it; sorry the order of things is a bit sloppy as I have to rush and minimize the downtime for now.
#7
I have a diagram here, and can provide more detail as needed.  At the bottom you will see why I have /16; in truth, it's from back when I only had a single subnet, and I made it /16 so I could use the third octet to form DHCP scopes.  That's how the network worked in my head and I knew the IP scheme, so when it came time to add VLANS much later, I just made those the 2nd octet, and that's how we are here today.  Maybe one day I'll re-do that, but it's not in scope right now:

https://nextcloud.surfrock66.com/s/txnZdzxHaiA5t65

I'm trying to get a time the family will tolerate an extended outage; I have backups but these things go however they go.  The one big thing worrying me is, I did have a working wireguard setup before, and I'd love to preserve that (all my key pairs) and my port forward rules (I have a lot of weird rules set up).  I don't see a path to wiping this and starting over that doesn't involve doing all that from scratch, huh.
#8
I wasn't able to do a full cleanroom test due to family needing internet and me not being able to take a downtime, however I had some time for a quick round of tests and think I have some interesting information.

I have the static route in, so I untagged the "LAN_GW" as an upstream gateway, and tagged "WAN_GW" as an upstream gateway.  No change in the ability for opnsense to ping anything (it can ping WAN, not LAN), however all my LAN clients lost internet.  In this state, from opnsense, I ran a "ping -S 10.99.1.40 10.2.2.213" (that's my DNS server).  This failed, but interestingly enough I was looking at the live logs, and even though the interface is LAN, the source IP was the WAN IP.  I'm very confused; I've confirmed the LAN and WAN interfaces are correct and they have correctly assigned default gateways.  See the attached picture.

Would this make sense: is opnsense somehow swapping the LAN and WAN?  I'm blown away that this could be the case; that being said, it would explain why tagging the LAN_GW as upstream allows traffic out.
#9
Ok great, I'll try this when I get home tonight.

On the trunk going to the opnsense box from the L3 switch, I just have 99 (network vlan) and 6 (doing experiments with wireguard).  All the other real vlans (2, 3, 4, 5, 7, 10, etc) are NOT on that trunk, and must go through the L3 device.
#10
Just a couple of questions, for my understanding.

1) You said the interface should be the LACP Trunk; I had made a vlan interface off of that.  Should the LAN be the LACP LAGG (lagg0) or the vlan interface (lagg0_vlan99)?  I had put the latter, just confirming.

2) When the CLI asks if it needs a gateway when defining the LAN IP, it says something like "probably yes for WAN, probably no for LAN" but in my case since the LAN requires a gateway, I put yes and put in the 10.99.1.254 address.  If I don't do that, I can't get to the web interface after setting it up.  That seems to check the "upstream gateway" box for that defined gateway, hence my confusion over that setting.
#11
I've been going through a network transition as part of a learning journey and am having an issue I can't seem to solve.  High level, I have a 10.*.*.* network with a bunch of /16 VLANS and I just put in a new Layer3 switch that acts as the gateway for each VLAN.  The /16 is a legacy thing from a previous configuration, and 10.*.1.254 is the gateway on each VLAN.  The L3 switch has a default 0.0.0.0/0 route pointing to the opnsense box, which is 10.99.1.40.  99 is my networking device vlan.  Opnsense is 24.1.2 running on a standalone box with 4 NICS, one going to my comcast gateway and 2 others are a LACP LAGG to the L3 switch (a trunk carrying VLANS 99 and 6, 6 being my wireguard network which is not currently set up).  I have a DHCP and DNS server on the LAN, on the 2 VLAN, and there is an IP helper on each vlan for it.  Everything there is working fine.

Each of my other vlans has been defined as an alias in opnsense, and I have a NAT rule permitting traffic.  At this time, all clients on the LAN have internet access, and from the WAN my port forward rules are working.  Almost everything appears to be working.

...With the exception of the firewall itself.  It can ping the WAN, but cannot ping anything on the LAN on any VLAN (including the 99 which is the VLAN it's on, or other VLANS).  Actually, I can ssh into the box from a client on the 4 vlan, get in fine, then can't ping back to the client I'm connected from.  One additional thing, when I assign IP addresses I have to set a default gateway for the LAN network and tag it as an upstream gateway...this didn't make sense to me, but if I didn't do that all LAN clients lose internet access.  That LAN_GW gateway is 255 priority but is tagged as upstream, where the WAN_GW is priority 254.  I was thinking it was a static route thing, so I defined static routes for all my VLANS to go through the LAN_GW gateway but that didn't change anything.

I've changed so many things and done so many experiments that I'm a bit lost, and am looking for some guidance on how the gateways, static routes, and rules SHOULD be configured in a setup like mine.  If opnsense were doing the L3 routing, I think I'd have to add all vlans to the trunk and make a vlan interface on each, but I don't think that's the case here?

I am very much learning right now, but I have this sense that the firewall is not seeing my LAN networks as LAN, and is routing connections to the WAN interface.  I've tried traceroute to the LAN and it times out.  I've tried "ping -S 10.99.1.40 10.2.2.213" and it times out.  The firewall rules are mostly default, save for some things I had to do to get my chromecasts to point to pihole.
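
To make the "firewall is not seeing my LAN networks as LAN" idea concrete, here is a minimal sketch of a longest-prefix route lookup, which is how routers generally pick a next hop.  It is not opnsense code: `lookup` and `vlan_subnet` are hypothetical helpers, the route tables are simplified guesses at what the firewall might hold, and the VLAN list is just the IDs mentioned in these posts (6 is left out since it is only used for WireGuard experiments).  The addresses 10.99.1.40, 10.99.1.254 and 10.2.2.213 come from the posts.

```python
import ipaddress

LAN_IF = ipaddress.ip_interface("10.99.1.40/16")  # opnsense LAN address (from the post)
LAN_GW = "10.99.1.254"                            # L3 switch on VLAN 99 (from the post)
VLAN_IDS = [2, 3, 4, 5, 7, 10, 99]                # VLANs named in these posts

def vlan_subnet(vlan_id):
    """Subnet for a VLAN under the 10.<vlan>.0.0/16 scheme described in the post."""
    return ipaddress.ip_network(f"10.{vlan_id}.0.0/16")

def lookup(routes, dst):
    """Longest-prefix match over (network, next_hop) pairs; None if no route."""
    addr = ipaddress.ip_address(dst)
    matches = [(net, hop) for net, hop in routes if addr in net]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]

# VLANs that are NOT on the firewall's own LAN subnet need a route via the switch.
remote_vlans = [v for v in VLAN_IDS if vlan_subnet(v) != LAN_IF.network]

# ASSUMPTION: simplified tables; "WAN_GW" stands in for the ISP-provided gateway.
without_static = [
    (LAN_IF.network, "on-link"),
    (ipaddress.ip_network("0.0.0.0/0"), "WAN_GW"),
]
with_static = without_static + [(ipaddress.ip_network("10.0.0.0/8"), LAN_GW)]

print(remote_vlans)                          # [2, 3, 4, 5, 7, 10]
print(lookup(without_static, "10.2.2.213"))  # WAN_GW
print(lookup(with_static, "10.2.2.213"))     # 10.99.1.254
print(lookup(with_static, "10.99.1.254"))    # on-link
```

In this model, a destination like 10.2.2.213 only goes to the L3 switch when a route covering it points at 10.99.1.254; otherwise it falls through to the default route.  Whether the real box behaves this way depends on what its routing table actually contains.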
#12
2 more pics of configs
#13
I've provided some more screenshots to help with the config; I'm not sure if it's helpful or not.  My L3 gateway is the router and its default route points to the opnsense firewall, which has 1 interface internal and 1 to comcast, and I'm sure I messed something up.
#14
The good news is that I didn't have a NAT rule in; I changed it to hybrid and added an outbound manual rule on the LAN interface (from information in this thread https://forum.opnsense.org/index.php?topic=18889.0 ).  Now I can access the internet, which is great!

Several things are NOT working though.

1) The firewall itself is not accessing the WAN, for example, I cannot check for updates.  I do have an internal DNS and DHCP server which are working and Opnsense is using this DNS server for resolution.
2) My port forward rules aren't working at all, from the outside I appear to be unable to get to anything on the inside.
3) I'm still on the non-lacp interface; I was never able to figure out why vlan traffic wasn't passing through the LACP interface.

I think all three of those are solvable though; any advice is appreciated but it's less urgent as the family has internet at the moment.
#15
Ok, for now I took LACP out of the equation and it appears to be working, I have an interface on an unused port directly between my opnsense firewall and my L3 switch and all is well to get to the firewall on the LAN via vlan 99.

That being said, I have one final issue...I'm not passing traffic to WAN, and I think I've been looking at it too long to see the issue.  I have 2 interfaces with 2 gateways; the WAN interface has a gateway from DHCP from comcast and is getting auto-created with weight 254, and my LAN interface gets auto-detected with a gateway on the L3 switch on the 99 VLAN (as would be expected) but with weight 255, and is being tagged as (active) in the interface.

I haven't messed with routing rules on the LAN, but it appears I'm getting into a routing loop:

domainname@prefix-thelio:~/.scripts$ ping 10.99.1.40
PING 10.99.1.40 (10.99.1.40) 56(84) bytes of data.
64 bytes from 10.99.1.40: icmp_seq=1 ttl=63 time=0.166 ms
64 bytes from 10.99.1.40: icmp_seq=2 ttl=63 time=0.103 ms
^C
--- 10.99.1.40 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1005ms
rtt min/avg/max/mdev = 0.103/0.134/0.166/0.031 ms
domainname@prefix-thelio:~/.scripts$ ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
From 10.99.1.40 icmp_seq=1 Time to live exceeded
From 10.99.1.40 icmp_seq=2 Time to live exceeded
^C
--- 8.8.8.8 ping statistics ---
2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 1000ms

domainname@prefix-thelio:~/.scripts$ traceroute 8.8.8.8
traceroute to 8.8.8.8 (8.8.8.8), 30 hops max, 60 byte packets
1  prefix-prosafe-00.subdomain.domainname.com (10.4.1.254)  0.385 ms  1.415 ms  0.664 ms
2  prefix-opnsense-1.subdomain.domainname.com (10.99.1.40)  0.302 ms  0.285 ms  0.268 ms
3  10.99.1.254 (10.99.1.254)  0.754 ms  0.914 ms  1.078 ms
4  prefix-opnsense-1.subdomain.domainname.com (10.99.1.40)  1.250 ms  1.231 ms  1.210 ms
5  * * *
6  prefix-opnsense-1.subdomain.domainname.com (10.99.1.40)  1.126 ms  0.324 ms  0.424 ms
7  * * 10.99.1.254 (10.99.1.254)  0.543 ms
8  prefix-opnsense-1.subdomain.domainname.com (10.99.1.40)  0.433 ms  0.470 ms  0.458 ms
9  10.99.1.254 (10.99.1.254)  0.692 ms  0.831 ms *
10  prefix-opnsense-1.subdomain.domainname.com (10.99.1.40)  0.560 ms  0.596 ms  0.714 ms
11  * * *
12  prefix-opnsense-1.subdomain.domainname.com (10.99.1.40)  0.814 ms  0.645 ms  0.778 ms
13  * * *
14  prefix-opnsense-1.subdomain.domainname.com (10.99.1.40)  0.893 ms  0.879 ms  0.819 ms
15  * * *
16  prefix-opnsense-1.subdomain.domainname.com (10.99.1.40)  1.104 ms  0.966 ms  0.994 ms
17  * 10.99.1.254 (10.99.1.254)  0.995 ms  1.168 ms
18  prefix-opnsense-1.subdomain.domainname.com (10.99.1.40)  0.812 ms  0.846 ms  0.874 ms
19  10.99.1.254 (10.99.1.254)  1.256 ms * *
20  prefix-opnsense-1.subdomain.domainname.com (10.99.1.40)  0.988 ms  0.867 ms  0.918 ms
21  * * *
22  prefix-opnsense-1.subdomain.domainname.com (10.99.1.40)  1.250 ms  1.360 ms  1.320 ms
23  * * *
24  prefix-opnsense-1.subdomain.domainname.com (10.99.1.40)  1.222 ms  1.842 ms  1.292 ms
25  * * *
26  prefix-opnsense-1.subdomain.domainname.com (10.99.1.40)  1.430 ms  1.460 ms  2.181 ms
27  * * *
28  prefix-opnsense-1.subdomain.domainname.com (10.99.1.40)  1.817 ms  1.835 ms  1.717 ms
29  * * *
30  prefix-opnsense-1.subdomain.domainname.com (10.99.1.40)  1.827 ms  1.811 ms  1.862 ms


I've been looking at this for so long I'm not seeing what to do.  In System -> Routes -> Status the default route is my LAN gateway on the 99 network, so it makes sense that the L3 switch sends traffic to the firewall, then that is sending traffic to the L3 switch, and it's a loop. 
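
The loop described above can be reproduced with a toy model.  This is only a sketch under assumptions: the device labels ("l3switch", "opnsense", "wan_gw") and the `trace` helper are made up for illustration, and the tables are simplified versions of what the posts describe (the switch defaulting to opnsense, and opnsense defaulting to the LAN gateway on VLAN 99).  The "fixed" table is one plausible arrangement, not a confirmed solution.

```python
import ipaddress

def lookup(table, dst):
    """Longest-prefix match over (prefix, next_hop) pairs; None if no route."""
    addr = ipaddress.ip_address(dst)
    best = None
    for prefix, hop in table:
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, hop)
    return best[1] if best else None

def trace(tables, start, dst, max_hops=8):
    """Follow next hops device by device until delivery, exit, or hop limit."""
    path = [start]
    node = start
    for _ in range(max_hops):
        hop = lookup(tables[node], dst)
        if hop is None:
            path.append("unreachable")
            return path
        if hop == "connected":
            path.append("delivered")
            return path
        path.append(hop)
        if hop not in tables:      # left the modeled network (e.g. the ISP)
            return path
        node = hop
    path.append("ttl exceeded")
    return path

SWITCH = [
    ("10.4.0.0/16", "connected"),
    ("10.99.0.0/16", "connected"),
    ("0.0.0.0/0", "opnsense"),     # switch default route points at 10.99.1.40
]

# Default route on opnsense points back at the switch (10.99.1.254).
BROKEN = {
    "l3switch": SWITCH,
    "opnsense": [("10.99.0.0/16", "connected"), ("0.0.0.0/0", "l3switch")],
}

# ASSUMPTION: default via the WAN gateway, internal 10.0.0.0/8 via the switch.
FIXED = {
    "l3switch": SWITCH,
    "opnsense": [
        ("10.99.0.0/16", "connected"),
        ("10.0.0.0/8", "l3switch"),
        ("0.0.0.0/0", "wan_gw"),
    ],
}

broken_path = trace(BROKEN, "l3switch", "8.8.8.8", max_hops=6)
fixed_path = trace(FIXED, "l3switch", "8.8.8.8")
return_path = trace(FIXED, "opnsense", "10.4.1.1")

print(broken_path)  # alternates l3switch/opnsense, ends with 'ttl exceeded'
print(fixed_path)   # ['l3switch', 'opnsense', 'wan_gw']
print(return_path)  # ['opnsense', 'l3switch', 'delivered']
```

The broken table bounces the packet between the two devices until the hop limit runs out, which is the same shape as the traceroute output; the second table sends it out once and still reaches internal subnets through the switch.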

In the Firewall rules, my WAN just has port forward rules which used to work, and the LAN just has the rules in the attached photos (and some chromecast internal dns rules which aren't in play)

I'm stuck, and I've been looking at this for long enough that I'm not seeing clearly.  Any advice is appreciated.