Messages - zemsten

#1
This is a novel idea! I just got it set up and it hasn't broken anything, so I'll rock it for a while and see what happens. Thanks much, I appreciate all you do around here!
#2
Sorry, I definitely should have mentioned that in my initial post as well. I am using the kmod implementation. That slipped my mind as I've been using it basically the entire time I've been using wireguard.  8)
#3
I'm having a bit of trouble setting up two wireguard client connections using two different WAN interfaces.

I have WAN1 and WAN2, two independent connections to the internet. WAN2 generally has higher bandwidth and is the preferred connection in my gateway group for WAN_FAILOVER.

I have two wireguard clients configured, WG_WAN1 and WG_WAN2. These connect to two separate endpoints. I want WG_WAN1 to only connect via WAN1 and WG_WAN2 to only connect via WAN2. So far I've achieved this by adding static routes for their endpoint IPs, pinning each endpoint to the WAN gateway I want that traffic to use.
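To illustrate, the routes boil down to something like the following (the endpoint and gateway addresses here are placeholders, not my real ones, and in practice I add them through System > Routes rather than from the shell):

# reach WG_WAN1's endpoint only via WAN1's gateway
route add -host 198.51.100.10 192.0.2.1
# reach WG_WAN2's endpoint only via WAN2's gateway
route add -host 203.0.113.20 192.0.2.129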

Now normally this works great and everything functions as expected. The trouble I run into is when WAN2 goes down for any appreciable time and things fail over to WAN1. Initially I see WG_WAN2 go down as expected, but if WAN2 stays down for a while, eventually WG_WAN2 will come back up, routed through WAN1. This is the part that I do not want to happen.

I do have default gateway switching turned on in the firewall, as I want traffic originating from the firewall itself to survive a single WAN failure (mainly for DNS). Everything else is policy routed through my gateway groups and works great. I believe that a static route should take precedence over discovered routes, but I may be wrong there.

I should also add that I'm running these wireguard clients with their own assigned interfaces, if that wasn't obvious from context.

Am I missing a crucial element in how to bind a WG client to a particular WAN interface in a failover setup?

#4
Well, no progress on my front. My ISP is an MVNO, so it's also possible that some traffic is getting blocked/filtered, although that seems unlikely based on previous success.

I need stability more than I need wireguard, so unfortunately I just decided to ditch the setup in favor of OpenVPN...

I really hope this piques the interest of someone chasing a similar problem, though. I'd love to know the solution.
#5
Quote from: zemsten on May 29, 2021, 04:09:28 AM
I do, it was 30 seconds. I tried changing to 5, and I see the same thing unfortunately.

I'll also mention that at this point I have unset the keepalive, and it still dies.

I'm back on the kernel driver, as I saw mention somewhere of setting a kernel tunable for debugging, net.wg.debug, but I don't seem to have that sysctl OID, even after verifying the kernel has loaded if_wg.ko.
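For what it's worth, this is roughly how I'm checking from the shell (nothing fancy):

kldstat | grep if_wg         # the module shows up, so the kernel driver is loaded
sysctl -a | grep 'net\.wg'   # nothing comes back, i.e. no net.wg.debug OID for me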

This problem is driving me nuts. Nothing gets logged. I'm on the verge of switching back to OpenVPN, which would really hurt in terms of overhead.
#6
I do, it was 30 seconds. I tried changing to 5, and I see the same thing unfortunately.
#7
Alright, let me preface this with the fact that I've had a wireguard tunnel up and working to NAT my traffic outbound for a long while now. All of a sudden it has become very inconsistent, with no observable changes on my end. I'm on the latest version of OPNsense.

I have a fairly complex network by homelab standards. There are several VLANs on my parent LAN interface, as well as a few separate hardware interfaces, all talking to an old Protectli Vault running OPNsense. My WAN is a single interface feeding upstream to a separate cellular modem running OpenWRT. Unfortunately there's an extra NAT layer due to that, but that's a different discussion...
I have a wireguard client set up with Mullvad. The config has always worked great. The only thing I changed after setting it up initially was disabling route pulls and configuring a manual gateway, so it wouldn't shove itself into the firewall's routing table as a default route.
I have policy routes set up so that most of my traffic ends up NATing out the wireguard tunnel, with the exception of one entire VLAN and a couple of select hosts elsewhere that just NAT out the modem's gateway. Notably, the firewall itself does not NAT out the wireguard tunnel.
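Conceptually the policy routing amounts to something like the pf rule below; the subnet and tunnel gateway are made up for illustration, and in reality these are just normal GUI firewall rules with the WireGuard gateway selected:

# force this VLAN's outbound traffic through the wireguard interface instead of the default route
pass in quick on igb1 route-to (wg0 10.64.0.1) inet from 192.168.20.0/24 to any keep state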

This has always worked until recently.

I'm now having problems where the wireguard gateway will jump to 100% packet loss seemingly randomly. The tunnel dies, and so does the connectivity for those policy routed hosts. If I deactivate and reactivate wireguard, either in the web GUI or with a `wg-quick down wg0 && wg-quick up wg0`, it never gets another handshake. I have to reboot the firewall entirely to get wireguard to come back. I have my endpoint config pointing to an IP address rather than a hostname, so I know it's not DNS. I can still talk to the internet and even that wireguard endpoint specifically from the firewall's CLI, so I know reaching it isn't a problem. Why does wireguard just die though?
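For anyone wondering what I mean by "dies": when it happens, these are roughly the things I check from the firewall shell (assuming the interface is wg0 here):

wg show wg0 latest-handshakes    # see whether the handshake timestamp ever advances again
wg show wg0 transfer             # see whether any bytes are still being received
ping -c 3 <Mullvad endpoint IP>  # the endpoint itself still answers from the firewall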

A bit more background, although I don't believe it's too important. I did switch to the wireguard kernel module when it became available. I know that it was for testing and potentially unstable. It worked great for weeks, maybe even over a month. When these failures started happening, I uninstalled the kernel module and went back to wireguard-go, just to be in a "fully supported" state so nothing could be blamed on beta testing. The exact same thing happens.

I've gone through what feels like umpteen logs, and I cannot find anything related to the cause. I see the events logged in various places when the wireguard tunnel goes down, but I don't see anything right before it that would explain what's causing it.

Now I am here, frustrated and desperately asking for the help of this community, as this is almost completely breaking the usability of my network at random intervals.
#8
20.7 Legacy Series / Re: 20.7.4 and netmap -- em drivers
November 04, 2020, 05:03:29 PM
Well, I did the testing and nothing seems to have changed unfortunately.

Here's the pertinent excerpt from my loader.conf, for a snapshot of my tunables.

hw.ixl.enable_head_writeback="0"
net.enc.in.ipsec_bpf_mask="2"
net.enc.in.ipsec_filter_mask="2"
net.enc.out.ipsec_bpf_mask="1"
net.enc.out.ipsec_filter_mask="1"
net.inet.icmp.reply_from_interface="1"
net.local.dgram.maxdgram="8192"
vfs.read_max="128"
net.inet.ip.portrange.first="1024"
net.inet.tcp.blackhole="2"
net.inet.udp.blackhole="1"
net.inet.ip.random_id="1"
net.inet.ip.sourceroute="0"
net.inet.ip.accept_sourceroute="0"
net.inet.icmp.log_redirect="0"
net.inet.tcp.drop_synfin="1"
net.inet6.ip6.redirect="1"
net.inet6.ip6.use_tempaddr="0"
net.inet6.ip6.prefer_tempaddr="0"
net.inet.tcp.syncookies="1"
net.inet.tcp.recvspace="65536"
net.inet.tcp.sendspace="65536"
net.inet.tcp.delayed_ack="0"
net.inet.udp.maxdgram="57344"
net.link.bridge.pfil_onlyip="0"
net.link.bridge.pfil_local_phys="0"
net.link.bridge.pfil_member="1"
net.link.bridge.pfil_bridge="0"
net.link.tap.user_open="1"
kern.randompid="347"
net.inet.ip.intr_queue_maxlen="1000"
hw.syscons.kbd_reboot="0"
net.inet.tcp.log_debug="0"
net.inet.icmp.icmplim="0"
net.inet.tcp.tso="0"
net.inet.udp.checksum="1"
kern.ipc.maxsockbuf="4262144"
vm.pmap.pti="0"
hw.ibrs_disable="0"
security.bsd.see_other_gids="0"
security.bsd.see_other_uids="0"
net.inet.ip.redirect="0"
net.inet.icmp.drop_redirect="1"
net.inet.tcp.hostcache.cachelimit="0"
net.inet.tcp.soreceive_stream="1"
net.isr.maxthreads="-1"
net.isr.bindthreads="1"
net.pf.source_nodes_hashsize="1048576"
cc_cubic_load="YES"
net.inet.tcp.cc.algorithm="cubic"
net.link.ifqmaxlen="512"
net.inet.tcp.recvbuf_inc="65536"
net.inet.tcp.recvbuf_max="4194304"
net.inet.tcp.sendbuf_inc="65536"
net.inet.tcp.sendbuf_max="4194304"
net.inet.tcp.mssdflt="1460"
net.inet.tcp.minmss="536"
net.inet.tcp.abc_l_var="44"
net.inet.tcp.initcwnd_segments="44"
net.inet.tcp.rfc6675_pipe="1"
dev.em.0.fc="0"
dev.em.1.fc="0"
dev.em.2.fc="0"
dev.em.3.fc="0"
net.bpf.zerocopy_enable="1"


As I mentioned before, everything was working as expected before, so I'm still not sure what changed....
#9
20.7 Legacy Series / Re: 20.7.4 and netmap -- em drivers
November 02, 2020, 09:27:32 PM
I have some more testing to do when I get home. I added a bunch of sysctl kernel parameters yesterday while working through optimizations for my long fat pipe (4G internet). I want to make sure that I'm not causing any more issues before I submit any data.

Looking at the forum today though, it seems there are other issues with IPS floating around, likely unrelated but I'll check into those too.
#10
Correct. I disabled all offloading when I first set up IPS with Suricata on 20.7.3-netmap. I just verified that those settings are the same as well.
#11
Quote from: mb on October 26, 2020, 11:22:20 PM
Hi @zemsten, I'm not quite sure if I understand completely. Do you still have the problem with Suricata in IPS mode?

Yes, I am still running into the same issue as described in my first post. Right now I just have Suricata in IDS mode on my VPN_WAN and WAN interfaces, simply monitoring traffic rather than intercepting it. As soon as I turn on IPS mode, I get huge packet loss and the VPN gateway goes down altogether. I didn't have this problem with the 20.7.3-netmap kernel.
#12
Logging is working fine. Don't know what the hiccup was before. That's fine though. I can successfully run Suricata in IDS mode, so it very much feels like a driver problem to me.
#13
Excellent, glad we're on the same page about the netmap stuff!

Nothing weird in the Suricata logs, which is the first thing I checked. It even starts up with the netmap:em1^ prefix, just like it did on the 20.7.3-netmap kernel, and everything appears totally normal after that. The next log entry is when I shut it down.

I hadn't checked system.log, but when I just went to do so, the last entry I have is from Oct 15, which raises another question: why isn't current logging happening? I do ship my logs off to an external syslog server and parse them with Graylog, but I don't see them there either. I am getting current logs from the usual components (dhcpd, ntpdate, filterlog mostly), which is what I expect. Hmm.

I did go and reset the log files via the web GUI; it generated a new one alright, and I see the dhcp daemon reload, but that's it so far.
#14
20.7 Legacy Series / 20.7.4 and netmap -- em drivers
October 23, 2020, 05:15:29 PM
Update was successful but left me with some questions about the netmap features. I was using the 20.7.3-netmap kernel in the last release and it was working excellently. I have a 4-port NIC running the em driver. With the netmap kernel I had suricata running in IPS mode on my WAN and VPN client interfaces, no problems. I also had Sensei running on my parent LAN for a few VLANs in native netmap mode, no problems.

Maybe I'm interpreting your release notes wrong, but it sounded like the netmap kernel was rolled into the default for 20.7.4, and what I'm experiencing leads me to believe it wasn't. I've got Sensei working okay, although I can't comment on consistency yet. Whenever I enable Suricata, though, it starts dropping packets like mad. I looked for a 20.7.4-netmap kernel but don't see one. Am I missing something?
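For reference, this is the sort of command that was used to pull in the test kernel last time around (going from memory, so the exact syntax may be slightly off), followed by a reboot:

opnsense-update -kr 20.7.3-netmap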