Messages - talopensense

#1
Quote from: franco on August 20, 2021, 01:38:30 PM
I'm assuming wildcard address port 123. So

# pgrep ntpd

yields a PID, but

# cat /var/run/ntpd.pid

is not the same?


Cheers,
Franco

This is correct, and I am reporting the exact same behaviour: I am on 21.7.1 as well and noticed this issue recently.


root@opn1:~ # pgrep ntpd
70102
root@opn1:~ # cat /var/run/ntpd.pid
64059
root@opn1:


I killed ntpd and removed the ntpd.pid file from /var/run, which allowed the NTP service to be restarted from the UI.
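For reference, a rough sketch of the check I did by hand (the pgrep/kill/rm steps are exactly what I ran; wrapping them in a script is just for illustration):

#!/bin/sh
# Compare the running ntpd PID with the PID file and, if they disagree,
# stop ntpd and remove the stale file so the service can be started again
# (I did the actual restart from the OPNsense UI afterwards).
running=$(pgrep ntpd)
recorded=$(cat /var/run/ntpd.pid 2>/dev/null)

if [ -n "$running" ] && [ "$running" != "$recorded" ]; then
    echo "stale PID file: running=$running recorded=$recorded"
    kill $running
    rm -f /var/run/ntpd.pid
fi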

Is there any indication of what could have caused that issue?
#2
Adding some more context, hopefully from as close to FreeBSD as it gets:
https://lists.freebsd.org/pipermail/freebsd-hackers/2021-March/057127.html

Quote


Dear FreeBSD Community,

In light of the recent commentary on FreeBSD's development practices,
members of the Core team would like to issue the following statement.

Code quality is an essential FreeBSD value: From the 1980s when work on
BSD became the de facto standard TCP/IP stack, to our more recent work
around performance scalability on multicore, attention to detail is
critical. The recent concerns regarding the WireGuard patches remind us
that our development processes must always continue to mature. While the
project has historically, and aggressively, led the way in adopting new
development methodologies - from public version control to being early
adopters of static analysis tools such as Coverity - these events have
brought to light a real gap that needs to be addressed.

The high stability and quality of FreeBSD is a testimony to the
experience of our developers. As in any open source project, we rely on
developers to exercise good judgement in seeking review and committing
new features, and to follow the guidelines laid out in the Committer's
Guide. We make heavy use of public code review, and FreeBSD developers
spend a significant amount of time improving each others' contributions.

We were excited to provide a kernel WireGuard implementation in FreeBSD
13.0. Before the if_wg(4) rewrite was committed, several FreeBSD
developers proactively worked on fixing bugs and writing tests and
documentation for the original implementation. In other words, we had
spent time during the release's Q/A period looking for problems, and
that unfortunately culminated in if_wg(4) being removed from 13.0 during
the release cycle. As FreeBSD developers, it is incumbent on each of us
to support each other's work by providing code review and helping test
and fix the code. This incident highlights the need to do this work more
proactively, and to maintain a robust, multi-layered development process
that can catch problems as they fall through the cracks.

Over the next month the FreeBSD Core Team will lead a discussion on
appropriate pre-commit testing, static analysis, code review, and
integration policies to avoid a repeat of this situation and to continue
improving FreeBSD's code quality. We know there will be challenges in
key areas, such as third-party device drivers, and components of the
system where fewer developers have sufficient expertise. The FreeBSD
Foundation has full-time staff members participating in significant code
review today, and is committed to supporting the needs identified by the
Core team and the developer community for this effort.

We look forward to input from the community on our proposals for updated
policies as we move forward, maintaining high code quality as a core
value for FreeBSD.

Thanks,
-Mark, with core@ hat on

#3
Hi all,

I have wireguard-go deployed on multiple OPNsense instances running 21.1.3 and 21.1.3_3.
When there is zero packet loss, there is no issue. When a small amount of packet loss appears, it seems to affect WG stability disproportionately.
Is there a way to understand why my WireGuard connections show 50% packet loss when a ping to the endpoint destination of that tunnel shows less than 2%?
The problem I experience is that forwarding over the WG tunnels literally stops for multiple seconds, if not minutes, with no clear reason, while pings to the endpoint keep going.
I am not too sure where to look for more details.

I have checked the firewall logs. I see this same message each time the WG tunnels stop passing traffic:

pflog0: promiscuous mode enabled
pflog0: promiscuous mode disabled


I have checked the interface counters for any major errors, but everything is clear (i.e., no errors):

Name    Mtu Network       Address              Ipkts Ierrs Idrop    Opkts Oerrs  Coll
igb0   1500 <Link#1>      00:90:0b:44:6d:02  2062082     0     0   638152     0     0
igb0      - fe80::%igb0/6 fe80::290:bff:fe4        0     -     -        1     -     -
igb0      - 174.112.148.0 174.112.148.15        1692     -     -     5616     -     -
igb1   1500 <Link#2>      00:90:0b:44:6d:03   654939     0     0  1635305     0     0
igb1      - 172.27.0.0/22 172.27.0.252         28938     -     -    35430     -     -
igb1      - fe80::%igb1/6 fe80::290:bff:fe4      110     -     -      112     -     -
igb1      - 172.27.0.0/22 172.27.0.254           316     -     -        0     -     -
igb2   1500 <Link#3>      00:90:0b:44:6d:04    10003     0     0    11472     0     0
igb2      - fe80::%igb2/6 fe80::290:bff:fe4        0     -     -        1     -     -
igb2      - 192.168.1.0/2 192.168.1.3           8559     -     -     3241     -     -
igb3*  1500 <Link#4>      00:90:0b:44:6d:05        0     0     0        0     0     0
igb4*  1500 <Link#5>      00:90:0b:44:6d:06        0     0     0        0     0     0
igb5*  1500 <Link#6>      00:90:0b:44:6d:07        0     0     0        0     0     0
enc0*  1536 <Link#7>      enc0                     0     0     0        0     0     0
lo0   16384 <Link#8>      lo0                  16749     0     0    16749     0     0
lo0       - ::1/128       ::1                      0     -     -        0     -     -
lo0       - fe80::%lo0/64 fe80::1%lo0              0     -     -        0     -     -
lo0       - 127.0.0.0/8   127.0.0.1            16746     -     -    16749     -     -
pflog 33160 <Link#9>      pflog0                   0     0     0    60670     0     0
pfsyn  1500 <Link#10>     pfsync0                  0     0     0        0     0     0
ovpnc  1500 <Link#11>     ovpnc1                9272     0     0    12888     0     0
ovpnc     - fe80::%ovpnc1 fe80::290:bff:fe4        0     -     -        1     -     -
ovpnc     - 10.8.0.0/24   10.8.0.8               693     -     -        0     -     -
wg0    1420 <Link#12>     wg0                    727     0     0     4073     0     0
wg0       - 172.27.252.0/ 172.27.252.1          2244     -     -     2647     -     -
wg1    1420 <Link#13>     wg1                    409     0     0      933     0     0
wg1       - 172.27.77.0/2 172.27.77.254            0     -     -        0     -     -
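For completeness, a quick sketch of how I could watch the WG interface counters live while reproducing a stall (standard FreeBSD netstat options; wg0 is one of my tunnel interfaces):

# Watch wg0 packet counters once per second while the stall is happening
netstat -I wg0 -w 1
# Byte counters for all interfaces, to snapshot before/after a stall
netstat -ibn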



I disabled schedules, as they kept interrupting traffic even more, with the above message (pf being reloaded) popping up on a regular basis; I am trying to minimize the moving pieces to help with troubleshooting.

I use a remote LibreNMS instance that can reach private IP addresses behind that OPNsense instance over the WG tunnel that is experiencing packet loss. It is just a bit difficult to correlate that kind of packet loss with the behaviour I observe on OPNsense; it looks like something is going on in the way OPNsense 'reacts' to WG tunnels. I cannot say the tunnels are going down, since I don't believe they are stateful - I might be wrong here; if they are, then all I can see is the handshake.

On the WG tunnels, I have a keepalive of 2, which I reduced to 1 just to see if that would make a difference.
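To make sure that setting is actually applied, I can check the running tunnel configuration (a sketch assuming the wg utility from wireguard-tools is available on the box):

# Keepalive actually applied per peer
wg show wg0 persistent-keepalive
# Full running configuration of the tunnel
wg showconf wg0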

I have attached a few RRD graphs showing the impact of the connectivity loss, as LibreNMS cannot query SNMP due to the severe connectivity issue through WG. To rule out LibreNMS, I can confirm the same instance can reach other destinations on the internet with zero gaps and zero loss. I am well aware that packet loss is not good and should be addressed; it is just the disproportionate effect on WG that surprises me and is a bit hard to understand, given my limited experience troubleshooting WG and the behaviour of the Go implementation on OPNsense. I don't think it is a problem as such; it is more about how to make it more evident.

For reference see the attached graphs:

Normal no packet loss monitor report of memory utilization (no gap):
20210327-WG-troubleshooting-no-packetloss-2_result.jpg

How it has started:
20210327-WG-troubleshooting-packetloss-2_result.jpg

How it is going:
20210327-WG-troubleshooting-packetloss-1_result.jpg

Meanwhile, packet loss towards major destinations is minimal, and, on a residential connection, the ISP is not really interested in doing much beyond modem/router restarts. It is a longer story, but they activated OFDMA on the upstream channel last November, which triggered a significant issue for many people. They disabled it in February and, apparently, are now trying again. So it will be a long battle, but I wanted to see if I could get a bit further with traces from WG to confirm the effect of packet loss, beyond running pings on top of the WG tunnels. The real question is what happens inside WG when I see traffic stopping for such a long period while the underlying connectivity is experiencing nowhere near as much packet loss.

As for what I can verify myself: I have the widget to see the last handshake for each WG tunnel, and I look at the log where I can clearly see when WG comes up or goes down (if I disable WG), but while connectivity is affected there is nothing at all in the logs.
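To get some data beyond the widget, I am thinking of logging handshake age and transfer counters over time, so I can line them up with the RRD graphs afterwards. A rough sketch (it assumes the wg utility is installed; the log path is just an example):

#!/bin/sh
# Log WireGuard handshake timestamps and transfer counters every 10 seconds
# so a stall can be correlated with the LibreNMS/RRD gaps afterwards.
LOG=/var/log/wg-watch.log   # example path only
while true; do
    {
        echo "=== $(date '+%Y-%m-%d %H:%M:%S') ==="
        wg show all latest-handshakes
        wg show all transfer
    } >> "$LOG"
    sleep 10
done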

Any ideas to help me understand WG on OPNsense a bit better would be welcome.
#4
Thanks a lot for the reply. Funnily enough, I came across your post while looking at how people may have done it on pfSense ;-)

The question was more about whether it is possible to install a package from FreeBSD using its absolute URL on OPNsense. Is that something that will work and pull in all the other dependencies?

Ultimately, I can try it on a VM and see how far I get.
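To illustrate what I mean (a hypothetical example only; the ABI path and package name are placeholders, and since OPNsense is built on HardenedBSD, a stock FreeBSD package may or may not be compatible):

# Install a single package file straight from a URL
pkg add https://pkg.freebsd.org/<ABI>/latest/All/<package>.pkg
# Note: pkg add only looks for dependencies next to the given file,
# not in the configured repositories, so dependencies may still be missing.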
#5
Thank you all - I had the same issue. I applied the 1.13.0_1 patch and will see how it goes.
#6
Hi all,

I need to run smokeping to monitor a service from my OPNsense instance directly, not from a host behind the instance.
Since a smokeping package exists for both FreeBSD (https://www.freshports.org/net-mgmt/smokeping/) and HardenedBSD (https://github.com/HardenedBSD/hardenedbsd-ports/tree/master/net-mgmt/smokeping), I was wondering if there is an easy way to deploy it on OPNsense.
It is purely for testing and not for production.
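In case it helps to show what I have in mind, here is a sketch of a test-only approach: temporarily adding the upstream FreeBSD repository so pkg can resolve smokeping and its dependencies. The repository file name and URL are assumptions on my side, and I understand that mixing FreeBSD packages into OPNsense is not a supported configuration.

# Lab box only: add the upstream FreeBSD repository as an extra pkg repo
mkdir -p /usr/local/etc/pkg/repos
cat > /usr/local/etc/pkg/repos/FreeBSD-test.conf <<'EOF'
FreeBSD-test: {
  url: "pkg+http://pkg.FreeBSD.org/${ABI}/latest",
  mirror_type: "srv",
  enabled: yes
}
EOF
# Refresh only that repository and install smokeping from it
pkg update -r FreeBSD-test
pkg install -r FreeBSD-test net-mgmt/smokeping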

I wanted to see if I could create a formal plugin/package, but that depends on whether I can first understand the logic for getting HardenedBSD packages onto OPNsense (beyond UI integration and testing, which I understand take time).

Thanks in advance for your guidance.
#7
Sorry - I skipped over the service type, as 3322 is listed first...

Thanks for pointing out the obvious, and sorry for the noise.
#8
Hi All,

I am a bit confused while trying to configure the Dynamic DNS for DuckDNS.

Following the 'full help', I put my DuckDNS token as the username and leave the password field blank, but I keep getting an error message stating that the password field is required.

Any idea what's wrong?

I tried putting something bogus in the password field, but I have no idea where to look for the log when the dynamic DNS service retries and either fails or succeeds. The cached IP keeps showing N/A.
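To at least rule out the token itself, I could test the DuckDNS update endpoint directly from a shell (a sketch; the subdomain and token are placeholders, and it assumes curl is available):

# Manual DuckDNS update check; the service replies OK on success, KO on failure
curl -sS "https://www.duckdns.org/update?domains=<subdomain>&token=<token>&ip="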

Any help/guidance would be appreciated.

OPNsense version: 18.7.9