Topics - Andreas L.

#1
I updated my server as usual (as I have done many times before), and since this update required a reboot, I'm stuck now. The kernel no longer boots on my hardware; the server hangs at "Probing 13 block devices..." and that's it, see attached screenshot.

I'm running on bare metal; here is a hardware probe from before the update: https://bsd-hardware.info/?probe=e161817fa3

I'm really looking for help. Any idea how I could boot from USB and restore the former kernel somehow?
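
For reference, here is what I plan to try based on my reading so far (untested, so please correct me): FreeBSD keeps the previous kernel in /boot/kernel.old, so from the boot menu I should be able to escape to the loader prompt and boot it directly:

OK unload
OK boot /boot/kernel.old/kernel

And once I have a shell again (single user mode, or a USB rescue boot with the disk mounted), I understand opnsense-update can fetch a specific kernel again, something like:

root@OPNsense:~ # opnsense-update -kr 20.7.8   # -k = kernel only, -r = target release; version just an example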
#2
I updated on Thursday evening to OPNsense 21.1-amd64 and noticed the next morning that permanently routed video streams between LAN and WAN were significantly slower, until they broke down entirely shortly after.

For background: I have OPNsense running within my local network separating internal networks, so there is another router before the Internet, which allows me to run 1 GBit speed tests through OPNsense within my own infrastructure.
In short, I have LAN <-> OPNsense <-> WAN <-> FritzBox <-> Internet, which allows stress tests from my LAN into the WAN without going through the provider's slower Internet connection. If it matters, I am running OPNsense on a 4-core Intel Xeon E5506, 20 GB RAM, 2x Broadcom NetXtreme II BCM5709, 4x Intel 82580. Sensei is currently deactivated.

After some analysis I figured out that IDS/IPS is the root cause here. I came from 20.7.8_4, where everything was fine, and as I read at https://opnsense.org/opnsense-21-1-marvelous-meerkat-released/ no changes were made to Suricata in this release. I did not change any of the related setup or rules either.

So, I made some interesting iperf3 measurements.

OPNsense v20.7.8_4:
Host LAN-net <-> Host WAN-net with IDS/IPS activated --> ~550 MBit/s

OPNsense v21.1:
Host LAN-net <-> Host WAN-net with IDS/IPS activated:
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-60.00  sec  1.68 GBytes   240 Mbits/sec  128             sender
[  5]   0.00-60.17  sec  1.68 GBytes   239 Mbits/sec                  receiver


Host LAN-net <-> Host WAN-net no IDS/IPS:
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-30.00  sec  2.54 GBytes   729 Mbits/sec  978             sender
[  5]   0.00-30.01  sec  2.54 GBytes   728 Mbits/sec                  receiver


OPNsense <-> Host WAN-net no IDS/IPS:
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-30.00  sec  3.29 GBytes   942 Mbits/sec  408             sender
[  5]   0.00-30.00  sec  3.29 GBytes   941 Mbits/sec                  receiver
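
For reference, the numbers above come from plain iperf3 runs between two hosts, roughly like this (host name is a placeholder):

host-wan:~ $ iperf3 -s
host-lan:~ $ iperf3 -c host-wan -t 60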


As a result, I can see that with the update the throughput with IDS/IPS dropped from ~550 MBit/s down to ~240 MBit/s, a drop of ~310 MBit/s or 56%, which I cannot explain but can measure. My overall routing capacity between LAN and WAN seems to be around 729 MBit/s, which is acceptable for me, as quite a few video streams were passing through the firewall during the measurement, and I have no comparable value from before the update.

Any suggestions what causes this IDS/IPS impact? Can someone confirm this behavior on their setup as well? I know that for a plain Internet connection this might still be sufficient, but currently I feel unhappy with the result, as it came right with the update to 21.1.
Looking forward to hints, ideas and comments.
#3
I'm planning to set up traffic shaping with pipes and queues to limit bandwidth or prioritize traffic. I read through the documentation (https://docs.opnsense.org/manual/shaping.html), but it did not become clear to me whether this always limits the bandwidth or only when the line is under full load - if you understand what I mean?

For example, I have a 100 MBit Internet connection and I want to guarantee a specific host (in a specific network) at least 40 MBit, which means there are only 60 MBit left for all other hosts when the full 100 MBit are requested via WAN. Today I would assume the bandwidth is balanced equally between all of them. But if no one else is using the bandwidth, I want everything - all 100 MBit - to be available to that specific host, and not have it limited to 40 MBit.

As I see it, I can set up a dedicated pipe according to case 1 of https://docs.opnsense.org/manual/how-tos/shaper.html, "Reserve dedicated bandwidth for realtime traffic". Then the bandwidth is reserved when everything is requested, but will this host also get everything if nothing else is in use? This is not clear to me. The setup there seems to guarantee the bandwidth but also cap it, by creating upload and download pipes, since it says "desired bandwidth" and not "minimum bandwidth" or the like. I hope I am explaining myself clearly?

Or can/do I need to set up queues on top of pipes somehow? Such that I can give a queue a higher weight before the limit kicks in, and if that weighted share is not requested, the queue has no effect?

What I would like to prevent is leaving available bandwidth unused just because packets are throttled to their pipe's rate. If there is no other traffic, I want to offer everything. How can I ensure this?
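
From what I read, the OPNsense shaper is built on ipfw/dummynet, and weighted queues sharing one pipe should behave exactly the way I want: the weights only matter under contention, and an unused share is available to whoever asks. Conceptually, in raw ipfw syntax (just to illustrate my understanding, not the GUI workflow; addresses and weights are examples):

# one pipe capped at the 100 MBit line rate
ipfw pipe 1 config bw 100Mbit/s

# two queues sharing that pipe; weights are relative shares,
# enforced only when the pipe is saturated
ipfw queue 1 config pipe 1 weight 40   # the priority host
ipfw queue 2 config pipe 1 weight 60   # everyone else

# classify traffic into the queues
ipfw add queue 1 ip from 192.168.1.10 to any
ipfw add queue 2 ip from 192.168.1.0/24 to any

If that understanding is correct, the priority host is guaranteed 40% under full load but can use the whole 100 MBit when the line is otherwise idle - exactly what I am after. Can someone confirm that the GUI pipes/queues map to this?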
#4
I'm running OPNsense (20.7.2-amd64) with one Broadcom NetXtreme II BCM5709 for WAN (bce0) and one for LAN (bce1); furthermore I have 4x Intel 82580, which I use for other LANs such as IoT (igb1), Guests (igb0) etc.

I have "some" traffic on WAN with quite constantly 60 to 100MBit (mainly due to IP cam streams), which I consider as handeable with my setup. I also have IDS/IPS up and running as well as Sensei.

After "a while" (usually only minutes after reboot) of traffic I get the following error in the log, multiple times per second:

2020-09-10T00:28:10   kernel   490.690419 [4006] netmap_transmit bce0 drop mbuf that needs checksum offload
2020-09-10T00:28:05   kernel   485.572543 [4006] netmap_transmit bce0 drop mbuf that needs checksum offload
2020-09-10T00:28:00   kernel   480.194945 [4006] netmap_transmit bce0 drop mbuf that needs checksum offload
2020-09-10T00:28:00   kernel   479.940436 [4006] netmap_transmit bce0 drop mbuf that needs checksum offload
2020-09-10T00:27:54   kernel   474.761838 [4006] netmap_transmit bce0 drop mbuf that needs checksum offload
2020-09-10T00:27:49   kernel   469.475112 [4006] netmap_transmit bce0 drop mbuf that needs checksum offload
2020-09-10T00:27:44   kernel   464.324372 [4006] netmap_transmit bce0 drop mbuf that needs checksum offload
2020-09-10T00:27:39   kernel   459.205033 [4006] netmap_transmit bce0 drop mbuf that needs checksum offload
2020-09-10T00:27:33   kernel   453.830080 [4006] netmap_transmit bce0 drop mbuf that needs checksum offload
2020-09-10T00:27:28   kernel   448.126626 [4006] netmap_transmit bce0 drop mbuf that needs checksum offload
2020-09-10T00:27:23   kernel   443.431391 [ 320] generic_netmap_register Emulated adapter for bce0 activated
2020-09-10T00:27:23   kernel   443.431259 [1130] generic_netmap_attach Emulated adapter for bce0 created (prev was NULL)
2020-09-10T00:27:23   kernel   bce0: permanently promiscuous mode enabled
2020-09-10T00:27:23   kernel   443.407436 [1035] generic_netmap_dtor Emulated netmap adapter for bce0 destroyed
2020-09-10T00:27:23   kernel   443.407409 [1130] generic_netmap_attach Emulated adapter for bce0 created (prev was NULL)

As you can see on the attached screenshot, the mbuf usage is at 0% and, at ~9,720, way below the limit of 1,271,626, so there should be plenty of mbufs available.

So what triggers this error?

I can get rid of it by deactivating IDS/IPS, and since I have been testing with it off, the error has not shown up again. So is it somehow related to IPS throughput? Nonetheless, I would like to turn IDS/IPS back on :).

How can I tune my system so that "netmap_transmit" can handle the load? (BTW: what process/step is that, and what does it do here?)
And why does the mbuf "need checksum offload"? What exactly does that mean?

Some more config details:

I have all three checkboxes set, so all of these are disabled:
- Hardware CRC
- Hardware TSO
- Hardware LRO
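
For completeness, I believe the offload state can also be checked and toggled per interface from the shell; the manual equivalent of those checkboxes should be something like (untested on my box):

root@OPNsense:~ # ifconfig bce0 | grep options   # shows flags such as TXCSUM, TSO4, LRO
root@OPNsense:~ # ifconfig bce0 -txcsum -rxcsum -tso -lro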


root@OPNsense:~ # sysctl -a | grep nmbclusters
kern.ipc.nmbclusters: 1271626

root@OPNsense:~ # sysctl -a | grep msi
hw.sdhci.enable_msi: 1
hw.puc.msi_disable: 0
hw.pci.honor_msi_blacklist: 1
hw.pci.msix_rewrite_table: 0
hw.pci.enable_msix: 1
hw.pci.enable_msi: 1
hw.mfi.msi: 1
hw.malo.pci.msi_disable: 0
hw.ix.enable_msix: 1
hw.bce.msi_enable: 1
hw.aac.enable_msi: 1
machdep.disable_msix_migration: 0
machdep.num_msi_irqs: 512
dev.igb.3.iflib.disable_msix: 0
dev.igb.2.iflib.disable_msix: 0
dev.igb.1.iflib.disable_msix: 0
dev.igb.0.iflib.disable_msix: 0


BTW: I also experimented with the following values, which did not bring any change:

kern.ipc.nmbclusters="2543660"
hw.bce.tso_enable="0"
hw.pci.enable_msix="0"
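
(For anyone wondering how to apply these: they are boot-time tunables, which as far as I know can be set under System > Settings > Tunables or in /boot/loader.conf.local, followed by a reboot.)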
#5
I'm seeing strange IPv6 behavior running my OPNsense 20.7.2-amd64 behind a FritzBox.

I got an IPv6 address as well as a /60 subnet assigned to my WAN, but when I try to ping the gateway directly from the firewall, all ICMPv6 packets get lost. I have opened the firewall for all ICMPv6 on WAN in all directions.

This is what happens:

fe80::c225:6ff:feff:820d = FritzBox link-local address, correctly set as default IPv6 gateway
bce0 = WAN interface

I cannot directly ping my router aka FritzBox :o:

root@OPNsense:~ # ping6 -c 3 fe80::c225:6ff:feff:820d%bce0
PING6(56=40+8+8 bytes) fe80::221:5eff:fec8:be88%bce0 --> fe80::c225:6ff:feff:820d%bce0

--- fe80::c225:6ff:feff:820d%bce0 ping6 statistics ---
3 packets transmitted, 0 packets received, 100.0% packet loss


I cannot ping Google's dedicated IPv6 host:

root@OPNsense:~ # ping6 -c3 ipv6.google.com
PING6(56=40+8+8 bytes) 2a02:2f4:xxxx:xxxx:221:5eff:fec8:be88 --> 2a00:1450:4001:81b::200e
--- ipv6.l.google.com ping6 statistics ---
3 packets transmitted, 0 packets received, 100.0% packet loss


BUT a traceroute6 to that address works as expected (traceroute6 probes via UDP):

root@OPNsense:~ # traceroute6 ipv6.google.com
traceroute6 to ipv6.l.google.com (2a00:1450:4001:81b::200e) from 2a02:2f4:xxxx:xxxx:221:5eff:fec8:be88, 64 hops max, 20 byte packets
1  2a02:2f4:xxxx:xxxx:c225:6ff:feff:820d  0.510 ms  0.468 ms  0.388 ms
2  2a02:2f0:0:72::  4.589 ms  14.459 ms  21.283 ms
3  2a02:2f0:0:34::  4.682 ms  4.618 ms  4.451 ms
4  2a02:2f0:4002::5d32:a0  7.877 ms  4.728 ms  4.649 ms
5  2001:4860:0:12e6::4  5.377 ms
    2001:4860:0:12e3::3  4.899 ms
    2001:4860:0:12e4::2  5.358 ms
6  2001:4860::c:4001:ec6  5.069 ms
    2001:4860::c:4001:ebe  15.328 ms
    2001:4860::c:4001:ec6  4.939 ms
7  2001:4860::c:4001:9920  15.494 ms
    2001:4860::c:4001:5c4  10.797 ms
    2001:4860::c:4001:9920  15.498 ms
8  2001:4860::8:0:cb95  14.999 ms
    2001:4860::c:4000:f873  14.720 ms *
9  2001:4860::1:0:d0d8  15.346 ms
    2001:4860::9:4001:31f1  14.559 ms  14.683 ms
10  2001:4860:0:1::673  14.393 ms  14.732 ms
    2001:4860:0:1::671  14.432 ms
11  fra15s16-in-x0e.1e100.net  14.496 ms
    2001:4860:0:1::671  14.465 ms  14.405 ms


Then I just ask for all routers on my local link via a multicast request, and I suddenly get an answer from the FritzBox :o, which really puzzles me:

#All Routers Address:
root@OPNsense:~ # ping6 -c 2 ff02::2
PING6(56=40+8+8 bytes) fe80::221:5eff:fec8:be88%bce0 --> ff02::2
16 bytes from fe80::c225:6ff:feff:820d%bce0, icmp_seq=0 hlim=64 time=0.562 ms
16 bytes from fe80::c225:6ff:feff:820d%bce0, icmp_seq=1 hlim=64 time=0.629 ms

--- ff02::2 ping6 statistics ---
2 packets transmitted, 2 packets received, 0.0% packet loss
round-trip min/avg/max/std-dev = 0.562/0.595/0.629/0.034 ms


So it obviously can receive and answer ICMPv6, so I ping it directly again, but it does not answer:


root@OPNsense:~ # ping6 -c 3  fe80::c225:6ff:feff:820d%bce0
PING6(56=40+8+8 bytes) fe80::221:5eff:fec8:be88%bce0 --> fe80::c225:6ff:feff:820d%bce0

--- fe80::c225:6ff:feff:820d%bce0 ping6 statistics ---
3 packets transmitted, 0 packets received, 100.0% packet loss


I really need some good advice. What is wrong here?

Summary:

  • Multicast ping via ICMPv6 works
  • Direct ping does not work
  • traceroute6 into the Internet works fine
  • ping into the Internet does not work
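
One thing I still plan to check is the neighbor discovery table on the OPNsense side; if the entry for the FritzBox is stale or incomplete, that might explain the asymmetry. Diagnostic sketch (standard FreeBSD tools, not yet tried in the broken state):

root@OPNsense:~ # ndp -an | grep fe80::c225   # inspect the NDP entry for the FritzBox
root@OPNsense:~ # ndp -c                      # flush the NDP cache, then retry the ping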

#6
I have a problem getting IPv6 up and running in a stable way. I'm connected via FTTH behind a FritzBox (FB) and IPv4 runs fine. A few days ago my f***** provider silently forced me behind Carrier Grade NAT, but at least added a public IPv6 prefix with the same change. So I got a regular /56 to use on my own, which is assigned to my FB. Within the FB IPv6 setup I activated "DNS-Server und IPv6-Präfix (IA_PD) zuweisen", i.e. allowing it to share (parts of) the /56 via DHCPv6 with other routers in the LAN.

Behind the FB I have my OPNsense (20.7.1-amd64) running, where all my clients are, so I'm routing from my computer (Ubuntu 20.04 Linux) via OPNsense via FB into the web. I know I'm double-NATed within IPv4 (with CGNAT I guess even three times, but who cares...). This forced me to get deeper into IPv6 again, so I also activated IPv6 within OPNsense according to common descriptions and examples on the web.

This means, what I did so far - beside the FB setup as explained before:

  • Activated IPv6 within OPNsense
  • Set "IPv6 Configuration Type" on WAN (bce0) IF to DHCPv6
  • Set within the basic "DHCPv6 client configuration":
    • Request only an IPv6 prefix --> true
    • Prefix delegation size --> 60 (as I got a /56 and just wanted "some" subnets available on OPNsense; see the prefix math after this list. With the rest I can experiment on another router later)
    • Send IPv6 prefix hint --> true
    • Use IPv4 connectivity --> false
    • Use VLAN priority --> Disabled
  • On the LAN interface (bce1) I defined "IPv6 Configuration Type" as "Track Interface"
  • Deactivated "Block private networks" as well as "Block bogon networks" on LAN IF (as the LAN behind the FB obviously falls under these rules)
  • Under "Track IPv6 Interface" I set the value "WAN" for parameter "IPv6 Interface"
  • Left the "IPv6 Prefix ID" unchanged at 0
  • "Manual configuration" I left "false"
  • Set up firewall rules to allow all ICMPv6 in on WAN as well as on LAN (to cover all IPv6 ping and MTU-size requirements etc.)
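
A quick sanity check on the prefix math mentioned in the delegation item above: the FB delegates a /60 out of its /56; 60 - 56 = 4 bits, so there are 2^4 = 16 possible /60s for downstream routers, and my /60 in turn contains 2^(64-60) = 16 /64 LAN subnets, of which "IPv6 Prefix ID" 0 picks the first.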

With this setup, OPNsense got the /60 from the FB and LAN got a decent IPv6 /64 out of it. So this works fine, and all clients within the OPNsense LAN got IPv6 addresses from the first subnet. This looked OK so far, as far as I can evaluate it.

Problem:
Now we come to the problem: sometimes I can ping the web, and sometimes I can only ping the OPNsense firewall from my computer. Then OPNsense cannot ping the FB via its fe80 address, and then suddenly it works again. I would swear I did not change the setup; I either rebooted or only "refreshed" an interface, e.g. the WAN interface, by pressing save/apply without any changes, and then it worked - sometimes. I'm annoyed because I cannot find a pattern for "sometimes", and my setup is fairly standard compared with everything I have googled so far. So why does the behaviour flip so much? When it works, it usually works for a long time. Last time I tore down and reset my setup again during the Cloudflare aka transit provider issue on the Internet, as I first thought it was me here at home - bad timing :).

First there was no way for OPNsense to ping the FB, but then I read elsewhere that OPNsense usually does not follow the entries in the routing table but enforces a gateway per interface. So I activated the option "Disable force gateway" under Firewall --> Settings --> Advanced, which makes the routing table actually be used/evaluated. Right after that it worked out of the box, but I was never able to reproduce this step.
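
To see what the routing table actually contains in the broken vs. the working state, I now compare the IPv6 routes on OPNsense (diagnostic sketch; the default route should point at the FB's fe80 address on bce0):

root@OPNsense:~ # netstat -rn -f inet6 | head -20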

I then restored everything from a working backup, did the steps again, and got yet another behaviour. Today I stopped experimenting and turned IPv6 off again on LAN for now, as many websites did not work while IPv6 was not being routed correctly to the web.

Sometimes I also had the impression that changes I made did not really have an effect or an immediate impact. Then I thought I had it, restarted the firewall, and after the boot it did not work anymore or behaved differently again, while it had been working fine before the reboot.

Solution approaches:
Besides many changes to the setup and some reboots, I focused more closely on the firewall log. I added a logging IPv6 rule allowing everything in via LAN; I named it "Default allow LAN IPv6 to any rule" and it allows any IPv6 protocol from everywhere to everywhere on LAN, so more or less opening up the firewall. Just to be sure, I also added the same rule for "out" on LAN, in case I need to secure and log the way back. Similar rules are defined for WAN, so I had a quite open IPv6 net allowing everything between OPNsense-LAN and FritzBox-LAN aka OPNsense-WAN.

On my machine (ending in ::14cf) I have a client running which constantly tries to reach a secured web service (TCP port 443) on a server in the web (ending in ::59), and as you can see, sometimes packets are randomly blocked by the default deny rule, even though I explicitly opened all ports here. ICMPv6 works fine as well. So even though the firewall log was "mainly green", in this case the connection was not working - yet traceroute6 (UDP) showed a perfect route from start to end, so routing in general seems to work correctly.




I have further screenshots, where I also show the details of the rules, while kicking in:

Block-Details:


Accept-Details:


And it does not matter, when I change the service or the target in the web, I can always monitor the same behaviour.
BTW, traceroute6 is always successful and correct, so the route as such seems fine. DNS works perfectly and the route is correct, but the actual TCP traffic is not passed as it should be.


What I should probably add is my local machine's IPv6 routing table (excerpt):

route -n6:

Kernel-IPv6-Routentabelle
Destination                    Next Hop                   Flag Met Ref Use If
::1/128                        ::                         U    256 2     0 lo
2a02:xxx:xxxx:fcf0::/64        ::                         Ue   100 9     0 enp5s0_vlan3
fe80::/64                      ::                         U    256 10     0 enp5s0_vlan3
::/0                           fe80::221:5eff:fec8:be8a   UGe  100 9     0 enp5s0_vlan3
::1/128                        ::                         Un   0   11     0 lo
2a02:xxx:xxxx:fcf0::14cf/128   ::                         Un   0   10     0 enp5s0_vlan3
2a02:xxx:xxxx:fcf0:82ee:73ff:fe28:2165/128 ::                         Un   0   4     0 enp5s0_vlan3
fe80::82ee:73ff:fe28:2165/128  ::                         Un   0   9     0 enp5s0_vlan3
ff00::/8                       ::                         U    256 11     0 enp5s0_vlan3
::/0                           ::                         !n   -1  1     0 lo


With:
fe80::221:5eff:fec8:be8a --> being my OPNsense LAN link-local address

And with "ip a s" showing (parts):
enp5s0_vlan3@enp5s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 80:ee:73:28:21:65 brd ff:ff:ff:ff:ff:ff
    inet 192.168.42.10/24 brd 192.168.42.255 scope global enp5s0_vlan3
       valid_lft forever preferred_lft forever
    inet6 2a02:xxx:xxxx:fcf0::14cf/128 scope global dynamic noprefixroute
       valid_lft 4094sec preferred_lft 435sec
    inet6 2a02:xxx:xxxx:fcf0:82ee:73ff:fe28:2165/64 scope global dynamic mngtmpaddr noprefixroute
       valid_lft 85754sec preferred_lft 13754sec
    inet6 fe80::82ee:73ff:fe28:2165/64 scope link
       valid_lft forever preferred_lft forever


I have no clue why OPNsense shows this random IPv6 behaviour while IPv4 runs like a charm. I cannot find a pattern or a place to start digging. Where is the problem, or is it a software issue?
I'm very much looking forward to anyone's smart input, thoughts or own experience, after having invested days and a lot of googling!
Thanks in advance!

PS: And thanks for reading all the way to here :). This description grew longer and longer :-\.
#7
Hej firewall experts, I'm going nuts, as I have the same thing twice: once in IPv4, working, and once in IPv6, not working. This is all about allowing mDNS datagrams to the well-known multicast addresses (224.0.0.251 and ff02::fb) on port 5353 via UDP from LAN.

I have set up two aliases containing the hosts described above, as can be seen in the screenshots. First I had both addresses in one alias, but I have now split them into dedicated IPv4 and IPv6 targets and created two rules, one copied from the other, both on the LAN interface. IPv4 has worked right from the beginning, but the copied IPv6 rule is not matched, and the traffic is ultimately blocked.

What is wrong here, or what is different for IPv6? I just do not see it.
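
For reference, here is the intent expressed as raw pf rules (just to illustrate; I configured everything via the GUI, and $lan_if is a placeholder for my LAN interface):

pass in quick on $lan_if inet proto udp from any to 224.0.0.251 port 5353
pass in quick on $lan_if inet6 proto udp from any to ff02::fb port 5353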

I'm running the newest release (OPNsense 20.1.2-amd64, FreeBSD 11.2-RELEASE-p17-HBSD, OpenSSL 1.1.1d 10 Sep 2019).
Looking for good ideas/feedback on what I'm overlooking. Please ask if you need more details.
Thanks in advance.
#8
General Discussion / LAN "disconnected" after high load
September 05, 2019, 09:35:51 PM
Hej guys,
I really need some help, as I noticed some strange behavior in my OPNsense setup (OPNsense 19.7.3-amd64, FreeBSD 11.2-RELEASE-p14-HBSD, OpenSSL 1.0.2s 28 May 2019).

I hope I can provide enough information for you to understand the issue. I'm running my OPNsense as a KVM machine (4 CPUs, 16 GB RAM) on a quite powerful server (Ubuntu 18.04) with dedicated passthrough network cards. Currently I have assigned two NICs, one for WAN and one for LAN plus "other LANs" on different VLANs. Everything works as expected with different DHCP servers, firewall rules etc., and I have very low load (only three clients, as I'm still in test mode).

See attached pic: 2019-09-05_OPNsense-IFs.png

Yesterday I started a "huge" test load, so I opened a port on WAN and forwarded it via NAT Port Forward to the SSH port of the client in the usual LAN network. Connections work like a charm and I was able to connect to the SSH server via port forward from WAN network - all as it should be.

Then I started quite some load (rsync via SSH) on just two connections (two rsync processes running in parallel), with around 500 GB planned to be transferred from WAN to LAN (1 GBit connection bandwidth).

See attached pic: 2019-09-05_OPNsense_LoadDuringTest.png

It all ran quite well, even though I was surprised by the high CPU load given my setup. I do not have intrusion detection, ntopng, NetFlow, Insight etc. set up right now, so packets should only be passing through the firewall rules.

Then I let it run, went to bed, and in the morning (when I was already at work) the connection was surprisingly dead: I was not able to reach the LAN from the WAN anymore. So I connected from my phone via WireGuard to OPNsense; the connection went smoothly, the load was around 0.30, so everything seemed fine on OPNsense, and I tried to check whether I could see anything from there. I used the diagnostic tools to ping the SSH host directly from OPNsense, as if "being in the local LAN", and was not able to ping it. So I thought my target system had died - it never occurred to me that the LAN IF had died :o.

When I came home, I noticed my other host (also in the same LAN) was offline as well, and I was not able to reach OPNsense from the LAN anymore - but I was still able to connect via WireGuard from my phone. So I did a clean reboot of OPNsense from my phone, and immediately after the reboot my two hosts in the LAN were back online, as you can see in the graph (I started one rsync again in the early evening to test the connection).

See attached pic: 2019-09-05_Traffic_LAN-Info.png
See attached pic: 2019-09-05_Traffic_WAN-Info.png

Conclusion so far:
Under normal lazy load from several LAN clients, everything runs stable. But "heavy load" from WAN into LAN over a longer period made OPNsense "somehow invisibly die" on the LAN side, and I have no clue what happened here. This also happened once before, but back then I thought it was my own fault due to other changes.

Does anyone have an idea, or need more details?
There are no packet errors logged and no related log entries that I could find. Could this be related to the KVM setup under high load over a longer period? I can reproduce it every time.
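
Next time it happens, I plan to capture at least the following over the WireGuard session before rebooting (standard FreeBSD diagnostics; just my plan, not yet tried in the broken state):

root@OPNsense:~ # ifconfig -a    # link state and flags of the LAN IF
root@OPNsense:~ # netstat -i     # per-interface packet and error counters
root@OPNsense:~ # netstat -m     # mbuf usage
root@OPNsense:~ # vmstat -i      # interrupt counters for the passthrough NICs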

Thanks in advance!
/Andreas