OPNsense Forum » Show Posts
Topics - Andreas_

Pages: [1] 2
1
22.1 Legacy Series / CARP MASTER during reboot despite maintenance mode
« on: February 22, 2022, 04:33:59 pm »
When doing regular maintenance on our CARP cluster, I regularly disable CARP on the machine and enter persistent maintenance mode. I'd expect it to never get MASTER until I enable CARP again.
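
For reference, the low-level equivalent of what I do, sketched with the standard FreeBSD CARP sysctls (the GUI maintenance mode may set different knobs under the hood):

```shell
# Sketch: take this node out of CARP elections before maintenance.
# net.inet.carp.allow=0 stops the node from sending or processing CARP
# advertisements entirely; this is a stock FreeBSD sysctl.
sysctl net.inet.carp.allow=0    # disable CARP processing on this node
# ...do maintenance / reboot...
sysctl net.inet.carp.allow=1    # re-enable afterwards
```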

Now, I rebooted the machine (22.1.1), and while it came up I briefly saw "Timeout on ix2, becoming MASTER" on the console for a second or so before it stepped back to BACKUP.
While I also have layered interfaces (VLAN over LAGG over 10GBit), this particular ix2 interface is just a plain 1GBit onboard Intel NIC connected to a switch, no VLAN, no bells or whistles (upstream internet).

Having a double MASTER even for a fraction of a second will disrupt network traffic more or less badly, so this really isn't good and shouldn't happen, maintenance mode or not.

So how can I safely reboot a router without triggering major trouble?

2
22.1 Legacy Series / Interface errors after Upgrade
« on: February 18, 2022, 06:30:19 pm »
After upgrading from 21.7 to 22.1.1, my firewall shows an average error rate of 0.05/s on the WAN interface (SuperMicro A2SDI on-board NIC connected to a Juniper switch, afaik), which used to be zero before the upgrade (netstat -i, Ierrs column). The machine has 4 more connections, all of which are still at zero errors.
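
For reference, I sample the counters roughly like this (device name is an example; exact sysctl tree names vary by driver and unit):

```shell
# Per-interface error counters (Ierrs/Oerrs columns)
netstat -i -I igb0
# Driver-level statistics for Intel NICs; grep for errors and drops
sysctl dev.igb.0 | grep -i -e err -e drop
```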

What might be the reason?

3
20.7 Legacy Series / IPSEC traffic stalling after 20.7.1 upgrade
« on: September 01, 2020, 03:52:20 pm »
We have an OPNsense installation (CARP pair), running 20.1.3 until recently, with 4 IPsec peers (2x IKEv1, 2x IKEv2) and some 20 tunnels defined. This used to run flawlessly, until I upgraded the machines to 20.7.1. Since then, the tunnels stop working after a while, until a new connect is forced on the tunnel. Strangely, all logging looks normal on both sides of the tunnel, even when the tunnel traffic has stalled (still IKE/ISAKMP traffic, but no more ESP packets).

The situation is a little different between peers, and sometimes there are stable phases for one peer, getting bad again after a while, but none is 100% fine. It will take some seconds to some minutes until the tunnels stall; more traffic seems to speed up the failure.

I reinstalled one firewall with 20.1, and now we have stable performance again. The backup machine is in maintenance mode and still 20.7.1 (with syslog-ng fixed).

When reviewing the updates that happened between 20.1.3 and 20.7.1, strongSwan was upgraded from 4.8.2 to 4.8.4 (in April), and the kernel from 11.2 to 12.1. Since the IKE/ISAKMP traffic seems normal, I'd suspect some issue in the kernel/pf, but I'm out of clues on how to narrow down the cause further.
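
To see whether ESP really stops at the wire while IKE continues, one can capture roughly like this (interface and peer address are placeholders):

```shell
# ESP payload packets for one peer; a healthy tunnel shows steady
# traffic here even when the IKE/ISAKMP side is quiet
tcpdump -ni ix0 esp and host 203.0.113.1
# IKE / NAT-T control traffic for comparison
tcpdump -ni ix0 udp port 500 or udp port 4500
```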

Any thoughts on this?
Regards,
Andreas

4
20.1 Legacy Series / APPLY added interface shakes up firewall
« on: June 05, 2020, 01:38:28 pm »
On a pair of 20.1 firewalls coupled with CARP, I have some 27 VIFs defined across a total of 10 interfaces.
Recently, I added two more VLANs and two interfaces. When enabling the new interfaces on the CARP master (without an IP for a start), many VIFs became unresponsive after APPLY. To resolve the issue quickly, I disabled and re-enabled CARP on the master, and operation was re-established.

I'm quite sure that I could add interfaces without any problems "in the old days", but the trouble increased over time (hiccups/interruptions for some seconds), and now defunct VIFs. This might have several reasons:
- simply a matter of the sheer number of interfaces
- changes to the OPNsense code that executes the interface operations

Is this a known issue? Something I have to live with?

5
20.1 Legacy Series / IPSEC phase2 edit not working
« on: May 08, 2020, 03:16:14 pm »
On a firewall with 20.1.3 (now 20.1.6), I noticed that it's not possible to edit phase2 entries of an existing configuration (unchanged and running for a while, probably older than a year).

When editing, the remote address or network isn't shown; when duplicating a single-address phase2 entry with a different target address, I get "already exists".

Digging further, I find some entries displaying (and probably editing) correctly, others not. A look at config.xml shows the problem: the non-functional entries don't have a uniqid.
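
To count how many entries are affected, something simple against the live config should do (path as on a stock install; tag names as I see them in my backup):

```shell
# Compare the number of phase2 entries with the number of uniqid fields;
# a mismatch points at legacy entries missing their uniqid
grep -c '<phase2>' /conf/config.xml
grep -c 'uniqid' /conf/config.xml
```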

6
19.1 Legacy Series / reordering packets under higher traffic
« on: July 05, 2019, 04:55:58 pm »
I'm running OPNsense 19.1 on Xen, connecting a DMZ host to its file server via NFS.
On rare occasions, when a big file is transferred, the NFS connection breaks and a new TCP connection has to be established.

I've been tcpdumping the traffic in and out of the firewall (TCP segment offloading is disabled on all interfaces to avoid driver trouble) and found the following explanation:
Sometimes a big PDU sent from the file server (split into 364 segments within 9.5 ms) isn't forwarded to the destination DMZ host in order; instead, in the middle of the flow, segments are forwarded out of order, provoking out-of-order ACKs and resends, apparently driving the TCP stack mad and ultimately breaking the connection.
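
For reference, the offload settings and captures looked roughly like this (interface names, host, and port are examples for a Xen setup with NFS on 2049):

```shell
# Disable segmentation/receive offloads so tcpdump sees real wire-size frames
ifconfig xn0 -tso -lro
ifconfig xn1 -tso -lro
# Capture on both sides of the firewall to compare segment order
tcpdump -ni xn0 -w in.pcap  host 10.0.0.5 and port 2049
tcpdump -ni xn1 -w out.pcap host 10.0.0.5 and port 2049
```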

The server is a Xeon E5-2620V3, with 4 CPUs assigned to the firewall (low single digit cpu utilization, load rarely reaching 1), and no other machines running on the host. Typical state table size is 450, mbuf usage 800.

While the usage pattern of the system and general load hasn't changed over the last year, the problem started some months ago, which kind of coincides with the upgrade to 19.1 and the hardened kernel.

Why does the firewall start reordering, and what can I do to prevent it?

Regards
Andreas

7
19.1 Legacy Series / system crashes
« on: June 21, 2019, 05:33:42 pm »
In recent weeks, I have observed spontaneous reboots every 3 days or so. This happens on the current master of a CARP router pair (slightly different hardware after one router was repaired), so it doesn't seem to be hardware-related but rather caused by some kernel update in the last months.
The system.log doesn't help; it just logs the fresh boot. External monitoring doesn't show any abnormal traffic or load (0.5 load, 2-5% CPU) on the Atom C3758 system with 19.1.9.
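
To catch the next one, I'd make sure kernel crash dumps are enabled and check for leftovers, roughly (standard FreeBSD mechanisms, default paths assumed):

```shell
# Look for saved crash dumps / textdumps from earlier panics
ls -l /var/crash
# Ensure a dump device is configured so the next panic is captured
grep dumpdev /etc/rc.conf    # dumpdev="AUTO" enables dumps to swap
```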

Any hints how to narrow this down?

Regards
Andreas

8
19.1 Legacy Series / flowd.log not rotated
« on: May 31, 2019, 10:14:12 am »
I've been running netflow logging flawlessly for a while. For about two weeks now (I guess since 19.1.5), rotation of flowd.log stalls, resulting in a filled disk and consequently some services stopping for out-of-disk-space reasons. This includes flowd and flowd_aggregate.py. The firewall is monitored, so I could verify that flowd_aggregate is killed by the out-of-disk condition, not before it.

I upgraded to 19.1.8 and see some strange behaviour of flowd_aggregate:
Usually, flowd.log reaches between 11MB and 13MB before rotating, which takes about 5 minutes. But I can also observe this:

-rw-------  1 root  wheel   3.2M May 31 10:01 flowd.log
-rw-------  1 root  wheel    12M May 31 10:00 flowd.log.000001
-rw-------  1 root  wheel    13M May 31 09:56 flowd.log.000002
-rw-------  1 root  wheel    56M May 31 09:51 flowd.log.000003
-rw-------  1 root  wheel    13M May 31 09:29 flowd.log.000004
-rw-------  1 root  wheel    23M May 31 09:24 flowd.log.000005
-rw-------  1 root  wheel    12M May 31 09:15 flowd.log.000006
-rw-------  1 root  wheel    13M May 31 09:11 flowd.log.000007
-rw-------  1 root  wheel    11M May 31 09:06 flowd.log.000008
-rw-------  1 root  wheel    12M May 31 09:01 flowd.log.000009
-rw-------  1 root  wheel    12M May 31 08:56 flowd.log.000010

So obviously flowd_aggregate stalls sometimes for a few minutes and then continues to work. I can't see any anomalies in CPU load or usage during this period. There's nothing in the system log; the last flowd-related message is
 May 31 06:47:26 fw05a flowd_aggregate.py: vacuum done

While this is not the total stop of flowd.log rotation I've been suffering from in the last weeks, it still seems suspicious to me. What's going wrong here?

Added: the stalling is happening right now, and I can see the flowd_aggregate process consuming a lot of CPU.
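
As a stopgap, I'm considering a small watchdog that alerts before the disk fills; a sketch (log directory and the 10-minute threshold are assumptions based on the ~5-minute healthy rotation interval):

```shell
#!/bin/sh
# Print "stalled" when no flowd.log* file in the given directory has been
# modified within the last 10 minutes, "ok" otherwise.
check_rotation() {
    dir=$1
    if find "$dir" -name 'flowd.log*' -mmin -10 | grep -q .; then
        echo ok
    else
        echo stalled
    fi
}

# Example: run from cron every few minutes and alert on "stalled"
# check_rotation /var/log
```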

9
19.1 Legacy Series / CARP over LAGG problems
« on: May 03, 2019, 09:25:38 am »
I usually do my OPNsense upgrades by first updating the usually-backup machine, then disabling CARP on the master and updating it as well.
Now, when upgrading from 19.1.2 to 19.1.6 (which needs a reboot), I found that afterwards some VHIDs would go to MASTER and some to BACKUP (net.inet.carp.preempt=0; it should be 1, but it was helpful for debugging here). The VHIDs that became MASTER are all on a LAGG interface (directly or via VLAN); the others, remaining on BACKUP, are on physical interfaces. Disabling and re-enabling CARP on the master machine resolved the situation. Apparently, the LAGG interface didn't receive CARP packets from the master in time while booting up, so the rebooted machine suspected it needed to become MASTER itself.

After my HA setup had settled and was working normally, I started to upgrade the switches one by one. With one switch down, the LAGG interface is still workable, since only one of the two physical interfaces loses its connection, but CARP seems to increase demotion based on the physical interfaces, not the resulting LAGG interface. In order to keep CARP from failing over unnecessarily (which would affect e.g. OpenVPN connections), CARP on the backup needs to be disabled temporarily.

So there seem to be two issues here: CARP expecting traffic before LAGG is ready, and CARP demotion reacting to LAGG slave interfaces instead of the LAGG interface itself.
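
As a workaround for the switch maintenance case, manually raising the demotion counter on the affected node so its peer stays MASTER might be gentler than disabling CARP outright; roughly (FreeBSD CARP sysctl; as I understand it, written values are cumulative adjustments, and 240 is just an example offset):

```shell
sysctl net.inet.carp.demotion        # show the current demotion offset
sysctl net.inet.carp.demotion=240    # penalize this node during maintenance
# ...swap the switch...
sysctl net.inet.carp.demotion=-240   # subtract it again when links are back
```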



10
General Discussion / Multicast storm created by firewall
« on: April 09, 2019, 12:08:07 pm »
From time to time, we're suffering from a strange issue:
Triggered by a workstation on LAN1 sending a WS-Discovery multicast on port 3702 (or some other service, just as an example), some thousand duplicated packets can be seen on LAN2 (with the LAN1 address as sender and multicast as destination), carrying the source MAC address of the backup firewall of a CARP pair.

Or in other words:
The CARP backup firewall, which should be listening passively, creates IP multicast packets with its own LAN2 MAC source address and the LAN1 IP source address of a client, at a rate of about 5000/s, and will not stop until the firewall is kicked with pfctl -d; pfctl -e.

The hotfix is to drop UDP traffic to specific ports (such as 3702) on the LAN1 network, but a firewall shouldn't create such packets on its own, right? This is 19.1 (I already had this with 18.1/18.7), with no specific multicast/IGMP settings or modules.
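
In pf.conf terms, the hotfix amounts to something like this (interface name and port are examples):

```shell
# Drop WS-Discovery multicast on the LAN1 side before it can be duplicated
block drop in quick on igb1 proto udp from any to any port 3702
```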

11
19.1 Legacy Series / 19.1.x ipsec phase2 edit problem with old entries
« on: March 29, 2019, 04:07:15 pm »
I'd like to add a third phase2 entry to an IPsec definition, so I pressed "clone" on an existing phase2 entry. The fields of the resulting page are not prefilled, and if filled in and saved, the entry won't show up. Looking at a config backup, the entry is missing the ikeid but has a uniqueid instead.

Investigating further, editing is apparently affected as well: the parameters shown are not the ones of the phase2 entry being edited.
Only adding a fresh entry seems to work.

Tested with versions 19.1.2 and 19.1.4, after refreshing with F5.

12
18.7 Legacy Series / unbound domain overrides not evaluated
« on: December 28, 2018, 10:56:19 am »
I upgraded from 18.1.x to 18.7.9 and afterwards had some trouble with unbound not resolving some domain overrides. It would resolve some addresses from cache and tried some from the root servers. I had to edit and save a domain override unmodified to get unbound back to normal operation.
This happened on two machines (master and slave).
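
In case someone hits the same: flushing the override's zone from the cache might be a lighter-weight fix than re-saving; a sketch using plain unbound-control (zone name is an example; requires unbound's remote control to be enabled):

```shell
# Drop all cached data under the override's zone so it is re-resolved
unbound-control flush_zone internal.example.com
```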

13
18.7 Legacy Series / Broadcast flood generated by firewall
« on: November 23, 2018, 05:46:24 pm »
There are some smartphones that connect via wireless to one LAN or another, depending on app needs. Apparently, iOS phones may remember their old IP address and send out UDP broadcasts for quite some stuff (SMB, Dropbox, Spotify) using the old IP address (network A) on a LAN that has a different network B.
Even after the iPhone has disconnected, about 4000 packets/s are still broadcast, originating on the firewall's network-B interface but carrying A-sourced packets.
I have added block rules
- for specific UDP ports
- for 255.255.255.255 destination
- for any packets that don't originate from that interface's network

Still, these broadcast storms from the firewall persist.
To stop the storm, I need to issue pfctl -d ; pfctl -e
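
For reference, the three rules look roughly like this in pf.conf terms (interface name and ports are examples):

```shell
# 1) specific UDP ports
block drop quick on igb2 proto udp from any to any port { 137 138 3702 }
# 2) limited-broadcast destination
block drop quick on igb2 inet from any to 255.255.255.255
# 3) packets that don't originate from this interface's network
block drop in quick on igb2 inet from ! igb2:network to any
```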

I'm running out of ideas.

It's a CARP/pfsync pair of OPNsense machines; sometimes the master is the source of the broadcasts, sometimes the backup.

Anybody a clue for me?
Regards
Andreas

14
18.7 Legacy Series / os-bind too minimalistic, corrupting manual config
« on: September 27, 2018, 08:02:14 pm »
For quite a while I had bind9 running on my firewalls (from the FreeBSD repo), acting as a secondary.
After upgrading to 18.7, named was gone, and I installed os-bind; not immediately noticing the OPNsense configuration options, I went on configuring as usual.
Unfortunately, my configuration doesn't survive a reboot, and the config pages for BIND are not sufficient for my setup.
A free-text "custom options" field as in the unbound config, or "advanced" as in dnsmasq, would be very helpful.

15
18.1 Legacy Series / outgoing NAT with interface_address uses carp ip
« on: February 15, 2018, 12:12:10 pm »
After upgrading from 17.x to 18.1.2, the outgoing NAT address translation doesn't work as expected any more.

I have outgoing NAT configured to use the interface address on a CARP cluster, which used to use the physical IP address of each machine.
After the upgrade, outgoing traffic uses all VIF IP addresses randomly, breaking the session handling of some sites.
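
As a workaround, pinning the translation to the physical address explicitly instead of relying on "interface address" should restore the old behaviour; in pf.conf terms, roughly (interface and addresses are examples):

```shell
# Translate outgoing LAN traffic to this node's physical WAN address,
# not to any of the CARP VIPs
nat on igb0 inet from 192.168.1.0/24 to any -> 198.51.100.10
```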

OPNsense is an OSS project © Deciso B.V. 2015 - 2023 All rights reserved