OPNsense Forum » Archive » 19.1 Legacy Series » Observations on 19.1.1 (including possible CARP fix)
Topic: Observations on 19.1.1 (including possible CARP fix) (Read 6833 times)
olgeni (Newbie, Posts: 10, Karma: 2)
Observations on 19.1.1 (including possible CARP fix)
on: February 21, 2019, 01:54:30 am
Hello there,
Recently I have been migrating my network infrastructure to OPNsense from a random array of equipment. Most things went well, but I bumped into a few issues, so I am here to report them.
CARP over VLAN over LAGG
No matter what I did, after each reboot I would end up in the following situation: out of 8 VLANs in total (running over LAGG), at least a couple ended up as master while the others were backups, all on the same host. The backup host showed the same situation, mirrored. At least one host, and sometimes both, would complain that "CARP has detected a problem and this unit has been demoted to BACKUP status. Check link status on all interfaces with configured CARP VIPs."
One issue is that the CARP states were mixed; the other is that I had no clue how to kick CARP back into shape after seeing the message, as a reboot would not solve it.
It seems that CARP gets very annoyed at LAGG, possibly because it tries to transmit while LAGG is still figuring out the world.
My net.inet.carp.demotion had a crazy value of 1920. Since 1920 / 240 = 8 (240 being the default net.inet.carp.senderr_demotion_factor), that matches the total number of VLANs for which, I guess, the initial transmission failed.
If I bumped it back to 0, the warning message disappeared, so I created an etc/rc.syshook.d/start/92-demotion script doing the following:
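#!/bin/sh
# Do not demote this host when the first CARP announcements fail to send
sysctl net.inet.carp.senderr_demotion_factor=0
# Writing to net.inet.carp.demotion adds to the current value,
# so writing the negated value resets it to 0
sysctl net.inet.carp.demotion=-$(sysctl -n net.inet.carp.demotion)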
Ugly as it is, it got CARP to make peace with the world, and now everything seems to work fine after every reboot. Maybe net.inet.carp.senderr_demotion_factor could be added to the tunables, and perhaps a big button to reset net.inet.carp.demotion could be added to the CARP page.
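For anyone who wants to reproduce this, here is a slightly more defensive version of the same workaround as a POSIX sh sketch. It assumes FreeBSD carp(4) semantics: the default net.inet.carp.senderr_demotion_factor is 240, and writing a value to net.inet.carp.demotion adds it to the current demotion rather than replacing it (which is why the script above writes the negated value). The helper names are hypothetical; the two arithmetic helpers can be tried on any machine, and only carp_reset_demotion needs a FreeBSD/OPNsense host.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers around the CARP demotion workaround.
# Assumes FreeBSD carp(4): writing N to net.inet.carp.demotion ADDS N
# to the current demotion, and the default send-error factor is 240.

# How many send-error demotions a demotion value corresponds to,
# e.g. 1920 / 240 = 8. Prints 0 if the factor is 0.
carp_demotion_count() {
    demotion=$1
    factor=${2:-240}
    if [ "$factor" -eq 0 ]; then
        echo 0
        return
    fi
    echo $((demotion / factor))
}

# Value to write to net.inet.carp.demotion to bring it back to 0.
carp_reset_adjustment() {
    echo $((0 - $1))
}

# Apply the workaround on a live FreeBSD/OPNsense host.
carp_reset_demotion() {
    # Future send errors will add 0 to the demotion.
    sysctl net.inet.carp.senderr_demotion_factor=0 || return 1
    current=$(sysctl -n net.inet.carp.demotion) || return 1
    # Only touch the counter if something actually demoted the host.
    if [ "$current" -ne 0 ]; then
        sysctl net.inet.carp.demotion="$(carp_reset_adjustment "$current")"
    fi
}
```

A start hook such as the 92-demotion script would then only need to define these functions and call carp_reset_demotion; skipping the write when the value is already 0 avoids a pointless "-0" adjustment.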
IPsec and the Palo Alto Networks PA-3050
Long story short, the PA-3050 seems to have no clue about IPsec in the real world and needs lots of hand-holding, as it is apparently unable to negotiate multiple CHILD_SAs. I enabled the Tunnel Isolation option, but even that was not enough.
In the end, I went into etc/inc/plugins.inc.d/ipsec.inc and added this line in the midst of the other settings:
$strongswanTree['charon']['reuse_ikesa'] = 'no';
I'd say that charon.reuse_ikesa should be added to the GUI near the Tunnel Isolation option, just in case somebody is lucky enough to bump into a PA firewall.
I also noticed that setting the IPsec connection to "start" would cause it to drop after a while and never come back up. The default setting (which translates to "route") works fine, so I'm a bit puzzled about the inner workings of "start".
Crash while writing config.xml
During the initial configuration of a couple of APU4 boxes (still on 18.7, upgraded later), I got a lot of crashes while toying with VLANs, possibly because the switch configuration had other issues that were fixed afterwards.
The problem is that sometimes the system would begin to write config.xml, then crash, then reboot with a partial config.xml, and then stay dead.
I didn't look into how the file is written, but perhaps it could be written to a temporary .config.xml, synced, and then moved atomically to config.xml; maybe that would help a bit.
I got some data from the panic, but it's about 18.7 (https://gist.github.com/olgeni/30d7fec5d3d7f6871438bd1e9d08258e).
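The temporary-file idea can be sketched in sh as below. This only illustrates the write, sync, rename pattern; it is not how OPNsense actually writes config.xml (its config code is PHP), and atomic_write is a hypothetical name. The temporary file is created in the same directory as the target so that mv is a rename within one file system, which is atomic: after a crash, readers should see either the old file or the new one, never a half-written one.

```shell
#!/bin/sh
# Sketch of "write to a temporary file, sync, then rename over the target".
# atomic_write is a hypothetical helper; it reads the new content on stdin.
atomic_write() {
    target=$1
    dir=$(dirname "$target")
    # Same directory as the target, so the final mv is an atomic rename(2).
    tmp=$(mktemp "$dir/.$(basename "$target").XXXXXX") || return 1
    if ! cat > "$tmp"; then
        rm -f "$tmp"
        return 1
    fi
    # Make sure the new content is on disk before it becomes visible...
    sync
    mv -f "$tmp" "$target" || { rm -f "$tmp"; return 1; }
    # ...and that the rename itself is flushed as well.
    sync
}
```

Usage would look like `generate_config | atomic_write /conf/config.xml`, where generate_config is a stand-in for whatever produces the XML. Note that mktemp creates the file with mode 0600, so the replaced file ends up with those permissions.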
Kernel crash and partial loss of file system
During a particularly bad broadcast storm, courtesy of me experimenting with jails on a different host and ending up with two Ethernet ports on the same bridge(4) interface, one of the APU4s (now upgraded to 19.1.1) crashed for unknown reasons and was unable to boot, as many of the OPNsense files ended up in /lost+found (including most of the Python libraries).
I managed to get things back in shape by using the serial console and creating OPNsense packages from the other working box. It might be a bit far-fetched given how many things update files (Suricata, etc.), but it would be super cool if OPNsense could run from a read-only partition and mount it read-write only when there is something to update.
Serial console stuck in ttydcd state
While the physical serial console of the APU4 worked perfectly well during the long crash recovery, I noticed that on the same box it gets stuck in the ttydcd state on every login:
login: root
Password:
----------------------------------------------
| Hello, this is OPNsense 18.7 | @@@@@@@@@@@@@@@
| | @@@@ @@@@
| Website: https://opnsense.org/ | @@@\\\ ///@@@
| Handbook: https://docs.opnsense.org/ | )))))))) ((((((((
| Forums: https://forum.opnsense.org/ | @@@/// \\\@@@
| Lists: https://lists.opnsense.org/ | @@@@ @@@@
| Code: https://github.com/opnsense | @@@@@@@@@@@@@@@
| @@@@@@@@@@@@@@@
----------------------------------------------
load: 0.39 cmd: sh 50626 [ttydcd] 0.98r 0.00u 0.01s 0% 3324k
load: 0.39 cmd: sh 50626 [ttydcd] 2.43r 0.00u 0.01s 0% 3324k
load: 0.36 cmd: sh 50626 [ttydcd] 2.57r 0.00u 0.01s 0% 3324k
And there's no way to get past it. Since it worked in single-user mode, something must be bothering it during the login process; I read about ttydcd issues caused by pressing Ctrl-S on the console, but nobody is pressing it.
Login stuck in pfault state
At some point, for as yet unknown reasons, an OPNsense instance under bhyve stopped responding in the web UI, giving timeouts. When I attempted to log in from the bhyve console, I got stuck on this:
FreeBSD/amd64 (a.b.c.d) (ttyu0)
login: root
Password:
load: 0.59 cmd: php 43287 [pfault] 2.02r 0.00u 0.00s 0% 1276k
I have no idea what caused the pfault state and, most importantly, no idea why php was being invoked here, as I configured my login shell to be plain /bin/sh. I could not find anything obvious in the profile scripts, so I have no data on this so far.
Tunable kern.ipc.nmbclusters
I could not find it in the tunables. I have a 4-port igb card and apparently it is recommended to bump it up a bit, so I put it in loader.conf.local.
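For reference, a sketch of adding or updating such a line idempotently, so that re-running it does not pile up duplicate entries. set_loader_tunable is a hypothetical helper; it assumes the simple name="value" line format of FreeBSD's loader.conf files and only replaces lines that start with exactly that name followed by "=" (no leading spaces, no spaces around "="). No particular nmbclusters value is implied; `netstat -m` shows current mbuf cluster usage before choosing one.

```shell
#!/bin/sh
# Hypothetical helper: set name="value" in a loader.conf-style file,
# replacing an existing assignment of that name or appending a new one.
set_loader_tunable() {
    file=$1
    name=$2
    value=$3
    # Escape the dots in the name so grep matches them literally.
    pattern=$(printf '%s' "$name" | sed 's/\./\\./g')
    tmp=$(mktemp) || return 1
    if [ -f "$file" ]; then
        # Keep every line except earlier assignments of this tunable.
        grep -v "^${pattern}=" "$file" > "$tmp"
    fi
    printf '%s="%s"\n' "$name" "$value" >> "$tmp"
    # Copy back instead of mv so the file keeps its owner and permissions.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

On the firewall this would be called as `set_loader_tunable /boot/loader.conf.local kern.ipc.nmbclusters VALUE`, with VALUE being whatever you settle on; loader tunables only take effect at the next boot.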
All in all, everything seems to be working fine, and I can finally get the ASA5505s out of my network.
olgeni (Newbie, Posts: 10, Karma: 2)
Re: Observations on 19.1.1 (including possible CARP fix)
Reply #1 on: February 21, 2019, 03:13:18 pm
> I enabled the Tunnel Isolation option
It confuses the PA firewall even more if multiple right networks are configured... Setting reuse_ikesa alone is enough.
One more note: I could not enable ecp384 from the UI and had to change my ipsec.conf to add it like this:
esp = aes256-sha512-ecp384,aes192-sha512,aes128-sha512!
It looks like a few algorithms are missing from the UI.
olgeni (Newbie, Posts: 10, Karma: 2)
Re: Observations on 19.1.1 (including possible CARP fix)
Reply #2 on: February 23, 2019, 05:14:35 pm
More about the ttydcd state.
In OPNsense I have getty running like this:
/usr/libexec/getty std.115200 ttyu0
But on a stock installation of FreeBSD I have this:
/usr/libexec/getty 3wire ttyu0
According to /etc/gettytab, 3wire's description is "Entries for 3-wire serial terminals. These don't supply carrier, so clocal needs to be set, and crtscts needs to be unset."
Maybe that's related to the ttydcd state?
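One way to test that theory is to switch the ttyu0 entry in /etc/ttys from std.115200 to 3wire and have init reread the file (kill -HUP 1). The sketch below does the substitution on a ttys(5)-format file. set_getty_type is a hypothetical helper, it assumes the usual `tty "/usr/libexec/getty TYPE" ...` layout, and it has not been tried on OPNsense, which may regenerate /etc/ttys from its own console settings and undo a manual edit.

```shell
#!/bin/sh
# Hypothetical helper: change the gettytab entry used by one tty in a
# ttys(5)-format file, e.g. "std.115200" -> "3wire" for ttyu0.
set_getty_type() {
    file=$1
    tty=$2
    type=$3
    tmp=$(mktemp) || return 1
    # Replace only the word after "/usr/libexec/getty " on the line that
    # starts with the given tty name; all other lines are left untouched.
    sed "s|^\(${tty}[[:space:]][[:space:]]*\"/usr/libexec/getty \)[^\"]*\"|\1${type}\"|" \
        "$file" > "$tmp" && cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

On a test box this would be `set_getty_type /etc/ttys ttyu0 3wire && kill -HUP 1`, after which the next getty spawned on ttyu0 should use the 3wire entry.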
mimugmail (Hero Member, Posts: 6766, Karma: 494)
Re: Observations on 19.1.1 (including possible CARP fix)
Reply #3 on: February 23, 2019, 06:07:44 pm
CARP/VLAN/LAGG works; I just installed one last week. Can you dump the CARP packets on both sides? Could a switch in between be doing some crazy IGMP snooping, or spanning tree be blocking the ports on startup?
WWW: www.routerperformance.net
Support plans: https://www.max-it.de/en/it-services/opnsense/
Commercial Plugins (German): https://opnsense.max-it.de/
olgeni (Newbie, Posts: 10, Karma: 2)
Re: Observations on 19.1.1 (including possible CARP fix)
Reply #4 on: February 25, 2019, 11:22:16 am
Now it seems to be working fine; probably LACP on the switch was taking some time to negotiate. Since the sysctl change, all masters and backups have been on the correct side.
mimugmail (Hero Member, Posts: 6766, Karma: 494)
Re: Observations on 19.1.1 (including possible CARP fix)
Reply #5 on: February 25, 2019, 02:09:06 pm
What exactly did you change? I'm collecting user reports to add some hints about LACP problems to the official docs.
olgeni (Newbie, Posts: 10, Karma: 2)
Re: Observations on 19.1.1 (including possible CARP fix)
Reply #6 on: February 25, 2019, 04:06:04 pm
I set net.inet.carp.senderr_demotion_factor=0 at boot time so that the host would not demote itself if the first CARP transmissions fail, but I had to do it from a script because there's no tunable for that.
mimugmail (Hero Member, Posts: 6766, Karma: 494)
Re: Observations on 19.1.1 (including possible CARP fix)
Reply #7 on: February 26, 2019, 11:25:21 am
You can add the tunable manually via UI, no problem.
So, to summarize: you have a problem at startup, when the port/VLAN might not be ready yet, so you get a send error that increases the demotion value. After a few moments the CARP packets see each other, so you just use this script for a startup issue, no matter whether it is VLAN-, spanning-tree- or LACP-related.
Correct?
olgeni (Newbie, Posts: 10, Karma: 2)
Re: Observations on 19.1.1 (including possible CARP fix)
Reply #8 on: February 26, 2019, 12:55:40 pm
That's true, I didn't even see the "add" button.
I have now moved it to the tunables and it works all the same.
All the rest is correct: it seems to be a startup issue, and I used the sysctl as a quick fix.
mimugmail (Hero Member, Posts: 6766, Karma: 494)
Re: Observations on 19.1.1 (including possible CARP fix)
Reply #9 on: February 26, 2019, 04:11:57 pm
Thanks for reporting, I'll try to add it to the docs section.