Messages - mfedv

#1
Hi,

while playing around with WireGuard on a CARP HA installation, I tried the CARP syshook script from https://forum.opnsense.org/index.php?topic=25993.msg129864#msg129864.

In many cases, the syshook script gets properly invoked and the wireguard-go process is started/stopped accordingly.

But when using "Temporarily Disable CARP" (Interfaces / Virtual IPs / Status) on the current MASTER, the script does not get called, and wireguard-ko keeps running on the previous MASTER while also being started on the previous BACKUP. When clicking "Enable CARP" again, the script is called first for the "BACKUP" state, then for the "MASTER" state, in short succession.

So you can't really rely on CARP hook invocations alone; you would also need additional periodic monitoring (e.g. via cron), which is rather cumbersome.

If devd does not fire in this situation, perhaps carp_status.php could simulate the event?
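A minimal sketch of such a cron-driven check, in Python, assuming the `carp:` line format shown in the ifconfig output quoted in post #8 (`carp: MASTER vhid 4 advbase 1 advskew 0`). The function names and the "run only while MASTER" policy are hypothetical; the script would be fed the output of `/sbin/ifconfig` and start or stop the service according to the returned action.

```python
import re

# Matches lines like "carp: MASTER vhid 4 advbase 1 advskew 0" in FreeBSD ifconfig output.
CARP_LINE = re.compile(r"^\s*carp:\s+(\w+)\s+vhid\s+(\d+)")


def carp_states(ifconfig_output: str) -> dict[int, str]:
    """Map each vhid to its reported CARP state (MASTER, BACKUP, INIT)."""
    states: dict[int, str] = {}
    for line in ifconfig_output.splitlines():
        match = CARP_LINE.match(line)
        if match:
            states[int(match.group(2))] = match.group(1)
    return states


def desired_action(states: dict[int, str], vhid: int) -> str:
    """Hypothetical policy: the service should run only while `vhid` is MASTER."""
    return "start" if states.get(vhid) == "MASTER" else "stop"
```

Because it compares the actual state on every run, such a check does not depend on a hook having fired, at the cost of reacting only once per cron interval.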

Regards
Matthias

(Even if WireGuard is not actually suited for HA failover, these missing hook notifications are a more general problem, not restricted to WireGuard alone.)
#2
Quote from: mimugmail on September 29, 2021, 04:10:23 PM
You can also go to System : Trust : Authorities, remove the old CA which expires today, then go to LE plugin and renew all, then go to your services and look if they are correctly linked and restart.

That's a good one.
Removing the expiring R3 cert was the first thing I tried, but with all my LE certs gone from System:Trust:Certificates I panicked and grabbed a backup config. Did not think of renewing them at that point.

Will be a busy day at letsencrypt when everybody renews all of their certs on the same day :-)

Matthias
#3
Hi,

opnsense/acme still uses an old Let's Encrypt R3 intermediate
certificate, pointing to a root CA (DST Root CA X3) that is about to
expire tomorrow (Sep. 30):

    https://letsencrypt.org/docs/dst-root-ca-x3-expiration-september-2021/

Ubuntu decided to jump ahead and removed the DST Root CA X3 already in
yesterday's update. While Firefox uses its own truststore and thus still
accepts these certificates, many cli commands on Ubuntu now don't accept
them anymore. Lost some of tonight's backups (restic) because of that.
Other, non-Ubuntu systems might show the same problems on/after
September 30.

Old trust path:

  local cert
    -> C = US, O = Let's Encrypt, CN = R3
       ->  O = Digital Signature Trust Co., CN = DST Root CA X3

New trust path:

  local cert
    -> C = US, O = Let's Encrypt, CN = R3 (same entity as above, but different signature)
       ->  C = US, O = Internet Security Research Group, CN = ISRG Root X1

In System / Trust / Authorities I had both versions of the R3
intermediate certificate, but all of the local certs referred to the
old, now untrusted one.
It does not seem to be possible in the GUI to just remove the old
certificate without also removing all the local certs referring to it.

I had to resort to manually editing /conf/config.xml, replacing all
occurrences of
    <caref>600b59276e541</caref>
with
    <caref>60ac21f018263</caref>
and then rebooting (there is probably some less disrupting way).
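The manual edit above could be scripted roughly like this. It is only a sketch with hypothetical function names: the IDs are this post's example values and will differ on every installation, the `config.xml.bak` backup name is an arbitrary choice, and nothing is rebooted or restarted.

```python
from pathlib import Path


def replace_caref(config_xml: str, old_id: str, new_id: str) -> tuple[str, int]:
    """Replace every <caref>old_id</caref> with <caref>new_id</caref>.

    Returns the rewritten text and the number of replacements made.
    """
    old_tag = f"<caref>{old_id}</caref>"
    new_tag = f"<caref>{new_id}</caref>"
    return config_xml.replace(old_tag, new_tag), config_xml.count(old_tag)


def rewrite_config(path: Path, old_id: str, new_id: str) -> int:
    """Rewrite the file in place, keeping a copy of the original next to it."""
    text = path.read_text(encoding="utf-8")
    new_text, count = replace_caref(text, old_id, new_id)
    if count:
        backup = path.with_name(path.name + ".bak")  # hypothetical backup name
        backup.write_text(text, encoding="utf-8")
        path.write_text(new_text, encoding="utf-8")
    return count
```

For example, `rewrite_config(Path("/conf/config.xml"), "600b59276e541", "60ac21f018263")` would return the number of certificates that were re-pointed to the new R3 authority.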

Note: the IDs _will_ be different on every installation. You can find
the IDs for your installation on the command line using

    # grep -B 1 '<descr>R3 ' /conf/config.xml
        <refid>5fd0f040a02cd</refid>
        <descr>R3 (Let's Encrypt)</descr>
    --
        <refid>6093156cc2158</refid>
        <descr>R3 (ACME Client)</descr>

The one labeled "ACME Client" will be the current version of the R3
intermediate certificate.
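The same lookup can be done with Python's `xml.etree.ElementTree` instead of grep, assuming (as the grep output suggests) that each authority is stored as a `<ca>` element with `<refid>` and `<descr>` children. `r3_authorities` is a hypothetical helper, not part of OPNsense.

```python
import xml.etree.ElementTree as ET


def r3_authorities(config_xml: str) -> list[tuple[str, str]]:
    """Return (refid, descr) for every <ca> whose description starts with 'R3 '."""
    root = ET.fromstring(config_xml)
    result = []
    for ca in root.iter("ca"):
        descr = ca.findtext("descr", default="")
        if descr.startswith("R3 "):
            result.append((ca.findtext("refid", default=""), descr))
    return result
```

Parsing the XML instead of grepping avoids depending on `<refid>` appearing exactly one line before `<descr>`.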


You might want to check your OPNsense installations, too, if you
use the ACME plugin.

Regards
Matthias
#4
Hi,

the 21.7.3 announcement at

    https://forum.opnsense.org/index.php?topic=24864.0

fails to mention haproxy, but 21.7.3 updates haproxy to 2.2.17, which
contains a fix for the recently discovered HTTP Smuggling vulnerability
(CVE-2021-40346):

    https://jfrog.com/blog/critical-vulnerability-in-haproxy-cve-2021-40346-integer-overflow-enables-http-smuggling/

One more reason to install the upgrade (btw, thanks for all the good
work!)

Matthias
#5
(21.1.7)

Hi,

being able to store custom live view filters as templates is a very
welcome feature (thanks!).

But I cannot get these templates to sync to the other firewall in an HA
setup. Do I need to tick some option in System / High Availability /
Settings? Or is HA sync for these templates not (yet) implemented?


Regards
Matthias
#6
Hi,

since the Let's Encrypt certificate (acme) plugin works for me most of
the time, I rarely find myself working with it. So I keep forgetting the
exact meaning of the buttons on this overview page, and I always wished
it would give a hint when hovering over the buttons.

Tried to imitate the tooltip style of System / Trust / Certificates, but
failed. So this is just using a title="..." attribute instead.

Regards
Matthias
#7
Hi,

this fold is much more elegant, I had missed the optional third argument
to legacy_interface_deladdress().

Yes, the fixed 2 second delay is almost like cheating. It only works for
me and perhaps most other installations because the default carp
interval is 1 second, so after 2 seconds (+ gui network latency) carp
should be settled.

Not sure what a proper solution would look like, there are so many
tuning knobs to consider.

Perhaps postpone the client redirect until
    1 + min(configured carp intervals)
seconds have passed, then waiting for some limited additional time after
that for a consistent carp state. But carp state might even become
inconsistent while we wait because of some real failure.

Or have a few (3 - 5) client refreshes, live-watching CARP state changes.
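A rough Python sketch combining both ideas, assuming the CARP advertisement interval is `advbase + advskew/256` seconds as described in FreeBSD's carp(4), and treating "consistent" as "all vhids report the same state". `wait_for_settle`, `get_states` and the timing parameters are hypothetical; `sleep` is injectable so the logic can be exercised without real waiting.

```python
import time
from typing import Callable, Mapping


def carp_interval(advbase: int, advskew: int) -> float:
    """Advertisement interval in seconds (carp(4): advbase + advskew/256)."""
    return advbase + advskew / 256


def redirect_delay(vhids: list[tuple[int, int]]) -> float:
    """1 + min(configured CARP intervals), for (advbase, advskew) pairs."""
    return 1 + min(carp_interval(base, skew) for base, skew in vhids)


def is_consistent(states: Mapping[int, str]) -> bool:
    """Hypothetical criterion: every vhid reports the same state."""
    return len(set(states.values())) <= 1


def wait_for_settle(get_states: Callable[[], Mapping[int, str]], delay: float,
                    extra: float, poll: float = 0.5,
                    sleep: Callable[[float], None] = time.sleep) -> bool:
    """Wait `delay`, then poll for up to `extra` seconds; True if states agree."""
    sleep(delay)
    waited = 0.0
    while True:
        if is_consistent(get_states()):
            return True
        if waited >= extra:
            return False  # give up; state may be inconsistent due to a real failure
        sleep(poll)
        waited += poll
```

Returning `False` instead of waiting forever reflects the concern above that CARP state may stay inconsistent because of a real failure.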

Regards
Matthias Ferdinand
#8
(20.7.8, also older opnsense versions)

Hi,

when using any IPv6 address for CARP Virtual IPs, clicking "Temporarily
Disable CARP" (Interfaces / Virtual IPs / Status) on the MASTER machine
produces the following error message in the GUI:

    CARP has detected a problem and this unit has been demoted to BACKUP status.
    Check link status on all interfaces with configured CARP VIPs.

and the following in /var/log/system.log:

    Jan 20 22:08:32 opnsense1 opnsense[10333]: /carp_status.php: The command `/sbin/ifconfig 'vtnet2' -alias '2001:db8:381c:abc::3'' failed to execute
    Jan 20 22:08:32 opnsense1 opnsense[10333]: /carp_status.php: The command `/sbin/ifconfig 'vtnet3' -alias '2001:db8:391c:abc::3'' failed to execute

and ipv6 carp addresses stay configured:

    # ifconfig | grep -e "^vtnet" -e vhid | grep -B 1 vhid
    vtnet2: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
            inet6 2001:db8:381c:abc::3 prefixlen 64 vhid 4
            carp: INIT vhid 4 advbase 1 advskew 0
    vtnet3: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
            inet6 2001:db8:391c:abc::3 prefixlen 64 vhid 6
            carp: INIT vhid 6 advbase 1 advskew 0


the patch below fixes this, copying ipv6 handling from the 'ipalias'
case to the 'carp' case (ifconfig syntax is different for ipv4 and
ipv6).
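The underlying difference can be illustrated with a small Python sketch (not the PHP patch itself): FreeBSD's ifconfig(8) interprets the address as `inet` unless an address family is given, so an IPv6 address must be preceded by `inet6`. The function name `deladdress_cmd` is hypothetical; it only builds the argument list and runs nothing.

```python
import ipaddress


def deladdress_cmd(interface: str, address: str) -> list[str]:
    """Build an ifconfig argument list that removes `address` from `interface`."""
    family = "inet6" if ipaddress.ip_address(address).version == 6 else "inet"
    # e.g. /sbin/ifconfig vtnet2 inet6 2001:db8:381c:abc::3 -alias
    return ["/sbin/ifconfig", interface, family, address, "-alias"]
```

Passing an argument list (rather than a shell string) also sidesteps the quoting seen in the logged commands.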

I also added a 2 second delay before redirecting the client browser to
refresh the status overview, to allow CARP to settle. This avoids
displaying a transient state (it sometimes shows a mix of "BACKUP" and
"MASTER" states).

Regards
Matthias Ferdinand
#9
Hi,

just a reminder: to activate all library fixes (e.g. openssl) you must reboot the firewall (or manually restart the relevant services) after the upgrade.

To check which processes need restarting:

    root@opnsense2:~ # procstat -a -v | awk '$10=="vn" && NF==10 { print $1; }' | sort -u | xargs -r ps -wwd -p
      PID TT  STAT    TIME COMMAND
     1417  -  S    0:50.21 /usr/local/sbin/lighttpd -f /var/etc/lighttpd-acme-challenge.conf
     3912  -  Is   0:00.04 /usr/local/libexec/ipsec/charon --use-syslog
    17403  -  Is   0:05.39 sshd: /usr/local/sbin/sshd [listener] 0 of 10-100 startups (sshd)
    45947  -  Is   0:00.33 /usr/local/sbin/unbound -c /var/unbound/unbound.conf
    56509  -  Is   0:01.40 nginx: master process /usr/local/sbin/nginx
    57741  -  I    0:00.80 - nginx: worker process (nginx)
    69408  -  Ss   1:53.87 /usr/local/sbin/ntpd -g -c /var/etc/ntpd.conf -p /var/run/ntpd.pid
    77425  -  Ss   0:16.05 /usr/local/sbin/openvpn --config /var/etc/openvpn/server1.conf
    77917  -  Ss   1:43.32 /usr/local/sbin/haproxy -q -f /usr/local/etc/haproxy.conf -p /var/run/haproxy.pid
    93445  -  Ss   1:32.28 php-fpm: master process (/usr/local/etc/php-fpm.conf) (php-fpm)
    33559  -  I    0:00.00 - php-fpm: pool webgui (php-fpm)
    49267  -  I    0:00.00 - php-fpm: pool www (php-fpm)
    56089  -  I    0:00.00 - php-fpm: pool webgui (php-fpm)
    90619  -  I    0:00.00 - php-fpm: pool www (php-fpm)
    95724  -  I    0:00.00 /usr/local/sbin/syslog-ng -f /usr/local/etc/syslog-ng.conf -p /var/run/syslog-ng.pid
    48476  -  Ss   4:33.02 - /usr/local/sbin/syslog-ng -f /usr/local/etc/syslog-ng.conf -p /var/run/syslog-ng.pid
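For reference, the awk filter keeps lines of `procstat -v` output whose 10th column (TP, the mapping type) is `vn` (a vnode, i.e. a file) and that have only 10 fields, meaning the trailing PATH column is empty. Presumably this happens when the mapped file, such as a shared library, was replaced or deleted by the upgrade. The same filter in Python, as a sketch with a hypothetical function name:

```python
def stale_mapping_pids(procstat_output: str) -> list[int]:
    """PIDs with vnode-backed mappings whose path can no longer be resolved."""
    pids = set()
    for line in procstat_output.splitlines():
        fields = line.split()
        # Columns: PID START END PRT RES PRES REF SHD FLAG TP [PATH]
        # Same test as awk '$10=="vn" && NF==10'; the header has 11 fields.
        if len(fields) == 10 and fields[9] == "vn":
            pids.add(int(fields[0]))
    return sorted(pids)
```

The resulting PIDs correspond to what `sort -u | xargs -r ps -wwd -p` then displays.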


#10
(opnsense 20.7.5)

Hi,

tried to set up IPsec parameters better suited for my old atom netbook
which lacks aes-ni (hardware support for AES). Without AES in hardware,
the best crypto suite for Authenticated Encryption would be
ChaCha20-Poly1305.

It is not available in the GUI, but I could manually compose a
strongswan connection definition at
    /usr/local/etc/ipsec.opnsense.d/xyz.conf
The GUI shows this connection at VPN / IPsec / Status Overview (nice!)

Establishing an IKE_SA (using AES) works, but setup of CHILD_SA (using
ChaCha20) fails on opnsense with this message:

    algorithm CHACHA20_POLY1305 not supported by kernel!

I found a message from 2015 that HardenedBSD removed ChaCha20:
    https://hardenedbsd.org/article/shawn-webb/2015-02-05/removal-chacha20-import

Anybody know of plans to add it back?


Regards
Matthias
#11
on opnsense 20.7.3, in VPN / IPsec / Tunnel Settings:

using AH instead of ESP leads to a syntax error in
/usr/local/etc/ipsec.conf:

  ah = -modp2048!

the selected hash algorithm is missing. There is a typo in
/usr/local/etc/inc/plugins.inc.d/ipsec.inc, where the DH group config
overwrites the config string instead of appending to it.
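The difference between overwriting and appending can be shown with a small Python illustration. The real code is PHP in ipsec.inc; the function names and the `sha256` hash keyword here are chosen for illustration only.

```python
from typing import Optional


def buggy_ah_proposal(hash_alg: str, dhgroup: Optional[str]) -> str:
    proposal = hash_alg
    if dhgroup:
        proposal = f"-{dhgroup}"   # bug: overwrites the hash algorithm
    return proposal + "!"


def ah_proposal(hash_alg: str, dhgroup: Optional[str]) -> str:
    proposal = hash_alg
    if dhgroup:
        proposal += f"-{dhgroup}"  # fix: append the DH group to the hash
    return proposal + "!"
```

The buggy variant yields exactly the `-modp2048!` seen in the generated ipsec.conf.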

patch is attached


Also, for AH connections, the tunnel settings overview displays
encryption settings (not used with AH). Not sure if this is a bug in the
display code (not checking for AH) or if these settings should not be
put into XML config for AH connections.

Best regards
Matthias Ferdinand
#12
on 20.1.8_1

TL;DR:

lines 386 and 387 in system_usermanager.php should be flipped
(create unix system user before changing its group settings)

--------------------

long story:

In System / Access / Users (system_usermanager.php), adding a new user
with activated entries in "Group Memberships" fails to add the
corresponding new unix system user to the respective unix system group
in /etc/group.

To remedy this, you would need to remove group membership in the GUI,
"Save", add group membership again, "Save".

Calls to "local_user_set_groups()" and "local_user_set()" in
system_usermanager.php apparently have the wrong order, trying to modify
group settings for the unix system user before it even exists:

/usr/local/www/system_usermanager.php:
     36 function get_user_privdesc(& $user)
    ...
    386             local_user_set_groups($userent, $pconfig['groups']);
    387             local_user_set($userent);
    388             write_config();

If the user has not existed before, this results in an error message in
/var/log/system.log:

    Jul 22 17:22:42 opnsense1 opnsense: /system_usermanager.php: The command '/usr/sbin/pw 'groupmod' 'admins' -g '1999' -M '0,2000,2006,2007,2010,2011,2012,2014,2015,2016'' returned exit code '67', the output was 'pw: user `2016' does not exist'


In a HA config, this bug is _not_ propagated to slave firewalls, they
will do the right thing:

master, wrong group settings:
    root@opnsense1:~ # grep testuser9 /etc/passwd /etc/group
    /etc/passwd:testuser9:*:2016:65534:testuser9:/home/testuser9:/bin/sh

slave, correct group settings:
    root@opnsense2:~ # grep testuser9 /etc/passwd /etc/group
    /etc/passwd:testuser9:*:2016:0:testuser9:/home/testuser9:/bin/sh
    /etc/group:admins:*:1999:root,mfedv,testuser9

Flipping lines 386 and 387 in system_usermanager.php is sufficient to
get the unix system user added to the right group in /etc/group the
first time.
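The ordering problem can be reproduced with a toy model in Python. `ToyAccounts` and `save_new_user` are hypothetical stand-ins for the unix account files and the PHP calls `local_user_set()` / `local_user_set_groups()`; only the call order is modelled.

```python
class ToyAccounts:
    """Toy stand-in for /etc/passwd and /etc/group (hypothetical)."""

    def __init__(self) -> None:
        self.users: dict[int, str] = {}          # uid -> login name
        self.groups: dict[str, set[str]] = {}    # group name -> member names

    def user_set(self, uid: int, name: str) -> None:
        self.users[uid] = name

    def user_set_groups(self, uid: int, group_names: list[str]) -> None:
        if uid not in self.users:
            # mirrors the pw error from system.log
            raise LookupError(f"pw: user `{uid}' does not exist")
        for group in group_names:
            self.groups.setdefault(group, set()).add(self.users[uid])


def save_new_user(db: ToyAccounts, uid: int, name: str, groups: list[str],
                  user_first: bool = True) -> list[str]:
    """Run both steps; errors are collected (logged), not fatal."""
    errors: list[str] = []
    steps = [lambda: db.user_set(uid, name),
             lambda: db.user_set_groups(uid, groups)]
    if not user_first:
        steps.reverse()  # the order currently found at lines 386/387
    for step in steps:
        try:
            step()
        except LookupError as exc:
            errors.append(str(exc))
    return errors
```

With `user_first=False`, the first save only logs the error and leaves the group untouched; a second save succeeds because the user exists by then.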

Note however that in the special case of the "admins" group, the "gid"
field in /etc/passwd still has different values on master node
(nobody=65534) and slave node (wheel=0).

Hitting "Save" again on master node (even with no modifications at all)
then brings the "gid" to 0 on the master node, too.

Not sure though if gid="wheel" is needed/warranted, I have been using
(GUI) group "admins" ssh accounts on cluster nodes with different gid
settings for some time and have not noticed any difference in behaviour.


Regards
Matthias Ferdinand
#13
Hi Ad,

thank you, that was quick, I'm impressed!

Just one thing:
> o remove <kill_states/> from our default config, since it was
> evaluated as empty (feature enabled), we might as well remove the
> option to reach the same effect.
It does indeed achieve the same effect.
I just think this is an unpleasant and surprising effect to have
enabled by default.

Regards
Matthias
#14
Hi Ad,

> If you open a PR on GitHub we'll take a look.
Alas, I don't have a GitHub account; I tried signing up again, to no
avail ([1]).

> We shouldn't try to fix the consumers in this case, since we use the
> same construction on multiple places, but changing the default config
> and changing the initial isset() is fine by me.
"changing the default config": yes, but existing configs (when
containing "<kill_states/>") will need to be modified too, otherwise
existing installations would continue with incorrect behaviour.

"the initial isset()": not sure what this means

And then there is still "Bug 2) flushing state on ruleset update".

Best regards
Matthias Ferdinand

[1]: https://forum.opnsense.org/index.php?topic=16220.msg74460#msg74460
#15
Hi Ad,

thanks for replying.

Yes, exactly: "only evaluated when kill_states is empty".

It is just not the right condition to use here.

Instead it should be evaluated if and only if the attribute does not
exist at all (!array_key_exists()).

Perhaps the actual bug lies in the fact that the default config contains
"<kill_states/>" (variant a) instead of "<kill_states>1</kill_states>"
(variant c).
But with all the existing installations, this cannot be easily rolled
back and inc/filter.inc should be modified to correctly handle variant
a).

In Firewall / Settings / Advanced, variant a) counts as "checked", i.e.
the same as kill_states=1.

In lib/filter.inc the same variant currently counts as "empty", i.e.
"not checked", so states get flushed when they shouldn't be.

Unchecking "kill_states" in Firewall / Settings / Advanced completely
removes the "kill_states" attribute, it does not just leave it empty.
And only with this (attribute-does-not-exist) setting state flushes
should happen.
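To make the cases concrete, here is a Python sketch using `xml.etree.ElementTree`. It only approximates the PHP logic: `flush_states_current` models the "empty counts as not checked" test, `flush_states_proposed` the `!array_key_exists()` test suggested above. The function names and the parent element passed in are hypothetical.

```python
import xml.etree.ElementTree as ET


def flush_states_current(parent: ET.Element) -> bool:
    # findtext: None if absent, "" for <kill_states/>; both count as "empty"
    return not (parent.findtext("kill_states") or "")


def flush_states_proposed(parent: ET.Element) -> bool:
    # flush only if the element does not exist at all (GUI checkbox unchecked)
    return parent.find("kill_states") is None
```

With `<kill_states/>` (variant a) the current test flushes while the proposed one does not; with `<kill_states>1</kill_states>` (variant c) neither flushes; only a missing element flushes in both.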