Messages - eugenmayer

#1
data-ciphers-fallback does work, we tried that. And this was the issue here.
The reason 23.1.3 failed initially was that cipher was replaced by data-ciphers (only), which will not work for OpenVPN 2.3 clients, if any exist (cipher negotiation is only available from 2.4+). An additional issue is that the default allowed ciphers changed with server 2.5, blocking AES-128-CBC - and this was the other reason why people with 2.4 clients could potentially not connect with the old 23.1.3 (pre-patched).

AFAICS even with 23.1.7, cipher is still used in the server config - we removed that by setting it to 'none' and using data-ciphers in the custom section instead, with a list of the ciphers our clients need (and thus a road to upgrading ciphers). This allows all our clients to connect and would be the proper fix for the variant introduced in 23.1.3 (since, as stated, cipher itself is deprecated and will be removed with 2.7 AFAIR).
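
A minimal sketch of the rewrite described above, assuming the server config is plain text with one directive per line. The function name, the sample config and the cipher list are hypothetical and only for illustration; this is not how OPNsense generates its config. It drops a legacy `cipher` line, appends a `data-ciphers` list and keeps the old cipher as `data-ciphers-fallback` for clients that cannot negotiate.

```python
def migrate_cipher_directive(config_text, data_ciphers, fallback=None):
    """Replace a legacy `cipher` line with `data-ciphers` (+ optional fallback).

    Hypothetical helper: `data_ciphers` is the list of ciphers the clients need,
    `fallback` overrides the cipher used for non-negotiating (e.g. 2.3) clients.
    """
    kept = []
    legacy = None
    for line in config_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cipher":
            # Remember the old value so it can serve as the fallback cipher.
            legacy = parts[1] if len(parts) > 1 else None
            continue
        if parts and parts[0] in ("data-ciphers", "data-ciphers-fallback"):
            continue  # re-emitted below
        kept.append(line)

    kept.append("data-ciphers " + ":".join(data_ciphers))
    chosen = fallback or legacy
    if chosen and chosen.lower() != "none":
        kept.append("data-ciphers-fallback " + chosen)
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    old = "dev tun\ncipher AES-128-CBC\nproto udp\n"
    print(migrate_cipher_directive(
        old, ["AES-256-GCM", "AES-128-GCM", "AES-128-CBC"]))
```

Keeping AES-128-CBC in the list is what lets older clients connect for now; removing it from the list later is the upgrade path.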

The other issue I mentioned, that the VPN server is crashing under 2.6.3, is something new and not related - I just was not aware of that while I was investigating. It is a new issue, related to the 2.6.x upgrade that came with 23.1.7.

I created https://forum.opnsense.org/index.php?topic=34052.0 to separate the issues.



#2
AFAICS this is related to the upgrade to OpenVPN 2.6.3, which is included in 23.1.7 (compared to 2.5.8 in 23.1.6) - the server crashes for us (the entire daemon) when Linux clients connect (about 100). 2.5.8 works just fine combined with 23.1.6.

One has to downgrade OPNsense to 23.1.6 to use OpenVPN 2.5.8, since it seems like 23.1.7 changed the config for OpenVPN (so it is compatible with 2.6.3?)

I did not find any logs on how the OpenVPN server crashed, even with verb 9. The daemon was just killed after a couple of clients connected OR after a specific time (30 seconds).
#3
AFAICS this is related to the upgrade to OpenVPN 2.6.3, which is included in 23.1.7 (compared to 2.5.8 in 23.1.6) - the server crashes for us (the entire daemon) when Linux clients connect (about 100). 2.5.8 works just fine combined with 23.1.6.

One has to downgrade OPNsense to 23.1.6, since it seems like 23.1.7 changed the config for OpenVPN (so it is compatible with 2.6.3?)
#4
For me, upgrading from 23.1.6 to 23.1.7 broke the OpenVPN authentication again. I have seen that the patch keeps a legacy feature for cipher.

From the logs I would say it is the same / a very similar issue.
#5
I have the same issue with the exact same upgrade path.

Since I am using LDAP for authentication, it cannot be limited to local auth. I am also using Remote Access "User auth". So although we have the same issue, we have different configurations.

I triple-checked that the LDAP authentication is working under Access, also using the tester.

Downgraded via

opnsense-revert -r 23.1.1_2 opnsense


and everything is working again.
#6
Also running on Proxmox 5.x, on a lot of boxes.

This is my VM config (maybe this helps):

boot: dcn
bootdisk: scsi0
cores: 2
cpu: host
ide2: none,media=cdrom
memory: 2400
name: gateway
net0: virtio=BYYYYYYYYYYYYYYYYYYY,bridge=vmbr30
net1: virtio=ZZZZZZZZZZZZZZZZZZ,bridge=vmbr0
net2: virtio=BBBBBBBBBBBBBBBBBB,bridge=vmbr0
numa: 0
onboot: 1
ostype: other
protection: 1
scsi0: local:500/vm-500-disk-1.qcow2,size=10G
scsihw: virtio-scsi-pci
serial0: socket
smbios1: uuid=XXXXXXXXXXXXXXXXXXXXXXXXX
sockets: 1
startup: order=1
tablet: 0
vga: serial0


The host is an Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz

I got this KVM setup on about 7 boxes (on different hardware), never encountered what you experienced. Generally as
#7
We are currently investigating a crash of OPNsense every single week on Saturday between 3-6 am.

While doing so, we looked at the health data. While the traffic health report is available for all 3 years, the system metrics are not.

In fact, after a normal reboot of the box, all system rrd data is lost; the last data available then is older than Aug 2018. So it seems to have worked back in the day, but since then every reboot of the box deletes the system rrd data - while traffic / packets are still there.

I double checked that the 2 checkboxes to

/var RAM disk    Use memory file system for /var
/tmp RAM disk    Use memory file system for /tmp

are not checked - so i have no RAM disk for rrd data.

Any idea how this happens? An interesting fact is that Aug 06 2018 was the release date of 18.7 - and I most probably upgraded instantly. So that upgrade could have caused this ( https://opnsense.org/opnsense-18-7-released/ ) ..

I am currently on 19.1.2, cannot go higher due to the LDAP bugs introduced in 19.1.3+

Thanks for any hints
#8
Maybe some stats, i run an external uptime tracker so i have at least some timings of when the boxes are going down for now - at least fully for the AWS box ( see screenshot )

It seems like the "every 2 weeks" is not perfectly right; it was OK for e.g. nearly 1 month now, then it crashed.

The pattern for the KVM boxes is 100% predictable though: every week, on Saturday.
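
A minimal sketch for checking such a pattern against the uptime tracker's timings, assuming they can be exported as ISO 8601 timestamps. The function name and the sample timestamps are hypothetical, not real data from these boxes. It simply counts outages per weekday.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def outages_by_weekday(timestamps):
    """Count outage start times per weekday name (locale-independent)."""
    counts = Counter()
    for ts in timestamps:
        # datetime.weekday() returns 0 for Monday ... 6 for Sunday.
        counts[WEEKDAYS[datetime.fromisoformat(ts).weekday()]] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample values, only to show the call.
    sample = [
        "2018-12-01T04:10:00",
        "2018-12-08T03:55:00",
        "2018-12-15T05:20:00",
        "2018-12-19T14:00:00",
    ]
    for day, n in outages_by_weekday(sample).most_common():
        print(day, n)
```

A count that piles up on one weekday points at something scheduled (cron, backups, renewals) rather than load.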

---

Also something interesting: out of those 5 KVM boxes, only 2 run HAproxy - the 2 which are crashing. I also migrated away from HAproxy on the other 3, and it seems like this might be the reason they stopped crashing.

The AWS box has HAproxy too - also crashing.

---

Could that be HAproxy related or maybe something with the ACME plugin which runs a companion there? Not sure, do not want to misguide, but it seems like an interesting pattern here.

- when does the ACME task usually run? ( the ones in cron are rather daily )
- are there any HAproxy related tasks?
#9
Hello,

I have been fighting this for some time now and I am really out of ideas.

Setup
- I have about 5 KVM based OPNsense boxes, 1 AWS and 2 apu2c2 boxes. (18.7.8)
- Those 5 KVM boxes are basically identical, running: DHCP, Unbound, OpenVPN, Tinc, HAproxy, ACME (18.7.8)
- 1 AWS is running DHCP, Unbound, OpenVPN, HAproxy, ACME, webproxy (18.1 latest)

Problem
2 of those 5 KVM boxes keep stalling on Saturday every single week (for 5 or more weeks now). Right now it is always the same boxes; it used to be random among those 5.

The AWS box seems to stall every week, also Saturday.

What i mean by "stall":
It seems some traffic is still passing through the OPNsense box: it looks like NAT is still working, as are stateful connections. However, the boxes behind OPNsense can no longer access the WAN (outbound issue?)

Also i cannot connect using SSH or terminal, in both cases i can enter the user, but then instead of asking for the password - it just "hangs" there.

What I deduced
For several weeks now, after I detected that the auto-upgrade did not work and the boxes were stuck at 18.7.4, I have upgraded them to 18.7.7 (then .8). Now it is always the same boxes that get stuck. I suspected the upgrade, so I deactivated the upgrade cron tasks - but this week no update was available, and still those 2 stalled, plus the AWS box.

I also suspected the KVM boxes to "stall" on Proxmox backups; I disabled them, but that did not help either. Also, since the AWS box is not backed up that way at all, I expect that was not the right assumption anyway.

Also, both 18.1 and 18.7 boxes are affected by this - hosted on totally different hypervisors (AWS / KVM on Proxmox).

While the KVM boxes all have a very similar duty, the AWS box is rather different and is still affected.


Help
Could anyone help me get to the bottom of this? It is becoming a real blocker for me, in the sense that I might also consider migrating away if I cannot solve this at some point.

If I can get any logs, or can let the boxes log additional things while they are stalled, let me know. Maybe some rrd graph could be interesting, or whatever else. Thanks!
#10
for anybody running into that, use "typetransparent" instead of "transparent" in unbound
#11
thank you @fabian. Not sure what you refer to in the commit .. those things? https://github.com/opnsense/core/pull/2097/files#diff-a89985242e1eea6a91d3e103e3353d5cR584 .. Thanks
#12
We added our 50 Euro thank-you donation. Thank you for the hard work.
#13
I am using a public TLD for which I use the private-domain flag in unbound and also a domain override.

So let's assume it is company.com - I use the namespace <namespace>.company.com as an internal domain, e.g. internal.company.com (domain override in unbound).

The problem now is that I am using a tool for ACME DNS-01 challenges which does a DNS lookup on the default DNS server (OPNsense in this case), searching for an NS record (the primary nameserver for company.com), like

dig mysub1.internal.company.com NS

during the challenge. If it finds an NS record, it will poll the primary server for the TXT record created during DNS-01 - if it does not find an NS record, it will fail.

Apparently, with OPNsense + unbound + domain override, those NS responses are all empty. I ask myself how I could potentially fix that.

So

dig mysub1.internal.company.com NS

and

dig internal.company.com NS

are empty, since the domain override is on internal.company.com

dig company.com NS

will return the proper primary NS (public server)

Any hints on how to solve this?
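
To reproduce the three dig results above in one go, here is a minimal sketch that walks up the labels of a name and reports the first parent that answers with NS records. It assumes `dig` is installed on the machine running it; the function names are hypothetical, and whether the ACME tool itself walks up the name like this is an assumption, not something confirmed here.

```python
import subprocess


def dig_ns(name):
    """Return the NS targets `dig +short NAME NS` prints (empty list if none)."""
    result = subprocess.run(
        ["dig", "+short", name, "NS"],
        capture_output=True, text=True, check=False,
    )
    # Skip blank lines and dig's own ';;' diagnostics.
    return [line.strip() for line in result.stdout.splitlines()
            if line.strip() and not line.startswith(";")]


def nearest_zone_with_ns(name, lookup=dig_ns):
    """Walk from `name` towards the root; return (zone, ns_list) for the first hit."""
    labels = name.rstrip(".").split(".")
    for i in range(len(labels) - 1):  # stop before the bare TLD
        candidate = ".".join(labels[i:])
        servers = lookup(candidate)
        if servers:
            return candidate, servers
    return None, []


if __name__ == "__main__":
    zone, servers = nearest_zone_with_ns("mysub1.internal.company.com")
    print(zone, servers)
```

With the behaviour described above, this would skip mysub1.internal.company.com and internal.company.com (both empty) and only get an answer at company.com.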
#14
I have 2 APU2C4 boxes; they have been running completely error-free for about 3 months (or more?). Tinc is on them plus a few other things, and 2-5 people are active behind them at any time (with quite a lot of traffic, too).

So the problem should not lie directly with the APU2C4 - I ordered mine as an appliance back then -
http://varia-store.com/Ready-Systems/OPNSense/Varia-Bundles/OPNsense-ready-system-with-APU2C4-board-4GB-RAM-black::29139.html

but it should be exactly your board.

Run the latest firmware.
#15
I run OPNsense on Proxmox 5, but here it now sits behind a DSL modem in PPPoE passthrough mode (and that works).

When I was still with Unitymedia (same setup), I never got bridged mode working at all, no matter which modem (tried 3) - that was also the reason I cancelled (apart from the "in the evening you only get 1 Mbit" issues). I believe it simply does not work at the moment, because they no longer give you a native IPv4 address in the first place. You would potentially have to configure OPNsense with IPv6 or pull whatever stunts it takes - incredibly opaque and not worth the effort.