Unstable on Proxmox?

Started by JL, September 27, 2020, 07:10:21 PM

September 27, 2020, 07:10:21 PM Last Edit: September 28, 2020, 11:46:59 PM by J. Lambrecht
Dear,

I have been using OPNsense since release 17 or so, and I find it unstable to work with on Proxmox VE 6.2.

The disk I/O is troublesome, to the point that only selecting IDE with SSD emulation appears to work well (for speed); choosing a different kind of controller results in a lot of swap fail notifications.

On shutdown a plethora of errors is thrown, which appear to be low level, regardless of the controller chosen.

All in all, I don't feel 20.7 is as production ready as one typically assumes.

Memory consumption appears quite high out of the box. The VM has 2.5GB of RAM and frequently starts complaining that it is out of swap space, shutting down multiple services without warning.

Hi J. Lambrecht,

Well, it's hard to say what's wrong here.
First, OPNsense runs very stable on Proxmox, at least in my experience. There should be little to no noticeable difference; I even run my Proxmox host itself behind the OPNsense VM, and it runs very stable and predictable.
Second, unless there is something wrong with your hardware or Proxmox setup, you should be able to choose 'SCSI/VirtIO SCSI', even with 'SSD emulation' if you like.
Third, don't just say there is 'a plethora of errors'; show something for the kind people on this forum :) to work with. We don't have crystal balls...
Fourth, you really need to tell us more about your setup. There could be reasons why the system is swapping or using lots of memory. I don't see this on mine, but it's hard to compare; it all depends on your setup...

Greetings, mark

September 28, 2020, 11:46:42 PM #2 Last Edit: September 28, 2020, 11:49:16 PM by J. Lambrecht
Hey Mark,

This time I got lucky, so to speak. The OPNsense VM went all goobly goo again.

The IDS service crashed, and rebooting showed a massive amount of errors and flaws. The firewall had been running peachy for hours, up to the mistake of assigning an invalid IP as DNS server in a DHCP scope.

It is the only change I can think of that happened at the time. The console was again filled with swap fail messages. What happened hours before is that I had

1) enabled the 2GB swap space flag to make sure I would not have any memory issues. The VM has 2.5GB of RAM to run dhcpd, Suricata, ntpd and Unbound, which I think should be adequate. Since the services only appear to crash on memory depletion, enabling swap seemed to be a good idea.

2) set the VM to run with SeaBIOS and 440fx. (I just noticed it had QXL set as display, which I don't think is sensible.) I have now powered off the OPNsense VM, assigned VirtIO SCSI single, and set the display to standard VGA.

If anything goes wrong again, it will take more hours for this to happen. What I do notice is that during this time the memory consumption soars from around 800MB to 2.1GB and more.



Still, 'show the errors'; it's impossible to say anything about them this way. For now OPNsense isn't to blame and they're all user faults, until you prove otherwise with some evidence...

Please show the 'options list' (Proxmox) you used to install OPNsense.

If you have more memory to spare, give OPNsense more!

I'm using Suricata with some 20000+ rules, some blocking, most in alert mode, including all the services you mention plus OVPN, and the system mostly uses less than 1GB of RAM.
Are you sure you need all the rule sets you (probably) enabled? More than 2GB of RAM use sounds like quite a lot for the services plus Suricata you mention...

My advice is to start with a basic OPNsense and watch the used resources while enabling services one by one.
Again, show the options (list) you used to create your OPNsense VM and the errors you encountered... ;)
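
For anyone unsure what to post: `qm config <vmid>` on the Proxmox host prints the VM options as `key: value` lines. Below is a minimal Python sketch, assuming that output format, which parses the options and flags the points raised in this thread. The sample config (storage name, disk size) is made up for illustration, and the 4096 MB default is only the figure quoted further down in this thread, not an official requirement.

```python
def parse_vm_config(text):
    """Parse 'key: value' lines (as printed by `qm config <vmid>`) into a dict."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only; disk values contain colons themselves.
        key, sep, value = line.partition(":")
        if sep:
            config[key.strip()] = value.strip()
    return config


def review_vm_config(config, min_memory_mb=4096):
    """Return notes on the settings discussed in this thread.

    Assumes 'memory' is a plain integer in MB. The threshold is an
    assumption taken from a later reply, so adjust it as needed.
    """
    notes = []
    memory = config.get("memory")
    if memory is not None and int(memory) < min_memory_mb:
        notes.append(f"memory is {memory} MB, below {min_memory_mb} MB")
    if config.get("vga", "").startswith("qxl"):
        notes.append("display is QXL; the poster switched to standard VGA")
    for key, value in config.items():
        # ideN entries with media=cdrom are optical drives, not disks.
        if key.startswith("ide") and "media=cdrom" not in value:
            notes.append(f"{key} is an IDE disk; VirtIO SCSI was suggested")
    return notes


# Hypothetical sample, not the poster's actual configuration.
SAMPLE = """\
bios: seabios
memory: 2560
vga: qxl
ide0: local-lvm:vm-100-disk-0,size=16G,ssd=1
"""

for note in review_vm_config(parse_vm_config(SAMPLE)):
    print(note)
```

Pasting the raw `qm config` output into the thread is still the most useful thing; the sketch only shows which fields are worth a second look.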

Just to add, I've been running OPNsense under Proxmox with no issues for years.
At startup or on Suricata reloads, I frequently see memory spike past 3GB.

YMMV

Just another anecdotal report. It's running fine under Proxmox for me as well. I think it's been almost 2 years now running it under Proxmox with absolutely no stability issues at all. The only problem I had was a conflict between FreeBSD and Proxmox when running as QXL and passing through a network card. I ended up going back to 440fx to solve that. That issue affected all my FreeBSD VMs on Proxmox, not just OPNsense.

BTW, that issue was a failure to boot at all, not a stability issue.

October 02, 2020, 10:12:53 PM #6 Last Edit: October 02, 2020, 10:26:57 PM by J. Lambrecht
Think I cracked the problem.

Core issue

1) the DHCP scope did have a gateway set but not a router
2) manually setting DHCP option 3 to type IP with the IP address of the LAN interface appears to work

Dependent issues

1) IDS crash on rule update failure = to all appearances, fixed now
2) Unbound flapping = improved, not fixed
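
For context, DHCP option 3 is the "Router" option from RFC 2132: a code byte of 3, a length byte, then one or more 4-byte IPv4 addresses. The following is a minimal Python sketch of that encoding plus a sanity check that the router sits inside the scope's subnet. The addresses are hypothetical placeholders, not the poster's network, and this is not how OPNsense builds the option internally.

```python
import ipaddress

ROUTER_OPTION = 3  # DHCP option code for "Router" (RFC 2132)


def encode_router_option(*routers):
    """Encode one or more IPv4 router addresses as a DHCP option 3 TLV."""
    if not routers:
        raise ValueError("option 3 needs at least one router address")
    # Each address contributes 4 bytes; the length byte covers the payload only.
    payload = b"".join(ipaddress.IPv4Address(r).packed for r in routers)
    return bytes([ROUTER_OPTION, len(payload)]) + payload


def router_in_subnet(router, subnet):
    """True if the router address belongs to the scope's subnet."""
    return ipaddress.IPv4Address(router) in ipaddress.IPv4Network(subnet)


# Hypothetical LAN interface address and scope.
lan_ip = "192.168.1.1"
scope = "192.168.1.0/24"

print(encode_router_option(lan_ip).hex())  # prints 0304c0a80101
print(router_in_subnet(lan_ip, scope))     # prints True
```

A client that receives no option 3 has no default route, which would fit with DNS lookups and rule updates failing downstream.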



October 07, 2020, 06:00:51 PM #7 Last Edit: October 07, 2020, 06:07:35 PM by J. Lambrecht
Quote from: J. Lambrecht on October 02, 2020, 10:12:53 PM
Think I cracked the problem.

Core issue

1) the DHCP scope did have a gateway set but not a router
2) manually setting DHCP option 3 to type IP with the IP address of the LAN interface appears to work

Dependent issues

1) IDS crash on rule update failure = to all appearances, fixed now (the crash was caused by the DNS failure!)
2) Unbound flapping = improved, not fixed

This approach did indeed remediate all the issues mentioned.

Unbound remains flaky due to some configuration glitch between Proxmox and Unbound with regard to route preferences.


Quote from: J. Lambrecht on September 28, 2020, 11:46:42 PM
Hey Mark,

This time I got lucky, so to speak. The OPNsense VM went all goobly goo again.

The IDS service crashed, and rebooting showed a massive amount of errors and flaws. The firewall had been running peachy for hours, up to the mistake of assigning an invalid IP as DNS server in a DHCP scope.

It is the only change I can think of that happened at the time. The console was again filled with swap fail messages. What happened hours before is that I had

1) enabled the 2GB swap space flag to make sure I would not have any memory issues. The VM has 2.5GB of RAM to run dhcpd, Suricata, ntpd and Unbound, which I think should be adequate. Since the services only appear to crash on memory depletion, enabling swap seemed to be a good idea.

2) set the VM to run with SeaBIOS and 440fx. (I just noticed it had QXL set as display, which I don't think is sensible.) I have now powered off the OPNsense VM, assigned VirtIO SCSI single, and set the display to standard VGA.

If anything goes wrong again, it will take more hours for this to happen. What I do notice is that during this time the memory consumption soars from around 800MB to 2.1GB and more.



You really run IPS with only 2.5GB of RAM? 4GB is the minimum to run it in production...

I too have problems with OPNsense on Proxmox, although different ones ;-)

For me, whenever I decide to torrent some new Linux DVDs, the connection to the router seems to drop, and I can only get back on the internet when I reboot the OPNsense VM. Not sure yet if it's Proxmox or OPNsense related ;-)

Quote from: Jhjacobs81 on October 12, 2020, 10:57:06 AM
I too have problems with OPNsense on Proxmox, although different ones ;-)

For me, whenever I decide to torrent some new Linux DVDs, the connection to the router seems to drop, and I can only get back on the internet when I reboot the OPNsense VM. Not sure yet if it's Proxmox or OPNsense related ;-)

Or a drop rule for torrent?

But that shouldn't drop the whole internet connection, right?

Yep, but there are too many details missing, so we can only guess.
First I would stop the torrent and see if it works again after a couple of minutes. If not, go to the Proxmox console and check the logs on the console or in system.log.
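
To narrow down what to post from a long log, something like the following Python sketch can pull out the lines worth quoting. The keywords are only a guess at what is relevant here (swap and kill messages were mentioned earlier in the thread), and the sample lines are made up for illustration; they are not real OPNsense or Proxmox log output.

```python
def find_suspect_lines(lines, keywords=("swap", "panic", "killed")):
    """Return (line_number, text) pairs whose text mentions any keyword.

    Matching is case-insensitive; the default keywords are an assumption.
    """
    lowered = tuple(k.lower() for k in keywords)
    hits = []
    for number, text in enumerate(lines, start=1):
        if any(k in text.lower() for k in lowered):
            hits.append((number, text.rstrip("\n")))
    return hits


# Made-up sample lines standing in for a saved copy of the log.
sample = [
    "example: service started\n",
    "example: out of SWAP space\n",
    "example: process was killed\n",
]

for number, text in find_suspect_lines(sample):
    print(f"{number}: {text}")
```

With a real file, passing the open file object works the same way, for example `find_suspect_lines(open(path))`, where `path` is wherever the log was saved.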

I've been experiencing the same for two weeks.
Had 2 crashes tonight under load. I think I saw a reference to HUADVS or something while rebooting. It mentioned 5 crashes. It didn't log to my syslog server, however.

Quote from: cloudz on December 09, 2020, 10:54:55 PM
I've been experiencing the same for two weeks.
Had 2 crashes tonight under load. I think I saw a reference to HUADVS or something while rebooting. It mentioned 5 crashes. It didn't log to my syslog server, however.

Also too many details missing... :/