These days, many folks run OpnSense under a virtualisation host such as Proxmox.
This configuration has its own pitfalls, which is why I wrote this guide. The first part covers the common settings needed; the second part deals with a setup where the virtualisation host is deployed remotely (e.g. in a datacenter) and holds other VMs besides OpnSense.
RAM, CPU and system
Use at least 8 GBytes, better 16 GBytes, of RAM and do not enable ballooning. Although OpnSense does not need that much RAM, it can be beneficial in case you put /var/log in RAM (see below).
Obviously, you should use the "host" CPU type so that you do not sacrifice performance to emulation. However, you should not install the microcode update packages in OpnSense - they would be useless anyway. Instead, install the appropriate microcode packages on the virtualisation host.
That being said, just for good measure, set the tunables "hw.ibrs_disable=1" and "vm.pmap.pti=0". This will avoid performance bottlenecks caused by the Spectre and Meltdown mitigations. I trust the other VMs in my setup, but YMMV...
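If you want to verify that the mitigations are really disabled, you can check from the OpnSense shell after a reboot (just a sketch; the tunables themselves are entered under System: Settings: Tunables):
sysctl hw.ibrs_disable vm.pmap.pti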
The system architecture is arbitrary, as OpnSense can boot in either legacy (BIOS) or UEFI mode.
Filesystem peculiarities
First off, when you create an OpnSense VM, which file system should you choose? If you have Proxmox, it will likely use ZFS, so you need to choose between UFS and ZFS for OpnSense itself. Although it is often said that ZFS on top of ZFS adds a little overhead, I would use it regardless, simply because UFS fails more often. Also, OpnSense does not stress the filesystem much anyway (unless you use excessive logging, RRD or Netflow).
32 GBytes is a minimum I would recommend for disk size. It may be difficult to increase the size later on.
After a while, you will notice that the space you have allocated for the OpnSense disk grows towards 100% usage, even though within OpnSense the disk may be mostly unused. That is a side-effect of the copy-on-write nature of ZFS: writing logs, RRD data and other statistics always writes new blocks, and the old blocks are never released back to the underlying (virtual) block device.
That is, unless the ZFS "autotrim" feature is enabled. You can either set this via the OpnSense CLI with "zpool set autotrim=on zroot" or, better, add a daily cron job to do this (System: Settings: Cron) with "zroot" as parameter.
You can also trim your zpool once via the CLI with "zpool trim zroot".
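As a quick reference, the shell commands mentioned above look like this ("zroot" is the default pool name of an OpnSense ZFS install; checking the trim status with "zpool status -t" should work on any recent ZFS version):
zpool set autotrim=on zroot    # trim freed blocks continuously from now on
zpool trim zroot               # one-off trim of blocks that were already freed
zpool status -t zroot          # show the trim status of the pool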
That being said, you should always avoid filling up the disk with verbose logging. If you do not need to keep your logs, you can also put them on a RAM disk (System: Settings: Miscellaneous).
Network "hardware"With modern FreeBSD, there should not be any more discussion about pass-through vs. emulated VTNET adapters: the latter are often faster. This is because Linux drivers are often more optimized than the FreeBSD ones. There are exceptions to the rule, but not many.
In some situations, you basically have no choice than to use vtnet anyway, e.g.:
- If FreeBSD has no driver for your NIC hardware
- If the adapter must be bridged, e.g. in a datacenter with a single NIC machine
Also, some FreeBSD drivers are known to have caused problems in the past, e.g. for Realtek NICs. By using vtnet, you rely on the often better Linux drivers for such chips.
With vtnet, you should make sure that hardware checksumming is off ("hw.vtnet.csum_disable=1"), which is the default on new OpnSense installations anyway because of a FreeBSD interoperability bug with KVM (https://forum.opnsense.org/index.php?msg=216918). Note, however, that this setting will be slower than using hardware offloading (https://forum.opnsense.org/index.php?topic=45870.0), which you will notice at very high speeds, especially on weak hardware.
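To double-check that the setting is in effect, you can query it from the OpnSense shell (a sketch; the tunable itself is set under System: Settings: Tunables and takes effect after a reboot):
sysctl hw.vtnet.csum_disable    # should print: hw.vtnet.csum_disable: 1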
You can also enable multiqueue on the VM NIC interfaces, especially if you have multiple threads active. There is no need to enable anything for this in OpnSense.
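On the Proxmox side, multiqueue can be set in the GUI (Hardware: Network Device: Multiqueue) or via the CLI. A sketch with a hypothetical VM ID of 100 - keep the existing MAC address so the interface assignment in OpnSense does not change:
qm set 100 --net0 virtio=AA:BB:CC:DD:EE:FF,bridge=vmbr0,queues=4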
For some Broadcom adapters (and possibly others, too), it is necessary to disable GRO by using:
iface enp2s0f0np0 inet manual
up ethtool --offload $IFACE generic-receive-offload off
See: https://forum.opnsense.org/index.php?msg=233131, https://help.ovhcloud.com/csm/en-dedicated-servers-proxmox-network-troubleshoot?id=kb_article_view&sysparm_article=KB0066095 and https://www.thomas-krenn.com/de/wiki/Broadcom_P2100G_schlechte_Netzwerk_Performance_innerhalb_Docker.
When you use bridging with vtnet, there is a known Linux bug with IPv6 multicast (https://forum.proxmox.com/threads/ipv6-neighbor-solicitation-not-forwarded-to-vm.96758/) that breaks IPv6 after a few minutes. It can be avoided by disabling multicast snooping in /etc/network/interfaces of the Proxmox host like:
auto vmbr0
iface vmbr0 inet manual
bridge-ports eth0
bridge-stp off
bridge-fd 0
bridge-vlan-aware yes
bridge-vids 2-4094
bridge-mcsnoop 0
If you plan to enlarge your MTU size (https://forum.opnsense.org/index.php?topic=45658) on VirtIO network interfaces, note that you must do so on the Proxmox bridge device first.
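A sketch of what that could look like on the Proxmox host (the MTU value is only an example; the underlying physical port may also need the larger MTU, and the OpnSense interface must not use a larger MTU than the bridge):
auto vmbr0
iface vmbr0 inet manual
bridge-ports eth0
bridge-stp off
bridge-fd 0
mtu 9000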
Also, you should probably disable the firewall checkbox on the network interfaces of the OpnSense VM in Proxmox.
Guest utilities
In order to be able to control and monitor OpnSense from the VM host, you can install the os-qemu-guest-agent plugin.
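Note that the agent must also be enabled for the VM on the Proxmox side, otherwise the plugin has no effect. This can be done in the GUI (Options: QEMU Guest Agent) or via the CLI; the VM ID here is a placeholder:
qm set 100 --agent enabled=1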
Problems with rolling back
One of the main advantages of using a virtualisation platform is that you can roll back your installation.
There are two problems with this:
1. DHCP leases that have been handed out since the last roll back are still known to the client devices, but not to the OpnSense VM. Usually, this will not cause IP conflicts, but DNS for the affected devices may be off intermittently.
2. If you switch back and forth, you can cause problems with backups done via os-backup-git. This plugin keeps track of state both on the OpnSense VM and in the backup repository. If the two disagree about the correct revision of the backup, subsequent backups will fail. Basically, you will need to set up the backup again with a new, empty repository.
If you want to avoid such problems, you can roll back single packages with opnsense-revert (https://docs.opnsense.org/manual/opnsense_tools.html#opnsense-revert).
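For example, to revert a single package to the version that shipped with a particular release (the version number and the plugin name here are only illustrative):
opnsense-revert -r 25.1 opnsense    # revert the core package to the 25.1 version
opnsense-revert os-caddy            # reinstall the current version of a plugin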
TL;DR
- Have at least 8 GBytes of RAM, no ballooning
- Use "host" type CPU and disable Spectre and Meltdown mitigations
- Use ZFS, dummy
- Keep 20% free space
- Add a trim job to your zpool
- Use vtnet, unless you have a good reason not to
- Check if hardware checksumming is off on OpnSense
- Disable multicast snooping and Proxmox firewall
- Install os-qemu-guest-agent plugin
That is all for now, recommendations welcome!
Caveat emptor: This is unfinished!
Setup for OpnSense and Proxmox for a datacenter
A frequently used variant is to work with two bridges on Proxmox:
- vmbr0 as a bridge to which Proxmox itself, the OpnSense WAN interface and VMs with a separate IP can connect (even if you don't use it)
- vmbr1 as a LAN or separated VLANs from which all VMs, OpnSense and Proxmox can be managed via VPN
That means you probably need two IPv4s for this setup. You should also get at least a /56 IPv6 prefix, which you need for SLAAC on up to 256 different subnets.
While it is possible to have just one IPv4 for both OpnSense and Proxmox, I would advise against it. You would have to use a port-forward on Proxmox, which results in an RFC1918 WAN IPv4 for OpnSense, which in turn has implications on NAT reflection that you would not want to deal with.
The configuration then looks something like this:
# network interface settings; autogenerated
# Please do NOT modify this file directly, unless you know what
# you're doing.
#
# If you want to manage parts of the network configuration manually,
# please utilize the 'source' or 'source-directory' directives to do
# so.
# PVE will preserve these directives, but will NOT read its network
# configuration from sourced files, so do not attempt to move any of
# the PVE managed interfaces into external files!
auto lo
iface lo inet loopback
iface lo inet6 loopback
auto eth0
iface eth0 inet manual
iface eth0 inet6 manual
auto vmbr0
iface vmbr0 inet static
address x.y.z.86/32
gateway x.y.z.65
bridge-ports eth0
bridge-stp off
bridge-fd 0
bridge-mcsnoop 0
post-up echo 1 > /proc/sys/net/ipv4/ip_forward
post-up echo 1 > /proc/sys/net/ipv4/conf/eth0/proxy_arp
post-up echo 1 > /proc/sys/net/ipv6/conf/eth0/forwarding
#up ip route add x.y.z.76/32 dev vmbr0
#up ip route add x.y.z.77/32 dev vmbr0
#Proxmox WAN Bridge
iface vmbr0 inet6 static
address 2a01:x:y:z:5423::15/80
address 2a01:x:y:z:87::2/80
address 2a01:x:y:z:88::2/80
address 2a01:x:y:z:89::2/80
address 2a01:x:y:z:172::2/80
gateway fe80::1
post-up ip -6 route add 2a01:x:y:f600::/64 via 2a01:x:y:z:172::1
auto vmbr1
iface vmbr1 inet static
address 192.168.123.2/24
bridge-ports none
bridge-stp off
bridge-fd 0
bridge-mcsnoop 0
post-up ip route add 192.168.0.0/16 via 192.168.123.1 dev vmbr1
#LAN bridge
iface vmbr1 inet6 static
source /etc/network/interfaces.d/*
This includes:
x.y.z.86 main Proxmox IP with x.y.z.65 as gateway (note that this is pointopoint!),
x.y.z.87 WAN IPv4 of OpnSense,
x.y.z.88 and x.y.z.89 are additional IPs on vmbr0. These use x.y.z.86 as gateway so that your MAC is not visible to the ISP. Hetzner, for example, would need virtual MACs for this.
192.168.123.2 is the LAN IP for Proxmox so that it can be reached via VPN. The route is set so that the VPN responses are also routed via OpnSense and not to the default gateway.
IPv6 is a little more complex:
2a01:x:y:z:: is a /64 prefix that you can get from your ISP, for example. It is further subdivided with /80 to:
2a01:x:y:z:1234::/80 for vmbr0 with 2a01:x:y:z:1234::15/128 as external IPv6 for Proxmox.
2a01:x:y:z:172::15/128 as point-to-point IPv6 in vmbr1 for the OpnSense WAN with 2a01:x:y:z:172::1/128.
2a01:x:y:z:124::/80 as a subnet for vmbr1, namely as an IPv6 LAN for the OpnSense.
The OpnSense thus manages your LAN with 192.168.123.1/24 and can do DHCPv4 there. It is the gateway and DNS server and does NAT to the Internet via its WAN address x.y.z.87. It can also serve as the gateway for IPv6 with the address 2a01:x:y:z:123::1/64.
VMs would have to get a static IPv6 or be served via SLAAC. That only works with a whole /64 subnet. The prefix, 2a01:x:y:rr00::/56, is used for this, which can then be split into individual /64 prefixes on the OpnSense and distributed to the LAN(s) via SLAAC (e.g. Hetzner offers something like this for a one-off fee of €15).
You can use the additional IPs, but you don't have to. These "directly connected" VMs could, for example, also use IPv6 in 2a01:x:y:rr00::/64.
Some more points
1. You can/should close the Proxmox ports, at least for IPv4, of course, but you can still keep them accessible via IPv6. This means you can access the Proxmox even without OpnSense running. There is hardly any risk if nobody knows the external IPv6, as port scans in IPv6 hardly seem to make sense. But be careful: entries in the DNS could be visible and every ACME certificate is exposed, so if you do, only use wildcards!
2. I would also set up a client VM that is located exclusively in the LAN and has a graphical interface and a browser and that is always running. As long as the Proxmox works and its GUI is accessible via port 8006, you have a VM with LAN access and a browser. This also applies if the OpnSense is messed up and no VPN is currently working. The call chain is then: Browser -> https://[2a01:x:y:z:1234::15]:8006, there is a console to the client VM, there access https://192.168.123.1/ (OpnSense LAN IP) with the browser.
3. Be careful with (asymmetric) routes! Proxmox, for example, has several interfaces, so it is important to set the routes correctly if necessary. Note that I have not set an address for IPv6 on vmbr1 because it is actually only intended to be used for access via VPN over the LAN. However, if the OpnSense makes router advertisements on the LAN interface, you quickly have an alternative route for Proxmox...
4. You can use fe80::1/64 as virtual IPs for any (V)LAN interface on OpnSense. That way, you can set fe80::1 as IPv6 gateway for the VMs.
5. The WAN gateway is pointopoint, which means the netmask of the WAN interface is /32; thus any traffic for IPs within your subnet, even though your ISP may specify a larger one (like /26), will still go over the gateway. This is sometimes necessary because of isolation between clients on the network layer. It would help if your fellow ISP neighbors did the same, because otherwise you could not reach those IPs.
6. Since some ISPs (including Hetzner) still use layer-2 switches in their datacenters, you could see unexpected traffic on your WAN interface for IPs in your subnet that you do not own. OpnSense is able to filter such traffic, but you should preferably filter it at the ISP's firewall beforehand.
VPN
It is up to your preference which VPN you use to access the LAN or VLANs behind your OpnSense. I use Wireguard site-to-site.
There are tutorials on how to do this, but as an outline (a minimal configuration sketch follows after the list):
- Choose a port to make the connection and open it.
- Set up the Wireguard instance to listen on that port.
- Connect a peer by setting up the secrets.
- Allow the VPN traffic (but wisely!)
- Check the routes if you cannot reach the other side.
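As a minimal sketch of the remote side of such a site-to-site tunnel (all keys, addresses, the endpoint name and the port are placeholders; on the OpnSense side the corresponding values are entered under VPN: WireGuard):
# /etc/wireguard/wg0.conf on the remote site
[Interface]
PrivateKey = <remote-site-private-key>
Address = 10.10.10.2/24
ListenPort = 51820

[Peer]
# the OpnSense instance
PublicKey = <opnsense-public-key>
Endpoint = opnsense.example.org:51820
AllowedIPs = 10.10.10.1/32, 192.168.123.0/24
PersistentKeepalive = 25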
VLAN setup
In order to isolate traffic between the VMs, you can also choose to make vmbr1 VLAN-aware. In that case, you will have to assign each VM a separate VLAN, define VLAN interfaces on OpnSense and break up small portions of the RFC1918 LAN network to use at least 2 IPv4s for OpnSense and the specific VM.
You can do the same with your IPv6 range, because you have 256 IPv6 prefixes - so each VM can have its own /64 range and could even use IPv6 privacy extensions.
Since OpnSense is the main router for anything, you will still be able to access each VM via the VPN by using rules for the surrounding RFC1918 network.
Reverse proxies
If you want to make use of your OpnSense's capabilities, you will have to place your VMs behind it anyway. If you are like me and want to save on the cost of additional IPv4s, you can make use of a reverse proxy.
On HAProxy vs. Caddy (there is a discussion about this starting here (https://forum.opnsense.org/index.php?topic=38714.msg217354#msg217354)):
Quote: Today I took the opportunity to try out Caddy reverse proxy instead of HAproxy, mostly because of a very specific problem with HAproxy...
I must say I reverted after trying it thoroughly. My 2cents on this are as follows:
- Caddy is suited to home setups and inexperienced users. HAproxy is much more complex.
- For example, the certificate setup is much easier, because you just have to specify the domain and it just works (tm).
- However, if you have more than just one domain, Caddy setup gets a little tedious:
* you have to create one domain/certificate plus an HTTP backend for every domain, which includes creating different ones for www.domain.de and domain.de. You cannot combine certificates for multiple domains unless they are subdomains.
* You do not have much control over what type of certificate(s) are created - you cannot specify strength or ECC vs. RSA (much less both) and I have not found a means to control whether ZeroSSL or LetsEncrypt is used.
* The ciphers being employed cannot be controlled easily - or, for TLS 1.3, at all. That results in an ssllabs.com score which is suboptimal, because 128bit ciphers are allowed. This cannot be changed because of Go limitations.
* You cannot use more than one type of DNS-01 verification if you use wildcard domains.
* The Auto HTTPS feature looks nice at first, but it uses a 308 instead of a 301 code, which breaks some monitoring and can only be modified via custom include files.
So, if you just want to reverse-proxy some services in your home network, go with Caddy. For an OpnSense guarding your internet site with several services/domains, stay with HAproxy.
There are nice tutorials for both HAproxy (https://forum.opnsense.org/index.php?topic=23339.0) and Caddy (https://forum.opnsense.org/index.php?topic=38714.0), so use them for reference.
A few words on security
Web applications are inherently unsafe - even more so when they handle infrastructure, as is the case with both Proxmox and OpnSense. If you expose their interfaces on the open internet, even with 2FA enabled, you are waiting for an accident to happen.
Basically, you have these choices to protect the web interfaces:
a. Change default ports
b. Use a VPN
c. Hide behind a non-exposed DNS name (either via IPv6 only or via a reverse proxy)
Variant a. is becoming more and more useless: I had around 30000 invalid login attempts on a non-default SSH port in just a month!
While I always recommend variant b., you will have to rely on a working OpnSense for it. That is why I have a hot standby available that can be booted instead of the normal OpnSense instance in case I bork its configuration.
But even for that you need access to your Proxmox and how do you get that without a working OpnSense?
The answer cannot be a reverse proxy either, because that will also run on your OpnSense.
That is why I recommend using an IPv6-only fallback. This is possible, because an interface can have more than one IPv6 address, so you can use a separate address just for specific services like SSH.
If you have a /56 or /64 IPv6 prefix, the number of potential IPs is so huge that port scanning is infeasible. However, there are some pitfalls to this:
1. You must use a really random address, not one that could be guessed easily (see the one-liner after this list).
2. Beware of outbound connections via IPv6: usually, they will give away your IPv6 - unless you use IPv6 privacy extensions (see below).
3. If you want to make that address easier to remember for yourself, you can use a DNS entry, but check that zone transfers of your domain are really disabled and do not use guessable names like "pve.yourdomain.com", "opnsense.yourdomain.com" or "proxmox.yourdomain.com".
4. Also, keep in mind that if you issue certificates for that domain name, almost EVERY certificate gets published because of certificate transparency (https://certificate.transparency.dev/). So, use wildcard certificates!
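For point 1, a random 64-bit interface identifier can be generated like this (just a sketch; append the four resulting groups to your own /64 prefix):
openssl rand -hex 8 | sed 's/\(....\)/\1:/g; s/:$//'
# example output: 3f9c:a1d2:77e4:0b6a  ->  use 2a01:x:y:z:3f9c:a1d2:77e4:0b6a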
You can do likewise for your VMs:
- For LXC containers, the network configuration is kept in /etc/network/interfaces, but it gets re-created from the LXC definition. Alas, you can only set one IPv6 (or use DHCPv6 or SLAAC). That is no problem if the container is behind OpnSense using a reverse proxy, via IPv4 only, since then, the container's IPv6 can get used for SSH only, if you configure OpnSense to let it through. For IPv6 privacy, add this to /etc/sysctl.conf:
net.ipv6.conf.eth0.autoconf=1
net.ipv6.conf.eth0.accept_ra=1
net.ipv6.conf.all.use_tempaddr=2
net.ipv6.conf.default.use_tempaddr=2
net.ipv6.conf.eth0.use_tempaddr=2
- For Linux VMs with old-style configuration, you can change /etc/network/interfaces. For new-style configurations using cloudinit with netplan, you can create an override for /etc/netplan/50-cloud-init.yaml, like /etc/netplan/61-ipv6-privacy with this content (using SLAAC / radvd):
network:
version: 2
renderer: networkd
ethernets:
eth0:
accept-ra: true
ipv6-privacy: true
By using /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg with this content: "network: {config: disabled}", you can also disable overwriting the network configuration via cloudinit altogether and configure netplan yourself.
Many thanks for these "best practices".
I plan to deploy a 2nd OPNsense on Proxmox, so it will be helpful.
Regards,
S.
I could have used this a few weeks ago. ;)
I'm a bit surprised by the ZFS on ZFS recommendation, as well as the one regarding passthrough vs bridges.
They seem to go against other recommendations I had found at the time (home network guy?).
At least I can test the 2nd one. I guess I'll learn how to move my configuration to another VM/host in the process...
How about a paragraph on firewalls (Proxmox's and OPNsense's) and potential conflicts between the two?
Thanks!
Since OpnSense writes only a few logfiles (and even that should be reduced to a minimum anyway to avoid running out of space), the performance impact of ZFS under ZFS is negligible. Most of it comes from double compression, which is ineffective and could just as well be disabled on OpnSense.
Most recommendations on NIC passthrough come from the past; vtnet is much better these days. You might get a performance benefit on >= 10 GBit hardware - that is, IFF it is supported under FreeBSD. Some people have to resort to running under Proxmox because their NICs are badly supported (or not at all).
There are lots of recommendations that were valid in the past, like "do not mix tagged and untagged VLANs" - I had no problems with that, whatsoever.
There are no conflicts with the PVE firewall unless you enable it in the datacenter and for the OpnSense VM. BTW: the initial default is off for the datacenter. If you need it for other VMs (and why should you, as they are probably behind your OpnSense anyway?) or for the PVE host itself, you should disable it for your OpnSense VM - but that goes without saying.
The real impact of using vtnet is mostly limited to the IPv6 multicast and the hardware offloading problems.
An Idea here, maybe its stupid maybe not but...
What if this is included into the Official OPNsense docs?
Currently the docs do not have any Guide how to deploy OPNsense into Proxmox. Its easy to spin off OPNsense in Proxmox but "best practices" are another thing.
Would it be beneficial for the people to have something like that in the Official docs?
Regards,
S.
Quote from: Seimus on November 24, 2024, 05:35:43 PM
An Idea here, maybe its stupid maybe not but...
What if this is included into the Official OPNsense docs?
Currently the docs do not have any Guide how to deploy OPNsense into Proxmox. Its easy to spin off OPNsense in Proxmox but "best practices" are another thing.
Would it be beneficial for the people to have something like that in the Official docs?
Regards,
S.
This is well above the know how of most people. Doubt many people run a datacenter-level opnsense with the VMs on the same server at home to this degree.
Good dive though, much appreciated 👌 Now I have to rebuild everything... again 😒
A few questions...
1) Do you enable the Spectre option for Intel or AMD cpus in Proxmox VM definition?
2) Do you activate AES for HW acceleration in Proxmox VM definition?
3) Host CPU type? Where is this located?
4) If I choose ZFS for OPNsense VM should I define 2 disks for resiliency in Proxmox VM definition?
1. As explained here (https://docs.opnsense.org/troubleshooting/hardening.html), there are two settings:
PTI is something that can only be done on the host anyway. Whether you enable IBRS depends on if you expect your other VMs to try to attack your OpnSense. In other words: Do you use virtualisation to separate VMs like in a datacenter or do you want to use your hardware for other things in your homelab? Since there is a huge performance penalty, I would not use that mitigation in a homelab. In a datacenter, I would not virtualize OpnSense anyway, so no, I would not use those mitigations.
2. Sure. That goes without saying, because "host" CPU type does that anyway.
3. CPU host type - see attachment.
4. No. ZFS features like RAID-Z1 can only effectively be used on the VM host. If the host storage fails, having two separate disk files does not actually help. ZFS is, like I describe, only there to have snapshots within the OpnSense itself. You can use ZFS snapshots on the Proxmox host instead, but I still would not trust UFS under FreeBSD anyway, so the choice is purely for filesystem stability reasons. That does not get any better by using mirroring.
Thank you for the great guide, and explanation of settings!
I am one of those strange people with Proxmox running OPNsense in a DC. I currently don't have the rack space, or the budget to get a dedicated device for OPNsense, but that is on the list of things to do. I have been having some intermittent issues with my VMs and will try this and see if it helps.
I do have one question however. When doing some research I ended up looking at Multiqueue, what that is and if it may help. Networking is admittedly my weakest aspect in computers (well networking other then layer 1, I do hardware all day), as I understand it when using VirtIO (same as vtnet correct?) it only supports one RX/TX so the guest can only receive or send 1 packet at a time (over simplified trying to keep it short and concise). Now with modern hardware NICs can essentially make a packet queue for each CPU core (or Vcore). Will setting a Multiqueue value in Proxmox have any benefit? if yes I would assume it should be set to the number of cores the OPNsense VM has?
Thank you again for the great guide!
There is an explanation of this here: https://pve.proxmox.com/pve-docs/chapter-qm.html#qm_network_device
Short answer: It enables multiple queues for a network card that are distributed over multiple CPU threads, which can have its benefits if you have high loads induced by a big number of clients. AFAIU, you will have to enable this in the OpnSense VM guest, too. I never tried it and probably, YMMV depending on actual hardware (and also on driver support for vtnet in OpnSense).
Note that when you change network settings in Proxmox while OpnSense is running, your connection drops and may need a reboot to get back online.
Great article.
I was curious; In my VM under Proxmox, I have 32GB RAM [Ballooning off] and in Proxmox it shows 31/32 RAM Used in RED but in OPNSense GUI shows 1.4% 900M/3200M. Is this a concern or just Proxmox not registering it correctly?
Take a look at "top" in your OpnSense VM - you will find that ~95% of memory is "wired" by FreeBSD. Part of this is that all free memory is used for ARC cache. Proxmox shows this all as used memory.
Excellent write-up, thank you.
One question, and a possible suggestion:
For a home user who already has a firewall appliance and wants to add a Proxmox node for app hosting, is there a need to virtualize OPNsense (besides having a convenient backup for the main router)? Does it avoid VM/CT traffic having to traverse the network for inter-VLAN routing?
Regarding ZFS-on-ZFS, it seems that ZFS sync is the prominent contributor to write amplification and SSD wear without a dedicated SLOG device (source: https://www.youtube.com/watch?v=V7V3kmJDHTA). Assuming the host is already protected with backup power and good ZFS hygiene, might it make sense to disable ZFS sync on the guest?
I am not promoting use of virtualised OpnSense at all, even less so for situations where a physical firewall is possible. I only use virtualised setups on cloud based setups to save a second physical instance.
That being said, I can understand when someone says they already have a Proxmox instance and want to run OpnSense on that to save power.
As for ZFS: OpnSense does not produce that high of a write load that I think this would matter, but YMMV. When I use SSDs on a Proxmox host, I know I must use enterprise-grade SSDs anyway, regardless of the type of guest.
Quote from: OPNenthu on February 09, 2025, 07:36:57 PM
write amplification and SSD wear without a dedicated SLOG device (source: https://www.youtube.com/watch?v=V7V3kmJDHTA)
This statement is just plain wrong.
An SLOG vdev
- is not a write cache
- will not reduce a single write operation to the data vdevs
- is in normal operation only ever written to and never read
Normal ZFS operation is sync for metadata and async for data. Async meaning collected in a transaction group in memory which is flushed to disk every 5 seconds.
Kind regards,
Patrick
Thank you for the correction-
Quote from: Patrick M. Hausen on February 10, 2025, 12:01:54 AM
Normal ZFS operation is sync for metadata and async for data.
I take from this that even metadata does not get written to disk more than once. I believe that you know what you're talking about on this subject so I take your word, but the video I linked makes a contradictory claim at 06:05.
I'm paraphrasing, but he claims that for a single-disk scenario (such as mine) ZFS sync writes data (or metadata, technically) twice: once for the log, and once for the commit. He presents some measurements that seem to corroborate the claim although I can't verify it.
My thinking is that modest home labs might be running 1L / mini PCs with very limited storage options so maybe there was a potential pitfall to be avoided here.
Oh, I'm sorry. Yes, synchronous writes are written twice. But they are the exception, not the rule.
If you use any consumer SSD storage option for Proxmox, you are waiting for an accident to happen anyway. Many home users may use things like Plex or Home Assistant or have a Docker instance running as VMs, and those generate far more writes than OpnSense ever will.
Suffice it to say that you can reduce the write load by a huge amount just by enabling "Use memory file system for /tmp" and disabling Netflow and RRD data collection, alongside excessive firewall logging (use an external syslog server instead). Also, the metadata flushes have been reduced in OpnSense from every 30 seconds to every 5 minutes as of 23.7 (https://forum.opnsense.org/index.php?msg=195970). In the linked thread, there is some discussion of the actual induced write load. I used up ~50% of my first NVME disk's life on a brand new DEC750 within one year - but that is totally clear when you think of it and has nothing to do with ZFS-on-ZFS.
P.S.: There are some really bad videos about ZFS out there, like this one (https://www.youtube.com/watch?v=V7V3kmJDHTA), which I just commented on:
Quote: Good intention, alas, badly executed. You should have looked at the actual hardware information instead of relying on what the Linux kernel thinks it did (i.e. use smartctl instead of /proc/diskstats).
The problem with your recommendation of ashift=9 is that Linux shows less writes, but in reality, most SSDs use a physical blocksize of >=128 KBytes. By reducing the blocksize to 512, you actually write the same 128K block multiple times. In order to really minimize the writes to the drive, you should enlarge the ashift to 17 instead of reducing it to 9.
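For reference, ashift is simply the base-2 logarithm of the block size ZFS uses for a vdev, so:
ashift=9  -> 2^9  = 512 bytes
ashift=12 -> 2^12 = 4 KiB (the usual default)
ashift=17 -> 2^17 = 128 KiB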
P.P.S.: My NVME drives show a usage of 2 and 4% respectively after ~2 years of use in Proxmox. At that rate, I can still use them another 48 years, which is probably well beyond their MTTF. Back when SSDs became popular, it has been rumored that they could not be used for database use because of limited write capability. A friend of mine used some enterprise-grade SATA SSDs for a 10 TByte weather database that was being written to by thousands of clients and the SSDs were still only at 40% after 5 years of 24/7 use.
Quote from: meyergru on February 10, 2025, 09:30:28 AM
most SSDs use a physical blocksize of >=128 KBytes
I've not seen a block size that large, but then again I only have consumer drives. All of mine (a few Samsungs, a Kingston, and an SK Hynix currently) report 512 bytes in S.M.A.R.T tools:
Supported LBA Sizes (NSID 0x1)
Id Fmt Data Metadt Rel_Perf
0 + 512 0 1
1 - 4096 0 0
I completely agree that disks will last a long time regardless, but I thought we should at least be aware of the possible compounding effects of block writes in a ZFS-on-ZFS scenario and factor that into any cost projections. Unless I'm mistaken about how virtualization works, whatever inefficiencies ZFS has would be doubled in ZFS-on-ZFS.
I thought this was common knowledge: I am talking about the real, physical block size (aka erase block size (https://spdk.io/doc/ssd_internals.html)) of the underlying NAND flash, not the logical one that is being reported over an API that wants to be backwards-compatible with spinning disks. The mere fact that you can change that logical blocksize should make it clear that it has nothing to do with reality.
It basically was the same with the 4K block size, which was invented for spinning disks in order to reduce gap overhead, but most spinning disks also allowed for a backwards-compatible 512 bytes sector size, because many OSes could not handle 4K at that time.
Basically, 512 bytes and 4K are a mere convention nowadays.
About the overhead: The video I linked that was making false assumptions about the block sizes shows that the write amplification was basically nonexistent after the ashift was "optimized". This goes to show that basically, for any write of data blocks, there will be a write of metadata like checksums. On a normal ZFS, this will almost always be evened out by compression, but not on ZFS-on-ZFS, because the outer layer cannot compress any more. So, yes, there is a little overhead, and for SSDs, this write amplification will be worse with small writes. Then again, that is true for pure ZFS as well.
With projected MTTFs of decently overprovisioned SSDs that are much longer than potential failure because of other reasons, that should not be much of a problem. At least not one that I would give a recommendation to switch off the very features that ZFS stands for, namely to disable ZFS sync.
Quote from: meyergru on February 11, 2025, 09:47:14 AM
I am talking about the real, physical block size (aka erase block size (https://spdk.io/doc/ssd_internals.html)) of the underlying NAND flash, not the logical one that is being reported over an API that wants to be backwards-compatible to spinning disks.
Got it, thanks for that. The link doesn't work for me, but I found some alternate sources.
Sadly it seems that the erase block size is not reported in userspace tools and unless it's published by the SSD manufacturer it is guesswork. I think that's reason enough to not worry about ashift tuning, then.
I do not change the default of ashift=12, either. However, something you can do is to avoid any SSDs that do not explicitly state that they have a RAM cache - even some "pro" drives do not have one. With a RAM cache, the drive can delay the block erase until the whole block, or at least more than a minuscule part of it, must be written, thus avoiding many unnecessary writes even for small logical block writes.
This is something Deciso did not take into account with their choice of the Transcend TS256GMTE652T2 in the DEC750 line, resulting in this:
# smartctl -a /dev/nvme0
smartctl 7.4 2023-08-01 r5530 [FreeBSD 14.2-RELEASE amd64] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Number: TS256GMTE652T2
Serial Number: G956480208
Firmware Version: 52B9T7OA
PCI Vendor/Subsystem ID: 0x1d79
IEEE OUI Identifier: 0x000000
Controller ID: 1
NVMe Version: 1.3
Number of Namespaces: 1
Namespace 1 Size/Capacity: 256,060,514,304 [256 GB]
Namespace 1 Utilization: 37,854,445,568 [37.8 GB]
Namespace 1 Formatted LBA Size: 512
Local Time is: Wed Feb 12 10:42:37 2025 CET
Firmware Updates (0x14): 2 Slots, no Reset required
Optional Admin Commands (0x0017): Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005f): Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Log Page Attributes (0x0f): S/H_per_NS Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg
Maximum Data Transfer Size: 32 Pages
Warning Comp. Temp. Threshold: 85 Celsius
Critical Comp. Temp. Threshold: 90 Celsius
Supported Power States
St Op Max Active Idle RL RT WL WT Ent_Lat Ex_Lat
0 + 9.00W - - 0 0 0 0 0 0
Supported LBA Sizes (NSID 0x1)
Id Fmt Data Metadt Rel_Perf
0 + 512 0 0
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 48 Celsius
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 80%
Data Units Read: 2,278,992 [1.16 TB]
Data Units Written: 157,783,961 [80.7 TB]
Host Read Commands: 79,558,036
Host Write Commands: 3,553,960,590
Controller Busy Time: 58,190
Power Cycles: 88
Power On Hours: 17,318
Unsafe Shutdowns: 44
Media and Data Integrity Errors: 0
Error Information Log Entries: 0
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Error Information (NVMe Log 0x01, 16 of 256 entries)
No Errors Logged
Self-test Log (NVMe Log 0x06)
Self-test status: No self-test in progress
No Self-tests Logged
As you can see, the drive has only 20% life left at only 2 years (17318 hours) of use.
This is as well interesting
Quote from: crankshaft on December 29, 2024, 12:42:46 PM
Finally, after 2 weeks of testing just about every tunable possible I found the solution:
iface enp1s0f0np0 inet manual
pre-up ethtool --offload enp1s0f0np0 generic-receive-offload off
Generic Receive Offload (GRO)
- GRO is a network optimization feature that allows the NIC to combine multiple incoming packets into larger ones before passing them to the kernel.
- This reduces CPU overhead by decreasing the number of packets the kernel processes.
- It is particularly useful in high-throughput environments as it optimizes performance.
GRO may cause issues in certain scenarios, such as:
1. Poor network performance due to packet reordering or handling issues in virtualized environments.
2. Debugging network traffic where unaltered packets are required (e.g., using `tcpdump` or `Wireshark`).
3. Compatibility issues with some software or specific network setups.
This is OVH Advance Server with Broadcom BCM57502 NetXtreme-E.
Hope this will save somebody else a lot of wasted time.
Regards,
S.
Did you try this?
I'm currently just moving over to opnsense from pfsense, and not finished yet - So can't comment, but always had higher latency than I'd expect
It is extremely simple to virtualize OPNsense in Proxmox. I did it in my recent setup using PCI passthrough and then virtualization in Proxmox. OPNsense works great; here is a step-by-step guide to install OPNsense on Proxmox.
It would be good to know more about this GRO setting
I've just finished my setup (at least ported from pfsense, finished) and am pleased to see multi-queue is just a case of setting the host, as outlined here https://forum.opnsense.org/index.php?topic=33700.0
@amjid: Your setup is different by using pass-through. This has several disadvantages:
1. You need additional ports (at least 3 in total), which is often a no-go in environments where you want this on rented hardware in a datacenter - they often have only one physical interface which has to be shared (i.e. bridged) across OpnSense and Proxmox.
2. Some people use Proxmox for the sole reason that their badly-supported NICs from Realtek work better there, because the Linux drivers are way better than the FreeBSD ones. By using pass-through, you use the FreeBSD drivers again, so this will work just as badly as FreeBSD alone.
@wrongly1686: Usually, you do not need to change the GRO setting. This problem will only show on certain high-end Broadcom adapters.
I will repeat my message from here (https://forum.opnsense.org/index.php?msg=233131):
Quote: Interesting. Seems like a NIC-specific problem. OVH now has that in their FAQs: https://help.ovhcloud.com/csm/en-dedicated-servers-proxmox-network-troubleshoot?id=kb_article_view&sysparm_article=KB0066095
This was detected even earlier: https://www.thomas-krenn.com/de/wiki/Broadcom_P2100G_schlechte_Netzwerk_Performance_innerhalb_Docker
Nevertheless, I added it above.
And I did mention multiqueue, didn't I?
Apologies, you did.
I just didn't think it could ever be so easy after giving up on pfsense!
Quote from: meyergru on March 27, 2025, 08:45:19 AM
And I did mention multiqueue, didn't I?
I think it may be worth sticking something in about cpu affinity/cpu units.
I'm moving all my setup around at the moment, but I noticed that my RTT have recently shot up on my gateways - Making my networking feel slow. They've effectively doubled.
I'm keeping an eye on this, but putting opnsense cpu units up to 10,000, Adsense - 8,000 brought them straight back down.
I do wonder if there is some way for proxmox to prioritise bridging, also
Awesome stuff, thank you! This will help
Greetings all, in the process of going through this myself, and I'm pondering the question of HA
My setup is just home/soho with DHCP on the WAN and a wireless WAN (USB dongle) as backup - the Proxmox host can handle the 'USBness' and present a standard network interface the hosts can use.
My query is around HA... I have it in my head that running a pair of OPNsense VMs on the same hardware would allow for failover between the two virtualised devices, which obviously doesn't protect from hardware failures but can allow for upgrades/maintenance/etc without interruption. I've seen a few threads around on CARP + DHCP on the WAN interface (which I'd need to address), but I'm wondering if overall I'm vastly overcomplicating things... The wireless backup does masquerading in itself and has comms on 192.168/16 so that's happy to just live on a linux bridge with the VMs, and I can live with the double-NAT for that backup scenario.. The primary wired WAN though is a standard ISP DHCP (single lease available) so as I understand it, CARP there would be a problem - I've seen there are scripts around though to handle that failover via shutting down the WAN on the backup, which uses a duplicated MAC to carry the lease and ARP over.
As I said, I kinda feel like I might be overcomplicating things... I'm also considering that the host's 16GB of RAM being split to 2x 8G VMs may be a limitation if I start dropping in additional features like Suricata/Zenarmor/etc..
Are there any sort of recommendations/advice around on whether there's a "smart" way to do this or if I'm just being stupid?
Isn't HA primarily supposed to help against hardware failures? By putting both VMs on the same host, you won't gain much.
Quote from: meyergru on May 20, 2025, 08:53:11 AM
Isn't HA primarily supposed to help against hardware failures? By putting both VMs on the same host, you won't gain much.
That's certainly the main benefit.. Though I'm thinking the "nice to haves" like seamless restarts for patches/upgrades/etc would be a useful addition. It's just quite a bit more complicated a configuration.
I might just 'start it simple' to get a single node up and running, then if I'm still feeling keen I can replicate it and work on getting a second going. My concern there was that I might need to put some more serious thought into how the hosts interfaces are setup and forwarded to the VM(s), but the more I look at it the more it seems like a simple linux bridge interface would do fine.
With VMs, you can just clone the machine, detach it from the network, upgrade it and then switch it with the original one. Also, a restart is really fast on VMs.
Using HA, you might introduce complexity that actually causes more harm than it prevents - yet, I don't know, since I have not tried that before.
Quote from: MicN on May 20, 2025, 08:29:28 AM
Greetings all, in the process of going through this myself, and I'm pondering the question of HA
My setup is just home/soho with DHCP on the WAN and a wireless WAN (USB dongle) as backup - the Proxmox host can handle the 'USBness' and present a standard network interface the hosts can use.
My query is around HA... I have it in my head that running a pair of OPNsense VMs on the same hardware would allow for failover between the two virtualised devices, which obviously doesn't protect from hardware failures but can allow for upgrades/maintenance/etc without interruption. I've seen a few threads around on CARP + DHCP on the WAN interface (which I'd need to address), but I'm wondering if overall I'm vastly overcomplicating things... The wireless backup does masquerading in itself and has comms on 192.168/16 so that's happy to just live on a linux bridge with the VMs, and I can live with the double-NAT for that backup scenario.. The primary wired WAN though is a standard ISP DHCP (single lease available) so as I understand it, CARP there would be a problem - I've seen there are scripts around though to handle that failover via shutting down the WAN on the backup, which uses a duplicated MAC to carry the lease and ARP over.
As I said, I kinda feel like I might be overcomplicating things... I'm also considering that the host's 16GB of RAM being split to 2x 8G VMs may be a limitation if I start dropping in additional features like Suricata/Zenarmor/etc..
Are there any sort of recommendations/advice around on whether there's a "smart" way to do this or if I'm just being stupid?
I think you have thought it all through pretty well for getting the right setup on the same host. You already have a view of the complexity and challenges against the benefits, and not much can be added for you to make your mind up. You're being smart to think about it.
I also run it virtual on Proxmox without a fallback. I've thought about the HA element but not come to a strategy yet; I have other hardware, but it is not even close to the VM's specs, so the best I can do is put it in series (a firewall behind a firewall). I have another Proxmox node (not three for a cluster), my only third is an ESXi host, so I thought about a second VM, BUT the complexity has put it off to later. I have other niggles I want to address first.
So my current emergency strategy if the host was to die is to have backups: I do an irregular clonezilla image of the (single) disk + backups of VMs to a separate storage. Definitively not a fall back but this is home not a business so I can _just_ get away with it.
If it was to take too long for us to be up and running, I probably would put the ISP's ethernet cable into the main eero that is in bridge mode. Put it in router mode.
Or, connect the WiFi clients to the other WAN coming into the house.
Short of it is, I think like you at the moment. That HA in the same virtualisation host is too complicated for the benefit it gives.
I intended to setup HA at some point but with VMs on 2 separate hosts, mostly because I use commodity hardware.
It was one of the reasons why I switched to bridging from PCIe passthrough (getting identical interface assignments).
I gave up because of CARP since I'm getting a single IPv4 (2 allowed in theory to handle router switching, but they are not always in the same subnet).
OTOH, I just stumbled on this reddit post (https://www.reddit.com/r/ZiplyFiber/comments/1311jz5/ziply_10g_install_pics/ (https://www.reddit.com/r/ZiplyFiber/comments/1311jz5/ziply_10g_install_pics/)) and the OP claims he can use RFC1918 CARP VIPs with a GW pointing to his static public IP!
That was on pfSense but the same feature seems to exist on OPN as well. Is that a valid configuration?
In the meantime, I switched to bare metal and rely on backups (entire /conf directory for now, might extend to config of some plug-ins as well).
Backup HW is a VM...
On a single host, I relied on VM snapshots to handle potentially bad updates.
The downtime associated with updates is so small that I would not have dealt with HA just because of downtime!
Well maybe wrongly but I assumed that HA on OPN was possible for a single WAN. The docs https://docs.opnsense.org/manual/how-tos/carp.html show all IPs used for the HA setup are non-routable and show a single WAN link at the front of the router/switch.
So clearly it needs a router to route from WAN to LAN(s)/VIPs but I admit having revisited now, I'm unclear.
This is for me a theoretical exercise though, I have only one managed switch and as said before, my hardware is nowhere near ready to even contemplate it :)
Quote from: EricPerl on May 20, 2025, 07:55:41 PM
In the meantime, I switched to bare metal and rely on backups (entire /conf directory for now, might extend to config of some plug-ins as well).
Backup HW is a VM...
On a single host, I relied on VM snapshots to handle potentially bad updates.
The downtime associated with updates is so small that I would not have dealt with HA just because of downtime!
Yes exactly. The only addition I have is the Clonezilla image of the disk if the host's hard drive with it dies. Easy to replace with a spare and reapply it. I have to do it this way because my host for this VM is a commodity (brilliant little mini machine) that has only space for one hard drive inside, so no mirror possible. Lives in the living room, so a full PC won't do; a server even less of a chance or I get divorce papers served.
Quote from: cookiemonster on May 21, 2025, 12:50:50 AM
Well maybe wrongly but I assumed that HA on OPN was possible for a single WAN. The docs https://docs.opnsense.org/manual/how-tos/carp.html show all IPs used for the HA setup are non-routable and show a single WAN link at the front of the router/switch.
So clearly it needs a router to route from WAN to LAN(s)/VIPs but I admit having revisited now, I'm unclear.
There's a thread here (https://forum.opnsense.org/index.php?topic=20972.0) that has a bunch of folks trying/working on it (and a git repo with a few forks) using scripts to overcome the need for CARP on the WAN interface when only a single DHCP IP is available there (such as a home internet connection) - essentially you copy the WAN mac of the primary to the secondary and leave that interface shutdown. When a CARP failover is triggered, the interface is brought up and the same DHCP lease is still valid. There'd still need to be an ARP on the broadcast domain to update the forwarding tables in the local switch (/bridge in the case of a VM) for the new port, but there'd still be minimal impact.
Quote from: MicN on May 21, 2025, 01:54:20 AM
Quote from: cookiemonster on May 21, 2025, 12:50:50 AM
Well maybe wrongly but I assumed that HA on OPN was possible for a single WAN. The docs https://docs.opnsense.org/manual/how-tos/carp.html show all IPs used for the HA setup are non-routable and show a single WAN link at the front of the router/switch.
So clearly it needs a router to route from WAN to LAN(s)/VIPs but I admit having revisited now, I'm unclear.
There's a thread here (https://forum.opnsense.org/index.php?topic=20972.0) that has a bunch of folks trying/working on it (and a git repo with a few forks) using scripts to overcome the need for CARP on the WAN interface when only a single DHCP IP is available there (such as a home internet connection) - essentially you copy the WAN mac of the primary to the secondary and leave that interface shutdown. When a CARP failover is triggered, the interface is brought up and the same DHCP lease is still valid. There'd still need to be an ARP on the broadcast domain to update the forwarding tables in the local switch (/bridge in the case of a VM) for the new port, but there'd still be minimal impact.
Thanks for that. I'm glad to hear I've not gone mad just yet.
That thread looks like a hack to me. I'm not doing that.
The how-to on CARP leaves me with one big question: what's the WAN public IP range?
I suspect the article uses 172.18.0.0/24 as a placeholder.
In more realistic scenarios, you'd use a very small IP range (but a /30 is too small).
In the reddit thread I mentioned, the author uses an advanced setting of VIPs (the gateway).
He uses a CARP VIP in 192.0.2.0/24 (Edit: reserved per https://www.rfc-editor.org/rfc/rfc5737 (https://www.rfc-editor.org/rfc/rfc5737)) and the GW is set to his static IPv4 WAN IP.
If that's a valid setup, all I'm missing is a static IP.
Using my current DHCP IP as static would be hacky (even if in practice it has never changed unless I swap gear).
> The how-to on CARP leaves me with one big question: what's the WAN public IP range?
I think, just a single public ip.
From reading both myself, it appears to me that the 172.x.y.z/24 is indeed just a placeholder, but for a non-routable range. The "problem" is still there for this usage, i.e. it can be another range, but it must be RFC1918. The docs are now clear to me. Just another range, but RFC1918.
And if I read the gists and the reddit thread correctly, it seems no static IP is needed. Of the scripts (there are at least two), there is an older version that stops/starts services FreeBSD-style (shell exec) and another which seems to be more OPN-aware, using pluginctl.
They do the request to renewal of the dhcp lease to the isp on the single existing one - the whole purpose of the exercise.
Definitively hacky as is a clubbing together the bits to "make it happen", not stateful though.
I'm not doing it either, just saving it in my "useful to check out" if/when I decide to give it a go. Potentially.
Hi everyone, thanks for this howto. I am having some troubles on the following hardware:
- Intel N5105
- Intel I226-V rev. 4
I have installed proxmox (latest version) and virtualised OPNSense (last version as well).
I am using the VirtIO method, so both WAN and LAN are a Linux bridge, not passed through.
My ISP gives me 2.5/1G down/up plan, but this is the result of my speedtest:
From pve host itself:
Retrieving speedtest.net configuration...
Testing from Telecom Italia (79.17.151.204)...
Retrieving speedtest.net server list...
Selecting best server based on ping...
Hosted by Cloudfire Srl (Milano) [0.90 km]: 19.152 ms
Testing download speed................................................................................
Download: 281.67 Mbit/s
Testing upload speed...
Upload: 205.35 Mbit/s
From OPNSense:
root@opnsense:~ # speedtest
Speedtest by Ookla
Server: Sky Wifi - Milano (id: 50954)
ISP: TIM
Idle Latency: 5.46 ms (jitter: 0.17ms, low: 5.31ms, high: 5.55ms)
Download: 539.83 Mbps (data used: 915.9 MB)
46.37 ms (jitter: 73.23ms, low: 3.21ms, high: 433.53ms)
Upload: 623.16 Mbps (data used: 1.0 GB)
48.06 ms (jitter: 35.79ms, low: 3.95ms, high: 465.33ms)
Packet Loss: 0.0%
Tried as well an iperf3 test between a LXC container on the same LAN bridge and OPNSense:
Accepted connection from 192.168.2.8, port 40784
[ 5] local 192.168.2.1 port 5201 connected to 192.168.2.8 port 40786
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.00 sec 65.8 MBytes 549 Mbits/sec
[ 5] 1.00-2.00 sec 65.6 MBytes 552 Mbits/sec
[ 5] 2.00-3.00 sec 75.4 MBytes 632 Mbits/sec
[ 5] 3.00-4.01 sec 68.5 MBytes 568 Mbits/sec
[ 5] 4.01-5.00 sec 72.8 MBytes 618 Mbits/sec
[ 5] 5.00-6.01 sec 68.6 MBytes 571 Mbits/sec
[ 5] 6.01-7.00 sec 67.1 MBytes 567 Mbits/sec
[ 5] 7.00-8.00 sec 76.1 MBytes 639 Mbits/sec
[ 5] 8.00-9.00 sec 71.4 MBytes 599 Mbits/sec
[ 5] 9.00-10.00 sec 77.0 MBytes 647 Mbits/sec
[ 5] 10.00-10.01 sec 1.12 MBytes 759 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate
[ 5] 0.00-10.01 sec 709 MBytes 594 Mbits/sec receiver
Does anyone know what could be the issue?
Tunables: https://imgur.com/a/tdpPeWr
Offloading: https://imgur.com/2fMhQQW
Thanks in advance.
Did you enable multiqueue on the VM NIC interfaces in Proxmox? The throughput you are getting suggests, you did not.
Quote from: meyergru on May 22, 2025, 09:20:43 AM
Did you enable multiqueue on the VM NIC interfaces in Proxmox? The throughput you are getting suggests, you did not.
If you refer to this settings, I have: https://imgur.com/a/K3upFP1
I always use 4 cores and 4 queues. iperf needs a -P4 as well, a single thread will max out at ~600 Mbps for these CPUs.
I am kinda limited by the host CPU. Being an N5105, I only have 4 cores available; hence, I've given the firewall VM 2 cores.
However, I did bump the queues to 4, but it's actually the same. CPU spikes near 100% and same speed...
OPNSense:
-----------------------------------------------------------
Server listening on 5201 (test #1)
-----------------------------------------------------------
Accepted connection from 192.168.2.204, port 54384
[ 5] local 192.168.2.1 port 5201 connected to 192.168.2.204 port 54390
[ 8] local 192.168.2.1 port 5201 connected to 192.168.2.204 port 54398
[ 10] local 192.168.2.1 port 5201 connected to 192.168.2.204 port 54412
[ 12] local 192.168.2.1 port 5201 connected to 192.168.2.204 port 54428
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.00 sec 4.12 MBytes 34.5 Mbits/sec
[ 8] 0.00-1.00 sec 3.88 MBytes 32.4 Mbits/sec
[ 10] 0.00-1.00 sec 36.0 MBytes 301 Mbits/sec
[ 12] 0.00-1.00 sec 3.75 MBytes 31.4 Mbits/sec
[SUM] 0.00-1.00 sec 47.8 MBytes 400 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 1.00-2.01 sec 10.8 MBytes 89.2 Mbits/sec
[ 8] 1.00-2.01 sec 9.88 MBytes 81.9 Mbits/sec
[ 10] 1.00-2.01 sec 3.25 MBytes 27.0 Mbits/sec
[ 12] 1.00-2.01 sec 10.1 MBytes 84.0 Mbits/sec
[SUM] 1.00-2.01 sec 34.0 MBytes 282 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 2.01-3.01 sec 13.5 MBytes 113 Mbits/sec
[ 8] 2.01-3.01 sec 3.38 MBytes 28.3 Mbits/sec
[ 10] 2.01-3.01 sec 25.9 MBytes 217 Mbits/sec
[ 12] 2.01-3.01 sec 2.75 MBytes 23.1 Mbits/sec
[SUM] 2.01-3.01 sec 45.5 MBytes 382 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 3.01-4.01 sec 15.8 MBytes 132 Mbits/sec
[ 8] 3.01-4.01 sec 14.2 MBytes 120 Mbits/sec
[ 10] 3.01-4.01 sec 24.1 MBytes 202 Mbits/sec
[ 12] 3.01-4.01 sec 4.00 MBytes 33.6 Mbits/sec
[SUM] 3.01-4.01 sec 58.1 MBytes 488 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 4.01-5.01 sec 10.6 MBytes 89.1 Mbits/sec
[ 8] 4.01-5.01 sec 22.2 MBytes 187 Mbits/sec
[ 10] 4.01-5.01 sec 896 KBytes 7.34 Mbits/sec
[ 12] 4.01-5.01 sec 3.25 MBytes 27.3 Mbits/sec
[SUM] 4.01-5.01 sec 37.0 MBytes 310 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 5.01-6.20 sec 1.75 MBytes 12.3 Mbits/sec
[ 8] 5.01-6.26 sec 31.4 MBytes 211 Mbits/sec
[ 10] 5.01-6.26 sec 384 KBytes 2.53 Mbits/sec
[ 12] 5.01-6.26 sec 9.38 MBytes 63.2 Mbits/sec
[SUM] 5.01-6.20 sec 42.9 MBytes 302 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 6.20-7.01 sec 6.00 MBytes 62.6 Mbits/sec
[ 8] 6.26-7.01 sec 21.4 MBytes 239 Mbits/sec
[ 10] 6.26-7.01 sec 0.00 Bytes 0.00 bits/sec
[ 12] 6.26-7.01 sec 2.50 MBytes 28.0 Mbits/sec
[SUM] 6.20-7.01 sec 29.9 MBytes 312 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 7.01-8.05 sec 12.6 MBytes 101 Mbits/sec
[ 8] 7.01-8.05 sec 10.2 MBytes 82.1 Mbits/sec
[ 10] 7.01-8.05 sec 25.5 MBytes 204 Mbits/sec
[ 12] 7.01-8.06 sec 9.00 MBytes 72.0 Mbits/sec
[SUM] 7.01-8.05 sec 57.4 MBytes 460 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 8.05-9.01 sec 13.0 MBytes 114 Mbits/sec
[ 8] 8.05-9.01 sec 6.38 MBytes 55.8 Mbits/sec
[ 10] 8.05-9.01 sec 1.00 MBytes 8.75 Mbits/sec
[ 12] 8.06-9.01 sec 12.2 MBytes 107 Mbits/sec
[SUM] 8.05-9.01 sec 32.6 MBytes 286 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 9.01-10.01 sec 21.1 MBytes 177 Mbits/sec
[ 8] 9.01-10.01 sec 0.00 Bytes 0.00 bits/sec
[ 10] 9.01-10.01 sec 0.00 Bytes 0.00 bits/sec
[ 12] 9.01-10.01 sec 16.5 MBytes 138 Mbits/sec
[SUM] 9.01-10.01 sec 37.6 MBytes 316 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 10.01-10.02 sec 128 KBytes 136 Mbits/sec
[ 8] 10.01-10.02 sec 0.00 Bytes 0.00 bits/sec
[ 10] 10.01-10.02 sec 0.00 Bytes 0.00 bits/sec
[ 12] 10.01-10.02 sec 128 KBytes 134 Mbits/sec
[SUM] 10.01-10.02 sec 256 KBytes 272 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate
[ 5] 0.00-10.02 sec 109 MBytes 91.6 Mbits/sec receiver
[ 8] 0.00-10.02 sec 123 MBytes 103 Mbits/sec receiver
[ 10] 0.00-10.02 sec 117 MBytes 97.9 Mbits/sec receiver
[ 12] 0.00-10.02 sec 73.6 MBytes 61.6 Mbits/sec receiver
[SUM] 0.00-10.02 sec 423 MBytes 354 Mbits/sec receiver
Proxmox host (client side):
[root@pve-02]: ~ $ iperf3 -c 192.168.2.1 -P4
Connecting to host 192.168.2.1, port 5201
[ 5] local 192.168.2.204 port 54390 connected to 192.168.2.1 port 5201
[ 7] local 192.168.2.204 port 54398 connected to 192.168.2.1 port 5201
[ 9] local 192.168.2.204 port 54412 connected to 192.168.2.1 port 5201
[ 11] local 192.168.2.204 port 54428 connected to 192.168.2.1 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 5.23 MBytes 43.8 Mbits/sec 1 1.41 KBytes
[ 7] 0.00-1.00 sec 5.23 MBytes 43.8 Mbits/sec 1 1.41 KBytes
[ 9] 0.00-1.00 sec 38.9 MBytes 326 Mbits/sec 197 782 KBytes
[ 11] 0.00-1.00 sec 4.95 MBytes 41.4 Mbits/sec 1 1.41 KBytes
[SUM] 0.00-1.00 sec 54.3 MBytes 455 Mbits/sec 200
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 1.00-2.00 sec 12.2 MBytes 103 Mbits/sec 1 400 KBytes
[ 7] 1.00-2.00 sec 10.9 MBytes 91.9 Mbits/sec 7 376 KBytes
[ 9] 1.00-2.00 sec 3.75 MBytes 31.5 Mbits/sec 1 1.41 KBytes
[ 11] 1.00-2.00 sec 11.4 MBytes 95.5 Mbits/sec 17 373 KBytes
[SUM] 1.00-2.00 sec 38.3 MBytes 322 Mbits/sec 26
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 2.00-3.00 sec 13.3 MBytes 111 Mbits/sec 11 529 KBytes
[ 7] 2.00-3.00 sec 4.23 MBytes 35.4 Mbits/sec 11 379 KBytes
[ 9] 2.00-3.00 sec 27.5 MBytes 231 Mbits/sec 157 803 KBytes
[ 11] 2.00-3.00 sec 3.11 MBytes 26.1 Mbits/sec 10 366 KBytes
[SUM] 2.00-3.00 sec 48.1 MBytes 404 Mbits/sec 189
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 3.00-4.00 sec 16.2 MBytes 136 Mbits/sec 5 293 KBytes
[ 7] 3.00-4.00 sec 13.4 MBytes 113 Mbits/sec 1 427 KBytes
[ 9] 3.00-4.00 sec 22.5 MBytes 189 Mbits/sec 232 416 KBytes
[ 11] 3.00-4.00 sec 4.35 MBytes 36.5 Mbits/sec 7 192 KBytes
[SUM] 3.00-4.00 sec 56.5 MBytes 474 Mbits/sec 245
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 4.00-5.00 sec 12.5 MBytes 105 Mbits/sec 0 324 KBytes
[ 7] 4.00-5.00 sec 22.4 MBytes 188 Mbits/sec 0 478 KBytes
[ 9] 4.00-5.00 sec 2.50 MBytes 21.0 Mbits/sec 0 419 KBytes
[ 11] 4.00-5.00 sec 3.17 MBytes 26.6 Mbits/sec 1 204 KBytes
[SUM] 4.00-5.00 sec 40.5 MBytes 340 Mbits/sec 1
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 5.00-6.00 sec 2.50 MBytes 21.0 Mbits/sec 0 328 KBytes
[ 7] 5.00-6.00 sec 25.5 MBytes 214 Mbits/sec 0 510 KBytes
[ 9] 5.00-6.00 sec 1.25 MBytes 10.5 Mbits/sec 0 416 KBytes
[ 11] 5.00-6.00 sec 6.40 MBytes 53.7 Mbits/sec 0 225 KBytes
[SUM] 5.00-6.00 sec 35.6 MBytes 299 Mbits/sec 0
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 6.00-7.00 sec 3.75 MBytes 31.5 Mbits/sec 0 338 KBytes
[ 7] 6.00-7.00 sec 29.0 MBytes 243 Mbits/sec 0 530 KBytes
[ 9] 6.00-7.00 sec 0.00 Bytes 0.00 bits/sec 0 416 KBytes
[ 11] 6.00-7.00 sec 5.34 MBytes 44.8 Mbits/sec 0 242 KBytes
[SUM] 6.00-7.00 sec 38.1 MBytes 319 Mbits/sec 0
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 7.00-8.00 sec 12.5 MBytes 105 Mbits/sec 1 365 KBytes
[ 7] 7.00-8.00 sec 10.1 MBytes 85.0 Mbits/sec 0 537 KBytes
[ 9] 7.00-8.00 sec 22.5 MBytes 189 Mbits/sec 0 460 KBytes
[ 11] 7.00-8.00 sec 8.76 MBytes 73.5 Mbits/sec 0 267 KBytes
[SUM] 7.00-8.00 sec 53.9 MBytes 452 Mbits/sec 1
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 8.00-9.00 sec 12.5 MBytes 105 Mbits/sec 0 390 KBytes
[ 7] 8.00-9.00 sec 6.71 MBytes 56.3 Mbits/sec 0 547 KBytes
[ 9] 8.00-9.00 sec 3.75 MBytes 31.5 Mbits/sec 0 464 KBytes
[ 11] 8.00-9.00 sec 11.7 MBytes 98.0 Mbits/sec 0 301 KBytes
[SUM] 8.00-9.00 sec 34.6 MBytes 291 Mbits/sec 0
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 9.00-10.00 sec 21.2 MBytes 178 Mbits/sec 0 431 KBytes
[ 7] 9.00-10.00 sec 0.00 Bytes 0.00 bits/sec 0 547 KBytes
[ 9] 9.00-10.00 sec 0.00 Bytes 0.00 bits/sec 0 464 KBytes
[ 11] 9.00-10.00 sec 16.9 MBytes 142 Mbits/sec 0 341 KBytes
[SUM] 9.00-10.00 sec 38.2 MBytes 320 Mbits/sec 0
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 112 MBytes 94.0 Mbits/sec 19 sender
[ 5] 0.00-10.02 sec 109 MBytes 91.6 Mbits/sec receiver
[ 7] 0.00-10.00 sec 127 MBytes 107 Mbits/sec 20 sender
[ 7] 0.00-10.02 sec 123 MBytes 103 Mbits/sec receiver
[ 9] 0.00-10.00 sec 123 MBytes 103 Mbits/sec 587 sender
[ 9] 0.00-10.02 sec 117 MBytes 97.9 Mbits/sec receiver
[ 11] 0.00-10.00 sec 76.0 MBytes 63.8 Mbits/sec 36 sender
[ 11] 0.00-10.02 sec 73.6 MBytes 61.6 Mbits/sec receiver
[SUM] 0.00-10.00 sec 438 MBytes 368 Mbits/sec 662 sender
[SUM] 0.00-10.02 sec 423 MBytes 354 Mbits/sec receiver
iperf Done.
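Regarding the near-100% CPU spikes reported above: a quick, hedged way to check from the OPNsense shell whether the load actually spreads across the configured queues while such a test is running (standard FreeBSD tools only, nothing OPNsense-specific assumed):

# Per-CPU load during the iperf3 run; with working multiqueue, several cores should be busy
top -P

# Per-queue MSI-X interrupt counters; multiple vtnet rx/tx vectors should be incrementing
vmstat -i | grep vtnet

If only a single core saturates while the others stay idle, the limit is likely single-threaded packet processing rather than the queue count.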
(Network diagram from the poster: accessing a VM from a VLAN with OPNsense in Proxmox — https://preview.redd.it/how-to-acces-vm-from-vlan-opnsense-in-proxmox-v0-qpp71os3w83f1.png?width=2302&format=png&auto=webp&s=6e264e4b39179bee7402abf0cb0dc40442af1ccc)
I have a Proxmox server with a single NIC that's connected to a MikroTik router.
In Proxmox, the default bridge is vmbr0.
On the MikroTik side, I created a VLAN (e.g., VLAN 100) and set up a DHCP server on it.
On the Proxmox host, I added an interface vmbr0.100 (for VLAN 100), and it gets an IP automatically via DHCP from the MikroTik VLAN.
Also, the Proxmox host has a Cloudflare Tunnel set up, which gives remote access to all services running on the VMs, including the Proxmox web UI itself.
Now, I also have an OPNsense instance running.
What I want to do is:
- Route all VM and LXC traffic in Proxmox through VLANs provided by OPNsense.
- Still be able to access everything via the Cloudflare Tunnel, routed through the Proxmox host.
Is this kind of setup possible? Any best practices or recommendations?
That is a very specific setup that should be put into a thread of its own.
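That said, the bridging part of such a setup usually comes down to a VLAN-aware vmbr0. A minimal sketch of /etc/network/interfaces on the Proxmox host, assuming eno1 is the single physical NIC and the MikroTik port is a trunk carrying VLAN 100 (interface names and VLAN IDs are placeholders, not confirmed by the poster):

auto lo
iface lo inet loopback

iface eno1 inet manual

auto vmbr0
iface vmbr0 inet manual
    bridge-ports eno1
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 2-4094

auto vmbr0.100
iface vmbr0.100 inet dhcp

The OPNsense VM would then get virtio NICs on vmbr0 with the appropriate VLAN tags, while the host keeps its management address (and thus the Cloudflare Tunnel) on vmbr0.100; how the OPNsense-provided VLANs are routed and firewalled is exactly the part that deserves its own thread.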