Best practices for virtual machine deployments on OpenShift Virtualization

This document provides guidance for optimizing performance and cost efficiency when deploying virtual machines (VMs) using OpenShift Virtualization on Azure Red Hat OpenShift. It also addresses common application performance concerns and provides actionable steps for successful deployment.

Approach to optimization

Note

GPU-dependent workloads are currently not supported on OpenShift Virtualization on Azure. Plan your deployments accordingly.

Optimizing VM deployments begins with understanding your application workloads and aligning infrastructure choices accordingly. Deploying OpenShift Virtualization on Azure Boost machines (the cluster's worker nodes) introduces architectural overhead compared to native Azure VM or pod deployments. Capacity and performance planning should account for this overhead.

Workload identification

Before provisioning VMs, categorize your workloads to determine their performance and resource requirements. Common workload types include:

  • General purpose: Web servers, application servers, content management systems.
  • Database: Relational and NoSQL databases requiring consistent input/output operations per second (IOPS) and memory.
  • Real-time analytics: Low-latency data processing, operational dashboards.
  • AI/ML: Compute-intensive workloads requiring high CPU/GPU and memory.
  • Data streaming & messaging: High-throughput, low-latency event-driven architectures.
  • Batch processing: Periodic or on-demand jobs processing large data volumes.
  • High-performance computing (HPC): Scientific simulations, financial modeling.
  • Edge and IoT: Aggregating and processing data from distributed sensors.
  • Media processing: Video encoding/decoding, image transformation, streaming.
  • Dev/Test environments: Temporary environments for development and testing.

Each workload type has unique characteristics that influence VM sizing, storage configuration, and performance tuning strategies.

Right sizing your application workloads

Key considerations for right sizing

  • Minimum core requirement: OpenShift Virtualization requires Azure VMs with at least eight (8) cores for OpenShift worker nodes.
  • Architectural overhead: Performance might vary depending on the architectural decisions taken while configuring the environment, including instance types, storage, and network characteristics.
  • Scaling out: For demanding workloads, scaling out your Azure Red Hat OpenShift cluster by adding more nodes can help overcome resource contention and maintain throughput.
  • Benchmark your workloads: Avoid relying solely on on-premises sizing references; benchmark your own workloads to inform right sizing.
  • Cost factors: Consider Azure compute costs, OpenShift licensing, VM licensing, and scalability requirements.

Right sizing ensures that your VMs are provisioned with adequate resources to meet performance goals without overprovisioning. This process is critical in cloud environments, where resource efficiency directly impacts cost and performance.
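The considerations above can be turned into a rough capacity estimate. The sketch below is a hypothetical helper, not part of any product API: it combines the 8-core worker-node minimum and the 60–70% CPU utilization target defined later in this article. The system overhead fraction is an assumption to replace with figures from your own benchmarks.

```python
# Hypothetical right-sizing helper (illustrative only). Estimates how many
# worker nodes a set of VM workloads needs, given:
#   - the 8-core minimum node size for OpenShift Virtualization,
#   - a target average CPU utilization (60-70% per the health metrics),
#   - an assumed fraction of each node reserved for system/virtualization
#     overhead (replace with your own benchmark data).
import math

def estimate_worker_nodes(vm_vcpus, cores_per_node=8, target_util=0.65,
                          system_overhead_fraction=0.15):
    """Return the worker node count needed so the requested vCPUs fit
    within the utilization target after reserving overhead capacity."""
    usable_per_node = cores_per_node * (1 - system_overhead_fraction) * target_util
    return math.ceil(sum(vm_vcpus) / usable_per_node)

# Example: ten VMs with 4 vCPUs each on 8-core worker nodes.
print(estimate_worker_nodes([4] * 10))  # 10 with the default assumptions
```

Treat the result as a starting point for benchmarking, not a final answer; overhead varies with instance type, storage, and network configuration.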

Steps to right size workloads

  1. Define health metrics

    • CPU utilization: Target 60–70% average usage.
    • Memory pressure: Monitor swap usage, memory saturation, and page faults.
    • I/O strain: Measure disk latency, throughput, and queue depth.
  2. Set up monitoring

    • Use Prometheus and Grafana for real-time metric collection and visualization.
    • Enable KubeVirt metrics for VM-level insights.
    • Correlate infrastructure-level metrics with application performance by integrating with Azure Monitor via Azure Arc.
  3. Analyze historical data

    • Review performance trends over time.
    • Identify peak usage periods and resource saturation events.
    • Use historical baselines to guide future autoscaling decisions.
  4. Adjust VM specifications

    • Resize vCPU and memory allocations based on observed utilization and saturation.
  5. Test and validate

    • Perform load testing using tools like Apache JMeter, Locust, or stress-ng.
    • Validate against defined health metrics and performance targets.
    • Iterate on configuration changes and retest to confirm improvements.
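As an illustration of step 4 (adjust VM specifications), the fragment below sketches how vCPU and guest memory are set on a KubeVirt VirtualMachine. The name and values are placeholders; only guest memory is set, in line with the guidance in this article to avoid strict resource limits.

```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: example-vm          # placeholder name
spec:
  runStrategy: Always
  template:
    spec:
      domain:
        cpu:
          cores: 4          # adjust based on observed CPU utilization
        memory:
          guest: 8Gi        # guest memory only; no strict resource limits
```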

Observed performance characteristics

The following tables show observed performance metrics for OpenShift Virtualization workloads on Azure Red Hat OpenShift for the specified configurations and workloads. Actual performance varies with the workload and cluster configuration, but these figures indicate what to expect. Because of the overhead of running virtual machines on Azure Red Hat OpenShift, VM workloads are not as performant as the same workloads running in a pod. Performance is expected to improve as new instance types and technologies such as Direct Virtualization become available.

Compute

  • Worker node type: Standard_D96ds_v5 (with Azure Boost)
  • OpenShift version: 4.20
  • Virtualization operator: 4.20
| Metric | OCP Virtualization VM | Pod | With Direct Virtualization |
|---|---|---|---|
| Events | 525,022 | 546,997 | Coming soon |
| Latency (ms) | 0.70 | 0.65 | Coming soon |

Storage

  • Disk: Premium SSD v2 SCSI/SATA Disks
  • Worker node type: Standard_D96ds_v5 (with Azure Boost)
  • OpenShift version: 4.20
  • Virtualization operator: 4.20
  • ODF storage operator: 4.19
  • Single AZ cluster
| Threads | OCP Virtualization VM (TPM) | Pod (TPM) | With Direct Virtualization |
|---|---|---|---|
| 1 | 4,332 | 6,303 | Coming soon |
| 2 | 9,266 | 12,371 | Coming soon |
| 4 | 17,006 | 23,422 | Coming soon |
| 8 | 31,148 | 43,314 | Coming soon |
| 16 | 44,904 | 68,872 | Coming soon |
| 32 | 64,294 | 103,359 | Coming soon |

Network

  • Worker node type: Standard_D96ds_v5 (with Azure Boost)
  • NIC: 35 Gbps
  • Single AZ Cluster
| Message size - threads | OCP Virtualization VM latency (μs) | OCP Virtualization VM throughput (Gbps) | Pod latency (μs) | Pod throughput (Gbps) | With Direct Virtualization |
|---|---|---|---|---|---|
| 64B - 1 thr | 94.58 | 0.4 | 45.84 | 0.9 | Coming soon |
| 64B - 8 thr | 87.93 | 3.4 | 49.90 | 7.5 | Coming soon |
| 1024B - 1 thr | 90.6 | 6.1 | 48.32 | 7.0 | Coming soon |
| 1024B - 8 thr | 93.57 | 24.7 | 48.59 | 28.9 | Coming soon |
| 8192B - 1 thr | 151.4 | 7.6 | 104.43 | 10.9 | Coming soon |
| 8192B - 8 thr | 157.27 | 20.7 | 90.96 | 27.0 | Coming soon |

Fine tuning your environment

Fine tuning your OpenShift Virtualization environment is essential to achieving optimal performance, especially for demanding workloads. The following best practices are derived from extensive benchmarking and real-world experience on Azure Boost VM series (Dsv5/Dsv6).

Performance optimization strategies

  • Scale out or up for demanding workloads: Add more nodes or upsize the nodes in your Azure Red Hat OpenShift cluster for high concurrency or resource-intensive applications.
  • Avoid strict resource limits: Set only guest memory for VMs; avoid strict resource limits unless required for governance.
  • Tune storage and network configurations: Select storage solutions and performance tiers that match your workload needs. For network-intensive workloads, tune settings such as NAPI and multiqueue, and monitor throughput and latency.
  • Monitor and benchmark regularly: Use Prometheus, Grafana, and Azure Monitor to track key metrics. Benchmark your own workloads to validate performance and guide further tuning.
  • Expect architectural overhead: Plan capacity and set expectations accordingly, especially for workloads with high I/O or network demands.
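The guest-memory and multiqueue recommendations above map to KubeVirt VirtualMachine fields; the fragment below is a sketch with placeholder names and sizes. `networkInterfaceMultiqueue` enables one virtio queue per vCPU, which can help network-intensive guests.

```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: network-intensive-vm            # placeholder name
spec:
  template:
    spec:
      domain:
        devices:
          networkInterfaceMultiqueue: true   # virtio multiqueue for network-heavy VMs
        cpu:
          cores: 8
        memory:
          guest: 16Gi                   # guest memory only; no strict resource limits
```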

VM overcommit tuning

The OpenShift Virtualization Operator allows you to adjust CPU and memory overcommit ratios, letting you allocate more virtual resources than are physically available. This change can improve density and resource utilization but might increase contention and affect performance.

Best practices for overcommit tuning:

  • Use conservative overcommit for production workloads.
  • Consider higher overcommit for dev/test environments.
  • Monitor resource usage and adjust ratios as needed.

For more information, see Configuring higher VM workload density.
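Memory overcommit is configured on the HyperConverged custom resource; the sketch below shows the relevant field, with the 150% value as an example to tune for your own workloads rather than a recommendation.

```yaml
apiVersion: hco.kubevirt.io/v1beta1
kind: HyperConverged
metadata:
  name: kubevirt-hyperconverged
  namespace: openshift-cnv
spec:
  higherWorkloadDensity:
    memoryOvercommitPercentage: 150   # 150% memory overcommit; tune per workload
```

Use conservative values such as this for production and validate under load before raising the ratio further for dev/test environments.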

Best practices based on benchmarking

  • Database workloads: Avoid setting both resource requests and limits for VMs. Monitor performance closely when using fast storage and high concurrency. Scale out cluster nodes for large database deployments.
  • Network workloads: Tune network settings for optimal throughput. Scale out as needed to achieve the desired network throughput.

Storage solution tuning

  • OpenShift Data Foundation (ODF): Use SSD-backed storage for low-latency access. Configure replication and erasure coding policies based on workload needs. To prevent competition with your application compute resources, consider creating a separate worker pool for ODF with smaller Azure VM sizes (Ds16v5 is a good starting point), and use taints and tolerations to ensure ODF is the only workload scheduled there. Monitor storage performance and adjust replication factors as needed.
  • Azure NetApp Files (ANF): Choose performance tiers based on IOPS and throughput requirements. Ensure proper mount options and network configuration for optimal performance. Use volume snapshots and backups to support data protection and recovery strategies.
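A dedicated ODF worker pool can be isolated by adding a taint to that pool's MachineSet. The fragment below is a sketch (surrounding MachineSet fields are omitted); `node.ocs.openshift.io/storage` is the well-known taint that ODF components tolerate, which keeps general application workloads off the storage nodes.

```yaml
# Fragment of a compute MachineSet defining a dedicated ODF worker pool.
spec:
  template:
    spec:
      taints:
        - key: node.ocs.openshift.io/storage
          value: "true"
          effect: NoSchedule
```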

For more information, see OpenShift Virtualization for Azure Red Hat OpenShift.