
Serverless GPU compute

Important

This feature is in Beta. Workspace admins can control access to this feature from the Previews page. See Manage Azure Databricks previews.

Overview of Serverless GPU compute

Serverless GPU compute is a Databricks compute offering for deep learning workloads that brings GPU support to Databricks serverless. Use it to train and fine-tune custom models with your favorite frameworks while getting state-of-the-art efficiency, performance, and quality. For general information about serverless compute, see Connect to serverless compute.

Key features

  • Fully managed GPU infrastructure — Serverless, flexible access to GPUs and no cluster configuration, driver selection, or autoscaling policies to manage.
  • A runtime dedicated to deep learning — Choose either a minimal default base environment for maximum flexibility over dependencies, or a full-featured AI environment pre-loaded with popular ML frameworks.
  • Natively integrated across notebooks, jobs, Unity Catalog, and MLflow for seamless development, data access, and experiment tracking.
  • Genie Code integration — When a notebook is connected to serverless GPU compute, Genie Code is optimized to help with machine learning tasks.

Hardware options

Accelerator | Best for | Multi-GPU
A10 | Small to medium ML and deep learning tasks, such as classic ML models or fine-tuning smaller language models | No
H100 | Large-scale AI workloads, including training or fine-tuning massive models or running advanced deep learning tasks | Yes (8 GPUs)

Databricks recommends Serverless GPU compute for custom model training use cases that involve deep learning, as well as large-scale classic ML workloads that require GPUs.

For example:

  • LLM fine-tuning (LoRA, QLoRA, full fine-tuning)
  • Computer vision (object detection, image classification)
  • Deep-learning-based recommender systems
  • Reinforcement learning
  • Deep-learning-based time series forecasting
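As a concrete illustration of the first use case, LoRA fine-tunes a model by learning a low-rank update to frozen weights: the effective weight matrix is W + alpha * B @ A, where only the small matrices A and B are trained. A minimal, dependency-free sketch of that idea (function names are illustrative, not part of any Databricks API):

```python
def matmul(M, N):
    """Multiply two matrices represented as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*N)] for row in M]

def lora_forward(W, A, B, x, alpha=1.0):
    """Compute y = (W + alpha * B @ A) @ x without ever materializing the sum.

    W is the frozen base weight matrix; A (rank x d_in) and B (d_out x rank)
    are the small trainable LoRA factors.
    """
    x_col = [[v] for v in x]                 # column vector
    Wx = matmul(W, x_col)                    # frozen-path output
    BAx = matmul(B, matmul(A, x_col))        # low-rank adapter output
    return [w[0] + alpha * b[0] for w, b in zip(Wx, BAx)]
```

Note that LoRA conventionally initializes B to zeros, so at the start of fine-tuning the adapted model reproduces the base model's outputs exactly.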

Requirements

  • A workspace in one of the following Azure-supported regions:
    • centralus
    • eastus
    • eastus2
    • northcentralus
    • westcentralus
    • westus
    • westus3

Limitations

  • Serverless GPU compute only supports A10 accelerators.
  • Serverless GPU compute is not supported for compliance security profile workspaces (like HIPAA or PCI). Processing regulated data is not supported.
  • Adding dependencies using the Environments panel is not supported for scheduled jobs on Serverless GPU compute. Instead, install dependencies programmatically with %pip install in your notebook.
  • For scheduled jobs on Serverless GPU compute, automatic recovery from incompatible package versions associated with your notebook is not supported.
  • The maximum runtime for a workload is seven days. For model training jobs that exceed this limit, implement checkpointing and restart the job once the maximum runtime is reached.
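One way to work within the seven-day runtime limit is a checkpoint-and-resume loop: persist progress after each epoch, and on restart pick up from the last saved state. A minimal sketch of the pattern, with an illustrative local path and a stand-in for real model state (on Databricks you would typically checkpoint to a Unity Catalog volume):

```python
import json
import os

CHECKPOINT = "checkpoint.json"  # illustrative; point this at durable storage

def load_checkpoint():
    """Return the saved training state, or a fresh one if none exists."""
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return json.load(f)
    return {"epoch": 0, "state": None}

def save_checkpoint(epoch, state):
    """Write the checkpoint atomically so a restart never sees a partial file."""
    tmp = CHECKPOINT + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"epoch": epoch, "state": state}, f)
    os.replace(tmp, CHECKPOINT)

def train(total_epochs):
    """Resume from the last completed epoch and run to total_epochs."""
    ckpt = load_checkpoint()
    for epoch in range(ckpt["epoch"], total_epochs):
        state = f"weights-after-epoch-{epoch}"  # stand-in for real model weights
        save_checkpoint(epoch + 1, state)
    return load_checkpoint()
```

If the job is killed at the runtime limit, simply rescheduling it resumes from the last completed epoch instead of starting over.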

Connecting to Serverless GPU compute

You can connect to Serverless GPU compute interactively from notebooks, schedule notebooks as recurring jobs, or programmatically create jobs using the Jobs API and Databricks Asset Bundles. For step-by-step instructions, see Connect to Serverless GPU Compute.
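For the Asset Bundles path, a bundle can declare a notebook task to run on a schedule; a minimal sketch is below. The bundle, job, and path names are illustrative, and the serverless GPU environment details are configured per the linked instructions, so consult them for the full schema.

```yaml
# databricks.yml -- illustrative bundle definition; adjust names and paths
bundle:
  name: gpu-finetune

resources:
  jobs:
    nightly_finetune:
      name: nightly-finetune
      tasks:
        - task_key: train
          notebook_task:
            notebook_path: ./notebooks/train.ipynb
```

Deploying and running the bundle (`databricks bundle deploy`, then `databricks bundle run`) creates and triggers the job from this definition.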

Setting up your environment

Serverless GPU compute offers two managed Python environments: a minimal default base environment, and a full-featured Databricks AI environment that is pre-loaded with popular ML frameworks like PyTorch and Transformers. For details on choosing an environment, caching behavior, importing custom modules, and known limitations, see How to set up your environment.

Reading in your data

Understanding how data access works on Serverless GPU compute is essential for a smooth experience. For details, see Load data on Serverless GPU compute.

Distributed training

Serverless GPU compute supports distributed training across multiple GPUs on the single node your notebook is connected to. Using the @distributed decorator from the serverless_gpu Python API (Beta), you can launch multi-GPU workloads with PyTorch DDP, FSDP, or DeepSpeed with minimal configuration. For details, see Multi-GPU workload.
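Under the hood, data-parallel training (the pattern DDP implements) gives each GPU a shard of the batch, computes gradients locally, and averages them across ranks before every rank applies the same update. A dependency-free sketch of that synchronization step, using a one-parameter least-squares model as a stand-in (the actual `@distributed` decorator usage is covered in the linked docs):

```python
def local_gradient(shard, w):
    """Gradient of mean squared error 0.5 * (w*x - y)^2 over one data shard."""
    return sum((w * x - y) * x for x, y in shard) / len(shard)

def all_reduce_mean(grads):
    """What an all-reduce does conceptually: average the per-rank gradients."""
    return sum(grads) / len(grads)

def data_parallel_step(shards, w, lr=0.1):
    """One synchronized optimizer step across all 'ranks' (GPUs)."""
    grads = [local_gradient(shard, w) for shard in shards]  # each rank: its shard only
    g = all_reduce_mean(grads)          # synchronize: identical g on every rank
    return w - lr * g                   # every rank applies the same update
```

With equal-sized shards, this averaged gradient is identical to the gradient over the full batch, which is why data-parallel training matches single-device training step for step.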

Experiment tracking and observability

For MLflow integration, viewing logs, and model checkpoint management, see Experiment tracking and observability.

Genie Code for deep learning

Genie Code supports deep learning workloads on Serverless GPU compute. It can help with generating training code, resolving library installation errors, suggesting optimizations, and debugging common issues. See Use Genie Code for data science.

Guides

For migration from classic workloads, example notebooks, and troubleshooting, see Guides for Serverless GPU Compute.