Share via


RStudio on Azure Databricks

You can use RStudio, a popular integrated development environment (IDE) for R, to connect to Azure Databricks compute resources within Azure Databricks workspaces from your local development machine.

Set up RStudio Desktop

To set up RStudio Desktop on your local development machine:

  1. Download and install R 3.3.0 or higher.
  2. Download and install RStudio Desktop.
  3. Start RStudio Desktop.

(Optional) To create an RStudio project:

  1. Start RStudio Desktop.
  2. Click File > New Project.
  3. Select New Directory > New Project.
  4. Choose a new directory for the project, and then click Create Project.

To create an R script:

  1. With the project open, click File > New File > R Script.
  2. Click File > Save As.
  3. Name the file, and then click Save.

Connect to Databricks

To use RStudio Desktop to connect to a remote Azure Databricks cluster or SQL warehouse from your local development machine, use an ODBC connection and call ODBC package functions for R.

Note

You cannot use packages such as SparkR or sparklyr in this RStudio Desktop scenario, unless you also use Databricks Connect.

To connect to the remote Azure Databricks cluster or SQL warehouse through ODBC for R:

  1. Get the Server hostname, Port, and HTTP path values for your remote cluster or SQL warehouse For a cluster, these values are on the JDBC/ODBC tab of Advanced options. For a SQL warehouse, these values are on the Connection details tab.

  2. Get an Azure Databricks personal access token.

    Note

    As a security best practice, when you authenticate with automated tools, systems, scripts, and apps, Databricks recommends that you use personal access tokens belonging to service principals instead of workspace users. To create tokens for service principals, see Manage tokens for a service principal.

  3. Install and configure the Databricks ODBC driver for your operating system.

  4. Set up an ODBC Data Source Name (DSN) to your remote cluster or SQL warehouse for Windows, macOS, or Linux, based on your local machine's operating system.

  5. From the RStudio console (View > Move Focus to Console), install the odbc and DBI packages from CRAN:

    require(devtools)
    
    install_version(
      package = "odbc",
      repos   = "http://cran.us.r-project.org"
    )
    
    install_version(
      package = "DBI",
      repos   = "http://cran.us.r-project.org"
    )
    
  6. In your R script (View > Move Focus to Source), load the installed odbc and DBI packages:

    library(odbc)
    library(DBI)
    
  7. Call the ODBC version of the dbConnect function in the DBI package, specifying the odbc driver in the odbc package and the ODBC DSN that you created, for example, an ODBC DSN of Databricks.

    conn = dbConnect(
      drv = odbc(),
      dsn = "Databricks"
    )
    
  8. Call an operation through the ODBC DSN, for example a SELECT statement through the dbGetQuery function in the DBI package, specifying the name of the connection variable and the SELECT statement itself, for example from a table named diamonds in a schema (database) named default:

    print(dbGetQuery(conn, "SELECT * FROM default.diamonds LIMIT 2"))
    

The complete R script is as follows:

library(odbc)
library(DBI)

conn = dbConnect(
  drv = odbc(),
  dsn = "Databricks"
)

print(dbGetQuery(conn, "SELECT * FROM default.diamonds LIMIT 2"))

To run the script, in source view, click Source. The results for the preceding R script are as follows:

  _c0 carat     cut color clarity depth table price    x    y    z
1   1  0.23   Ideal     E     SI2  61.5    55   326 3.95 3.98 2.43
2   2  0.21 Premium     E     SI1  59.8    61   326 3.89 3.84 2.31