Azure Databricks supports column mapping for Delta Lake tables. Column mapping enables metadata-only changes to mark columns as deleted or renamed without rewriting data files. Column mapping also allows you to use characters not allowed by Parquet in column names, such as spaces. This lets you directly ingest CSV or JSON data into Delta without renaming columns.
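For example, the following PySpark sketch ingests a CSV file whose headers contain spaces into a table created with column mapping enabled. The path, table name, and column names are hypothetical:

# Read a CSV whose headers contain spaces (hypothetical path).
df = spark.read.option("header", "true").csv("/path/to/source.csv")

# Create the target table with column mapping enabled so the headers can be kept as-is.
spark.sql("""
  CREATE TABLE IF NOT EXISTS raw_csv_ingest (`order id` STRING, `unit price` STRING)
  USING DELTA
  TBLPROPERTIES ('delta.columnMapping.mode' = 'name')
""")

df.write.format("delta").mode("append").saveAsTable("raw_csv_ingest")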
Prerequisites and limitations
Before enabling column mapping, understand the following limitations:
- Tables with column mapping enabled can only be read in Databricks Runtime 10.4 LTS and above.
- Enabling column mapping might break:
  - Legacy workloads that rely on directory names for reading Delta tables. Partitioned tables with column mapping use random prefixes instead of column names for partition directories. See Do Delta Lake and Parquet share partitioning strategies?.
  - Downstream operations using Delta change data feed. See Change data feed limitations for tables with column mapping.
  - Streaming reads from the Delta table, including in Lakeflow Spark Declarative Pipelines. See Column mapping and streaming.
Enable column mapping
Use the following command to enable column mapping:
ALTER TABLE <table-name> SET TBLPROPERTIES (
'delta.columnMapping.mode' = 'name'
)
Column mapping requires the following Delta protocols:
- Reader version 2 or above
- Writer version 5 or above
See Delta Lake feature compatibility and protocols.
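To confirm a table's protocol versions before or after enabling column mapping, one option is to inspect the output of DESCRIBE DETAIL, as in this sketch (my_table is a placeholder):

# Check the table's current reader and writer protocol versions.
spark.sql("DESCRIBE DETAIL my_table").select("minReaderVersion", "minWriterVersion").show()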
Rename a column
Note
Available in Databricks Runtime 10.4 LTS and above.
When column mapping is enabled for a Delta table, you can rename a column:
ALTER TABLE <table-name> RENAME COLUMN old_col_name TO new_col_name
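As a quick sketch (hypothetical table and column names), the rename is a metadata-only change and takes effect immediately for subsequent reads:

# Rename a column on a table with column mapping enabled; no data files are rewritten.
spark.sql("ALTER TABLE sales RENAME COLUMN cust_id TO customer_id")
spark.table("sales").select("customer_id").show()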
For more examples, see Update table schema.
Drop columns
Note
Available in Databricks Runtime 11.3 LTS and above.
When column mapping is enabled for a Delta table, you can drop one or more columns:
ALTER TABLE table_name DROP COLUMN col_name
ALTER TABLE table_name DROP COLUMNS (col_name_1, col_name_2, ...)
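Dropping a column is also a metadata-only change, so the dropped values remain in the underlying data files until those files are rewritten. A sketch with hypothetical table and column names (whether REORG TABLE ... APPLY (PURGE) is available depends on your runtime):

# Drop the column (metadata-only), then rewrite files to physically remove its values.
spark.sql("ALTER TABLE sales DROP COLUMN legacy_flag")
spark.sql("REORG TABLE sales APPLY (PURGE)")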
For more details, see Update table schema.
Supported characters in column names
When column mapping is enabled for a Delta table, you can include spaces and any of these characters in column names: ,;{}()\n\t=.
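In Spark SQL, quote such column names with backticks. A short sketch, assuming column mapping is already enabled on a hypothetical sales table:

# Add and query a column whose name contains spaces and parentheses.
spark.sql("ALTER TABLE sales ADD COLUMN `unit price (usd)` DOUBLE")
spark.sql("SELECT `unit price (usd)` FROM sales").show()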
Remove column mapping
You can remove column mapping from a table using the following command:
ALTER TABLE <table-name> SET TBLPROPERTIES ('delta.columnMapping.mode' = 'none')
Warning
Removing column mapping rewrites all data files to replace physical column names with logical names. This rewrite doesn't support row-level or physical conflict resolution, so concurrent write operations fail with a ConcurrentModificationException. Before removing column mapping:
- Pause all concurrent write operations, including streaming jobs and ETL pipelines.
- Disable predictive optimization on the table.
- For large tables, schedule this operation during low-activity periods.
For an alternative approach that supports downgrading the table protocol, see Disable column mapping.
Disable column mapping
In Databricks Runtime 15.3 and above, you can use the DROP FEATURE command to remove column mapping and downgrade the table protocol. Use this approach instead of removing column mapping if you need to downgrade protocol versions for compatibility with older readers.
Important
Dropping column mapping from a table doesn't remove the random prefixes used in directory names for partitioned tables.
See Drop a Delta Lake table feature and downgrade table protocol.
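Assuming the DROP FEATURE syntax and the columnMapping feature name described in that article, the command looks roughly like this sketch (my_table is a placeholder; depending on the table's history, the command might need to be run again later or with the TRUNCATE HISTORY clause):

# Remove the column mapping table feature and downgrade the table protocol
# (Databricks Runtime 15.3 and above).
spark.sql("ALTER TABLE my_table DROP FEATURE columnMapping")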
Column mapping and streaming
You can provide a schema tracking location to enable streaming from Delta tables with column mapping enabled. This overcomes an issue where non-additive schema changes could result in broken streams.
Each streaming read against a data source must have its own schemaTrackingLocation specified. The specified schemaTrackingLocation must be a directory contained within the checkpointLocation used for the streaming write to the target table. For streaming workloads that combine data from multiple source Delta tables, specify unique directories within the checkpointLocation for each source table (see the multi-source sketch after the example below).
Enable column mapping on a running job
Important
To enable column mapping on a running streaming job:
- Stop the job
- Enable column mapping on the table
- Restart the job (first restart - initializes column mapping)
- Restart the job again (second restart - enables schema changes)
Any further schema changes (adding or dropping columns, changing column types) also require restarting the job.
Specify schema tracking location
The following example shows how to specify a schemaTrackingLocation for a streaming read from a Delta table with column mapping:
checkpoint_path = "/path/to/checkpointLocation"

# Track schema changes from the source table in the same directory that serves
# as the checkpoint location for the streaming write.
(spark.readStream
  .option("schemaTrackingLocation", checkpoint_path)
  .table("delta_source_table")
  .writeStream
  .option("checkpointLocation", checkpoint_path)
  .toTable("output_table")
)
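As noted in the Column mapping and streaming section, a streaming write that combines multiple column-mapping-enabled source tables needs a unique schema tracking directory for each source inside the write's checkpointLocation. The following is a minimal sketch, assuming two source tables with compatible schemas; all table names and paths are hypothetical:

checkpoint_path = "/path/to/checkpointLocation"

# Each source gets its own schema tracking directory inside the checkpoint location.
df1 = (spark.readStream
  .option("schemaTrackingLocation", f"{checkpoint_path}/source_1")
  .table("delta_source_table_1")
)

df2 = (spark.readStream
  .option("schemaTrackingLocation", f"{checkpoint_path}/source_2")
  .table("delta_source_table_2")
)

# Combine the sources and write to a single target table.
(df1.union(df2)
  .writeStream
  .option("checkpointLocation", checkpoint_path)
  .toTable("output_table")
)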