Databricks Delta Lake includes time travel features that let you access any previous version of the data that Delta automatically versions and stores in your data lake. This makes it simple to audit changes, roll back data after accidental bad writes or deletes, and reproduce reports and experiments.
How does Delta Time Travel work exactly? Keep reading for a primer on time travel in Databricks.
What is Delta Lake Time Travel?
With Delta Lake time travel, every operation that writes to a Delta table or directory is automatically versioned.
There are two methods by which you can access the various versions of the data:
1. Applying a timestamp
2. Using a version number
Delta Time Travel Syntax
To query a Delta table with time travel, you add a clause after the table name specification.
timestamp_expression may be any of the following:
- '2018-10-18T22:15:12.013Z', that is, a string that can be cast to a timestamp
- cast('2018-10-18 13:36:32 CEST' as timestamp)
- '2018-10-18', that is, a date string
- current_timestamp() - interval 12 hours
- date_sub(current_date(), 1)
Note that version and timestamp_expression are not subqueries.
Accepted formats are only date or timestamp strings: for example, "2019-01-01" and "2019-01-01T00:00:00.000Z".
Check out the code below for an illustration of time travel syntax:
SELECT * FROM people10m TIMESTAMP AS OF '2018-10-18T22:15:12.013Z';
SELECT * FROM people10m VERSION AS OF 123;
You may also include the version or timestamp in the table name using the @ syntax. The timestamp must be formatted as yyyyMMddHHmmssSSS, and you specify a version by prepending a v to the version number after the @.
Here’s a snippet showing this syntax:
SELECT * FROM people10m@20190101000000000
SELECT * FROM people10m@v123
Different Ways of Time Travel in Databricks
1. Timestamp
Scala
The timestamp or date string can be added to the DataFrame reader as an option:
val df = spark.read
.format("delta")
.option("timestampAsOf", "2019-01-01")
.load("/path/to/my/table")Python
Here’s an example of a timestamp in Python:
df = spark.read \
.format("delta") \
.option("timestampAsOf", "2019-01-01") \
.load("/path/to/my/table")SQL syntax
What about SQL? Here’s an example:
SELECT count(*) FROM my_table TIMESTAMP AS OF "2019-01-01"
SELECT count(*) FROM my_table TIMESTAMP AS OF date_sub(current_date(), 1)
SELECT count(*) FROM my_table TIMESTAMP AS OF "2019-01-01 01:30:00.000"You can still go back in time for a table by adding the timestamp in yyyyMMddHHmmssSSS format to the path:
val inputPath = "/path/to/my/table@20190101000000000"
val df = loadData(inputPath)
// Function in a library that you don't have access to
def loadData(inputPath : String) : DataFrame = {
spark.read
.format("delta")
.load(inputPath)
}
inputPath = "/path/to/my/table@20190101000000000"
df = loadData(inputPath)
# Function in a library that you don't have access to
def loadData(inputPath):
    return spark.read \
        .format("delta") \
        .load(inputPath)
2. Version Number
Every write in Delta has a version number, which you can use to go back in time.
Scala
Here’s an example code snippet that would be used in Scala:
val df = spark.read
.format("delta")
.option("versionAsOf", "5238")
.load("/path/to/my/table")
val df = spark.read
.format("delta")
.load("/path/to/my/table@v5238")Python
This is what things look like in Python:
df = spark.read \
.format("delta") \
.option("versionAsOf", "5238") \
.load("/path/to/my/table")
df = spark.read \
.format("delta") \
.load("/path/to/my/table@v5238")SQL syntax
And SQL:
SELECT count(*) FROM my_table VERSION AS OF 5238
SELECT count(*) FROM my_table@v5238
SELECT count(*) FROM delta.`/path/to/my/table@v5238`
How to Recover a Delta Lake Table Using Databricks Time Travel
Identify the Correct Version
First things first, you need to know which Delta table version you want to recover.
Consider table_name first. It identifies the Delta table to be restored, and Databricks forbids including a temporal specification in the table name here.
Now let’s consider the timestamp_expression. It can be any of the following:
- '2018-10-18T22:15:12.013Z', that is, a string that can be cast to a timestamp
- cast('2018-10-18 13:36:32 CEST' as timestamp)
- '2018-10-18', that is, a date string
- current_timestamp() - interval 12 hours
- date_sub(current_date(), 1)
- Any other expression that is or can be cast to a timestamp
Then there’s version. It’s a long value that can be obtained from the output of DESCRIBE HISTORY table_spec.
Note: Neither timestamp_expression nor version can be subqueries.
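For example, here is a minimal sketch of that lookup in a notebook, assuming a Delta table named my_table (the name is a placeholder):
# Illustrative only: list the versions recorded for a table named my_table
history = spark.sql("DESCRIBE HISTORY my_table")
# Each row records a version number, its timestamp, and the operation that produced it
history.select("version", "timestamp", "operation").show(truncate=False)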
Restore the Table
To restore a Delta table to an earlier state, use the RESTORE command. A Delta table internally preserves older versions of itself, which is what makes this restoration possible.
Note: You can’t restore a table to a previous version whose data files were manually or automatically removed. You can still partially restore to that version if spark.sql.files.ignoreMissingFiles is set to true.
When going back to an earlier timestamp, use the format yyyy-MM-dd HH:mm:ss. Alternatively, you can supply just a date string (yyyy-MM-dd).
Here’s what the command looks like in Databricks SQL:
RESTORE [ TABLE ] table_name [ TO ] time_travel_version
time_travel_version
{ TIMESTAMP AS OF timestamp_expression |
VERSION AS OF version }
Don’t forget that RESTORE is a data-modifying operation: Delta Lake log entries added by the RESTORE command have dataChange set to true. These entries are treated as new data updates, so a downstream application that processes updates to the Delta table, such as a Structured Streaming job, may produce duplicate data when it reprocesses them.
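To make this concrete, here is a hedged sketch of the command in use; the table name, version number, and timestamp are illustrative placeholders, not values from a real workspace:
# Optional, before restoring: tolerate removed data files for a partial restore (see the note above)
spark.conf.set("spark.sql.files.ignoreMissingFiles", "true")
# Illustrative placeholders: restore my_table to a specific version number...
spark.sql("RESTORE TABLE my_table TO VERSION AS OF 5238")
# ...or to its state as of a timestamp (yyyy-MM-dd or yyyy-MM-dd HH:mm:ss)
spark.sql("RESTORE TABLE my_table TO TIMESTAMP AS OF '2019-01-01'")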
Using Databricks Time Travel: Examples
Fixing accidental deletes to a table
Let’s say that you want to fix accidental deletes to a table for user 222. Here’s the SQL code you need:
INSERT INTO my_table
SELECT * FROM my_table TIMESTAMP AS OF date_sub(current_date(), 1)
WHERE userId = 222
Fix incorrect updates to a table
To fix an incorrect update to a table that happened accidentally, use this:
MERGE INTO my_table target
USING my_table TIMESTAMP AS OF date_sub(current_date(), 1) source
ON source.userId = target.userId
WHEN MATCHED THEN UPDATE SET *
Common Challenges With Changing Data
Auditing Data Changes
Auditing data changes is essential both for data compliance and for basic debugging, to understand how data has changed over time. Organizations transitioning from traditional data systems to big data technology and the cloud face many different challenges in this area.
Reproducing Experiments & Reports
Data practitioners carry out multiple experiments with varying parameters on a particular piece of data during model training.
When they later return to their experiments to reproduce the models, the original data has usually been changed by upstream pipelines. These upstream data changes often catch them off guard, making it difficult to repeat their research.
Some data scientists and organizations adopt the practice of making several copies of the data, which raises storage costs. The same applies to analysts writing reports.
Rollbacks
Data pipelines can occasionally produce inaccurate data for users farther down the line. This can happen for various reasons, such as pipeline bugs, messy data, and unstable infrastructure.
Date-based partitioning makes rollbacks easy for pipelines that do basic appends to directories or tables. This may get quite complicated when dealing with updates and deletes, and data engineers usually have to design a sophisticated pipeline to handle such circumstances.
Databricks Time Travel Key Functionalities
Audit Data Changes
The UI or the DESCRIBE HISTORY command allows you to view the history of table modifications.
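As a small sketch, assuming a Delta table named my_table (a placeholder), an audit query over its history might look like this:
# Illustrative audit: who changed my_table, when, and with which operation
history = spark.sql("DESCRIBE HISTORY my_table")
history.select("version", "timestamp", "userName", "operation", "operationParameters") \
    .orderBy("version", ascending=False) \
    .show(truncate=False)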
Reproduce Experiments & Reports
Time travel is crucial in data science and machine learning. For data scientists, reproducibility of experiments and models is crucial since they may develop hundreds or even thousands of models before putting one into production.
Throughout that process, they may sometimes need to return to prior models. This is extremely difficult to do because data science tools and data management systems are often separate.
By combining MLflow, an open-source platform for the machine learning lifecycle, with Delta’s time-travel capabilities, Databricks addresses the reproducibility issue.
You can easily track which version of the data was used for each training job by logging the timestamped path as an MLflow parameter, enabling reproducible ML training. This lets you reproduce earlier models by going back to the earlier datasets and parameters.
You don’t have to worry about copying data for other trials or coordinating with teams upstream on the data.
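As a rough sketch of this pattern (the path, version number, and parameter names below are illustrative assumptions, not a prescribed Databricks API), you could pin the training data to a Delta version and record it with MLflow:
import mlflow

# Hypothetical values: pin the training data to a specific Delta table version
data_path = "/path/to/my/table"
data_version = 5238
training_data = spark.read.format("delta") \
    .option("versionAsOf", data_version) \
    .load(data_path)

with mlflow.start_run():
    # Record which snapshot of the data this run was trained on
    mlflow.log_param("data_path", data_path)
    mlflow.log_param("data_version", data_version)
    # ... train and log the model against training_data here ...
With the version recorded alongside the run, you can later reload the exact same snapshot to reproduce the model.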
Rollbacks
Time travel also opens the door to simple rollbacks in the event of bad writes. For instance, if a bug in a GDPR pipeline job inadvertently deletes user data, you can quickly repair the pipeline using this:
INSERT INTO my_table
SELECT * FROM my_table TIMESTAMP AS OF date_sub(current_date(), 1)
WHERE userId = 111
And here’s how you can repair erroneous updates:
MERGE INTO my_table target
USING my_table TIMESTAMP AS OF date_sub(current_date(), 1) source
ON source.userId = target.userId
WHEN MATCHED THEN UPDATE SET *
Pinned View of a Continuously Updating Delta Table Across Multiple Downstream Jobs
You can now pin the snapshot of an ever-updating Delta table for several downstream tasks using AS OF queries.
Imagine a scenario where a downstream operation reads from a Delta table and updates several destinations regularly. The Delta table is updated continually, every 15 seconds. In these situations, you usually want the source Delta table to have a consistent view so that all destination tables show the same information.
You can deal with such situations easily:
version = spark.sql("SELECT max(version) FROM (DESCRIBE HISTORY my_table)").collect()
# Will use the latest version of the table for all operations below
data = spark.table("my_table@v%s" % version[0][0])
data.where("event_type = 'e1'").write.jdbc("table1")
data.where("event_type = 'e2'").write.jdbc("table2")
...
data.where("event_type = 'e10'").write.jdbc("table10")
Queries for Time Series Analytics Made Simple
Time travel also makes time series analytics simpler. Here’s a very basic query you might use to find out how many new clients you brought on in the past week:
SELECT count(distinct userId) - (
SELECT count(distinct userId)
FROM my_table TIMESTAMP AS OF date_sub(current_date(), 7))
FROM my_table
Single Table Time Travel vs. Data Version Control with lakeFS
How does the single table time travel capability differ between Databricks and lakeFS?
Delta tables let you travel back in time on a specific table, whereas lakeFS enables you to manage your data like code. A lakeFS repository can hold many Delta tables, even hundreds of thousands, and supports time travel across all of them simultaneously: by going back to a particular point, you see a snapshot of every table in the repository at that moment.
Moreover, changes made to your data can be committed, giving you always-accessible snapshots. To work in an isolated data environment, you can create a branch in a repository and later merge your data modifications into your primary production branch.
It’s also important to remember that Delta table compaction comes at the cost of losing history: compaction reduces the amount of history that can be accessed through Delta time travel. With data version control systems, historical data is never lost until absolutely necessary. When you compact Delta tables managed with lakeFS, the history of those tables is preserved in all prior commits; only the history within the specific commit you made is lost.
Although Delta tables provide strong per-table time travel capabilities, lakeFS offers a more complete and adaptable data management solution. Its capabilities for isolated environments, Write-Audit-Publish, debugging, and ML reproducibility, together with its format-agnostic nature, make it an indispensable tool for modern data science and data engineering operations.
lakeFS facilitates safe data promotion, isolated environments, and repeatable operations, expediting the creation and deployment of data/AI solutions. This promotes secure and effective collaboration while guaranteeing high standards of data integrity and quality.
Conclusion
In Delta, time travel significantly increases developer productivity. It helps data scientists carry out their research more effectively, lets data engineers fix bad writes and streamline their pipelines, and makes reporting simpler for data analysts.
Ultimately, time travel capabilities help teams standardize on a pristine, centralized, versioned big data repository in their cloud storage. And you can easily expand the single table time travel offered by Databricks with more data version control capabilities from lakeFS!
Check out lakeFS for Databricks to increase your data engineering velocity and reap benefits from full-scale version control for data.


