
feat: introduce VacuumMode::Full for cleaning up orphaned files #3368

Merged
merged 1 commit into from
May 3, 2025

rtyler
Member

@rtyler rtyler commented Apr 7, 2025

This adds an optional, off-by-default mode for removing untracked files
from the Delta table directory. Delta/Spark supports both a "lite" and a
"full" mode for vacuum. Unlike Delta/Spark, this change intentionally does
not make "full" the default mode, since that could have unintended
consequences for users who have become accustomed to "lite" being the
default.

Fixes #2349

Signed-off-by: R. Tyler Croy [email protected]
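
A minimal Rust sketch of how the two modes differ in which files they pick. It assumes, following the `lite` doc comment in this PR, that `lite` deletes only files tombstoned by `remove` actions in `_delta_log`, and that `full` deletes any file under the table directory that the current table version does not reference. `files_to_delete`, its parameters, and the rule that skips `_delta_log/` paths are hypothetical illustrations, not the delta-rs API, and retention-period checks are left out.

```rust
use std::collections::HashSet;

/// Vacuum modes as described in this PR; `Lite` stays the default.
#[derive(Debug, Default, Clone, Copy, PartialEq)]
enum VacuumMode {
    /// Delete only files tombstoned by `remove` actions in `_delta_log`.
    #[default]
    Lite,
    /// Delete every file not referenced by the current table version.
    Full,
}

/// Hypothetical helper (not the delta-rs API) that picks the files a vacuum
/// would delete. Retention-period checks are deliberately left out.
fn files_to_delete(
    mode: VacuumMode,
    listed: &[&str],         // every file found under the table directory
    active: &HashSet<&str>,  // files referenced by the current table version
    removed: &HashSet<&str>, // files tombstoned by `remove` actions
) -> Vec<String> {
    let mut doomed: Vec<String> = listed
        .iter()
        // Assumption: the transaction log itself is never a vacuum candidate.
        .filter(|path| !path.starts_with("_delta_log/"))
        // Files the current version still reads are always kept.
        .filter(|path| !active.contains(*path))
        .filter(|path| match mode {
            // Lite: only files the log records as removed.
            VacuumMode::Lite => removed.contains(*path),
            // Full: also untracked (orphaned) files the log never mentions.
            VacuumMode::Full => true,
        })
        .map(|path| path.to_string())
        .collect();
    doomed.sort();
    doomed
}

fn main() {
    let listed = [
        "part-0.parquet", // tombstoned by a `remove` action
        "part-1.parquet", // still active
        "orphan.parquet", // never recorded in the log
        "_delta_log/00000000000000000000.json",
    ];
    let active = HashSet::from(["part-1.parquet"]);
    let removed = HashSet::from(["part-0.parquet"]);

    let lite = files_to_delete(VacuumMode::default(), &listed, &active, &removed);
    let full = files_to_delete(VacuumMode::Full, &listed, &active, &removed);
    println!("lite: {lite:?}"); // lite: ["part-0.parquet"]
    println!("full: {full:?}"); // full: ["orphan.parquet", "part-0.parquet"]
}
```

In this model `full` selects a superset of what `lite` selects (a tombstoned file is also unreferenced), so opting into `full` can only delete more files, which is consistent with keeping `lite` as the default.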

@github-actions github-actions bot added the binding/rust label (Issues for the Rust crate) Apr 7, 2025
@rtyler rtyler force-pushed the feature/full-vacuum-engage branch from d56badf to 1cc4289 April 7, 2025 00:29
@rtyler rtyler enabled auto-merge April 7, 2025 00:29
@rtyler rtyler marked this pull request as draft April 7, 2025 14:41
auto-merge was automatically disabled April 7, 2025 14:41

Pull request was converted to draft

@rtyler rtyler force-pushed the feature/full-vacuum-engage branch from 1cc4289 to c734263 April 7, 2025 15:33
@rtyler rtyler assigned rtyler and unassigned rtyler Apr 7, 2025
@rtyler rtyler marked this pull request as ready for review April 7, 2025 15:35
@rtyler rtyler enabled auto-merge April 7, 2025 15:36

codecov bot commented Apr 7, 2025

Codecov Report

Attention: Patch coverage is 74.24242% with 17 lines in your changes missing coverage. Please review.

Project coverage is 72.01%. Comparing base (b30c2ed) to head (010fb04).
Report is 2 commits behind head on main.

Files with missing lines               Patch %   Lines
crates/core/src/operations/vacuum.rs   80.32%    0 Missing and 12 partials ⚠️
python/src/lib.rs                      0.00%     5 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #3368   +/-   ##
=======================================
  Coverage   72.01%   72.01%           
=======================================
  Files         148      148           
  Lines       46082    46137   +55     
  Branches    46082    46137   +55     
=======================================
+ Hits        33184    33225   +41     
- Misses      10791    10794    +3     
- Partials     2107     2118   +11     

Comment on lines 83 to 84
/// The `lite` mode will only remove files which are referenced in the `_delta_log` associagted
/// with `remove` action
Collaborator

The Databricks docs on this say something different, and I still can't decipher what they truly mean. Is there a spec for this?

/// The `lite` mode will only remove files which are referenced in the `_delta_log` associagted
/// with `remove` action
#[default]
Lite,
Collaborator

Let's also expose the mode to Python 😊

Member Author

Would you be up for adding this exposure after we merge? 😈

Collaborator

Sure!
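
On exposing the mode to Python: one possible shape, sketched here as a hypothetical rather than what delta-rs actually ships, is to accept the mode as a string on the binding side and parse it with Rust's `FromStr`, falling back to the default `Lite` when the argument is omitted. `mode_from_arg` and the error wording are illustrative; a real binding could just as well take a boolean flag.

```rust
use std::str::FromStr;

#[derive(Debug, Default, Clone, Copy, PartialEq)]
enum VacuumMode {
    #[default]
    Lite,
    Full,
}

impl FromStr for VacuumMode {
    type Err = String;

    /// Parse "lite" or "full", ignoring ASCII case.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.to_ascii_lowercase().as_str() {
            "lite" => Ok(VacuumMode::Lite),
            "full" => Ok(VacuumMode::Full),
            other => Err(format!(
                "unknown vacuum mode {other:?}; expected \"lite\" or \"full\""
            )),
        }
    }
}

/// Hypothetical binding-side helper: an omitted argument means the default mode.
fn mode_from_arg(arg: Option<&str>) -> Result<VacuumMode, String> {
    match arg {
        None => Ok(VacuumMode::default()),
        Some(s) => s.parse(),
    }
}

fn main() {
    println!("{:?}", mode_from_arg(None)); // Ok(Lite)
    println!("{:?}", mode_from_arg(Some("FULL"))); // Ok(Full)
    // An unrecognized value yields an Err naming the accepted modes.
    println!("{:?}", mode_from_arg(Some("aggressive")));
}
```

Keeping the default on the Rust side means a caller who never passes the argument gets the same `lite` behavior as the Rust API.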

@rtyler rtyler force-pushed the feature/full-vacuum-engage branch from c734263 to bcd4960 April 8, 2025 23:25
@rtyler rtyler requested a review from fvaleye as a code owner April 8, 2025 23:25
@github-actions github-actions bot added the binding/python label (Issues for the Python package) Apr 8, 2025
@rtyler rtyler force-pushed the feature/full-vacuum-engage branch from bcd4960 to a868406 April 9, 2025 00:47
ion-elgreco previously approved these changes Apr 16, 2025
ion-elgreco previously approved these changes Apr 17, 2025
@rtyler rtyler added this pull request to the merge queue Apr 17, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to a conflict with the base branch Apr 17, 2025
roeap previously approved these changes Apr 17, 2025
Collaborator

@roeap roeap left a comment

👍

/// Type of Vacuum operation to perform
#[derive(Debug, Default, Clone, PartialEq)]
pub enum VacuumMode {
/// The `lite` mode will only remove files which are referenced in the `_delta_log` associagted

Obviously not a big deal, but "associagted" -> "associated"

@rtyler rtyler dismissed stale reviews from roeap and ion-elgreco via b5a9b0e May 3, 2025 18:49
@rtyler rtyler force-pushed the feature/full-vacuum-engage branch 2 times, most recently from b5a9b0e to 9d2ffd8 May 3, 2025 18:53
@rtyler rtyler enabled auto-merge May 3, 2025 19:03
This allows an optional but not-on-by-default mode of removing untracked
files in the delta table directory. Delta/Spark supports a "lite" and
"full" mode for [vacuum]. This change is intentionally not making "full"
the default as it is for Delta/Spark since that may have unintended
consequences for our users who have become accustomed to "lite" being
the default.

Fixes delta-io#2349

[vacuum]: https://docs.delta.io/latest/delta-utility.html#remove-files-no-longer-referenced-by-a-delta-table

Signed-off-by: R. Tyler Croy <[email protected]>
@rtyler rtyler disabled auto-merge May 3, 2025 19:04
@rtyler rtyler force-pushed the feature/full-vacuum-engage branch from 9d2ffd8 to 010fb04 May 3, 2025 19:04
@rtyler rtyler enabled auto-merge May 3, 2025 19:05
@rtyler rtyler added this pull request to the merge queue May 3, 2025
@rtyler rtyler self-assigned this May 3, 2025
Merged via the queue into delta-io:main with commit 13acb2f May 3, 2025
29 checks passed
@rtyler rtyler deleted the feature/full-vacuum-engage branch May 3, 2025 21:00
Labels
binding/python (Issues for the Python package), binding/rust (Issues for the Rust crate)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Vacuum command should support "full" mode too!
4 participants