1 unstable release: 0.1.1 (Jul 16, 2025)
RClean
A high-performance Rust-based disk cleanup tool that finds duplicate files and storage outliers.
Features
- Duplicate Detection: Find duplicate files using MD5 hashing with parallel processing
- Similar File Detection: Identify similar files using fuzzy matching algorithms
- Storage Outliers: Detect large files, hidden space consumers, and file patterns
- Cluster Analysis: Find groups of similar large files using DBSCAN clustering
- Fast Performance: Leverages Rust's parallelization with Rayon
- Multiple Output Formats: Table, JSON, CSV reports
- MCP Support: Can be used as an MCP (Model Context Protocol) server
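To make the duplicate-detection idea concrete, here is a minimal Python sketch of grouping files by MD5 digest. It is not RClean's implementation (which is written in Rust and uses Rayon); the function names `md5_of_file` and `find_duplicates` are hypothetical, and a thread pool stands in for Rayon's parallelism. Unreadable files and broken symlinks are not handled.

```python
import hashlib
import os
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading in chunks to bound memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Group files under `root` whose contents share the same MD5 digest."""
    paths = [
        os.path.join(dirpath, name)
        for dirpath, _dirs, names in os.walk(root)
        for name in names
    ]
    # Hash files concurrently (hypothetical stand-in for Rayon's parallel iterators).
    with ThreadPoolExecutor() as pool:
        digests = list(pool.map(md5_of_file, paths))

    groups = defaultdict(list)
    for path, digest in zip(paths, digests):
        groups[digest].append(path)
    # Only digests shared by two or more files count as duplicates.
    return sorted(sorted(group) for group in groups.values() if len(group) > 1)
```

Calling `find_duplicates(".")` returns a list of groups, each group being a sorted list of paths with identical content.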

Installation
# From source
git clone https://github.com/paiml/rclean.git
cd rclean
cargo install --path .
# Or directly from GitHub
cargo install --git https://github.com/paiml/rclean.git
Quick Start
# Scan current directory for duplicates
rclean
# Scan specific directory
rclean /path/to/directory
# Filter by pattern
rclean ~/Documents --pattern "*.pdf" --pattern-type glob
# Generate CSV report
rclean . --csv duplicate_report.csv
# Find similar files (fuzzy matching) with 70% similarity threshold
rclean ~/Documents --similarity 70
Storage Outliers Detection (NEW!)
Find files that are consuming disproportionate disk space:
# Find large file outliers
rclean outliers /path --min-size 100MB
# Find hidden space consumers (node_modules, .git, etc.)
rclean outliers ~ --check-hidden --format json
# Find file patterns (backups, logs, etc.)
rclean outliers . --check-patterns
# Export outliers report
rclean outliers . --csv outliers_report.csv
# Combine all features
rclean outliers ~ --min-size 50MB --check-hidden --check-patterns --top 50
# Enable clustering to find groups of similar large files
rclean outliers /path --cluster --cluster-similarity 80 --min-cluster-size 3
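For readers unfamiliar with DBSCAN, the clustering option can be pictured with the following minimal Python sketch. The features and distance measure RClean actually uses are not documented here, so this sketch clusters file sizes by absolute difference purely as a stand-in. The function `dbscan` and its `eps` and `min_samples` parameters are illustrative, and any mapping to `--cluster-similarity` or `--min-cluster-size` is an assumption.

```python
def dbscan(items, distance, eps, min_samples):
    """Return one cluster label per item; -1 marks noise."""
    NOISE, UNVISITED = -1, None
    labels = [UNVISITED] * len(items)

    def neighbors(i):
        # All items within eps of item i (the item itself included).
        return [j for j in range(len(items)) if distance(items[i], items[j]) <= eps]

    cluster = 0
    for i in range(len(items)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # core point: keep growing the cluster
        cluster += 1
    return labels


# Hypothetical file sizes in MB; distance is the absolute size difference.
sizes_mb = [100, 101, 102, 500, 501, 502, 900]
labels = dbscan(sizes_mb, lambda a, b: abs(a - b), eps=2, min_samples=3)
```

Because the distance function is passed in, the same routine would work with a content-based or name-based dissimilarity instead of file size.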
Outliers Detection Features:
- Statistical Analysis: Flags files whose size lies a given number of standard deviations above the mean
- Hidden Consumers: Detects node_modules, .git, .cache, and other known space hogs
- Pattern Detection: Finds groups of similar files (backup-001, backup-002, etc.)
- Cluster Analysis: Uses DBSCAN to find clusters of similar large files (e.g., different versions of the same document)
- Smart Recommendations: Provides cleanup suggestions for each type of outlier
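The statistical check in the list above can be sketched as a z-score test. This is a minimal illustration in Python, not RClean's code: the function name `size_outliers`, the default threshold of 2.0, and the use of the population standard deviation are all assumptions.

```python
import statistics


def size_outliers(sizes, threshold=2.0):
    """Return (name, size, z_score) for files far above the mean size.

    `sizes` maps a file name to its size in bytes. A file is flagged when
    its z-score, (size - mean) / stdev, exceeds `threshold`.
    """
    values = list(sizes.values())
    if len(values) < 2:
        return []
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # all files have the same size, so nothing stands out
    flagged = []
    for name, size in sizes.items():
        z_score = (size - mean) / stdev
        if z_score > threshold:
            flagged.append((name, size, z_score))
    # Largest files first.
    return sorted(flagged, key=lambda row: row[1], reverse=True)
```

One design caveat: a single enormous file inflates the standard deviation itself, so with very few files the largest achievable z-score is small and a high threshold may flag nothing.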
Fuzzy Matching (Similarity Detection)
Find files that are similar but not identical:
# Find files with 70% or higher similarity
rclean ~/Documents --similarity 70
# Find similar Python files
rclean ~/code --pattern "*.py" --pattern-type glob --similarity 80
# Generate CSV report including similar files
rclean . --similarity 60 --csv similarity_report.csv
Use Cases:
- Different versions of documents (v1, v2, draft, final)
- Slightly modified code files
- Images with minor edits
- Reports with small updates
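As a rough picture of what a percentage similarity threshold means, the following Python sketch scores pairs of text documents with the standard library's `difflib.SequenceMatcher`. The fuzzy matching algorithm RClean uses is not specified in this README, so `SequenceMatcher` is a swapped-in technique, and the names `similarity_percent` and `similar_pairs` are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity_percent(a, b):
    """Similarity of two strings on a 0-100 scale."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def similar_pairs(documents, threshold=70):
    """Return (name_a, name_b, score) for similar but non-identical documents.

    `documents` maps a name to its text content.
    """
    pairs = []
    for (name_a, text_a), (name_b, text_b) in combinations(sorted(documents.items()), 2):
        if text_a == text_b:
            continue  # exact duplicates are a separate category
        score = similarity_percent(text_a, text_b)
        if score >= threshold:
            pairs.append((name_a, name_b, round(score, 1)))
    return pairs
```

Comparing every pair is quadratic in the number of files, which is acceptable for a sketch but would need pre-filtering (for example by size or extension) on large trees.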
Advanced Pattern Matching
RClean supports ripgrep-style pattern matching:
Pattern Types
- Literal (default): Simple substring matching
  rclean search --path . --pattern ".txt"
- Glob: Shell-style patterns
  rclean search --path . --pattern "*.txt" --pattern-type glob
  rclean search --path . --pattern "**/*.rs" --pattern-type glob
- Regex: Full regular expression support
  rclean search --path . --pattern "test_.*\.rs$" --pattern-type regex
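The difference between the three pattern types can be illustrated with a small Python sketch. It mirrors the semantics described above using the standard `fnmatch` and `re` modules. The function `matches` is hypothetical, and `fnmatch` globs differ from ripgrep-style globs in details (for example, `*` also matches `/` and `**` has no special meaning), so treat this as an approximation.

```python
import fnmatch
import re


def matches(path, pattern, pattern_type="literal"):
    """Return True if `path` matches `pattern` under the given pattern type."""
    if pattern_type == "literal":
        # Literal: the pattern only has to appear somewhere in the path.
        return pattern in path
    if pattern_type == "glob":
        # Glob: the whole path must match the shell-style pattern.
        return fnmatch.fnmatchcase(path, pattern)
    if pattern_type == "regex":
        # Regex: the pattern may match anywhere unless anchored.
        return re.search(pattern, path) is not None
    raise ValueError(f"unknown pattern type: {pattern_type}")
```

Note how the literal pattern `.txt` also matches `notes.txt.bak`, while the glob `*.txt` does not.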
Additional Options
- --hidden: Include hidden files
- --no-ignore: Ignore .gitignore rules
- --max-depth <N>: Maximum directory depth to traverse
MCP Server Mode
RClean can run as an MCP server for integration with AI assistants:
# Run as MCP server
rclean # Will auto-detect MCP mode when piped
Building and Development
Quality Standards ✅
All lint checks now pass! The project follows PMAT (Production Manufacturing and Assembly Technology) quality standards with zero tolerance for warnings.
# Build and test
make all
# Development commands
make format # Format code
make lint # Run clippy linting (FIXED - passes cleanly!)
make lint-extreme # Run extreme linting with PMAT standards
make test # Run all tests
make test-examples # Run example tests (NEW!)
# Build variants
make build-release # Release build for production
# Quality assurance
make quality-gate # Run all quality checks
make format-check # Verify formatting
Recent Improvements (v0.1.1)
- ✅ Fixed all clippy warnings - make lint now passes without errors
- ✅ Added example test coverage - New make test-examples target
- ✅ Improved code quality - Reduced function complexity with better abstractions
- ✅ Enhanced CI/CD readiness - All quality gates pass consistently
OS X Install
- Install Rust via rustup
- Add to ~/.cargo/config:
[target.x86_64-apple-darwin]
rustflags = [
"-C", "link-arg=-undefined",
"-C", "link-arg=dynamic_lookup",
]
[target.aarch64-apple-darwin]
rustflags = [
"-C", "link-arg=-undefined",
"-C", "link-arg=dynamic_lookup",
]
- Run make all in the rclean directory
License
MIT