Advanced Duplicate File Finder for Python. Nothing is impossible to solve.
Use deplicate to find out all the duplicated files in one or more directories, you can also scan a bunch of files directly.
deplicate is written in Pure Python and requires just a couple of dependencies to work fine, depending on your system.
- Optimized for speed
- N-tree parsing for low memory consumation
- Multi-threaded (partially)
- Raw drive data access to maximize I/O performances
- xxHash algorithm for fast file identification
- File size and signature checking for quick duplicate exclusion
- Extended file attributes scanning
- Multi-filtering
- Error handling
- Unicode decoding
- Safe from recursion loop
- SSD detection
- Multi-processing
- Fully documented
- PyPy support
-
Exif data scanning
Type in your command shell with administrator/root privileges:
pip install deplicate[full]
In Unix-based systems, this is generally achieved by superseding
the command sudo.
sudo pip install deplicate[full]
The option full ensures that all the optional packages will downloaded and
installed as well as the mandatory dependencies.
You can install just the main package typing:
pip install deplicate
If the above commands fail, consider installing it with the option
--user:
pip install --user deplicate
Import in your python script the new available module duplicate
and call its function find.
Note: By default directory paths are scanned recursively.
Note: By default files smaller than 100 MiB are not scanned.
Scan a single directory for duplicates:
import duplicate
duplicate.find('/path/to/dir')
Scan more directories together:
import duplicate
duplicate.find('/path/to/dir1', '/path/to/dir2', '/path/to/dir3')
Scan from iterable:
import duplicate
iterable = ['/path/to/dir1', '/path/to/dir2', '/path/to/dir3']
duplicate.find.from_iterable(iterable)
Scan ignoring the minimum file size threshold:
import duplicate
duplicate.find('/path/to/dir', minsize=0)
Resulting output will be always a list of lists of file paths, where each list collects together all the same files (aka. the duplicates):
[
['/path/to/dir1/file1', '/path/to/file1', '/path/to/dir2/subdir1/file1]',
['/path/to/dir2/file3', '/path/to/dir2/subdir1/file3']
]
Note: File paths are returned in canonical form.
Note: Lists of file paths are sorted in descending order by length.
Scan single files, not-recursively:
import duplicate
duplicate.find('/path/to/file1', '/path/to/file2', '/path/to/dir1',
recursive=False)
Note: In not-recursive mode, like the case above, directory paths are simply ignored.
Scan from iterable checking file names and hidden files:
import duplicate
iterable = ['/path/to/dir1', '/path/to/file1']
duplicate.find.from_iterable(iterable, comparename=True, scanhidden=True)
Scan excluding python files:
import duplicate
duplicate.find('/path/to/dir', exclude="*.py")
Scan including symbolic links of files:
import duplicate
duplicate.find('/path/to/file1', '/path/to/file2', '/path/to/file3',
scanlinks=True)
-
duplicate.DEFAULT_MINSIZE
- Description: Default minimum file size in Bytes.
- Value:
102400
-
duplicate.DEFAULT_SIGNSIZE
- Description: Default file signature size in Bytes.
- Value:
512
-
duplicate.MAX_BLKSIZES_LEN
- Description: Default maximum number of cached block size values.
- Value:
128
-
duplicate.clear_blkcache()
- Description: Clear the internal blksizes cache.
- Return: None.
- Parameters: None.
-
duplicate.find(
paths, minsize=None, include=None, exclude=None, comparename=False, comparemtime=False, compareperms=False, recursive=True, followlinks=False, scanlinks=False, scanempties=False, scansystems=True, scanarchived=True, scanhidden=True, signsize=None, onerror=None)- Description: Find duplicate files.
- Return: Nested lists of paths of duplicate files.
- Parameters:
paths– Iterable of directory or file paths.minsize– (optional) Minimum size of files to include in scanning (default toDEFAULT_MINSIZE).include– (optional) Wildcard pattern of files to include in scanning.exclude– (optional) Wildcard pattern of files to exclude from scanning.comparename– (optional) Check file name.comparemtime– (optional) Check file modification time.compareperms– (optional) Check file mode (permissions).recursive– (optional) Scan directory recursively.followlinks– (optional) Follow symbolic links pointing to directory.scanlinks– (optional) Scan symbolic links pointing to file.scanempties– (optional) Scan empty files.scansystems– (optional) Scan OS files.scanarchived– (optional) Scan archived files.scanhidden– (optional) Scan hidden files.signsize– (optional) Size of Bytes to read from file as signature (default toDEFAULT_SIGNSIZE).onerror– (optional) .

