GitHub - vuolter/deplicate at v0.6.0

Name	Name	Last commit message	Last commit date
Latest commit History 56 Commits
duplicate	duplicate
.gitignore	.gitignore
.scrutinizer.yml	.scrutinizer.yml
.travis.yml	.travis.yml
LICENSE	LICENSE
MANIFEST.in	MANIFEST.in
README.md	README.md
README.rst	README.rst
VERSION	VERSION
banner.png	banner.png
setup.cfg	setup.cfg
setup.py	setup.py
tox.ini	tox.ini

Advanced Duplicate File Finder for Python. Nothing is impossible to solve.

Status
Description
Features
Installation
Usage
- Quick Examples
- Advanced Examples
API Reference
- Properties
- Methods

Status

Description

Use deplicate to find out all the duplicated files in one or more directories, you can also scan a bunch of files directly.

deplicate is written in Pure Python and requires just a couple of dependencies to work fine, depending on your system.

Features

Installation

Type in your command shell with administrator/root privileges:

pip install deplicate[full]

In Unix-based systems, this is generally achieved by superseding the command sudo.

sudo pip install deplicate[full]

The option full ensures that all the optional packages will downloaded and installed as well as the mandatory dependencies.

You can install just the main package typing:

pip install deplicate

If the above commands fail, consider installing it with the option --user:

pip install --user deplicate

Usage

Import in your python script the new available module duplicate and call its function find.

Quick Examples

Note: By default directory paths are scanned recursively.

Note: By default files smaller than 100 MiB are not scanned.

Scan a single directory for duplicates:

import duplicate

duplicate.find('/path/to/dir')

Scan more directories together:

import duplicate

duplicate.find('/path/to/dir1', '/path/to/dir2', '/path/to/dir3')

Scan from iterable:

import duplicate

iterable = ['/path/to/dir1', '/path/to/dir2', '/path/to/dir3']

duplicate.find.from_iterable(iterable)

Scan ignoring the minimum file size threshold:

import duplicate

duplicate.find('/path/to/dir', minsize=0)

Resulting output will be always a list of lists of file paths, where each list collects together all the same files (aka. the duplicates):

[
    ['/path/to/dir1/file1', '/path/to/file1', '/path/to/dir2/subdir1/file1]',
    ['/path/to/dir2/file3', '/path/to/dir2/subdir1/file3']
]

Note: File paths are returned in canonical form.

Note: Lists of file paths are sorted in descending order by length.

Advanced Examples

Scan single files, not-recursively:

import duplicate

duplicate.find('/path/to/file1', '/path/to/file2', '/path/to/dir1',
               recursive=False)

Note: In not-recursive mode, like the case above, directory paths are simply ignored.

Scan from iterable checking file names and hidden files:

import duplicate

iterable = ['/path/to/dir1', '/path/to/file1']

duplicate.find.from_iterable(iterable, comparename=True, scanhidden=True)

Scan excluding python files:

import duplicate

duplicate.find('/path/to/dir', exclude="*.py")

Scan including symbolic links of files:

import duplicate

duplicate.find('/path/to/file1', '/path/to/file2', '/path/to/file3',
               scanlinks=True)

API Reference

Properties

duplicate.DEFAULT_MINSIZE
- Description: Default minimum file size in Bytes.
- Value: 102400
duplicate.DEFAULT_SIGNSIZE
- Description: Default file signature size in Bytes.
- Value: 512
duplicate.MAX_BLKSIZES_LEN
- Description: Default maximum number of cached block size values.
- Value: 128

Methods

duplicate.clear_blkcache()
- Description: Clear the internal blksizes cache.
- Return: None.
- Parameters: None.
duplicate.find(paths, minsize=None, include=None, exclude=None, comparename=False, comparemtime=False, compareperms=False, recursive=True, followlinks=False, scanlinks=False, scanempties=False, scansystems=True, scanarchived=True, scanhidden=True, signsize=None, onerror=None)
- Description: Find duplicate files.
- Return: Nested lists of paths of duplicate files.
- Parameters:
  - paths – Iterable of directory or file paths.
  - minsize – (optional) Minimum size of files to include in scanning (default to DEFAULT_MINSIZE).
  - include – (optional) Wildcard pattern of files to include in scanning.
  - exclude – (optional) Wildcard pattern of files to exclude from scanning.
  - comparename – (optional) Check file name.
  - comparemtime – (optional) Check file modification time.
  - compareperms – (optional) Check file mode (permissions).
  - recursive – (optional) Scan directory recursively.
  - followlinks – (optional) Follow symbolic links pointing to directory.
  - scanlinks – (optional) Scan symbolic links pointing to file.
  - scanempties – (optional) Scan empty files.
  - scansystems – (optional) Scan OS files.
  - scanarchived – (optional) Scan archived files.
  - scanhidden – (optional) Scan hidden files.
  - signsize – (optional) Size of Bytes to read from file as signature (default to DEFAULT_SIGNSIZE).
  - onerror – (optional) .

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Table of contents

Status

Description

Features

Installation

Usage

Quick Examples

Advanced Examples

API Reference

Properties

Methods

© 2017 Walter Purcaro [email protected]

About

Uh oh!

Releases 2

Uh oh!

Languages

License

vuolter/deplicate

Folders and files

Latest commit

History

Repository files navigation

Table of contents

Status

Description

Features

Installation

Usage

Quick Examples

Advanced Examples

API Reference

Properties

Methods

© 2017 Walter Purcaro [email protected]

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 2

Uh oh!

Languages