A simple yet powerful program that finds duplicate files by comparing their hashes on your Windows, macOS or Linux computer
-
Import the os, hashlib, argparse and time modules
-
Create a global dictionary of hashed files, a global dictionary of duplicate files, a global ignore list of directories (can add to this list if needed) and a timer
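A minimal sketch of this module-level state, assuming the names used in the steps below (the ignore list entries are placeholders, not the program's actual defaults):

```python
import os
import time
import hashlib
import argparse

hashed_files_dict = {}      # complete_file_path -> SHA-1 hash, first occurrence of each hash
duplicate_files_dict = {}   # complete_file_path -> SHA-1 hash, files whose hash was seen before
ignore_directories_list = ['proc', 'sys']   # placeholder entries; extend as needed
start_time = time.process_time()            # timer, used in the timing step near the end
```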
-
Create a function called compute_hash that computes and returns the hash value for a given file by opening the file in binary mode, reading it, and computing its SHA-1 hash. If the file cannot be opened, return False
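A sketch of compute_hash under those rules; reading in fixed-size chunks is an assumption to keep memory use flat on large files:

```python
def compute_hash(file_path):
    """Return the SHA-1 hex digest of file_path, or False if the file cannot be opened."""
    sha1 = hashlib.sha1()
    try:
        with open(file_path, 'rb') as binary_file:
            # Read in 1 MB chunks so large files do not need to fit in memory.
            for chunk in iter(lambda: binary_file.read(1024 * 1024), b''):
                sha1.update(chunk)
    except OSError:
        return False
    return sha1.hexdigest()
```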
-
Verify whether the path contains a directory that is in the ignore_directories_list
-
Splitting the path on os.sep automatically splits at \ for Windows and / for macOS and Linux
-
Use the .lower() method to convert all directory names in the path to lowercase, and remove the empty items ('') produced by the leading and trailing separators of a path (\ or /), as sketched below
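A sketch of that check as a helper; the function name path_in_ignore_list is hypothetical (the steps above do not name it):

```python
def path_in_ignore_list(complete_file_path):
    """Return True if any directory in the path is in ignore_directories_list."""
    # os.sep is '\\' on Windows and '/' on macOS/Linux, so this split is portable.
    directories = [d.lower() for d in complete_file_path.split(os.sep) if d != '']
    return any(d in ignore_directories_list for d in directories)
```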
-
Create a function called file_duplicates, which takes three parameters: starting_directory_name, duplicate_files_txt, and all_files_csv
-
Create a file counter, and create a separate text file (duplicate_files_txt) that will store all of the duplicate files
-
Get the complete path of each file by looping over the walk function in the os module, with starting_directory_name as the parameter, then joining the root and the file name
-
Increment the counter and get the file size for each file
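A sketch of the opening of file_duplicates showing the walk, the counter, and the per-file size lookup; it reuses the path_in_ignore_list helper sketched earlier:

```python
def file_duplicates(starting_directory_name, duplicate_files_txt, all_files_csv=None):
    file_counter = 0
    # os.walk yields (root, directories, files) for every directory under the start point.
    for root, directories, files in os.walk(starting_directory_name):
        for file_name in files:
            complete_file_path = os.path.join(root, file_name)
            if path_in_ignore_list(complete_file_path):
                continue  # the path contains an ignored directory
            file_counter += 1
            # The steps above only say the size is processed; its exact use is not specified.
            file_size = os.path.getsize(complete_file_path)
            # ... hashing and duplicate detection continue in the steps below
```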
-
Find the hash of complete_file_path with compute_hash, and if it returns False (the file could not be opened), skip the file
-
If the hashed_file_path (value) is not already among the values of hashed_files_dict, add the complete_file_path (key) and the hashed_file_path (value) to hashed_files_dict; otherwise the file is a duplicate, so add that pair to duplicate_files_dict
-
If a hashed_file_path (value) appears more than once, assign each earlier complete_file_path (key) that shares it to matching_file
-
If the matching_file key already exists in duplicate_files_dict, skip it
-
Otherwise, add matching_file (key) with hashed_file_path (value) to duplicate_files_dict
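A sketch of that classification, pulled into a hypothetical helper record_file so it can be read on its own (the real code likely sits inline in the walk loop):

```python
def record_file(complete_file_path):
    """Classify one file as first-seen or duplicate, per the four steps above."""
    hashed_file_path = compute_hash(complete_file_path)
    if not hashed_file_path:
        return  # the file could not be opened, so skip it

    if hashed_file_path not in hashed_files_dict.values():
        # First occurrence of this hash: record it as a hashed (unique) file.
        hashed_files_dict[complete_file_path] = hashed_file_path
    else:
        # The hash was seen before, so this file is a duplicate.
        duplicate_files_dict[complete_file_path] = hashed_file_path
        # Also record the earlier file(s) sharing this hash, skipping any
        # matching_file that is already in duplicate_files_dict.
        for matching_file, file_hash in hashed_files_dict.items():
            if file_hash == hashed_file_path and matching_file not in duplicate_files_dict:
                duplicate_files_dict[matching_file] = hashed_file_path
```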
-
Write the duplicate files to the text file using the keys (complete_file_paths) from duplicate_files_dict
-
Optionally create a csv file and write all of the processed files to it (both hashed files and duplicate files)
-
Merge the duplicate_files_dict and hashed_files_dict dictionaries
-
Return the unique dictionary keys (complete_file_paths) of the duplicate files, as sketched below
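A sketch of these output steps; the helper name write_results and the csv column headers are assumptions:

```python
import csv

def write_results(duplicate_files_txt, all_files_csv=None):
    # One duplicate file path per line, taken from the dictionary keys.
    with open(duplicate_files_txt, 'w') as txt_file:
        for complete_file_path in duplicate_files_dict:
            txt_file.write(complete_file_path + '\n')

    # Merging the dictionaries yields every processed file; overlapping
    # keys carry the same hash in both, so the merge loses nothing.
    all_files = {**hashed_files_dict, **duplicate_files_dict}

    if all_files_csv:
        with open(all_files_csv, 'w', newline='') as csv_file:
            writer = csv.writer(csv_file)
            writer.writerow(['file_path', 'sha1_hash'])  # assumed headers
            for path, file_hash in all_files.items():
                writer.writerow([path, file_hash])

    # The keys of duplicate_files_dict are the unique duplicate file paths.
    return list(duplicate_files_dict.keys())
```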
-
Create parsed arguments for four parameters: the starting directory, the duplicates text file, an optional file for all processed files, and an optional parameter for user-defined ignored directories
-
If the ignore_directories argument was provided by the user, parse and append those ignored directories to the ignore_directories_list
-
Allow users to separate additional ignored directories with ',' (sketched below)
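A sketch of the argument parsing; the flag names come from the notes at the bottom, while the help strings are assumptions:

```python
parser = argparse.ArgumentParser(description='Search for duplicate files by hash.')
parser.add_argument('--directory', required=True,
                    help='starting directory for the search')
parser.add_argument('--out_duplicate_files', required=True,
                    help='text file to write duplicate file paths to')
parser.add_argument('--out_processed_files',
                    help='optional csv file listing every processed file')
parser.add_argument('--ignore_directories',
                    help='optional comma-separated directories to ignore')
args = parser.parse_args()

if args.ignore_directories:
    # Users separate additional ignored directories with ','.
    for directory in args.ignore_directories.split(','):
        ignore_directories_list.append(directory.strip().lower())
```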
-
Call the file_duplicates function with the starting directory name, the text file for duplicates, and the optional csv file for all of the files
-
Measure how long the program takes to run from start to finish using processor time, not wall-clock time
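A sketch of the call and the timing; time.process_time() measures processor time rather than wall-clock time, which matches the step above:

```python
# start_time was captured at module load (see the globals sketch near the top).
file_duplicates(args.directory, args.out_duplicate_files, args.out_processed_files)
elapsed_seconds = time.process_time() - start_time
print(f'Processing time: {elapsed_seconds:.2f} seconds')
```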
-
Run the program with sudo to run it as the super-user (granting full administrator privileges), then enter your computer password when prompted
Notes:
*Must be running Python 3.6 or above to run this program
*Can be run on Windows, macOS and Linux
*If running via sudo, must use python3, unless an alias is created using the command: alias sudo='sudo '
*Run the program by providing a starting directory (--directory), an output file for duplicates (--out_duplicate_files), an optional output file for all of the files (--out_processed_files), and an optional parameter for user-defined ignored directories (--ignore_directories), for example:
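```
sudo python3 duplicate_file_finder.py --directory /Users --out_duplicate_files duplicates.txt --out_processed_files all_files.csv --ignore_directories "Library,Applications"
```
(The script name duplicate_file_finder.py and the directory values are placeholders.)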
*This has been thoroughly tested on all of the operating systems listed above. The largest test was on a Windows machine whose directory tree contained a 6 GB file; the total processing time for roughly 100,000 files was five minutes