Commit c550793

Added methodology, size mention, fixed typos
1 parent 739b4b3 commit c550793

1 file changed: +22 -11 lines changed

README.md

Lines changed: 22 additions & 11 deletions
@@ -1,9 +1,13 @@
1+
### To Potential Cloners and Master-Zip Downloaders
2+
This is a large repository. As of now, there are very few links and no torrents to the files contained within.
3+
The current repo size is 24GB - be aware!
14

25
![Probable Wordlists Logo](https://raw.githubusercontent.com/berzerk0/Probable-Wordlists/master/ProbableWordlistLogo.png)
36

7+
48
### Rev 1.1 Task List
5-
* [ ] Explain methodology
6-
* [ ] Make Quickfix for duplicates caused by newline and blankspace characters (not a full fix)
9+
* [x] Explain methodology
10+
* [ ] Make Quick-fix for duplicates caused by newline and blankspace characters (not a full fix)
711
* [ ] Upload WPA-Chunks
812
* [ ] Super-Zip the smaller files
913
* [ ] Make some torrents for the big files (didn't expect it to catch on!)
@@ -15,29 +19,34 @@
1519

1620
### Laser-Guided Wordlist Generator in the works as well.
1721

22+
There are some great wordlists out there, but I decided to amalgamate, edit, trim and create a few of my own.
23+
24+
I did not steal, phish, deceive or hack in any way to get hold of these passwords.
25+
All lines in these files were obtained through freely available means.
1826

1927
# Probable Wordlists
20-
Wordlists sorted by populatirty originally created for password generation and testing
28+
Wordlists sorted by popularity originally created for password generation and testing
2129

22-
### Why slog through an encylopedic, alphabetized wordlist when you can start with the words people are most likely to use?
30+
### Why slog through an encyclopedic, alphabetized wordlist when you can start with the words people are most likely to use?
31+
#### Methodology - The Why and How
2332

24-
There are some great wordlists out there, but I decided to amalgamate, edit, trim and create a few of my own.
33+
While I was able to locate a few Password Wordlists that were sorted by popularity, the vast majority of lists, especially the larger lists, were sorted alphabetically. This seems like a major practicality flaw! If we assume that the most common password is "password," and we are performing a dictionary attack using an English dictionary, we are going to have to slog from "aardvark" through "passover" to get to "password." Now I don't know off the top of my head just how common "aardvark" is as a password - but we could be wasting a lot of time by not starting with the most common password on our list!
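To make the cost of alphabetical ordering concrete, here is a minimal sketch that counts how many candidates a dictionary attack tries before it reaches a target. The two toy lists and the helper name `guesses_needed` are hypothetical and are not taken from the repository.

```python
def guesses_needed(wordlist, target):
    """Return the 1-based position of target, i.e. how many guesses are spent reaching it."""
    return wordlist.index(target) + 1


# Hypothetical toy data: the same six words in two different orders.
alphabetical = ["aardvark", "apple", "monkey", "passover", "password", "zebra"]
by_popularity = ["password", "monkey", "apple", "zebra", "passover", "aardvark"]

print(guesses_needed(alphabetical, "password"))   # position in the alphabetized list
print(guesses_needed(by_popularity, "password"))  # position in the popularity-sorted list
```

With real lists of millions of entries, the gap between the two positions grows accordingly.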
2534

26-
I couldn't find many lists that weren't sorted alphabetically - it seems like a dictionary attack should reflect actual human behavior as opposed to the alphabet.
35+
I went to SecLists, Weakpass, and Hashes.org and downloaded nearly every single Wordlist containing real passwords I could find. These lists were huge, and I ended up with over 80 GB of actual, human-generated and used passwords. These were split up among over 350 files of varying length, sorting scheme, character encoding, origin and other properties. I sorted these files, removed duplicates from within the files themselves, and prepared to join them all together.
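The per-file cleanup described above (sort, then drop duplicates inside each file) could look roughly like the following. This is a Python sketch, not the commands actually used; the function name `unique_sorted_lines` is hypothetical, and discarding empty lines is an assumption made here for illustration.

```python
def unique_sorted_lines(lines):
    """Strip line endings, drop duplicates and empty entries, return the rest sorted."""
    # Removing "\r" and "\n" first means "abc\n" and "abc\r\n" count as the same entry.
    cleaned = {line.rstrip("\r\n") for line in lines}
    # Assumption: an empty line is not a meaningful password entry.
    cleaned.discard("")
    return sorted(cleaned)


sample = ["b\n", "a\n", "b\r\n", "\n", "a"]
print(unique_sorted_lines(sample))
```

The result is comparable to running `sort -u` on a file whose line endings have already been normalized.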
2736

28-
I did not steal, phish, deceive or hack in any way to get hold of these passwords.
29-
All lines in these files were obtained through freely available means.
37+
Some of these lists were composed of the other lists, and some were exact duplicates. I took care to remove any exact duplicate files - we didn't need to have any avoidable false positives. __*If a password was found across multiple files, I considered this to be an approximation of its popularity.*__ If an entry was found in 5 files, it wasn't too popular. If an entry could be found in 300 files, it was very popular. Using Unix commands, I concatenated all the files into one giant file representing keys to over 4 billion secret areas on the web, and sorted them by number of appearances in the single file. From this, I was able to create a large wordlist sorted by popularity, not the alphabet.
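The popularity approximation above boils down to counting, for each entry, how many source files contain it, then sorting by that count. The README says this was done with Unix commands; the sketch below is a Python stand-in for illustration only. The name `rank_by_file_count` and the sample data are hypothetical, and the alphabetical tie-break is an assumption added to keep the output deterministic.

```python
from collections import Counter


def rank_by_file_count(files):
    """Order entries by the number of files they appear in, most common first."""
    counts = Counter()
    for entries in files:
        # set() mirrors the per-file de-duplication: each file contributes at most 1 per entry.
        counts.update(set(entries))
    # Sort by descending count; ties are broken alphabetically (an assumption).
    return sorted(counts, key=lambda entry: (-counts[entry], entry))


# Hypothetical sample: three small "files" held in memory.
files = [
    ["password", "monkey"],
    ["password", "password", "dragon"],
    ["monkey", "password"],
]
print(rank_by_file_count(files))
```

Holding every entry in memory would not scale to billions of lines, which is one reason a disk-backed sort is a more realistic choice at that size.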
3038

3139

3240
## Real-Passwords
33-
These are **REAL** passwords. Every once in a while, a popular site has a high-profile security leak and passwords are released freely across the internet.
41+
These are **REAL** passwords.
42+
Every once in a while, a popular site has a high-profile security leak and passwords are released freely across the internet.
3443
Some of these passwords can be found on aggregator sites where they are separated from usernames to protect the unfortunate victim.
3544

3645
The files in this folder come from https://github.com/danielmiessler/SecLists, https://weakpass.com/ and https://hashes.org/
37-
I've taken nearly all of the real passwords from these sites, combined them into a single list, removed duplicates and sorted by popularity.
3846

3947
*NOTE THAT FOR THIS FIRST VERSION, ALL NON-ASCII CHARACTERS HAVE BEEN REMOVED*
4048

49+
4150
A more inclusive, and thus, more accurate list is in the works.
4251

4352
Lists sorted by popularity will include "probable" in the filename
@@ -54,6 +63,8 @@ Wordlists including dictionaries, encyclopedic lists and miscellaneous.
5463
+ By using these lists, you agree to not hold the author responsible for your actions.
5564
+ By using these lists, you agree to these terms and are completely culpable for your own behavior.
5665

57-
+ The files contained in this repository are released "as is" without warranty, support, or guarantee of effectiveness. However, I am open to hearing about any issues found within these files and will be actively maintaining this repository for the forseeable future. If you find anything noteworthy, let me know and I'll see what I can do about it.
66+
+ The files contained in this repository are released "as is" without warranty, support, or guarantee of effectiveness. However, I am open to hearing about any issues found within these files and will be actively maintaining this repository for the foreseeable future. If you find anything noteworthy, let me know and I'll see what I can do about it.
5867

5968
+ This is released without license, but also without intent for commercial use.
69+
70+
