2. forensics - 22. bulk-extractor
bulk_extractor is a program that extracts features such as email addresses, credit card numbers, URLs, and other types of information from digital evidence
files. It is a useful forensic investigation tool for many tasks, such as malware and intrusion investigations, identity investigations and cyber investigations, as
well as analyzing imagery and password cracking. The program provides several unusual capabilities, including:
It finds email addresses, URLs and credit card numbers that other tools miss because it can process compressed data (like ZIP, PDF and GZIP files) and
incomplete or partially corrupted data. It can carve JPEGs, office documents and other kinds of files out of fragments of compressed data. It will detect
and carve encrypted RAR files.
It builds word lists based on all of the words found within the data, even those in compressed files that are in unallocated space. Those word lists can be
useful for password cracking.
It is multi-threaded; running bulk_extractor on a computer with twice the number of cores typically makes it complete a run in half the time.
It creates histograms showing the most common email addresses, URLs, domains, search terms and other kinds of information on the drive.
bulk_extractor operates on disk images, files or a directory of files and extracts useful information without parsing the file system or file system structures. The
input is split into pages and processed by one or more scanners. The results are stored in feature files that can be easily inspected, parsed, or processed with
other automated tools.
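Because feature files are plain text, they are straightforward to read from a script. The following is a minimal sketch, assuming the tab-separated layout described in the bulk_extractor manual (offset, feature, optional context, with comment lines starting with "#"). The names Feature and parse_feature_lines and the sample lines are illustrative and are not part of bulk_extractor.

```python
from typing import Iterable, Iterator, NamedTuple


class Feature(NamedTuple):
    offset: str   # kept as text: offsets can be paths such as "2048-GZIP-17"
    feature: str
    context: str


def parse_feature_lines(lines: Iterable[str]) -> Iterator[Feature]:
    """Yield Feature records from the lines of a feature file."""
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and the "#" comment header.
        if not line or line.startswith("#"):
            continue
        parts = line.split("\t", 2)
        if len(parts) < 2:
            continue  # not a well-formed feature line
        context = parts[2] if len(parts) == 3 else ""
        yield Feature(parts[0], parts[1], context)


# Hypothetical sample lines, made up for illustration.
sample = [
    "# hypothetical header comment\n",
    "1024\talice@example.com\tmail from alice@example.com to\n",
    "2048-GZIP-17\tbob@example.org\n",
]
features = list(parse_feature_lines(sample))
```

In practice the lines would come from an open file object for one of the feature files in the output directory rather than from an in-memory list.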
bulk_extractor also creates histograms of features that it finds. This is useful because features such as email addresses and internet search terms that are
more common tend to be important.
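The idea behind these histograms can be sketched in a few lines. This is only an illustration of frequency counting over extracted feature values, not bulk_extractor's own implementation; the function name feature_histogram and the sample addresses are assumptions made for the example.

```python
from collections import Counter


def feature_histogram(features, top=10):
    """Count how often each feature value occurs, most common first."""
    return Counter(features).most_common(top)


# Hypothetical feature values, made up for illustration.
emails = ["alice@example.com", "bob@example.org", "alice@example.com"]
histogram = feature_histogram(emails)
for value, count in histogram:
    print(f"n={count}\t{value}")
```

Sorting by count puts the addresses or search terms that occur most often at the top, which is where an investigator would usually look first.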
In addition to the capabilities described above, bulk_extractor also includes:
A graphical user interface, Bulk Extractor Viewer, for browsing features stored in feature files and for launching bulk_extractor scans
A small number of Python programs for performing additional analysis on feature files
Source: http://digitalcorpora.org/downloads/bulk_extractor/BEUsersManual.pdf
License: GPLv2
root@kali:~# bulk_extractor
bulk_extractor version 1.6.0-dev
Usage: bulk_extractor [options] imagefile
runs bulk extractor and outputs to stdout a summary of what was found where
Required parameters:
imagefile - the file to extract
or -R filedir - recurse through a directory of files
HAS SUPPORT FOR E01 FILES
HAS SUPPORT FOR AFF FILES
-o outdir - specifies output directory. Must not exist.
bulk_extractor creates this directory.
Options:
-i - INFO mode. Do a quick random sample and print a report.
-b banner.txt- Add banner.txt contents to the top of every output file.
-r alert_list.txt - a file containing the alert list of features to alert
(can be a feature file or a list of globs)
(can be repeated.)
-w stop_list.txt - a file containing the stop list of features (white list
(can be a feature file or a list of globs)s
(can be repeated.)
-F <rfile> - Read a list of regular expressions from <rfile> to find
-f <regex> - find occurrences of <regex>; may be repeated.
results go into find.txt
-q nn - Quiet Rate; only print every nn status reports. Default 0; -1 for no status at all
-s frac[:passes] - Set random sampling parameters
Tuning parameters:
-C NN - specifies the size of the context window (default 16)
-S fr:<name>:window=NN specifies context window for recorder to NN
-S fr:<name>:window_before=NN specifies context window before to NN for recorder
-S fr:<name>:window_after=NN specifies context window after to NN for recorder
-G NN - specify the page size (default 16777216)
-g NN - specify margin (default 4194304)
-j NN - Number of analysis threads to run (default 4)
-M nn - sets max recursion depth (default 7)
-m <max> - maximum number of minutes to wait after all data read
default is 60
Parallelizing:
-Y <o1> - Start processing at o1 (o1 may be 1, 1K, 1M or 1G)
-Y <o1>-<o2> - Process o1-o2
-A <off> - Add <off> to all reported feature offsets
Debugging:
-h - print this message
-H - print detailed info on the scanners
-V - print version number
-z nn - start on page nn
-dN - debug mode (see source code)
-Z - zap (erase) output directory
Control of Scanners:
-P <dir> - Specifies a plugin directory
Default dirs include /usr/local/lib/bulk_extractor /usr/lib/bulk_extractor and
BE_PATH environment variable
-e <scanner> enables <scanner> -- -e all enables all
-x <scanner> disable <scanner> -- -x all disables all
-E <scanner> - turn off all scanners except <scanner>
(Same as -x all -e <scanner>)
note: -e, -x and -E commands are executed in order
e.g.: '-E gzip -e facebook' runs only gzip and facebook
-S name=value - sets a bulk extractor option name to be value
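The help text notes that -e, -x and -E are executed in order. The sketch below models that rule to show why '-E gzip -e facebook' leaves only gzip and facebook enabled. It is a simplified model based solely on the help text above, not bulk_extractor's actual option parser; the function resolve_scanners, the scanner list and the default set are hypothetical.

```python
def resolve_scanners(available, defaults, options):
    """Apply (flag, name) pairs left to right and return the enabled set."""
    all_scanners = set(available)
    enabled = set(defaults)
    for flag, name in options:
        if flag == "-e":      # enable one scanner, or all of them
            enabled |= all_scanners if name == "all" else {name}
        elif flag == "-x":    # disable one scanner, or all of them
            enabled -= all_scanners if name == "all" else {name}
        elif flag == "-E":    # same as "-x all -e <scanner>"
            enabled = {name}
        else:
            raise ValueError(f"unknown flag: {flag}")
    return enabled


# Hypothetical scanner list and defaults, chosen only for illustration.
available = {"email", "gzip", "zip", "facebook"}
defaults = {"email", "gzip", "zip"}

only_two = resolve_scanners(available, defaults,
                            [("-E", "gzip"), ("-e", "facebook")])
reordered = resolve_scanners(available, defaults,
                             [("-e", "facebook"), ("-E", "gzip")])
```

Swapping the order changes the result: in the second call -E runs last and discards the earlier -e, so only gzip remains enabled.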
Extract files to the output directory (-o bulk-out) after analyzing the image file (xp-laptop-2005-07-04-1430.img):