orangain/content_extraction

content_extraction

Comparison of libraries to extract content from HTML

Libraries to Compare

Usage

Setup

$ ./download.sh

This downloads the HTML files into the html directory.

For Python 2

$ pip install -r requirements.txt
$ python extract.py

This writes the extracted content to the content_* directories.

For Python 3

$ pip install -r requirements.py3.txt
$ python extract_py3.py

This writes the extracted content to the py3_content_* directories.
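
The workflow above (read each file from the html directory, run an extractor on it, write the result to a per-extractor output directory) can be sketched roughly as follows. This is an illustration only, not the contents of extract.py or extract_py3.py: the names `naive_text` and `extract_all`, the `*.html` glob, the `.txt` output suffix, and the one-directory-per-extractor layout are all assumptions, and `naive_text` is a standard-library stand-in rather than one of the compared libraries.

```python
from html.parser import HTMLParser
from pathlib import Path


class _TextCollector(HTMLParser):
    """Collects text nodes, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def naive_text(html):
    """Hypothetical stand-in extractor; not one of the compared libraries."""
    parser = _TextCollector()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def extract_all(html_dir, out_root, extractors, prefix="content_"):
    """Run every extractor on every *.html file; return the paths written.

    Output goes to <out_root>/<prefix><extractor name>/<file stem>.txt
    (an assumed layout, chosen to mirror the content_* naming).
    """
    written = []
    for name, extract in extractors.items():
        out_dir = Path(out_root) / (prefix + name)
        out_dir.mkdir(parents=True, exist_ok=True)
        for src in sorted(Path(html_dir).glob("*.html")):
            dest = out_dir / (src.stem + ".txt")
            text = extract(src.read_text(encoding="utf-8"))
            dest.write_text(text, encoding="utf-8")
            written.append(dest)
    return written


if __name__ == "__main__":
    # Swap in real library wrappers here; each maps an HTML string to text.
    for path in extract_all("html", ".", {"naive": naive_text}):
        print(path)
```

Passing `prefix="py3_content_"` would mirror the Python 3 output naming. Keeping each extractor behind the same string-to-string interface is one way to make the per-library outputs directly comparable.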
