Commit be72501

Updates for publishing.
1 parent 344ba9e commit be72501

5 files changed: 72 additions, 82 deletions.

Makefile

+1 -1
@@ -4,7 +4,7 @@ PY := .venv/bin/python
 PIP := .venv/bin/pip
 PEP8 := .venv/bin/pep8
 NOSE := .venv/bin/nosetests
-TWINE := twine
+TWINE := .venv/bin/twine
 
 # ###########
 # Tests rule!

README.md

+67
@@ -0,0 +1,67 @@
+[![PyPI version](https://img.shields.io/pypi/v/readability-lxml.svg)](https://pypi.python.org/pypi/readability-lxml)
+
+# python-readability
+
+Given an HTML document, extract and clean up the main body text and title.
+
+This is a Python port of a Ruby port of [arc90's Readability project](https://web.archive.org/web/20130519040221/http://www.readability.com/).
+
+## Installation
+
+It's easy using `pip`, just run:
+
+```bash
+$ pip install readability-lxml
+```
+
+As an alternative, you may also use conda to install, just run:
+
+```bash
+$ conda install -c conda-forge readability-lxml
+```
+
+## Usage
+
+```python
+>>> import requests
+>>> from readability import Document
+
+>>> response = requests.get('http://example.com')
+>>> doc = Document(response.content)
+>>> doc.title()
+'Example Domain'
+
+>>> doc.summary()
+"""<html><body><div><body id="readabilityBody">\n<div>\n <h1>Example Domain</h1>\n
+<p>This domain is established to be used for illustrative examples in documents. You may
+use this\n domain in examples without prior coordination or asking for permission.</p>
+\n <p><a href="http://www.iana.org/domains/example">More information...</a></p>\n</div>
+\n</body>\n</div></body></html>"""
+```
+
+## Change Log
+- 0.8.4 Better CJK support, thanks @cdhigh
+- 0.8.3.1 Support for python 3.8 - 3.13
+- 0.8.3 We can now save all images via keep_all_images=True (default is to save 1 main image), thanks @botlabsDev
+- 0.8.2 Added article author(s) (thanks @mattblaha)
+- 0.8.1 Fixed processing of non-ascii HTMLs via regexps.
+- 0.8 Replaced XHTML output with HTML5 output in summary() call.
+- 0.7.1 Support for Python 3.7 . Fixed a slowdown when processing documents with lots of spaces.
+- 0.7 Improved HTML5 tags handling. Fixed stripping unwanted HTML nodes (only first matching node was removed before).
+- 0.6 Finally a release which supports Python versions 2.6, 2.7, 3.3 - 3.6
+- 0.5 Preparing a release to support Python versions 2.6, 2.7, 3.3 and 3.4
+- 0.4 Added Videos loading and allowed more images per paragraph
+- 0.3 Added Document.encoding, positive\_keywords and negative\_keywords
+
+## Licensing
+
+This code is under [the Apache License 2.0](http://www.apache.org/licenses/LICENSE-2.0) license.
+
+## Thanks to
+
+- Latest [readability.js](https://github.com/MHordecki/readability-redux/blob/master/readability/readability.js)
+- Ruby port by starrhorne and iterationlabs
+- [Python port](https://github.com/gfxmonk/python-readability) by gfxmonk
+- [Decruft effort](https://web.archive.org/web/20110214150709/https://www.minvolai.com/blog/decruft-arc90s-readability-in-python/) to move to lxml
+- "BR to P" fix from readability.js which improves quality for smaller texts
+- Github users contributions.

README.rst

-78
This file was deleted.

requirements-dev.txt

+2 -1
@@ -1 +1,2 @@
-nose
+nose
+twine

setup.py

+2 -2
@@ -37,8 +37,8 @@ def find_version(*file_paths):
 author_email="[email protected]",
 description="fast html to text parser (article readability tool) with python 3 support",
 test_suite="tests.test_article_only",
-long_description=open("README.rst").read(),
-long_description_content_type='text/x-rst',
+long_description=open("README.md").read(),
+long_description_content_type="text/markdown",
 license="Apache License 2.0",
 url="http://github.com/buriy/python-readability",
 packages=["readability"],
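
Note on the setup.py hunk: the README is read with a bare `open(...)`, so decoding depends on the platform's default encoding, and the content type is kept in sync with the file name by hand. A minimal sketch of a more defensive variant follows. The helper name `read_long_description` and the `CONTENT_TYPES` mapping are hypothetical and not part of this commit; the three content types are the ones PyPI accepts for long descriptions.

```python
import os

# Hypothetical mapping (not in the commit): README extension -> content type.
CONTENT_TYPES = {
    ".md": "text/markdown",
    ".rst": "text/x-rst",
    ".txt": "text/plain",
}


def read_long_description(path):
    """Return (text, content_type) for a README file decoded as UTF-8."""
    ext = os.path.splitext(path)[1].lower()
    # Unknown extensions fall back to plain text.
    content_type = CONTENT_TYPES.get(ext, "text/plain")
    # Explicit encoding avoids locale-dependent decoding; the context
    # manager closes the file handle deterministically.
    with open(path, encoding="utf-8") as handle:
        return handle.read(), content_type
```

In setup.py this could be used as `long_description, content_type = read_long_description("README.md")`, with both values then passed to `setup(...)`. Whether the project wants this extra indirection is a design choice; the one-line form in the commit works as long as the build environment defaults to UTF-8.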
