Python Tesseract
================

Python-tesseract is an optical character recognition (OCR) tool for Python.
That is, it will recognize and "read" the text embedded in images.

Python-tesseract is a wrapper for `Google's Tesseract-OCR Engine`_. It is also useful as a
stand-alone invocation script to tesseract, as it can read all image types
supported by the Python Imaging Library, including jpeg, png, gif, bmp, tiff,
and others, whereas tesseract-ocr by default only supports tiff and bmp.
Additionally, if used as a script, Python-tesseract will print the recognized
text instead of writing it to a file. Support for confidence estimates and
bounding box data is planned for future releases.

.. _Google's Tesseract-OCR Engine: https://github.com/tesseract-ocr/tesseract

USAGE
-----
::

    # Old standalone PIL installs expose a top-level Image module;
    # Pillow provides it as PIL.Image.
    try:
        import Image
    except ImportError:
        from PIL import Image
    import pytesseract

    print(pytesseract.image_to_string(Image.open('test.png')))
    # lang selects the Tesseract language data to use (here: French).
    print(pytesseract.image_to_string(Image.open('test-european.jpg'), lang='fra'))
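
As a further illustration of the calls above, the following sketch applies
``image_to_string`` to every image file in a directory. The names
``default_ocr``, ``ocr_directory``, ``IMAGE_EXTENSIONS`` and the ``ocr``
parameter are assumptions made for this example and are not part of
pytesseract; only ``image_to_string`` and its ``lang`` argument come from the
usage shown above.

```python
import os


def default_ocr(path, lang=None):
    """OCR a single image file with pytesseract (imported lazily)."""
    try:
        import Image  # legacy standalone PIL
    except ImportError:
        from PIL import Image  # Pillow
    import pytesseract

    image = Image.open(path)
    if lang is None:
        return pytesseract.image_to_string(image)
    return pytesseract.image_to_string(image, lang=lang)


# Illustrative extension list (an assumption); PIL can open other formats too.
IMAGE_EXTENSIONS = ('.png', '.jpg', '.jpeg', '.gif', '.bmp', '.tif', '.tiff')


def ocr_directory(directory, lang=None, ocr=default_ocr):
    """Return {filename: recognized text} for each image file in directory."""
    results = {}
    for name in sorted(os.listdir(directory)):
        if name.lower().endswith(IMAGE_EXTENSIONS):
            results[name] = ocr(os.path.join(directory, name), lang=lang)
    return results
```

Calling ``ocr_directory('scans', lang='fra')`` on a hypothetical ``scans``
folder would return one entry per image file. Passing a different callable as
``ocr`` makes it possible to exercise the directory walk without Tesseract
installed.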

INSTALLATION
------------

Prerequisites:

- Python-tesseract requires Python 2.5+ or Python 3.x.
- You will need the Python Imaging Library (PIL) (or the Pillow fork).
  Under Debian/Ubuntu, this is the package **python-imaging** or **python3-imaging**.
- Install `Google Tesseract OCR <https://github.com/tesseract-ocr/tesseract>`_
  (the linked page has additional information on how to install the engine on
  Linux, Mac OS X and Windows).
  You must be able to invoke the tesseract command as *tesseract*. If this
  isn't the case, for example because tesseract isn't in your PATH, you will
  have to change the "tesseract_cmd" variable at the top of *tesseract.py*.
  Under Debian/Ubuntu you can use the package **tesseract-ocr**.
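
Before editing "tesseract_cmd", it can help to check whether the *tesseract*
command is reachable at all. The sketch below is one way to do that with the
standard library (``shutil.which``, Python 3.3+). The helper name
``find_tesseract`` and the fallback path are hypothetical, and the attribute
mentioned in the final comment is an assumption about how recent pytesseract
releases expose the setting, so check it against the installed version.

```python
import shutil


def find_tesseract(candidates=('tesseract',)):
    """Return the first candidate that resolves to an executable, else None."""
    for candidate in candidates:
        resolved = shutil.which(candidate)
        if resolved:
            return resolved
    return None


if __name__ == '__main__':
    # Hypothetical fallback location; adjust to where Tesseract is installed.
    path = find_tesseract(('tesseract', '/usr/local/bin/tesseract'))
    if path is None:
        print('tesseract executable not found')
    else:
        print('tesseract found at ' + path)
        # Assumption: recent pytesseract releases expose the command as
        # pytesseract.pytesseract.tesseract_cmd, so it could be set with:
        #     import pytesseract
        #     pytesseract.pytesseract.tesseract_cmd = path
```

If the function returns ``None``, the engine is either not installed or not on
the PATH, which is the situation in which "tesseract_cmd" has to be changed.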

Installing via pip:
See the `pytesseract package page <https://pypi.python.org/pypi/pytesseract>`_.
::

    $ (env)> pip install pytesseract

Installing from source:
::

    $> git clone git@github.com:madmaze/pytesseract.git
    $> cd pytesseract
    $ (env)> python setup.py install

LICENSE
-------
Python-tesseract is released under the GPL v3.

CONTRIBUTORS
------------
- Originally written by `Samuel Hoffstaetter <https://github.com/hoffstaetter>`_
- `Juarez Bochi <https://github.com/jbochi>`_
- `Matthias Lee <https://github.com/madmaze>`_
- `Lars Kistner <https://github.com/Sr4l>`_