Useful Linux scripts for reference
- Download the latest leptonica from http://www.leptonica.org/download.html
- Extract, configure and install
./configure
make
sudo checkinstall
For Debian-based distros, packages can be downloaded from https://packages.ubuntu.com/ and extracted into a local directory:
dpkg -x libXXX.deb $HOME/target_dir
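When several .deb files are needed, the extraction step can be looped. This is a minimal sketch, assuming all packages were downloaded into one directory. The function name `extract_debs` and both directory arguments are hypothetical, chosen only for illustration.

```shell
# Hypothetical helper: extract_debs SRC_DIR TARGET_DIR
# Runs "dpkg -x" on every .deb file in SRC_DIR, unpacking into TARGET_DIR.
extract_debs() {
    src=$1
    target=$2
    mkdir -p "$target" || return 1
    count=0
    for deb in "$src"/*.deb; do
        # If the glob matched nothing, the literal pattern is not a file
        [ -f "$deb" ] || continue
        dpkg -x "$deb" "$target" || return 1
        count=$((count + 1))
    done
    echo "extracted $count package(s) into $target"
}
```

It could be called as `extract_debs "$HOME/debs" "$HOME/target_dir"`, where `$HOME/debs` is an assumed download location.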
The binaries, libs and pkg-config files will now be in target_dir. Make sure the pkgconfig directories are included in the PKG_CONFIG_PATH env variable (as in the export lines below) during installation.
./autogen.sh
export LIBLEPT_HEADERSDIR=$HOME/local_packages/include
export PKG_CONFIG_PATH=$HOME/local_packages/lib/pkgconfig:$HOME/local_packages/usr/lib/x86_64-linux-gnu/pkgconfig:$HOME/local_packages/usr/share/pkgconfig
./configure --prefix=$HOME/local_packages/ --with-extra-libraries=$HOME/local_packages/lib --enable-debug
make -j10
make install -j10
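The PKG_CONFIG_PATH export above lists each pkgconfig directory by hand. As a hedged alternative, the sketch below adds only the directories that actually exist under a prefix. `add_pkgconfig_dirs` is a hypothetical helper name; the three subdirectories are the ones from the export line above.

```shell
# Hypothetical helper: add_pkgconfig_dirs PREFIX
# Prepends each existing pkgconfig directory under PREFIX to PKG_CONFIG_PATH,
# skipping directories that are already listed.
add_pkgconfig_dirs() {
    prefix=$1
    for dir in \
        "$prefix/lib/pkgconfig" \
        "$prefix/usr/lib/x86_64-linux-gnu/pkgconfig" \
        "$prefix/usr/share/pkgconfig"
    do
        # Skip directories that the extraction or install did not create
        [ -d "$dir" ] || continue
        case ":${PKG_CONFIG_PATH:-}:" in
            *":$dir:"*) ;;  # already present, nothing to do
            *) PKG_CONFIG_PATH="$dir${PKG_CONFIG_PATH:+:$PKG_CONFIG_PATH}" ;;
        esac
    done
    export PKG_CONFIG_PATH
}
```

Calling `add_pkgconfig_dirs "$HOME/local_packages"` before `./configure` would have roughly the same effect as the manual export, with the local directories taking priority over any that were already set.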
export CPPFLAGS=-I$HOME/local_packages/include
export PKG_CONFIG_PATH=$HOME/local_packages/lib/pkgconfig
pip install tesserocr
LD_LIBRARY_PATH should be updated to include the directory where the tesseract libraries are installed, e.g.
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/local_packages/lib
To update environment variables for Jupyter Notebook in Docker containers, modify kernel.json, which is located in path_to_anaconda/envs/pytorch38/share/jupyter/kernels/python3 (the path will differ depending on your setup).
kernel.json
"env": {"LD_LIBRARY_PATH":"$LD_LIBRARY_PATH:$HOME/local_packages/lib"}
The trained models can be found at https://tesseract-ocr.github.io/tessdoc/Data-Files.html. Download the desired language models and move them into the tessdata directory, which should be located at $install_path/share/tessdata, where $install_path is the tesseract installation prefix.
- Set the TESSDATA_PREFIX env variable to point to the tessdata directory
- Run make clean in the tesstrain root before training (this may fix the issue where boxes cannot be read from .tiff images)
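Before running tesseract or tesstrain, it can help to confirm that the expected language model is really in the tessdata directory. This is a minimal sketch; `check_traineddata` is a hypothetical helper, and it assumes models are stored as `<lang>.traineddata` directly inside the directory that TESSDATA_PREFIX points to.

```shell
# Hypothetical helper: check_traineddata LANG_CODE [TESSDATA_DIR]
# Returns 0 if LANG_CODE.traineddata exists, 1 if it is missing,
# and 2 if no directory was given and TESSDATA_PREFIX is unset.
check_traineddata() {
    lang=$1
    dir=${2:-${TESSDATA_PREFIX:-}}
    if [ -z "$dir" ]; then
        echo "TESSDATA_PREFIX is not set" >&2
        return 2
    fi
    if [ -f "$dir/$lang.traineddata" ]; then
        echo "found $dir/$lang.traineddata"
    else
        echo "missing $dir/$lang.traineddata" >&2
        return 1
    fi
}
```

For example, `check_traineddata eng` checks the English model against the current TESSDATA_PREFIX.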
pkg-config
pkg-config --cflags --libs pangocairo
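The same tool can confirm that required modules are visible before a build starts. The sketch below uses `pkg-config --exists`, which only sets an exit status. `check_pkgs` is a hypothetical helper name, and the module list passed to it is up to the caller.

```shell
# Hypothetical helper: check_pkgs MODULE...
# Reports every module that pkg-config cannot find on PKG_CONFIG_PATH.
check_pkgs() {
    missing=""
    for mod in "$@"; do
        if ! pkg-config --exists "$mod" 2>/dev/null; then
            missing="$missing $mod"
        fi
    done
    if [ -n "$missing" ]; then
        echo "missing pkg-config modules:$missing" >&2
        return 1
    fi
    echo "all modules found"
}
```

A call such as `check_pkgs pangocairo` fails early if the module is not visible, which usually means PKG_CONFIG_PATH is missing a directory.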
Packages that need to be installed:
wget build-essential
Containers can be quite buggy if the image is loaded from an existing .tar file.
1. Create the container from image by running
docker run -it repo:version
2. Start the container
docker container start container_name
3. Attach to container
docker container attach container_name
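The three steps above can be wrapped in two small functions. This is a sketch under assumptions: the function names and the DRY_RUN switch are hypothetical, and `docker start -ai` is used to combine the start and attach steps into one command.

```shell
# run_cmd: print the command, and execute it unless DRY_RUN=1
run_cmd() {
    echo "+ $*"
    if [ "${DRY_RUN:-0}" != "1" ]; then
        "$@"
    fi
}

# Step 1: create a named container from an image and open a shell in it.
# $1 = image (e.g. repo:version), $2 = container name
create_container() {
    run_cmd docker run -it --name "$2" "$1"
}

# Steps 2 and 3: start an existing container and attach to it in one go.
# $1 = container name
reenter_container() {
    run_cmd docker start -ai "$1"
}
```

Setting `DRY_RUN=1` before calling either function prints the docker command without running it, which is handy for checking names first.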
Note: running docker run repo:version or docker run -d repo:version (without -it) causes some weird issues (the container freezes).
docker run --name tess ubuntu:focal
# Hack: perform some action first, e.g. create an /app folder; otherwise the
# container can't be started in the background in detached mode
docker image ls -a
docker container ls -a
docker container start container_name
docker attach container_name  # to get a shell
docker commit CONTAINER_NAME_OR_HASH NEW_IMAGE_NAME
docker save -o container.tar IMAGE_HASH
Docker images use:
- docker load
- docker save
Docker containers use:
- docker import
- docker export
docker images
docker run tess:5.0.1  # creates a container from the image
docker container ls
docker container start container_name
# Attach to running container
docker attach container_name
singularity build tesseract-5.0.1.sif docker-archive://tesseract-5.0.1.tar
apptainer/singularity#5465
export TZ=Asia/Singapore
ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone
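The two timezone commands can be guarded so that a mistyped zone name is caught. This is a minimal sketch, assuming tzdata is already installed so that the zone files exist; if the commands are run before installing tzdata, the check does not apply. `set_timezone` and its optional ROOT argument are hypothetical.

```shell
# Hypothetical helper: set_timezone TZ_NAME [ROOT]
# ROOT defaults to the real filesystem root; passing another directory
# lets the function be tried without touching /etc.
set_timezone() {
    tz=$1
    root=${2:-}
    zone="$root/usr/share/zoneinfo/$tz"
    if [ ! -e "$zone" ]; then
        echo "unknown timezone: $tz" >&2
        return 1
    fi
    # Same two steps as above: link localtime, then record the zone name
    ln -snf "$zone" "$root/etc/localtime" && echo "$tz" > "$root/etc/timezone"
}
```

Inside a container this would be called as `set_timezone Asia/Singapore`, typically as root.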