This Go-based project provides a microservice with a REST API and a web view for converting PDFs and images to text using the Tesseract OCR engine.
It is only a proof of concept at this point. Future development will split it into a multi-tier architecture for better scalability, again for instructional purposes.
docker compose up --build
The service provides minimal web views for trying out its features.
http://localhost:8080/web/pdf
http://localhost:8080/web/img
http://localhost:8080/api/v1/documents/pdf/ocr-scan
http://localhost:8080/api/v1/documents/img/ocr-scan
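The two `ocr-scan` URLs above can also be called from code. The following is a minimal sketch of a Go client helper, not the project's documented contract: the HTTP method (POST), the multipart encoding, the form field name passed in `field`, and the helper name `newOCRRequest` are all assumptions made for illustration. Check the service's handlers for the actual request format.

```go
package main

import (
	"bytes"
	"mime/multipart"
	"net/http"
)

// newOCRRequest builds a multipart POST request that carries one document.
// ASSUMPTION: POST, multipart/form-data and the form field name are guesses;
// the real service may expect a different request shape.
func newOCRRequest(endpoint, field, filename string, content []byte) (*http.Request, error) {
	var body bytes.Buffer
	w := multipart.NewWriter(&body)

	// Add the document as a single file part.
	part, err := w.CreateFormFile(field, filename)
	if err != nil {
		return nil, err
	}
	if _, err := part.Write(content); err != nil {
		return nil, err
	}
	// Close writes the terminating boundary; it must happen before sending.
	if err := w.Close(); err != nil {
		return nil, err
	}

	req, err := http.NewRequest(http.MethodPost, endpoint, &body)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", w.FormDataContentType())
	return req, nil
}
```

With the service running, the returned request can be sent with `http.DefaultClient.Do(req)`. The response format is not described here, so the caller should inspect the status code and body rather than assume a particular structure.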
This project uses the following third-party tools:
- Tesseract OCR: OCR engine
- Ghostscript: PDF interpreter used to convert a PDF into a set of images (one per page)
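As a rough illustration of how these two tools fit together, the sketch below builds command-line arguments for rendering PDF pages with Ghostscript and for running Tesseract on one rendered page. It assumes the `gs` and `tesseract` command-line programs are used; the project may instead use Go bindings or different options. The helper names, output device, and resolution are hypothetical choices.

```go
package main

import "fmt"

// ghostscriptArgs returns arguments for the gs CLI that render every page of
// pdfPath to a PNG file. outPattern should contain a printf-style page
// counter such as "page-%03d.png".
// ASSUMPTION: the png16m device and the dpi value are illustrative choices.
func ghostscriptArgs(pdfPath, outPattern string, dpi int) []string {
	return []string{
		"-dNOPAUSE", "-dBATCH", "-dSAFER", // non-interactive, exit when done
		"-sDEVICE=png16m",                 // 24-bit colour PNG output
		fmt.Sprintf("-r%d", dpi),          // rendering resolution
		"-sOutputFile=" + outPattern,
		pdfPath,
	}
}

// tesseractArgs returns arguments for the tesseract CLI that print the
// recognised text of imagePath to standard output in the given language.
func tesseractArgs(imagePath, lang string) []string {
	return []string{imagePath, "stdout", "-l", lang}
}
```

These slices can be passed to `exec.Command("gs", ghostscriptArgs(...)...)` and `exec.Command("tesseract", tesseractArgs(...)...)` from `os/exec`, running Tesseract once per generated page image and concatenating the output.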
(C) 2024 Simone Chiorazzo