OCRAnno is a text annotation tool for producing annotated data that improves the results of OCR systems. The annotations serve as the ground truth for training machine learning models that automatically correct OCR-ed text. OCRAnno is being developed as part of a collaboration between the Institute of Informatics at UFRGS and Petrobras.
- Interface to annotate texts extracted using OCR;
- The system shows the source PDF document and the sentence that may need correction;
- Annotators can search for the sentence to identify where it occurs on the page;
- Documents are assigned randomly to each annotator;
- If the original document is of poor quality and impossible to read, annotators can classify it as illegible;
- Tour for the annotators to get familiar with the interface;
- Admin controller to follow the annotation progress, with CRUD operations over the documents and users list.
OCRAnno was developed with Laravel 7.0, an open-source MVC web framework for PHP. Its requirements are the same as the framework's, listed in the Laravel documentation (version 7.x). Laravel supports several databases; this project uses MySQL.
To run the project, you need Composer installed.
The database ER diagram, which includes the default Laravel tables, can be found in documentation/db-model.png. All tables are created by Laravel migrations during project initialization; only the database itself has to be created manually.
After cloning the project, create the file .env inside the project folder by copying and renaming the .env.exemple file. This file holds the basic settings for Laravel. If necessary, configure the database connection, the default admin user, and the mail settings (used to recover a user's password).
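The copy step can be sketched as a small shell helper. `init_env` is a hypothetical name introduced here for illustration, not something shipped with the project; it assumes it is run from the project folder and that the template is named as in this README.

```shell
# Hypothetical helper: create .env from the template
# without overwriting an existing .env file.
init_env() {
  template="${1:-.env.exemple}"   # template name as given in this README
  target="${2:-.env}"
  if [ -e "$target" ]; then
    echo "$target already exists, leaving it untouched"
    return 0
  fi
  # Stop with a non-zero status if the template cannot be copied.
  cp "$template" "$target" || return 1
  echo "created $target from $template"
}
```

Calling `init_env` once and then editing .env by hand keeps local credentials out of the template. Laravel's stock template uses keys such as `DB_HOST`, `DB_PORT`, `DB_USERNAME` and `DB_PASSWORD` for the database connection; whether this project's template contains exactly those keys is an assumption.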
Before running the commands, create a MySQL database named ocranno, the same name set as DB_DATABASE in the .env file.
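One possible way to create the database from the command line is sketched below. The helper names `db_name_from_env` and `create_db` are hypothetical, and the MySQL account `root` is an assumption; use whatever account is allowed to create databases on your server. The character set and collation are a common choice, not a project requirement.

```shell
# Read the DB_DATABASE value from a .env file (default: ./.env).
db_name_from_env() {
  sed -n 's/^DB_DATABASE=//p' "${1:-.env}" | head -n 1 | tr -d '"\r'
}

# Create the database if it does not exist yet.
# Assumption: a MySQL account named "root" that may create databases.
create_db() {
  name="$(db_name_from_env "${1:-.env}")"
  mysql -u root -p -e "CREATE DATABASE IF NOT EXISTS \`$name\` CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;"
}
```

Reading the name from .env instead of typing it twice keeps the database name and the `DB_DATABASE` setting from drifting apart.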
Finally, you can run the following commands:
$ composer install
$ php artisan migrate
$ php artisan key:generate
$ php artisan serve
The last command prints a local URL (http://127.0.0.1:8000 by default) that you can open in the browser to access the system.
This link may provide extra information if needed.
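The setup commands can also be wrapped in a short script that first checks whether the required tools are on the PATH. This is only a sketch: `check_tools` and `setup_ocranno` are hypothetical names introduced here, and the script assumes it is run from the project folder with .env and the database already in place.

```shell
# Report every required command-line tool that is missing from PATH.
# Returns non-zero if at least one tool is missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Run the setup steps in the order given in this README,
# stopping at the first command that fails.
setup_ocranno() {
  check_tools php composer || return 1
  composer install &&
  php artisan migrate &&
  php artisan key:generate &&
  php artisan serve
}
```

Chaining the steps with `&&` means a failed migration, for example because the database has not been created yet, stops the script before the development server starts.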
With the system running, users are offered an interactive tour on first access (it can be reopened later if needed) that provides the information necessary to carry out the annotations. The admin user has the same functionality as a default user, plus a controller with CRUD operations over the documents, and can also list and search all users, documents, and sentences. Screenshots are available in the documentation/ folder; the menu links 'Sentences', 'Pages', 'Project' and 'Users' are restricted to admins.
Laravel is a web application framework with expressive, elegant syntax. We believe development must be an enjoyable and creative experience to be truly fulfilling. Laravel takes the pain out of development by easing common tasks used in many web projects.
Laravel is accessible, powerful, and provides tools required for large, robust applications.
Laravel has the most extensive and thorough documentation and video tutorial library of all modern web application frameworks, making it a breeze to get started with the framework.
If you don't feel like reading, Laracasts can help. Laracasts contains over 1500 video tutorials on a range of topics including Laravel, modern PHP, unit testing, and JavaScript. Boost your skills by digging into our comprehensive video library.
The Laravel framework is open-sourced software licensed under the MIT license.
Developers: Lucas L Oliveira ([email protected])
Coordination: Viviane P Moreira ([email protected])