DATA EXTRACTION FROM HAND-FILLED FORMS USING OCR

Jamuna J (20BCC016)
Pavithrani S (20BCC027)
II B.Sc. Computer Science with Cognitive Systems

Parvathi P
Assistant Professor
Department of Computer Science with Cognitive Systems

Contents

• OCR
• RPA OCR
• Flowchart of Proposed Work
• Proposed Methodology
• Limitations
• Future Work
• References

Department of Cognitive Systems, PSGR Krishnammal College for Women


Optical Character Recognition

• A literate person can glance at a page and read it, but getting a computer to do the same is far harder than most people assume.
• To identify individual characters, the computer first needs a digital image of the text. The image is processed to remove extraneous information, and the characters are then located and segmented.
• Only then can the computer output a sequence of machine-readable characters [1].
• This procedure is known as optical character recognition (OCR).
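The three stages described above (process the image, locate and segment the characters, output machine-readable text) can be illustrated with a deliberately tiny Python sketch. It assumes a clean, single line of text supplied as a grid of grayscale values, splits characters at ink-free columns, and recognizes each one by comparing it with hand-made 3-pixel-high templates. The templates, the threshold and all function names are illustrative assumptions; this is not how UiPath Document OCR or any production engine works internally.

```python
from typing import Dict, List, Tuple

Bits = Tuple[Tuple[int, ...], ...]  # 1 = ink, 0 = background

# Hypothetical 3-pixel-high glyph templates, for illustration only.
TEMPLATES: Dict[str, Bits] = {
    "I": ((1,), (1,), (1,)),
    "L": ((1, 0), (1, 0), (1, 1)),
    "T": ((1, 1, 1), (0, 1, 0), (0, 1, 0)),
}


def binarize(image: List[List[int]], threshold: int = 128) -> List[List[int]]:
    """Preprocessing: dark pixels (< threshold) become ink (1), the rest background (0)."""
    return [[1 if px < threshold else 0 for px in row] for row in image]


def segment(bits: List[List[int]]) -> List[Tuple[int, int]]:
    """Segmentation: split a text line into [start, end) column spans separated by ink-free columns."""
    width = len(bits[0])
    spans: List[Tuple[int, int]] = []
    start = None
    for col in range(width):
        has_ink = any(row[col] for row in bits)
        if has_ink and start is None:
            start = col
        elif not has_ink and start is not None:
            spans.append((start, col))
            start = None
    if start is not None:
        spans.append((start, width))
    return spans


def classify(glyph: Bits) -> str:
    """Recognition: return the template character with the fewest mismatched pixels."""
    def mismatches(template: Bits) -> float:
        if len(template) != len(glyph) or len(template[0]) != len(glyph[0]):
            return float("inf")  # different shape, so this template cannot match
        return sum(a != b for t_row, g_row in zip(template, glyph) for a, b in zip(t_row, g_row))
    return min(TEMPLATES, key=lambda ch: mismatches(TEMPLATES[ch]))


def recognize_line(image: List[List[int]]) -> str:
    bits = binarize(image)
    glyphs = [tuple(tuple(row[s:e]) for row in bits) for s, e in segment(bits)]
    return "".join(classify(g) for g in glyphs)


# Usage: the letters L, I and T separated by one blank column each.
ink = [[1, 0, 0, 1, 0, 1, 1, 1],
       [1, 0, 0, 1, 0, 0, 1, 0],
       [1, 1, 0, 1, 0, 0, 1, 0]]
scan = [[0 if b else 255 for b in row] for row in ink]  # black ink on white paper
print(recognize_line(scan))
```

Blank-column segmentation breaks down as soon as handwritten letters touch, which is one reason real OCR engines rely on trained recognition models rather than fixed templates.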


RPA OCR
• Robotic Process Automation (RPA) is a technology that uses software robots (softbots) to automate repetitive manual tasks.
• RPA tools are software platforms for configuring the jobs to be automated. Leading tools include UiPath, Automation Anywhere and Blue Prism [2].
• Optical character recognition (OCR) is a key feature of any good RPA solution [3].
• An OCR engine that works alongside and within the RPA platform can automate the time-consuming work of manually turning scanned documents, such as invoices, into machine-readable data.
• A practical RPA OCR use case is extracting information from a scanned customer insurance claim form.
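Once the OCR engine has turned a scanned claim form into plain text, the bot still has to pull individual fields out of that text. A minimal sketch of that step is shown below: it searches the recognized text for each field label and keeps the value that follows it. The field labels, the regular expressions and the sample text are hypothetical; UiPath's own extractors are configured in Studio rather than hand-coded like this.

```python
import re
from typing import Dict, Optional

# Hypothetical labels for an insurance claim form; a real form defines its own.
FIELD_PATTERNS = {
    "name": r"Name\s*:\s*(.+)",
    "policy_number": r"Policy\s*(?:No\.?|Number)\s*:\s*([A-Z0-9-]+)",
    "claim_amount": r"Claim\s*Amount\s*:\s*([\d,]+(?:\.\d+)?)",
}


def extract_fields(ocr_text: str) -> Dict[str, Optional[str]]:
    """Return each field's value, or None when its label is not found in the OCR text."""
    result: Dict[str, Optional[str]] = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = re.search(pattern, ocr_text, flags=re.IGNORECASE)
        result[field] = match.group(1).strip() if match else None
    return result


# Made-up OCR output, for illustration only.
sample = """CLAIM FORM
Name: Anita Kumar
Policy Number: PLC4471Z
Claim Amount: 12,500.00"""
print(extract_fields(sample))
```

Returning `None` for a missing label, instead of raising, lets the bot collect every field it did find and pass the gaps on for human review.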
Flowchart of Proposed Work

Figure 1. Flowchart of Proposed Work


Proposed Methodology

Figure 2. Handwritten Form


STEPS

• Step 1: Open UiPath Studio and create a project, giving it a name and a description.
• Step 2: Click Open Main Workflow.
• Step 3: Install the UiPath.DocumentUnderstanding.ML.Activities and UiPath.IntelligentOCR.Activities packages via Manage Packages.
• Step 4: Drag the Sequence and Load Taxonomy activities into the main workflow and create a Taxonomy variable.


• Step 5: Open Taxonomy Manager and enter the group, category, document type and fields you want to create, as shown in Figure 3.

Figure 3. Taxonomy Manager
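The taxonomy built in Taxonomy Manager is essentially a nested schema: a group contains categories, a category contains document types, and each document type lists the fields to extract. The Python sketch below models that shape only to make the hierarchy concrete. The class names, the example fields and the dotted id format are assumptions for illustration; they are not UiPath's Taxonomy object or its Document Type Id scheme.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TaxonomyField:
    name: str
    field_type: str  # e.g. "Text", "Number", "Date"


@dataclass
class DocumentType:
    group: str
    category: str
    name: str
    fields: List[TaxonomyField] = field(default_factory=list)

    @property
    def type_id(self) -> str:
        # Illustrative dotted id only; UiPath generates its own Document Type Id.
        return f"{self.group}.{self.category}.{self.name}"


@dataclass
class Taxonomy:
    document_types: List[DocumentType] = field(default_factory=list)

    def find(self, type_id: str) -> DocumentType:
        """Look up a document type by id, failing loudly if it was never defined."""
        for doc_type in self.document_types:
            if doc_type.type_id == type_id:
                return doc_type
        raise KeyError(f"unknown document type: {type_id!r}")


# Hypothetical taxonomy for a handwritten insurance claim form.
claim_form = DocumentType("Forms", "Insurance", "ClaimForm", [
    TaxonomyField("Name", "Text"),
    TaxonomyField("PolicyNumber", "Text"),
    TaxonomyField("ClaimAmount", "Number"),
])
taxonomy = Taxonomy([claim_form])
print([f.name for f in taxonomy.find(claim_form.type_id).fields])
```

Giving each field a type up front is what later lets an extraction or validation step check values such as amounts and dates against what is expected.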


• Step 6: Add the Digitize Document activity and create the variables it needs (for example, for the document path, the document text and the document object model). Place the UiPath Document OCR engine inside it.
• Step 7: In the Properties panel of UiPath Document OCR, paste the API key copied from your UiPath account on the UiPath website.
• Step 8: Add the Data Extraction Scope activity, then copy the Document Type Id from the file explorer in the Project tab and paste it into the scope.
• Step 9: Add the Intelligent Form Extractor activity inside the scope and paste the same API key into it.
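Steps 6 to 9 connect a digitizer to an extraction step that is chosen by document type. The Python sketch below mimics only that routing idea: extractors are registered against a document type id, and the scope calls the matching one on the digitized text. `ExtractionScope`, `colon_extractor` and the id string are hypothetical names for illustration; this is not UiPath's API, and in Studio the extractors are configured visually rather than in code.

```python
from typing import Callable, Dict

# An extractor receives the digitized text of a document and returns field -> value.
Extractor = Callable[[str], Dict[str, str]]


class ExtractionScope:
    """Hypothetical analog of a data extraction scope: routes text to the extractor for its document type."""

    def __init__(self) -> None:
        self._extractors: Dict[str, Extractor] = {}

    def register(self, document_type_id: str, extractor: Extractor) -> None:
        self._extractors[document_type_id] = extractor

    def extract(self, document_type_id: str, digitized_text: str) -> Dict[str, str]:
        extractor = self._extractors.get(document_type_id)
        if extractor is None:
            raise KeyError(f"no extractor configured for {document_type_id!r}")
        return extractor(digitized_text)


def colon_extractor(text: str) -> Dict[str, str]:
    """Hypothetical stand-in extractor: reads 'Label: value' lines."""
    pairs = (line.split(":", 1) for line in text.splitlines() if ":" in line)
    return {label.strip(): value.strip() for label, value in pairs}


scope = ExtractionScope()
scope.register("Forms.Insurance.ClaimForm", colon_extractor)  # hypothetical document type id
print(scope.extract("Forms.Insurance.ClaimForm", "Name: Anita Kumar\nPolicy Number: PLC4471Z"))
```

Failing with a clear error for an unregistered document type mirrors why Step 8 matters: without the right Document Type Id, the scope has no way to know which extractor should run.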


• Step 10: Click Manage Template, then Create Template. Enter all the template details in the Create Template wizard and click Configure.

Figure 4. Workflow
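A form template essentially tells the extractor where on the page each field's answer is expected. The sketch below illustrates that idea under stated assumptions: OCR has produced words with bounding boxes, the template maps each field to a rectangle, and a word belongs to a field when its centre falls inside that rectangle. The coordinates, the `Word` type and the function names are hypothetical; the Intelligent Form Extractor's actual matching logic is not shown in the source.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # left, top, right, bottom, in page units


@dataclass
class Word:
    text: str
    box: Box


def center_in(box: Box, region: Box) -> bool:
    """True when the centre of `box` lies inside `region` (edges inclusive)."""
    cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    return region[0] <= cx <= region[2] and region[1] <= cy <= region[3]


def apply_template(words: List[Word], template: Dict[str, Box]) -> Dict[str, str]:
    """Return, for each field, the words whose centres fall inside its region."""
    result = {}
    for field_name, region in template.items():
        inside = [w for w in words if center_in(w.box, region)]
        # Reading order; assumes words on the same line share the same top coordinate.
        inside.sort(key=lambda w: (w.box[1], w.box[0]))
        result[field_name] = " ".join(w.text for w in inside)
    return result


words = [
    Word("Name:", (10, 20, 60, 35)),
    Word("Anita", (70, 20, 120, 35)),
    Word("Kumar", (125, 20, 180, 35)),
    Word("Policy", (10, 50, 60, 65)),
    Word("PLC4471Z", (70, 50, 160, 65)),
]
# Hypothetical template: regions where the handwritten answers are expected.
template = {"Name": (65, 15, 300, 40), "PolicyNumber": (65, 45, 300, 70)}
print(apply_template(words, template))
```

Because the regions are fixed coordinates, a form that is scanned with a shift or printed with a different layout will put answers outside their regions, which is the weakness the Future Work section points at.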


• Step 11: Add the Export Extraction Results and For Each activities to the main workflow.
• Step 12: Inside the For Each activity, add a Write Range activity and enter the output file name, including its extension (for example, .xlsx).
• Step 13: Set the path of the scanned handwritten form as the default value of the Document Path variable.
• Step 14: Click Debug File and run the workflow.
• Step 15: The generated Excel sheet appears in the Project tab under Document Processing.
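Steps 11 to 15 turn the extraction results into a table and write it to a spreadsheet. As a stand-in, the Python sketch below writes one row per processed form to CSV with the standard `csv` module, since writing a real .xlsx file would need a third-party library; the CSV format and the `write_results` name are substitutions for illustration, not what the Write Range activity does.

```python
import csv
import io
from typing import Dict, List, TextIO


def write_results(rows: List[Dict[str, str]], out: TextIO) -> None:
    """Write one CSV row per processed form; a field missing from a form is left blank."""
    columns: List[str] = []
    for row in rows:  # union of field names, in first-seen order
        for name in row:
            if name not in columns:
                columns.append(name)
    writer = csv.DictWriter(out, fieldnames=columns, restval="")
    writer.writeheader()
    writer.writerows(rows)


buffer = io.StringIO()
write_results([{"Name": "Anita Kumar", "PolicyNumber": "PLC4471Z"}], buffer)
print(buffer.getvalue())
```

To write a file instead, open it with `open("results.csv", "w", newline="")` (a hypothetical file name; `newline=""` is what the `csv` documentation recommends) and pass the file object as `out`. Excel can open the resulting CSV directly.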


VALIDATION STATION

Figure 5. Validation Station
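Validation Station is where a person reviews and corrects the extracted values. One common way to decide which fields deserve that attention is a confidence threshold; the source does not say how fields were selected for review, so the threshold of 0.8, the sample values and the function name below are assumptions for illustration only.

```python
from typing import Dict, List, Tuple

# field name -> (extracted value, extraction confidence between 0 and 1)
Extraction = Dict[str, Tuple[str, float]]


def fields_needing_review(extraction: Extraction, threshold: float = 0.8) -> List[str]:
    """Return the fields whose confidence is below `threshold`, lowest confidence first."""
    low = [(conf, name) for name, (_, conf) in extraction.items() if conf < threshold]
    return [name for _, name in sorted(low)]


# Made-up extraction result, for illustration only.
result = {
    "Name": ("Anita Kumar", 0.97),
    "PolicyNumber": ("PLC44712", 0.62),
    "ClaimAmount": ("12,500", 0.78),
}
print(fields_needing_review(result))
```

Sorting by confidence puts the most doubtful field first, so a reviewer with limited time checks the likeliest errors before the rest.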


Result

Figure 6. Excel Sheet


LIMITATIONS
• OCR cannot understand three crucial aspects of data processing:
  ◦ Formatting
  ◦ Content
  ◦ Context
• Moreover, if the original document is of poor quality or the handwriting is hard to read, more recognition errors occur [4].
• For instance, in the handwritten form we scanned, the last character of the policy number was recognized as the digit 2 instead of the letter Z.
• Such errors have to be corrected manually.
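When a field has a known format, some of these look-alike errors can be repaired automatically before anyone looks at them. The sketch below assumes a hypothetical policy-number layout of three letters, four digits and one letter (for example PLC4471Z), and swaps common look-alikes such as 2 and Z so that each position matches its expected character class. The layout and the look-alike table are assumptions; values that still do not fit return `None` and would go to manual correction as before.

```python
from typing import Optional

# Common OCR look-alikes; which pairs matter depends on the handwriting and the engine.
DIGIT_TO_LETTER = {"0": "O", "1": "I", "2": "Z", "5": "S", "8": "B"}
LETTER_TO_DIGIT = {v: k for k, v in DIGIT_TO_LETTER.items()}

# Hypothetical policy-number layout: 'L' = letter, 'D' = digit (e.g. "PLC4471Z").
POLICY_PATTERN = "LLLDDDDL"


def correct_by_pattern(value: str, pattern: str = POLICY_PATTERN) -> Optional[str]:
    """Swap look-alike characters so each position matches the expected class.

    Returns None when the length differs or a character cannot be repaired,
    so the value can be sent for manual correction instead.
    """
    if len(value) != len(pattern):
        return None
    fixed = []
    for ch, kind in zip(value.upper(), pattern):
        if kind == "L" and not ch.isalpha():
            ch = DIGIT_TO_LETTER.get(ch, "")
        elif kind == "D" and not ch.isdigit():
            ch = LETTER_TO_DIGIT.get(ch, "")
        if not ch:
            return None
        fixed.append(ch)
    return "".join(fixed)


print(correct_by_pattern("PLC44712"))  # the trailing 2 sits in a letter position
```

This only helps where the expected format is genuinely fixed; for free-text fields such as names there is no pattern to check against, so manual correction remains necessary.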


FUTURE WORK
• Artificial Intelligence (AI) can build a spatial understanding of a document: like a human reader, it knows what to look for and where to find it, even when the position of a field changes.
• AI-based extraction therefore does not depend on a fixed template and can deliver more accurate results.
• Combining AI with OCR is proving to be a successful method for collecting and managing data.
• While AI-based OCR solutions may not be as flashy as other transformational technologies, they are expected to have a significant influence on the bottom line of the businesses that use them [5].
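The key contrast here is between fixed page coordinates and finding a value relative to what is printed on the page. The sketch below is a much simpler, rule-based illustration of that idea, not AI: it locates a printed label such as "Policy" and returns the words to its right on the same text line, so the value is still found when the whole form shifts. The `Word` layout, the coordinates and the function name are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Word:
    text: str
    left: float
    top: float
    bottom: float


def value_right_of(words: List[Word], label: str) -> Optional[str]:
    """Find `label`, then join the words to its right that overlap it vertically (same text line)."""
    anchor = next((w for w in words if w.text.rstrip(":").lower() == label.lower()), None)
    if anchor is None:
        return None
    same_line = [
        w for w in words
        if w is not anchor
        and w.top < anchor.bottom and w.bottom > anchor.top  # vertical overlap with the label
        and w.left > anchor.left  # to the right of the label
    ]
    same_line.sort(key=lambda w: w.left)
    return " ".join(w.text for w in same_line) or None


# The same field on two scans, the second shifted right and down.
original = [Word("Policy:", 10, 50, 65), Word("PLC4471Z", 70, 50, 65)]
shifted = [Word("Policy:", 110, 240, 255), Word("PLC4471Z", 170, 242, 257)]
print(value_right_of(original, "Policy"), value_right_of(shifted, "Policy"))
```

A rule like this still assumes the label is printed and that the answer sits to its right on the same line; the AI-based approaches the slide describes aim to remove such hand-written assumptions.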


REFERENCES

[1] K. A. Barchard and L. A. Pace, “Preventing human error: The impact of data entry methods on data accuracy and statistical results,” Comput. Human Behav., vol. 27, no. 5, pp. 1834–1839, 2011, doi: 10.1016/j.chb.2011.04.004.
[2] https://www.edureka.co/blog/what-is-robotic-process-automation/
[3] https://www.nice.com/guide/rpa/rpa-ocr-elevating-process-automation
[4] https://medium.com/@CereLabs/the-technology-that-is-better-than-ocr-354e989cb270
[5] https://www.information-age.com/optical-character-recognition-tools-ocr-ai-123479324/


QUERIES?
