Tamil Morphological Analysis

The document summarizes a student project on developing an efficient rule-based system for morphological parsing of the Tamil language. It discusses the challenges of agglutination and inflections in Tamil morphology. The proposed solution uses a rule-based approach to analyze word inflections according to Tamil grammar rules, combined with a machine learning approach to resolve conflicts and optimize the analysis of recurring inflectional patterns. The project aims to enable downstream applications for Tamil like machine translation by performing accurate morphological parsing of text.

Uploaded by

Karthik Sankar

An Efficient Rule-Based System for Morphological Parsing of Tamil Language

தமிழ் உருபனியல் ஆய்வு (Tamil Morphological Analysis)

Final Semester Project

Department of Computer Science and Engineering
National Institute of Technology, Tiruchirappalli

May 2010

STUDENTS:
Karthik S 106106029
Praveen Kumar 106106045
Venkataraman GB 106106073

GUIDE:
Dr. V. Gopalakrishnan
Agenda
• Overview of the Project
• NLP Applications – The Stakeholders
• The problem at hand
• The proposed solution
◦ Rule-Based Morphological Analysis
◦ Machine Learning
• Where does it all fit in?
• Need for Tamil Morphological Analysis
• Resources Obtained
• Implementation Details
• Demonstration
• Future Scope

1 WHO WHAT WHERE WHY HOW

12/08/2021 National Institute of Technology, Tiruchirappalli
Overview of the Project
• Natural Language Processing
• Morphological Analysis
• Tamil Language

Morphing … and in Tamil:

நடப்பான் (he will walk), நடக்கின்றான் (he is walking), நடக்கின்றாள் (she is walking), நடந்தான் (he walked), நடந்தனர் (they walked)

NLP Applications – The Stakeholders

WHO ARE THE STAKEHOLDERS?

Natural Language Processing applications such as:
• Stemming
• Machine Translation
• Speech Recognition
• Information Retrieval

WHY ARE THESE APPLICATIONS THE STAKEHOLDERS?

The problem at hand
Morphological analysis of Tamil involves understanding the structure of a word and its inflections.
AGGLUTINATION IN TAMIL
• Agglutination is the morphological process of adding affixes to the base of a word.
• A typical Tamil verb form carries a number of suffixes marking person, number, mood, tense and voice.
INFLECTIONS IN TAMIL
• திணை - Class
• பால் - Gender
• எண் - Number
• இடம் - Person
• காலம் - Tense


Morpheme | Function | Meaning
vAḷ - வாழ் | root | live
intu - ந்து | voice marker | past tense, object voice
koṇṭu - கொண்டு | tense marker | during
irunta - இருந்த | aspect marker | past progressive
ēn - ஏன் | person marker | first person, singular

The proposed solution
Words are represented at two levels, surface and lexical. At the surface level, a
word appears in its original orthographic form. At the lexical level, a
word is represented by all of its functional components.

SURFACE LEVEL LEXICAL LEVEL

RULE-BASED MORPHOLOGICAL ANALYSIS

Analyzing word inflections using the rules specified in Tamil grammar texts: நன்னூல் (Nannool) and தொல்காப்பியம் (Tholkaapiyam).

Example rule, a verse listing the suffixes (விகுதி) of verbs:

அன் ஆன் அள் ஆள் அர் ஆர் பம்மார்
அஆ குடுதுறு என் ஏன் அல் அன்
அம் ஆம் எம் ஏம் ஓமொ டும்மூர்
கடதற ஐ ஆய் இம்மின் இர்ஈர்
ஈயர் கயவு மென்பவும் பிறவும்
வினையின் விகுதி பெயரினும் சிலவே

The proposed solution
MACHINE LEARNING APPROACH
1. When the rules are followed strictly, more than one suffix may match a given
word, but only one of them is semantically valid.
விகுதி (suffix): படித்து – “உ”; படித்தது – “து” or “உ”?
The machine learning approach lets the system “learn” the correct parse for a
word, so that the wrong candidates are eliminated automatically whenever the
same word is processed again.

2. Two words may share the same inflectional part.
நடக்கின்றான் படிக்கின்றான்
The system learns the inflectional part of every word. This optimizes the
analysis, because the second word need not be analysed again from scratch.
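
The two ideas on this slide can be sketched as a pair of lookup tables. This is only an illustration: the slides do not describe the actual learning method, so a plain in-memory map stands in for it here, and the class name `ParseMemory` and all of its methods are hypothetical. How a suffix gets confirmed as correct (for example by a user or a corpus check) is likewise an assumption.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical memory of earlier analyses; a stand-in, not the project's code. */
class ParseMemory {
    // word -> suffix that was confirmed as the correct one
    private final Map<String, String> confirmedSuffix = new HashMap<>();
    // inflectional part -> analysis stored the first time it was analysed
    private final Map<String, String> knownInflections = new HashMap<>();

    /** Records which of the competing suffixes was right for this word. */
    void confirm(String word, String suffix) {
        confirmedSuffix.put(word, suffix);
    }

    /** Picks a suffix: the only candidate, or the one confirmed earlier. */
    Optional<String> resolve(String word, List<String> candidates) {
        if (candidates.size() == 1) {
            return Optional.of(candidates.get(0));
        }
        String learned = confirmedSuffix.get(word);
        if (learned != null && candidates.contains(learned)) {
            return Optional.of(learned);
        }
        return Optional.empty(); // still ambiguous, nothing learnt yet
    }

    /** Stores the analysis of an inflectional part for later reuse. */
    void rememberInflection(String inflection, String analysis) {
        knownInflections.put(inflection, analysis);
    }

    /** Returns the stored analysis of the longest known inflection the word ends with. */
    Optional<String> reuseInflection(String word) {
        String best = null;
        for (String inflection : knownInflections.keySet()) {
            if (word.endsWith(inflection)
                    && (best == null || inflection.length() > best.length())) {
                best = inflection;
            }
        }
        return Optional.ofNullable(best).map(knownInflections::get);
    }
}
```

With such a memory, the first analysis of படித்தது would report the conflict between “து” and “உ”; once one of them is confirmed, later calls to `resolve` return it directly. Likewise, after நடக்கின்றான் has been analysed, `reuseInflection` can return the stored analysis of the shared ending for படிக்கின்றான்.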

Where does it all fit in?

Characters: ப டி த் தா ன்
Word Tokenization: படித்தான்
Morphological Analysis: படி - த்த் - ஆன்
Sentence Syntax Analysis: அவன் புத்தகத்தைப் படித்தான் ("He read the book")
Semantic Analysis: Meaning of the sentence?

Need for Tamil Morphological Analysis
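ENGLISH vs. TAMIL
I came - நான் வந்தேன்
You came - நீ வந்தாய்
They came - அவர்கள் வந்தனர்
He came - அவன் வந்தான்
She came - அவள் வந்தாள்

The English verb is the same in every row, while the ending of the Tamil verb changes with person, number and gender.

TRANSLATION AND SEMANTIC ANALYSIS

அவன் மதுரைக்கு வந்தாள் -- Semantically wrong: the subject அவன் ("he") is masculine, but the verb வந்தாள் ("she came") carries the feminine ending.

To check the semantic correctness of a sentence, morphological analysis is needed.

How should the above sentence be translated?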
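
The agreement check behind this example can be sketched as follows. It is a deliberately crude surface check under stated assumptions: the class `AgreementCheck` and its table are hypothetical, the table covers only the five pronouns shown above, and the endings are the written forms in which the suffix vowel appears as a vowel sign (for instance வந்தான் ends in ான்). A fuller system would compare the person, number and gender tags produced by the morphological analyser instead of raw endings.

```java
import java.util.Map;

/** Hypothetical subject-verb agreement check on surface endings. */
class AgreementCheck {
    // Subject pronoun -> written verb ending, taken from the table above.
    // Unicode escapes are used so the source compiles under any encoding.
    static final Map<String, String> ENDING_FOR_SUBJECT = Map.of(
        "\u0BA8\u0BBE\u0BA9\u0BCD", "\u0BC7\u0BA9\u0BCD",             // naan (I)       -> -een
        "\u0BA8\u0BC0", "\u0BBE\u0BAF\u0BCD",                         // nii (you)      -> -aay
        "\u0B85\u0BB5\u0BA9\u0BCD", "\u0BBE\u0BA9\u0BCD",             // avan (he)      -> -aan
        "\u0B85\u0BB5\u0BB3\u0BCD", "\u0BBE\u0BB3\u0BCD",             // aval (she)     -> -aal
        "\u0B85\u0BB5\u0BB0\u0BCD\u0B95\u0BB3\u0BCD", "\u0BA9\u0BB0\u0BCD" // avargal (they) -> -anar
    );

    /** True when the verb ends with the ending expected for the subject. */
    static boolean agrees(String subject, String verb) {
        String ending = ENDING_FOR_SUBJECT.get(subject);
        return ending != null && verb.endsWith(ending);
    }

    public static void main(String[] args) {
        String avan = "\u0B85\u0BB5\u0BA9\u0BCD";                      // he
        String vandhaan = "\u0BB5\u0BA8\u0BCD\u0BA4\u0BBE\u0BA9\u0BCD"; // came (masculine)
        String vandhaal = "\u0BB5\u0BA8\u0BCD\u0BA4\u0BBE\u0BB3\u0BCD"; // came (feminine)
        System.out.println(agrees(avan, vandhaan)); // true
        System.out.println(agrees(avan, vandhaal)); // false
    }
}
```

For the sentence on the slide, the check fails because அவன் expects the masculine ending while வந்தாள் carries the feminine one.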

Resources Obtained
EMILLE – CIIL TAMIL MONOLINGUAL CORPUS
• EMILLE: Enabling Minority Language Engineering
• Collaborative venture of
◦ Lancaster University, UK
◦ Central Institute of Indian Languages (CIIL), Mysore, India
• Distributed by the European Language Resources Association (ELRA)
TAMIL WORDNET
• The database is a semantic dictionary designed as a lexical network.
• Developed by
◦ Department of Linguistics, Tamil University
◦ AU-KBC Research Centre, Chennai
• Tamil Wordnet resembles a traditional dictionary. It also contains valuable
information about morphologically related words.

Implementation Details - 1
[Flowchart: Input Tamil Word; C-V Segmentation; Check in DB (decision); Classify and Remove Inflection; Backward Scanning of inflections; Root verb? (decision); Conflict Resolution; Machine Learning; Output. Branch labels in the figure: Yes, Yes, No.]
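
The order of the steps in the flowchart cannot be fully recovered from the slide text, so the following is only one possible reading of its "backward scanning" and "root verb?" steps. It assumes the word has already been expanded into consonant-vowel (C-V) form, strips a person ending and then a tense marker from the end of the word, and finally checks that what remains is a known verb root. The class `BackwardScanner` and its three tables are hypothetical and hold only the entries for படித்தான் (root படி, tense marker த்த், ending ஆன்).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical backward scan over a word written in C-V form. */
class BackwardScanner {
    // Hypothetical tables in C-V form, holding only the worked example's entries.
    static final Map<String, String> VIGUDHI =
        Map.of("\u0B86\u0BA9\u0BCD", "3SM");                    // aa + n
    static final Map<String, String> IDAINILAI =
        Map.of("\u0BA4\u0BCD\u0BA4\u0BCD", "PAST TENSE");       // t + t
    static final Set<String> VERB_ROOTS =
        Set.of("\u0BAA\u0BCD\u0B85\u0B9F\u0BCD\u0B87");         // p + a + t + i

    /** Returns tags from root to ending, or an empty list if no root is found. */
    static List<String> analyse(String cvWord) {
        List<String> tags = new ArrayList<>();
        String rest = stripSuffix(cvWord, VIGUDHI, tags);   // person ending first
        rest = stripSuffix(rest, IDAINILAI, tags);           // then the tense marker
        if (!VERB_ROOTS.contains(rest)) {
            return Collections.emptyList();                  // remainder is not a known root
        }
        tags.add("VERB_ROOT");
        Collections.reverse(tags);                           // report root first
        return tags;
    }

    // Removes the first table entry that matches the end of s. A fuller table
    // would need longest-match selection and conflict resolution.
    private static String stripSuffix(String s, Map<String, String> table, List<String> tags) {
        for (Map.Entry<String, String> entry : table.entrySet()) {
            if (s.endsWith(entry.getKey())) {
                tags.add(entry.getValue());
                return s.substring(0, s.length() - entry.getKey().length());
            }
        }
        return s;
    }
}
```

Working on the C-V form is what makes the suffix test a plain `endsWith`: in ordinary spelling the ending ஆன் is fused into the syllable தான், so it never appears as a separate substring.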

Implementation Details - 2
Input word: படித்தான்

Characters: ப டி த் தா ன்

C-V segmentation: ப் - அ ட் - இ த் த் - ஆ ன்

C-V units: ப் அ ட் இ த் த் ஆ ன்

Analysis:
படி < VERB_ROOT >
த்த் < PAST TENSE >
ஆன் < 3SM > (third person, singular, masculine)
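
The C-V segmentation step shown above can be sketched with the Tamil Unicode code points. This is a minimal version under assumptions: the class name `CvSegmenter` is hypothetical, the input is expected to use the single-code-point vowel signs (the two-part forms of ொ, ோ and ௌ are not handled), and any character that is not a Tamil consonant is passed through unchanged.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical consonant-vowel (C-V) segmenter for Tamil text. */
class CvSegmenter {
    private static final char PULLI = '\u0BCD';        // vowel-killer mark
    private static final String INHERENT_A = "\u0B85"; // vowel a, implied by a bare consonant
    // Dependent vowel signs (aa, i, ii, u, uu, e, ee, ai, o, oo, au) and,
    // at the same index, the independent vowel each one stands for.
    private static final String SIGNS =
        "\u0BBE\u0BBF\u0BC0\u0BC1\u0BC2\u0BC6\u0BC7\u0BC8\u0BCA\u0BCB\u0BCC";
    private static final String VOWELS =
        "\u0B86\u0B87\u0B88\u0B89\u0B8A\u0B8E\u0B8F\u0B90\u0B92\u0B93\u0B94";

    static boolean isConsonant(char c) {
        return c >= '\u0B95' && c <= '\u0BB9';          // ka .. ha
    }

    /** Splits a word into pure consonants (with pulli) and independent vowels. */
    static List<String> segment(String word) {
        List<String> units = new ArrayList<>();
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            if (!isConsonant(c)) {
                units.add(String.valueOf(c));           // vowels and others pass through
                continue;
            }
            units.add("" + c + PULLI);                  // the pure consonant
            char next = i + 1 < word.length() ? word.charAt(i + 1) : '\0';
            int sign = SIGNS.indexOf(next);
            if (next == PULLI) {
                i++;                                    // already a pure consonant
            } else if (sign >= 0) {
                units.add(String.valueOf(VOWELS.charAt(sign)));
                i++;                                    // consume the vowel sign
            } else {
                units.add(INHERENT_A);                  // bare consonant carries a
            }
        }
        return units;
    }
}
```

For படித்தான் this yields the eight units ப், அ, ட், இ, த், த், ஆ, ன், matching the C-V units listed above.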

Implementation Details - 3
UNICODE SUPPORT FOR TAMIL
• U+0B80 – U+0BFF

GOOGLE TAMIL TRANSLITERATOR IME (Input Method)

• Google Transliteration IME is an input method editor that allows users to
enter Tamil text using a Roman keyboard.

PROGRAMMING LANGUAGE
• Java

DATABASES
• MySQL databases, accessed through JDBC
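
Since Tamil occupies the single Unicode block U+0B80 – U+0BFF listed above, an input word can be validated with a simple range test before it is analysed. The class and method names below are hypothetical; Java's built-in `Character.UnicodeBlock.TAMIL` could be used for the same test.

```java
/** Hypothetical input check based on the Tamil Unicode block. */
class TamilRange {
    /** True when the text is non-empty and every code point is in U+0B80..U+0BFF. */
    static boolean isTamil(String text) {
        return !text.isEmpty()
            && text.codePoints().allMatch(cp -> cp >= 0x0B80 && cp <= 0x0BFF);
    }
}
```

A word containing spaces, digits or Roman letters is rejected by this check, so sentence input would need to be tokenized into words first.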

Implementation Details - 4
TRANSLITERATION MODULE
• A simple transliterator module that converts Tamil to English (Roman script)
and vice versa
• Example:
◦ அ - a
◦ ஆ - aa
◦ க - ka

HASH TABLE GENERATOR

• The application uses two data files, containing lists of vigudhi (suffixes) and idainilai (medial markers, such as tense markers).
• The Java hash generator code loads the data from the workbooks, adds
it to a hash table, serializes the table and writes it to an external data
file, which can be loaded whenever the application requires access.
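
The serialize-and-reload step described above can be sketched with standard Java object serialization. Several details are assumptions: the class name `HashTableStore` is hypothetical, a `HashMap` from suffix to tag stands in for the project's hash table, and reading the entries from the workbooks is left out, so the table is assumed to be filled in memory already.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;

/** Hypothetical store that writes a suffix table to disk and reads it back. */
class HashTableStore {
    /** Serializes the table to the given file, replacing any earlier content. */
    static void save(HashMap<String, String> table, Path file) {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(table);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Loads a table previously written by save. */
    @SuppressWarnings("unchecked")
    static HashMap<String, String> load(Path file) {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (HashMap<String, String>) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("serialized table has an unknown class", e);
        }
    }
}
```

Generating the file once and loading it at start-up avoids re-parsing the source data on every run, which appears to be the purpose of the generator described on the slide.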

Future Scope
• The algorithm can be extended to cover nouns and noun forms too.
• The algorithm can be improved to incorporate stricter rules so as to reduce the conflicts that arise in the output generated by the current system.
• The algorithm can be extended to other agglutinative languages.
• The various resources obtained as part of this project, including the EMILLE-CIIL ELRA Corpus, the Tamil Wordnet database and other tools, can be used for further study, research and development in the field of Natural Language Processing at our college in the years to come.

References
• A Novel Approach to Morphological Analysis for Tamil Language
◦ Anand Kumar M, Dhanalakshmi V, Rajendran S, Soman K P
• Nannool and Tholkaapiyam
◦ Tamil grammar texts
• The Morphological Generator and Parsing Engine for Tamil Verb Forms
◦ Ultimate Software Solution, Dindigul
• Morphological Analyzer for Tamil
◦ Anandan P, Ranjani Parthasarathy, Geetha T.V. [2002]
◦ ICON 2002, RCILTS-Tamil, Anna University, India
• Morphology. A Handbook on Inflection and Word Formation
◦ Daelemans Walter, G. Booij, Ch. Lehmann, and J. Mugdan (eds.) [2004]
• Tamil Part-of-Speech tagger based on SVMTool
◦ Dhanalakshmi V, Anandkumar M, Vijaya M.S, Loganathan R, Soman K.P, Rajendran S [2008]
◦ Proceedings of the COLIPS International Conference on Asian Language Processing 2008 (IALP)
• Unsupervised Learning of the Morphology of a Natural Language
◦ John Goldsmith [2001]
◦ Computational Linguistics, 27(2):153–198
• Computational morphology of verbal complex
◦ Rajendran S, Arulmozi S, Ramesh Kumar, Viswanathan S [2001]
◦ Paper read at a conference at Dravidian University, Kuppam, December 26-29, 2001
Thank you
