CH2
• Neural Language Models: These use neural networks to predict word sequences and are more
powerful than traditional statistical models.
• Recurrent Neural Networks (RNNs): These models handle sequential data by maintaining a 'memory'
of previous words in the sequence.
• Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU): These are advanced types of RNNs
designed to better capture long-range dependencies.
• Transformers: These models, like BERT and GPT, use self-attention mechanisms to process the entire
sequence of words simultaneously, leading to improved performance on many NLP tasks.
Applications of Language Models:
• Text Generation: Creating coherent and contextually relevant text.
• Developed by OpenAI, models like GPT-3 and GPT-4 generate human-like text based on
the input they receive.
3. To improve the accuracy of NLP tasks: POS tagging can help improve
the performance of various NLP tasks, such as named entity recognition
and text classification. By providing additional context and information
about the words in a text, we can build more accurate and sophisticated
algorithms.
4. To facilitate research in linguistics: POS tagging can also be used
to study the patterns and characteristics of language use and to
gain insights into the structure and function of different parts of
speech.
Steps Involved in POS Tagging:
1. Collect a dataset of annotated text: This dataset will be used to train and test the POS
tagger. The text should be annotated with the correct POS tags for each word.
2. Preprocess the text: This may include tasks such as tokenization (splitting the text into
individual words), lowercasing, and removing punctuation.
3. Divide the dataset into training and testing sets: The training set will be used to train
the POS tagger, and the testing set will be used to evaluate its performance.
4. Train the POS tagger: This may involve building a statistical model, such as a hidden
Markov model (HMM), or defining a set of rules for a rule-based or transformation-
based tagger. The model or rules will be trained on the annotated text in the training
set.
5. Test the POS tagger: Use the trained model or rules to predict the POS tags of the
words in the testing set. Compare the predicted tags to the true tags and calculate
metrics such as precision and recall to evaluate the performance of the tagger.
6. Fine-tune the POS tagger: If the performance of the tagger is not satisfactory,
adjust the model or rules and repeat the training and testing process until the
desired level of accuracy is achieved.
7. Use the POS tagger: Once the tagger is trained and tested, it can be used to
perform POS tagging on new, unseen text. This may involve preprocessing the text
and inputting it into the trained model or applying the rules to the text. The output
will be the predicted POS tags for each word in the text.
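The workflow in steps 1–7 can be sketched end to end with a deliberately simple model. This is an illustrative toy, not a real corpus or tagger: the dataset, tag names, and the "most frequent tag per word" strategy (with a crude noun fallback for unknown words) are all assumptions made for the sketch.

```python
# Sketch of the POS-tagging workflow (steps 1-7) with a toy dataset
# and a "most frequent tag per word" model. All data is illustrative.
from collections import Counter, defaultdict

# Step 1: a tiny annotated dataset of (word, tag) pairs per sentence.
annotated = [
    [("the", "DET"), ("cat", "NOUN"), ("sat", "VERB")],
    [("the", "DET"), ("dog", "NOUN"), ("ran", "VERB")],
    [("a", "DET"), ("cat", "NOUN"), ("ran", "VERB")],
]

# Step 3: divide the dataset into training and testing sets.
train, test = annotated[:2], annotated[2:]

# Step 4: "train" by recording the most frequent tag for each word.
counts = defaultdict(Counter)
for sent in train:
    for word, t in sent:
        counts[word][t] += 1
model = {w: c.most_common(1)[0][0] for w, c in counts.items()}

def tag(words, model):
    # Unknown words fall back to NOUN -- a common crude baseline.
    return [model.get(w, "NOUN") for w in words]

# Step 5: test by comparing predicted tags to the true tags.
correct = total = 0
for sent in test:
    words, gold = zip(*sent)
    for pred, true in zip(tag(list(words), model), gold):
        correct += pred == true
        total += 1
print(f"accuracy: {correct / total:.2f}")
```

On this toy split the unknown word "a" is mistagged, illustrating step 6: one would then adjust the model (e.g. add a better unknown-word strategy) and re-evaluate.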
Application of POS Tagging:
• Information extraction: POS tagging can be used to identify specific types of
information in a text, such as names, locations, and organizations. This is useful for
tasks such as extracting data from news articles or building knowledge bases for
artificial intelligence systems.
• Named entity recognition: POS tagging can be used to identify and classify named
entities in a text, such as people, places, and organizations. This is useful for tasks such
as building customer profiles or identifying key figures in a news story.
• Text classification: POS tagging can be used to help classify texts into different
categories, for tasks such as spam detection or sentiment analysis. By analyzing the
POS tags of the words in a text, algorithms can better understand its content and tone.
• Machine translation: POS tagging can be used to help translate texts
from one language to another by identifying the grammatical
structure and relationships between words in the source language
and mapping them to the target language.
Stochastic (statistical) tagging:
• Collect a large annotated corpus of text and divide it into training and
testing sets.
• Train a statistical model, such as a hidden Markov model (HMM), on the
training data.
• Use the trained model to predict the POS tags of the words in the
testing data.
• Evaluate the performance of the model by comparing the predicted
tags to the true tags in the testing data and calculating metrics such
as precision and recall.
• Fine-tune the model and repeat the process until the desired level of
accuracy is achieved.
• Use the trained model to perform POS tagging on new, unseen text.
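The evaluation bullet above (comparing predicted tags to true tags and computing precision and recall) can be made concrete with a small sketch. The tag sequences here are invented for illustration.

```python
# Per-tag precision and recall from predicted vs. true tag sequences.
# The sequences below are illustrative, not from a real corpus.
true_tags = ["NOUN", "VERB", "NOUN", "DET", "NOUN"]
pred_tags = ["NOUN", "NOUN", "NOUN", "DET", "VERB"]

def precision_recall(true_tags, pred_tags, tag):
    # True positives: tagger predicted `tag` and it was correct.
    tp = sum(t == p == tag for t, p in zip(true_tags, pred_tags))
    # False positives: predicted `tag` where the truth was something else.
    fp = sum(p == tag and t != tag for t, p in zip(true_tags, pred_tags))
    # False negatives: truth was `tag` but the tagger predicted otherwise.
    fn = sum(t == tag and p != tag for t, p in zip(true_tags, pred_tags))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

for t in sorted(set(true_tags)):
    p, r = precision_recall(true_tags, pred_tags, t)
    print(f"{t}: precision={p:.2f} recall={r:.2f}")
```

Reporting the metrics per tag (rather than a single overall accuracy) makes it clear which tags the model confuses, which guides the fine-tuning step.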
Transformation-based tagging (TBT):
• A set of rules is defined to transform the tags of words in a text based
on the context in which they appear. For example, a rule might
change the tag of a verb to a noun if it appears after a determiner
such as “the.” The rules are applied to the text in a specific order, and
the tags are updated after each transformation.
• TBT can be more accurate than rule-based tagging, especially for tasks with
complex grammatical structures. However, it can be more computationally
intensive and requires a larger set of rules to achieve good performance.
• Define a set of rules for transforming the tags of words in the text. For
example:
• If the word is a verb and appears after a determiner, change the tag to
“noun.”
• If the word is a noun and appears after an adjective, change the tag to
“adjective.”
• Iterate through the words in the text and apply the rules in a specific
order. For example:
• In the sentence “The cat sat on the mat,” if “cat” were initially
mistagged as a verb, the first rule would retag it as a noun, since it
appears after the determiner “the.”
• In the sentence “The red cat sat on the mat,” the word “cat” appears
after the adjective “red,” so the second rule would change its tag
from “noun” to “adjective.”
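The ordered-rule mechanism above can be sketched directly: start from an initial tagging and apply each rule over the whole sentence before moving to the next rule. The rules below mirror the two toy rules in the text; the sentence and tag names are illustrative.

```python
# Minimal sketch of transformation-based tagging: ordered rules rewrite
# an initial tagging based on the previous word's tag.
def apply_rules(tagged, rules):
    tagged = list(tagged)
    # Each rule is applied over the whole sentence before the next rule.
    for rule in rules:
        for i in range(1, len(tagged)):
            word, tag = tagged[i]
            prev_tag = tagged[i - 1][1]
            new_tag = rule(tag, prev_tag)
            if new_tag:
                tagged[i] = (word, new_tag)
    return tagged

rules = [
    # Rule 1: a verb appearing after a determiner becomes a noun.
    lambda tag, prev: "NOUN" if tag == "VERB" and prev == "DET" else None,
    # Rule 2: a noun appearing after an adjective becomes an adjective.
    lambda tag, prev: "ADJ" if tag == "NOUN" and prev == "ADJ" else None,
]

# "run" is initially mistagged as a verb; rule 1 retags it as a noun
# because it follows the determiner "the".
initial = [("the", "DET"), ("run", "VERB"), ("ended", "VERB")]
print(apply_rules(initial, rules))
```

Because the rules fire in a fixed order, an earlier rule's output can change which later rules apply, which is exactly why rule ordering matters in TBT.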
• Ambiguity: Some words can have multiple POS tags depending on the
context in which they appear, making it difficult to determine their
correct tag. For example, the word “bass” can be a noun (a type of
fish) or an adjective (having a low frequency or pitch).
• Out-of-vocabulary (OOV) words: Words that are not present in the
training data of a POS tagger can be difficult to tag accurately,
especially if they are rare or specific to a particular domain.
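One common workaround for the OOV problem is to guess a tag for an unseen word from its suffix. The suffix-to-tag table below is a hand-picked toy illustration, not a learned model.

```python
# Toy suffix-based tag guesser for out-of-vocabulary words.
# The suffix table is illustrative; real taggers learn it from data.
def guess_tag(word, default="NOUN"):
    suffix_tags = {"ing": "VERB", "ly": "ADV", "ed": "VERB", "tion": "NOUN"}
    for suffix, tag in suffix_tags.items():
        if word.endswith(suffix):
            return tag
    return default

print(guess_tag("refactoring"))  # ends in "-ing"
print(guess_tag("quickly"))      # ends in "-ly"
```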
• Lexicon: The list of stems and affixes together with basic information
about them, such as their main categories (noun, verb, adjective, …) and
their sub-categories (regular noun, irregular noun, …).
• Morphotactics: The model of morpheme ordering that explains which
classes of morphemes can follow other classes of morphemes inside a
word.
• Orthographic Rules (Spelling Rules): These spelling rules are used to
model changes that occur in a word (normally when two morphemes
combine).
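The first two components can be sketched as a tiny morphotactics checker: a lexicon of stems grouped by class, plus an ordering constraint saying the plural affix “-s” may follow a regular noun stem but not an irregular one. The word lists are illustrative.

```python
# Toy lexicon + morphotactics for English noun inflection.
# Stem classes follow the lexicon/morphotactics distinction above.
lexicon = {
    "reg-noun": {"cat", "dog", "fox"},
    "irreg-pl-noun": {"geese", "mice"},
    "irreg-sg-noun": {"goose", "mouse"},
}

def accepts(word):
    # Irregular singular and plural forms are listed directly.
    if word in lexicon["irreg-pl-noun"] or word in lexicon["irreg-sg-noun"]:
        return True
    # Morphotactics: a regular noun stem, optionally followed by "-s".
    if word in lexicon["reg-noun"]:
        return True
    return word.endswith("s") and word[:-1] in lexicon["reg-noun"]

print(accepts("dogs"))    # regular stem + plural affix
print(accepts("geese"))   # irregular plural in the lexicon
print(accepts("mouses"))  # rejected: affix cannot follow an irregular stem
```

Note that, like the FSA criticized below, this sketch still wrongly accepts “foxs”: fixing that requires the third component, an orthographic (e-insertion) rule.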
[Figure: finite-state automaton for English noun inflection, covering irregular forms such as goose/geese and mouse/mice]
This FSA only answers yes or no: it accepts or rejects a word but does not give a lexical representation, and it wrongly accepts misspelled forms such as “foxs.”
irreg-pl-noun
+PL:#
+N:ε
intermediate: d o g ^ s #
surface:      d o g s
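The intermediate-to-surface mapping above can be sketched as a small rewrite function: apply the e-insertion spelling rule (insert “e” between a sibilant and the plural “s”, so fox^s# becomes foxes), then strip the morpheme boundary “^” and word boundary “#”. The regex formulation is an assumption made for this sketch; real systems compile such rules into finite-state transducers.

```python
import re

def to_surface(intermediate):
    # E-insertion: x, s, z, ch, sh before "^s#" take an "e"
    # (fox^s# -> foxes#).
    s = re.sub(r"(x|s|z|ch|sh)\^s#", r"\1es#", intermediate)
    # Strip the remaining boundary symbols to get the surface form.
    return s.replace("^", "").replace("#", "")

print(to_surface("dog^s#"))  # no spelling change applies
print(to_surface("fox^s#"))  # e-insertion fires
```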