XGBOOST
Abstract: The airline sector is an important field in today's market. To keep that sector alive and up to date, we have to consider opinion mining. Text sentiment analysis is a Natural Language Processing (NLP) technique for analyzing text. In this research, we use opinion mining, one of the applications of text sentiment analysis, to investigate customer feedback about airline services. One of the largest opinion mining sources is Twitter, which contains a huge number of tweets that need to be processed and analyzed in order to make decisions and enhance a given service. In this research, we propose a machine learning model to categorize Twitter posts into positive, negative and neutral categories. We implemented our model on a dataset containing tweets about 6 different airlines in the US. The model starts with preprocessing steps, in which we cleaned the tweets and extracted features to represent them as feature vectors, and finally built our Bag of Words (BoW) model. In the classification phase, we applied 6 machine learning techniques, Support Vector Machine (SVM), Logistic Regression (LR), Random Forest (RF), XGBoost (XGB), Naïve Bayes (NB) and Decision Tree (DT), to classify the tweets. In the validation phase, we split the data into 70% training and 30% testing, and for testing and validating the data we used the K-Fold Cross-Validation technique. Finally, we calculated Accuracy, Precision, Recall and F1-score for each classifier. After comparing the results of each classifier, we found that SVM had the highest accuracy of 83.31%.

Keywords: Text Sentiment Analysis, Opinion Mining, Tweets, Bag of Words, Classification

Authorized licensed use limited to: Rutgers University. Downloaded on May 16,2021 at 23:21:07 UTC from IEEE Xplore. Restrictions apply.

I. INTRODUCTION
Sentiment analysis is a method of gathering information about an object and automatically determining the subjectivity of that object. Our aim is to determine whether the opinion a Twitter user expresses is positive, negative or neutral. Sentiment classification can be performed on two levels: document level and sentence level. Analysis of data on social media networks, such as Facebook, LinkedIn and Twitter, has become a strong resource for learning user preferences and has a variety of applications [1].

Another term for sentiment analysis is opinion mining: constructing a framework or system that extracts user views from blogs and from reviews about a product or service, and reveals the attitude of a speaker toward a certain topic or issue. Sentiment analysis has recently become a crucial activity, classifying user opinions as well as user emotions. The extracted information plays an important role in decision making, for example in reviewing user feedback on a product or service in the market, which can help in deciding whether to provide a new product or enhance an existing one. Furthermore, this information can shape and change the strategy of a company, for example by establishing marketing campaigns to attract new customers [2].

Sentiment analysis comprises three main general types: the lexicon-based approach, the machine learning-based approach, and hybrid techniques. A lexicon is a broad dictionary or corpus consisting of a collection of common terms that are classified according to their polarity ranking. The majority of users use informal terms in their tweets or comments that are not included in lexicons [3], so researchers focus on applying alternative methods for identifying sentiment in text. The machine learning approach uses models that are trained on a text dataset; after training, the model can make predictions on another dataset. This is treated as a classification problem in which text features are extracted and the text is categorized as positive, negative or neutral. Finally, the hybrid method combines the machine learning and lexicon approaches.

The paper is organized as follows: Section 2 introduces background on sentiment analysis techniques. Section 3 is the literature survey on sentiment analysis classification. Section 4 describes the dataset and methods used in this paper, while Section 5 discusses the results of our experiment. Finally, Section 6 concludes our work.

II. BACKGROUND
A. Sentiment Analysis
Sentiment analysis is a machine learning approach that senses polarity within text, whether it is a document, paragraph, sentence or clause. There are many types of sentiment analysis, such as emotion detection, fine-grained sentiment analysis, aspect-based analysis and intent analysis. Emotion detection inspects the emotional state in text to determine whether the user is sad, happy or angry, which can impact any public decision. Fine-grained sentiment analysis is concerned with public opinion, for example about a certain product or a candidate in an election, and detects whether that opinion is positive, negative or neutral. Aspect-based analysis is a more specific analysis that extracts users' opinions about a specific part of a product, for example a car's engine, so that this part can be enhanced after the analysis. Finally, intent analysis
depends on detecting the intentions of the person behind the text, which helps to figure out the right response.

B. Lexicon-Based Sentiment Analysis
The lexicon is the origin of sentiment analysis, and the approach is divided into two techniques: dictionary-based and corpus-based [4]. The dictionary-based technique uses a dictionary of terms such as WordNet, while the corpus-based technique does not depend on dictionary terms but uses machine learning algorithms to analyze documents and extract information.

C. Machine Learning-Based Sentiment Analysis
Machine learning algorithms such as Support Vector Machine (SVM) and Decision Trees (DT) can be used in text classification. These algorithms depend on extracting features from text. Deep learning models such as the Recurrent Neural Network (RNN), Deep Neural Network (DNN) and Convolutional Neural Network (CNN) can also be used [5], [6].

D. Hybrid-Based Sentiment Analysis
Combining the lexicon and machine learning approaches requires data cleansing before the data are fitted into the model. Most of these models depend on a word embedding technique such as Word2vec, in which each word is represented by a vector.

III. LITERATURE SURVEY
In 2019, Furqan Rustam et al. [7] classified tweets into positive, negative and neutral classes. In their study they tried to predict the classes of US airline tweets collected from passengers about the offered service. They used the “Twitter-Airline-Sentiment” dataset downloaded from Kaggle, which contains a total of 14,640 records for 6 airlines in the US. They proposed a Voting Classifier (VC) using Logistic Regression (LR) and a Stochastic Gradient Descent Classifier (SGDC). Their proposed model started with preprocessing, such as removing stop words, followed by building a corpus. They then used three feature representations, Term Frequency (TF), Term Frequency-Inverse Document Frequency (TF-IDF) and word2vec, and fed the features into ten machine learning classifiers. They compared the results of the classifiers with their proposed model across the three feature representations. Their model achieved an accuracy of 79.1% with TF, 79.2% with TF-IDF and 77.7% with word2vec.

In 2020, Chirag Kariya et al. [8] extracted sentiments from tweets in order to categorize them. They used two machine learning algorithms, K-nearest neighbors (KNN) and Naive Bayes (NB), and collected tweets through the Twitter API as a dataset. The idea of the model was based on determining the intensity of words; for example, if the intensity of the positive words is high, the tweet is considered positive. KNN achieved an accuracy of 99.6456%.

In 2020, Antony Samuels et al. [9] worked on news sentiment analysis to identify whether opinions are positive, negative or neutral. Their dataset was “BBC News”, which contained 2225 documents on 5 different topics (business, sport, politics, etc.). Their technique depended on calculating the sentiment polarity and sentiment score of the text using the WordNet lexical dictionary. They compared the classification results for the five categories: business, entertainment, politics, sport and tech. The sentiments were concentrated mostly in the positive and negative classes.

In 2020, Sudhanshu Kumar et al. [10] designed a system to recommend movies based on viewers' opinions. Their objective was to understand user opinion of a given movie and to understand trends from movie tweets. Their dataset was a “User-rated” movie database containing 6209 movies, 292863 ratings and 51081 users. They used weighted score fusion, comparing the metadata of movies to detect similarity between them. They compared results among their model, a hybrid model consisting of collaborative filtering and content-based metadata, and sentiment similarity. The average precision of the proposed model was 2.54 in the top five sentiments and 4.97 in the top ten sentiments.

IV. MATERIALS & METHODS
A. Datasets
We used the “twitter-airline-sentiment” dataset downloaded from Kaggle [11]. The dataset represents the opinions of passengers in the form of tweets. The tweets cover six airlines in the United States, with 14640 records in total, as shown in Table I. The records are divided into three classes: positive, negative and neutral.

B. Methods
1) Support Vector Machine (SVM): SVM handles both linear and non-linear classification of data. It is a binary classifier based on the idea of separating objects with hyperplanes. Hyperplanes act as boundaries that distinguish the data points assigned to different classes [12].

2) Logistic Regression (LR): LR is built on the logistic function, which acts as a sigmoid whose output lies between 0 and 1. It is represented as an S-shaped curve that rises within the range of 0 to 1.

3) Random Forest (RF): RF is also called bagged decision trees; it is an ensemble method formed of decision trees. In RF, the dataset is split to form independent decision trees. A subset of the tree is selected from its root to its leaf nodes, and features are extracted from each tree to represent the final tree [13].

4) XGBoost (XGB): XGB is based on the gradient boosting algorithm, implemented in an advanced form to boost speed and performance [14]. The algorithm has three main groups of parameters: booster, learner and general. Booster parameters are responsible for the booster operation (regression or tree), learner parameters are responsible for optimization, and general parameters are responsible for the functioning of the whole algorithm.

5) Gaussian Naïve Bayes (NB): NB belongs to the family of machine learning methods for binary and multiclass problems; it assumes a Gaussian (normal) distribution and is built on Bayes' theorem [15]. The main idea of the algorithm is to calculate probabilities for each hypothesis. NB is efficient in training and prediction since it does not need much data to estimate its classification parameters, and it mainly supports continuous features.

6) Decision Tree (DT): DT is a tree-like structure of internal nodes and leaf nodes. Internal nodes represent conditions and leaf nodes represent classes. One of the great advantages of DT is its adaptability to any form of data. DT has two statistical properties, entropy and information gain: entropy measures the impurity of the data, while information gain measures the quality of a split [16].

C. Proposed Model
In this study we used Bag of Words (BoW) to represent the tweets as features. We applied 6 classifiers to the dataset: Support Vector Machine (SVM), Logistic Regression (LR), Random Forest (RF), XGBoost (XGB), Decision Tree (DT) and Naive Bayes (NB). The model starts with a preprocessing phase that cleans the tweets of noise, for example by removing punctuation and stop words, after which we created a corpus of the cleaned tweets. We then created the BoW model to extract features from the tweets and split the data into 70% training and 30% testing, as shown in Fig. 1. The final step fits the data to the different machine learning models and applies 10-fold cross-validation to evaluate them.

Fig. 1. Tweets analysis proposed model

1) Preprocessing Phase: In this phase we performed five cleaning steps. In the first step we removed stop words, that is, unnecessary words that create redundancy and make the analysis more complex, such as "To, For, how, and". The second step was punctuation removal, in which we removed punctuation signs such as "@, :, !, ?". Since each tweet started with the @airline handle, we removed the "@" sign and the airline company name from all tweets. The third step was converting all letters to lowercase to unify all words, since the machine is case sensitive. The last step was stemming, which reduces a word to its origin; for example, "flew" after stemming turns into "fly". Finally, we built our corpus, which is a collection of text representing our cleaned tweets. After that we used BoW to represent the tweets as feature vectors so that they can be used in machine learning models.

2) Classification Phase: After we built the corpus and extracted the features, the data were fitted to the machine learning models to classify each tweet as negative, positive or neutral.

3) Validation and Testing Phase: The data were split into 70% training and 30% testing. We then carried out 10-fold cross-validation to evaluate our data and compared the results of the different classifiers in terms of accuracy, precision, recall and f1-score. Finally, we chose the best classifier, as shown in Table V.

V. EXPERIMENT RESULTS & DISCUSSION
We carried out 6 experiments with different machine learning techniques. The main purpose was to classify tweets into three classes: positive, negative and neutral. The experiments showed that SVM and LR achieved the highest average accuracy, 83.31% and 81.81% respectively, while RF, XGB, NB and DT had average accuracies of 78.55%, 75.93%, 73% and 70.51% respectively. SVM achieved an accuracy of 87.2%, 94.44% and 89.53%, precision of 84%, 94% and 88%, recall of 98%, 69% and 63%, and f1-score of 90%, 79% and 73% for the negative, positive and neutral classes respectively. LR achieved an accuracy of 85.04%, 92.76% and 85.82%, precision of 85%, 87% and 70%, recall of 92%, 69% and 64%, and f1-score of 88%, 77% and 67% for the negative, positive and neutral classes respectively. RF achieved an accuracy of 81.44%, 92.44% and 83.22%, precision of 81%, 85% and 65%, recall of 92%, 62% and 52%, and f1-score of 86%, 72% and 58% for the negative, positive and neutral classes respectively. XGB achieved an accuracy of 76.94%, 92.87% and 82.06%, precision of 74%, 89% and 69%, recall of 97%, 66% and 17%, and f1-score of 84%, 76% and 27% for the negative, positive and neutral classes respectively. NB achieved an accuracy of 75.52%, 83.9% and 86.57%, precision of 89%, 71% and 54%, recall of 57%, 93% and 77%, and f1-score of 69%, 81% and 63% for the negative, positive and neutral classes respectively. Finally, DT achieved an accuracy of 75.75%, 88.91% and 76.37%, precision of 81%, 62% and 44%, recall of 81%, 60% and 45%, and f1-score of 81%, 61% and 44% for the negative, positive and neutral classes respectively, as shown in Table II, Table III, Table IV and Table V.

TABLE V. Overall accuracy of classifiers
Classifier    % Overall Accuracy
SVM           83.31
LR            81.81
RF            78.55
XGB           75.93
NB            73
DT            70.51

Comparing our results with paper [7], we found that using the BoW model and extracting the most representative features gave better results in terms of accuracy, precision, recall and f1-score. We also introduced an optimized XGB, as shown in Fig. 2, which outperformed the AdaBoost classifier reported in [7]. There is an improvement of 4.6% in SVM, 1.91% in DT and 3.11% in LR. XGB is not included in [7], but it performed better than AdaBoost. In the case of RF and NB, there was an improvement of 2.25% and 2.2% respectively.

VI. CONCLUSION
In this paper we applied machine learning techniques to classify tweets. Since a large number of tweets are generated every day on Twitter, we found that there is a need for a model able to process these tweets and classify them. We reviewed the latest approaches to text sentiment analysis, namely lexicon-based, machine learning, hybrid and deep learning models. The findings showed that SVM outperformed the other classifiers with an accuracy of 83.31%. We compared our results with those of others, and we believe this improvement is a contribution to the domain of sentiment analysis.
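As a rough illustration of the dictionary-based technique from the Lexicon-Based Sentiment Analysis subsection, the sketch below sums word polarities and maps the total to a class. The tiny `POLARITY` word list and the `lexicon_sentiment` helper are hypothetical and chosen only for illustration; a real system would draw its terms from a resource such as WordNet.

```python
# Minimal sketch of dictionary-based sentiment scoring.
# POLARITY is a made-up toy lexicon (assumption), not taken from WordNet.
POLARITY = {"great": 1, "good": 1, "love": 1, "late": -1, "bad": -1, "lost": -1}

def lexicon_sentiment(text):
    """Sum word polarities and map the total to a class label."""
    score = sum(POLARITY.get(word, 0) for word in text.lower().split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(lexicon_sentiment("great crew and good food"))      # positive
print(lexicon_sentiment("flight was late and bag lost"))  # negative
print(lexicon_sentiment("thx 4 nothing"))                 # neutral
```

The last call shows the weakness noted in the Introduction: informal terms that are missing from the lexicon contribute nothing to the score.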
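The sigmoid mentioned in the Logistic Regression description can be written in a few lines. This is only a sketch: the function names, weights and feature values are invented for illustration and are not fitted values from this paper.

```python
import math

def sigmoid(z):
    """Logistic function: maps a real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, bias, features):
    """Probability of the positive class for one feature vector."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

# Hypothetical weights for a 3-word vocabulary (assumption, not fitted values).
weights, bias = [1.5, -2.0, 0.3], 0.0
print(sigmoid(0.0))                             # 0.5, the midpoint of the S curve
print(predict_proba(weights, bias, [1, 0, 0]))  # above 0.5
print(predict_proba(weights, bias, [0, 1, 0]))  # below 0.5
```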
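The three parameter groups described for XGBoost can be pictured as a configuration dictionary. The sketch below groups a few parameter names from the XGBoost documentation, which calls the learner group "learning task parameters". The values are illustrative assumptions, not the settings used in this paper, and the dictionary is not passed to a real training call here.

```python
# Illustrative grouping of XGBoost parameters; all values are assumptions.
general_params = {"booster": "gbtree"}           # which booster to use
booster_params = {"eta": 0.3, "max_depth": 6}    # control the tree booster
learning_task_params = {                         # define the optimization objective
    "objective": "multi:softprob",
    "num_class": 3,                              # negative, neutral, positive
    "eval_metric": "mlogloss",
}

# The groups are merged into one flat dictionary before training.
params = {**general_params, **booster_params, **learning_task_params}
print(sorted(params))
```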
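Entropy and information gain, the two quantities named in the Decision Tree description, can be computed as in the following sketch. It assumes the usual Shannon-entropy definitions; the helper names and toy labels are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(parent, children):
    """Entropy reduction obtained by splitting `parent` into `children`."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted

# Toy labels (assumption): one split separates the classes, the other does not.
parent = ["neg", "neg", "pos", "pos"]
print(entropy(parent))                                             # 1.0
print(information_gain(parent, [["neg", "neg"], ["pos", "pos"]]))  # 1.0
print(information_gain(parent, [["neg", "pos"], ["neg", "pos"]]))  # 0.0
```

A split with higher information gain leaves purer child nodes, which is why a tree learner prefers it.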
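The 70%/30% split and the 10-fold cross-validation named in the Proposed Model description could be set up roughly as follows. This is one possible reading, in which cross-validation runs on the training portion; the paper does not state this detail. The helper names and placeholder tweets are assumptions.

```python
import random

def train_test_split(items, test_ratio=0.3, seed=0):
    """Shuffle a copy of `items` and split it into (train, test)."""
    shuffled = items[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]

def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, val_idx) pairs; each sample is validated exactly once."""
    indices = list(range(n_samples))
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size

tweets = [f"tweet {i}" for i in range(100)]  # placeholder data (assumption)
train, test = train_test_split(tweets)
print(len(train), len(test))                 # 70 30
folds = list(k_fold_indices(len(train), k=10))
print(len(folds), len(folds[0][1]))          # 10 7
```

In each fold a classifier would be fitted on `train_idx` and scored on `val_idx`, and the ten scores averaged.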
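The cleaning steps and the BoW representation from the Preprocessing Phase can be sketched with the standard library alone. `STOP_WORDS`, `clean_tweet` and `bag_of_words` are hypothetical names, the stop-word list is a small illustrative subset, the sample tweets are made up, and stemming is omitted because it would need an external stemmer.

```python
import re
from collections import Counter

# Small illustrative stop-word subset (assumption).
STOP_WORDS = {"to", "for", "how", "and", "the", "a", "is", "my"}

def clean_tweet(tweet):
    """Drop @mentions, lowercase, strip punctuation, and remove stop words."""
    text = re.sub(r"@\w+", " ", tweet)          # remove "@airline" handles
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)    # remove punctuation
    return [tok for tok in text.split() if tok not in STOP_WORDS]

def bag_of_words(corpus):
    """Return the sorted vocabulary and one count vector per document."""
    vocab = sorted({tok for doc in corpus for tok in doc})
    vectors = []
    for doc in corpus:
        counts = Counter(doc)
        vectors.append([counts[tok] for tok in vocab])
    return vocab, vectors

tweets = ["@VirginAmerica thanks for the great flight!",   # made-up examples
          "@united my flight is late, late and late..."]
corpus = [clean_tweet(t) for t in tweets]
vocab, vectors = bag_of_words(corpus)
print(vocab)    # ['flight', 'great', 'late', 'thanks']
print(vectors)  # [[1, 1, 0, 1], [1, 0, 3, 0]]
```

Each row of `vectors` is the feature vector for one tweet, with one column per vocabulary word.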
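The paper does not spell out how the per-class figures reported above were computed. The sketch below assumes the standard one-vs-rest definitions of accuracy, precision, recall and F1, and runs them on toy predictions that are not results from this paper.

```python
def per_class_metrics(y_true, y_pred, label):
    """One-vs-rest accuracy, precision, recall and F1 for a single class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}

# Toy predictions (assumption), not results from the paper.
y_true = ["neg", "neg", "neg", "pos", "pos", "neu"]
y_pred = ["neg", "neg", "pos", "pos", "neg", "neu"]
for label in ("neg", "pos", "neu"):
    print(label, per_class_metrics(y_true, y_pred, label))
```

Under this reading, a class can show high accuracy but low recall when it is rare, which matches the pattern of the neutral class for XGB.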
REFERENCES
Learning Technologies and Applications (AMLTA2019), pp. 281–290, Mar. 2019.
[14] Z. Qi, “The Text Classification of Theft Crime Based on TF-IDF and XGBoost Model,” 2020 IEEE International Conference on Artificial Intelligence and Computer Applications (ICAICA), Jun. 2020.
[15] C. D. Manning, P. Raghavan, and H. Schutze, “Text classification and Naive Bayes,” Introduction to Information Retrieval, pp. 234–265.
[16] J. Ababneh, “Application of Naïve Bayes, Decision Tree, and K-Nearest Neighbors for Automated Text Classification,” Modern Applied Science, vol. 13, no. 11, p. 31, Oct. 2019.