JOURNAL OF RESOURCE MANAGEMENT AND TECHNOLOGY, ISSN NO: 0745-6999

DETECTION OF FAKE ONLINE REVIEWS USING SEMI-SUPERVISED AND SUPERVISED LEARNING
D. Sai Krishna, MCA Student
Konduri Nikitha, Assistant Professor
Dept. of MCA
Sree Chaitanya College of Engineering, Karimnagar
ABSTRACT
Online reviews have a huge influence on today's business and commerce. Decisions to purchase goods online rely mainly on customer reviews, so opportunistic individuals or companies try to manipulate product reviews for their own interests. To detect fake online reviews, this paper applies several semi-supervised and supervised text mining models and compares the effectiveness of both approaches on a dataset of hotel reviews.

I. INTRODUCTION
Technology is evolving rapidly, and older technologies are constantly being replaced by newer ones that allow people to carry out their work more effectively. The online marketplace is one such advancement: through online portals, we can shop and make reservations. Before buying goods or services, almost every one of us looks at reviews. As a consequence, online reviews have become an important source of trust for businesses, and they strongly influence the advertising and marketing of goods and services. With the spread of the online marketplace, fake online reviews have become a serious concern. People may write fake reviews to promote their own products, which harms genuine consumers, and competing firms may try to damage each other's reputation by posting false negative reviews.
Researchers have been exploring several ways to identify such fake reviews. Some methods focus on the content of the review, and others on the behaviour of the user who posts it. Content-based analysis focuses on what is written in the review, that is, the review text, whereas behaviour-based approaches focus on attributes of the reviewer such as country, IP address and number of posts. Most of the proposed techniques are supervised classification models, although a few researchers have also experimented with semi-supervised models. Semi-supervised approaches are applied because reliably labelled reviews are scarce.
In this article, we apply several classification methods to identify fake online reviews, some of which are semi-supervised and some of which are supervised. We use the Expectation-Maximization algorithm for semi-supervised learning. In our research work, the statistical Naive Bayes classifier and Support Vector Machines (SVM) are used as classifiers to improve classification performance. We also concentrate primarily on content-based methods, using word frequency count, sentiment polarity and review length as features.

II. LITERATURE SURVEY
Chengai Sun, Qiaolin Du and Gang Tian, “Exploiting Product Related Review Features for Fake Review Detection”
Product reviews are now widely used by people to make their purchasing decisions. However, driven by profit, reviewers game the system by posting fake reviews to promote or demote target products. Fake review detection has attracted substantial interest from both industry and the research community over the past few years. However, owing to the shortage of labelled data for supervised training and evaluation, it remains a difficult problem. Existing work has tried to address it from the perspectives of reviewers and of reviews, but there has been little discussion of product-related review features, which are the main focus of our method. This paper proposes a novel convolutional neural network model that integrates product-related review features through a product word composition model. To reduce overfitting and high variance, a bagging model combines the neural network model with two efficient classifiers. Experiments on a real-life Amazon review dataset demonstrate the feasibility of the proposed solution.
M. Ott, Y. Choi, C. Cardie, and J. T. Hancock, “Finding deceptive opinion spam by any stretch of the imagination,” in Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics
Consumers increasingly rate, review and research products online. As a consequence, websites containing user reviews are becoming targets of opinion spam. Although recent literature has mainly
concentrated on manually identifiable instances of opinion spam, in this work we study deceptive opinion spam: fictitious opinions that have been deliberately written to sound authentic. By combining work from psychology and computational linguistics, we develop and evaluate three approaches to detecting deceptive opinion spam, and ultimately build a classifier that is almost 90 percent accurate on our gold-standard opinion spam dataset. In addition, based on a feature analysis of our learned models, we make several theoretical contributions, including revealing a relationship between deceptive opinions and imaginative writing.
J. Li, M. Ott, C. Cardie, and E. Hovy, “Towards a general rule for identifying deceptive opinion spam,” in Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (ACL)
Customers' purchase decisions are increasingly informed by user-generated online reviews. Accordingly, there has been growing concern about the potential for posting deceptive opinion spam: fake reviews that have been deliberately written to appear authentic in order to deceive the reader. In this article, we explore generalised approaches to detecting online deceptive opinion spam, based on a new gold-standard dataset composed of data from three domains (hotel, restaurant and doctor), each of which contains three types of reviews: truthful reviews written by customers, deceptive reviews written by Turkers, and deceptive reviews written by employees (domain experts). Our approach aims to capture the general difference in language use between deceptive and truthful reviews, which we believe can help customers making purchase decisions, and review portal operators such as TripAdvisor or Yelp, to investigate possible fraudulent activity on their platforms.
J. Karimpour, A. A. Noroozi, and S. Alizadeh, “Web spam detection by learning from small labeled samples”
Web spamming aims to mislead search engines into ranking certain pages higher than they deserve. Several approaches have been proposed to combat web spamming and identify spam pages. One fundamental approach is classification, i.e. learning a classification model from previously labelled training data and using this model to classify web pages as spam or non-spam. A drawback of this approach is that manually labelling a large number of web pages to produce the training data is labour-intensive and time-consuming, and the resulting labels can be biased or inaccurate. In this article, we propose a new method that overcomes this drawback by using semi-supervised learning to label the training data automatically. To do this, we incorporate the expectation-maximization algorithm, which is a simple and effective semi-supervised learning algorithm. Experiments carried out on real web spam data show that the new method performs very well in practice.
III. SYSTEM ANALYSIS AND DESIGN
EXISTING SYSTEM
Content-based methods rely on what the review says, that is, its text. Heydari et al. [2] tried to detect spam reviews by examining their linguistic characteristics. For classification, Ott et al. [3] used three approaches: genre identification, psycholinguistic deception detection and text categorization.
Behavioural feature-based methods focus on the reviewer, that is, on the characteristics of the person writing the review. Lim et al. [7] addressed the problem of review spammer detection, i.e. identifying the users who are the source of spam reviews. The behaviour of people who deliberately post fake reviews differs substantially from that of ordinary users, and Lim et al. identified several characteristic manipulative rating and reviewing behaviours.
Deceptive online review detection is generally treated as a classification problem, and one common approach is supervised text classification [5]. These techniques are robust when training is carried out on large datasets with labelled instances from both classes, deceptive opinions (positive instances) and truthful opinions (negative instances) [8]. Several researchers have also used semi-supervised classification methods.
Disadvantages
In the existing system, the method relies only on semi-supervised learning.
Only document classification is performed; the sentiment of the review text is not taken into account.
Proposed System
In the proposed method, each review first goes through tokenization. Then, redundant words are removed and candidate feature words are generated.
Each candidate feature word is checked against a dictionary. If the word has an entry in the dictionary, its frequency is counted and added to the column of the feature vector that corresponds to the word's numeric mapping.
In addition to the frequency count, the length of the review is computed and added to the feature vector.
Finally, the sentiment score
available in the dataset is added to the feature vector. Negative sentiment is assigned a value of zero and positive sentiment a positive value.
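The feature-vector construction described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the regular-expression tokenizer, the small `STOP_WORDS` set standing in for the "redundant words", the hypothetical dictionary `VOCAB` mapping words to column indices, measuring review length as the number of tokens before stop-word removal, and encoding positive sentiment as 1.0 are all illustrative choices.

```python
import re
from typing import Dict, List

# Hypothetical stop-word list standing in for the "redundant words" step.
STOP_WORDS = {"the", "a", "an", "and", "is", "was", "to", "of", "in", "it"}


def tokenize(review: str) -> List[str]:
    """Lowercase the review and split it into alphabetic tokens (assumed tokenizer)."""
    return re.findall(r"[a-z']+", review.lower())


def candidate_features(tokens: List[str]) -> List[str]:
    """Drop stop words; the remaining tokens are the candidate feature words."""
    return [t for t in tokens if t not in STOP_WORDS]


def feature_vector(review: str, sentiment: str, vocab: Dict[str, int]) -> List[float]:
    """Build [word counts..., review length, sentiment] for one review.

    vocab maps each dictionary word to its column index (the word's numeric map).
    """
    tokens = tokenize(review)
    vec = [0.0] * (len(vocab) + 2)
    for word in candidate_features(tokens):
        col = vocab.get(word)
        if col is not None:  # count only words that have a dictionary entry
            vec[col] += 1.0
    vec[len(vocab)] = float(len(tokens))  # review length in tokens (assumed measure)
    # Negative sentiment -> 0.0, positive sentiment -> 1.0 (assumed positive value).
    vec[len(vocab) + 1] = 1.0 if sentiment == "positive" else 0.0
    return vec


# Hypothetical dictionary built from a training corpus.
VOCAB = {"room": 0, "clean": 1, "staff": 2, "luxury": 3}
```

For example, `feature_vector("The room was clean and the staff friendly", "positive", VOCAB)` counts one occurrence each of "room", "clean" and "staff", ignores "friendly" because it has no dictionary entry, and appends a length of 8 tokens and a sentiment value of 1.0.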
Advantages
Because it combines semi-supervised and supervised learning, the method is simple and effective. It concentrates on content-based methods, using word frequency count, sentiment polarity and review length as features.
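The semi-supervised part of the method uses the Expectation-Maximization (EM) algorithm. A minimal pure-Python sketch of one common formulation, EM over a multinomial Naive Bayes model, is shown here: labelled reviews keep their labels, while unlabelled reviews receive soft class probabilities in each E-step and contribute weighted counts in each M-step. Pairing EM with Naive Bayes is an assumption for illustration, not necessarily the authors' exact configuration. The non-negative word-count input, the class indices (0 = truthful, 1 = fake), Laplace smoothing with `alpha = 1`, initialisation from the labelled data only, and the fixed `iterations` count are all hypothetical choices.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Params = Tuple[List[float], List[List[float]]]


def m_step(docs: List[Vector], resp: List[List[float]], n_classes: int,
           alpha: float = 1.0) -> Params:
    """M-step: estimate log priors and log word probabilities from class weights."""
    n_words = len(docs[0])
    mass = [sum(r[c] for r in resp) for c in range(n_classes)]
    # Laplace-smoothed class priors; each row of resp sums to 1.
    log_prior = [math.log((1.0 + mass[c]) / (n_classes + len(docs)))
                 for c in range(n_classes)]
    log_word = []
    for c in range(n_classes):
        counts = [alpha] * n_words  # Laplace smoothing for word counts
        for x, r in zip(docs, resp):
            for w, v in enumerate(x):
                counts[w] += r[c] * v  # soft (weighted) word counts
        total = sum(counts)
        log_word.append([math.log(k / total) for k in counts])
    return log_prior, log_word


def posterior(x: Vector, log_prior: List[float],
              log_word: List[List[float]]) -> List[float]:
    """E-step for one document: P(class | x) under multinomial Naive Bayes."""
    scores = [lp + sum(v * lw for v, lw in zip(x, log_word[c]))
              for c, lp in enumerate(log_prior)]
    top = max(scores)  # log-sum-exp trick for numerical stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def em_naive_bayes(labeled: List[Vector], labels: List[int],
                   unlabeled: List[Vector], n_classes: int = 2,
                   iterations: int = 10) -> Params:
    """Semi-supervised EM: labelled reviews keep hard labels, unlabelled get soft ones."""
    hard = [[1.0 if c == y else 0.0 for c in range(n_classes)] for y in labels]
    params = m_step(labeled, hard, n_classes)  # start from supervised Naive Bayes
    docs = list(labeled) + list(unlabeled)
    for _ in range(iterations):
        soft = [posterior(x, *params) for x in unlabeled]  # E-step
        params = m_step(docs, hard + soft, n_classes)      # M-step
    return params


def predict(x: Vector, params: Params) -> int:
    """Return the class index with the highest posterior probability."""
    probs = posterior(x, *params)
    return max(range(len(probs)), key=probs.__getitem__)
```

Calling `em_naive_bayes(labeled, labels, unlabeled)` returns log priors and log word probabilities, and `predict` picks the class with the highest posterior. Working in log space avoids numerical underflow when a review contains many words.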

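The Support Vector Machine classifier mentioned in the Introduction can be illustrated with a minimal linear SVM. The paper does not say how its SVM was trained, so the sketch below swaps in the Pegasos stochastic sub-gradient method for the hinge-loss objective and visits samples in a fixed order so that it stays deterministic. The labels are assumed to be +1 for fake and -1 for truthful reviews. The bias is handled by appending a constant feature, so it is regularised along with the weights (a simplification). The values of `lam` and `epochs` are hypothetical. A real system would more likely rely on an established library implementation.

```python
from typing import List, Sequence


def train_linear_svm(X: List[Sequence[float]], y: List[int], lam: float = 0.01,
                     epochs: int = 20) -> List[float]:
    """Pegasos-style sub-gradient training of a linear SVM with hinge loss.

    Labels must be +1 (fake) or -1 (truthful). A constant 1.0 is appended to every
    input so that the last weight acts as a bias term.
    """
    w = [0.0] * (len(X[0]) + 1)
    t = 0
    for _ in range(epochs):
        for x, label in zip(X, y):  # fixed order keeps the sketch deterministic
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            xb = list(x) + [1.0]
            margin = label * sum(wi * xi for wi, xi in zip(w, xb))
            w = [(1.0 - eta * lam) * wi for wi in w]  # shrink: regularisation step
            if margin < 1.0:  # hinge loss is active: take a sub-gradient step
                w = [wi + eta * label * xi for wi, xi in zip(w, xb)]
    return w


def svm_predict(w: List[float], x: Sequence[float]) -> int:
    """Return +1 (fake) or -1 (truthful) from the sign of the decision function."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1
```

In practice, `X` would hold feature vectors such as word counts, review length and sentiment. The returned weight vector can then be passed to `svm_predict` for unseen reviews.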
IV. SCREEN SHOTS

V. CONCLUSIONS

In this study, we have presented several semi-supervised and supervised text mining techniques for detecting fake online reviews. To build a better feature set, we combined features from several previous research works. We also tested several classifiers that were not used in previous work. As a result, we were able to improve on the precision of the previous semi-supervised techniques of Rout et al. [8]. We also found that the supervised Naive Bayes classifier gives the highest precision. This suggests that our dataset is well labelled, since semi-supervised models are known to work well mainly when reliable labelling is not available.
In this work, we used only the text of customer reviews. In the future, user behaviour could be combined with the review text to build a stronger classification model. More sophisticated preprocessing techniques for tokenization could be used to make the dataset more reliable, and the effectiveness of the proposed technique could be evaluated on a larger dataset. This research was carried out only for English reviews; it could be extended to Bangla and many other languages.
REFERENCES
[1] Chengai Sun, Qiaolin Du and Gang Tian,
“Exploiting Product Related Review Features for
Fake Review Detection,” Mathematical Problems in
Engineering, 2016.
[2] A. Heydari, M. A. Tavakoli, N. Salim, and Z.
Heydari, “Detection of review spam: a survey,”
Expert Systems with Applications, vol. 42, no. 7, pp.
3634–3642, 2015.
[3] M. Ott, Y. Choi, C. Cardie, and J. T. Hancock,
“Finding deceptive opinion spam by any stretch of
the imagination,” in Proceedings of the 49th Annual
Meeting of the Association for Computational
Linguistics: Human Language Technologies (ACL-
HLT), vol. 1, pp. 309–319, Association for
Computational Linguistics, Portland, Ore, USA, June
2011.
[4] J. W. Pennebaker, M. E. Francis, and R. J. Booth,
“Linguistic Inquiry and Word Count: LIWC,” vol. 71,
2001.
[5] S. Feng, R. Banerjee, and Y. Choi, “Syntactic
stylometry for deception detection,” in Proceedings
of the 50th Annual Meeting of the Association for
Computational Linguistics: Short Papers, vol. 2,
2012.
[6] J. Li, M. Ott, C. Cardie, and E. Hovy, “Towards a
general rule for identifying deceptive opinion spam,”
in Proceedings of the 52nd Annual Meeting of the
Association for Computational Linguistics (ACL),
2014.
[7] E. P. Lim, V.-A. Nguyen, N. Jindal, B. Liu, and
H. W. Lauw, “Detecting product review spammers
using rating behaviors,” in Proceedings of the 19th
ACM International Conference on Information and
Knowledge Management (CIKM), 2010.
[8] J. K. Rout, A. Dalmia, and K.-K. R. Choo,
“Revisiting semi-supervised learning for online
deceptive review detection,” IEEE Access, vol. 5,
pp. 1319–1327, 2017.
[9] J. Karimpour, A. A. Noroozi, and S. Alizadeh,
“Web spam detection by learning from small labeled
samples,” International Journal of Computer
Applications,

You might also like