Detection of Fake Online Reviews Using Semi Supervised and Supervised Learning

This document discusses methods for detecting fake online reviews using semi-supervised and supervised learning approaches. It summarizes existing research on content-based and behavior-based methods for identifying fake reviews. The paper then proposes using both semi-supervised and supervised classification models, including Expectation-Maximization and Naive Bayes classifiers, to classify reviews on a hotel dataset as real or fake. The goal is to enhance classification efficiency and address limitations of existing supervised-only approaches.

JOURNAL OF RESOURCE MANAGEMENT AND TECHNOLOGY, ISSN NO: 0745-6999

DETECTION OF FAKE ONLINE REVIEWS USING SEMI-SUPERVISED AND SUPERVISED LEARNING
D. SAI KRISHNA (MCA Student), KONDURI NIKITHA (Assistant Professor)
Dept. of MCA
Sree Chaitanya College of Engineering, Karimnagar
ABSTRACT
Online reviews have a huge influence on today's business and commerce. Purchasing decisions for online products depend largely on customer reviews. Opportunistic individuals or companies therefore try to manipulate product reviews for their own interests. To classify fake online reviews, this paper applies several semi-supervised and supervised text mining models and compares the effectiveness of both approaches on a dataset of hotel reviews.
I. INTRODUCTION
Technologies are evolving swiftly. Old technologies are constantly being replaced by new and emerging ones. These emerging technologies allow people to carry out their work effectively. The online marketplace is one such technological advancement. Through online portals, we can shop and make reservations. Before buying those goods or services, almost every one of us looks for reviews. As a consequence, online reviews have become an important source of reputation for businesses. They also have a huge influence on the advertising and promotion of goods and services. With the spread of the online marketplace, fake online reviews are becoming a serious concern. People may write fake reviews to promote their own products, which harms genuine consumers. Competing firms may also try to damage each other's reputation by posting fake negative reviews.
Researchers have been exploring several ways to detect these fake online reviews. Some methods are based on the content of the review and some are based on the behaviour of the user who posts it. Content-based analysis focuses on what is written in the review, that is, the review text, whereas the behaviour-based approach focuses on the reviewer's country, IP address, number of posts, etc. Most of the proposed techniques are supervised classification models. A few researchers have experimented with semi-supervised models as well. Semi-supervised approaches are applied because reliably labelled reviews are scarce.
In this article, we develop several classification methods to identify fake online reviews, some of which are semi-supervised and some of which are supervised. We use the Expectation-Maximization algorithm for semi-supervised learning. In our research work, the statistical Naive Bayes classifier and Support Vector Machines (SVM) are used as classifiers to improve classification performance. We concentrated mainly on content-based approaches. As features, we used word frequency count, sentiment polarity and review length.
II. LITERATURE SURVEY
Chengai Sun, Qiaolin Du and Gang Tian, "Exploiting product related review features for fake review detection"
Product reviews are now widely used by people to make their decisions. However, for the sake of profit, reviewers game the system by posting fake reviews to promote or demote the target products. Fake review detection has attracted substantial interest from both industry and research communities over the past few years. However, owing to the shortage of labelled material for supervised learning and evaluation, the issue remains a challenging problem. Existing work has made several attempts to address it from the point of view of reviewers and reviews. There has been little discussion, however, of the product-related review features that are the primary focus of the method. To integrate product-related review features through a product word composition model, the paper proposes a novel convolutional neural network model. A bagging model is used to bag the neural network model with two efficient classifiers in order to reduce overfitting and high variance. Experiments on a real-life Amazon review dataset demonstrate the effectiveness of the proposed approach.
M. Ott, Y. Choi, C. Cardie, and J. T. Hancock, "Finding deceptive opinion spam by any stretch of the imagination," in Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics
Consumers increasingly rate, review and research products online. As a consequence, websites that contain user reviews are becoming targets of opinion spam. Although recent literature has mainly
concentrated on manually identifiable instances of opinion spam, in this work we study deceptive opinion spam, that is, fictitious opinions that have been deliberately written to sound authentic. By combining work from psychology and computational linguistics, we develop and compare three approaches to detecting deceptive opinion spam and ultimately build a classifier that is almost 90 percent accurate on our gold-standard opinion spam dataset. In addition, based on feature analysis of our learned models, we make several theoretical contributions, including revealing a relationship between deceptive opinions and imaginative writing.
J. Li, M. Ott, C. Cardie, and E. Hovy, "Towards a general rule for identifying deceptive opinion spam," in Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (ACL)
The purchase decisions of customers are increasingly influenced by user-generated online reviews. There has accordingly been growing concern about the potential for posting deceptive opinion spam: fictitious reviews that have been deliberately written to sound authentic in order to deceive the reader. In this article, we explore generalised approaches to identifying online deceptive opinion spam based on a new gold-standard dataset composed of data from three separate domains (i.e. hotel, restaurant, doctor), each of which contains three types of reviews, i.e. truthful reviews written by customers, deceptive reviews written by Turkers and deceptive reviews written by employees (domain experts). Our methodology aims to capture the general difference in language use between deceptive and truthful reviews, which we believe can help customers making purchase decisions and help review portal operators, such as TripAdvisor or Yelp, investigate possible fraudulent activity on their sites.
J. Karimpour, A. A. Noroozi, and S. Alizadeh, "Web spam detection by learning from small labeled samples"
Web spamming tries to deceive search engines into ranking certain pages higher than they deserve. Many approaches have been proposed to combat web spamming and to detect spam pages. One basic approach is classification, i.e. learning a classification model from previously labelled training data and using this model to classify web pages as spam or non-spam. A drawback of this approach is that manually labelling a large number of web pages to produce the training data can be biased, inaccurate, labour intensive and time consuming. In this article, we propose a new method that overcomes this drawback by using semi-supervised learning to label the training data automatically. To do this, we incorporate the Expectation-Maximization algorithm, which is an efficient and important semi-supervised learning algorithm. Experiments carried out on real web spam data show that the new method performs very well in practice.
III. SYSTEM ANALYSIS AND DESIGN
EXISTING SYSTEM
Content-based methods rely on the content of the review, that is, the review text or what is said in it. Heydari et al. [2] tried to detect spam reviews by examining the linguistic characteristics of the review. Ott et al. [3] used three methods to perform classification: genre identification, psycholinguistic deception detection and text categorization.
Behaviour-feature-based analysis focuses on the reviewer and uses the characteristics of the person writing the review. Lim et al. [7] addressed the problem of review spammer detection, i.e. identifying the users who are the source of spam reviews. The behaviour of people who deliberately post fake reviews differs substantially from that of the average user. They identified several deceptive rating and review behaviours.
Deceptive online review detection is generally treated as a classification problem, and one common solution is to use supervised text classification techniques [5]. These methods are robust when training is performed on large datasets of labelled instances from both classes, deceptive opinions (positive instances) and truthful opinions (negative examples) [8]. Several researchers have also used semi-supervised classification methods.
Disadvantages
The existing system still relies on semi-supervised learning.
It performs only document classification; the sentiment of the review text is never identified.
Proposed System
In the proposed method, each review first goes through tokenization. Then unnecessary words are removed and candidate feature words are generated.
Each candidate feature word is checked against the dictionary; if it has an entry there, its frequency is counted and added to the column of the feature vector that corresponds to the word's numeric mapping.
In addition to the frequency counts, the length of the review is measured and added to the feature vector.
Finally, the sentiment score
which is present in the data set is added to the feature vector. Negative sentiment is encoded as zero and positive sentiment as a positive value in the feature vector.
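The feature construction described in the proposed system can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the stop-word list, the dictionary with its word-to-column mapping, and the function names are hypothetical, and review length is assumed to be measured in tokens.

```python
import re

# Hypothetical stop-word list and dictionary (word -> column index);
# the paper does not publish the ones it used.
STOP_WORDS = {"the", "a", "an", "and", "was", "is", "to", "of", "in", "it"}
DICTIONARY = {"room": 0, "clean": 1, "staff": 2, "great": 3, "dirty": 4}


def tokenize(text):
    """Lowercase the review and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_feature_vector(text, sentiment):
    """Return [dictionary word counts..., review length, sentiment flag]."""
    tokens = tokenize(text)
    # Remove unnecessary words to obtain the candidate feature words.
    candidates = [t for t in tokens if t not in STOP_WORDS]
    vector = [0] * len(DICTIONARY)
    for word in candidates:
        index = DICTIONARY.get(word)
        if index is not None:
            vector[index] += 1  # frequency count in the word's column
    vector.append(len(tokens))  # review length in tokens (assumption)
    vector.append(1 if sentiment == "positive" else 0)  # negative -> 0
    return vector


features = build_feature_vector(
    "The room was clean and the staff was great, great stay", "positive"
)
print(features)  # [1, 1, 1, 2, 0, 11, 1]
```

Words that are absent from the dictionary (here "stay") are simply ignored, so the vector has a fixed width regardless of the review text.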
Advantages
Because it combines semi-supervised and supervised learning, the method is simple and effective. It concentrates on content-based approaches. Word frequency count, sentiment polarity and review length are used as features.
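As an illustration of how Expectation-Maximization can be combined with a Naive Bayes classifier for semi-supervised learning, the sketch below uses one common formulation: a multinomial Naive Bayes model is first fitted on the labelled reviews, then the unlabelled reviews are repeatedly given soft class probabilities (E-step) and the model is refitted on all reviews (M-step). The paper does not give its exact procedure, so the function names, the toy reviews, the smoothing and the fixed iteration count are all assumptions. Class 0 stands for truthful and class 1 for fake.

```python
import math
from collections import defaultdict


def train_nb(docs, weights, vocab):
    """Fit multinomial Naive Bayes from (possibly fractional) class weights."""
    prior = [1.0, 1.0]  # Laplace-smoothed class mass
    counts = [defaultdict(float), defaultdict(float)]
    totals = [0.0, 0.0]
    for tokens, w in zip(docs, weights):
        for c in (0, 1):
            prior[c] += w[c]
            for t in tokens:
                counts[c][t] += w[c]
                totals[c] += w[c]
    z = sum(prior)
    log_prior = [math.log(p / z) for p in prior]
    log_like = [
        {t: math.log((counts[c][t] + 1.0) / (totals[c] + len(vocab))) for t in vocab}
        for c in (0, 1)
    ]
    return log_prior, log_like


def predict_proba(model, tokens):
    """Return [P(truthful), P(fake)] for one tokenized review."""
    log_prior, log_like = model
    scores = [
        log_prior[c] + sum(log_like[c][t] for t in tokens if t in log_like[c])
        for c in (0, 1)
    ]
    top = max(scores)  # subtract the maximum for numerical stability
    exp = [math.exp(s - top) for s in scores]
    return [e / sum(exp) for e in exp]


def em_naive_bayes(labeled, unlabeled, iterations=10):
    """Semi-supervised training; labelled reviews keep their hard labels."""
    lab_docs = [tokens for tokens, _ in labeled]
    lab_w = [[1.0, 0.0] if y == 0 else [0.0, 1.0] for _, y in labeled]
    vocab = {t for d in lab_docs + unlabeled for t in d}
    model = train_nb(lab_docs, lab_w, vocab)
    for _ in range(iterations):
        soft = [predict_proba(model, d) for d in unlabeled]  # E-step
        model = train_nb(lab_docs + unlabeled, lab_w + soft, vocab)  # M-step
    return model


# Toy data, invented purely for illustration.
labeled = [
    ("clean room friendly staff".split(), 0),
    ("amazing amazing best hotel ever".split(), 1),
]
unlabeled = [
    "friendly staff quiet room".split(),
    "best ever amazing luxury".split(),
]
model = em_naive_bayes(labeled, unlabeled)
probs_truthful = predict_proba(model, "quiet clean room".split())
probs_fake = predict_proba(model, "luxury amazing".split())
print(probs_truthful, probs_fake)
```

The words "quiet" and "luxury" never occur in the labelled reviews; the model learns which class they belong to only through the unlabelled reviews, which is the benefit semi-supervised training is meant to provide when labels are scarce.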

IV. SCREEN SHOTS

V. CONCLUSIONS

In this study, we have presented several semi-supervised and supervised text mining techniques for detecting fake online reviews. To build a better feature set, we combined features from several research works. We also tested some classifiers that were not used in previous work. As a result, we were able to improve on the accuracy of the semi-supervised techniques of Jiten et al. [8]. We also found that the supervised Naive Bayes classifier gives the highest accuracy. This indicates that our dataset is labelled well, since the semi-supervised model is known to work well when reliable labelling is not available.
In this research work, we have worked only on user reviews. In the future, user behaviour can be combined with the review text to build a stronger classification model. More sophisticated preprocessing tools for tokenization can be used to make the dataset more precise. The effectiveness of the proposed technique can be evaluated on a larger dataset. This research work covers only reviews written in English. It can be extended to Bangla and several other languages.
REFERENCES
[1] Chengai Sun, Qiaolin Du and Gang Tian, "Exploiting Product Related Review Features for Fake Review Detection," Mathematical Problems in Engineering, 2016.
[2] A. Heydari, M. A. Tavakoli, N. Salim, and Z. Heydari, "Detection of review spam: a survey," Expert Systems with Applications, vol. 42, no. 7, pp. 3634–3642, 2015.
[3] M. Ott, Y. Choi, C. Cardie, and J. T. Hancock, "Finding deceptive opinion spam by any stretch of the imagination," in Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL-HLT), vol. 1, pp. 309–319, Association for Computational Linguistics, Portland, Ore, USA, June
2011.
[4] J. W. Pennebaker, M. E. Francis, and R. J. Booth, "Linguistic Inquiry and Word Count: LIWC," vol. 71, 2001.
[5] S. Feng, R. Banerjee, and Y. Choi, "Syntactic stylometry for deception detection," in Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers, vol. 2, 2012.
[6] J. Li, M. Ott, C. Cardie, and E. Hovy, "Towards a general rule for identifying deceptive opinion spam," in Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (ACL), 2014.
[7] E. P. Lim, V.-A. Nguyen, N. Jindal, B. Liu, and H. W. Lauw, "Detecting product review spammers using rating behaviors," in Proceedings of the 19th ACM International Conference on Information and Knowledge Management (CIKM), 2010.
[8] J. K. Rout, A. Dalmia, and K.-K. R. Choo, "Revisiting semi-supervised learning for online deceptive review detection," IEEE Access, vol. 5, pp. 1319–1327, 2017.
[9] J. Karimpour, A. A. Noroozi, and S. Alizadeh, "Web spam detection by learning from small labeled samples," International Journal of Computer Applications,
