Sms Spam

Les spams sms

Uploaded by

Ngwoua Nzié Achaz


Building an Effective Spam Detection System for Mobile Operators in the Democratic Republic of the Congo

1. Introduction

SMS spam and its impact on users and network resources

With the growing use of mobile devices, the number of text messages sent every day has
grown rapidly. According to Statista [1], a provider of market and consumer data, the number
of mobile messages sent worldwide in 2020 reached 3.5 trillion. In parallel, the proliferation
of web pages and social media messaging applications such as WhatsApp, Telegram, Snapchat,
Facebook, and Instagram has expanded the types of messages that phone users can send.
Messages now include not only text but also video and audio, adding depth and richness to
communication [2].
While email messages are commonly used for professional communication, in some regions
like the Democratic Republic of the Congo, people frequently rely on SIM cards provided by
telecommunication providers to access mobile messaging services. The diverse range of
services and interests that mobile messages encompass, from mobile banking to social media
communication, gaming, localization platforms, health, and more, has led to a significant
increase in the volume of messages being sent.
However, among the legitimate messages, there are those with malicious intent, aimed at
deceiving individuals into divulging personal information, spreading fake news, coercing
money transfers, issuing threats, or engaging in other harmful activities. Moreover, spam can
degrade the voice experience by increasing network traffic.

The relevance of developing a spam detection application for mobile operators in the DRC

In the realm of telecommunications, mobile devices are extensively utilized for sharing various
forms of communication, including texts, emails, and chats through targeted applications.
Specifically, Short Message Service (SMS) plays a crucial role in conveying personal and
professional information via mobile phones [3]. Initially limited to messages of fewer than 160
characters [4], SMS has evolved to accommodate longer texts and even multimedia content,
sourced from humans, other phone users, online applications, or network operators [5].
However, the accessibility and openness of SMS communication have also paved the way for
a subset of individuals to exploit this platform by disseminating deceptive or unwanted
messages. These messages often employ tactics such as false promises of monetary rewards,
requests for payments under false pretenses, or misleading job offers, among others. Such
fraudulent activities contribute to the proliferation of spam messages that can deceive
unsuspecting recipients.
Moreover, beyond the realm of deceptive content lies a more insidious threat posed by
scammers who exploit vulnerabilities within messaging systems to inject malware or install
mobile spyware on users’ devices. Notable instances like SimJacker [6] underscore the potential
risks that extend beyond message content to the very infrastructure of messaging systems.
Additionally, legitimate organizations that utilize network operators for marketing or
advertising purposes inadvertently provide an avenue for scammers to impersonate authorized
entities, further complicating the distinction between official and unofficial messages. This
confluence of challenges underscores the complexity of distinguishing between genuine and
fraudulent messages in the mobile communication landscape [7, 8].
Given the multifaceted nature of spam messages and the associated risks they pose, it is
imperative to devise solutions that aid and protect users from falling victim to malicious
activities. Addressing issues such as false market advertising, security threats, and unwanted
messages necessitates the development of robust tools and mechanisms to facilitate secure and
reliable messaging communication. By proactively tackling these challenges, stakeholders can
foster a safer and more trustworthy mobile communication environment in the Democratic
Republic of the Congo, which has the particularity that Swahili is widely used.

2. Background and Motivation

SMS Spam Prevalence in the Democratic Republic of Congo (DRC)

The proliferation of mobile communication in the Democratic Republic of Congo (DRC) has
brought both opportunities and challenges. While SMS (Short Message Service) has become a
powerful tool for communication and marketing, it has also led to an increase in unwanted SMS
spam. Here, we explore the prevalence of SMS spam in the DRC, its impact on users, and
potential strategies to mitigate this issue.

Between 2002 and 2007, the DRC's mobile market transformed from an oligopolistic structure
to one characterized by monopolistic competition. New players entered the market, and
established leaders like Vodacom and Airtel significantly influenced the landscape. As a result,
the mobile penetration rate reached 44.6% in 2014, surpassing the African average [9].

Mobile operators in the DRC have leveraged SMS as a communication and marketing channel.
Promotional messages, service alerts, and transaction notifications are routinely delivered via
SMS. However, this widespread use has also opened the door to spam. Users receive unsolicited
messages, often related to commercial offers, contests, or dubious services.

The challenges and impact of SMS spam can be grouped into three categories. First, user
experience: SMS spam disrupts the user experience, causing annoyance and frustration, and
legitimate messages can get lost in the flood of unwanted content. Second, resource drain:
spam consumes network resources, affecting overall system efficiency and potentially
increasing costs for operators. Third, privacy concerns: some spam messages request
personal information or promote fraudulent schemes, posing privacy risks to recipients.

While the DRC faces SMS spam challenges, it is essential to compare its situation with that of
other countries. For instance, in Nigeria [10], each subscriber receives an average of 2.45 spam
messages per day. The majority of these are commercial in nature. However, only a small
fraction of recipients report spam, highlighting the need for more effective regulation and user
empowerment.

Addressing SMS spam in the DRC requires a multi-pronged approach. By combining
regulatory efforts, user awareness, and technological solutions, we can create a cleaner and
more user-friendly mobile communication environment.

Interest

For society, this work contributes to facilitating communication and reducing the impact
of spam messages, which are annoying and stressful for citizens.
Economically, it saves time for employees by reducing concerns about threats and unwanted
messages, thanks to its filtering functions.
For scientists, this work is a reference for those who wish to delve into, and gain skills and
techniques in, machine learning operations in mobile network environments.

3. Data Collection and Preprocessing

Description of dataset used for training and evaluation.

For this paper, we used two data sources: one from the Kaggle website [11, 12], for English
and French, and another from a survey, for Swahili.
Kaggle, a thriving online platform, hosts the most extensive global community of data
scientists. With over 536,000 active members spanning 194 countries, Kaggle provides tools
and resources for data science work, along with opportunities to learn, collaborate, and
showcase skills.
Kaggle also hosts a collection of datasets, including some related to spam detection. Two
groups of datasets are relevant here:
Spam Mails Dataset: This dataset, containing 5,572 SMS messages, is meticulously labeled as
either "ham" (legitimate) or "spam". Researchers and data enthusiasts can use this dataset to
develop and fine-tune spam detection models. The dataset includes a variety of text messages,
allowing practitioners to explore different patterns and features associated with spam.
Email Spam Classification Dataset: Another valuable resource on Kaggle is the CSV file
containing information about 5,172 emails, categorized as either spam or not spam. Researchers
can use this dataset to build and evaluate spam filters, employing techniques such as Naive
Bayes or machine learning algorithms.
We used the first group, in English and French; the distribution of the raw data is shown in
Table 1 below.
Languages Ham Spam Total
English 4825 747 5572
French 4825 747 5572
Table 1. Raw data counted before any wrangling, Kaggle

For Swahili, the survey yielded the counts in Table 2:

Languages Ham Spam Total
Swahili 117 15 132
Table 2. Raw data counted before any wrangling, Swahili

Data preparation

It was done in five steps:

• Check the columns inside the Kaggle dataset: we get five columns: labels, text, text_hi,
text_de and text_fr. labels contains spam or ham; text, text_hi, text_de and text_fr
contain the messages in English, Hindi, German and French, respectively;
• Extract the valuable features: here, we extract two sets of pairs, (labels, text) and
(labels, text_fr), because our study focuses on three languages, two of which, English
and French, are in the dataset;
• Count the raw data: see Table 1;
• Check for null values;
• Drop duplicated values.
The same process was executed for the Swahili data, and the results are shown in Table 3,
Figure 1 and Figure 2.
Languages Ham Spam Total
English 4516 641 5157
French 4494 640 5134
Swahili 99 14 113
Table 3. Clean data counted after preprocessing
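
The preparation steps described above can be sketched with pandas. This is a minimal illustration under stated assumptions, not the authors' script: the toy DataFrame stands in for the Kaggle file (whose path is not given in the paper), the helper name prepare is hypothetical, and duplicates are assumed to be dropped on the (labels, message) pair.

```python
import pandas as pd

# Toy stand-in for the Kaggle file (assumption: the real data would be read
# with pd.read_csv from a path that the paper does not give).
raw = pd.DataFrame(
    {
        "labels": ["ham", "spam", "ham", "spam", "ham", "ham"],
        "text": ["See you at 5", "WIN cash now", "See you at 5",
                 "WIN cash now", None, "Call me later"],
        "text_fr": ["On se voit à 5h", "GAGNEZ de l'argent", "On se voit à 5h",
                    "GAGNEZ de l'argent", "Bonjour", "Appelle-moi plus tard"],
    }
)


def prepare(df: pd.DataFrame, text_column: str) -> pd.DataFrame:
    """Hypothetical helper: keep (labels, text_column), drop nulls and duplicates."""
    pair = df[["labels", text_column]]       # extract the (labels, message) pair
    clean = pair.dropna()                    # remove rows with null values
    clean = clean.drop_duplicates()          # remove repeated (label, message) rows
    return clean.reset_index(drop=True)


english = prepare(raw, "text")
french = prepare(raw, "text_fr")

# Ham/spam counts after cleaning, analogous to the columns of Table 3.
print(english["labels"].value_counts())
print(french["labels"].value_counts())
```

Counting the labels before and after the call gives the raw and clean figures side by side, in the same way that Tables 1 and 3 do for the real data.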

Figure 1. Data: Raw vs Clean (bar chart of raw and clean message counts for English, French and Swahili; vertical axis from 0 to 6000)

Figure 2. Clean data: Ham vs Spam (bar chart of ham and spam counts for English, French and Swahili; vertical axis from 0 to 6000)

4. Tools and Feature Engineering

Tools and frameworks

The completion of this project required various dependencies [13], tools, and frameworks.
They fall into two main areas: those integral to the core functionality (back-end) and those
relevant to the user side (front-end). The project also involved tools for data analysis and
predictive modeling. The structure of these tools is presented in Table 4.
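Category | Tools | Roles
Data analysis and Machine Learning libraries | NumPy | Scientific computing library
Data analysis and Machine Learning libraries | Pandas | Data manipulation and analysis library
Data analysis and Machine Learning libraries | Matplotlib | Python library used for 2D and 3D data visualization
Data analysis and Machine Learning libraries | Seaborn | Another Python library for statistical data visualization, built on Matplotlib
Data analysis and Machine Learning libraries | Scikit-learn | Machine learning library
Programming Languages (Front-end and Back-end) | Python (Django) | Python framework for developing web applications and APIs
Programming Languages (Front-end and Back-end) | HTML | Provides the structure and content of web pages
Programming Languages (Front-end and Back-end) | CSS | Styling the content
Programming Languages (Front-end and Back-end) | JS | Makes web pages interactive and dynamic
Table 4. Structuring the tools [13]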

Feature engineering

Feature engineering plays a crucial role in spam detection by selecting and creating relevant
features from the available data to enhance the performance of the spam detection model. In
the context of spam detection, features can include elements such as the frequency of specific
words in the message text, the presence of commonly associated spam keywords, the message
length, the presence of hyperlinks, the time of message sending, and the sender’s domain. By
utilizing these features, a machine learning model can be trained to more effectively identify
spam messages. The process of feature engineering thus improves the model’s ability to detect
spam by providing it with informative and discriminative information.
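
To make the kinds of features listed above concrete, here is a small sketch that derives three of them (message length, presence of a hyperlink, and spam-keyword hits) from a raw message. The keyword list, the URL pattern and the function name handcrafted_features are illustrative assumptions; the paper does not specify them.

```python
import re

# Illustrative keyword list (assumption, not taken from the paper).
SPAM_KEYWORDS = {"win", "free", "prize", "cash", "urgent"}
# Simple pattern for links starting with http(s):// or www.
URL_PATTERN = re.compile(r"(https?://|www\.)\S+", re.IGNORECASE)


def handcrafted_features(message: str) -> dict:
    """Return a few hand-crafted features for one message."""
    words = re.findall(r"[a-z']+", message.lower())
    return {
        "length": len(message),                          # message length in characters
        "has_link": bool(URL_PATTERN.search(message)),   # presence of a hyperlink
        "keyword_hits": sum(word in SPAM_KEYWORDS for word in words),
    }


print(handcrafted_features("URGENT! Win a FREE prize at www.example.com"))
print(handcrafted_features("See you at 5"))
```

Features of this kind could be appended to the word-frequency vectors as extra columns before training.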
For this paper, we have two relevant features: the labels column of the dataset and the word
frequency in each message. These features are extracted with the scikit-learn (sklearn) Python
library, specifically its LabelEncoder and TfidfVectorizer classes [13].
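
As a minimal sketch of how these two scikit-learn classes are typically applied, the example below encodes the labels as integers and turns messages into TF-IDF vectors. The four messages are invented for illustration, and default vectorizer settings are assumed since the paper does not report its parameters.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import LabelEncoder

# Invented toy messages (assumption: the real input is the cleaned dataset).
labels = ["ham", "spam", "ham", "spam"]
messages = [
    "See you at the office tomorrow",
    "Win a free prize now",
    "Call me when you arrive",
    "Free cash prize, call now",
]

# LabelEncoder sorts the class names, so "ham" becomes 0 and "spam" becomes 1.
encoder = LabelEncoder()
y = encoder.fit_transform(labels)

# TfidfVectorizer builds a vocabulary and weights each word by its frequency
# in the message, discounted by how common it is across all messages.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(messages)

print(list(encoder.classes_), y.tolist())
print(X.shape)  # (number of messages, vocabulary size)
```

The resulting sparse matrix X and label vector y are the inputs expected by the classifiers compared in the next section.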

5. Model Selection, Evaluation and Optimisation

Different machine learning algorithms for spam classification

By this stage, the previous steps (data gathering, preparation and exploration) are assumed
to be complete, although it is possible to return to them if the analysis requires it. What
happens during the training and testing stages? Since all clean data are held in a dataset with
features in columns and data values in rows, the class corresponding to the chosen algorithm
can be used to generate a model. This model learns relationships and patterns within the data,
which is what we call ’training’. Selecting a suitable ML algorithm therefore involves study
and testing. The choice is mainly driven by the kind of problem we wish to solve, the number
and types of features, and the kind of model that would best suit the data [14].
Following these principles, here are the types of ML algorithms used:
• Supervised models: supervised ML algorithms are among the algorithms most often used
in intelligent systems. They take as inputs the data related to the features, then map
them to the desired outputs (each input comes with its expected output). The tasks that
supervised models solve are listed in Table 5;
• Unsupervised models: unlike supervised models, which learn from labeled data,
unsupervised models are trained on unlabeled data. Their particularity lies in their
ability to learn from large and complex amounts of data. Their main goal is to find hidden
patterns, structures or relationships within the data, even when these are not
proportional. Therefore, they are categorized as non-linear models [15]. Table 6
provides an overview of these tasks and the associated algorithms.

Tasks | Algorithms | Genre and problem-solving examples
Classification [16] | Naive Bayes, Logistic Regression, Support Vector Machine (SVM), Random Forest, Decision Trees, etc. | Categorize input data into predefined classes or labels. E.g.: Sentiment analysis
Regression [17] | Linear Regression, Polynomial Regression, Lasso Regression, Ridge Regression, Support Vector Regression (SVR), etc. | Predict continuous numerical output value. E.g.: House price forecasting
Object Detection [18] | YOLO (You Only Look Once), Convolutional Neural Networks (CNNs), etc. | E.g.: Self-driving cars
Natural Language Processing (NLP) [19] | RNNs (Recurrent Neural Networks), LSTMs, etc. | Understand human language. E.g.: Language translation
Time series Analysis [20] | ARIMA (Autoregressive Integrated Moving Average), Exponential Smoothing methods, Seasonal Decomposition of Time Series (STL), etc. | Predict future values in a time series. E.g.: Weather prediction
Deep Learning [21] | RNNs, NLP, GANs (Generative Adversarial Networks), etc. | Learn hierarchical representations from data, for deep prediction. E.g.: Image recognition
Table 5. Which task for which Supervised Machine learning algorithm [13]
Tasks | Algorithms | Genre and problem-solving examples
Clustering [13] | K-Means Clustering, Hierarchical Clustering, DBSCAN, etc. | Group data points into clusters based on similarity, without prior knowledge of group labels. E.g.: Customer segmentation
Dimension Reduction [22] | PCA (Principal Component Analysis), t-Distributed Stochastic Neighbor Embedding, etc. | Reduce the number of features (dimensions) in a dataset and retain essential information. E.g.: Image compression
Topic Modeling [23] | LDA (Latent Dirichlet Allocation), Non-Negative Matrix Factorization (NMF), etc. | Discover latent topics within a collection of documents. E.g.: Content recommendation
Data compression | PCA | Preprocess data to make it more manageable. E.g.: Data storage
Table 6. Which task for which Unsupervised Machine learning algorithm

Comparing different machine learning algorithms for spam classification.

To compare these algorithms, we used accuracy and AUC (Area Under the Curve). Results are
shown in Tables 7 and 8 and Figures 3 and 4.
Language   Bayes   SVM   Logistic regression   Decision tree   Ham    Spam
English    99%     99%   99%                   90%             4516   641
French     99%     99%   99%                   90%             4494   640
Swahili    90%     87%   85%                   67%             99     14
Table 7. AUC: Languages vs ML algorithms

Figure 3. AUC: Languages vs ML algorithms (bar chart comparing Bayes, SVM, Logistic regression and Decision tree for English, French and Swahili; vertical axis from 0% to 120%)

Language   Bayes   SVM   Logistic regression   Decision tree   Ham    Spam
English    96%     97%   95%                   96%             4516   641
French     96%     98%   97%                   96%             4494   640
Swahili    87%     87%   87%                   91%             99     14
Table 8. Accuracy: Languages vs ML algorithms

Figure 4. Accuracy: Languages vs ML algorithms (bar chart comparing Bayes, SVM, Logistic regression and Decision tree for English, French and Swahili; vertical axis from 80% to 100%)

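Tables 7 and 8 and Figures 3 and 4 show that the performance of a model depends on the
language to which it is applied. Specifically, the results for English and French are very
similar to each other and differ from those for Swahili, owing to the structure of the language
and the small number of Swahili messages. The models were trained on 80% of the data and
tested on 20%. With only 113 clean Swahili messages, the 20% test split holds roughly 23
messages, of which only about 3 are spam, so the Swahili scores rest on very few examples.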
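
The comparison can be sketched as follows: an 80/20 split, the four classifiers, and accuracy plus AUC for each. This is an illustration under assumptions, not the authors' code. The corpus is synthetic, the hyperparameters are scikit-learn defaults (with a linear kernel chosen for the SVM), and the helper spam_scores is hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic toy corpus (assumption): 20 ham and 20 spam messages.
ham = [f"see you at {hour} pm for lunch" for hour in range(1, 21)]
spam = [f"win {n}00 dollars cash prize now call free" for n in range(1, 21)]
messages = ham + spam
y = [0] * len(ham) + [1] * len(spam)   # 0 = ham, 1 = spam

# 80% training, 20% testing; stratify keeps both classes in each split.
X_train, X_test, y_train, y_test = train_test_split(
    messages, y, test_size=0.2, stratify=y, random_state=42
)

models = {
    "Bayes": MultinomialNB(),
    "SVM": SVC(kernel="linear"),
    "Logistic regression": LogisticRegression(max_iter=1000),
    "Decision tree": DecisionTreeClassifier(random_state=42),
}


def spam_scores(pipeline, texts):
    """Hypothetical helper: continuous spam scores, as needed for AUC."""
    if hasattr(pipeline, "predict_proba"):
        return pipeline.predict_proba(texts)[:, 1]
    return pipeline.decision_function(texts)  # e.g. SVC without probabilities


results = {}
for name, model in models.items():
    pipeline = make_pipeline(TfidfVectorizer(), model)
    pipeline.fit(X_train, y_train)
    results[name] = {
        "accuracy": accuracy_score(y_test, pipeline.predict(X_test)),
        "auc": roc_auc_score(y_test, spam_scores(pipeline, X_test)),
    }

for name, scores in results.items():
    print(f"{name}: accuracy={scores['accuracy']:.2f}, auc={scores['auc']:.2f}")
```

AUC is computed from continuous scores rather than hard predictions, which is why it can diverge from accuracy on an imbalanced dataset such as the one in Table 3.
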
Model optimisation

Tables 7 and 8 show that each model has characteristics that make it suited to certain data,
so combining them is an appealing way to optimize further, provided that over-fitting is kept
in check. Two means are used: VotingClassifier and GridSearchCV.
• VotingClassifier: an ensemble classifier that combines multiple base estimators to make
predictions. Its accuracy scores are given in Table 9.
Language Score
English 98%
French 98%
Swahili 87%
Table 9. VotingClassifier: Accuracy scores (en, fr and sw)

• GridSearchCV: it performs an exhaustive search over specified parameter values for an
estimator. It helps find the best combination of hyperparameters by evaluating the
model’s performance using cross-validation. We tested it on the Swahili data and obtained
a score of 87%, exactly like VotingClassifier.
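
Both optimisation tools can be illustrated with a short scikit-learn sketch. The base estimators, the voting mode and the parameter grid below are assumptions chosen for the example, since the paper does not list them, and the corpus is synthetic.

```python
from sklearn.ensemble import VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

# Synthetic toy corpus (assumption): 20 ham and 20 spam messages.
ham = [f"see you at {hour} pm for lunch" for hour in range(1, 21)]
spam = [f"win {n}00 dollars cash prize now call free" for n in range(1, 21)]
messages = ham + spam
y = [0] * len(ham) + [1] * len(spam)

X_train, X_test, y_train, y_test = train_test_split(
    messages, y, test_size=0.2, stratify=y, random_state=0
)

# VotingClassifier: each base estimator votes, the majority class wins.
voting = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("vote", VotingClassifier(
        estimators=[
            ("nb", MultinomialNB()),
            ("lr", LogisticRegression(max_iter=1000)),
            ("dt", DecisionTreeClassifier(random_state=0)),
        ],
        voting="hard",
    )),
])
voting.fit(X_train, y_train)
voting_accuracy = voting.score(X_test, y_test)

# GridSearchCV: try every parameter combination with 5-fold cross-validation.
search = GridSearchCV(
    Pipeline([("tfidf", TfidfVectorizer()), ("nb", MultinomialNB())]),
    param_grid={
        "tfidf__ngram_range": [(1, 1), (1, 2)],  # illustrative grid
        "nb__alpha": [0.1, 0.5, 1.0],
    },
    cv=5,
    scoring="accuracy",
)
search.fit(X_train, y_train)
grid_accuracy = search.score(X_test, y_test)

print("VotingClassifier accuracy:", voting_accuracy)
print("GridSearchCV best parameters:", search.best_params_)
print("GridSearchCV accuracy:", grid_accuracy)
```

Because the grid search selects parameters by cross-validation on the training split only, the final score on the held-out test split remains an honest estimate, which helps with the over-fitting concern raised above.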

6. Model Deployment and Integration

The deployment phase transitions the machine learning models from a development
environment to a production setting. For spam detection, this involves integrating the models
into an API that will act as the user interface. This API enables real-time analysis and
classification of SMS in English, French, and Swahili, providing immediate feedback to the
end-user.
The API serves as an intermediary between the machine learning models and the application
layer. It receives input SMS, processes them through the deployed models, and returns a spam
or non-spam verdict. The integration process requires careful planning to ensure that the API
can handle the expected load and provide low-latency responses. Figures 5 and 6 show model
deployment and integration.
The user interface, powered by the API, must be intuitive and user-friendly. It should provide
clear options for users to submit SMS for analysis and display the results in an easily
understandable format. Additionally, it should offer the ability to learn from user feedback to
improve the accuracy of the spam detection models. To achieve that, we used Django, HTML
and CSS (Figure 7, [24]).
Figure 5. ML model

Figure 6. Deployment model

Figure 7. Testing the endpoint for API services
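
The request flow described in this section (receive an SMS, route it to the model for its language, return a verdict) can be sketched independently of the web framework. Everything named here is an assumption for illustration: the function classify_sms, the JSON field names language and message, and the tiny per-language training sets. In the Django application, a view would wrap such a function and return its result as JSON.

```python
import json

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny illustrative training sets (assumption); the deployed models would be
# the ones trained on the cleaned English, French and Swahili data.
TRAINING_DATA = {
    "en": (["See you tomorrow morning", "I arrived home safely",
            "Win a free prize now", "Congratulations you won cash call now"],
           ["ham", "ham", "spam", "spam"]),
    "fr": (["On se voit demain matin", "Je suis bien arrivé à la maison",
            "Gagnez un prix gratuit maintenant",
            "Félicitations vous avez gagné de l'argent appelez maintenant"],
           ["ham", "ham", "spam", "spam"]),
    "sw": (["Tutaonana kesho asubuhi", "Nimefika nyumbani salama",
            "Umeshinda zawadi tuma pesa sasa",
            "Hongera umeshinda pesa bure piga simu sasa"],
           ["ham", "ham", "spam", "spam"]),
}

# One pipeline per language, fitted once when the service starts.
MODELS = {
    language: make_pipeline(TfidfVectorizer(), MultinomialNB()).fit(texts, labels)
    for language, (texts, labels) in TRAINING_DATA.items()
}


def classify_sms(request_body: str) -> dict:
    """Hypothetical handler: parse a JSON request and return a verdict."""
    payload = json.loads(request_body)
    language = payload.get("language")
    message = payload.get("message")
    if language not in MODELS:
        return {"error": f"unsupported language: {language!r}"}
    if not isinstance(message, str) or not message.strip():
        return {"error": "message must be a non-empty string"}
    verdict = MODELS[language].predict([message])[0]
    return {"language": language, "verdict": str(verdict)}


print(classify_sms(json.dumps({"language": "sw", "message": "Umeshinda zawadi"})))
print(classify_sms(json.dumps({"language": "de", "message": "Hallo"})))
```

Loading the models once at start-up, rather than on every request, is one simple way to keep response latency low.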

7. Study Limitations and Future research paths

Although this work comes with several advantages, it also has its limitations. The solution it
offers isn’t universal, as it’s primarily tailored to a specific region, especially the eastern part of
the Democratic Republic of Congo.
Furthermore, a portion of the data, which is a fundamental component of any machine learning
algorithm, was collected from an unofficial source, namely Kaggle. This data doesn’t consider
the local interests and language cultures, as it predominantly consists of content in French and
English. Additionally, the other data sources were limited in quantity, making it challenging to
provide accurate estimations that represent the entire population.
Moreover, the solution is designed to work with only three languages. This means that messages
containing a mixture of languages might pose challenges for the model in terms of accurate
classification. There is a need for a more advanced multilingual model classifier to address this
issue for instance by utilizing advanced techniques in artificial intelligence.

8. Conclusion
9. References

[1] Statista’s own team of researchers and analysts. Number of mobile messages worldwide
from 2019 to 2023 (in trillions). https://fr.statista.com/, 2020.
[2] Cori Faklaris and Sara Anne Hook. Oh, snap! the state of electronic discovery amid the rise
of snapchat, whatsapp, kik, and other mobile messaging apps. 2016.
[3] M Lavanya and KR Aruna. SMS spam detection using deep learning. Journal homepage:
www.ijrpr.com, ISSN 2582-7421.
[4] Gwenael Le Bodic. Mobile messaging technologies and services: SMS, EMS and MMS.
John Wiley & Sons, 2005.
[5] Sunil Kumar Jangir, Manoj Kumar Sharma, and Pawan Kumar Gupta. Design and
implementation of sms gateway api for mobile communication networks. International
Journal of Computer Applications, 151(9):1–5, 2016.
[6] Catalin Cimpanu. Simjacker vulnerability exploited for surveillance by at least one nation-
state. ZDNet, 2019.
[7] Guangquan Chen, Weijun Wang, and Xuan Zhou. A survey on sms spam filtering
techniques. Journal of Network and Computer Applications, 80:149–159, 2017.
[8] Matti Leppäniemi and Heikki Karjaluoto. Mobile marketing: From marketing strategy to
mobile marketing campaign implementation. International Journal of Mobile Marketing,
3(1), 2008.
[9] Crispin Malingumu Syosyo. Analyse du marché des télécommunications mobiles en
République Démocratique du Congo : Dynamique du marché et stratégies des acteurs. HAL
open science. 2021.
[10] Oluwafemi Osho et al. Mobile spamming in Nigeria: An empirical survey.
ResearchGate. 2015.
[11] www.kaggle.com
[12] Kaggle : Tout ce qu'il faut savoir sur cette plateforme - DataScientest.com.
https://datascientest.com/kaggle-tout-ce-quil-a-savoir-sur-cette-plateforme.
[13] Christian Murhula Byabushi. Development of an interface application for detection of
spam on a mobile operator: Case study of Airtel, Vodacom and Orange. CATHOLIC
UNIVERSITY OF BUKAVU. Academic year: 2022-2023.
[14] H Wang, ZeZXeZBePJ Lei, X Zhang, B Zhou, and J Peng. Machine learning basics.
Deep learning, pages 98–164, 2016.
[15] Memoona Khanum, Tahira Mahboob, Warda Imtiaz, Humaraia Abdul Ghafoor, and
Rabeea Sehar. A survey on unsupervised machine learning algorithms for
automation, classification and maintenance. International Journal of Computer Applications,
119(13), 2015.
[16] FY Osisanwo, JET Akinsola, O Awodele, JO Hinmikaiye, O Olakanmi, J Akinjobi, et
al. Supervised machine learning algorithms: classification and comparison. International
Journal of Computer Trends and Technology (IJCTT), 48(3):128–138, 2017.
[17] Dastan Maulud and Adnan M Abdulazeez. A review on linear regression comprehensive
in machine learning. Journal of Applied Science and Technology Trends, 1(4):140–147,
2020.
[18] Qiang Bai, Shaobo Li, Jing Yang, Qisong Song, Zhiang Li, and Xingxing Zhang.
Object detection recognition and robot grasping based on machine learning: A survey. IEEE
Access, 8:181855–181879, 2020.
[19] Maria Razno. Machine learning text classification model with nlp approach.
Computational Linguistics and Intelligent Systems, 2:71–73, 2019.
[20] Martin Längkvist, Lars Karlsson, and Amy Loutfi. A review of unsupervised feature
learning and deep learning for time-series modeling. Pattern recognition letters, 42:11–24,
2014.
[21] Christian Janiesch, Patrick Zschech, and Kai Heinrich. Machine learning and deep
learning. Electronic Markets, 31(3):685–695, 2021.
[22] Carlos Oscar Sánchez Sorzano, Javier Vargas, and A Pascual Montano. A survey of
dimensionality reduction techniques. arXiv preprint arXiv:1403.2877, 2014.
[23] Jipeng Qiang, Zhenyu Qian, Yun Li, Yunhao Yuan, and Xindong Wu. Short text topic
modeling techniques, applications, and performance: a survey. IEEE Transactions on
Knowledge and Data Engineering, 34(3):1427–1445, 2020.
[24] [The link to the application must be added here.]

6 pages
Unit 3: Internal Control Over Cash
No ratings yet
Unit 3: Internal Control Over Cash
10 pages
PST Unit 3
No ratings yet
PST Unit 3
7 pages
Saic D 2025
No ratings yet
Saic D 2025
12 pages
Ahern Et Al 2018 A Cost Effectiveness Analysis of School-Based Suicide Prevention Programs (Germany)
No ratings yet
Ahern Et Al 2018 A Cost Effectiveness Analysis of School-Based Suicide Prevention Programs (Germany)
11 pages
Chevy High Performance - July 2015 USA
100% (1)
Chevy High Performance - July 2015 USA
92 pages
Roof Framing Plan: FT-1 FT-1
No ratings yet
Roof Framing Plan: FT-1 FT-1
1 page
Presentation by
No ratings yet
Presentation by
10 pages
Punong Barangay Tasks and Responsibilities 2018 PDF
100% (1)
Punong Barangay Tasks and Responsibilities 2018 PDF
56 pages
Basic Accounting Notes
No ratings yet
Basic Accounting Notes
2 pages
MSC Economics Dissertation Example
100% (2)
MSC Economics Dissertation Example
7 pages
Report by KKParthiban On Boiler Explosion of A Shell Type High PR Boiler
100% (2)
Report by KKParthiban On Boiler Explosion of A Shell Type High PR Boiler
97 pages
Noticia Ingles
No ratings yet
Noticia Ingles
2 pages
BRE Modern Methods of Construction
No ratings yet
BRE Modern Methods of Construction
10 pages
SRM Digest 2010
No ratings yet
SRM Digest 2010
208 pages
Spyder Student Excel
No ratings yet
Spyder Student Excel
21 pages
CS1311A Lecture 4 - Computer Software
No ratings yet
CS1311A Lecture 4 - Computer Software
39 pages
Lubuntu - Lightweight, Fast, Easier PDF
No ratings yet
Lubuntu - Lightweight, Fast, Easier PDF
4 pages
Hamza Masoud C.V - "I&C" Instrument and Control Specialist
No ratings yet
Hamza Masoud C.V - "I&C" Instrument and Control Specialist
3 pages
Mandate Letter Valuation Services
No ratings yet
Mandate Letter Valuation Services
4 pages
FINAL_Rule for Opening_till Amendment dated 18_09_2024
No ratings yet
FINAL_Rule for Opening_till Amendment dated 18_09_2024
74 pages
GRDJEV06I070016
No ratings yet
GRDJEV06I070016
7 pages
EXP-3
No ratings yet
EXP-3
3 pages
Understanding Data Warehousing and Data Mining
No ratings yet
Understanding Data Warehousing and Data Mining
7 pages


2. Background and Motivation

SMS Spam Prevalence in the Democratic Republic of Congo (DRC)

Addressing SMS spam in the DRC requires a multi-pronged approach. By combining

3. Data Collection and Preprocessing

Description of dataset used for training and evaluation.

For Swahili, we obtained Table 2 from the survey:

It was done in five steps:

Figure 1. Data: Raw vs Clean (panels: Raw data, Clean data)

Figure 2. Clean data: Ham vs Spam

4. Tools and Feature Engineering

Tools and frameworks

5. Model Selection, Evaluation and Optimisation

Different machine learning algorithms for spam classification

Genre and problem solving

Comparing different machine learning algorithms for spam classification.

Figure 3. AUC: Languages vs ML algorithms (panels: English, French, Swahili)

Figure 4. Accuracy: Languages vs ML algorithms (panels: English, French, Swahili)

GridSearchCV: it performs an exhaustive search over specified parameter values for an
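The idea behind an exhaustive grid search can be illustrated with a minimal, standard-library-only sketch. It is not scikit-learn's GridSearchCV and not the authors' pipeline: the toy messages, the keyword-based classifier, and the parameter names `min_count` and `threshold` are all hypothetical, chosen only to show how every combination in a parameter grid is scored by cross-validation and the best one is kept.

```python
from itertools import product

# Hypothetical toy data: (message, label) with 1 = spam, 0 = ham.
MESSAGES = [
    ("win free cash now", 1),
    ("free prize claim now", 1),
    ("urgent win prize today", 1),
    ("claim your free bonus", 1),
    ("see you at lunch", 0),
    ("call me when you arrive", 0),
    ("meeting moved to monday", 0),
    ("thanks for your help today", 0),
]


def fit(train, min_count):
    """Keep words seen at least min_count times in spam and never in ham."""
    spam_counts, ham_words = {}, set()
    for text, label in train:
        for word in text.split():
            if label == 1:
                spam_counts[word] = spam_counts.get(word, 0) + 1
            else:
                ham_words.add(word)
    return {w for w, c in spam_counts.items()
            if c >= min_count and w not in ham_words}


def predict(spam_words, text, threshold):
    """Flag a message as spam when it holds at least `threshold` spam words."""
    hits = sum(1 for w in text.split() if w in spam_words)
    return 1 if hits >= threshold else 0


def cross_val_accuracy(data, params, k=4):
    """Mean accuracy over k folds for one parameter combination."""
    folds = [data[i::k] for i in range(k)]
    scores = []
    for i, held_out in enumerate(folds):
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        spam_words = fit(train, params["min_count"])
        correct = sum(predict(spam_words, t, params["threshold"]) == y
                      for t, y in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / k


def grid_search(data, param_grid, k=4):
    """Score every combination in param_grid and return the best one."""
    names = sorted(param_grid)
    results = []
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        results.append((cross_val_accuracy(data, params, k), params))
    best_score, best_params = max(results, key=lambda r: r[0])
    return best_params, best_score, results


PARAM_GRID = {"min_count": [1, 2], "threshold": [1, 2, 3]}
best_params, best_score, results = grid_search(MESSAGES, PARAM_GRID)
print(best_params, round(best_score, 3))
```

The cost grows with the product of the grid sizes (here 2 x 3 = 6 combinations, each evaluated on 4 folds), which is why the grid is usually kept small.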

6. Model Deployment and Integration

Figure 6. Deployment model

7. Study Limitations and Future Research Paths
