This repository contains a fraud detection analysis that predicts fraudulent transactions made to retail shops from their transaction history. It identifies the best combination of sampling technique and machine learning model for fraud detection by comparing their quality with precision-recall (PR) curves.
Link: fraud transaction analysis
-
Increased the detection accuracy for genuine transactions by 47% after selecting the best model and data pre-processing pipeline.
-
Analyzed, cleaned, and pre-processed a dataset of 41,989 records, then assessed it using various combinations of sampling techniques and machine learning models.
-
Instead of relying on a single model, combined 3 distinct models with 3 sampling methods, creating 9 combinations from which to select the best one.
-
Imported the required libraries and prepared the dataset for analysis: filtered the important variables and structured the data in MS Excel.
-
Started with data cleaning: instead of removing incomplete records, replaced missing values with the median of each column to preserve records and yield more accurate insights.
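A minimal sketch of this median-imputation step with pandas; the column names and values are illustrative, not the project's actual schema:

```python
import pandas as pd
import numpy as np

# Hypothetical transaction records with missing values.
df = pd.DataFrame({
    "amount": [120.0, np.nan, 75.5, 310.0, np.nan],
    "items":  [2, 4, np.nan, 1, 3],
})

# Replace each missing value with the median of its column
# instead of dropping the incomplete record.
df = df.fillna(df.median(numeric_only=True))
```

Median imputation is preferred over mean imputation here because transaction amounts are typically skewed, and the median is robust to outliers.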
-
During pre-processing, converted categorical data into numerical data for modelling; 5 categorical variables were converted.
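One way to sketch the categorical-to-numerical step with pandas; the two columns below are hypothetical stand-ins for the project's 5 categorical variables:

```python
import pandas as pd

# Illustrative categorical columns (not the project's actual variables).
df = pd.DataFrame({
    "payment_type": ["card", "cash", "card", "online"],
    "store_region": ["north", "south", "north", "east"],
})

# factorize() assigns each distinct category an integer code per column.
for col in ["payment_type", "store_region"]:
    df[col] = pd.factorize(df[col])[0]
```

For models sensitive to ordinal relationships, one-hot encoding via `pd.get_dummies` is a common alternative to integer codes.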
-
Applied sampling to compensate for the under-represented fraud class; the project makes use of over-sampling, under-sampling, and SMOTE.
-
Each of the 3 resampled datasets is processed with 3 models to detect fraudulent transactions, yielding 9 sets of predictions.
-
Used PR curves to study each model's behaviour; precision, recall, and F1 score are compared for each model.
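The PR-curve comparison can be sketched with scikit-learn's metrics; the data and classifier below are illustrative assumptions, not the project's actual setup:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (precision_recall_curve, precision_score,
                             recall_score, f1_score)

# Synthetic imbalanced data and a stand-in classifier.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# PR curve: precision/recall at every probability threshold.
scores = model.predict_proba(X_te)[:, 1]
precision, recall, thresholds = precision_recall_curve(y_te, scores)

# Point metrics at the default 0.5 threshold, for side-by-side comparison.
y_pred = model.predict(X_te)
p = precision_score(y_te, y_pred)
r = recall_score(y_te, y_pred)
f1 = f1_score(y_te, y_pred)
```

PR curves suit fraud detection better than ROC curves because, with a rare positive class, precision is far more sensitive to false alarms than the false-positive rate is.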