This project detects fraudulent transactions in e-commerce and credit card data using machine learning. The workflow covers data analysis, preprocessing, model building, evaluation, and model explainability with SHAP.
- Fraud_Data.csv: contains transaction and user information with a 'class' label indicating fraud.
- IpAddress_to_Country.csv: maps IP address ranges to countries for geolocation analysis.
- creditcard.csv: the standard credit card fraud dataset with a 'Class' label.
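A minimal loading sketch, assuming the three files have been placed in the data/ directory described in the setup steps below:

```python
import pandas as pd

fraud = pd.read_csv("data/Fraud_Data.csv")
ip_to_country = pd.read_csv("data/IpAddress_to_Country.csv")
creditcard = pd.read_csv("data/creditcard.csv")

# Check the class balance up front; both labelled datasets are
# heavily imbalanced, which drives the modeling choices below
print(fraud["class"].value_counts(normalize=True))
print(creditcard["Class"].value_counts(normalize=True))
```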
fraud-detection-ecommerce-credit/
├── data/ # Raw data files
├── notebooks/ # Jupyter notebooks for analysis
├── output/ # Output files and results
├── scripts/ # (Optional) Python scripts
├── src/ # Source code for preprocessing, etc.
└── README.md # Project documentation
- Clone the repository and navigate to the project directory.
- Install dependencies (recommended: use a virtual environment):
pip install pandas numpy matplotlib seaborn scikit-learn shap missingno
# For XGBoost or LightGBM, install as needed:
pip install xgboost lightgbm
- Download the data files and place them in the data/ directory.
- Launch Jupyter Notebook:
jupyter notebook
- Open the main notebook and run the cells sequentially:
notebooks/01_data_analysis_preprocessing.ipynb
- Handle missing values (impute/drop)
- Data cleaning (remove duplicates, correct types)
- Exploratory Data Analysis (EDA)
- Merge datasets for geolocation (join each transaction's IP to a country range; see the first sketch after this list)
- Feature engineering (transaction frequency, time-based features; second sketch after this list)
- Data transformation (class-imbalance handling, scaling, encoding)
- Train-test split
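The geolocation merge is a range join: each transaction's IP must fall between a row's lower and upper bound in IpAddress_to_Country.csv. A minimal sketch using pandas.merge_asof; the column names (ip_address, lower_bound_ip_address, upper_bound_ip_address, country) are assumptions based on the common version of this dataset:

```python
import pandas as pd

fraud = pd.read_csv("data/Fraud_Data.csv")
ip_map = pd.read_csv("data/IpAddress_to_Country.csv")

# Cast IPs to a common integer type; merge_asof requires sorted keys
fraud["ip_address"] = fraud["ip_address"].astype("int64")
ip_map["lower_bound_ip_address"] = ip_map["lower_bound_ip_address"].astype("int64")
fraud = fraud.sort_values("ip_address")
ip_map = ip_map.sort_values("lower_bound_ip_address")

# merge_asof matches each IP to the closest lower bound at or below it
merged = pd.merge_asof(fraud, ip_map,
                       left_on="ip_address", right_on="lower_bound_ip_address")

# Discard matches where the IP overshoots the matched range's upper bound
out_of_range = merged["ip_address"] > merged["upper_bound_ip_address"]
merged.loc[out_of_range, "country"] = None
merged["country"] = merged["country"].fillna("Unknown")
```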
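A sketch of the feature engineering, encoding, split, and scaling steps on the merged frame. The column names (signup_time, purchase_time, device_id, purchase_value, source, browser) are again assumptions based on the common version of this dataset; class imbalance itself is handled in the modeling sketch via class weights, with resampling as an alternative:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

df = merged  # from the previous sketch
df["signup_time"] = pd.to_datetime(df["signup_time"])
df["purchase_time"] = pd.to_datetime(df["purchase_time"])

# Time-based features
df["hour_of_day"] = df["purchase_time"].dt.hour
df["day_of_week"] = df["purchase_time"].dt.dayofweek
df["time_since_signup"] = (df["purchase_time"] - df["signup_time"]).dt.total_seconds()

# Transaction frequency: how many transactions share the same device
df["device_tx_count"] = df.groupby("device_id")["device_id"].transform("count")

# One-hot encode categoricals; split before scaling to avoid leakage
features = ["purchase_value", "hour_of_day", "day_of_week",
            "time_since_signup", "device_tx_count", "source", "browser"]
X = pd.get_dummies(df[features], columns=["source", "browser"])
y = df["class"]

# Stratify so the rare fraud class keeps its proportion in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Fit the scaler on training data only
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```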
- Model selection: Logistic Regression (baseline), Random Forest or XGBoost (ensemble)
- Model evaluation: AUC-PR, F1-score, confusion matrix (sketched after this list)
- Model comparison and justification
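A sketch of the baseline-versus-ensemble comparison using the split above. AUC-PR is computed as average precision, and class_weight='balanced' is one way to address the imbalance (resampling is another):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, confusion_matrix, f1_score

baseline = LogisticRegression(max_iter=1000, class_weight="balanced")
baseline.fit(X_train_scaled, y_train)

ensemble = RandomForestClassifier(n_estimators=200, class_weight="balanced",
                                  random_state=42)
ensemble.fit(X_train, y_train)  # tree models do not need scaled inputs

for name, model, X_eval in [("LogisticRegression", baseline, X_test_scaled),
                            ("RandomForest", ensemble, X_test)]:
    proba = model.predict_proba(X_eval)[:, 1]
    preds = model.predict(X_eval)
    # AUC-PR (average precision) is more informative than ROC-AUC
    # when the positive class is rare
    print(name,
          "AUC-PR:", round(average_precision_score(y_test, proba), 4),
          "F1:", round(f1_score(y_test, preds), 4))
    print(confusion_matrix(y_test, preds))
```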
- Use SHAP to interpret the best model (a sketch follows this list)
- Generate summary and force plots
- Discuss key drivers of fraud
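A minimal SHAP sketch, assuming the best model is a fitted tree ensemble such as the RandomForest above (the same calls apply to XGBoost):

```python
import numpy as np
import shap

explainer = shap.TreeExplainer(ensemble)
shap_values = explainer.shap_values(X_test)

# shap's return shape varies by version for binary classifiers;
# keep the values for the positive (fraud) class either way
if isinstance(shap_values, list):       # older shap: list of per-class arrays
    shap_values = shap_values[1]
elif shap_values.ndim == 3:             # newer shap: (rows, features, classes)
    shap_values = shap_values[:, :, 1]

# Global view: which features drive fraud predictions overall
shap.summary_plot(shap_values, X_test)

# Local view: why a single transaction was scored the way it was
base = explainer.expected_value
if isinstance(base, (list, np.ndarray)):
    base = base[1]
shap.force_plot(base, shap_values[0], X_test.iloc[0], matplotlib=True)
```

The summary plot supplies the key drivers of fraud for the discussion; force plots explain individual transactions.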
- Python 3.7+
- pandas, numpy, matplotlib, seaborn
- scikit-learn
- shap
- missingno
- xgboost or lightgbm (optional, for ensemble models)
- Ensure all data files are present in the data/ directory before running the notebook.
- For large datasets, ensure sufficient memory and processing power.
This project is for educational purposes.