Fraud Detection in E-commerce Credit Transactions

Overview

This project detects fraudulent transactions in e-commerce and credit card transaction data using machine learning. The workflow covers data analysis, preprocessing, model building, evaluation, and model explainability with SHAP.

Data Sources

  • Fraud_Data.csv: Contains transaction and user information with a 'class' label indicating fraud.
  • IpAddress_to_Country.csv: Maps IP address ranges to countries for geolocation analysis.
  • creditcard.csv: Standard credit card fraud dataset with a 'Class' label.
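
To sanity-check the files after downloading them, here is a minimal loading sketch with pandas. The label column names 'class' and 'Class' come from the descriptions above; the paths assume the data/ layout described under Setup Instructions.

import pandas as pd

# Load the three raw files from the data/ directory
fraud = pd.read_csv("data/Fraud_Data.csv")               # e-commerce transactions, 'class' label
ip_map = pd.read_csv("data/IpAddress_to_Country.csv")    # IP range -> country mapping
credit = pd.read_csv("data/creditcard.csv")              # credit card transactions, 'Class' label

# Fraud labels are heavily skewed, which motivates AUC-PR as an evaluation metric later on
print(fraud["class"].value_counts(normalize=True))
print(credit["Class"].value_counts(normalize=True))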

Project Structure

fraud-detection-ecommerce-credit/
  ├── data/                  # Raw data files
  ├── notebooks/             # Jupyter notebooks for analysis
  ├── output/                # Output files and results
  ├── scripts/               # (Optional) Python scripts
  ├── src/                   # Source code for preprocessing, etc.
  └── README.md              # Project documentation

Setup Instructions

  1. Clone the repository and navigate to the project directory.
  2. Install dependencies (recommended: use a virtual environment):
    pip install pandas numpy matplotlib seaborn scikit-learn shap missingno
    # For XGBoost or LightGBM, install as needed:
    pip install xgboost lightgbm
  3. Download the data files and place them in the data/ directory.
  4. Open the main notebook:
    • notebooks/01_data_analysis_preprocessing.ipynb

How to Run

  • Launch Jupyter Notebook:
    jupyter notebook
  • Open the notebook and run cells sequentially.

Key Tasks

1. Data Analysis and Preprocessing

  • Handle missing values (impute/drop)
  • Data cleaning (remove duplicates, correct types)
  • Exploratory Data Analysis (EDA)
  • Merge datasets for geolocation (IP-to-country range join; see the sketch after this list)
  • Feature engineering (transaction frequency, time-based features)
  • Data transformation (class imbalance, scaling, encoding)
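
The geolocation merge is the least obvious of these steps, because IpAddress_to_Country.csv maps IP ranges rather than exact addresses. Below is a minimal range-join sketch with pandas; the column names ip_address, lower_bound_ip_address, upper_bound_ip_address, and country are assumptions made for illustration, not documented in this README.

import pandas as pd

fraud = pd.read_csv("data/Fraud_Data.csv")
ip_map = pd.read_csv("data/IpAddress_to_Country.csv")

# Assumed columns: fraud['ip_address'] (numeric), ip_map['lower_bound_ip_address'],
# ip_map['upper_bound_ip_address'], ip_map['country'].
# Cast both join keys to a common integer type so merge_asof can compare them.
fraud["ip_address"] = fraud["ip_address"].astype("int64")
ip_map["lower_bound_ip_address"] = ip_map["lower_bound_ip_address"].astype("int64")

# merge_asof needs both frames sorted on their keys; it matches each transaction
# to the closest lower bound at or below its IP, and the upper bound is checked afterwards.
fraud = fraud.sort_values("ip_address")
ip_map = ip_map.sort_values("lower_bound_ip_address")

merged = pd.merge_asof(
    fraud,
    ip_map,
    left_on="ip_address",
    right_on="lower_bound_ip_address",
    direction="backward",
)

# IPs that fall outside the matched range get no country
outside = merged["ip_address"] > merged["upper_bound_ip_address"]
merged.loc[outside, "country"] = "Unknown"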

2. Model Building and Training

  • Train-test split
  • Model selection: Logistic Regression (baseline), Random Forest or XGBoost (ensemble)
  • Model evaluation: AUC-PR, F1-Score, Confusion Matrix (see the sketch after this list)
  • Model comparison and justification
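
A minimal training-and-evaluation sketch for these choices is shown below. The make_classification call is only a stand-in for the preprocessed feature matrix and label, so the printed numbers are meaningless for the real data; swap in the output of the preprocessing step.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import average_precision_score, f1_score, confusion_matrix

# Stand-in for the preprocessed, encoded feature matrix and fraud label
X, y = make_classification(n_samples=5000, n_features=20, weights=[0.95, 0.05], random_state=42)

# Stratify so the rare fraud class keeps the same proportion in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

models = {
    "Logistic Regression (baseline)": LogisticRegression(max_iter=1000, class_weight="balanced"),
    "Random Forest (ensemble)": RandomForestClassifier(n_estimators=200, class_weight="balanced", random_state=42),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    proba = model.predict_proba(X_test)[:, 1]
    pred = model.predict(X_test)
    print(name)
    print("  AUC-PR:", average_precision_score(y_test, proba))   # robust under class imbalance
    print("  F1:    ", f1_score(y_test, pred))
    print("  Confusion matrix:")
    print(confusion_matrix(y_test, pred))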

3. Model Explainability

  • Use SHAP to interpret the best model (a minimal sketch follows this list)
  • Generate summary and force plots
  • Discuss key drivers of fraud
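
Below is a minimal SHAP sketch for a tree-based model, reusing the fitted random forest and test split from the previous sketch. The return shape of shap_values differs across SHAP versions, so the fraud-class selection is handled defensively.

import numpy as np
import shap

rf = models["Random Forest (ensemble)"]    # fitted model from the previous sketch
sample = X_test[:200]                      # a small sample keeps the plots responsive

explainer = shap.TreeExplainer(rf)
shap_values = explainer.shap_values(sample)

# For classifiers, older SHAP versions return a list with one array per class,
# newer versions a (samples, features, classes) array; select the fraud class.
if isinstance(shap_values, list):
    fraud_sv, base_value = shap_values[1], explainer.expected_value[1]
elif np.ndim(shap_values) == 3:
    fraud_sv, base_value = shap_values[:, :, 1], explainer.expected_value[1]
else:
    fraud_sv, base_value = shap_values, explainer.expected_value

# Global view: which features push predictions towards fraud, and in which direction
shap.summary_plot(fraud_sv, sample)

# Local view: force plot explaining a single transaction's score
shap.force_plot(base_value, fraud_sv[0], sample[0], matplotlib=True)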

Requirements

  • Python 3.7+
  • pandas, numpy, matplotlib, seaborn
  • scikit-learn
  • shap
  • missingno
  • xgboost or lightgbm (optional, for ensemble models)

Notes

  • Ensure all data files are present in the data/ directory before running the notebook.
  • Large datasets may require substantial memory and processing time.

License

This project is for educational purposes.
