End-to-end machine learning pipeline for fraud detection with real-time simulation, data drift monitoring, and interactive dashboards.
breeboost/
├── src/
│   ├── inference.py              # Model inference logic
│   ├── utils/
│   │   └── logger.py             # Logging utility
│   └── ...
├── monitoring/
│   ├── reference.csv             # Clean data from training
│   ├── production.csv            # New data + predictions
│   ├── report.py                 # Drift detection script
│   └── reports/
│       └── report.html           # Generated drift report
├── data/
│   └── processed/
│       └── paysim_cleaned.csv    # Cleaned dataset
├── app.py                        # Streamlit fraud dashboard
├── requirements.txt
└── README.md
flowchart TD
A[Cleaned Data] --> B[Model Inference]
B --> C[Predictions + Probabilities]
A --> D[Reference Data]
C --> E[Drift Detection]
D --> E
E --> F[HTML Report]
style A fill:#f9f,stroke:#333,stroke-width:1px
style B fill:#bbf,stroke:#333,stroke-width:1px
style C fill:#bfb,stroke:#333,stroke-width:1px
style D fill:#bbf,stroke:#333,stroke-width:1px
style E fill:#ffb,stroke:#333,stroke-width:1px
style F fill:#fc9,stroke:#333,stroke-width:1px
- Trained XGBoost classifier
- Input validation and prediction interface
- Outputs label + fraud probability
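For orientation, here is a minimal sketch of what that inference interface could look like, assuming the classifier is serialized with joblib; `predict_transaction`, the model path, and the default threshold are illustrative assumptions, not the module's actual API:

```python
# Hypothetical sketch of the inference interface; names and paths are assumptions.
import joblib
import pandas as pd

FEATURES = [
    "amount", "oldbalanceOrg", "newbalanceOrig",
    "errorBalanceOrig", "errorBalanceDest",
    "hour", "day", "is_large_transaction",
]

def predict_transaction(row: dict,
                        model_path: str = "models/xgb_model.joblib",  # assumed location
                        threshold: float = 0.5):
    """Validate one transaction and return (label, fraud_probability)."""
    missing = set(FEATURES) - set(row)
    if missing:
        raise ValueError(f"Missing features: {sorted(missing)}")

    model = joblib.load(model_path)
    X = pd.DataFrame([row], columns=FEATURES)
    proba = float(model.predict_proba(X)[0, 1])   # probability of the fraud class
    return int(proba >= threshold), proba
```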
- Simulate transactions with input form
- View prediction results and probabilities
- Visualize important features and correlations
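A hedged sketch of how the form-to-prediction flow in `app.py` might be wired up; the widget set, the derived `errorBalance*` values, and the `is_large_transaction` cutoff are assumptions for illustration, not the dashboard's actual code:

```python
# Illustrative Streamlit form flow, assuming the predict_transaction sketch above.
import streamlit as st
from src.inference import predict_transaction  # assumed helper name

st.title("Fraud Detection Dashboard")

with st.form("transaction"):
    amount = st.number_input("Amount", min_value=0.0)
    old_balance = st.number_input("Origin balance before", min_value=0.0)
    new_balance = st.number_input("Origin balance after", min_value=0.0)
    hour = st.slider("Hour of day", 0, 23)
    submitted = st.form_submit_button("Predict")

if submitted:
    row = {
        "amount": amount,
        "oldbalanceOrg": old_balance,
        "newbalanceOrig": new_balance,
        # Assumed derivations; the repo's feature engineering may differ.
        "errorBalanceOrig": new_balance + amount - old_balance,
        "errorBalanceDest": 0.0,   # placeholder when destination fields are unknown
        "hour": hour,
        "day": 1,
        "is_large_transaction": int(amount > 200_000),
    }
    label, proba = predict_transaction(row)
    st.metric("Fraud probability", f"{proba:.2%}")
    st.write("Prediction:", "FRAUD" if label else "legitimate")
```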
- Compares training vs production data
- Visualizes drift for numerical features
- Outputs full HTML diagnostics
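The drift report itself takes only a few lines; this sketch assumes Evidently's `Report`/`DataDriftPreset` API (0.3+) and the file layout shown above, and may differ from the repo's actual `monitoring/report.py`:

```python
# Sketch of a data drift report comparing training vs production data.
import pandas as pd
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

reference = pd.read_csv("monitoring/reference.csv")     # clean data from training
production = pd.read_csv("monitoring/production.csv")   # new data + predictions

report = Report(metrics=[DataDriftPreset()])
report.run(reference_data=reference, current_data=production)
report.save_html("monitoring/reports/report.html")
```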
Install dependencies:
`pip install -r requirements.txt`
Run inference on the cleaned dataset:
`python src/inference.py`
Extract the reference dataset for drift monitoring:
`python src/utils/extract_ref.py`
Use the dashboard or inference module to generate rows for `production.csv`.
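As an illustration of that step, one way to append scored transactions to `monitoring/production.csv`; this assumes the hypothetical `predict_transaction` helper sketched earlier, not the repo's actual logging code:

```python
# Illustrative only: score a transaction and append it to production.csv.
import os
import pandas as pd
from src.inference import predict_transaction  # assumed helper name

def log_prediction(row: dict, path: str = "monitoring/production.csv") -> None:
    label, proba = predict_transaction(row)
    record = {**row, "prediction": label, "fraud_probability": proba}
    # Write the header only when the file does not exist yet.
    pd.DataFrame([record]).to_csv(
        path, mode="a", index=False, header=not os.path.exists(path)
    )
```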
Generate the drift report:
`python monitoring/report.py`
Launch the dashboard:
`streamlit run app.py`
Model feature set: `amount`, `oldbalanceOrg`, `newbalanceOrig`, `errorBalanceOrig`, `errorBalanceDest`, `hour`, `day`, `is_large_transaction`.
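The `errorBalance*`, `hour`, `day`, and `is_large_transaction` columns are engineered features. A common way to derive them from the raw PaySim columns looks like the sketch below; this is an assumption about the preprocessing, not a description of the repo's actual pipeline:

```python
# Assumed feature engineering for PaySim; thresholds and formulas are illustrative.
import pandas as pd

def engineer_features(df: pd.DataFrame, large_threshold: float = 200_000) -> pd.DataFrame:
    out = df.copy()
    # Discrepancy between reported balances and the transaction amount
    out["errorBalanceOrig"] = out["newbalanceOrig"] + out["amount"] - out["oldbalanceOrg"]
    out["errorBalanceDest"] = out["oldbalanceDest"] + out["amount"] - out["newbalanceDest"]
    # PaySim's `step` column counts hours since the start of the simulation
    out["hour"] = out["step"] % 24
    out["day"] = out["step"] // 24
    # Flag unusually large transactions (cutoff is an assumption)
    out["is_large_transaction"] = (out["amount"] > large_threshold).astype(int)
    return out
```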
Dependencies:
- Python 3.8+
- pandas, xgboost, joblib
- evidently, seaborn, matplotlib
- streamlit
Install everything:
`pip install -r requirements.txt`
Planned improvements:
- Performance drift monitoring (F1, recall)
- Notification triggers (e.g. Slack alerts)
- CI/CD with GitHub Actions
- Dockerization + cloud deployment
- Add feature importance explanation (e.g. SHAP)
- Run scripts from the project root so that relative paths resolve.
- Check `monitoring/reports/report.html` regularly to evaluate input stability.
- You can customize the model threshold and feature set in `inference.py`.
This project is licensed under the MIT License.
