Automated ML web application built in Python. Features include dynamic data cleaning (One-Hot Encoding/Scaling), interactive EDA with Plotly, and model training via Scikit-Learn or custom algorithms. Designed to demonstrate robust backend logic, state management, and algorithmic understanding through a simple, drag-and-drop interface.


ML-pipeline-streamlit

🚀 No-Code ML Pipeline Builder

Live demo: https://ml-pipeline-app-aditi79.streamlit.app/

A robust, web-based AutoML tool that allows users to upload raw data, process it, train models, and visualize results—without writing a single line of code.

Designed with a focus on Machine Learning Engineering principles, this project bridges the gap between complex algorithmic logic and intuitive user experience.

📖 Overview

The objective was to build an end-to-end ML pipeline that is accessible to non-technical users but powerful enough to handle real-world, "dirty" datasets.

Unlike standard tutorial scripts, this application implements a dynamic preprocessing engine that automatically detects data types, handles missing values, and encodes categorical features, ensuring the pipeline never crashes on arbitrary user data.
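The type-detection step of such an engine can be sketched with pandas dtype inspection. A minimal illustration (the column names and values below are hypothetical, not taken from the project):

```python
import pandas as pd

# Hypothetical raw upload: mixed numeric/categorical columns with missing values.
df = pd.DataFrame({
    "age": [22, None, 35],
    "fare": [7.25, 71.83, 8.05],
    "sex": ["male", "female", None],
})

# Split columns by inferred dtype, as the preprocessing engine must do
# before routing them to the numeric vs. categorical pipelines.
numeric_cols = df.select_dtypes(include="number").columns.tolist()
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()
```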

Key Features

  1. Universal Dataset Support: Upload any CSV/Excel. The system automatically identifies Target vs. Features and Text vs. Numbers.

  2. Robust Preprocessing:

    Automated Imputation (Mean for numerical, Mode for categorical).

    Dynamic One-Hot Encoding and Standardization/Normalization.

  3. Algorithm "Zoo":

    Standard: Logistic Regression & Decision Trees (Scikit-Learn).

    Custom Implementation: A multiclass logistic regression built from scratch using NumPy to demonstrate the underlying mathematics (Gradient Descent, Cross-Entropy Loss).

  4. Interactive Visualizations:

    Plotly Heatmaps for Confusion Matrices.

    Tree Structure Visualization for Decision Tree interpretability.

    Feature Importance analysis.
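The imputation strategy in feature 2 (mean for numerical columns, mode for categorical) can be sketched with Scikit-Learn's SimpleImputer on toy data (the values below are illustrative):

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Mean imputation for a numerical column: NaN becomes the column mean.
num = np.array([[1.0], [np.nan], [3.0]])
num_filled = SimpleImputer(strategy="mean").fit_transform(num)

# Mode (most frequent) imputation for a categorical column.
cat = np.array([["red"], [np.nan], ["red"]], dtype=object)
cat_filled = SimpleImputer(strategy="most_frequent").fit_transform(cat)
```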

🛠 Technical Architecture

The application is engineered for stability and mathematical correctness.

The Preprocessing Pipeline (ColumnTransformer)

To handle raw user data safely, I implemented a split-path pipeline using Scikit-Learn’s ColumnTransformer. This ensures that:

  1. Numerical columns are isolated, imputed, and scaled (Standard or MinMax).
  2. Categorical columns are isolated, imputed, and One-Hot Encoded.
  3. The paths merge back together into a generic NumPy array ready for training.

Architecture Snippet

from sklearn.compose import ColumnTransformer

# Route numeric and categorical columns through separate pipelines;
# any column not explicitly listed is dropped.
preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, numeric_cols),
        ('cat', categorical_transformer, categorical_cols)
    ],
    remainder='drop'
)
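The numeric_transformer and categorical_transformer referenced in the snippet are not shown; a plausible construction, consistent with the imputation and encoding described above (the exact hyperparameters and column lists are assumptions), would be:

```python
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer

# Numeric path: mean-impute, then standardize.
numeric_transformer = Pipeline([
    ("imputer", SimpleImputer(strategy="mean")),
    ("scaler", StandardScaler()),
])

# Categorical path: mode-impute, then one-hot encode.
# handle_unknown="ignore" keeps inference from crashing on unseen categories.
categorical_transformer = Pipeline([
    ("imputer", SimpleImputer(strategy="most_frequent")),
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
])

numeric_cols = ["age"]        # hypothetical column lists
categorical_cols = ["sex"]
preprocessor = ColumnTransformer(
    transformers=[
        ("num", numeric_transformer, numeric_cols),
        ("cat", categorical_transformer, categorical_cols),
    ],
    remainder="drop",
)
```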
Custom ML Implementation (The "From Scratch" Engine)

While Scikit-Learn is used for production reliability, I implemented a custom Logistic Regression class to demonstrate an understanding of the core algorithms:

  1. Optimization: Batch Gradient Descent.
  2. Loss Function: Log-Loss (Binary Cross-Entropy) with L1/L2 Regularization support.
  3. Multiclass Strategy: One-vs-Rest (OvR) wrapper around the binary classifier to handle multi-class datasets (e.g., Iris).
  4. Initialization: Xavier/Glorot Initialization for weight stability.
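The project's class is not reproduced here, but its core loop can be sketched: batch gradient descent on binary cross-entropy with Xavier-style initialization. All names and hyperparameters below are illustrative, not the project's actual implementation:

```python
import numpy as np

class BinaryLogistic:
    """Minimal batch-gradient-descent logistic regression (illustrative)."""

    def __init__(self, lr=0.1, n_iter=500, seed=0):
        self.lr, self.n_iter = lr, n_iter
        self.rng = np.random.default_rng(seed)

    @staticmethod
    def _sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def fit(self, X, y):
        n, d = X.shape
        # Xavier/Glorot-style init: scale weights by 1/sqrt(fan_in).
        self.w = self.rng.normal(0.0, 1.0 / np.sqrt(d), d)
        self.b = 0.0
        for _ in range(self.n_iter):
            p = self._sigmoid(X @ self.w + self.b)
            # Gradient of mean binary cross-entropy w.r.t. weights and bias.
            grad_w = X.T @ (p - y) / n
            grad_b = np.mean(p - y)
            self.w -= self.lr * grad_w
            self.b -= self.lr * grad_b
        return self

    def predict_proba(self, X):
        return self._sigmoid(X @ self.w + self.b)
```

A One-vs-Rest wrapper would fit one such binary classifier per class and predict the argmax of the per-class probabilities.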

📸 Visuals & Insights

The application prioritizes interpretability:

  1. Decision Tree Plotting: Uses matplotlib to render the actual tree structure, allowing users to trace the decision logic of the model.

  2. Interactive Confusion Matrix: A Plotly heatmap that allows users to hover over specific errors to understand False Positives/Negatives.

  3. Exploratory Data Analysis (EDA): Automatic correlation heatmaps and target distribution checks before training begins.
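The interactive confusion matrix can be sketched as follows. The matrix itself is computed in plain NumPy, and the Plotly rendering is guarded so the sketch runs even without Plotly installed; the labels and data are illustrative:

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    # cm[i, j] = count of samples with true class i predicted as class j.
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, n_classes=3)

try:
    import plotly.graph_objects as go
    # Hovering a heatmap cell shows its true/predicted pair and count.
    fig = go.Figure(go.Heatmap(z=cm,
                               x=["pred 0", "pred 1", "pred 2"],
                               y=["true 0", "true 1", "true 2"]))
except ImportError:
    fig = None  # Plotly not available; the matrix is still usable.
```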

⚙️ Installation & Usage

Prerequisites: Python 3.8+

Clone the Repository

git clone https://github.com/kapooraditi79/ML-pipeline-streamlit.git
cd ML-pipeline-streamlit

Install Dependencies

pip install -r requirements.txt

Run the Application

streamlit run app.py

How to Use

  1. Upload a .csv file (e.g., Titanic, Iris, or Customer Churn).
  2. Select your Target Column in the sidebar.
  3. Adjust Preprocessing (Scaling) and Model Hyperparameters.
  4. Click Run Pipeline.

🧠 Design Philosophy (Why Streamlit?)

As an ML Engineer, my primary focus is on the model architecture and data integrity rather than frontend boilerplate.

I chose Streamlit because it allows for rapid prototyping of data applications. It enables the creation of a clean, functional UI while keeping the codebase 100% Python, facilitating easier integration with backend ML libraries like PyTorch, TensorFlow, or Scikit-Learn in the future.

This approach mimics real-world industry workflows where ML Engineers build "Proof of Concept" (PoC) apps to demonstrate model value to stakeholders before handing off to full-stack teams for scaling.

🚀 Future Roadmap

[ ] Model Persistence: Add functionality to download the trained model as a .pkl file.

[ ] Deep Learning Support: Add a simple Neural Network builder (PyTorch) for more complex datasets.

[ ] Auto-Model Selection: Implement a "Search" feature that runs multiple models and picks the best one automatically (AutoML).
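The first roadmap item could be built on the standard pickle module; in Streamlit, the serialized bytes would feed a download widget such as st.download_button. Shown here only as the serialization round-trip, with illustrative toy data:

```python
import pickle
from sklearn.linear_model import LogisticRegression

# Train a throwaway model on toy data.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]
model = LogisticRegression().fit(X, y)

# Serialize to bytes, as a .pkl download would deliver them...
payload = pickle.dumps(model)

# ...and restore on the other side.
restored = pickle.loads(payload)
```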
