ML-pipeline-streamlit

🚀No-Code ML Pipeline Builder

Live demo: https://ml-pipeline-app-aditi79.streamlit.app/ A robust, web-based AutoML tool that allows users to upload raw data, process it, train models, and visualize results—without writing a single line of code.

Designed with a focus on Machine Learning Engineering principles, this project bridges the gap between complex algorithmic logic and intuitive user experience.

📖 Overview The objective was to build an end-to-end ML pipeline that is accessible to non-technical users but powerful enough to handle real-world, "dirty" datasets.

Unlike standard tutorial scripts, this application implements a dynamic preprocessing engine that automatically detects data types, handles missing values, and encodes categorical features, ensuring the pipeline never crashes on arbitrary user data.

Key Features

Universal Dataset Support: Upload any CSV/Excel. The system automatically identifies Target vs. Features and Text vs. Numbers.
Robust Preprocessing:

Automated Imputation (Mean for numerical, Mode for categorical).

Dynamic One-Hot Encoding and Standardization/Normalization.
Algorithm "Zoo":

Standard: Logistic Regression & Decision Trees (Scikit-Learn).

Custom Implementation: A MulticlassLogistic regression built from scratch using Numpy to demonstrate the underlying mathematics (Gradient Descent, Cross-Entropy Loss).
Interactive Visualizations:

Plotly Heatmaps for Confusion Matrices.

Tree Structure Visualization for Decision Tree interpretability.

Feature Importance analysis.

🛠 Technical Architecture

it is engineered for stability and mathematical correctness.
The Preprocessing Pipeline (ColumnTransformer) To handle raw user data safely, I implemented a split-path pipeline using Scikit-Learn’s ColumnTransformer. This ensures that:
Numerical columns are isolated, imputed, and scaled (Standard or MinMax).
Categorical columns are isolated, imputed, and One-Hot Encoded.
The paths merge back together into a generic NumPy array ready for training.

Architecture Snippet

preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, numeric_cols),
        ('cat', categorical_transformer, categorical_cols)
    ], remainder='drop'
)

Custom ML Implementation (The "From Scratch" Engine) While Scikit-Learn is used for production reliability, I implemented a Custom Logistic Regression class to demonstrate an understanding of the core algorithms.
Optimization: Batch Gradient Descent.
Loss Function: Log-Loss (Binary Cross-Entropy) with L1/L2 Regularization support.
Multiclass Strategy: One-vs-Rest (OvR) wrapper around the binary classifier to handle multi-class datasets (e.g., Iris).
Initialization: Xavier/Glorot Initialization for weight stability.

📸 Visuals & Insights

The application prioritizes interpretability:

Decision Tree Plotting: Uses matplotlib to render the actual tree structure, allowing users to trace the decision logic of the model.
Interactive Confusion Matrix: A Plotly heatmap that allows users to hover over specific errors to understand False Positives/Negatives.
Exploratory Data Analysis (EDA): Automatic correlation heatmaps and target distribution checks before training begins.

⚙️ Installation & Usage

Prerequisites: Python 3.8+

Clone the Repository

git clone https://github.com/yourusername/no-code-ml-pipeline.git
cd no-code-ml-pipeline

Install Dependencies

pip install -r requirements.txt

Run the Application streamlit run app.py How to Use

Upload a .csv file (e.g., Titanic, Iris, or Customer Churn).
Select your Target Column in the sidebar.
Adjust Preprocessing (Scaling) and Model Hyperparameters.

Click Run Pipeline.

🧠 Design Philosophy (Why Streamlit?) As an ML Engineer, my primary focus is on the model architecture and data integrity rather than frontend boilerplate.

I chose Streamlit because it allows for rapid prototyping of data applications. It enables the creation of a clean, functional UI while keeping the codebase 100% Python, facilitating easier integration with backend ML libraries like PyTorch, TensorFlow, or Scikit-Learn in the future.

This approach mimics real-world industry workflows where ML Engineers build "Proof of Concept" (PoC) apps to demonstrate model value to stakeholders before handing off to full-stack teams for scaling.

🚀 Future Roadmap [ ] Model Persistence: Add functionality to download the trained model as a .pkl file.

[ ] Deep Learning Support: Add a simple Neural Network builder (PyTorch) for more complex datasets.

[ ] Auto-Model Selection: Implement a "Search" feature that runs multiple models and picks the best one automatically (AutoML).

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
backend		backend
nighter		nighter
README.md		README.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

ML-pipeline-streamlit

Key Features

Feature Importance analysis.

Architecture Snippet

📸 Visuals & Insights

⚙️ Installation & Usage

About

Uh oh!

Releases

Packages

Languages

kapooraditi79/ML-pipeline-streamlit

Folders and files

Latest commit

History

Repository files navigation

ML-pipeline-streamlit

Key Features

Feature Importance analysis.

Architecture Snippet

📸 Visuals & Insights

⚙️ Installation & Usage

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages