This repository contains a portfolio of data science projects I have completed, presented as IPython notebooks. The projects below highlight my interests, exposure and proficiency.
Data is recorded from various condition-monitoring sensors in a manufacturing plant. Hundreds of these sensors may affect the quality of the final product. Given the past data, the task is to predict the target variable (possibly an efficiency factor, though this is not explicitly stated).
Several approaches to feature engineering and several models were tried, and their results compared.
The price of an airline ticket is predicted from past data for several airlines. The combined effect of features such as Date of Journey, Route, Start City, Destination City and Airline is modelled. Various models and encoding techniques for categorical data have been tried.
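As a rough illustration of what trying different encodings for categorical data can look like, the sketch below contrasts ordinal and one-hot encoding in plain Python. The airline labels and the helper names `ordinal_encode` and `one_hot_encode` are hypothetical and not taken from the notebooks, which may well use library encoders instead.

```python
def ordinal_encode(values):
    """Map each distinct category to an integer, using sorted category order."""
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    return [index[v] for v in values], categories


def one_hot_encode(values):
    """Return one binary indicator column per distinct category."""
    categories = sorted(set(values))
    rows = [[1 if v == cat else 0 for cat in categories] for v in values]
    return rows, categories


# Hypothetical airline labels, used only for illustration.
airlines = ["Airline B", "Airline A", "Airline B", "Airline C"]

ordinal_codes, ordinal_categories = ordinal_encode(airlines)
one_hot_rows, one_hot_categories = one_hot_encode(airlines)
```

Ordinal codes imply an ordering between categories, which linear models may interpret as meaningful, while one-hot columns avoid that at the cost of a wider feature matrix.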
A rental company aggregates rental providers and offers customers a service to estimate or negotiate the rental value of a potential listing. Various property attributes of the current listings are provided as past data. The task is to predict the rental value for the target listing.
Correlated features have been identified and removed. Different labelling techniques have been tried. A new feature, 'distance from city center', derived from the latitude and longitude values, has been added and tested.
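One common way to derive a 'distance from city center' feature from latitude and longitude is the haversine great-circle distance. The sketch below assumes that approach; the notebook may compute the distance differently, and the function name, the Earth radius constant and the example coordinates are all illustrative assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to guard against floating-point values marginally above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


# Hypothetical coordinates: a city center and one listing, for illustration only.
city_center = (0.0, 0.0)
listing = (0.0, 1.0)

distance_from_center = haversine_km(*listing, *city_center)
```

With these example points, the result is the length of one degree of longitude at the equator, roughly 111 km.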
Sentiment analysis is performed on the text data in the input dataset. Various vectorization methods and machine learning models are compared. There are two notebooks: one uses TfidfVectorizer and CountVectorizer, and the other uses Word2Vec with a neural network built in Keras.
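To make the difference between count-based and TF-IDF vectorization concrete, here is a simplified plain-Python sketch. It is not the scikit-learn implementation: tokenization is a bare whitespace split, and the smoothed IDF formula and L2 normalization are assumptions chosen to resemble what I understand the library defaults to be. The two example documents are made up.

```python
import math
from collections import Counter


def count_vectorize(docs):
    """Bag-of-words term counts, one row per document, columns in sorted vocab order."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    counts = [Counter(toks) for toks in tokenized]
    return [[c[term] for term in vocab] for c in counts], vocab


def tfidf_vectorize(docs):
    """Counts reweighted by smoothed inverse document frequency, then L2-normalized."""
    counts, vocab = count_vectorize(docs)
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    doc_freq = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log((1 + n_docs) / (1 + df)) + 1 for df in doc_freq]
    rows = []
    for row in counts:
        weighted = [tf * w for tf, w in zip(row, idf)]
        norm = math.sqrt(sum(x * x for x in weighted))
        rows.append([x / norm if norm else 0.0 for x in weighted])
    return rows, vocab


# Made-up example documents.
docs = ["good movie", "bad movie"]
count_matrix, vocab = count_vectorize(docs)
tfidf_matrix, _ = tfidf_vectorize(docs)
```

In the TF-IDF rows, "movie" appears in both documents and so receives a lower weight than "good" or "bad", whereas the raw counts treat all three terms equally.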
Given the results of previous ad campaigns, the task is to predict (classify) whether running a campaign would yield a net gain. Various classification algorithms have been tried. The logistic regression model with the parameter class_weight='balanced' had the best results on the training data. However, the SVC model (with the 'rbf' kernel) had the best results on the test data.
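For context on the class_weight='balanced' setting mentioned above: scikit-learn documents it as weighting each class by n_samples / (n_classes * count of that class), so rarer classes receive larger weights. The sketch below reproduces that arithmetic in plain Python; the helper name and the label counts are hypothetical and not drawn from the campaign dataset.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical imbalanced labels: 8 campaigns without net gain, 2 with net gain.
labels = [0] * 8 + [1] * 2
weights = balanced_class_weights(labels)
```

With this 8-to-2 split, the minority class ends up weighted four times as heavily as the majority class, which is why the setting can help when gainful campaigns are rare.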
Given retail billing data with fields such as InvoiceNo, StockCode, Description, Quantity, UnitPrice, CustomerID and Country, the problem is to estimate the retail price on the test data.
Predict insurance cost based on past insurance claim data. Initial exploratory analysis and modelling were done to arrive at model selections through experiments with pipelines and grid search. comet.ml was also used for running experiments and hyperparameter tuning.
A passionate engineering professional with a wide range of experience in architecture, data engineering/ingestion, data warehousing, machine learning and predictive modelling, including big data and cloud technologies.
- LinkedIn{:target="_blank"}
- PREDICTING STRENGTH DEVELOPMENT BY CEMENT ADMIXTURE BASED ON WATER CONTENT{:target="_blank"}
- "A Simplified Method for Assessment of Volume Change Behavior of Rockfill Material," Geotechnical Testing Journal, Vol. 21, No. 2{:target="_blank"}
- Python for Data Science and Machine Learning-Udemy{:target="_blank"}
- AWS SageMaker Hands-on-Training Udemy{:target="_blank"}
- AWS Certified Big Data Professional{:target="_blank"}
- AWS Certified Machine Learning Specialty - Udemy{:target="_blank"}
- Understanding Google Charts-Udemy{:target="_blank"}
- Azure BigData Certification - Udemy{:target="_blank"}
- Apache Kafka - Udemy{:target="_blank"}
- H2o-Automl{:target="_blank"} and H2o quickstart{:target="_blank"}
- Choosing right structure for your Data Team{:target="_blank"}
- Practical Deep Learning for Coders-fastai{:target="_blank"}
- Feature Selection For Machine Learning{:target="_blank"}
- Data Transformation - Standardization vs Normalization{:target="_blank"}
- Transition from pandas -> PySpark{:target="_blank"} using Koalas{:target="_blank"}
- Leverage In-Database ML with MPP Database{:target="_blank"} and the Python API
- Microsoft Learn AI Fundamentals Exercises{:target="_blank"}
- Microsoft Azure AI Fundamentals Exercises{:target="_blank"}
- Open Source Platform for ML lifecycle Management{:target="_blank"}
- LAMA - An Automatic Model Creation Framework{:target="_blank"}
- Deploying the model on Kubernetes{:target="_blank"}
- Azure ML Studio{:target="_blank"}
- PyArrow for Parquet Files{:target="_blank"}
- 7 Mistakes in Using Apache Kafka{:target="_blank"}
- Event Driven Microservices with Kafka{:target="_blank"}
- Good Bad Ugly of Spark{:target="_blank"}
- Spark Best Practices for Datascience{:target="_blank"}
- PCA with AWS Sagemaker, algorithms and tradeoff{:target="_blank"}
- t-Distributed Stochastic Neighbor Embedding{:target="_blank"}
- Scalers - when to use what{:target="_blank"}
- Gradient Descent Optimizers{:target="_blank"}
- ROC - AUC{:target="_blank"}
VIDEOS
- Statistical Learning with Big Data{:target="_blank"}
- Decision Trees{:target="_blank"}
- Ensembles{:target="_blank"}
- Gradient Boosting Machine Learning{:target="_blank"}
- MIT Introduction to Deep Learning{:target="_blank"}
- MIT CNN{:target="_blank"}
- MIT RNN{:target="_blank"}
- GANs for Good - Panel Discussion{:target="_blank"}
- Nuts and Bolts of Deep Learning{:target="_blank"}
- Transfer Learning{:target="_blank"}
- Amazon Sagemaker Deep Dive{:target="_blank"}
- Train custom models on Sagemaker{:target="_blank"}
- HyperParameter tuning for Deep Learning in AWS{:target="_blank"}
- SageMaker Studio{:target="_blank"}