Overview of DSI-9 major projects:
-
- Sourcing own data (211 hours of audio, 25K tracks, 17K album reviews, 151K user artist follows) to build a music recommender
- Concepts: NLP (Topic Modeling/LDA), Audio Engineering (via Librosa), Recommenders (Collaborative-Filtering, Content-Based), Scraping (BS4, Selenium), Data Merging (Fuzzy Matching, Imputation Strategies), Neural Network Optimization
-
- Using a variety of modeling techniques to minimize RMSE for Ames, IA house prices as part of an internal Kaggle competition
- Concepts: Regression, Regularization, Gridsearching, Ensembling
-
- Leveraging NLP techniques to classify whether a post is from r/Jokes or r/DadJokes
- Programmatically building and validating 80+ models to find the optimal combination of data, model, and hyperparameters
- Concepts: Count/TF-IDF Vectorization, Lemmatization/Stemming, Logistic Regression, Ensembling
-
- Utilizing FEMA data to take a deep dive into socioeconomic and disaster relief datasets
- Constructing a corresponding Tableau dashboard to house model output and build prototype views for a real-life client
- Concepts: Regression, Regularization, Tableau Visualization
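
The r/Jokes project above mentions programmatically building and validating many models across combinations of data, model, and hyperparameters. A minimal sketch of that kind of search is shown here, using only the standard library. The option lists, the `build_grid` and `find_best` helpers, and the scoring callback are all hypothetical and chosen for illustration; they are not the project's actual settings or code.

```python
from itertools import product

# Hypothetical option lists; the values actually searched in the project are not given.
VECTORIZERS = ["count", "tfidf"]
TEXT_PREPS = ["raw", "lemmatized", "stemmed"]
MODELS = ["logistic_regression", "ensemble"]
NGRAM_RANGES = [(1, 1), (1, 2)]
C_VALUES = [0.1, 1.0, 10.0, 100.0]


def build_grid():
    """Return one config dict per combination of the option lists above."""
    keys = ("vectorizer", "text_prep", "model", "ngram_range", "C")
    combos = product(VECTORIZERS, TEXT_PREPS, MODELS, NGRAM_RANGES, C_VALUES)
    return [dict(zip(keys, combo)) for combo in combos]


def find_best(configs, score_fn):
    """Score every config with score_fn and return (best_config, best_score).

    score_fn is an assumed callback that would fit and validate a model
    for the given config and return a number where higher is better.
    """
    best = max(configs, key=score_fn)
    return best, score_fn(best)
```

With these illustrative lists, `build_grid()` yields 2 x 3 x 2 x 2 x 4 = 96 configurations. In practice `score_fn` would wrap model fitting and cross-validation, and caching its results would avoid scoring the winning configuration twice.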