This repository contains implementations of various machine learning algorithms, based on the QMSS 4058 and STAT 5241 courses at Columbia University. All R code is accompanied by explanatory narration.
EM for Image Segmentation: Implements the EM algorithm for a mixture-of-multinomials model from scratch, applies it to an image segmentation task, and visualizes the result.
Optimization, PCR, Classification: Assignment tasks are to optimize a loss function in R, fit a principal component regression (PCR) model, and find the best model for classifying a binary outcome.
Smoothing, Trees: Assignment tasks are to fit a generalized additive model (GAM), and to fit and compare tree-based classification models (bagging, boosting, random forest).
Neural Nets, bartMachine: Assignment tasks are to fit neural network models with varying numbers of hidden layers, and to compare their prediction accuracy, measured by mean squared error (MSE), against other models, including bartMachine.
Predictive Classification (Final Project): Final project for QMSS 4058: Data Mining. The data set comes from the Second International Knowledge Discovery and Data Mining Tools Competition (KDD Cup 1998). The task is a classification problem: estimate the response rate (donate vs. no donate) to a direct mailing program. Collaborator: Arnold Lau. Data and documentation: http://kdd.ics.uci.edu/databases/kddcup98/kddcup98.html