This is the project my team and I submitted for my Introduction to Machine Learning class in Fall 2022. I was responsible for the tasks on the startup valuation dataset and the breast cancer dataset.
Data Set Declaration:
Spotify Tracks: https://www.kaggle.com/datasets/maharshipandya/-spotify-tracks-dataset
This is a tabular data set with 20 features and 14000 samples, covering 114 genres of music. Task: classify a song into one of 5 genres: acoustic, dance, k-pop, metal, or RnB.
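As a rough illustration of how the 114 genres might be narrowed down to the five target classes, here is a minimal sketch using plain Python dictionaries. The column name `track_genre`, the label spelling `r-n-b`, and the sample rows are assumptions made for the example; they are not taken from the project code.

```python
# Genres named in the task. The label spellings (e.g. "r-n-b") and the
# "track_genre" column name are assumptions, not confirmed by this README.
TARGET_GENRES = {"acoustic", "dance", "k-pop", "metal", "r-n-b"}


def filter_genres(rows, genre_key="track_genre", keep=TARGET_GENRES):
    """Keep only the rows whose genre is one of the target classes."""
    return [row for row in rows if row.get(genre_key) in keep]


# Tiny made-up rows standing in for the real CSV contents.
sample_rows = [
    {"track_name": "a", "track_genre": "metal"},
    {"track_name": "b", "track_genre": "jazz"},
    {"track_name": "c", "track_genre": "k-pop"},
]

subset = filter_genres(sample_rows)
print([row["track_name"] for row in subset])
```

With the real file, the rows could come from `csv.DictReader`, and the same filter would apply once the actual column name is confirmed.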
Twitter Stock Data Set: https://www.kaggle.com/datasets/maharshipandya/twitter-stocks-dataset
Type: Regression - Time series (predicting the next stock price). Motivation and Description: This data set contains information about Twitter's stock. We thought it would be a good way to predict whether the stock price will rise or fall, and by how much. It is a time series data set with 7 features that describe how the stock price changes, including the date, opening price, high, low, adjusted close price, and volume.
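One possible way to frame "next stock price" prediction as a supervised problem is to pair each day's price with the following day's price. The sketch below is only an illustration of that framing; the function name `make_next_step_targets` and the price values are hypothetical and not from the project.

```python
def make_next_step_targets(closes):
    """Pair each day's close with the next day's close.

    Returns a list of (today, next_day, change, rose) tuples, where
    change = next_day - today and rose is True when change > 0.
    """
    pairs = []
    for today, next_day in zip(closes, closes[1:]):
        change = next_day - today
        pairs.append((today, next_day, change, change > 0))
    return pairs


# Made-up closing prices, in chronological order.
closes = [40.0, 42.0, 41.0, 41.0]
targets = make_next_step_targets(closes)
for today, next_day, change, rose in targets:
    print(today, next_day, change, rose)
```

The `change` value corresponds to the regression target (how much the price moves), and `rose` to the rise-or-fall question. The last day has no target because there is no following day to compare against.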
Breast Cancer: https://www.kaggle.com/code/lbronchal/breast-cancer-dataset-analysis/data
This is a tabular data set with 32 features and 570 samples. It has no missing values, and it has one categorical variable, the diagnosis. Task: predict whether the cell is benign or malignant.
Startup Valuation: https://www.kaggle.com/datasets/manishkc06/startup-success-prediction
This is a tabular data set with 923 samples and 48 features; the dependent variable is the startup status. Task: given this data set, predict whether a startup will be successful (acquired) or unsuccessful (closed).