This repository contains the code for the Calories Burnt Prediction competition, where the objective is to predict the 'Calories' burnt based on various features. The dataset was generated from a deep learning model trained on an original Calories Burnt Prediction dataset.
The datasets (`train.csv`, `test.csv`, `sample_submission.csv`) are provided within the competition.

- `train.csv`: The training dataset, with 'Calories' as the continuous target variable.
- `test.csv`: The test dataset for which predictions need to be made.
- `sample_submission.csv`: A sample submission file showing the expected format.

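
As a quick orientation, the three files can be read and sanity-checked with pandas. This is a minimal sketch, not the repository's actual loading code: the helper names `load_competition_data` and `summarize` are hypothetical, and the files are assumed to sit together in one directory.

```python
from pathlib import Path

import pandas as pd


def load_competition_data(data_dir="."):
    """Read the three competition files from data_dir into DataFrames."""
    data_dir = Path(data_dir)
    train = pd.read_csv(data_dir / "train.csv")
    test = pd.read_csv(data_dir / "test.csv")
    sample_submission = pd.read_csv(data_dir / "sample_submission.csv")
    return train, test, sample_submission


def summarize(train, test):
    """Return basic facts worth checking before modelling."""
    return {
        "train_shape": train.shape,
        "test_shape": test.shape,
        # Columns present in train but missing from test; this should be
        # just the target column.
        "train_only_columns": sorted(set(train.columns) - set(test.columns)),
    }
```

Calling `summarize(*load_competition_data(".")[:2])` should report `Calories` as the only column that `train.csv` has and `test.csv` lacks.
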
The feature distributions are close to, but not exactly the same as, the original dataset. You may explore incorporating the original dataset if it improves model performance.
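
One way to try this is to stack the original rows under the competition rows and keep a flag recording where each row came from. The sketch below assumes both tables are already loaded as DataFrames and share column names for the features and the target. The helper `combine_with_original` and the `is_original` column are hypothetical names, not part of the provided script.

```python
import pandas as pd


def combine_with_original(train, original, id_col="id"):
    """Stack original rows under competition rows, using shared columns only."""
    # Keep only the columns both frames share. The id column is dropped
    # because ids from two different sources are not comparable.
    shared = [c for c in train.columns if c in original.columns and c != id_col]
    combined = pd.concat(
        [
            train[shared].assign(is_original=0),     # competition rows
            original[shared].assign(is_original=1),  # rows from the original dataset
        ],
        ignore_index=True,
    )
    return combined
```

Because the two distributions differ slightly, it is safer to judge any gain on validation folds drawn from the competition rows only (`is_original == 0`).
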
The main script `calories_prediction.py` (or the equivalent code provided) performs the following steps:

- Loads Data: Reads `train.csv`, `test.csv`, and `sample_submission.csv`.
- Data Preprocessing:
  - Separates features and the target variable.
  - Identifies numerical and categorical features.
  - Applies `StandardScaler` to numerical features to standardize them (zero mean, unit variance).
  - Applies `OneHotEncoder` to categorical features to convert them into a numerical format suitable for machine learning models.
  - Uses `ColumnTransformer` to apply each transformation to its own group of columns in a single step.
- Model Training:
  - Utilizes a `RandomForestRegressor` for the prediction task.
  - Constructs a `Pipeline` to chain the preprocessing steps and the regressor.
  - Trains the model on the preprocessed training data.
- Prediction:
  - Makes predictions on the preprocessed test set.
  - Ensures that predicted calorie values are non-negative.
- Submission File Generation:
  - Creates a `submission.csv` file in the required format, containing `id` and predicted `Calories` values.

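
The steps above can be sketched roughly as follows. This is an illustration under stated assumptions rather than the contents of `calories_prediction.py`: the function names are hypothetical, the `id` column is assumed to be excluded from the features, column types are inferred from dtypes, the hyperparameters are placeholders, and clipping with `np.clip` is only one possible way to keep predictions non-negative.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def build_pipeline(X, random_state=42):
    """Chain column-wise preprocessing and a random forest in one Pipeline."""
    # Assumption: numeric dtypes are numerical features, everything else is categorical.
    numeric_cols = X.select_dtypes(include="number").columns.tolist()
    categorical_cols = X.select_dtypes(exclude="number").columns.tolist()

    preprocessor = ColumnTransformer(
        transformers=[
            ("num", StandardScaler(), numeric_cols),
            # handle_unknown="ignore" avoids errors on categories unseen in training.
            ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
        ]
    )
    return Pipeline(
        steps=[
            ("preprocess", preprocessor),
            ("model", RandomForestRegressor(n_estimators=100, random_state=random_state)),
        ]
    )


def fit_and_predict(train, test, target="Calories", id_col="id"):
    """Fit on train and return a submission-shaped DataFrame for test."""
    # Separate features and target; the id column is assumed to carry no signal.
    X_train = train.drop(columns=[target, id_col])
    y_train = train[target]
    X_test = test.drop(columns=[id_col])

    pipeline = build_pipeline(X_train)
    pipeline.fit(X_train, y_train)  # preprocessing is fitted on the training data only

    # Clip at zero so no predicted calorie value is negative.
    predictions = np.clip(pipeline.predict(X_test), 0, None)
    return pd.DataFrame({id_col: test[id_col].to_numpy(), target: predictions})
```

With `train` and `test` loaded from the competition files, `fit_and_predict(train, test).to_csv("submission.csv", index=False)` would write the submission file. Keeping the scaler and encoder inside the `Pipeline` means the same fitted transformations are reused on the test set without a separate preprocessing call.
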
To run this project, you need to have Python installed along with the following libraries.
- Clone the repository (or save the provided script): If the code is hosted in a GitHub repository, clone it with `git clone <repository_url>` and then change into its directory with `cd <repository_name>`.