Getting and Cleaning Data Course Project

Getting and Cleaning Data Course Project. The goal is to prepare tidy data that can be used for later analysis. In order to accomplish the assigment I do the following steps.

Read Data

From features I read the files:

"activity_labels.txt": Links the class labels with their activity name.
'features.txt': List of all features.

Define col_names: a vector (V1, V2, ..., V561) for reading X_test/train.txt file

From test I read the equivalent files:

"test/subject_test.txt": Each row identifies the subject who performed the activity for each window sample. Its range is from 1 to 30. Name: V562.
"test/X_test.txt": Test set.
"test/y_test.txt": Test labels. Name: V563.

From train I read the equivalent files:

"train/subject_train.txt": Each row identifies the subject who performed the activity for each window sample. Its range is from 1 to 30. Name: V562.
"train/X_train.txt": Training set.
"train/y_train.txt": Training labels. Name: V563.

Merge Data Sets

From "test"" I join the 3 files with cbind function. First the Test set, then the subject and finally the label. I do the same with the 3 "train" files. With rbind function I make the data_set.

Extracts the measurements on the mean and standard deviation

From the features table I obtain the index for all the features matching with "mean" or "std". From the same variable I also obtain the index for "meanFreq" to remove from the "mean" index. With the cbind function I select the columns from data_set refering to the calculated index plus the subject (V562) and id_Activity (V563) to form the data_mean_std_set.

Uses descriptive activity names

I add a new column to data_mean_std_set, V564, with the labels of the activities obtained from the activity_labels table.

Appropriately labels the data set with descriptive variable names

I select the same names from "feature.txt" to each column from 1 to 561. As names from "features.txt" have "-" and "()" characters if we call them directly R interpret them as operators and functions. So we should refer the with "". "subject", "id_Activity" and "activity" are the labels for the 3 extra columns.

Tidy data set with the average of each variable for each activity and each subject

With the agreggate function I construct the tidy data. In each column we have the mean of each variable for each activity and each subject. There are 180 rows = 30 subjects * 6 activities. There are 69 columns, 66 variable columns, 1 subject column, 1 id_Activity column and 1 activity column with the label name.

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
README.md		README.md
codebook.md		codebook.md
output.txt		output.txt
run_analysis.R		run_analysis.R

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Getting and Cleaning Data Course Project

Read Data

Merge Data Sets

Extracts the measurements on the mean and standard deviation

Uses descriptive activity names

Appropriately labels the data set with descriptive variable names

Tidy data set with the average of each variable for each activity and each subject

About

Uh oh!

Releases

Packages

Uh oh!

pinap28/Getting_Cleaning_Data_Project

Folders and files

Latest commit

History

Repository files navigation

Getting and Cleaning Data Course Project

Read Data

Merge Data Sets

Extracts the measurements on the mean and standard deviation

Uses descriptive activity names

Appropriately labels the data set with descriptive variable names

Tidy data set with the average of each variable for each activity and each subject

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Packages