README

This section describes the steps that have been performed to clean up the raw data.

The training and test data sets have been combined row-wise using rbind. The activity labels and subject identifiers of the training and test data have been combined in the same way.

# Read training data
train <- read.table("UCI HAR Dataset/train/X_train.txt")
trainActLabels <- read.table("UCI HAR Dataset/train/y_train.txt")
trainSubjects <- read.table("UCI HAR Dataset/train/subject_train.txt")

# Read test data
test <- read.table("UCI HAR Dataset/test/X_test.txt")
testActLabels <- read.table("UCI HAR Dataset/test/y_test.txt")
testSubjects <- read.table("UCI HAR Dataset/test/subject_test.txt")

# Combine train and test data
all <- rbind(train, test)
allActLabels <- rbind(trainActLabels, testActLabels)
allSubjects <- rbind(trainSubjects, testSubjects)

The resulting three data sets have been combined with cbind to create one data set.

all <- cbind(all, allActLabels, allSubjects)
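The row-wise and column-wise combination described above can be illustrated on toy data. The following is a minimal sketch, assuming small made-up data frames; the names toyTrain, toyTest, and the other toy objects are hypothetical and not part of the project. It only shows how rbind stacks rows and how cbind appends columns.

```r
# Toy stand-ins for X_train/X_test (hypothetical values, two features each)
toyTrain <- data.frame(V1 = c(0.1, 0.2), V2 = c(1.1, 1.2))
toyTest  <- data.frame(V1 = 0.3, V2 = 1.3)

# Toy stand-ins for the activity label and subject files
toyTrainAct <- data.frame(V1 = c(1, 2))
toyTestAct  <- data.frame(V1 = 3)
toyTrainSub <- data.frame(V1 = c(10, 11))
toyTestSub  <- data.frame(V1 = 12)

# rbind stacks rows: training rows first, then test rows
toyAll <- rbind(toyTrain, toyTest)
toyAct <- rbind(toyTrainAct, toyTestAct)
toySub <- rbind(toyTrainSub, toyTestSub)

# cbind appends columns, so all pieces must have the same number of rows
stopifnot(nrow(toyAll) == nrow(toyAct), nrow(toyAll) == nrow(toySub))
toyCombined <- cbind(toyAll, toyAct, toySub)

dim(toyCombined)  # 3 rows, 4 columns
```

Because the same order (training first, test second) is used for all three rbind calls, each row of measurements stays aligned with its label and subject after cbind.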

The names of the features have been added as column names.

# Get feature names from features.txt
features <- read.table("UCI HAR Dataset/features.txt")
# Add column names
names(all) <- c(as.character(features[,2]), "ActivityLabels", "Subjects")

Columns whose names contain "mean" or "std" have been identified using grep, and the remaining measurement columns have been removed.

# Find columns that contain "mean" or "std" in their names
meanCol <- grep("mean", names(all))
stdCol <- grep("std", names(all))
# Use unique just in case there are duplicates. Keep last two columns for activity labels and subjects
col <- unique(c(meanCol, stdCol, ncol(all)-1, ncol(all))) 

df <- all[,col]
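To see what the grep-based selection picks up, here is a minimal sketch on a handful of column names written in the style of features.txt. The vector toyNames and the index variables are hypothetical and used for illustration only. Note that grep is case-sensitive by default, so the pattern "mean" matches names containing "meanFreq" but not names containing "Mean".

```r
# Illustrative column names in the style of features.txt (toy vector)
toyNames <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-max()-X",
              "fBodyAcc-meanFreq()-X", "angle(tBodyAccMean,gravity)",
              "ActivityLabels", "Subjects")

# grep returns the indices of the matching names (case-sensitive)
meanIdx <- grep("mean", toyNames)
stdIdx  <- grep("std", toyNames)

# A single pattern with alternation finds both sets in one call
bothIdx <- grep("mean|std", toyNames)

# A stricter alternative: match the literal text "mean()" only
strictIdx <- grep("mean()", toyNames, fixed = TRUE)

toyNames[meanIdx]
```

Whether the meanFreq columns should be kept depends on how the requirement is read; the stricter fixed-string match is one possible way to exclude them.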

Activity numbers have been replaced with names by merging the data frame from the previous step with "activity_labels.txt".

actLabels <- read.table("UCI HAR Dataset/activity_labels.txt")
df <- merge(df, actLabels, by.x = "ActivityLabels", by.y="V1")
# The matched column (with activity numbers) becomes the first column; delete it
df <- df[,-1]
# Change the new activity labels column name
names(df)[ncol(df)] <- "ActivityLabels"
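The effect of the merge step on column order and row order can be checked on toy data. This is a minimal sketch with made-up values; toyDf, toyLabels, and toyMerged are hypothetical names. It assumes the lookup table has the default V1 and V2 column names that read.table assigns when a file has no header.

```r
# Toy measurements with numeric activity codes (hypothetical values)
toyDf <- data.frame(x = c(0.5, 0.7, 0.9),
                    ActivityLabels = c(2, 1, 2),
                    Subjects = c(1, 1, 2))

# Toy lookup table shaped like activity_labels.txt read without a header
toyLabels <- data.frame(V1 = c(1, 2),
                        V2 = c("WALKING", "WALKING_UPSTAIRS"))

# merge puts the key column first and, by default, sorts rows by the key
toyMerged <- merge(toyDf, toyLabels, by.x = "ActivityLabels", by.y = "V1")
mergedNames <- names(toyMerged)

# Drop the numeric key and rename the label column
toyMerged <- toyMerged[, -1]
names(toyMerged)[ncol(toyMerged)] <- "ActivityLabels"
toyMerged
```

Since merge reorders the rows, the result should not be assumed to follow the original row order; this does not matter for the later averaging step, which groups by subject and activity.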

Using the reshape2 library, the data frame has been melted into long format, with Subjects and ActivityLabels as id variables. It has then been cast back to wide format with dcast, taking the average of each variable for each activity and each subject. Finally, the resulting data frame has been written to a .txt file.

library(reshape2)
allMelted <- melt(df, id=c("Subjects", "ActivityLabels"))
# Taking the average of all values for each activity and subject combination.
df2 <- dcast(allMelted, Subjects + ActivityLabels ~ variable, mean)

write.table(df2, "df2.txt", row.names = FALSE)
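The per-subject, per-activity averaging can also be sketched in base R with aggregate. This is a different technique from the reshape2 melt/dcast approach used here, shown only as a way to cross-check the kind of summary it produces. The data and the names toyDf and toyAvg are made up for illustration.

```r
# Toy input (hypothetical values): two measurement columns, a and b
toyDf <- data.frame(Subjects = c(1, 1, 1, 2),
                    ActivityLabels = c("WALKING", "WALKING", "SITTING", "WALKING"),
                    a = c(1, 3, 5, 7),
                    b = c(10, 30, 50, 70))

# Average every measurement column within each Subjects/ActivityLabels group
toyAvg <- aggregate(. ~ Subjects + ActivityLabels, data = toyDf, FUN = mean)

# Sort by subject, then activity, for a stable row order
toyAvg <- toyAvg[order(toyAvg$Subjects, toyAvg$ActivityLabels), ]
toyAvg
```

Each row of the result corresponds to one subject and activity combination, with one averaged column per measurement.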
