Alison MacLellan Created 2014-05-25
Human Activity Recognition Using Smartphones Dataset Version 1.0
Jorge L. Reyes-Ortiz, Davide Anguita, Alessandro Ghio, Luca Oneto. Smartlab - Non Linear Complex Systems Laboratory DITEN - Università degli Studi di Genova. Via Opera Pia 11A, I-16145, Genoa, Italy. [email protected] www.smartlab.ws
The experiments have been carried out with a group of 30 volunteers within an age bracket of 19-48 years. Each person performed six activities (WALKING, WALKING_UPSTAIRS, WALKING_DOWNSTAIRS, SITTING, STANDING, LAYING) wearing a smartphone (Samsung Galaxy S II) on the waist. Using its embedded accelerometer and gyroscope, we captured 3-axial linear acceleration and 3-axial angular velocity at a constant rate of 50Hz. The experiments have been video-recorded to label the data manually. The obtained dataset has been randomly partitioned into two sets, where 70% of the volunteers was selected for generating the training data and 30% the test data.
The sensor signals (accelerometer and gyroscope) were pre-processed by applying noise filters and then sampled in fixed-width sliding windows of 2.56 sec and 50% overlap (128 readings/window). The sensor acceleration signal, which has gravitational and body motion components, was separated using a Butterworth low-pass filter into body acceleration and gravity. The gravitational force is assumed to have only low frequency components, therefore a filter with 0.3 Hz cutoff frequency was used. From each window, a vector of features was obtained by calculating variables from the time and frequency domain. See 'features_info.txt' for more details.
- Triaxial acceleration from the accelerometer (total acceleration) and the estimated body acceleration.
- Triaxial Angular velocity from the gyroscope.
- A 561-feature vector with time and frequency domain variables.
- Its activity label.
- An identifier of the subject who carried out the experiment.
-
'README.txt'
-
'features_info.txt': Shows information about the variables used on the feature vector.
-
'features.txt': List of all features.
-
'activity_labels.txt': Links the class labels with their activity name.
-
'train/X_train.txt': Training set.
-
'train/y_train.txt': Training labels.
-
'test/X_test.txt': Test set.
-
'test/y_test.txt': Test labels.
-
'train/subject_train.txt': Each row identifies the subject who performed the activity for each window sample. Its range is from 1 to 30.
-
'train/Inertial Signals/total_acc_x_train.txt': The acceleration signal from the smartphone accelerometer X axis in standard gravity units 'g'. Every row shows a 128 element vector. The same description applies for the 'total_acc_x_train.txt' and 'total_acc_z_train.txt' files for the Y and Z axis.
-
'train/Inertial Signals/body_acc_x_train.txt': The body acceleration signal obtained by subtracting the gravity from the total acceleration.
-
'train/Inertial Signals/body_gyro_x_train.txt': The angular velocity vector measured by the gyroscope for each window sample. The units are radians/second.
- Features are normalized and bounded within [-1,1].
- Each feature vector is a row on the text file.
For more information about this dataset contact: [email protected]
Use of this dataset in publications must be acknowledged by referencing the following publication [1]
[1] Davide Anguita, Alessandro Ghio, Luca Oneto, Xavier Parra and Jorge L. Reyes-Ortiz. Human Activity Recognition on Smartphones using a Multiclass Hardware-Friendly Support Vector Machine. International Workshop of Ambient Assisted Living (IWAAL 2012). Vitoria-Gasteiz, Spain. Dec 2012
This dataset is distributed AS-IS and no responsibility implied or explicit can be addressed to the authors or their institutions for its use or misuse. Any commercial use is prohibited.
Jorge L. Reyes-Ortiz, Alessandro Ghio, Luca Oneto, Davide Anguita. November 2012.
The features selected for this database come from the accelerometer and gyroscope 3-axial raw signals tAcc-XYZ and tGyro-XYZ. These time domain signals (prefix 't' to denote time) were captured at a constant rate of 50 Hz. Then they were filtered using a median filter and a 3rd order low pass Butterworth filter with a corner frequency of 20 Hz to remove noise. Similarly, the acceleration signal was then separated into body and gravity acceleration signals (tBodyAcc-XYZ and tGravityAcc-XYZ) using another low pass Butterworth filter with a corner frequency of 0.3 Hz.
Subsequently, the body linear acceleration and angular velocity were derived in time to obtain Jerk signals (tBodyAccJerk-XYZ and tBodyGyroJerk-XYZ). Also the magnitude of these three-dimensional signals were calculated using the Euclidean norm (tBodyAccMag, tGravityAccMag, tBodyAccJerkMag, tBodyGyroMag, tBodyGyroJerkMag).
Finally a Fast Fourier Transform (FFT) was applied to some of these signals producing fBodyAcc-XYZ, fBodyAccJerk-XYZ, fBodyGyro-XYZ, fBodyAccJerkMag, fBodyGyroMag, fBodyGyroJerkMag. (Note the 'f' to indicate frequency domain signals).
'-XYZ' is used to denote 3-axial signals in the X, Y and Z directions.
- tBodyAcc-XYZ
- tGravityAcc-XYZ
- tBodyAccJerk-XYZ
- tBodyGyro-XYZ
- tBodyGyroJerk-XYZ
- tBodyAccMag
- tGravityAccMag
- tBodyAccJerkMag
- tBodyGyroMag
- tBodyGyroJerkMag
- fBodyAcc-XYZ
- fBodyAccJerk-XYZ
- fBodyGyro-XYZ
- fBodyAccMag
- fBodyAccJerkMag
- fBodyGyroMag
- fBodyGyroJerkMag
- mean(): Mean value
- std(): Standard deviation
- mad(): Median absolute deviation
- max(): Largest value in array
- min(): Smallest value in array
- sma(): Signal magnitude area
- energy(): Energy measure. Sum of the squares divided by the number of values.
- iqr(): Interquartile range
- entropy(): Signal entropy
- arCoeff(): Autorregresion coefficients with Burg order equal to 4
- correlation(): correlation coefficient between two signals
- maxInds(): index of the frequency component with largest magnitude
- meanFreq(): Weighted average of the frequency components to obtain a mean frequency
- skewness(): skewness of the frequency domain signal
- kurtosis(): kurtosis of the frequency domain signal
- bandsEnergy(): Energy of a frequency interval within the 64 bins of the FFT of each window.
- angle(): Angle between to vectors.
Additional vectors obtained by averaging the signals in a signal window sample. These are used on the angle() variable:
- gravityMean
- tBodyAccMean
- tBodyAccJerkMean
- tBodyGyroMean
- tBodyGyroJerkMean
The complete list of variables of each feature vector is available in 'features.txt'
Read in the appropriate reference files
features <- read.table("UCI HAR Dataset/features.txt") # get features
activityLabels <- read.table("UCI HAR Dataset/activity_labels.txt") # get activity labels
Process the Training (Train) Data adding column names from the features, the subject, and activity, merging it all into the table Train.
xTrain <- read.table("UCI HAR Dataset/train/X_train.txt") #7352
yTrain <- read.table("UCI HAR Dataset/train/y_train.txt") #7352 Activity
colnames(xTrain) <-features$V2 # features
subjectTrain <- read.table("UCI HAR Dataset/train/subject_train.txt")
colnames(subjectTrain)<-"subject"
colnames(yTrain)<-"activity"
Train <- cbind(xTrain, subjectTrain, yTrain)
Do the same with the Testing (Test) data.
xTest <- read.table("UCI HAR Dataset/test/X_test.txt") #2947
yTest <- read.table("UCI HAR Dataset/test/y_test.txt") #2947 Activity
colnames(xTest) <- features$V2 #features
subjectTest <- read.table("UCI HAR Dataset/test/subject_test.txt")
colnames(subjectTest)<-"subject"
colnames(yTest)<-"activity"
Test <- cbind(xTest, subjectTest, yTest)
Merge the Test and Train into one data table called oneSet and add the appropriate column names.
oneSet <- rbind(Train, Test)
colnames(oneSet) <-colnames(Train)
We want to only use the standard deviation and mean values. I pulled out the column number and name with the std() and mean() regular expressions
meanVectorN<-grep("mean\\(\\)", features$V2) # get the column numbers of the means
meanVector<-grep("mean\\(\\)", features$V2, value=TRUE) # get the values of the means
stdVectorN<-grep("std\\(\\)", features$V2) # get the column numbers of the std deviation
stdVector<-grep("std\\(\\)", features$V2, value=TRUE) # get the columna names values of the std deviation
extraColumns<-c("subject","activity")
These vectors are then used to pull the selected columns into a new data table.
selectedColumns <- oneSet[c(meanVector,stdVector,extraColumns)]
selectedColumnsNames <-colnames(selectedColumns)
Now I expanded the column names to a more easily read value using gsub
selectedColumnsNames <- gsub("^f", "fastfouriertransform", selectedColumnsNames)
selectedColumnsNames <- gsub("^t", "total", selectedColumnsNames)
selectedColumnsNames <- gsub("Body", "body", selectedColumnsNames)
selectedColumnsNames <- gsub("Gyro", "gyroscope", selectedColumnsNames)
selectedColumnsNames <- gsub("Acc", "accelerometer", selectedColumnsNames)
selectedColumnsNames <- gsub("Jerk", "jerk", selectedColumnsNames)
selectedColumnsNames <- gsub("Mag", "mag", selectedColumnsNames)
selectedColumnsNames <- gsub("Gravity", "gravityaccelerometer", selectedColumnsNames)
selectedColumnsNames <- gsub("\\-X", "\\-x", selectedColumnsNames)
selectedColumnsNames <- gsub("\\-Y", "\\-y", selectedColumnsNames)
selectedColumnsNames <- gsub("\\-Z", "\\-z", selectedColumnsNames)
selectedColumnsNames <- gsub("\\-", "", selectedColumnsNames)
selectedColumnsNames <- gsub("std\\(\\)", "standarddeviation", selectedColumnsNames)
selectedColumnsNames <- gsub("mean\\(\\)", "mean", selectedColumnsNames)
colnames(selectedColumns) <- selectedColumnsNames
To make processing simpler, I put the data into a data.frame
dfSelectedColumns <-data.frame(selectedColumns)
Now I compute the averages for the values by subject and activity
avgs <- aggregate(dfSelectedColumns, list(dfSelectedColumns$subject, dfSelectedColumns$activity), mean, na.rm=TRUE)
I then merge the activity with the descritive value of the activity, example 1 = Walking
key <- data.frame(activityLabels)
colnames(key) <- c("activity","activityvalue")
avgs <- merge(avgs, key, by="activity")
Before writing to the file, I removed the columns added from the aggregate.
- totalbodyaccelerometermeanx
- totalbodyaccelerometermeany
- totalbodyaccelerometermeanz
- totalgravityaccelerometeraccelerometermeanx
- totalgravityaccelerometeraccelerometermeany
- totalgravityaccelerometeraccelerometermeanz
- totalbodyaccelerometerjerkmeanx
- totalbodyaccelerometerjerkmeany
- totalbodyaccelerometerjerkmeanz
- totalbodygyroscopemeanx
- totalbodygyroscopemeany
- totalbodygyroscopemeanz
- totalbodygyroscopejerkmeanx
- totalbodygyroscopejerkmeany
- totalbodygyroscopejerkmeanz
- totalbodyaccelerometermagmean
- totalgravityaccelerometeraccelerometermagmean
- totalbodyaccelerometerjerkmagmean
- totalbodygyroscopemagmean
- totalbodygyroscopejerkmagmean
- fastfouriertransformbodyaccelerometermeanx
- fastfouriertransformbodyaccelerometermeany
- fastfouriertransformbodyaccelerometermeanz
- fastfouriertransformbodyaccelerometerjerkmeanx
- fastfouriertransformbodyaccelerometerjerkmeany
- fastfouriertransformbodyaccelerometerjerkmeanz
- fastfouriertransformbodygyroscopemeanx
- fastfouriertransformbodygyroscopemeany
- fastfouriertransformbodygyroscopemeanz
- fastfouriertransformbodyaccelerometermagmean
- fastfouriertransformbodybodyaccelerometerjerkmagmean
- fastfouriertransformbodybodygyroscopemagmean
- fastfouriertransformbodybodygyroscopejerkmagmean
- totalbodyaccelerometerstandarddeviationx
- totalbodyaccelerometerstandarddeviationy
- totalbodyaccelerometerstandarddeviationz
- totalgravityaccelerometeraccelerometerstandarddeviationx
- totalgravityaccelerometeraccelerometerstandarddeviationy
- totalgravityaccelerometeraccelerometerstandarddeviationz
- totalbodyaccelerometerjerkstandarddeviationx
- totalbodyaccelerometerjerkstandarddeviationy
- totalbodyaccelerometerjerkstandarddeviationz
- totalbodygyroscopestandarddeviationx
- totalbodygyroscopestandarddeviationy
- totalbodygyroscopestandarddeviationz
- totalbodygyroscopejerkstandarddeviationx
- totalbodygyroscopejerkstandarddeviationy
- totalbodygyroscopejerkstandarddeviationz
- totalbodyaccelerometermagstandarddeviation
- totalgravityaccelerometeraccelerometermagstandarddeviation
- totalbodyaccelerometerjerkmagstandarddeviation
- totalbodygyroscopemagstandarddeviation
- totalbodygyroscopejerkmagstandarddeviation
- fastfouriertransformbodyaccelerometerstandarddeviationx
- fastfouriertransformbodyaccelerometerstandarddeviationy
- fastfouriertransformbodyaccelerometerstandarddeviationz
- fastfouriertransformbodyaccelerometerjerkstandarddeviationx
- fastfouriertransformbodyaccelerometerjerkstandarddeviationy
- fastfouriertransformbodyaccelerometerjerkstandarddeviationz
- fastfouriertransformbodygyroscopestandarddeviationx
- fastfouriertransformbodygyroscopestandarddeviationy
- fastfouriertransformbodygyroscopestandarddeviationz
- fastfouriertransformbodyaccelerometermagstandarddeviation
- fastfouriertransformbodybodyaccelerometerjerkmagstandarddeviation
- fastfouriertransformbodybodygyroscopemagstandarddeviation
- fastfouriertransformbodybodygyroscopejerkmagstandarddeviation
The final two columns is the subject and the activity