If you're not sure where your working directory is, you can find out with the getwd() command. Alternatively, you can view/change it through the Tools > Global Options menu in R Studio.
22
+
If you're not sure where your working directory is, you can find out with the `getwd()` command. Alternatively, you can view/change it through the Tools > Global Options menu in RStudio.
23
23
24
24
So assuming you've unzipped the file into your R directory, you should have a folder called diet_data. In that folder there are five files. Let's get a list of those files:
25
25
@@ -47,7 +47,7 @@ Alternatively, you could look at the dimensions of the data.frame:
47
47
dim(andy)
48
48
```
49
49
50
-
This tells us that we 30 rows of data in 4 columns. There are some other commands we might want to run to get a feel for a new data file, str(), summary(), and names().
50
+
This tells us that we have 30 rows of data in 4 columns. There are some other commands we might want to run to get a feel for a new data file: `str()`, `summary()`, and `names()`.
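As a rough illustration of those inspection commands, here is a minimal sketch on a made-up data frame. The `demo` object, its values, and the `Patient.Name` and `Age` columns are assumptions for the example; only `Weight` and `Day` appear in the tutorial's own code.

```r
# Hypothetical stand-in for a freshly loaded data file (all values are made up)
demo <- data.frame(
  Patient.Name = rep("Demo", 3),
  Age = rep(30L, 3),
  Weight = c(140, 139, 138),
  Day = 1:3
)

str(demo)      # compact view: one line per column with its type and first values
summary(demo)  # per-column summary statistics
names(demo)    # column names as a character vector
dim(demo)      # number of rows, then number of columns
```

Running the four commands in this order gives a quick picture of a new file before doing any real work with it.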
Or, we could use the subset function to do the same thing:
76
+
Or, we could use the `subset()` function to do the same thing:
77
77
```{r}
78
78
subset(andy$Weight, andy$Day==30)
79
79
```
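To see how `subset()` lines up with plain bracket indexing, here is a small sketch on a made-up data frame. The `demo` object and its weights are assumptions for illustration, not the tutorial's data.

```r
# Hypothetical 30-day record; only the final weight differs (values are made up)
demo <- data.frame(Day = 1:30, Weight = c(rep(140, 29), 135))

by_subset  <- subset(demo$Weight, demo$Day == 30)   # subset() on a vector
by_bracket <- demo$Weight[demo$Day == 30]           # logical indexing
by_which   <- demo[which(demo$Day == 30), "Weight"] # row index plus column name

by_subset
by_bracket
by_which
```

All three expressions pick out the same single value, so which one to use is mostly a matter of readability.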
@@ -96,9 +96,9 @@ andy_loss
96
96
97
97
Andy lost 5 pounds over the 30 days. Not bad. What if we want to look at other subjects or maybe even everybody at once?
98
98
99
-
Let's look back to the list.files() command. It returns the contents of a directory in alphabetical order. You can type '?list.files' at the R prompt to learn more about the function.
99
+
Let's look back to the `list.files()` command. It returns the contents of a directory in alphabetical order. You can type `?list.files` at the R prompt to learn more about the function.
100
100
101
-
Let's take the output of list.files() and store it:
101
+
Let's take the output of `list.files()` and store it:
102
102
```{r}
103
103
files <- list.files("diet_data")
104
104
files
@@ -116,9 +116,9 @@ Let's take a quick look at John.csv:
116
116
head(read.csv(files[3]))
117
117
```
118
118
119
-
Woah, what happened? Well, John.csv is sitting inside the diet_data folder. We just tried to run the equivalent of read.csv("John.csv") and R correctly told us that there isn't a file called John.csv in our working directory. To fix this, we need to append the directory to the beginning of the file name.
119
+
Whoa, what happened? Well, John.csv is sitting inside the diet_data folder. We just tried to run the equivalent of `read.csv("John.csv")` and R correctly told us that there isn't a file called John.csv in our working directory. To fix this, we need to prepend the directory to the file name.
120
120
121
-
One approach would be to use paste() or sprintf(). However, if you go back to the help file for list.files(), you'll see that there is an argument called full.names that will append (technically prepend) the path to the file name for us.
121
+
One approach would be to use `paste()` or `sprintf()`. However, if you go back to the help file for `list.files()`, you'll see that there is an argument called `full.names` that will append (technically prepend) the path to the file name for us.
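The two approaches can be compared side by side. The sketch below builds a throwaway directory under `tempdir()` so it runs anywhere; the `diet_demo` folder and the empty files in it are assumptions for the example, not the tutorial's diet_data folder.

```r
# Hypothetical scratch directory with two empty files
demo_dir <- file.path(tempdir(), "diet_demo")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, c("Andy.csv", "John.csv")))

short_names <- list.files(demo_dir)                 # bare file names only
pasted <- paste(demo_dir, short_names, sep = "/")   # build the paths by hand
full <- list.files(demo_dir, full.names = TRUE)     # let list.files() do it

short_names
pasted
full
```

Both `pasted` and `full` point at files that exist, but `full.names = TRUE` saves a step and avoids hand-assembling separators.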
Cool. We now have a data frame called 'dat' with all of our data in it. Out of curiousity, what would happen if we had put dat <- data.frame() inside of the loop? Let's see:
180
+
Cool. We now have a data frame called 'dat' with all of our data in it. Out of curiosity, what would happen if we had put `dat <- data.frame()` inside of the loop? Let's see:
181
181
182
182
```{r}
183
183
for (i in 1:5) {
@@ -188,18 +188,18 @@ str(dat2)
188
188
head(dat2)
189
189
```
190
190
191
-
Because we put dat2 <- data.frame() inside of the loop, dat2 is being rewritten with each pass of the loop. So we only end up with the data from the last file in our list.
191
+
Because we put `dat2 <- data.frame()` inside of the loop, `dat2` is being rewritten with each pass of the loop. So we only end up with the data from the last file in our list.
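The same effect can be reproduced without any files. In this sketch the `pieces` list stands in for the CSVs, and the names `acc` and `reset` are made up for the example.

```r
# Hypothetical stand-ins for three files' worth of data
pieces <- list(data.frame(id = 1:2), data.frame(id = 3:4), data.frame(id = 5:6))

# Initialise once, outside the loop: rows accumulate across passes
acc <- data.frame()
for (i in seq_along(pieces)) {
  acc <- rbind(acc, pieces[[i]])
}

# Initialise inside the loop: each pass throws away the previous result
for (i in seq_along(pieces)) {
  reset <- data.frame()
  reset <- rbind(reset, pieces[[i]])
}

nrow(acc)    # rows from every piece
nrow(reset)  # rows from the last piece only
```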
192
192
193
193
194
-
Back to dat... So what if we wanted to know the median weight for all the data? Let's use the median() function.
194
+
Back to `dat`... So what if we wanted to know the median weight for all the data? Let's use the `median()` function.
195
195
196
196
```{r}
197
197
median(dat$Weight)
198
198
```
199
199
200
200
NA? Why did that happen? Type 'dat' into the console and you'll see a printout of all 150 observations. Scroll back up to row 77, and you'll see that we have some missing data from John, which is recorded as NA by R.
201
201
202
-
We need to get rid of those NA's for the purposes of calculating the median. There are several approaches. For instance, we could subset the data using complete.cases() or is.na(). But if you look at '?median', you'll see there is an argument called 'na.rm' that will strip the NA values out for us.
202
+
We need to get rid of those NA's for the purposes of calculating the median. There are several approaches. For instance, we could subset the data using `complete.cases()` or `is.na()`. But if you look at `?median`, you'll see there is an argument called `na.rm` that will strip the NA values out for us.
203
203
```{r}
204
204
median(dat$Weight, na.rm=TRUE)
205
205
```
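For comparison, here is a minimal sketch of the alternatives mentioned above, using a short made-up vector rather than the tutorial's data. The names `w` and `demo` are assumptions for the example.

```r
# Hypothetical weights with one missing value
w <- c(140, NA, 138, 137)
demo <- data.frame(Day = 1:4, Weight = w)

with_na     <- median(w)                 # the NA propagates to the result
via_na_rm   <- median(w, na.rm = TRUE)   # drop NAs inside median()
via_is_na   <- median(w[!is.na(w)])      # drop NAs by subsetting the vector
via_complete <- median(demo[complete.cases(demo), "Weight"])  # drop incomplete rows

with_na
via_na_rm
via_is_na
via_complete
```

The three NA-aware versions agree here; `na.rm = TRUE` is simply the shortest to write.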
@@ -217,13 +217,13 @@ Let's start out by defining what the arguments of the function should be. These
217
217
218
218
So our function is going to start out something like this:
219
219
220
-
weightmedian <- function(directory, day) {
220
+
`weightmedian <- function(directory, day) {
221
221
# content of the function
222
-
}
222
+
}`
223
223
224
-
So what goes in the content? Let's think through it logically. We need a data frame with all of the data from the CSV's. We'll then subset that data frame using the argument 'day' and take the median of that subset.
224
+
So what goes in the content? Let's think through it logically. We need a data frame with all of the data from the CSVs. We'll then subset that data frame using the argument `day` and take the median of that subset.
225
225
226
-
In order to get all of the data into a single data frame, we can use the method we worked through earlier using list.files and rbind.
226
+
In order to get all of the data into a single data frame, we can use the method we worked through earlier using `list.files()` and `rbind()`.
227
227
228
228
Essentially, these are all things that we've done in this example. Now we just need to combine them into a single function.
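One possible way to combine those pieces is sketched below. This is an illustration under assumptions, not necessarily the function the tutorial arrives at: it assumes every CSV in the directory has `Day` and `Weight` columns, it loops over however many files are found, and the demo directory and its values are made up so the example is self-contained.

```r
weightmedian <- function(directory, day) {
  # Full paths to every file in the directory
  files <- list.files(directory, full.names = TRUE)

  # Start with an empty data frame outside the loop, then stack each file onto it
  dat <- data.frame()
  for (i in seq_along(files)) {
    dat <- rbind(dat, read.csv(files[i]))
  }

  # Keep only the rows for the requested day and ignore missing weights
  day_rows <- dat[which(dat$Day == day), ]
  median(day_rows$Weight, na.rm = TRUE)
}

# Hypothetical demo data written to a scratch directory (values are made up)
demo_dir <- file.path(tempdir(), "weightmedian_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(Day = 1:2, Weight = c(140, 139)),
          file.path(demo_dir, "A.csv"), row.names = FALSE)
write.csv(data.frame(Day = 1:2, Weight = c(150, NA)),
          file.path(demo_dir, "B.csv"), row.names = FALSE)
write.csv(data.frame(Day = 1:2, Weight = c(160, 155)),
          file.path(demo_dir, "C.csv"), row.names = FALSE)

day1 <- weightmedian(demo_dir, 1)
day2 <- weightmedian(demo_dir, 2)
day1
day2
```

Using `seq_along(files)` rather than a fixed `1:5` is a design choice of this sketch, so the function does not depend on the directory holding exactly five files.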