02b Data Structures Datasets

This document provides tutorials and code examples on data structures in R. It begins by listing various tutorial links on data structures like vectors, matrices, and data frames. It then demonstrates how to create vectors using functions like c(), seq(), and rep(). Examples are given for numeric, character, date, and logical vectors. The document also shows how to reference, subset, filter, sort, and perform vectorized operations on data structures in R.

Uploaded by

Alexandra Gabriela Grecu


Al. I. Cuza University of Iași
Faculty of Economics and Business Administration
Department of Accounting, Information Systems and Statistics

Data Analysis & Data Science with R
Data structures in R. Built-in Datasets
By Marin Fotache

Data structures in R

Tutorials (and code) on Data Structures

Data structures (Advanced R by Hadley Wickham)

http://adv-r.had.co.nz/Data-structures.html

1.2 Variables (Variables and Data Structures)

https://www.youtube.com/watch?v=DG7YNf8kb3w

2 - Introduction to R : Atomic Classes

https://www.youtube.com/watch?v=271FKAYavYE
http://repidemiology.wordpress.com/introduction-to-r-code/

1.3 Vectors (Variables and Data Structures)

https://www.youtube.com/watch?v=QygSZw77Hs8

3- Introduction to R : Vectors

https://www.youtube.com/watch?v=MGphwmXCCgM#t=12
http://repidemiology.wordpress.com/introduction-to-r-code/

1.4 Matrices (Variables and Data Structures)

https://www.youtube.com/watch?v=UakyyZSyuZU

Tutorials on Data Structures (cont.)

1.5 Lists and Data Frames (Variables and Data Structures)

https://www.youtube.com/watch?v=U6vbR4el3kQ

1.6 Logical Vectors and Operators (Variables and Data Structures)

https://www.youtube.com/watch?v=GQb735O2qjc

4- Introduction to R : Matrix, List and Data Frame

https://www.youtube.com/watch?v=cEX4iXUPqoo
http://repidemiology.wordpress.com/introduction-to-r-code/

Common Data Structures in R

https://www.youtube.com/watch?v=q5YJUGTYUvI

Introduction to R Statistical Computing: Data Structures

https://www.youtube.com/watch?v=OZD4oLobjWM

Lecture 2b: Subsetting

https://www.youtube.com/watch?v=hWbgqzsQJF0&index=7&list=PLjTlxb-wKvXNSDfcKPFH2gzHGyjpeCZmJ

R script associated with this presentation

02b_data_structures__datasets.R

http://1drv.ms/1sYllLB

Vectors with c() function

Vectors are one-dimensional arrays that can hold numeric, character, logical, or date/time/timestamp data
The function c() is most frequently used to declare/form a vector
> x = c(1, 3, 5, 7, 25, -13, 47)
> x
[1] 1 3 5 7 25 -13 47
> y = c("one", "two", "three", "eight")
> y
[1] "one" "two" "three" "eight"
> z = c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE)
> z
[1] TRUE FALSE TRUE TRUE FALSE TRUE
All the data in a vector must be of a single type (numeric, character, or logical)

Vectors of numbers with sequences

Vectors can also be created with a sequence
> ten_integers.1 <- 5:14
> ten_integers.1
[1] 5 6 7 8 9 10 11 12 13 14
or
> ten_integers.2 <- seq(from=5, to=14, by=1)
> ten_integers.2
[1] 5 6 7 8 9 10 11 12 13 14
Declare a vector of descending numbers
> seq(from=5, to=-5, by=-1)
[1] 5 4 3 2 1 0 -1 -2 -3 -4 -5
Combine sequences and the c() function
> a_vector <- c(2:4, 8:14)
> a_vector
[1] 2 3 4 8 9 10 11 12 13 14
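
Sequences can also be driven by a target length instead of a step. The sketch below is illustrative only; the variable names five_points and first_four are assumptions, not part of the original script.

```r
# Five evenly spaced values between 0 and 1; R computes the step itself
five_points <- seq(from = 0, to = 1, length.out = 5)
print(five_points)          # 0.00 0.25 0.50 0.75 1.00

# seq_len(n) behaves like 1:n, but returns an empty vector when n is 0
first_four <- seq_len(4)
print(first_four)           # 1 2 3 4
print(length(seq_len(0)))   # 0 (whereas 1:0 would give the two values 1 and 0)
```

seq_len() is mostly useful inside functions, where the length may turn out to be zero.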

Vectors containing a range of dates

Generating a vector with the dates between September 29th and October 2nd 2014 as "pure" dates
First solution:
> seq(as.Date("2014/09/29"), by = "day", length.out = 4)
Second solution:
> seq(as.Date("2014/09/29"), as.Date("2014/10/02"), "days")
In both cases the result is:
[1] "2014-09-29" "2014-09-30" "2014-10-01" "2014-10-02"

Vectors containing a range of timestamps

Generating a vector with the dates between September 29th and October 2nd 2014 as timestamps
First solution
> seq(c(ISOdate(2014,9,29)), by = "DSTday", length.out = 4)
Second solution (as written, this one and the third start on September 25th and generate eight values)
> x <- as.POSIXct("2014-09-25 23:59:59", tz="Turkey")
> format(seq(x, by="day", length.out=8), "%Y-%m-%d %Z")
Third solution
> d1 <- ISOdate(year=2014, month=9, day=25, tz="GMT")
> seq(from=d1, by="day", length.out=8)

Vectors generated from the normal distribution

Vector object named x contains five random values drawn from the standard normal distribution; values are not ordered
> x <- rnorm(5)
> x
[1] -0.2766566 0.7262000 -0.3409396 -0.5192846 0.5508588
Numbers are drawn randomly, so the same function call will return five other numbers:
> x <- rnorm(5)
> x
[1] 1.9030714 -1.7139177 -0.2287666 0.8369275 0.4203014

Vectors created with function rep (repeat)

Vector x.rep contains a sequence of numbers (5, 7, 11) repeated three times
> x.rep <- rep(c(5, 7, 11), 3)
> x.rep
[1] 5 7 11 5 7 11 5 7 11
See the difference with the version that also uses the each argument:
> x.rep.2 <- rep(c(5, 7, 11), each=2, times=3)
> x.rep.2
[1] 5 5 7 7 11 11 5 5 7 7 11 11 5 5 7 7 11 11

Example of built-in (system defined) vectors

> letters
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n"
"o" "p" "q" "r" "s" "t" "u" "v" "w" "x" "y" "z"

> LETTERS
[1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N"
"O" "P" "Q" "R" "S" "T" "U" "V" "W" "X" "Y" "Z"

> month.name
[1] "January" "February" "March" "April" "May" "June" "July" "August" "September"
[10] "October" "November" "December"

> state.name
[1] "Alabama" "Alaska" "Arizona" "Arkansas"
...

> state.area
[1] 51609 589757 113909 53104
...

Vectors of factors

Factors are nominal variables whose values have a number of levels
Very important in data analysis and visualization
Ex: two vectors:
student names
student genders (vector genre)
Both vectors initially contain characters
> names <- c("Popescu I. Valeria", "Ionescu V. Viorel",
+ "Genete I. Aurelia", "Lazar T. Ionut",
+ "Sadovschi V. Iuliana", "Dominte I. Nicoleta")
> genre <- c("Female", "Male", "Female", "Male",
+ "Female", "Female")
> class(names)
[1] "character"
> class(genre)
[1] "character"

Vectors of factors (cont.)

> unclass(genre)
[1] "Female" "Male" "Female" "Male" "Female" "Female"
genre can have only two values, so it is converted into a factor
> genre <- as.factor(genre)
> class(genre)
[1] "factor"
> unclass(genre)
[1] 1 2 1 2 1 1
attr(,"levels")
[1] "Female" "Male"
If a value that is not among the levels is appended to genre with c(), the result is no longer a factor but a character vector (the existing elements are kept as their integer codes, "1" and "2", not as their labels)
> genre <- c(genre, "Boy")
> class(genre)
[1] "character"
> unclass(genre)

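One way to append a new category while keeping a factor is to go through the character labels first. This is a minimal sketch, not taken from the original script; the name genre_ext is an assumption introduced for illustration.

```r
genre <- factor(c("Female", "Male", "Female", "Male", "Female", "Female"))

# Convert to character first so the labels (not the integer codes) are kept,
# then rebuild the factor; "Boy" becomes an additional level
genre_ext <- factor(c(as.character(genre), "Boy"))

print(class(genre_ext))    # "factor"
print(levels(genre_ext))   # "Boy" "Female" "Male"
print(length(genre_ext))   # 7
```

The levels are re-sorted alphabetically when the factor is rebuilt, so the integer codes of the existing values change.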
Functions for getting vector type and length

class() returns the class (data type) of an object; unclass() returns the object with its class attribute removed, i.e. the underlying values
> class(ten_integers.1)
[1] "integer"
> unclass(ten_integers.1)
[1] 5 6 7 8 9 10 11 12 13 14
Internally, factor levels are stored as integers
> class(genre)
[1] "factor"
> unclass(genre)
[1] 1 2 1 2 1 1
attr(,"levels")
[1] "Female" "Male"
> typeof(genre)
[1] "integer"
Function length returns the number of elements in a vector
> length(ten_integers.1)
[1] 10

Referencing vector elements

First element in vector ten_integers.1
> ten_integers.1 [1]
[1] 5
Last element in vector ten_integers.1
> ten_integers.1 [length(ten_integers.1)]
[1] 14
First three elements in vector ten_integers.1
> ten_integers.1 [1:3]
[1] 5 6 7
Last three elements in vector
> ten_integers.1 [(length(ten_integers.1)-2) : length(ten_integers.1)]
[1] 12 13 14
First, third, fifth and sixth elements
> ten_integers.1 [c(1, 3, 5, 6)]
[1] 5 7 9 10

Referencing vector elements (cont.)

Indices of elements can be supplied through other vectors
Display first, third, fifth and sixth elements in vector ten_integers.1
Vector ind contains indices for elements of interest from vector ten_integers.1
> ind <- c(1, 3, 5, 6)
> ind
[1] 1 3 5 6
> ten_integers.1
[1] 5 6 7 8 9 10 11 12 13 14
Now the result:
> ten_integers.1 [ind]
[1] 5 7 9 10

Excluding elements from a vector

Basic idea: R will exclude from a vector the elements whose indices are negative (prefixed by minus)
Excluding first element:
> ten_integers.1 [-1]
[1] 6 7 8 9 10 11 12 13 14
Excluding first three elements:
> ten_integers.1 [-(1:3)]
[1] 8 9 10 11 12 13 14
Excluding first, third, and fourth elements:
> ten_integers.1 [-(c(1,3,4))]
[1] 6 9 10 11 12 13 14

Excluding elements from a vector (cont.)

Excluding first three elements, the 6th element and the 8th element
> ten_integers.1 [-(c(1:3,6,8))]
[1] 8 9 11 13 14
Excluding the first two elements and the last two elements of the vector:
> ten_integers.1 [-c((1:2),
+ (length(ten_integers.1)-1) : length(ten_integers.1))]
[1] 7 8 9 10 11 12

Vector filtering

Filter vector elements - select only elements greater than 10
> ten_integers.1 [ten_integers.1 > 10]
[1] 11 12 13 14
How many elements are greater than 10?
> length(ten_integers.1 [ten_integers.1 > 10])
[1] 4
Display INDICES of elements greater than 10
> which (ten_integers.1 > 10)
[1] 7 8 9 10
Filter vector elements - select only elements greater than 10 (ver. 2)
> ind <- which (ten_integers.1 > 10)
> ten_integers.1 [ind]
[1] 11 12 13 14
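
Because TRUE counts as 1 and FALSE as 0 in arithmetic, the matching elements can also be counted without building the filtered vector first. A small sketch under that assumption; the names is_big, n_big and share_big are illustrative.

```r
ten_integers.1 <- 5:14

is_big <- ten_integers.1 > 10   # logical vector, one flag per element
n_big <- sum(is_big)            # number of elements greater than 10
share_big <- mean(is_big)       # proportion of elements greater than 10

print(n_big)       # 4
print(share_big)   # 0.4
```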

Sorting/ordering a vector

Initial vector
> names <- c("Popescu I. Valeria", "Ionescu V. Viorel",
+ "Genete I. Aurelia", "Lazar T. Ionut",
+ "Sadovschi V. Iuliana", "Dominte I. Nicoleta")
Sort the vector elements in ascending (default) order
> names <- sort(names)
> names
[1] "Dominte I. Nicoleta" "Genete I. Aurelia" "Ionescu V. Viorel" "Lazar T. Ionut"
[5] "Popescu I. Valeria" "Sadovschi V. Iuliana"
Sorting the vector in descending order
> names.desc <- rev(sort(names))
> names.desc
[1] "Sadovschi V. Iuliana" "Popescu I. Valeria" "Lazar T. Ionut" "Ionescu V. Viorel"
[5] "Genete I. Aurelia" "Dominte I. Nicoleta"

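Two related base R tools may be worth knowing here: the decreasing argument of sort() and the order() function, which returns positions rather than values. The sketch below uses the same student names; the variable names student_names, desc_names and pos are assumptions made for illustration.

```r
student_names <- c("Popescu I. Valeria", "Ionescu V. Viorel",
                   "Genete I. Aurelia", "Lazar T. Ionut",
                   "Sadovschi V. Iuliana", "Dominte I. Nicoleta")

# Descending sort in one call, equivalent to rev(sort(student_names))
desc_names <- sort(student_names, decreasing = TRUE)

# order() returns the positions that would sort the vector
pos <- order(student_names)
print(pos)                  # 6 3 2 4 1 5
print(student_names[pos])   # same result as sort(student_names)
```

order() is the usual choice when one vector has to be rearranged according to the values of another.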
R as a vectorized language

Lecture 2c: Vectorized Operations
https://www.youtube.com/watch?v=Fm8SORJQjPY&list=PLjTlxb-wKvXNSDfcKPFH2gzHGyjpeCZmJ&index=8
Operations are automatically applied to each element of the vector, without explicit looping over the vector elements
> num.vec.1 <- c(1, 3, 5, 7, 25, -13, 47)
> num.vec.2 <- num.vec.1 + 100
> num.vec.2
[1] 101 103 105 107 125 87 147
> date.vec.1 <- c ("2013-10-01", "2013-10-03", "2013-10-10")
For the moment, elements are strings
> class(date.vec.1)
[1] "character"
as.Date() converts all of the vector elements into dates
> date.vec.1 <- as.Date(date.vec.1)
> class(date.vec.1)
[1] "Date"

R as a vectorized language (cont.)

Operations can be applied on two or more vectors
> num.vec.3 <- num.vec.1 + num.vec.2
> num.vec.3
[1] 102 106 110 114 150 74 194
Compare a vector with a value
> x
[1] -0.56757455 -0.90079348 0.24397156 -0.51325283 0.03209287
> x >= 0
[1] FALSE FALSE TRUE FALSE TRUE
> x.1 <- x >= 0
> x.1
[1] FALSE FALSE TRUE FALSE TRUE
Testing if at least one of the vector elements fulfils the predicate
> x
[1] -0.56757455 -0.90079348 0.24397156 -0.51325283 0.03209287
> any(x > 0)
[1] TRUE

R as a vectorized language (cont.)

Testing if all the vector elements fulfil the predicate (function all)
> all(x > 0)
[1] FALSE
> all(x > -25)
[1] TRUE
For a character vector, display the number of characters for each element
> y
[1] "one" "two" "three" "eight"
> nchar(y)
[1] 3 3 5 5

Naming vector elements

Provide a name for each vector element
> num_ro = c (one = "unu", two="doi", three="trei", four="patru")
> num_ro
    one     two   three    four
  "unu"   "doi"  "trei" "patru"
The same result can be accomplished with:
> num_ro = c ("unu", "doi", "trei", "patru")
> num_ro
[1] "unu" "doi" "trei" "patru"
> names(num_ro) = c ("one", "two", "three", "four")
> num_ro
    one     two   three    four
  "unu"   "doi"  "trei" "patru"

Descriptive statistics on vectors

Vector age contains the age of 10 persons (Kabacoff, 2011)
> age = c(1,3,5,2,11,9,3,9,12,3)
Another vector contains the weight of the same people
> weight = c(4.4,5.3,7.2,5.2,8.5,7.3,6.0,10.4,10.2,6.1)
Suppose the weights above are expressed in pounds (lbs) and have to be converted into kg
> weight.kg <- weight * 0.454
Compute the mean of people's weight
> mean(weight)
[1] 7.06
Compute the standard deviation of people's weight
> sd(weight)
[1] 2.077498
Compute the correlation between age and weight
> cor(age,weight)

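As a cross-check of what cor() computes, the Pearson coefficient can be rebuilt from the covariance and the two standard deviations. This is a sketch using the same age and weight vectors; r_manual, r_builtin and r_kg are names introduced here for illustration.

```r
age <- c(1, 3, 5, 2, 11, 9, 3, 9, 12, 3)
weight <- c(4.4, 5.3, 7.2, 5.2, 8.5, 7.3, 6.0, 10.4, 10.2, 6.1)

# Pearson correlation = covariance / (sd of x * sd of y)
r_manual <- cov(age, weight) / (sd(age) * sd(weight))
r_builtin <- cor(age, weight)
print(all.equal(r_manual, r_builtin))   # TRUE

# Rescaling one variable (lbs to kg) does not change the correlation
r_kg <- cor(age, weight * 0.454)
print(all.equal(r_kg, r_builtin))       # TRUE
```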
Matrices

Two-dimensional arrays where each element has the same type (numeric, character, or logical)
Created with the matrix function. Format:
> mymatrix <- matrix(vector,
nrow=number_of_rows,
ncol=number_of_columns, byrow=logical_value,
dimnames=list(char_vector_rownames,
char_vector_colnames))
vector contains the elements for the matrix
nrow and ncol specify the row and column dimensions
dimnames contains optional row and column labels stored in character vectors.
byrow indicates whether the matrix should be filled in by row (byrow=TRUE) or by column (byrow=FALSE); the default is by column.

Matrices (cont.)

m.1 is a 5 x 4 matrix
> m.1 <- matrix(1:20, nrow=5, ncol=4)
> m.1
     [,1] [,2] [,3] [,4]
[1,]    1    6   11   16
[2,]    2    7   12   17
[3,]    3    8   13   18
[4,]    4    9   14   19
[5,]    5   10   15   20
m.2 is a 2 x 2 matrix, filled by rows
> cells <- c(1,26,24,68)
> rownames <- c("Row1", "Row2")
> colnames <- c("Col1", "Col2")
> m.2 <- matrix(cells, nrow=2, ncol=2, byrow=TRUE,
+ dimnames=list(rownames, colnames))

Matrices (cont.)

Display m.2
> m.2
     Col1 Col2
Row1    1   26
Row2   24   68
m.3 is a 2 x 2 matrix, filled by columns
list is a data structure presented after data frames
> m.3 <- matrix(cells, nrow=2, ncol=2, byrow=FALSE,
+ dimnames=list(rownames, colnames))
> m.3
     Col1 Col2
Row1    1   24
Row2   26   68

Matrices (cont.)

m.4 is a 4 x 3 matrix, filled by rows
> m.4 <- matrix(1:12, nrow=4, ncol=3, byrow=TRUE)
> m.4
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
[4,]   10   11   12
Naming rows: row.1, row.2, ... and columns: col.1, col.2, ...
> dimnames(m.4)=list(paste("row.", 1:nrow(m.4), sep=""),
+ paste("col.", 1:ncol(m.4), sep=""))
> m.4
      col.1 col.2 col.3
row.1     1     2     3
row.2     4     5     6
row.3     7     8     9
row.4    10    11    12

Accessing matrix elements

> m.1
     [,1] [,2] [,3] [,4]
[1,]    1    6   11   16
[2,]    2    7   12   17
[3,]    3    8   13   18
[4,]    4    9   14   19
[5,]    5   10   15   20
Display the 3rd row
> m.1[3,]
[1] 3 8 13 18
Display the 3rd column
> m.1[,3]
[1] 11 12 13 14 15
Display the element at the intersection of the 2nd row and the 3rd column
> m.1 [2,3]
[1] 12

Accessing matrix elements (cont.)

Display two elements from the same row: m.1[2,3] and m.1[2,4]
> m.1 [2, c(3,4)]
[1] 12 17
Display three elements from the same column: m.1[1,2], m.1[2,2] and m.1[3,2]
> m.1 [c(1,2, 3), 2]
[1] 6 7 8
Display a "submatrix", from m.1[2,2] to m.1[4,4]
> m.1 [ c(2,3,4), c(2,3,4)]
     [,1] [,2] [,3]
[1,]    7   12   17
[2,]    8   13   18
[3,]    9   14   19

Basic statistics on matrix

> m.4
      col.1 col.2 col.3
row.1     1     2     3
row.2     4     5     6
row.3     7     8     9
row.4    10    11    12
Compute mean of all the cells in matrix m.4
> mean(m.4)
[1] 6.5
Compute mean of all the cells on the third column
> mean(m.4[,3])
[1] 7.5
Compute mean of all the cells on the third row
> mean(m.4[3,])
[1] 8

Basic statistics on matrix (cont.)

Compute sum of all the cells in matrix m.4
> sum(m.4)
[1] 78
Compute sum of all the cells on the third column
> sum(m.4[,3])
[1] 30
Compute sum of all the cells on the third row
> sum(m.4[3,])
[1] 24

rowSums/colSums

rowSums calculates the sum of the cells for each row of a matrix
> rowSums(m.4)
row.1 row.2 row.3 row.4
    6    15    24    33
colSums calculates the sum of the cells for each column of a matrix
> colSums(m.4)
col.1 col.2 col.3
   22    26    30
rowMeans/colMeans calculate the mean of every row/column
> rowMeans(m.4)
row.1 row.2 row.3 row.4
    2     5     8    11
> colMeans(m.4)
col.1 col.2 col.3
  5.5   6.5   7.5
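
The four functions above are shortcuts for a more general pattern: apply() runs any function over the rows (margin 1) or the columns (margin 2) of a matrix. A brief sketch on an unnamed copy of m.4; row_sums and col_max are names assumed here for illustration.

```r
m.4 <- matrix(1:12, nrow = 4, ncol = 3, byrow = TRUE)

# Margin 1 = rows: same values as rowSums(m.4)
row_sums <- apply(m.4, 1, sum)
print(row_sums)   # 6 15 24 33

# Margin 2 = columns: any function can be used, not only sum or mean
col_max <- apply(m.4, 2, max)
print(col_max)    # 10 11 12
```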

Adding total rows and columns to a matrix

> m.4
      col.1 col.2 col.3
row.1     1     2     3
row.2     4     5     6
row.3     7     8     9
row.4    10    11    12
Add total column
> m.4 <- cbind(m.4, rowSums(m.4))
Setting the name for the total column
> column.names <- colnames(m.4)
> column.names
[1] "col.1" "col.2" "col.3" ""
> column.names[length(column.names)] <- "col.total"
> colnames(m.4) <- column.names

Adding total rows and columns to a matrix (cont.)

Check the operation
> m.4
      col.1 col.2 col.3 col.total
row.1     1     2     3         6
row.2     4     5     6        15
row.3     7     8     9        24
row.4    10    11    12        33
Add total row
> m.4 <- rbind(m.4, colSums(m.4))
Setting the name for the total row
> row.names <- rownames(m.4)
> row.names
[1] "row.1" "row.2" "row.3" "row.4" ""
> row.names[length(row.names)] <- "row.total"
> rownames(m.4) <- row.names

Adding total rows and columns to a matrix (cont.)

Check the operation; notice the names of rows and columns and the content of last row and column
> m.4
          col.1 col.2 col.3 col.total
row.1         1     2     3         6
row.2         4     5     6        15
row.3         7     8     9        24
row.4        10    11    12        33
row.total    22    26    30        78
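
The renaming steps can be avoided by passing the new row or column as a named argument to cbind() and rbind(). This is a shorter alternative sketched here, not the approach used in the original script; m.tot is a name assumed for illustration.

```r
m.4 <- matrix(1:12, nrow = 4, ncol = 3, byrow = TRUE,
              dimnames = list(paste0("row.", 1:4), paste0("col.", 1:3)))

# A named argument becomes the name of the added column or row
m.tot <- cbind(m.4, col.total = rowSums(m.4))
m.tot <- rbind(m.tot, row.total = colSums(m.tot))

print(m.tot)
print(m.tot["row.total", "col.total"])   # 78
```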

Arrays

Similar to matrices but can have more than two dimensions
Elements must be of the same type
Created with array function:
> myarray <- array(vector,
+ dimensions, dimnames)
vector contains the data for the array
dimensions is a numeric vector giving the maximal index for each dimension
dimnames - optional list of dimension labels.
Elements in arrays are accessed similarly to those in matrices

Create and access arrays

> dim1 <- c("A1", "A2")
> dim2 <- c("B1", "B2", "B3")
> dim3 <- c("C1", "C2",
+ "C3", "C4")
> a1 <- array(1:24, c(2, 3, 4),
+ dimnames=list(dim1, dim2, dim3))
> a1
, , C1
   B1 B2 B3
A1  1  3  5
A2  2  4  6
, , C2
   B1 B2 B3
A1  7  9 11
A2  8 10 12
, , C3
   B1 B2 B3
A1 13 15 17
A2 14 16 18
, , C4
   B1 B2 B3
A1 19 21 23
A2 20 22 24
Display element [2,2,3]
> a1 [2,2,3]
[1] 16

Create and access arrays (cont.)

Display a matrix from elements of A and B for first row/column of C
> a1 [,,1]
   B1 B2 B3
A1  1  3  5
A2  2  4  6
Display elements of A for the 3rd "row" of B and 2nd row/column of C
> a1 [,3,2]
A1 A2
11 12
Display a subarray containing all elements from first two rows/columns of A, B and C
> a1 [c(1,2),c(1,2),c(1,2)]
, , C1
   B1 B2
A1  1  3
A2  2  4
, , C2
   B1 B2
A1  7  9
A2  8 10

Data Frames

Most important data structure in R (at least for us)
A data frame is a structure in R that holds data and is similar to the datasets found in standard statistical packages (for example, SAS, SPSS, and Stata) and databases
The columns are variables and the rows are observations
Variables can have different types (for example, numeric, character) in the same data frame

Create an empty data frame

> student_gi <- data.frame(studentID = numeric(),
+ name = character(), age = numeric(),
+ scholarship = character(),
+ lab_assessment = character(),
+ final_grade = numeric())
> class(student_gi)
[1] "data.frame"
> str(student_gi)
'data.frame': 0 obs. of 6 variables:
 $ studentID     : num
 $ name          : Factor w/ 0 levels:
 $ age           : num
 $ scholarship   : Factor w/ 0 levels:
 $ lab_assessment: Factor w/ 0 levels:
 $ final_grade   : num
(character columns are shown as factors because this output comes from an R version older than 4.0; since R 4.0 data.frame() keeps them as character by default)

Create a data frame from vectors

Create the vectors
> studentID <- c(1, 2, 3, 4, 5)
Create the data frame using the above vectors
> student_gi <- data.frame(studentID, name, age,
+ scholarship, lab_assessment, final_grade)

Display data frame content

Display data frame (content)
> student_gi
  studentID              name age scholarship lab_assessment final_grade
1         1 Popescu I. Vasile  23      Social           Bine        9.00
2         2  Ianos W. Adriana  19     Studiu1    Foarte bine        9.45
3         3   Kovacz V. Iosef  21     Studiu2       Excelent        9.75
4         4  Babadag I. Maria  22       Merit           Bine        9.00
5         5        Pop P. Ion  31     Studiu1           Slab        6.00
Display one column of the data frame as a vector
> student_gi$name
[1] Popescu I. Vasile Ianos W. Adriana Kovacz V. Iosef Babadag I. Maria Pop P. Ion
Levels: Babadag I. Maria Ianos W. Adriana Kovacz V. Iosef Pop P. Ion Popescu I. Vasile
Display one column of the data frame as a... column
> student_gi["name"]
               name
1 Popescu I. Vasile
2  Ianos W. Adriana
3   Kovacz V. Iosef
4  Babadag I. Maria
5        Pop P. Ion

Display data frame structure

Confirm student_gi is indeed a data frame
> class(student_gi)
[1] "data.frame"
Display structure of the data frame
> str(student_gi)
'data.frame': 5 obs. of 6 variables:
 $ studentID     : num 1 2 3 4 5
 $ name          : Factor w/ 5 levels "Babadag I. Maria",..: 5 2 3 1 4
 $ age           : num 23 19 21 22 31
 $ scholarship   : Factor w/ 4 levels "Merit","Social",..: 2 3 4 1 3
 $ lab_assessment: Factor w/ 4 levels "Bine","Excelent",..: 1 3 2 1 4
 $ final_grade   : num 9 9.45 9.75 9 6
Display type of individual variables within the data frame
> class(student_gi$studentID)
[1] "numeric"
> class(student_gi$name)
[1] "factor"

Useful functions for displaying some data frame properties

Number of observations (rows)
> nrow(student_gi)
[1] 5
Number of variables (columns)
> ncol(student_gi)
[1] 6
Both the number of observations (rows) and variables (columns)
> dim(student_gi)
[1] 5 6
Display the names of all the variables (columns)
> names(student_gi)
[1] "studentID" "name" "age" "scholarship"
[5] "lab_assessment" "final_grade"
Display the names of the second, third and fourth variable
> names(student_gi[2:4])

Selecting columns

Select/display first two columns (studentID and name)
> student_gi [1:2]
  studentID              name
1         1 Popescu I. Vasile
2         2  Ianos W. Adriana
3         3   Kovacz V. Iosef
4         4  Babadag I. Maria
5         5        Pop P. Ion
or
> student_gi [, 1:2]
> student_gi [c("studentID", "name")]
(see on next slide)

Selecting columns (cont.)

Select/display first two columns (studentID and name): other solutions
> student_gi [, c("studentID", "name")]
Using a vector for storing the names of the first two columns
> cols <- c("studentID", "name")
> student_gi[cols]
> student_gi[, names(student_gi) %in% cols]
Return "final_grade" variable (column) as a vector
> student_gi$final_grade
[1] 9.00 9.45 9.75 9.00 6.00
or ... See on the next slide

Selecting columns (cont.)

Return "final_grade" variable (column) as a vector (cont.)
> student_gi[ , 6]
or
> student_gi[ , "final_grade"]
Return "final_grade" variable (column) as a one-column data frame
> student_gi[ , "final_grade", drop=FALSE]
  final_grade
1        9.00
2        9.45
3        9.75
4        9.00
5        6.00
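
The difference between "column as a vector" and "column as a data frame" also shows up in the two bracket operators. A small sketch on a reduced, two-column version of the data frame; the names as_vector and as_frame are assumptions made for illustration.

```r
student_gi <- data.frame(
  studentID = c(1, 2, 3, 4, 5),
  final_grade = c(9, 9.45, 9.75, 9, 6)
)

# Double brackets extract the column itself (same as student_gi$final_grade)
as_vector <- student_gi[["final_grade"]]
# Single brackets keep the data frame structure (one column)
as_frame <- student_gi["final_grade"]

print(is.numeric(as_vector))     # TRUE
print(is.data.frame(as_frame))   # TRUE
print(dim(as_frame))             # 5 1
```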

Selecting rows

Display first two observations (rows)
> student_gi [1:2,]
  studentID              name age scholarship lab_assessment final_grade
1         1 Popescu I. Vasile  23      Social           Bine        9.00
2         2  Ianos W. Adriana  19     Studiu1    Foarte bine        9.45
Display observations 1, 2 and 5
> student_gi [c(1:2, 5),]
  studentID              name age scholarship lab_assessment final_grade
1         1 Popescu I. Vasile  23      Social           Bine        9.00
2         2  Ianos W. Adriana  19     Studiu1    Foarte bine        9.45
5         5        Pop P. Ion  31     Studiu1           Slab        6.00
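
Rows can also be selected by a condition on a column, in the same way vectors were filtered earlier. The sketch below rebuilds a reduced version of the data frame so that it runs on its own; top and same are illustrative names, and stringsAsFactors is set explicitly so the result does not depend on the R version.

```r
student_gi <- data.frame(
  name = c("Popescu I. Vasile", "Ianos W. Adriana", "Kovacz V. Iosef",
           "Babadag I. Maria", "Pop P. Ion"),
  age = c(23, 19, 21, 22, 31),
  final_grade = c(9, 9.45, 9.75, 9, 6),
  stringsAsFactors = FALSE
)

# A logical vector in the row position keeps only the matching rows
top <- student_gi[student_gi$final_grade > 9, ]
print(top$name)   # "Ianos W. Adriana" "Kovacz V. Iosef"

# subset() expresses the same filter without repeating the data frame name
same <- subset(student_gi, final_grade > 9)
print(nrow(same)) # 2
```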

attach function

attach adds the data frame to the R search path
> search()
 [1] ".GlobalEnv"        "tools:rstudio"
 [3] "package:stats"     "package:graphics"
 [5] "package:grDevices" "package:utils"
 [7] "package:datasets"  "package:methods"
 [9] "Autoloads"         "package:base"
When a variable name is encountered, data frames in the search path are checked in order to locate the variable.
Commands without attach
> student_gi$final_grade
> table (student_gi$lab_assessment,
+ student_gi$final_grade)
> summary(student_gi$final_grade)

attach vs. with

The same commands using attach
> attach(student_gi)
> final_grade
> table (lab_assessment, final_grade)
> summary(final_grade)
> plot(age, final_grade)
detach removes an object from the search path
> detach(student_gi)
It is advisable to use with instead of attach:
> with (student_gi, final_grade)
> with (student_gi, table (lab_assessment,
+ final_grade))
> with (student_gi, plot(lab_assessment,
+ final_grade))

Case (row) identifiers

Act like primary/unique keys in relational tables
Can be specified by the row.names option within the data.frame function
We allocate new values for studentID (to avoid confusion with row numbers); the remaining vectors are identical
> studentID <- c(1001, 1002, 1003, 1004, 1005)
> name <- c("Popescu I. Vasile",
+ "Ianos W. Adriana", "Kovacz V. Iosef",
+ "Babadag I. Maria", "Pop P. Ion")
> age <- c(23, 19, 21, 22, 31)
> scholarship <- c("Social", "Studiu1",
+ "Studiu2", "Merit", "Studiu1")
> lab_assessment <- c("Bine", "Foarte bine",
+ "Excelent", "Bine", "Slab")

Case (row) identifiers (cont.)

(slightly) new version of the data frame:
> student_gi <- data.frame(studentID, name, age,
+ scholarship, lab_assessment,
+ final_grade, row.names = studentID)
studentID is the variable to use in labeling cases on various printouts and graphics produced with R.

Case (row) identifiers (cont.)

Display the names of the rows (observations)
> rownames(student_gi)
[1] "1001" "1002" "1003" "1004" "1005"
Notice the leftmost column of the data frame display
> student_gi
     studentID              name age scholarship lab_assessment final_grade
1001      1001 Popescu I. Vasile  23      Social           Bine        9.00
1002      1002  Ianos W. Adriana  19     Studiu1    Foarte bine        9.45
1003      1003   Kovacz V. Iosef  21     Studiu2       Excelent        9.75
1004      1004  Babadag I. Maria  22       Merit           Bine        9.00
1005      1005        Pop P. Ion  31     Studiu1           Slab        6.00

Case (row) identifiers (cont.)

Display the observation (row) corresponding to student Ianos W. Adriana using her case identifier ("1002")
> student_gi["1002",]
     studentID             name age scholarship lab_assessment final_grade
1002      1002 Ianos W. Adriana  19     Studiu1    Foarte bine        9.45
Display the observations corresponding to students Ianos W. Adriana and Pop P. Ion using their case identifiers ("1002" and "1005")
> student_gi[c("1002", "1005"),]
     studentID             name age scholarship lab_assessment final_grade
1002      1002 Ianos W. Adriana  19     Studiu1    Foarte bine        9.45
1005      1005       Pop P. Ion  31     Studiu1           Slab        6.00

Factors (reprise)

In presentation 02a, variables were described as nominal, ordinal, interval, and ratio
Nominal variables are categorical, without an implied order. Examples: MaritalStatus, Sex, Job, MasterProgramme
Ordinal variables imply order but not amount. Examples: Status (poor, improved, excellent), LabAssessment (slab, bine, foarteBine, excelent)
Interval and Ratio variables can take on any value within some range, and both order and amount are implied. Examples: LitersPer100Km, Height, Weight, FinalGrade (with decimals)
Categorical (nominal) and ordered categorical (ordinal) variables are called factors.

Function factor

Factors determine how data will be analyzed and presented visually
The function factor() stores the categorical values as a vector of integers in the range [1... k] (where k is the number of unique values in the nominal variable), and an internal vector of character strings (the original values) mapped to these integers
Initially, vector scholarship is a plain character vector holding a nominal variable
> scholarship <- c("Social", "Studiu1", "Studiu2",
+ "Merit", "Studiu1")
Now it will be converted into a factor:
> scholarship_f <- factor(scholarship)
> scholarship_f
[1] Social Studiu1 Studiu2 Merit Studiu1
Levels: Merit Social Studiu1 Studiu2

Ordered factors

An ordinal variable
> lab_assessment <- c("Bine", "Foarte bine",
+ "Excelent", "Bine", "Slab")
Notice the way it is displayed
> lab_assessment
[1] "Bine" "Foarte bine" "Excelent" "Bine"
[5] "Slab"
Now declare the vector as an ordered factor
> lab_assessment <- factor(lab_assessment,
+ ordered=TRUE, levels=c("Slab", "Bine",
+ "Foarte bine", "Excelent"))
Notice the new way of displaying the vector
> lab_assessment
[1] Bine Foarte bine Excelent Bine Slab
Levels: Slab < Bine < Foarte bine < Excelent
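
Declaring the order is what makes comparisons between levels meaningful. A minimal sketch on the same lab_assessment values; above_bine is an illustrative name.

```r
lab_assessment <- factor(c("Bine", "Foarte bine", "Excelent", "Bine", "Slab"),
                         ordered = TRUE,
                         levels = c("Slab", "Bine", "Foarte bine", "Excelent"))

# Comparison operators follow the declared level order, not the alphabet
above_bine <- lab_assessment > "Bine"
print(above_bine)                          # FALSE TRUE TRUE FALSE FALSE

# min() and max() also respect the order of the levels
print(as.character(max(lab_assessment)))   # "Excelent"
print(as.character(min(lab_assessment)))   # "Slab"
```

On an unordered factor the same comparison would return NA with a warning.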

Factors in data frames

Re-create the data frame using factors
> studentID <- c(1001, 1002, 1003, 1004, 1005)
> name <- c("Popescu I. Vasile", "Ianos W. Adriana",
+ "Kovacz V. Iosef", "Babadag I. Maria",
+ "Pop P. Ion")
> age <- c(23, 19, 21, 22, 31)
> scholarship <- c("Social", "Studiu1", "Studiu2",
+ "Merit", "Studiu1")
> scholarship <- factor(scholarship)
> lab_assessment <- c("Bine", "Foarte bine",
+ "Excelent", "Bine", "Slab")
> lab_assessment <- factor(lab_assessment,
+ ordered=TRUE, levels=c("Slab", "Bine",
+ "Foarte bine", "Excelent"))
> final_grade <- c(9, 9.45, 9.75, 9, 6)

Factors in data frames (cont.)

Another version of the data frame
> student_gi <- data.frame(name, age, scholarship,
+ lab_assessment, final_grade,
+ row.names = studentID)
Display the structure of the data frame
> str(student_gi)
'data.frame': 5 obs. of 5 variables:
 $ name          : Factor w/ 5 levels "Babadag I. Maria",..: 5 2 3 1 4
 $ age           : num 23 19 21 22 31
 $ scholarship   : Factor w/ 4 levels "Merit","Social",..: 2 3 4 1 3
 $ lab_assessment: Ord.factor w/ 4 levels "Slab"<"Bine"<..: 2 3 4 2 1
 $ final_grade   : num 9 9.45 9.75 9 6

Factors in data frames (cont.)

Basic statistics about variables in data frame
> summary(student_gi)
                name        age          scholarship
 Babadag I. Maria :1   Min.   :19.0   Merit  :1
 Ianos W. Adriana :1   1st Qu.:21.0   Social :1
 Kovacz V. Iosef  :1   Median :22.0   Studiu1:2
 Pop P. Ion       :1   Mean   :23.2   Studiu2:1
 Popescu I. Vasile:1   3rd Qu.:23.0
                       Max.   :31.0
     lab_assessment  final_grade
 Slab       :1      Min.   :6.00
 Bine       :2      1st Qu.:9.00
 Foarte bine:1      Median :9.00
 Excelent   :1      Mean   :8.64
                    3rd Qu.:9.45
                    Max.   :9.75

Factors and value labels

> patientID <- c(1, 2, 3, 4)
> age <- c(25, 34, 28, 52)
> diabetes <- c("Type1", "Type2", "Type1", "Type1")
> status <- c("Poor", "Improved", "Excellent",
+ "Poor")
> diabetes <- factor(diabetes)
> status <- factor(status, ordered=TRUE)
> gender <- c(1, 2, 2, 1)
> patientdata <- data.frame(patientID, age,
+ diabetes, status, gender)
For variable gender (coded 1 for males and 2 for females) the value labels are declared with options levels (indicating the values) and labels (indicating the labels):
> patientdata$gender <-

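The assignment to patientdata$gender is cut off in the source. One plausible way to attach such value labels, consistent with the description (levels for the values, labels for the labels), is sketched below; it is an assumption about the missing code, not a recovery of it, and it uses a reduced two-column data frame so that it runs on its own.

```r
gender <- c(1, 2, 2, 1)
patientdata <- data.frame(patientID = c(1, 2, 3, 4), gender)

# Assumed completion: code 1 is labelled "male" and code 2 "female"
patientdata$gender <- factor(patientdata$gender,
                             levels = c(1, 2),
                             labels = c("male", "female"))

print(levels(patientdata$gender))         # "male" "female"
print(as.character(patientdata$gender))   # "male" "female" "female" "male"
```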
Factors and value labels (cont.)

For gender, labels (instead of values) are displayed
> patientdata
  patientID age diabetes    status gender
1         1  25    Type1      Poor   male
2         2  34    Type2  Improved female
3         3  28    Type1 Excellent female
4         4  52    Type1      Poor   male
Data frame structure (see information about gender):
> str(patientdata)
'data.frame': 4 obs. of 5 variables:
 $ patientID: num 1 2 3 4
 $ age      : num 25 34 28 52
 $ diabetes : Factor w/ 2 levels "Type1","Type2": 1 2 1 1
 $ status   : Ord.factor w/ 3 levels "Excellent"<"Improved"<..: 3 2 1 3
 $ gender   : Factor w/ 2 levels "male","female": 1 2 2 1

Lists

Lists are the most complex of the R data types
A list is an ordered collection of objects (components).
A list allows gathering a large variety of (possibly unrelated) objects under one name.
A list can contain a combination of vectors, matrices, data frames, and even other lists
Created using list() function:
mylist <- list(object1, object2, ...)
where the objects are any of the structures seen so far
Optionally, the objects in a list can be named:
mylist <- list(name1=object1,
+ name2=object2, ...)

First example of list: POSIXlt variables

Variable t gets the current system timestamp:
> t = Sys.time()
POSIXlt objects are actually lists
> l.1 <- as.POSIXlt(t)
> l.1
[1] "2014-09-25 08:37:24 EEST"
> typeof(l.1)
[1] "list"
> names(l.1)
NULL
> unclass(l.1)
$sec
[1] 24.19267
$min
[1] 37
$hour
[1] 8
$mday
[1] 25
...

First example of list: POSIXlt variables (cont.)

Extract list component values (seconds, minutes, hours, ...), equivalent to l.1$sec, l.1$min ...:
> l.1[[1]]
[1] 24.19267
> l.1[[2]]
[1] 37
> l.1[[3]]
[1] 8
> l.1[[4]]
[1] 25
...
Display (horizontally) components of the timestamp object
> unlist(l.1)
      sec       min      hour      mday       mon      year
 24.19267  37.00000   8.00000  25.00000   8.00000 114.00000
     wday      yday     isdst
...
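
The values 8 for mon and 114 for year are not typos: POSIXlt counts months from 0 and years from 1900. A short sketch with a fixed timestamp (chosen here to match the date on the slide, with an assumed UTC time zone) shows the usual corrections; ts is an illustrative name.

```r
ts <- as.POSIXlt("2014-09-25 08:37:24", tz = "UTC")

print(ts$year + 1900)   # 2014 (years are stored as offsets from 1900)
print(ts$mon + 1)       # 9    (months are stored as 0..11)
print(ts$mday)          # 25   (day of month is stored as is)
```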

Matrices and lists

Matrix dimension names (dimnames) object is a list
> m.3 <- matrix(cells, nrow=2, ncol=2,
+ byrow=FALSE,
+ dimnames=list(rownames, colnames))
> m.3
     Col1 Col2
Row1    1   24
Row2   26   68
> dimnames(m.3)
[[1]]
[1] "Row1" "Row2"
[[2]]
[1] "Col1" "Col2"
> unlist(dimnames(m.3))
[1] "Row1" "Row2" "Col1" "Col2"

Creating and displaying simple lists

Create two simple lists
> list.1 = list ("unu", "doi", "trei")
> list.2 = list( c("doi", "trei", "patru"))
Visualizing lists
> list.1
[[1]]
[1] "unu"
[[2]]
[1] "doi"
[[3]]
[1] "trei"
> list.2
[[1]]
[1] "doi" "trei" "patru"

Create a more complex list

list.3 contains two previous lists, a vector (sequence) and a data frame:

> list.3 = list (list.1, list.2, 3:7, patientdata)

> list.3
[[1]]
[[1]][[1]]
[1] "unu"

[[1]][[2]]
[1] "doi"

[[1]][[3]]
[1] "trei"

[[2]]
[[2]][[1]]
[1] "doi"   "trei"  "patru"

[[3]]
[1] 3 4 5 6 7

[[4]]
  patientID age diabetes    status gender
1         1  25    Type1      Poor   male
2         2  34    Type2  Improved female
3         3  28    Type1 Excellent female
4         4  52    Type1      Poor   male

Create a more complex list (cont.)

Display the structure of list.3:

> str(list.3)
List of 4
 $ :List of 3
  ..$ : chr "unu"
  ..$ : chr "doi"
  ..$ : chr "trei"
 $ :List of 1
  ..$ : chr [1:3] "doi" "trei" "patru"
 $ : int [1:5] 3 4 5 6 7
 $ :'data.frame': 4 obs. of 5 variables:
  ..$ patientID: num [1:4] 1 2 3 4
  ..$ age      : num [1:4] 25 34 28 52
  ..$ diabetes : Factor w/ 2 levels "Type1","Type2": 1 2 1 1
  ..$ status   : Ord.factor w/ 3 levels "Excellent"<"Improved"<..: 3 2 1 3
  ..$ gender   : Factor w/ 2 levels "male","female": 1 2 2 1

Accessing list components

Display the number of objects in a list

> length(list.3)
[1] 4

Access the first object of the list

> list.3[[1]]
[[1]]
[1] "unu"

[[2]]
[1] "doi"

[[3]]
[1] "trei"

> class(list.3[[1]])
[1] "list"

Accessing list components (cont)

Access the second component of the list

> list.3[[2]]
[[1]]
[1] "doi"   "trei"  "patru"

> class(list.3[[2]])
[1] "list"

... and the fourth component

> list.3[[4]]
  patientID age diabetes    status gender
1         1  25    Type1      Poor   male
2         2  34    Type2  Improved female
3         3  28    Type1 Excellent female
4         4  52    Type1      Poor   male
> class(list.3[[4]])
[1] "data.frame"
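
The examples above all use double brackets. As a hedged sketch of why that matters, the block below contrasts [[ ]], which returns the component itself, with [ ], which returns a shorter list. The list lst and its contents are hypothetical.

```r
# Hypothetical list with three components of different types
lst <- list(list("unu", "doi"), 3:7, data.frame(id = 1:2))

sub <- lst[2]      # single brackets: still a list, of length 1
comp <- lst[[2]]   # double brackets: the component itself (an integer vector)

class(sub)         # "list"
class(comp)        # "integer"
sum(comp)          # arithmetic works on the component, not on the sub-list
lst[c(1, 3)]       # single brackets can select several components at once
```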

List component attributes/names

Function names displays the names of the designated components of a list

The first object of list.3 is a list whose components have no names:

> names(list.3[[1]])
NULL

The fourth object of list.3 is a data frame called patientdata; this data frame has five variables (columns) whose names can be displayed with function names:

> names(list.3[[4]])
[1] "patientID" "age"       "diabetes"  "status"    "gender"

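Function names can also be used on the left-hand side of an assignment to give names to components that have none. The sketch below uses a hypothetical list called parts; none of its names come from the course data.

```r
# Hypothetical unnamed list: a string, a sequence and a small data frame
parts <- list("unu", 3:7, data.frame(id = 1:2, age = c(25, 34)))

before <- names(parts)                        # NULL: no component is named yet
names(parts) <- c("word", "seq", "patients")  # assign names to all components

names(parts)            # names of the list components
names(parts$patients)   # names of the columns of the data frame component
```
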
Accessing components within components

Display the third object within the first component in list.3

> list.3[[1]][[3]]
[1] "trei"

Display, in the data frame patientdata (the data frame is the 4th component of the list), the values of column age (this column is the 2nd of the data frame)

> list.3[[4]][, 2]
[1] 25 34 28 52
> list.3[[4]][, "age"]
[1] 25 34 28 52

Display age as a column (not a vector)

> list.3[[4]][, "age", drop=FALSE]
  age
1  25
2  34
3  28
4  52

Display age of the third patient

> list.3[[4]][, 2][3]
> list.3[[4]][, "age", drop=FALSE]$age[3]
[1] 28
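
The chained indexing above can be checked on a smaller object. This is a sketch under stated assumptions: patients is a reduced, hypothetical copy of patientdata with only two columns, and nested stands in for list.3.

```r
# Reduced hypothetical copy of the patient data (two columns only)
patients <- data.frame(patientID = 1:4, age = c(25, 34, 28, 52))
nested <- list(list("unu", "doi", "trei"), 3:7, patients)

nested[[1]][[3]]                                 # third object of the first component
ages <- nested[[3]][, "age"]                     # a plain numeric vector
age_col <- nested[[3]][, "age", drop = FALSE]    # a one-column data frame
nested[[3]][3, "age"]                            # age of the third patient, by row and column
```

Indexing by row and column in one step, as in the last line, avoids the intermediate vector created by [, 2][3].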

Tables in R
Not a full-fledged data structure, but a kind of labeled (named) array
Some functions (e.g. graphic functions, categorical data analysis functions) accept only tables as arguments
More about tables in script 06c
Two main types of tables:
tables of frequencies, which count the number of occurrences of each value of a (usually) categorical variable
tables of proportions, which divide the number of occurrences of each value by the total number of occurrences of a (usually) categorical variable
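
The two types of tables can be sketched with the base functions table and prop.table. The vector scholarship below is a hypothetical stand-in for a categorical column; it is not the course data frame.

```r
# Hypothetical categorical variable with five observations
scholarship <- c("Merit", "Social", "Studiu1", "Studiu1", "Studiu2")

freq <- table(scholarship)   # table of frequencies (counts per value)
prop <- prop.table(freq)     # table of proportions (counts divided by the total)

freq
prop
sum(prop)                    # proportions add up to 1
```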

Uni-dimensional tables
Create a table with frequencies of scholarship in data frame student_gi
> table.1 <- with(student_gi, table(scholarship))
> table.1
scholarship
  Merit  Social Studiu1 Studiu2
      1       1       2       1
Display structure of table.1
> str(table.1)
 'table' int [1:4(1d)] 1 1 2 1
 - attr(*, "dimnames")=List of 1
  ..$ scholarship: chr [1:4] "Merit" "Social" "Studiu1" "Studiu2"
> class(table.1)
[1] "table"
Unidimensional tables are vectors with labeled elements (each element's label is a value of the attribute used in function table)
> names(table.1)
[1] "Merit"   "Social"  "Studiu1" "Studiu2"

Access/display uni-dimensional tables

table.1 is not a data frame, so we cannot qualify the variable using $...
> table.1$Merit
Error in table.1$Merit : $ operator is invalid for atomic vectors

... but we can access it with vector indices

> table.1[1]
Merit
    1

... or list indices

> table.1[[1]]
[1] 1

Display both the label and the value of the 3rd element in table.1:

> table.1[3]
Studiu1
      2

... or
> unlist(table.1)[3]
Studiu1
      2

Access/display uni-dimensional tables (cont.)

Display only the label of the 3rd element of table.1:

> names(table.1) [3]
[1] "Studiu1"

Display only the value of the 3rd element in table.1:

> unlist(table.1)[[3]]
[1] 2

Display both the name and the value of the 3rd element, using its name:

> table.1["Studiu1"]
Studiu1
      2

Display both names and values of two elements by their names:
> table.1[c("Merit", "Studiu1")]
scholarship
  Merit Studiu1
      1       2
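
Since a one-dimensional table behaves like a labeled vector, the usual vector tools apply to it. A brief sketch follows, again with a hypothetical input vector rather than student_gi.

```r
# Hypothetical one-dimensional table of counts
tab <- table(c("Merit", "Social", "Studiu1", "Studiu1", "Studiu2"))

names(tab)[3]                 # label of the 3rd element
as.vector(tab)                # counts only, labels dropped
as.vector(tab["Studiu1"])     # count for one label, as a plain number
names(tab)[tab == max(tab)]   # label(s) with the highest count
tab[tab >= 2]                 # logical filtering keeps the labels
```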

Bi-dimensional tables
Similar to pivot tables in Excel

Create a contingency (pivot) table with frequencies of scholarship by lab_assessment

> table.2 <- with(student_gi, table(scholarship, lab_assessment))

> table.2
           lab_assessment
scholarship Slab Bine Foarte bine Excelent
  Merit        0    1           0        0
  Social       0    1           0        0
  Studiu1      1    0           1        0
  Studiu2      0    0           0        1

Structure of table.2

> str(table.2)
 'table' int [1:4, 1:4] 0 0 1 0 1 1 0 0 0 0 ...
 - attr(*, "dimnames")=List of 2
  ..$ scholarship   : chr [1:4] "Merit" "Social" "Studiu1" "Studiu2"
  ..$ lab_assessment: chr [1:4] "Slab" "Bine" "Foarte bine" "Excelent"
> class(table.2)
[1] "table"

Accessing bi-dimensional tables

Any cell can be accessed using indices of row and column...

> table.2[1,2]
[1] 1

... or the names/labels
> table.2["Merit", "Bine"]
[1] 1

Display the second column (associated with value Bine of lab_assessment) as a vector using the index (2)...
> table.2[, 2]
  Merit  Social Studiu1 Studiu2
      1       1       0       0

... or the name of the column (Bine)

> table.2[, "Bine"]
  Merit  Social Studiu1 Studiu2
      1       1       0       0

Accessing bi-dimensional tables (cont.)

Similarly, one can access individual rows (or groups of rows)

Access particular rows and columns in a table

> table.2[c("Merit", "Studiu1"), c("Slab", "Excelent")]
           lab_assessment
scholarship Slab Excelent
  Merit        0        0
  Studiu1      1        0
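
A self-contained sketch of the same access patterns is given below. The data frame students and its column lab are hypothetical; because the columns are plain character vectors, R orders the labels alphabetically, which may differ from the order shown in the slides.

```r
# Hypothetical data: five students, two categorical variables
students <- data.frame(
  scholarship = c("Merit", "Social", "Studiu1", "Studiu1", "Studiu2"),
  lab = c("Bine", "Bine", "Slab", "Foarte bine", "Excelent")
)
tab2 <- with(students, table(scholarship, lab))

tab2["Merit", "Bine"]   # one cell, addressed by labels
tab2["Studiu1", ]       # one row, returned as a named vector
tab2[, "Bine"]          # one column, returned as a named vector
rowSums(tab2)           # number of students per scholarship type
```

Addressing cells by label rather than by position keeps the code valid even when the order of the levels changes.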

Tri-dimensional tables
Create a three-dimensional table with frequencies of scholarship by lab_assessment by final_grade

> table.3 <- with(student_gi, table(scholarship, lab_assessment,
+ final_grade))

Display table.3

> table.3
, , final_grade = 6

           lab_assessment
scholarship Slab Bine Foarte bine Excelent
  Merit        0    0           0        0
  Social       0    0           0        0
  Studiu1      1    0           0        0
  Studiu2      0    0           0        0

, , final_grade = 9

           lab_assessment
scholarship Slab Bine Foarte bine Excelent
  Merit        0    1           0        0
  Social       0    1           0        0
  Studiu1      0    0           0        0
  Studiu2      0    0           0        0

Tri-dimensional tables (cont.)

Display table.3 (cont.)

, , final_grade = 9.45

           lab_assessment
scholarship Slab Bine Foarte bine Excelent
  Merit        0    0           0        0
  Social       0    0           0        0
  Studiu1      0    0           1        0
  Studiu2      0    0           0        0

, , final_grade = 9.75

           lab_assessment
scholarship Slab Bine Foarte bine Excelent
  Merit        0    0           0        0
  Social       0    0           0        0
  Studiu1      0    0           0        0
  Studiu2      0    0           0        1

ftable

ftable improves the display of three-dimensional tables

> ftable(table.3)
                           final_grade 6 9 9.45 9.75
scholarship lab_assessment
Merit       Slab                       0 0    0    0
            Bine                       0 1    0    0
            Foarte bine                0 0    0    0
            Excelent                   0 0    0    0
Social      Slab                       0 0    0    0
            Bine                       0 1    0    0
            Foarte bine                0 0    0    0
            Excelent                   0 0    0    0
Studiu1     Slab                       1 0    0    0
            Bine                       0 0    0    0
            Foarte bine                0 0    1    0
            Excelent                   0 0    0    0
Studiu2     Slab                       0 0    0    0
            Bine                       0 0    0    0
            Foarte bine                0 0    0    0
            Excelent                   0 0    0    1
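
The layout produced by ftable can be controlled by choosing which variables go on the rows. The sketch below uses HairEyeColor, a three-dimensional table that ships with base R, instead of table.3; the choice of Sex and Hair as row variables is only an illustration.

```r
# HairEyeColor is a 4 x 4 x 2 table (Hair x Eye x Sex) from the datasets package
ft <- ftable(HairEyeColor, row.vars = c("Sex", "Hair"))

ft        # Sex and Hair on the rows, Eye on the columns
dim(ft)   # 8 row combinations by 4 columns
```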

Accessing three-dimensional tables

Any cell can be accessed using indices of the three axes...

> table.3[3, 3, 3]
[1] 1

... or the names/labels

> table.3["Studiu2", "Excelent", "9.75"]
[1] 1

Display, as a one-dimensional table, the values of lab_assessment which correspond to value Studiu2 (4th) of scholarship and the value 9.75 (4th) of final_grade

one can use the indexes ...

> table.3[4, , 4]
       Slab        Bine Foarte bine    Excelent
          0           0           0           1

... or the labels/names

> table.3[ "Studiu2", , "9.75" ]
       Slab        Bine Foarte bine    Excelent
          0           0           0           1

Accessing three-dimensional tables (cont.)

Display, as a bi-dimensional table, the values of the first (scholarship) and the third (final_grade) axes associated with the 4th value (Excelent) of the second axis (lab_assessment)

one can use the index...

> table.3[, 4, ]
           final_grade
scholarship 6 9 9.45 9.75
  Merit     0 0    0    0
  Social    0 0    0    0
  Studiu1   0 0    0    0
  Studiu2   0 0    0    1

... or the label/name

> table.3[, "Excelent", ]
           final_grade
scholarship 6 9 9.45 9.75
  Merit     0 0    0    0
  Social    0 0    0    0
  Studiu1   0 0    0    0
  Studiu2   0 0    0    1

Accessing three-dimensional tables (cont.)

One can access particular ranges on each axis

> table.3[c("Merit", "Studiu1"), c("Slab",
+ "Excelent"), c("9.45", "9.75") ]
, , final_grade = 9.45

           lab_assessment
scholarship Slab Excelent
  Merit        0        0
  Studiu1      0        0

, , final_grade = 9.75

           lab_assessment
scholarship Slab Excelent
  Merit        0        0
  Studiu1      0        0
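
The same slicing rules can be tried on a table that is available in any R installation. This is a sketch using HairEyeColor (Hair x Eye x Sex) from the base package datasets as a stand-in for table.3.

```r
# One cell, addressed by the labels of the three axes
cell <- HairEyeColor["Black", "Brown", "Male"]

# Leaving one axis empty returns a one-dimensional slice (here: over Eye)
by_eye <- HairEyeColor["Black", , "Male"]

# Fixing only the middle axis returns a two-dimensional slice (Hair x Sex)
blue <- HairEyeColor[, "Blue", ]

# Ranges on each axis keep all three dimensions
part <- HairEyeColor[c("Black", "Red"), c("Brown", "Blue"), "Female", drop = FALSE]
dim(part)
```

With drop = FALSE the result keeps its three axes even when one of them has a single value.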

Built-in datasets

Some datasets are available in base (core) R (e.g. faithful)

> head(faithful, 3)
  eruptions waiting
1     3.600      79
2     1.800      54
3     3.333      74

Most data sets are available in packages (e.g. ggplot2, vcd, ...)

In most cases, data sets are stored as data frames, e.g. the dataset movies from package ggplot2 (ggplot2 versions before 2.0.0; later versions provide it in the separate package ggplot2movies)

Every package must be installed (once per computer)

> install.packages("ggplot2")

After installation, a package must be loaded (once for every RStudio session)
> library(ggplot2)

Built-in datasets (cont.)

Display the structure of dataset movies

> str(movies)
'data.frame': 58788 obs. of 24 variables:
 $ title : chr "$" "$1000 a Touchdown" "$21 a Day Once a Month" "$40,000" ...
 $ year  : int 1971 1939 1941 1996 1975 2000 2002 2002 1987 1917 ...
 $ length: int 121 71 7 70 71 91 93 25 97 61 ...
 $ budget: int NA NA NA NA NA NA NA NA NA NA ...
 $ rating: num 6.4 6 8.2 8.2 3.4 4.3 5.3 6.7 6.6 6 ...
 $ votes : int 348 20 5 6 17 45 200 24 18 51 ...
 $ r1    : num 4.5 0 0 14.5 24.5 4.5 4.5 4.5 4.5 4.5 ...
 $ r2    : num 4.5 14.5 0 0 4.5 4.5 0 4.5 4.5 0 ...
 $ r3    : num 4.5 4.5 0 0 0 4.5 4.5 4.5 4.5 4.5 ...
 ...

Built-in dataset stored as table

Data set HairEyeColor, used in the vcd tutorial (http://cran.us.r-project.org/web/packages/vcdExtra/vignettes/vcd-tutorial.pdf), is stored as a three-dimensional table; the data set itself ships with the base package datasets
> install.packages("vcd")
> library(vcd)
> head(HairEyeColor)
[1] 32 53 10 3 11 50
> str(HairEyeColor)
 table [1:4, 1:4, 1:2] 32 53 10 3 11 50 10 30 10 25 ...
 - attr(*, "dimnames")=List of 3
  ..$ Hair: chr [1:4] "Black" "Brown" "Red" "Blond"
  ..$ Eye : chr [1:4] "Brown" "Blue" "Hazel" "Green"
  ..$ Sex : chr [1:2] "Male" "Female"
> class(HairEyeColor)
[1] "table"

Package datasets

R has a special package called datasets

> library(datasets)

Function data displays the datasets available in the loaded packages (datasets included)

> data()

Display all the data sets available in all installed packages:

> data(package = .packages(all.available = TRUE))

Display the datasets available in package ggplot2

> try(data(package = "ggplot2") )

...or

> data(package = "ggplot2")$results

A list (made in 2012) of all datasets in R is available at

http://www.public.iastate.edu/~hofmann/data_in_r_sor
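
The $results component returned by data is a character matrix, so the catalogue of data sets can be searched like any other matrix. A minimal sketch follows, using the base package datasets so that nothing needs to be installed; the variable name info is an assumption.

```r
# Catalogue of the data sets in package datasets, as a character matrix
info <- data(package = "datasets")$results

class(info)                           # matrix of character strings
colnames(info)                        # includes "Package", "Item" and "Title"
head(info[, c("Item", "Title")], 3)   # first three entries
"faithful" %in% info[, "Item"]        # check whether a data set is listed
```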

Data structures conversion

Not all conversions from an object (of a data type) into another object (of another data type) are possible

Generally, function as.data.frame converts an object of another data type into a data frame

Ex: convert a vector into a data frame

> a_vector
[1] 2 3 4 8 9 10 11 12 13 14
> v_to_df.1 <- as.data.frame(a_vector)
> v_to_df.1
  a_vector
1        2
2        3
3        4
...

Data structures conversion (cont.)

Convert matrix m.4 into a data frame

> m_to_df.1 <- as.data.frame(m.4)
> m_to_df.1
          col.1 col.2 col.3 col.total
row.1         1     2     3         6
row.2         4     5     6        15
row.3         7     8     9        24
row.4        10    11    12        33
row.total    22    26    30        78
> str(m_to_df.1)
'data.frame': 5 obs. of 4 variables:
 $ col.1    : num 1 4 7 10 22
 $ col.2    : num 2 5 8 11 26
 $ col.3    : num 3 6 9 12 30
 $ col.total: num 6 15 24 33 78

Data structures conversion (cont.)

Convert a table into a data frame

> table_to_dataframe =
+ data.frame(unlist(HairEyeColor))
> head(table_to_dataframe, 3)
   Hair   Eye  Sex Freq
1 Black Brown Male   32
2 Brown Brown Male   53
3   Red Brown Male   10

Convert a list into a data frame

> df <- data.frame(matrix(unlist(list.1), nrow=132,
+ byrow=T))
> head(df,3)
  matrix.unlist.list.1...nrow...132..byrow...T.
1                                           unu
2                                           doi
3                                          trei
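
As a hedged alternative sketch of the two conversions, the block below calls as.data.frame directly on the table and gives the list-based data frame an explicit column name. The names df_tab, lst, df_lst and value are assumptions introduced for illustration.

```r
# Table to data frame: one row per cell, plus a Freq column with the counts
df_tab <- as.data.frame(HairEyeColor)
head(df_tab, 3)
nrow(df_tab)   # 4 x 4 x 2 = 32 cells

# List of single strings to data frame: flatten first, then name the column
lst <- list("unu", "doi", "trei")
df_lst <- data.frame(value = unlist(lst))
df_lst
```

Naming the column explicitly avoids the long automatically generated header and gives one row per list component.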
