Hadley Wickham
Stat405: Data structures
Thursday, 11 November 2010
Assessment
• Final: there is no final
• Two groups still haven’t sent me electronic versions of their project
• Many of you STILL HAVEN’T returned your team evaluations
1. Basic data types
2. Vectors, matrices & arrays
3. Lists & data.frames
      Same types   Different types
1d    Vector       List
2d    Matrix       Data frame
nd    Array
character, numeric, logical
mode()
length()   A scalar is a vector of length 1
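as.character(c(TRUE, FALSE))   # "TRUE" "FALSE"
as.character(seq_len(5))       # "1" "2" "3" "4" "5"
as.logical(c(0, 1, 100))       # FALSE TRUE TRUE
as.logical(c("T", "F", "a"))   # TRUE FALSE NA
as.numeric(c("A", "100"))      # NA 100, with a warning
as.numeric(c(TRUE, FALSE))     # 1 0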
When vectors of different types occur in an expression, they will be automatically coerced to the same type: character > numeric > logical
names() Optional, but useful
Technically, these are all atomic vectors
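The coercion ordering above can be sketched in a few lines of base R. The variable names are illustrative and do not come from the original slides.

```r
# Combining types with c() coerces everything to the most general type.
mixed_num <- c(1, 2, TRUE)        # logical -> numeric
mixed_chr <- c(1, TRUE, "a")      # numeric and logical -> character

mode(mixed_num)   # "numeric"
mode(mixed_chr)   # "character"

# Logical values count as 0/1 in arithmetic.
n_true <- sum(c(TRUE, FALSE, TRUE))   # 2

# names() attaches optional labels to the elements of an atomic vector.
scores <- c(a = 1, b = 2)
names(scores)     # "a" "b"
```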
Your turn
Experiment with automatic coercion.
What is happening in the following cases?
104 & 2 < 4
mean(diamonds$cut == "Good")
c(T, F, T, T, "F")
c(1, 2, 3, 4, F)
Matrix (2d)
Array (>2d)
a <- seq_len(12)
dim(a) <- c(1, 12)
dim(a) <- c(4, 3)
dim(a) <- c(2, 6)
dim(a) <- c(3, 2, 2)
(1, 12)
1 2 3 4 5 6 7 8 9 10 11 12

(4, 3)
1 5  9
2 6 10
3 7 11
4 8 12

(2, 6)
1 3 5 7  9 11
2 4 6 8 10 12

Just like a vector. Has mode() and length().
Create with matrix() or array(), or from a vector by setting dim().
as.vector() converts back to a vector.
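As a rough illustration of the three construction routes mentioned above, the following sketch builds the same 4 x 3 matrix each way and then converts it back. The names m1, m2, m3 and v are made up for this example.

```r
# Three ways to get the same 4 x 3 matrix (filled column by column).
m1 <- matrix(seq_len(12), nrow = 4, ncol = 3)
m2 <- array(seq_len(12), dim = c(4, 3))
m3 <- seq_len(12)
dim(m3) <- c(4, 3)

identical(m1, m2)   # TRUE
identical(m1, m3)   # TRUE

# A matrix is still a vector underneath: it keeps its mode and length.
mode(m1)            # "numeric"
length(m1)          # 12

# as.vector() drops the dim attribute and returns the plain vector.
v <- as.vector(m1)
identical(v, seq_len(12))   # TRUE
```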
List
Is also a vector (so has mode, length and names), but is different in that it can store any other vector inside it (including lists).
Use unlist() to convert to a vector. Use as.list() to convert a vector to a list.
c(1, 2, c(3, 4))
list(1, 2, list(3, 4))
c("a", T, 1:3)
list("a", T, 1:3)
a <- list(1:3, 1:5)
unlist(a)
as.list(a)
b <- list(1:3, "a", "b")
unlist(b)
Technically a recursive vector
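A short sketch of how c() and list() differ, and of what unlist() and as.list() do, using only base R. The variable names are illustrative.

```r
# c() flattens nested vectors; list() keeps the nesting.
flat   <- c(1, 2, c(3, 4))
nested <- list(1, 2, list(3, 4))

length(flat)     # 4
length(nested)   # 3: the inner list counts as one element

# unlist() flattens a list into an atomic vector, coercing to a common type.
flattened <- unlist(list(1:3, "a", "b"))   # "1" "2" "3" "a" "b"

# as.list() goes the other way: one list element per vector element.
as_list <- as.list(1:3)
length(as_list)  # 3

# Lists are recursive vectors; atomic vectors are not.
is.recursive(nested)  # TRUE
is.recursive(flat)    # FALSE
```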
Data frame
List of vectors, each of the same length. (Cross between list and matrix.)
Differs from a matrix in that each column can have a different type.
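To make the "cross between list and matrix" idea concrete, here is a minimal sketch with made-up data. The data frame df and its column names are assumptions for illustration only.

```r
# A small data frame with columns of different types (made-up data).
df <- data.frame(
  name  = c("a", "b", "c"),
  value = c(1.5, 2.5, 3.5),
  flag  = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

is.list(df)        # TRUE: a data frame is a list of columns
length(df)         # 3: one element per column
dim(df)            # 3 3: it also has rows and columns, like a matrix
col_modes <- sapply(df, mode)   # each column keeps its own type

# Converting to a matrix forces a single type for every cell.
m <- as.matrix(df)
mode(m)            # "character"
```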
# How do you convert a matrix to a data frame?
# How do you convert a data frame to a matrix?
# What is different?
# What do these subsetting operations do?
# Why do they work? (Remember to use str)
diamonds[1]
diamonds[[1]]
diamonds[["cut"]]
x <- sample(12)
# What's the difference between a & b?
a <- matrix(x, 4, 3)
b <- array(x, c(4, 3))
# What's the difference between x & y?
y <- matrix(x, 12)
# How are these subsetting operations different?
a[, 1]
a[, 1, drop = FALSE]
a[1, ]
a[1, , drop = FALSE]
1d    names()      length()   c()
2d    colnames()   ncol()     cbind()
      rownames()   nrow()     rbind()
nd    dimnames()   dim()      abind() (special package)
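The base R functions in the table above can be tried on small objects, as in this sketch. The object names are illustrative, and abind() is left out because it needs the separate abind package.

```r
# 1d: a named vector
v <- c(a = 1, b = 2, c = 3)
names(v)                 # "a" "b" "c"
length(v)                # 3
v2 <- c(v, d = 4)        # c() extends a vector

# 2d: a matrix with row and column names
m <- matrix(1:6, nrow = 2)
rownames(m) <- c("r1", "r2")
colnames(m) <- c("x", "y", "z")
nrow(m); ncol(m)         # 2 and 3
wider  <- cbind(m, w = 7:8)     # adds a column named "w"
taller <- rbind(m, r3 = 9:11)   # adds a row named "r3"

# nd: dim() and dimnames() generalise the functions above
arr <- array(1:24, dim = c(2, 3, 4))
dim(arr)                 # 2 3 4
dimnames(m)              # list of row names, then column names
```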
b <- seq_len(10)
a <- letters[b]
# What sort of matrix does this create?
rbind(a, b)
cbind(a, b)
# Why would you want to use a data frame here?
# How would you create it?
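One possible way to explore these questions is sketched below. It is not necessarily the intended answer; the names m and df are introduced here for illustration.

```r
b <- seq_len(10)
a <- letters[b]

# Binding a character and a numeric vector coerces everything to character.
m <- cbind(a, b)
mode(m)    # "character"
dim(m)     # 10 2

# A data frame keeps each column's own type.
df <- data.frame(a = a, b = b, stringsAsFactors = FALSE)
col_classes <- sapply(df, class)   # a: "character", b: "integer"
```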
load(url("http://had.co.nz/stat405/data/quiz.rdata"))
# What is a? What is b?
# How are they different? How are they similar?
# How can you turn a into b?
# How can you turn b into a?
# What are c, d, and e?
# How are they different? How are they similar?
# How can you turn one into another?
# What is f?
# How can you extract the first element?
# How can you extract the first value in the first
# element?
# a is a numeric vector, containing the numbers 1 to 10
# b is a list of numeric scalars
# they contain the same values, but in a different format
identical(a[1], b[[1]])
identical(a, unlist(b))
identical(b, as.list(a))
# c is a named list
# d is a data.frame
# e is a numeric matrix
# From most to least general: c, d, e
identical(c, as.list(d))
identical(d, as.data.frame(c))
identical(e, data.matrix(d))
# f is a list of matrices of different dimensions
f[[1]]
f[[1]][1, 2]
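Since the quiz file may not be available, the same extraction pattern can be tried on a hypothetical stand-in. f_demo below is made up for illustration and is not the f from quiz.rdata.

```r
# Hypothetical stand-in for f: a list of matrices with different dimensions.
f_demo <- list(matrix(1:6, nrow = 2), matrix(1:12, nrow = 4))

first <- f_demo[[1]]               # [[ ]] extracts the matrix itself
class(f_demo[1])                   # "list": [ ] returns a list of length 1
first_value <- f_demo[[1]][1, 2]   # row 1, column 2 of the first matrix: 3
dims <- sapply(f_demo, dim)        # one column of dimensions per matrix
```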
# What do these subsetting operations do?
# Why do they work? (Remember to use str)
diamonds[1]
diamonds[[1]]
diamonds[["cut"]]
diamonds[["cut"]][1:10]
diamonds$cut[1:10]
Vectors            x[1:4]
Matrices, arrays   x[1:4, ]
                   x[, 2:3, ]
                   x[1:4, , drop = F]
Lists              x[[1]]
                   x$name
                   x[1]
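A small sketch of how these subsetting forms behave, assuming a 4 x 3 matrix and a two-element list. The names m and lst are illustrative.

```r
m <- matrix(1:12, nrow = 4)

# Single-column subsetting simplifies to a plain vector by default...
is.null(dim(m[, 1]))             # TRUE
# ...unless drop = FALSE keeps the result as a 4 x 1 matrix.
dim(m[, 1, drop = FALSE])        # 4 1

lst <- list(name = 1:3, other = "a")
class(lst[1])      # "list": [ ] keeps the list wrapper
class(lst[[1]])    # "integer": [[ ]] extracts the element
identical(lst$name, lst[["name"]])   # TRUE
```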
1d    c()
2d    matrix()   data.frame()   t()
nd    array()    aperm()
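t() and aperm() are the least familiar entries in the table above, so here is a brief sketch of both. The object names are made up for illustration.

```r
m <- matrix(1:6, nrow = 2)     # 2 x 3
dim(t(m))                      # 3 2

arr <- array(1:24, dim = c(2, 3, 4))
# aperm() generalises t(): perm gives the new order of the dimensions.
flipped <- aperm(arr, perm = c(3, 1, 2))
dim(flipped)                   # 4 2 3
arr[1, 2, 3] == flipped[3, 1, 2]   # TRUE: same cell, reordered indices
```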
