Biostat S1 Handout
Numeric
- Description: Represents numbers, which can be integers or real numbers (decimals).
- Examples: `42`, `3.14`, `-100`
- Note: By default, numbers in R are treated as double precision (real numbers). To explicitly
define an integer, you can use the `L` suffix (e.g., `42L`).
Integer
- Description: Represents whole numbers. Although R automatically treats numbers as
numeric (double), you can force a number to be an integer.
- Examples: `42L`, `-10L`
- Note: Use `L` after a number to indicate it's an integer. Otherwise, R treats it as numeric.
Character
- Description: Represents text strings.
- Examples: `"Hello"`, `"R language"`, `'A'`
- Note: Character data is always enclosed in either double (`" "`) or single (`' '`) quotes.
Logical
- Description: Represents Boolean values, used to indicate `TRUE` or `FALSE`.
- Examples: `TRUE`, `FALSE`
- Note: Logical values are typically used in conditional statements and are case-sensitive
(`True` and `true` are not the same as `TRUE`).
Complex
- Description: Represents complex numbers with real and imaginary parts.
- Examples: `3 + 4i`, `5i`
- Note: The imaginary part is denoted by `i`. Complex numbers are less commonly used in
everyday data analysis.
Examples
x <- 42 # numeric
y <- 42L # integer
z <- "Hello" # character
flag <- TRUE # logical
cnum <- 3 + 4i # complex
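To check how R actually stores a value, you can ask R directly. The following is a minimal sketch using the base functions class(), typeof() and is.integer(), applied to the same objects as above:

```r
x <- 42         # numeric (stored as double)
y <- 42L        # integer
z <- "Hello"    # character
flag <- TRUE    # logical
cnum <- 3 + 4i  # complex

class(x)        # "numeric"
typeof(x)       # "double"
class(y)        # "integer"
class(z)        # "character"
class(flag)     # "logical"
class(cnum)     # "complex"
is.integer(x)   # FALSE: 42 without the L suffix is a double
```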
Data Structures:
• Vectors: Ordered collections of elements of the same type (e.g., c(1, 2, 3) for numeric).
• Factors: Categorical variables.
• Lists: Collections of elements of different types.
• Matrices: 2D arrays with elements of the same type.
• Data Frames: Table-like structures where each column can hold different types of data.
Vectors
When you type a <- 3 in the console, you create an object containing a single element.
Suppose you want to create an object containing more than one element, for example a
vector containing a column (or row) of elements such as v = [2, 6, 1, 3, 11]. You can
create it with the c() function, for example v <- c(2, 6, 1, 3, 11).
Note that the function t() transposes a vector or a matrix (matrices are discussed
below). When the vector v is printed, the output starts with [1], the index of the first
element shown on that line, followed by the values.
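As a minimal sketch of the vector described above (the name tv is only illustrative), the following creates v with c() and shows what t() does to it:

```r
v <- c(2, 6, 1, 3, 11)   # combine the five values into one vector
v                        # printed as: [1]  2  6  1  3 11
length(v)                # 5
tv <- t(v)               # t() turns the vector into a 1 x 5 matrix
dim(tv)                  # 1 5
```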
Another way to create a vector is to create a sequence of numbers (in a column) as follows:
## If you want a sequence with a specific length then you must do:
seq(4, 5, length.out = 5)
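For comparison, here is a short sketch of the usual ways to build a sequence in base R: the colon operator, seq() with a step (by), and seq() with a fixed length (length.out). The object names s1, s2 and s3 are only illustrative.

```r
s1 <- 1:5                        # 1 2 3 4 5
s2 <- seq(1, 10, by = 3)         # 1 4 7 10
s3 <- seq(4, 5, length.out = 5)  # 4.00 4.25 4.50 4.75 5.00
s1
s2
s3
```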
Another way to create a vector is to create a repetition of numbers (in a column) as follows:
## The command below creates a vector (a column of numbers) repeating the number 2 five times.
rep(2,5)
rep(3,4)
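rep() can also repeat a whole vector. A small sketch of its times and each arguments (the object names are only illustrative):

```r
r1 <- rep(2, 5)                 # 2 2 2 2 2
r2 <- rep(c(1, 2), times = 3)   # 1 2 1 2 1 2: the whole vector, three times
r3 <- rep(c(1, 2), each = 3)    # 1 1 1 2 2 2: each element, three times
r1
r2
r3
```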
Factors
Factors are tricky vectors with two faces: behind the labels of the categorical variable,
R stores integer codes (this reduces storage needs and matters when factors are used in models).
gender <- c("male", "female", "male", "male", "male", "female")
genderf <- factor(gender)
genderf
Factors also have an ordered version, for categories whose values have a specific order.
edu <- c("h", "h", "h", "u", "u", "p", "p", "u", "h", "p")
eduord <- ordered(edu, levels=c("h", "u", "p"))
eduord
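To see the numbers behind the labels, the sketch below reuses the two examples above with the base functions levels(), as.integer() and table(). For the ordered factor, comparisons such as < follow the declared order of the levels.

```r
gender <- c("male", "female", "male", "male", "male", "female")
genderf <- factor(gender)
levels(genderf)        # "female" "male" (alphabetical by default)
as.integer(genderf)    # 2 1 2 2 2 1: the codes stored behind the labels
table(genderf)         # counts per category

edu <- c("h", "h", "h", "u", "u", "p", "p", "u", "h", "p")
eduord <- ordered(edu, levels = c("h", "u", "p"))
eduord[1] < eduord[4]  # TRUE: "h" comes before "u" in the declared order
```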
Random numbers
You can read here about randomness:
https://www.techtarget.com/whatis/definition/random-numbers
And a video about pseudo random number generators:
https://www.youtube.com/watch?v=GtOt7EBNEwQ
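In R, you can easily generate pseudo-random numbers from different distributions, using the
functions of the relevant distribution. For each distribution, the functions for the
density/mass function, cumulative distribution function, quantile function and random variate
generation are named in the form dxxx, pxxx, qxxx and rxxx respectively (for the normal
distribution: dnorm, pnorm, qnorm and rnorm).
For example, to draw a single value from a standard normal distribution: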
rnorm(1)
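The d/p/q/r naming scheme can be tried out on the standard normal distribution. This is only a sketch; the object names d0, p196 and q975 are illustrative.

```r
d0   <- dnorm(0)       # density at 0, about 0.3989
p196 <- pnorm(1.96)    # P(X <= 1.96), about 0.975
q975 <- qnorm(0.975)   # the 97.5% quantile, about 1.96
draws <- rnorm(3)      # three random draws (different on every run)
d0
p196
q975
draws
```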
Let’s create 31 random numbers from a normal distribution with a mean of 14 and a standard
deviation of 2.
rnorm(n=31, mean = 14, sd = 2)
And now, let’s see a random number from a uniform distribution between 1 and 6:
runif(1, min = 1, max = 6)
To make random numbers reproducible, set a seed with set.seed(number) before generating them. Try the following:
set.seed(23)
rnorm(5)
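The effect of the seed can be checked with a short sketch: resetting the same seed reproduces the same numbers, while continuing without resetting gives new ones. The object names are only illustrative.

```r
set.seed(23)
first <- rnorm(5)
set.seed(23)               # reset to the same seed
second <- rnorm(5)
identical(first, second)   # TRUE: same seed, same numbers
third <- rnorm(5)          # no reset: the sequence simply continues
identical(first, third)    # FALSE
```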
Matrices
Finally, a vector can be considered as a column or a row of a matrix. The latter can be
created in R as follows:
## The command below creates a 3 by 3 matrix in which every element is 4.
matrix(4,3,3)
##The command below creates a 2 by 2 matrix with the elements of a vector.
matrix(c(11,5,9,2),2,2)
## The command below creates a column vector (a 4 by 1 matrix) with the elements of a vector.
matrix(c(1,2,3,4),4,1)
## The command below creates a row vector (a 1 by 4 matrix) with the elements of a vector.
matrix(c(1,2,3,4),1,4)
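By default matrix() fills the matrix column by column. The sketch below compares this with the byrow = TRUE argument of matrix(); the names m_col and m_row are only illustrative.

```r
m_col <- matrix(c(11, 5, 9, 2), 2, 2)                 # filled column by column
m_row <- matrix(c(11, 5, 9, 2), 2, 2, byrow = TRUE)   # filled row by row
m_col
m_row
m_col[1, 2]                        # 9
m_row[1, 2]                        # 5
dim(matrix(c(1, 2, 3, 4), 4, 1))   # 4 1
```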
You probably noticed that, in the printed output, [3,] identifies the third row while [,2]
identifies the second column. More generally, remember the following commands, which are
used to extract data from a matrix or a vector.
v1<-c(21,5,2,15)
v2<-c(-3,-6,1,-7)
v3<-c(102,10,-13,4)
x<-matrix(cbind(v1,v2,v3),4,3)
x
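x[2]               # the 2nd element (a matrix is indexed column by column)
x[4:7]             # the 4th to 7th elements
x[c(2,9,4)]        # specific elements
x[x>0 & x<12]      # elements strictly between 0 and 12
x[[11]]            # the 11th element (double brackets return exactly one element)
x[3,2]             # element at row 3 and column 2
x[2,]              # row 2 of the matrix
x[3:4,1:2]         # submatrix: rows 3 to 4, columns 1 to 2
v3[-c(1,2,4)]      # all elements of v3 except the 1st, 2nd and 4th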
If you want to name the columns (or the rows) of a matrix, you can do the following (although
in this case a data frame may be a better choice):
colnames(x)<-c("A","B","C")
rownames(x)<-c("F","H","P","Z")
x
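Once rows and columns have names, the names can be used for extraction instead of positions. A minimal sketch, rebuilding the same matrix so that the block runs on its own:

```r
v1 <- c(21, 5, 2, 15)
v2 <- c(-3, -6, 1, -7)
v3 <- c(102, 10, -13, 4)
x <- matrix(c(v1, v2, v3), 4, 3)
colnames(x) <- c("A", "B", "C")
rownames(x) <- c("F", "H", "P", "Z")

x["H", "B"]   # -6: row "H", column "B"
x[, "C"]      # the whole column "C", as a named vector
x["P", ]      # the whole row "P"
```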
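Mathematically, you cannot add/subtract/divide vectors of different dimensions. (R itself does
not stop you: it recycles the shorter vector, with a warning if the longer length is not a
multiple of the shorter one.) You can multiply vectors with different dimensions by using the
operator %*%, which is matrix multiplication in math.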
v1<-c(3,2,6)
v2<-c(-2,-1)
v1%*%t(v2)
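The product above treats v1 as a 3 x 1 column and t(v2) as a 1 x 2 row, so the result is a 3 x 2 matrix. The sketch below checks this. Its last line illustrates that R itself recycles the shorter vector in element-wise operations rather than raising an error (the names p and s are only illustrative).

```r
v1 <- c(3, 2, 6)
v2 <- c(-2, -1)
p <- v1 %*% t(v2)   # (3 x 1) times (1 x 2)
dim(p)              # 3 2
p[3, 1]             # 6 * -2 = -12

s <- c(1, 2, 3, 4) + c(10, 20)   # shorter vector recycled: 11 22 13 24
s
```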
Matrices with the same dimensions can be added/subtracted/multiplied/divided element by
element (with +, -, * and /). You cannot add/subtract/divide matrices of different
dimensions. You can multiply matrices with different dimensions, as in m1%*%m2, ONLY IF the
number of columns of m1 equals the number of rows of m2:
m1<-matrix(rnorm(6),2,3)
m2<-matrix(rnorm(9),3,3)
m1%*%m2
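The dimension rule can be verified with a small sketch. It uses fixed numbers instead of random ones so that the result is the same on every run, and try() to capture the error R raises when the dimensions do not match (the name res is only illustrative).

```r
m1 <- matrix(1:6, 2, 3)   # 2 x 3
m2 <- matrix(1:9, 3, 3)   # 3 x 3
dim(m1 %*% m2)            # 2 3: columns of m1 (3) match rows of m2 (3)

# 3 x 3 times 2 x 3 does not conform, so R raises an error
res <- try(m2 %*% m1, silent = TRUE)
inherits(res, "try-error")   # TRUE
```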
For matrices the concept of division is linked to the concept of inverse matrix. The matrix
inverse is calculated with solve(m). You can do m1%*%solve(m2) ONLY IF m2 is a square matrix
(number of rows = number of columns) that is invertible, and the number of columns of m1
equals the number of rows of m2.
set.seed(123)
m1<-matrix(rnorm(2),1,2)
m2<-matrix(rnorm(4),2,2)
m1%*%solve(m2)
Here note that m1%*%solve(m2) is something like m1/m2 but this is a matrix algebra concept
that, for the purpose of this course, we do not need to develop more.
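To see what the inverse means, here is a minimal sketch with a small fixed matrix (chosen only for illustration): multiplying a matrix by its inverse gives the identity matrix, up to rounding error.

```r
m <- matrix(c(2, 1, 1, 3), 2, 2)   # a 2 x 2 invertible matrix
minv <- solve(m)                   # its inverse
m %*% minv                         # the 2 x 2 identity matrix (up to rounding)
all.equal(m %*% minv, diag(2))     # TRUE
```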
Data frames
A data frame is similar to a matrix whose columns have names; unlike a matrix, however, its
columns may have different data types. This is how you create one:
mydatafr<-data.frame(cbind(c(2,1,5),c(22,-4,11)))
colnames(mydatafr)<-c("Col1","Col2")
mydatafr
mydatafr$Col3<-c("Mark","David","Erika")
mydatafr
If you pass vectors directly to data.frame(), they keep their names and data types. In this
case, you don’t need cbind(). However, this fails with vectors of different lengths (unless
the shorter ones can be recycled a whole number of times).
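A sketch of the direct approach, with illustrative vector names: each vector becomes a column with its own name and type. stringsAsFactors = FALSE is passed explicitly so that the text column stays character in older R versions as well. The last lines show the error raised for vectors of incompatible lengths.

```r
Col1 <- c(2, 1, 5)
Col2 <- c(22, -4, 11)
Name <- c("Mark", "David", "Erika")

df <- data.frame(Col1, Col2, Name, stringsAsFactors = FALSE)
df
sapply(df, class)   # Col1 and Col2 are "numeric", Name is "character"

# Vectors of length 3 and 2 cannot be combined into one data frame
bad <- try(data.frame(a = 1:3, b = 1:2), silent = TRUE)
inherits(bad, "try-error")   # TRUE
```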
Lists
Suppose that you have several objects of different types, and you want to collect them in a
list (one object). You can do that using the function list as follows:
a<-3
b<-c(2,6)
c<-matrix(rnorm(4),2,2)
d<-rep(4,3)
mylist<-list(a,b,c,d)
mylist
If you want to remove some elements of a list, you can do this: mylist[-c(2,4)]
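List elements can also be named, which makes them easier to retrieve. The sketch below uses a smaller, illustrative list to show the difference between double brackets (which return the element itself) and single brackets (which return a shorter list):

```r
mylist <- list(a = 3, b = c(2, 6), d = rep(4, 3))

mylist[["b"]]         # the element itself: the vector 2 6
mylist$b[2]           # 6: second value of element b
class(mylist["b"])    # "list": single brackets keep the list wrapper
shorter <- mylist[-c(2, 3)]   # drop the 2nd and 3rd elements
length(shorter)       # 1
```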
Logical operators
& and
| or
! not
== is equal
!= is not equal
Try the following (first in your head, then check with the computer):
!TRUE
!(5 > 3)
!!FALSE
x <- 5
y <- 7
!(!(x < 4) & !!!(y > 12))
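When applied to vectors, & and | work element by element, and the result is a vector.
The && and || operators, in contrast, are meant for single (length-one) values, such as the
condition of an if statement, and return a single TRUE or FALSE. Older versions of R silently
used only the first element of longer vectors; since R 4.3.0 this is an error.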
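The difference can be seen in a short sketch. The vectors a and b are only illustrative.

```r
a <- c(TRUE, FALSE, TRUE)
b <- c(TRUE, TRUE, FALSE)
a & b    # TRUE FALSE FALSE: element-by-element "and"
a | b    # TRUE TRUE TRUE: element-by-element "or"

x <- 5
y <- 7
(x > 3) && (y > 12)   # FALSE: a single value, suitable for if()
(x > 3) || (y > 12)   # TRUE
```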