Biostat S1 Handout

This document serves as a handout for a biostatistics session, covering fundamental data types in R such as numeric, integer, character, logical, and complex, along with their examples. It also explains various data structures including vectors, factors, lists, matrices, and data frames, detailing how to create and manipulate them. Additionally, the document discusses generating random numbers and the use of logical operators in R.

Uploaded by

serenabattesti
Copyright
© All Rights Reserved
Biostatistics: Session 1 Handout

Variables, Vectors and factors. Random numbers


Matrix, Data Frame, List…

The primary data types in R:

Numeric
- Description: Represents numbers, which can be integers or real numbers (decimals).
- Examples: `42`, `3.14`, `-100`
- Note: By default, numbers in R are treated as double precision (real numbers). To explicitly
define an integer, you can use the `L` suffix (e.g., `42L`).

Integer
- Description: Represents whole numbers. Although R automatically treats numbers as
numeric (double), you can force a number to be an integer.
- Examples: `42L`, `-10L`
- Note: Use `L` after a number to indicate it's an integer. Otherwise, R treats it as numeric.

Character
- Description: Represents text strings.
- Examples: `"Hello"`, `"R language"`, `'A'`
- Note: Character data is always enclosed in either double (`" "`) or single (`' '`) quotes.

Logical
- Description: Represents Boolean values, used to indicate `TRUE` or `FALSE`.
- Examples: `TRUE`, `FALSE`
- Note: Logical values are typically used in conditional statements and are case-sensitive
(`True` and `true` are not the same as `TRUE`).

Complex
- Description: Represents complex numbers with real and imaginary parts.
- Examples: `3 + 4i`, `5i`
- Note: The imaginary part is denoted by `i`. Complex numbers are less commonly used in
everyday data analysis.

Examples
x <- 42 # numeric
y <- 42L # integer
z <- "Hello" # character
flag <- TRUE # logical
cnum <- 3 + 4i # complex
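To check which type R has assigned to a value, the base functions class() and typeof() can be used. The sketch below reuses the same example values; the conversions at the end are an extra illustration that is not part of the handout.

```r
# Inspect the type of each example value with base R functions
x <- 42          # numeric (stored as double)
y <- 42L         # integer
z <- "Hello"     # character
flag <- TRUE     # logical
cnum <- 3 + 4i   # complex

class(x)      # "numeric"
typeof(x)     # "double"
class(y)      # "integer"
class(z)      # "character"
class(flag)   # "logical"
class(cnum)   # "complex"

# Converting between types is done with the as.*() family
as.integer("7")     # 7L
as.character(42L)   # "42"
```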
Data Structures:
• Vectors : Ordered collections of elements of the same type (e.g., c(1, 2, 3) for numeric).
• Factors : Categorical variables.
• Lists: Collections of elements of different types.
• Matrices : 2D arrays with elements of the same type.
• Data Frames : Table-like structures where each column can hold different types of data.

Vectors
When you type a <- 3 in the console, you create an object containing one element only.
Suppose you want to create an object containing more than one element, for example a
vector containing a column (or row) of elements. Imagine the vector
v = [2, 6, 1, 3, 11]. You can create it like this:

##The command below creates a vector (a column of numbers)


v<-c(2,6,1,3,11)
v

# If you want to create a row containing the same numbers


# You can transpose the vector v using the command t().
myrow<-t(v)
myrow

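Note that the command t() transposes a vector or a matrix (matrices are discussed
below). When v is printed, the [1] at the start of the line is the index of the first element shown
on that line. Strictly speaking, a plain vector has no dimensions; t(v) returns a 1 by 5 matrix.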
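The difference between a plain vector and its transpose can be checked with dim(). This is a small sketch using the same v as above:

```r
v <- c(2, 6, 1, 3, 11)
length(v)          # 5
dim(v)             # NULL: a plain vector has no dimensions

myrow <- t(v)
dim(myrow)         # 1 5: t() returned a matrix with 1 row and 5 columns
is.matrix(myrow)   # TRUE
```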

Another way to create a vector is to create a sequence of numbers (in a column) as follows:

##The command below creates a vector (a column of numbers)


seq(from=-4, to=5,by=1.5)

## if you do not specify by= it is assumed by 1. as shown here


seq(from=-4, to=5)

## If you want a sequence with a specific length then you must do:
seq(4, 5, length.out = 5)

Another way to create a vector is to create a repetition of numbers (in a column) as follows:

## The command below creates a vector (a column of numbers) repeating the number 2 five times.
rep(2,5)
rep(3,4)
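As a quick check of what seq() and rep() return, the sketch below stores the results and looks at them. The times= and each= arguments of rep() are not used in the handout; they are shown here only as an additional illustration.

```r
s <- seq(from = -4, to = 5, by = 1.5)
s           # -4.0 -2.5 -1.0  0.5  2.0  3.5  5.0
length(s)   # 7

s2 <- seq(4, 5, length.out = 5)
s2          # 4.00 4.25 4.50 4.75 5.00

# rep() can repeat a whole vector (times=) or each element in turn (each=)
r1 <- rep(c(1, 2), times = 3)   # 1 2 1 2 1 2
r2 <- rep(c(1, 2), each = 3)    # 1 1 1 2 2 2
```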
Factors
Factors are vectors with two faces: behind the labels of a categorical variable, R stores
integer codes. This reduces storage needs and is what statistical models use.
gender <- c("male", "female", "male", "male", "male", "female")
genderf <- factor(gender)
gender   # the original character vector
genderf  # the factor, printed together with its levels

Factors also have an ordered version, for categories whose values have a specific order.
edu <- c("h", "h", "h", "u", "u", "p", "p", "u", "h", "p")
eduord <- ordered(edu, levels=c("h", "u", "p"))
eduord

The same with labels:


eduord <- ordered(edu, levels=c("h", "u", "p"), labels = c("high school", "undergraduate",
"postgraduate"))
eduord
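The "numbers behind the labels" can be made visible with levels() and as.integer(). The sketch below reuses the gender and edu vectors from above; the comparison at the end is an extra illustration of what the ordering allows.

```r
gender <- c("male", "female", "male", "male", "male", "female")
genderf <- factor(gender)
levels(genderf)       # "female" "male" (sorted alphabetically by default)
as.integer(genderf)   # 2 1 2 2 2 1: the integer codes behind the labels
table(genderf)        # counts per category

edu <- c("h", "h", "h", "u", "u", "p", "p", "u", "h", "p")
eduord <- ordered(edu, levels = c("h", "u", "p"))
eduord < "p"          # comparisons are allowed because the levels are ordered
sum(eduord < "p")     # 7 values are below "p"
```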

Random numbers
You can read here about randomness:
https://www.techtarget.com/whatis/definition/random-numbers
And a video about pseudo random number generators:
https://www.youtube.com/watch?v=GtOt7EBNEwQ
In R, you can easily generate pseudo random numbers from many distributions. For each
distribution, the functions for the density/mass function, cumulative distribution function,
quantile function and random variate generation are named in the form dxxx, pxxx, qxxx and
rxxx respectively (for example dnorm, pnorm, qnorm and rnorm for the normal distribution).

• For the binomial (including Bernoulli) distribution see dbinom.
• For the Cauchy distribution see dcauchy.
• For the chi-squared distribution see dchisq.
• For the exponential distribution see dexp.
• For the geometric distribution see dgeom.
• For the normal distribution see dnorm.
• For the Poisson distribution see dpois.
• For the Student's t distribution see dt.
• For the uniform distribution see dunif.
In the code, pay attention to the parameters of each distribution (for example mean and sd
for the normal, min and max for the uniform): they change from distribution to distribution.
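The d/p/q/r naming scheme can be illustrated with a few calls. The particular values below are chosen only for illustration:

```r
# d: density (or probability mass) function
dnorm(0)                          # density of the standard normal at 0, 1/sqrt(2*pi)
dbinom(2, size = 5, prob = 0.5)   # P(X = 2) for a Binomial(5, 0.5): 0.3125

# p: cumulative distribution function
pnorm(1.96)                       # P(Z <= 1.96), about 0.975

# q: quantile function (the inverse of p)
qnorm(0.975)                      # about 1.96

# r: random variate generation
rexp(3, rate = 2)                 # three draws from an exponential with rate 2
```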
Let’s create one random number from a standard normal distribution.
rnorm(n=1, mean = 0, sd = 1)

or simply

rnorm(1)

Let’s create 31 random numbers from a normal distribution with a mean of 14 and a standard
deviation of 2.
rnorm(n=31, mean = 14, sd = 2)

And now, let’s see a random number from a uniform distribution between 1 and 6:
runif(1, min = 1, max = 6)

If you want integer numbers, you can use the round function. Note that rounding makes the endpoints 1 and 6 half as likely as the values in between:


round(runif(1, min = 1, max = 6), 0)
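If the goal is a fair die, one alternative (not used in the handout) is the base function sample(). The sketch below compares the two approaches by simulation; the seed and the sample size are arbitrary choices made only to keep the example reproducible.

```r
set.seed(1)   # arbitrary seed, only for reproducibility of this sketch
n <- 60000

# Rounding a uniform number: 1 and 6 each cover half as much of the interval
rounded <- round(runif(n, min = 1, max = 6), 0)
prop_rounded <- table(rounded) / n
prop_rounded   # about 0.1 for 1 and 6, about 0.2 for 2 to 5

# sample() draws each integer with equal probability
fair <- sample(1:6, size = n, replace = TRUE)
prop_fair <- table(fair) / n
prop_fair      # about 1/6 for every value
```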

To make random numbers reproducible, set a seed with set.seed(number) before generating them. Try the following:
set.seed(23)
rnorm(5)
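What the seed does can be verified directly: resetting the same seed reproduces the same numbers. A minimal sketch (the object names are illustrative):

```r
set.seed(23)
first <- rnorm(5)

set.seed(23)          # same seed again
second <- rnorm(5)
identical(first, second)   # TRUE: the same seed gives the same draws

third <- rnorm(5)     # no reset: the generator simply continues
identical(first, third)    # FALSE
```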

Matrices
Finally, a vector can be considered as a column or a row of a matrix. A matrix can be
created in R as follows:
## The command below creates a 3 by 3 matrix in which every element is 4.
matrix(4,3,3)
## The command below creates a 2 by 2 matrix from the elements of a vector (filled column by column).
matrix(c(11,5,9,2),2,2)
## The command below creates a 4 by 1 matrix (a column vector) from the elements of a vector.
matrix(c(1,2,3,4),4,1)
## The command below creates a 1 by 4 matrix (a row vector) from the elements of a vector.
matrix(c(1,2,3,4),1,4)
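By default matrix() fills the matrix column by column. The byrow argument, which the handout does not use, changes this; the sketch below shows the difference with the same four numbers.

```r
m_col <- matrix(c(11, 5, 9, 2), nrow = 2, ncol = 2)                 # filled by column
m_row <- matrix(c(11, 5, 9, 2), nrow = 2, ncol = 2, byrow = TRUE)   # filled by row

m_col[1, 2]   # 9: the second column holds 9 and 2
m_row[1, 2]   # 5: the first row holds 11 and 5
dim(m_col)    # 2 2
```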

You probably noticed in the printed output that, for example, [3,] identifies the third row while
[,2] identifies the second column. More generally, remember the following commands, which are
used to extract data from a matrix or a vector.

x[n] : the nth element of a vector
x[m:n] : the mth to nth elements
x[c(k,m,n)] : specific elements
x[x>m & x<n] : elements between m and n
x[i,j] : element at ith row and jth column
x[i,] : row i in a matrix
x[,j] : column j in a matrix
x[-c(3,5)] : x excluding the third and fifth elements

v1<-c(21,5,2,15)
v2<-c(-3,-6,1,-7)
v3<-c(102,10,-13,4)
x<-matrix(cbind(v1,v2,v3),4,3)
x
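x[2] # the 2nd element (a matrix is stored column by column)
x[4:7] # the 4th to 7th elements
x[c(2,9,4)] # specific elements: the 2nd, 9th and 4th
x[x>0&x<12] # elements strictly between 0 and 12
x[[11]] # the 11th element only ([[ ]] returns a single element)
x[3,2] # element at row 3, column 2
x[2,] # row 2 of the matrix
x[3:4,1:2] # submatrix: rows 3 to 4, columns 1 to 2
v3[-c(1,2,4)] # v3 excluding its 1st, 2nd and 4th elements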

If you want to name the columns (or the rows) of a matrix you can do (but in this case a data
frame can be better):

colnames(x)<-c("A","B","C")
rownames(x)<-c("F","H","P","Z")
x
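Once rows and columns have names, they can also be used for indexing in place of numbers. A short sketch that rebuilds the same matrix:

```r
v1 <- c(21, 5, 2, 15)
v2 <- c(-3, -6, 1, -7)
v3 <- c(102, 10, -13, 4)
x <- cbind(v1, v2, v3)
colnames(x) <- c("A", "B", "C")
rownames(x) <- c("F", "H", "P", "Z")

x["H", "B"]          # -6: row "H", column "B"
x[, "C"]             # the whole column "C", returned as a named vector
x[c("F", "Z"), ]     # rows "F" and "Z"
```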

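You should not add/subtract/divide vectors of different lengths: R does not stop you, but it
recycles the shorter vector (with a warning when the longer length is not a multiple of the shorter
one), which is rarely what you want. You can multiply vectors of different lengths by using the
symbol %*%, which is matrix multiplication in the mathematical sense.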

v1<-c(3,2,6)
v2<-c(-2,-1)
v1%*%t(v2)
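The sketch below shows what R actually does in both situations: element-wise arithmetic with recycling, and the matrix product of a column by a row. The vectors a and b are illustrative and not taken from the handout.

```r
# Element-wise arithmetic recycles the shorter vector
a <- c(1, 2, 3, 4)
b <- c(10, 20)
a + b                # 11 22 13 24: b is reused as 10 20 10 20

# %*% treats v1 as a 3 x 1 column and t(v2) as a 1 x 2 row
v1 <- c(3, 2, 6)
v2 <- c(-2, -1)
outer_prod <- v1 %*% t(v2)
dim(outer_prod)      # 3 2
outer_prod[3, 1]     # -12, that is 6 * (-2)

# With two vectors of the same length, %*% gives the inner product
v1 %*% v1            # a 1 x 1 matrix containing 49
```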

Matrices with the same dimensions can be added, subtracted, multiplied and divided element by
element with +, -, * and /. You cannot add/subtract/divide matrices of different dimensions. You
can multiply matrices with different dimensions such as m1%*%m2 ONLY IF the number of columns
of m1 equals the number of rows of m2:

m1<-matrix(rnorm(6),2,3)
m2<-matrix(rnorm(9),3,3)
m1%*%m2
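The dimension rule can be checked before multiplying. In the sketch below, the seed, the matrix m3 and the use of tryCatch() to capture the error are illustrative additions.

```r
set.seed(42)                        # arbitrary seed for reproducibility
m1 <- matrix(rnorm(6), 2, 3)        # 2 x 3
m2 <- matrix(rnorm(9), 3, 3)        # 3 x 3

ncol(m1) == nrow(m2)                # TRUE, so m1 %*% m2 is defined
dim(m1 %*% m2)                      # 2 3

# The other order is not defined: ncol(m2) is 3 but nrow(m1) is 2
res <- tryCatch(m2 %*% m1, error = function(e) "non-conformable")
res                                 # "non-conformable"

# Element-wise multiplication uses * and needs identical dimensions
m3 <- matrix(1:6, 2, 3)
dim(m1 * m3)                        # 2 3
```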

For matrices, the concept of division is linked to the concept of the inverse matrix. The matrix
inverse is calculated with solve(m). You can do m1%*%solve(m2) ONLY IF m2 is a square matrix
(number of rows = number of columns) that is invertible, and the number of columns of m1 equals
the number of rows of m2.
set.seed(123)
m1<-matrix(rnorm(2),1,2)
m2<-matrix(rnorm(4),2,2)
m1%*%solve(m2)

Here note that m1%*%solve(m2) is something like m1/m2 but this is a matrix algebra concept
that, for the purpose of this course, we do not need to develop more.
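A quick way to convince yourself that solve() returns the inverse is to multiply the result back. The small matrix below is made up for the illustration; the two-argument form solve(A, b), which solves a linear system directly, is also an addition to the handout.

```r
m2 <- matrix(c(2, 1, 1, 3), 2, 2)   # a small invertible matrix (determinant 5)
m2_inv <- solve(m2)

m2 %*% m2_inv                       # the identity matrix, up to rounding error
all.equal(m2 %*% m2_inv, diag(2))   # TRUE

# solve(A, b) solves A %*% x = b without forming the inverse explicitly
b <- c(1, 2)
sol <- solve(m2, b)
sol                                 # 0.2 0.6
```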

Data frames
A data frame is similar to a matrix whose columns have names; unlike a matrix, however, its
columns may have different data types. This is one way to create it:

mydatafr<-data.frame(cbind(c(2,1,5),c(22,-4,11)))
colnames(mydatafr)<-c("Col1","Col2")
mydatafr

You can add a column like this (assigning NULL to a column removes it):

mydatafr$Col3<-c("Mark","David","Erika")
mydatafr

If you simply pass vectors to data.frame(), they keep their names and data types, and you do
not need cbind(). (cbind() first builds a matrix, which forces all columns to a single type.)
However, this does not work well with vectors of different lengths.
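A minimal sketch of building a data frame directly from vectors follows. The vector names and values are illustrative; stringsAsFactors = FALSE is written out so that the text column stays character in older R versions as well.

```r
id    <- c(2, 1, 5)
score <- c(22, -4, 11)
name  <- c("Mark", "David", "Erika")

df <- data.frame(id, score, name, stringsAsFactors = FALSE)
col_types <- sapply(df, class)
col_types          # id and score are numeric, name is character

df$score <- NULL   # remove a column
names(df)          # "id" "name"

# Vectors whose lengths do not fit together are rejected
bad <- tryCatch(data.frame(a = 1:3, b = 1:2), error = function(e) "error")
bad                # "error"
```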

Lists
Suppose that you have several objects of different types, and you want to collect them in a
list (one object). You can do that using the function list as follows:
a<-3
b<-c(2,6)
c<-matrix(rnorm(4),2,2)
d<-rep(4,3)
mylist<-list(a,b,c,d)
mylist

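This is the second element, returned as a list of length one: mylist[2]. Use double brackets, mylist[[2]], to get the element itself.

If you want a copy of the list without some elements, you can do this: mylist[-c(2,4)] (mylist itself is not changed unless you assign the result back).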

# You can also eliminate an element by assigning NULL to it:
print("This is another example")
## [1] "This is another example"
mylist[4]<-NULL
mylist
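The difference between single and double brackets is easiest to see on a small list. The sketch below uses a named list, which the handout does not do; the names a, b and m are illustrative.

```r
mylist <- list(a = 3, b = c(2, 6), m = matrix(1:4, 2, 2))

sub <- mylist[2]     # single brackets: a list of length 1
el  <- mylist[[2]]   # double brackets: the element itself
class(sub)           # "list"
class(el)            # "numeric"
mylist$b             # named elements can also be reached with $

mylist[["b"]] <- NULL   # remove the element named b
names(mylist)           # "a" "m"
```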
Logical operators:

! not
& and
| or
== is equal
!= is not equal

Try the following ones (first in your mind, and test with the computer):
!TRUE

!(5 > 3)

!!FALSE

x=5
y=7
!(!(x < 4) & !!!(y > 12))

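When applied to vectors, & and | work element by element, and the result is a vector.
The && and || operators are meant for single TRUE/FALSE values (for example in if conditions):
older versions of R evaluated only the first element of each logical vector, and since R 4.3.0
passing a vector longer than one is an error.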
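The sketch below contrasts the vectorised operators with the scalar ones. The values are illustrative, and any() and all() are added here as the usual way to reduce a logical vector to a single value.

```r
x <- c(1, 5, 8, 12)

x > 3 & x < 10    # FALSE  TRUE  TRUE FALSE (element by element)
x < 3 | x > 10    #  TRUE FALSE FALSE  TRUE

any(x > 10)       # TRUE: at least one element is above 10
all(x > 0)        # TRUE: every element is positive

# && and || work on single values; the right side is skipped when
# the left side already decides the result
a <- 5
b <- 7
a < 4 && b > 12   # FALSE
a > 4 || b > 12   # TRUE
```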
