2.R Concepts - BDSM - Oct2020 PDF

R is a scripting language and environment for statistical computing and graphics. It is a successor to S and is widely used for data science and analytics. R allows users to analyze data, create visualizations and perform advanced statistical analysis through the use of various packages. Key data structures in R include vectors, matrices, arrays, data frames and lists. Functions like read.csv() and write.csv() allow users to import and export data in CSV format.

An Introduction to R

Sandip Mukhopadhyay
What is R?

R is a scripting/programming language and environment for statistical computing, data science and graphics.
R is a successor to the proprietary statistical programming language S.
It is an important tool for computational statistics, visualization and data science.
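WHAT IS R?
• Part of the GNU Project; created by Ross Ihaka and Robert Gentleman as an implementation of S, the language developed by John Chambers and colleagues at Bell Labs
• Free software environment for statistical computing and graphics
• Language with functional programming features, implemented primarily in C, Fortran and R itself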
HISTORY AND EVOLUTION OF R
R has developed from the S language

S Version 1

S Version 2

S Version 3

S Version 4
S was developed 30 years ago for research applied to the high-tech industry
HISTORY AND EVOLUTION OF R
The regular development of R

1990s: R developed concurrently with S
1993: R made public

Acceleration of R development
• R-help and R-devel mailing lists
• Creation of the R Core Group
HISTORY AND EVOLUTION OF R

Growing number of packages

2001: ~100 packages
Today: over 10,152 packages

2000: R version 1.0.1
Today: R version 3.6.1

Source: R Journal Vol 1/2


Reasons to learn R

• Free, Open source

• Preferred option in academia and research

• Great visualization

• Advanced statistics
Reasons to learn R

• Supportive open source community

• Easy extensibility via packages

• Many find it easier to learn than Python
Limitations of R

• Lack of scalability

• Less accepted in industrial applications than its peer, Python

• R is used mainly for data science, while Python has wider usage

• Python is easier to deploy in commercial settings
RStudio
RStudio is a widely used IDE for writing, testing and executing R code. A typical RStudio screen has the following parts:
• Console: shows the output of the code that is run
• Syntax editor: where the code is written
• Workspace tab: shows the active objects created by code run in the console
• History tab: shows a history of the commands used
• Files tab: shows the folders and files in the default workspace
• Plots tab: shows graphs
• Packages tab: shows the add-ons and packages required for running specific processes
• Help tab: contains information on the IDE, commands, etc.
[Figure: RStudio window layout showing the Syntax editor, History, Console and Help / Viewer panes]


Packages in R

A package in R is the fundamental unit of shareable code. It is a collection of the following:
• Functions
• Data sets
• Compiled code
• Documentation for the package and for the functions inside

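Packages that are not part of core R need to be installed. An installed package also needs to be loaded in every session in which it is used.
library("ggplot2")
A few commands to get started

packageDescription("ggplot2")   # show the package's description metadata
help(package = "ggplot2")       # open the help index for the package
find.package("ggplot2")         # show the folder where the package is installed
install.packages("ggplot2")     # download and install the package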
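A minimal sketch of the install-once, load-every-session pattern. The helper name load_or_install is hypothetical (it is not from the slides), and the built-in "stats" package is used so that nothing needs to be downloaded.

```r
# Hypothetical helper: load a package, installing it first only if it is missing.
load_or_install <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)              # needs an internet connection and a CRAN mirror
  }
  library(pkg, character.only = TRUE)  # attach the package for this session
  invisible(pkg)
}

# "stats" ships with R, so this call never triggers an installation.
load_or_install("stats")

desc <- packageDescription("stats")    # metadata of the installed package
desc$Package                           # package name recorded in the metadata
find.package("stats")                  # folder where the package is installed
```

For an add-on package, the same call would be load_or_install("ggplot2").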
Some basics about R coding

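• R statements or commands can be separated by a semicolon (;) or a new line.
• The assignment operator in R is "<-" (although "=" also works in most contexts).

• All characters after # on a line are treated as a comment.

• Single quotes ' ' and double quotes " " work the same way for delimiting strings.

• The three kinds of brackets have different roles: parentheses ( ) call functions and group expressions, square brackets [ ] index into objects such as vectors and data frames, and curly braces { } group statements into a block, as in function bodies, loops and if…else.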
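The points above can be tried out with a short sketch. All variable names here are illustrative only.

```r
x <- 5; y = 7          # two statements on one line; "<-" and "=" both assign

# Everything after "#" on a line is ignored by R.
s1 <- 'text'
s2 <- "text"
same <- identical(s1, s2)   # TRUE: both quote styles give the same string

v <- c(10, 20, 30)     # ( ) calls the function c()
second <- v[2]         # [ ] picks the 2nd element: 20

total <- 0
for (i in v) {         # { } groups the statements of the loop body
  total <- total + i
}
total                  # 60
```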
Functions and Help in R

• There are over 1,000 functions at the core of R, and new R functions
are created all the time.
• Each R function comes with its own help page. To access a function’s
help page, type a question mark followed by the function’s name in
the console.
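A small sketch of looking things up from code. At the console the usual form is ?mean. Here the result is stored in a variable (h and found are illustrative names) so that the script does not open a help viewer.

```r
# At the console you would normally type one of:
#   ?mean
#   help("mean")
h <- help("mean")                 # same lookup, stored rather than displayed

# apropos() searches for function names that match a pattern.
found <- apropos("^read\\.csv")
found
```

Printing h (or typing ?mean) displays the help page itself.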
Reference materials / other R resources

1. R-bloggers: https://www.r-bloggers.com
2. R tutorials: https://www.programiz.com/r-programming/
3. R video book: https://www.r-bloggers.com/in-depth-introduction-to-machine-learning-in-15-hours-of-expert-videos/
4. Stack Overflow
5. RPubs
Operators in R
Commonly used functions in R
Summary: what we have learnt

• Four types of operators in R are arithmetic, relational, logical, and assignment.
• Two types of conditional statements in R are if…else and nested if…else.
• Three types of loops in R are the for loop, the while loop, and the repeat loop.
• The commonly used functions in R
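The operators, conditional statements and loops summarised above can be sketched as follows. The variable names and values are illustrative assumptions, not taken from the slides.

```r
# Operators
a <- 7                        # assignment
b <- 2
sum_ab <- a + b               # arithmetic: 9
rem_ab <- a %% b              # arithmetic (remainder): 1
bigger <- a > b               # relational: TRUE
both   <- (a > 5) & (b > 5)   # logical AND: FALSE

# if...else with a nested if...else
if (a > 5) {
  if (b > 5) {
    size <- "both large"
  } else {
    size <- "only a is large"
  }
} else {
  size <- "a is small"
}

# for loop: add up 1..5
total_for <- 0
for (i in 1:5) {
  total_for <- total_for + i
}

# while loop: the same sum
total_while <- 0
i <- 1
while (i <= 5) {
  total_while <- total_while + i
  i <- i + 1
}

# repeat loop: runs until break is reached
total_repeat <- 0
i <- 1
repeat {
  total_repeat <- total_repeat + i
  i <- i + 1
  if (i > 5) break
}
```

All three loops compute the same total, 15; they differ only in where the stopping condition is written.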
Types of Data Structure in R

• Scalars – single values; in R a scalar is simply a vector of length one
• Vectors – a sequence of values of the same mode; one-dimensional
• Matrices – two-dimensional data structures
• Arrays – similar to matrices, but they can have more than two dimensions
• Data frames – the most commonly used data structures in R. A data frame is similar to a matrix, but its columns can hold different modes of data, such as numbers and characters.
• Lists – the most complex data structures. A list may contain a combination of vectors, matrices, data frames, and even other lists.
Types of Data Structure in R : Scalar

Scalars – single values; in R a scalar is simply a vector of length one

Example:
f <- 3 # numeric
f
g <- "US" # text
g
h <- TRUE # logical
h
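As a quick check on the example above, class() reports the mode of each value and length() shows that a scalar is a vector of length one. This is a sketch reusing the same variable names.

```r
f <- 3        # numeric
g <- "US"     # text
h <- TRUE     # logical

class(f)      # "numeric"
class(g)      # "character"
class(h)      # "logical"

length(f)     # 1: a scalar is a vector with one element
is.vector(f)  # TRUE
```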
Types of Data Structure in R : Vector

Vectors – a sequence of values of the same mode; one-dimensional.

Example:
a <- c(1, 2, 5, 3, 6, -2, 4)                   # numeric vector
a
b <- c("one", "two", "three")                  # character vector
b
d <- c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE)   # logical vector; named d to avoid confusion with the function c()
d
Vectors
• Vectors are stored contiguously, like arrays in C
• Vector indices begin at 1
• All elements of a vector must have the same mode, such as integer, numeric (floating-point number), character (string), logical (Boolean), complex or raw.

Create a vector of numbers

The c function (c is short for combine) creates a new vector consisting of three
values: 4, 7, and 8.
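The slide's own code is not present in the text, so the following is a reconstruction under that assumption. The variable name x is chosen here for illustration.

```r
x <- c(4, 7, 8)   # combine three values into one numeric vector
x
length(x)         # 3
class(x)          # "numeric"
```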
Vectors
A vector cannot hold values of different data types. Consider an attempt to place integer, string and Boolean values together in a vector.

Note: All the values are converted to the same data type, i.e. "character".
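A minimal sketch of this conversion (the variable name mixed and its values are illustrative):

```r
mixed <- c(1L, "two", TRUE)   # integer, string and Boolean together
mixed                         # "1" "two" "TRUE"
class(mixed)                  # "character"
```

R does not raise an error here; it silently converts every element to the most general type present.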
Vectors

Accessing the value(s) in a vector

Create a variable named "VariableSeq" and assign to it a vector consisting of string values.

• To access values in a vector, specify the indices at which the values are present in the vector. Indices start at 1.
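A sketch of the steps above. The string values are made up for illustration; only the name VariableSeq comes from the slide.

```r
VariableSeq <- c("alpha", "beta", "gamma", "delta")   # illustrative values

first  <- VariableSeq[1]         # "alpha" (indices start at 1)
picked <- VariableSeq[c(2, 4)]   # "beta" "delta"
middle <- VariableSeq[2:3]       # "beta" "gamma"
rest   <- VariableSeq[-1]        # everything except the first element
```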
Types of Data Structure in R : Matrices

Matrices - These are two-dimensional data structures

Example:
vector <- c(1,2,3,4)
f <- matrix(vector, nrow=2, ncol=2)   # values are filled column by column
f
     [,1] [,2]
[1,]    1    3
[2,]    2    4
Matrices
To access the 2nd column of the matrix, simply provide the column number and
omit the row number.

To access the 2nd and 3rd columns of the matrix, simply provide the column
numbers and omit the row number.
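The column access described above can be sketched as follows, using an illustrative 3 x 4 matrix m (not from the slides):

```r
m <- matrix(1:12, nrow = 3, ncol = 4)   # filled column by column

col2   <- m[, 2]     # 2nd column as a vector: 4 5 6
cols23 <- m[, 2:3]   # 2nd and 3rd columns as a 3 x 2 matrix
row1   <- m[1, ]     # 1st row: 1 4 7 10
cell   <- m[2, 3]    # single element at row 2, column 3: 8
```

Leaving the position before the comma empty means "all rows"; leaving the position after it empty means "all columns".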
Types of Data Structure in R : Arrays

Arrays - Similar to matrices; these can have more than two dimensions.

a <- matrix(c(1,1,1,1) , 2, 2)
b <- matrix(c(2,2,2,2) , 2, 2)
x <- array(c(a,b), c(2,2,2))
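Continuing the example above as a sketch: an array is indexed with one position per dimension, here row, column and layer.

```r
a <- matrix(c(1, 1, 1, 1), 2, 2)
b <- matrix(c(2, 2, 2, 2), 2, 2)
x <- array(c(a, b), c(2, 2, 2))   # two 2 x 2 layers stacked together

dim(x)        # 2 2 2
x[, , 1]      # first layer: the values of a
x[1, 2, 2]    # row 1, column 2 of the second layer: 2
```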
Types of Data Structure in R : Data frames

Data frames – the most commonly used data structures in R. A data frame is similar to a matrix, but its columns can hold different modes of data, such as numbers and characters.

name <- c("Ram", "Laxman", "Sita", "Urmila")
gender <- c("M", "M", "F", "F")
age <- c(27, 26, 25, 24)
df <- data.frame(name, gender, age)   # one column per vector
df
Data Frames

Think of a data frame as something akin to a database table or an Excel spreadsheet.

Create a data frame

• First create three vectors, "EmpNo", "EmpName" and "ProjName"

• Then create a data frame, "Employee"
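The two steps can be sketched as below. The employee values are invented for illustration; only the object names come from the slide.

```r
# Step 1: three vectors of equal length (illustrative values)
EmpNo    <- c(1001, 1002, 1003)
EmpName  <- c("Asha", "Ravi", "Meera")
ProjName <- c("Alpha", "Beta", "Alpha")

# Step 2: combine them column-wise into a data frame
Employee <- data.frame(EmpNo, EmpName, ProjName)
Employee
```

Each vector becomes one column, and the column names are taken from the vector names.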


Types of Data Structure in R : Lists

Lists - These are the most complex data structures. A list may contain a
combination of vectors, matrices, data frames, and even other lists.

Example:
vec <- c(1,2,3,4)
mat <- matrix(vec,2,2)
x <- list (vec, mat)
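A sketch of getting elements back out of the list built above. The second list, named, and its element names are assumptions added for illustration.

```r
vec <- c(1, 2, 3, 4)
mat <- matrix(vec, 2, 2)
x <- list(vec, mat)

x[[1]]                  # first element: the vector 1 2 3 4
cell <- x[[2]][1, 2]    # row 1, column 2 of the matrix: 3

# Elements can also be given names and accessed with $
named <- list(numbers = vec, grid = mat)
named$numbers
```

Double square brackets return the element itself, whereas single brackets, as in x[1], return a shorter list.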
Data Frame Access

There are two ways to access the content of data frames:

• By providing the index number in square brackets.

• By providing the column name as a string in double square brackets.
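Both ways of access can be sketched with the data frame from the earlier slide, rebuilt here (as people, an illustrative name) so that the example runs on its own:

```r
people <- data.frame(
  name   = c("Ram", "Laxman", "Sita", "Urmila"),
  gender = c("M", "M", "F", "F"),
  age    = c(27, 26, 25, 24)
)

# 1. By index number in square brackets
third_col <- people[3]       # 3rd column, returned as a one-column data frame
one_cell  <- people[2, 3]    # row 2, column 3: 26

# 2. By column name as a string in double square brackets
ages <- people[["age"]]      # the age column as a plain vector
```

The shorthand people$age gives the same vector as people[["age"]].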
A few R functions for understanding data in data frames

• dim()
The dim() function returns the dimensions (number of rows and columns) of a data frame.

• nrow()
The nrow() function returns the number of rows in a data frame.
• ncol()
The ncol() function returns the number of columns in a data frame.

• str()
The str() function compactly displays the internal structure of an R object.

• summary()
The summary() function returns a summary of each column of the dataset.
A few R functions for understanding data in data frames
• head()
The head() function returns the first n observations, where n is 6 by default.

• tail()
The tail() function returns the last n observations, where n is 6 by default.

• edit()
The edit() function invokes a text editor on the R object.
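These functions can be tried on mtcars, a data set that ships with R. The choice of data set is an assumption made here for illustration. edit() is left out because it opens an interactive editor.

```r
dim(mtcars)                 # 32 11
nrow(mtcars)                # 32
ncol(mtcars)                # 11

str(mtcars)                 # column names, types and first few values
summary(mtcars)             # min, quartiles, mean and max for each column

first_rows <- head(mtcars)          # first 6 rows by default
last_rows  <- tail(mtcars, n = 3)   # last 3 rows
```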
Import and export of data in R

• Importing data from a .csv file

• Two very important functions
• read.csv() – reads a .csv file from a specified file path into a data frame
• write.csv() – writes a data frame to a .csv file, in the working directory unless another path is given
• read.csv() is a special case of read.table()
• write.csv() is a special case of write.table()

Reading Spreadsheets
read.xlsx("filename", …)
where the filename argument defines the path of the file to be read, and the dots "…" stand for the other optional arguments. read.xlsx() is not part of base R; it is provided by add-on packages such as xlsx and openxlsx.
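A minimal round trip with write.csv() and read.csv(). The data and the temporary file path are illustrative; in practice the path would point to a real file.

```r
people <- data.frame(name = c("Ram", "Sita"), age = c(27, 25))

csv_path <- tempfile(fileext = ".csv")           # throwaway file for the demo
write.csv(people, csv_path, row.names = FALSE)   # export without row numbers

people_back <- read.csv(csv_path)                # import into a new data frame
people_back

unlink(csv_path)                                 # delete the temporary file
```

Setting row.names = FALSE stops write.csv() from adding an extra column of row numbers to the file.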
Working with directory

getwd()
The getwd() command returns the absolute file path of the current working directory.

setwd()
The setwd() command changes the current working directory to another location chosen by the user.

dir()
This function returns a character vector of the names of files or directories in the named directory.

version
Typing version at the console displays the version of R in use.
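A sketch that puts these commands together. It works in R's temporary folder, and the file name example.txt is hypothetical, so no real project files are touched.

```r
old_dir <- getwd()             # remember where we started
setwd(tempdir())               # switch to R's per-session temporary folder

file.create("example.txt")     # hypothetical file, created only for the demo
files <- dir()                 # names of the files in the current directory
files

file.remove("example.txt")     # clean up
setwd(old_dir)                 # restore the original working directory

r_major <- version$major       # major version number of R, stored as text
r_major
```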
