1.R Unit 1

This document provides an introduction to R and discusses its importance, features, advantages and disadvantages. It covers how to download and install RStudio, an overview of the R interface and operating environment, and how to work with objects and the R workspace. Key functions introduced include ls(), rm(), plot() and help().

19EDS332

Data Visualization and Exploration with R
Module I
• Introduction: Importance of R and RStudio (IDE).
• R Language Constructs: Variables, Data types, Arithmetic and Boolean operators.
• R data structures: Introduction to Data Structures in R, Vectors, Lists, Data Frames, Matrices, Arrays.
Introduction
• R is a programming language and software environment for statistical analysis, graphical representation and reporting.
• R was initially written by Ross Ihaka and Robert Gentleman at the Department of Statistics of the University of Auckland, New Zealand, during the 1990s.
• The core of R is an interpreted computer language which allows branching and looping as well as modular programming using functions.
Why R?
• It's free!
• R allows integration with procedures written in C, C++, .Net, Python or FORTRAN for efficiency.
• It runs on a variety of platforms including Windows, Unix and MacOS.
• It provides an unparalleled platform for programming new statistical methods in an easy and straightforward manner.
• It contains advanced statistical routines not yet available in other packages.
• It has state-of-the-art graphics capabilities.
What is it?
R is an interpreted computer language.
R is used for data manipulation, statistics, and graphics. It is made up of:
– operators (+ - <- * %*% …) for calculations on arrays & matrices
– a large, coherent, integrated collection of functions
– facilities for making unlimited types of publication-quality graphics
– user-written functions & sets of functions (packages); 800+ contributed packages so far & growing
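A small sketch of the operators on vectors and matrices mentioned above. The values are made up for illustration; the point is the difference between the element-wise * and the matrix product %*%.

```r
# Two small vectors and a 2 x 2 matrix (values chosen only for illustration).
v <- c(1, 2, 3)
w <- c(10, 20, 30)
m <- matrix(c(1, 2, 3, 4), nrow = 2)   # filled column by column

v + w        # element-wise addition: 11 22 33
v * w        # element-wise multiplication: 10 40 90
m * m        # element-wise product of the matrix with itself
m %*% m      # true matrix product
```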
Features of R

R is an integrated suite of software for data manipulation, calculation, and graphical display
• Effective data handling
• Various operators for calculations on arrays/matrices
• Graphical facilities for data analysis
• Well-developed language including conditionals, loops,
recursive functions and I/O capabilities.
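To make the language features above concrete, here is a minimal sketch with a conditional, a loop and a recursive function. The function name fact is an arbitrary choice for this example.

```r
# Recursive function: factorial of a non-negative whole number.
fact <- function(n) {
  if (n <= 1) {
    return(1)            # base case
  }
  n * fact(n - 1)        # recursive case
}

# Loop with a conditional: add up the even numbers from 1 to 10.
total <- 0
for (i in 1:10) {
  if (i %% 2 == 0) {
    total <- total + i
  }
}

fact(5)   # 120
total     # 30
```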
R applications
• R is used for projects with banks, political campaigns, tech startups, food startups, international development and aid organizations, hospitals and real estate developers.
• It is also used in online advertising, insurance, ecology, genetics and pharmaceuticals.
• R is used by statisticians with advanced machine learning training and by programmers familiar with other languages.
R
Advantages
o Fast and free.
o State of the art: Statistical researchers
provide their methods as R packages.
SPSS and SAS are years behind R!
o 2nd only to MATLAB for graphics.
o Mx, WinBugs, and other programs use
or will use R.
o Active user community
o Excellent for simulation, programming,
computer intensive analyses, etc.
o Forces you to think about your analysis.
o Interfaces with database storage
software (SQL)
R
Disadvantages
o Not user friendly at the start: steep learning curve, minimal GUI.
o No commercial support; figuring out correct methods or how to use a function on your own can be frustrating.
o Easy to make mistakes and not know.
o Working with large datasets is limited by RAM.
o Data prep & cleaning can be messier & more mistake prone in R vs. SPSS or SAS.
o Some users complain about hostility on the R listserver.
How to download?
• To get R, search for "R" or "CRAN" (Comprehensive R Archive Network).
• To get RStudio Desktop: https://posit.co/download/rstudio-desktop/
R Overview
You can enter commands one at a time at the command
prompt (>) or run a set of commands from a source
file.
There is a wide variety of data types, including vectors
(numerical, character, logical), matrices, data frames,
and lists.
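The data types listed above can be created as in the following sketch. All object names and values are invented for the example.

```r
num_vec <- c(1.5, 2.5, 3.5)                   # numerical vector
chr_vec <- c("a", "b", "c")                   # character vector
lgl_vec <- c(TRUE, FALSE, TRUE)               # logical vector
mat <- matrix(1:6, nrow = 2)                  # matrix with 2 rows and 3 columns
df  <- data.frame(id = 1:3, label = chr_vec)  # data frame: columns of equal length
lst <- list(numbers = num_vec, flag = TRUE)   # list: components of any type

class(num_vec)   # "numeric"
class(chr_vec)   # "character"
class(lgl_vec)   # "logical"
dim(mat)         # 2 3
str(df)          # compact overview of the data frame
```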
To quit R, use
> q()
R paradigm is different
Rather than setting up a complete analysis at once, the
process is highly interactive.
You run a command (say, fit a model), take the results and process them through another command (say, a set of diagnostic plots), take those results and process them through another command (say, cross-validation), etc.
The cycle may include transforming the data and looping back through the whole process again.
You stop when you feel that you have fully analyzed the data.
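One possible round of this interactive cycle, sketched with the mtcars data set that ships with R. The choice of model (miles per gallon explained by weight) is only an assumption for illustration.

```r
# Step 1: fit a model (mtcars comes with R in the datasets package).
fit <- lm(mpg ~ wt, data = mtcars)

# Step 2: feed the fitted object to other commands.
coef(fit)                 # intercept and slope
res <- residuals(fit)     # one residual per car

# Step 3: transform the data and go round again.
fit_log <- lm(log(mpg) ~ wt, data = mtcars)
summary(fit_log)$r.squared
```

In an interactive session, plot(fit) would show the standard diagnostic plots for the first model.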
R Overview
Most functionality is provided through built-in and user-
created functions and all data objects are kept in
memory during an interactive session.
Basic functions are available by default. Other functions
are contained in packages that can be attached to a
current session as needed.
R Interface
When you start the R system, the main window (RGui) with a sub window (R Console) will appear.
In the `Console' window the cursor is waiting for you to type in some R commands.
R Operating Environment
RStudio
An Integrated Development Environment (IDE) for R
A gift from J.J. Allaire (Macalester College, '91) to the world
An easy (easier) way to use R
Available as a desktop product or, as used at OC, run off of a file server.
Free, unless you want the newest version, with more bells and whistles, and you are not eligible for the educational discount (= free)
R supports rpubs; see http://rpubs.com/jawitmer
R Introduction
• Results of calculations can be stored in objects using the assignment operators:
• An arrow (<-) formed by a less-than character and a hyphen, without a space between them.
• The equal character (=).

R Introduction
 These objects can then be used in other calculations.
 To print the object just enter the name of the object.
 There are some restrictions when giving an object a
name:
 Object names cannot contain `strange' symbols like !, +, -,
#.
 A dot (.) and an underscore (_) are allowed, also a name
starting with a dot.
 Object names can contain a number but cannot start with
a number.
 R is case sensitive, X and x are two different objects, as
well as temp and temP.
An example
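> # An example
> x <- c(1:10)
> x[(x>8) | (x<5)]
[1]  1  2  3  4  9 10
> # How it works
> x <- c(1:10)
> x
 [1]  1  2  3  4  5  6  7  8  9 10
> x > 8
 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE
> x < 5
 [1]  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
> x > 8 | x < 5
 [1]  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE  TRUE  TRUE
> x[c(T,T,T,T,F,F,F,F,T,T)]
[1]  1  2  3  4  9 10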
R Introduction
• To list the objects that you have in your current R session use the function ls or the function objects.
> ls()
[1] "x" "y"

• To list all objects whose names contain the letter x:
> x2 = 9
> y2 = 10
> ls(pattern="x")
[1] "x" "x2"
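The pattern argument of ls is a regular expression, so "x" matches any name containing an x. A sketch of the difference between "contains x" and "starts with x" follows; the separate environment e and the object names are assumptions made so that the example does not depend on what is in your own workspace.

```r
# Work in a separate environment so the example does not touch your workspace.
e <- new.env()
assign("x",   1,  envir = e)
assign("x2",  9,  envir = e)
assign("y2",  10, envir = e)
assign("max", 99, envir = e)   # contains an x but does not start with one

ls(envir = e, pattern = "x")    # every name containing x: "max" "x" "x2"
ls(envir = e, pattern = "^x")   # only names starting with x: "x" "x2"
```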
R Introduction
• If you assign a value to an object that already exists then the contents of the object will be overwritten with the new value (without a warning!).
• Use the function rm to remove one or more objects from your session.
> rm(x, x2)

• Let's create two small vectors with data and a scatterplot.
z2 <- c(1,2,3,4,5,6)
z3 <- c(6,8,3,5,7,1)
plot(z2,z3)
title("My first scatterplot")
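The silent overwrite and the rm function described on this slide can be tried out as follows. The object names score, tmp1 and tmp2 are hypothetical.

```r
score <- 10
score <- "ten"        # silently replaces the number with a character string
class(score)          # "character"

tmp1 <- 1
tmp2 <- 2
rm(tmp1, tmp2)        # remove both objects in one call
exists("tmp1", inherits = FALSE)   # FALSE: the object is gone
```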
R Warning !
R is a case sensitive
language.
FOO, Foo, and foo are
three different objects
R Introduction
> x = sin(9)/75
> y = log(x) + x^2
> x
[1] 0.005494913
> y
[1] -5.203902
> m <- matrix(c(1,2,4,1), ncol=2)
> m
     [,1] [,2]
[1,]    1    4
[2,]    2    1
> solve(m)
           [,1]       [,2]
[1,] -0.1428571  0.5714286
[2,]  0.2857143 -0.1428571
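A quick way to check the inverse shown above is to multiply it back with m. The sketch also uses solve with a second argument to solve a linear system; the right-hand side b is made up for the example.

```r
m <- matrix(c(1, 2, 4, 1), ncol = 2)
m_inv <- solve(m)

m %*% m_inv                       # identity matrix, up to rounding error
all.equal(m %*% m_inv, diag(2))   # TRUE

# solve() with a second argument solves the linear system m %*% z = b.
b <- c(9, 4)
z <- solve(m, b)
z                                 # 1 2
```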
R Workspace
• Objects that you create during an R session are held in memory; the collection of objects that you currently have is called the workspace.
• This workspace is not saved on disk unless you tell R to do so.
• This means that your objects are lost when you close R without saving them, or worse, when R or your system crashes on you during a session.
R Workspace
 When you close the RGui or the R console
window, the system will ask if you want to save
the workspace image.
 If you select to save the workspace image then
all the objects in your current R session are
saved in a file .RData.
 This is a binary file located in the working
directory of R, which is by default the installation
directory of R.
R Workspace
 During your R session you can also explicitly save
the workspace image.
• Go to the `File' menu and then select `Save Workspace...', or use the save.image function.
## save to the current working directory
save.image()
## just checking what the current working directory is
getwd()
## save to a specific file and location
save.image("C:\\Program Files\\R\\R-2.5.0\\bin\\.RData")

• You can also explicitly load a saved workspace, which could be the workspace image of someone else. Go to the `File' menu and select `Load Workspace...'.
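Saving and loading a workspace image can also be done entirely from code. The sketch below writes to a temporary file rather than the default .RData, and loads the image into a separate environment so nothing in the current session is overwritten; demo_value, image_file and restored are hypothetical names.

```r
# Put a demo object into the global workspace (the name is arbitrary).
assign("demo_value", 42, envir = globalenv())

# Save the workspace to a temporary file instead of the default .RData.
image_file <- tempfile(fileext = ".RData")
save.image(file = image_file)

# Load the image into a fresh environment to inspect it without
# overwriting anything in the current session.
restored <- new.env()
load(image_file, envir = restored)
get("demo_value", envir = restored)   # 42

unlink(image_file)                    # tidy up the temporary file
```

Calling load(image_file) without the envir argument would restore the objects directly into the current workspace.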
R Workspace
getwd() # print the current working directory

ls() # list the objects in the current workspace

setwd(mydirectory) # change to mydirectory

setwd("c:/docs/mydir")
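A cautious pattern when changing the working directory is to remember the old one and restore it afterwards. This sketch uses R's per-session temporary directory as the target, which is an assumption made so that the example runs on any machine.

```r
old_dir <- getwd()     # remember where we started
setwd(tempdir())       # move to R's per-session temporary directory
getwd()                # now reports the temporary directory
setwd(old_dir)         # go back to the original directory
getwd()
```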
R Help
Once R is installed, there is a comprehensive
built-in help system. At the program's
command prompt you can use any of the
following:
help.start() # general help
help(foo) # help about function foo
?foo # same thing
apropos("foo") # list all functions containing the string foo
example(foo) # show an example of function foo
R Datasets
R comes with a number of sample datasets
that you can experiment with. Type
> data( )
to see the available datasets. The results
will depend on which packages you have
loaded. Type
help(datasetname)
for details on a sample dataset.
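As an example of working with a sample dataset, the following sketch loads mtcars, one of the datasets that ships with R, and takes a first look at it.

```r
data(mtcars)        # load the built-in mtcars data set (datasets package)
dim(mtcars)         # 32 rows, 11 columns
head(mtcars, 3)     # first three rows
names(mtcars)       # column names
# help(mtcars) shows the documentation for this data set.
```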
R Packages
• One of the strengths of R is that the system can easily be extended.
• The system allows you to write new functions and package those functions in a so-called `R package' (or `R library').
• There is a lively R user community and many R packages have been written and made available on CRAN for other users.
• Just a few examples: there are packages for portfolio optimization, drawing maps, exporting objects to html, time series analysis and spatial statistics, and the list goes on and on.
R Packages
• When you download R, a number of packages (around 30) are downloaded as well.
• You can use the function search to see a list of packages that are currently attached to the system; this list is also called the search path.

> search()
[1] ".GlobalEnv" "package:stats" "package:graphics"
[4] "package:grDevices" "package:datasets" "package:utils"
[7] "package:methods" "Autoloads" "package:base"
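Attaching a package changes the search path. The sketch below uses splines, a package that is part of the standard R distribution, so no download is needed; any other installed package would work the same way.

```r
before <- search()
library(splines)                  # attach a package that ships with R
"package:splines" %in% search()   # TRUE once the package is attached
setdiff(search(), before)         # what the call added to the search path
```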
R Packages
• The function library can be used to list all the available libraries on your system with a short description.
• Run the function without any arguments:
> library()
Packages in library 'C:/PROGRA~1/R/R-25~1.0/library':
base        The R Base Package
boot        Bootstrap R (S-Plus) Functions (Canty)
class       Functions for Classification
cluster     Cluster Analysis Extended Rousseeuw et al.
codetools   Code Analysis Tools for R
datasets    The R Datasets Package
DBI         R Database Interface
foreign     Read Data Stored by Minitab, S, SAS, SPSS, Stata, Systat, dBase, ...
graphics    The R Graphics Package
Getting help
• help(function_name)
• help(prcomp)
• ?function_name
• ?prcomp
• help.search("topic")
• ??topic or ??"topic"
• Search CRAN
• http://www.r-project.org
• From R GUI: Help > Search help…
• CRAN Task Views (for individual packages)
• http://cran.cnr.berkeley.edu/web/views/
R Useful Commands
• Commands can be expressions or assignments
• Separate by semicolon or new line
• Can split across multiple lines
• R will change prompt to + if command not finished
• Useful commands for variables
• ls(): List all stored variables
• rm(x): Delete one or more variables
• class(x): Describe what type of data a variable stores
• save(x,file="filename"): Store variable(s) to a binary file
• load("filename"): Load all variables from a binary file
• Save/load in current directory or My Documents by default
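The commands above can be combined into a short session. This is a sketch only: the variable names are invented, and a temporary file is used in place of a real file name so that nothing is left behind.

```r
a <- 1; b <- 2            # two assignments on one line, separated by a semicolon
total <- a +
  b                       # an unfinished line continues on the next one
class(total)              # "numeric"

f <- tempfile(fileext = ".RData")   # temporary file for the example
save(a, b, file = f)      # store the two variables in a binary file
rm(a, b)                  # delete them from the session
load(f)                   # bring them back under their original names
a + b                     # 3
unlink(f)                 # remove the temporary file
```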
Variables in R Programming

• A variable is a name given to a memory location, which is used to store values in a computer program.
• Variables in R programming can be used to store numbers (real and complex), words, matrices, and even tables.
• R is a dynamically typed language, which means that, unlike statically typed programming languages, we do not have to declare the data type of a variable before we can use it in our program.
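Because no type declaration is needed, the same variable name can hold values of different types over time, as this small sketch shows (the name v is arbitrary).

```r
v <- 42
class(v)            # "numeric"
v <- "forty-two"    # the same name can now hold a character string
class(v)            # "character"
v <- 2 + 3i
class(v)            # "complex"
```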
Variables and assignment
• Use variables to store values
• Three ways to assign variables
• a = 6
• a <- 6
• 6 -> a
• Update variables by using the current value in an assignment
• x = x + 1
• Naming rules
• Can include letters, numbers, ., and _
• Names are case sensitive
• Must start with . or a letter
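The three assignment forms and the update pattern can be tried side by side. The names b, c_val and .hidden are hypothetical and are used only so that each form gets its own variable.

```r
a = 6
b <- 6
6 -> c_val           # c_val is a made-up name; c() itself is a built-in function
identical(a, b) && identical(b, c_val)   # TRUE: all three forms store the same value

x <- 1
x = x + 1            # update using the current value
x                    # 2

.hidden <- "ok"      # a leading dot is allowed; ls() hides such names by default
```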
Variables
> a = 49                              # numeric
> sqrt(a)
[1] 7

> a = "The dog ate my homework"       # character string
> sub("dog","cat",a)
[1] "The cat ate my homework"

> a = (1+1==3)                        # logical
> a
[1] FALSE
Basic usage: arithmetic in R
• You can use R as a calculator
• Typed expressions will be evaluated and printed out
• Main operations: +, -, *, /, ^
• Obeys order of operations
• Use parentheses to group expressions
• More complex operations appear as functions
• sqrt(2)
• sin(pi/4), cos(pi/4), tan(pi/4), asin(1), acos(1), atan(1)
• exp(1), log(2), log10(10)
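A few expressions that illustrate the order of operations and the built-in functions listed above:

```r
2 + 3 * 4        # 14: multiplication before addition
(2 + 3) * 4      # 20: parentheses change the grouping
2 ^ 3 ^ 2        # 512: ^ groups from right to left, i.e. 2 ^ (3 ^ 2)
-2 ^ 2           # -4: ^ binds tighter than the unary minus

sqrt(2) ^ 2                           # 2, up to floating-point rounding
all.equal(sin(pi / 4), cos(pi / 4))   # TRUE
log(exp(1))                           # 1 (log is the natural logarithm)
log10(10)                             # 1
```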
Data type
• R provides the class() and typeof() functions to find out the class and type of any variable.
• R has five commonly used basic data types:
1. Numeric
2. Integer
3. Complex
4. Logical
5. Character
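The difference between class() and typeof() is easiest to see by applying both to one value of each type. The list name values is arbitrary.

```r
values <- list(3.14, 5L, 2 + 3i, TRUE, "text")
sapply(values, class)    # "numeric" "integer" "complex" "logical" "character"
sapply(values, typeof)   # "double"  "integer" "complex" "logical" "character"
```

Note that a plain number such as 3.14 has class "numeric" but type "double", and that the L suffix is what makes 5L an integer.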
For a variable name to be valid, it
should follow these rules
• It should contain only letters, numbers, dots and
underscores.
• It should not start with a number (eg:- 2iota)
• It should not start with a dot followed by a number
(eg:- .2iota)
• It should not start with an underscore (eg:- _iota)
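One way to test these rules is with the base function make.names, which rewrites a string into a valid name and leaves it unchanged only if it was valid already. The helper is_valid_name below is a hypothetical convenience function, not part of R.

```r
# is_valid_name() is a hypothetical helper, not a base R function:
# make.names() leaves a string unchanged only if it is already a valid name.
is_valid_name <- function(x) make.names(x) == x

candidates <- c("iota", "iota.2", "my_var", ".iota", "2iota", ".2iota", "_iota")
setNames(is_valid_name(candidates), candidates)
# The first four are valid; "2iota", ".2iota" and "_iota" are not.
```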
Reserved Keywords in R
Useful Functions
length(object) # number of elements or components
str(object) # structure of an object
class(object) # class or type of an object
names(object) # names
c(object,object,...) # combine objects into a vector
cbind(object, object, ...) # combine objects as columns
rbind(object, object, ...) # combine objects as rows
ls() # list current objects
rm(object) # delete an object
newobject <- edit(object) # edit copy and save a newobject
fix(object) # edit in place
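A short sketch applying several of these functions to small made-up objects (edit and fix are left out because they open an interactive editor):

```r
heights <- c(anna = 165, ben = 180, cara = 172)   # a named numeric vector
length(heights)          # 3
names(heights)           # "anna" "ben" "cara"
str(heights)             # compact description of the structure

ids <- c(1, 2, 3)
by_col <- cbind(ids, heights)   # each vector becomes a column
by_row <- rbind(ids, heights)   # each vector becomes a row
dim(by_col)                     # 3 2
dim(by_row)                     # 2 3
```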
operator
• An operator is a symbol that tells R to perform specific mathematical or logical manipulations.

• Arithmetic Operators
• Relational Operators
• Logical Operators
• Assignment Operators
• Miscellaneous Operators
Arithmetic Operators

• These operators are used to carry out mathematical operations like addition and multiplication.
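The operator table from this slide did not survive extraction, so the following is a sketch of R's standard arithmetic operators with made-up operands.

```r
x <- 17
y <- 5
x + y      # 22   addition
x - y      # 12   subtraction
x * y      # 85   multiplication
x / y      # 3.4  division
x ^ 2      # 289  exponentiation
x %% y     # 2    remainder (modulus)
x %/% y    # 3    integer division
c(1, 2, 3) * 2   # operators work element by element: 2 4 6
```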
R Relational Operators

• Relational operators are used to compare values. Here is a list of relational operators available in R.
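The list itself is missing from the extracted slide; as a sketch, these are R's six standard relational operators applied to a small example vector.

```r
x <- c(1, 5, 10)
x == 5     # FALSE  TRUE FALSE   equal to
x != 5     #  TRUE FALSE  TRUE   not equal to
x <  5     #  TRUE FALSE FALSE   less than
x >  5     # FALSE FALSE  TRUE   greater than
x <= 5     #  TRUE  TRUE FALSE   less than or equal to
x >= 5     # FALSE  TRUE  TRUE   greater than or equal to
```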
R Logical Operators

• Logical operators are used to carry out Boolean operations like AND, OR, etc.
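A sketch of the standard logical operators, using two made-up logical vectors that cover all four combinations of TRUE and FALSE.

```r
p <- c(TRUE, TRUE, FALSE, FALSE)
q <- c(TRUE, FALSE, TRUE, FALSE)
p & q        # element-wise AND:  TRUE FALSE FALSE FALSE
p | q        # element-wise OR:   TRUE  TRUE  TRUE FALSE
!p           # NOT:              FALSE FALSE  TRUE  TRUE
xor(p, q)    # exclusive OR:     FALSE  TRUE  TRUE FALSE

# && and || compare single values and stop as soon as the answer is known.
TRUE && FALSE    # FALSE
TRUE || FALSE    # TRUE
```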
Assignment Operators

• These operators are used to assign values to variables.
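Besides <-, -> and =, R also has the operator <<-, which assigns to a variable in an enclosing environment rather than creating a local one. The counter below is a sketch of a typical use; make_counter and next_id are hypothetical names.

```r
make_counter <- function() {
  count <- 0                 # local to this call of make_counter()
  function() {
    count <<- count + 1      # <<- updates count in the enclosing function
    count
  }
}

next_id <- make_counter()
next_id()    # 1
next_id()    # 2
```

With an ordinary <- inside the inner function, count would be a new local variable on every call and the counter would always return 1.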
R Miscellaneous Operators
• Miscellaneous operators are used to manipulate data.
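The slide's operator table is missing from the extracted text. Assuming it covered the operators usually listed under this heading, here is a sketch of the colon operator and %in% (the matrix product %*% also belongs to this group).

```r
1:5                            # colon operator: the sequence 1 2 3 4 5
10:7                           # it also counts down: 10 9 8 7
3 %in% 1:5                     # TRUE: 3 is an element of the vector
c("a", "z") %in% c("a", "b")   # TRUE FALSE
```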
