R Language

The document provides an overview of the R programming language and its features. It discusses R's history and development, data types, control structures, vectors, matrices and arrays, lists, data frames, factors and tables. It also lists some textbooks and reference books related to R programming.

Uploaded by

lekhyareddy29

2nd BCA(Data Science)/IV-Semester ‘R’ - Language

SYLLABUS
‘R’ Language for Data Science (ABCA/IMCA)/
Programming using ‘R’ (MSDS)

UNIT – I
Introduction to R: R overview and history, Basic features of R, Benefits of R, data types in
R, Installing R, Getting started with the RStudio IDE, Running R, Packages in R, variable
names and assignment, operators, Input/output functions, reading and writing data.

UNIT-II
Preview of Some Important R Data Structures: Vectors, Character Strings, Matrices,
Lists, Data Frames, and Classes.
Control structures: Conditional statements, Loops, dates and times functions, String
manipulations.
UNIT-III
VECTORS: Scalars, Vectors, Arrays and Matrices: Adding and Deleting Vector Elements,
Obtaining the Length of a Vector - Common vector operations: Arithmetic & logical
operations, Vector Indexing, Generating vector sequences with seq(), Repeating vector
constants with rep(), using all() and any() functions, Vectorized operations, NA and NULL
values.

UNIT-IV
MATRICES AND ARRAYS: Creating Matrices, General Matrix operations- linear algebra
operations, matrix indexing, filtering on matrices, using apply() function , Add and Delete
matrix rows and columns.

LISTS: Creating Lists, General List Operations, List Indexing, Adding and Deleting List
Elements, Getting the Size of a List, Accessing List Components and Values, Using lapply()
and sapply() functions.

UNIT-V
DATA FRAMES: Creating Data Frames, Accessing Data Frames - Other Matrix-Like
Operations: Extracting sub data frames, using rbind() and cbind() functions.
FACTORS AND TABLES : Factors and Levels - Common Functions Used with Factors :
tapply(), split() and by() - Working with Tables, Matrix/Array-Like Operations on Tables,
Extracting a Sub table - Math Functions: aggregate() and cut() functions.

Text Books:
1. The Art of R Programming by Norman Matloff, No Starch Press, San Francisco, 2011.
2. An Introduction to R for Beginners by Sasha Hafner, August 2019.

Reference Books:
1. R Programming for Dummies, Andrie de Vries and Joris Meys, Wiley
2. R for Data Science, Hadley Wickham, Garrett Grolemund, O’Reilly Media
3. R Programming: A Step-By-Step Guide for Absolute Beginners, 2nd Edition, Daniel Bell
4. Learn R programming in 1 Day, Krishna Rungta, Published by Guru99

SDHR DC,Tirupathi. T.Venkatapathi Raju , M.Tech(CSE) Page 1 of 59


UNIT-I
Introduction to ‘R’
What is R?
• R is a popular interpreted programming language and a leading tool
for machine learning, statistics, and data analysis. Objects, functions, and
packages can easily be created in R.
• It is a platform-independent language. This means it runs on all major
operating systems.
• It is a free, open-source language. That means anyone can install it in any
organization without purchasing a license.

R is an interpreted programming and statistical language.

R is simple and easy to learn, read and write.

R is an example of FLOSS (Free/Libre and Open Source Software): one
can freely distribute copies of this software, read its source code, modify it, etc.
R – Overview & History
• R is an interpreted programming language and software environment for statistical
analysis, graphics representation and reporting.
• R is an implementation of the S language.
• R was created by Ross Ihaka and Robert Gentleman at the University of Auckland,
New Zealand in 1991, and is currently developed by the R Development Core Team.
• Its name comes from the first letter of its authors’ first names and is also a play on
the name of S. In 1995, R was released as an open source project under the GNU General
Public License.
• The first stable version of R (1.0.0) was released in 2000.

Features of R
R Packages:
One of the major features of R is the wide availability of libraries. CRAN (the
Comprehensive R Archive Network) is a repository holding more than 10,000 packages.
Distributed Computing:
Distributed computing is a model in which components of a software system are
shared among multiple computers to improve efficiency and performance. Two
packages for distributed programming in R, ddR and multidplyr, were released in
November 2015.
Statistical Features of R
Basic Statistics:

The most common basic statistics terms are the mean, mode, and median. These are
all known as “Measures of Central Tendency.” So using the R language we can measure
central tendency very easily.
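As a minimal sketch of the three measures named above: the data vector is made up for illustration. Base R provides mean() and median(), but has no function for the statistical mode (mode() reports how an object is stored), so the sketch counts values with table() as one possible approach.

```r
# Hypothetical sample data, chosen only for illustration
scores <- c(4, 7, 7, 9, 10, 7, 4)

mean_score   <- mean(scores)     # arithmetic average
median_score <- median(scores)   # middle value of the sorted data

# Count how often each value occurs and keep the most frequent one
counts     <- table(scores)
mode_score <- as.numeric(names(counts)[which.max(counts)])

print(mean_score)
print(median_score)   # 7
print(mode_score)     # 7
```

If several values tie for the highest count, which.max() keeps only the first of them.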
Static graphics:
R is rich with facilities for creating and developing interesting static graphics. R
contains functionality for many plot types including graphic maps, mosaic plots, biplots,
and the list goes on.
Probability distributions:
Probability distributions play a vital role in statistics and by using R we can easily
handle various types of probability distributions such as Binomial Distribution, Normal
Distribution, Chi-squared Distribution, and many more.
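The distributions listed above can be sketched with functions from the stats package, which is loaded by default. R names these functions with a prefix: d for density, p for cumulative probability, q for quantile and r for random draws. The numbers used here are illustrative assumptions only.

```r
# Binomial: probability of exactly 2 heads in 4 fair coin tosses
p_two_heads <- dbinom(2, size = 4, prob = 0.5)   # 6/16 = 0.375

# Normal: probability that a standard normal value falls below 0
p_below_zero <- pnorm(0, mean = 0, sd = 1)       # 0.5

# Chi-squared: 95th percentile with 1 degree of freedom
chi_cutoff <- qchisq(0.95, df = 1)

# Random draws from a normal distribution; set.seed() makes them repeatable
set.seed(42)
draws <- rnorm(5, mean = 10, sd = 2)

print(p_two_heads)
print(p_below_zero)
print(chi_cutoff)
print(draws)
```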
Data analysis:
It provides a large, coherent, and integrated collection of tools for data analysis.
Open-source:

R is an open-source software environment. It is free of cost and can be adjusted and
adapted according to the user’s and the project’s requirements. You can make improvements
and add packages for additional functionalities.

Perform Complex Statistical Calculations:

R can be used to perform simple and complex mathematical and statistical calculations
on data objects of a wide variety. It can also perform such operations on large data sets.

Cross-platform Support:

R is machine-independent. It supports cross-platform operation. Therefore, it can be
used on many different operating systems.

Compatible with Other Programming Languages:

While most of its functions are written in R itself, C, C++ or FORTRAN can be used for
computationally heavy tasks. Java, .NET, Python, C, C++, and FORTRAN can also be used to
manipulate objects directly.

Wide Selection of Packages:

R contains a sea of packages. CRAN, the Comprehensive R Archive Network, houses
more than 10,000 different packages and extensions that help solve all sorts of
problems in data science. Whether for high-quality interactive graphics, web application
development, quantitative analysis or machine learning procedures, there is a package
for every scenario.

A Large Variety of Libraries:

R’s massive community support has resulted in a very large collection of libraries. R
is famous for its graphical libraries. These libraries support and enhance the R development
environment. R has libraries with a huge variety of applications.

Benefits of R (or) Advantages of R


R is a great resource for data analysis, data visualization, data science and machine
learning. Here are the powerful advantages of R programming:
1. Excellent for Statistical Computing and Analysis:

R is one of the most widely used programming languages for developing statistical tools.
It provides many statistical techniques such as statistical tests, classification, clustering
and data reduction.
2. Open-source:

R is an open-source programming language. Anyone can work with R without any
license or fee. Due to this, R has a huge community that contributes to its environment.

3. A Large Variety of Libraries:

R’s massive community support has resulted in a very large collection of libraries. R
is famous for its graphical libraries. These libraries support and enhance the R development
environment. R has libraries with a huge variety of applications.

4. Cross-platform Support:

R is machine-independent. It supports cross-platform operation. Thus, it is
usable on many different operating systems like Windows, UNIX, Linux, etc.

5. Supports various Data Types:

R can perform operations on vectors, arrays, matrices, and various other data objects
of varying sizes.

6. Can do Data Cleansing, Data Wrangling, and Web Scraping:

R can collect data from the internet through web scraping and other means. It can
also perform data cleansing, which is the process of detecting and removing or correcting
inaccurate or corrupt records. R is also useful for data wrangling, which is the process of
converting raw data into the desired format for easier consumption.

7. Powerful Graphics:

R has extensive libraries that can produce quality graphs and visualizations. In R it is
easy to draw graphs like pie charts, histograms, box plots, scatter plots, etc.
8. Highly Active Community:

R has a large and very active community. There are users all around the world to help
and support you. Many of the latest ideas and technologies appear in the R community.
9. Parallel and Distributed Computing:

Using libraries like ddR or multidplyr, R can process large data sets using parallel or
distributed computing.

10. Doesn’t need a Compiler:

R is an interpreted language. This means that it does not need a compiler to turn
the code into an executable program. Instead, R interprets the provided code into lower-
level calls and pre-compiled code.

11. Compatible with other Programming Languages:

R is compatible with other languages like C, C++, and FORTRAN. Other platforms like
.NET, Java, and Python can also directly manipulate R objects.

12. Used in Machine learning:

R can be useful for machine learning as well. Facebook is reported to do a lot of its
machine learning research with R. Tasks such as sentiment analysis and mood prediction
can be done using R. The best use of R when it comes to machine learning is for
exploration or when building one-off models.

13. Can Interact with Databases:

R has several packages that enable it to interact with databases. Some of these
are ROracle, RMySQL, and packages for the Open Database Connectivity (ODBC) protocol.

14. Comprehensive Environment:

R has a very comprehensive development environment. It helps in statistical
computing as well as software development. R supports object-oriented programming.
It also has a robust package called Shiny which can produce full-fledged web
apps. R can also be useful for developing software packages.

Installation of R

Step 1 : Go to the link- https://cran.r-project.org/
Step 2 : Download and install the latest version of R for Windows on your system.
Step 3 : When you have downloaded and installed R, you can run R on your computer.
Step 4 : The screenshot below shows what R may look like when you run it on a Windows
PC.

Installing RStudio
RStudio is an IDE used for R programming which is available as open-source and
commercial software for Desktop and Server products.
Once R is installed, we can start coding in R by downloading the RStudio IDE. To download
it, follow the steps below:
Step 1: Go to the link- https://www.rstudio.com/
Step 2: Download the RStudio installer for your system.
Step 3: On the successful download of the file, run the .exe file and complete the
installation.
Step 4: Open the RStudio App and you will see that the entire window is divided into 4
panes as below.

Source window:
We add the source code here and run the whole script by clicking on the Source button.
To run selected lines, select them and press Ctrl + Enter or click the Run button. To run a
single line, place the cursor on it and press Ctrl + Enter.
R Console:
R displays error logs, warnings, and executed statements with their outputs in this pane.
Environment and History:
This pane consists of 3 tabs. The Environment tab displays all variables defined and
used in the R session. The History tab displays the statements executed in the R source and
Console. The Connections tab displays database and external connection-related
information.
Files & Package Viewer:

This pane consists of 5 tabs. The Files tab displays the files in the current working
directory. The Plots tab displays graphs and charts created using R packages.
The Packages tab lists the installed packages. It also contains 2 buttons (Install and
Update). The Help tab displays the documentation of any package or function in R.
The Viewer tab displays web applications and maps that are created using R.
Note: In case any of the 4 panes are closed or hidden, go to View -> Panes -> Show All
Panes to view all panes.

R Comments
Comments can be used to explain R code and to make it more readable. They can also
be used to prevent execution when testing alternative code.
Comments start with a #. When executing the R code, R will ignore everything from the
# to the end of the line.
This example uses a comment before a line of code:
Example:
# This is a comment
"Hello World!"
This example uses a comment at the end of a line of code:
Example:
"Hello World!" # This is a comment
Comments do not have to be text that explains the code; they can also be used to prevent
R from executing a line of code.
Multiline Comments:
Unlike some other programming languages, R has no syntax for multiline
comments. However, we can just insert a # at the start of each line to create multiline comments:
Example:
# This is a comment
# written in
# more than just one line
"Hello World!"
R package & Libraries
R packages are groups of functions bundled together. These functions are pre-compiled
and become available in R scripts once the package is loaded. We can find the list of
installed packages in the Packages tab in the bottom right window.

• To install a package, use the following syntax in R Source or R Console.

install.packages("package-name")

Example:

install.packages(c("vioplot", "MASS"))

By default, RStudio installs packages from the CRAN repository. We can use a package’s
functions after loading the package into memory.

• To load the package, use the following syntax.

library(package-name)

Update, Remove and Check Installed Packages in R:

• To check what packages are installed on your computer, type this command:

installed.packages()

• To update all the packages, type this command:

update.packages()

• To update a specific package, reinstall it with this command:

install.packages("PACKAGE NAME")

• To remove a package, type this command:

remove.packages("PACKAGE NAME")
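The commands above change the installed packages, so the following sketch only inspects them. It assumes nothing beyond base R: requireNamespace() tests for a package without stopping the script, and .libPaths() lists the folders R searches. MASS is used as the example because it appears earlier in this section.

```r
# Names of all packages installed in the current library folders
installed <- rownames(installed.packages())

# "stats" ships with every R installation, so this is TRUE
has_stats <- "stats" %in% installed
print(has_stats)

# requireNamespace() returns TRUE or FALSE instead of raising an error,
# which makes it a safe test for an optional package
if (requireNamespace("MASS", quietly = TRUE)) {
  library(MASS)
  message("MASS is installed and loaded")
} else {
  message("MASS is not installed; run install.packages(\"MASS\") first")
}

# Folders in which R looks for installed packages
print(.libPaths())
```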

Difference Between a Package and a Library:

The terms package and library are often confused, and people frequently call
packages libraries.

• Library: It is the place where packages are stored, usually a folder on our
computer. library() is the command used to load a package from that folder.

• Package: It is a collection of functions bundled conveniently. A package is an
appropriate way to organize our own work and share it with others.

List of R packages:

1) tidyr

The word tidyr comes from the word tidy, which means clear. So the tidyr package is
used to make the data 'tidy'. This package works well with dplyr. This package is an
evolution of the reshape2 package.

2) ggplot2

R allows us to create graphics declaratively. R provides the ggplot2 package for this
purpose. This package is famous for its elegant, high-quality graphs, which set it apart
from other visualization packages.

3) ggraph

R provides an extension of ggplot2 known as ggraph. It removes ggplot2’s
dependency on tabular data, so graphs and networks can be plotted as well.

4) dplyr

R allows us to perform data wrangling and data analysis. R provides the dplyr package
for this purpose. This package provides several functions for working with data frames in R.

5) tidyquant

The tidyquant package is used for carrying out quantitative financial analysis. It adds
to the tidyverse a financial package for importing, analyzing and visualizing data.

6) dygraphs

The dygraphs package provides an interface to the dygraphs JavaScript charting library.
This package is mainly used for plotting time-series data in R.

7) leaflet

For creating interactive maps, R provides the leaflet package. This package is an
interface to Leaflet, an open-source JavaScript library. Popular websites such as the
New York Times, GitHub and Flickr use Leaflet. The leaflet package makes it easy to build
such maps from R.

8) ggmap

For spatial visualization, the ggmap package is used. It is a mapping
package which consists of various tools for geolocating and routing.

9) glue

R provides the glue package for building strings during data wrangling. This
package evaluates R expressions that are embedded within a string.

10) shiny

R allows us to develop interactive and aesthetically pleasing web apps by providing
the shiny package. This package can be extended with HTML widgets, CSS, and
JavaScript.

11) plotly

The plotly package provides online interactive and quality graphs. This package
builds on the JavaScript library plotly.js.

12) tidytext

The tidytext package provides various functions of text mining for word processing
and carrying out analysis through ggplot2, dplyr, and other miscellaneous tools.

13) stringr

The stringr package provides simple, consistent wrappers around the
'stringi' package. The stringi package facilitates common string operations.

14) reshape2

This package facilitates flexible reorganization and aggregation of data using the melt()
and dcast() functions.

15) dichromat

The R dichromat package is used to remove Red-Green or Blue-Green contrasts from
colors, which simulates how a palette appears to color-blind viewers.

16) digest

The digest package is used for the creation of cryptographic hash digests of R
objects.

17) MASS

The MASS package provides a large number of statistical functions. It provides the
functions and datasets that accompany the book "Modern Applied Statistics with S."

18) caret

R allows us to perform classification and regression tasks by providing the caret
package. The companion package caretEnsemble is used for combining different
models.

19) e1071

The e1071 package provides useful functions which are essential for data analysis like
Naive Bayes, Fourier Transforms, SVMs, Clustering, and other miscellaneous functions.

20) sentimentr

The sentimentr package provides functions for carrying out sentiment analysis. It is used
to calculate text polarity at the sentence level and to perform aggregation by rows or
grouping variables.

Variables
Variables are containers for storing data values. A variable is the name of the memory
location where data is stored. In other words, we can access data in memory using
variables. A variable in R can store numeric values, complex values, words, matrices and
even a table.
Creating Variables in R :
In R, we can assign variables using any of the following syntaxes.
• Name = "ABC"
• Age <- 30
• "VSU" -> Name

Fig: Creation of variables

The figure shows how variables are created and how they are stored in
different memory blocks. In R, we don’t have to declare a variable before we use it, unlike
in other programming languages like Java, C, C++, etc.
Variables can be categorized into Continuous and Categorical. If a variable can take on
any value between its minimum value and its maximum value, it is called a Continuous
variable. Categorical variables (sometimes called nominal variables) are those that have
a fixed number of values or choices such as “Yes”, “No”, etc.
Variable Names :
A variable can have a short name (like x and y) or a more descriptive name (age,
car_name, total_volume).
Rules for R variables are:
• A variable name must start with a letter or a period (.) and can be a combination of
letters, digits, period (.) and underscore (_).
• If it starts with a period (.), the period cannot be followed by a digit.
• A variable name cannot start with a number or underscore (_).
• Variable names are case-sensitive (age, Age and AGE are three different variables).
• Reserved words cannot be used as variables (TRUE, FALSE, NULL, if, etc.).
# Legal variable names:
myvar <- "ABC"
my_var <- "Raja"
myVar <- "ABC"
MYVAR <- "Raja"
myvar2 <- "VSU"
.myvar <- "Raja"

# Illegal variable names:
2myvar <- "ABC"    # starts with a digit
my-var <- "Raja"   # contains a hyphen
my var <- "ABC"    # contains a space
_my_var <- "Raja"  # starts with an underscore
my_v@ar <- "VSU"   # contains the special character @

Note: R variables are case-sensitive.
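The naming rules can be checked from code as well. The sketch below relies on the base function make.names(), which rewrites a string into a syntactically valid name; a string that comes back unchanged already follows the rules. The helper is_valid_name() is a hypothetical name introduced only for this illustration.

```r
# A name is valid when make.names() has nothing to repair in it
is_valid_name <- function(name) {
  name == make.names(name)
}

candidates <- c("myvar", "my_var", ".myvar", "2myvar", "my-var", "_my_var", "TRUE")
validity <- sapply(candidates, is_valid_name)
print(validity)

# Repaired versions of some invalid names
print(make.names(c("2myvar", "my-var", "my var")))
```

Reserved words such as TRUE fail the check because make.names() appends a period to them.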

Example:
name <- "Raja"
age <- 20

name # output "Raja"


age # output 20
From the example above, name and age are variables, while "Raja" and 20 are values.
In other programming languages, it is common to use = as an assignment operator. In
R, we can use both = and <- as assignment operators.
However, <- is preferred in most cases because = does not assign in every
context: inside a function call, = names an argument instead of creating a variable.
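A short sketch of one such context, using the base function median(), whose first argument is called x. The variable names are illustrative assumptions.

```r
# Inside a function call, = names an argument; no variable is created by it
result_named <- median(x = c(1, 2, 3))

# <- inside a call assigns first and then passes the value on
result_assigned <- median(values <- c(4, 5, 6))

print(result_named)      # 2
print(result_assigned)   # 5
print(values)            # 4 5 6, created by the <- inside the call
```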
Print / Output Variables :
Unlike in many other programming languages, you do not have to use a function
to print/output variables in R. You can just type the name of the variable:
Example:
name <- "Raja Gopal"
name # auto-print the value of the name variable
[1] "Raja Gopal"
However, R does have a print() function available if you want to use it. This might be
useful if you are familiar with other programming languages, which often use
a print() function to output variables.
Example:
name <- "Raja Gopal"
print(name) # print the value of the name variable
[1] "Raja Gopal"
Concatenate Elements :
You can also concatenate, or join, two or more elements by using
the paste() function. To combine text and a variable, pass both to paste(), separated
by a comma (,).
Example:
text <- "awesome"
paste("R is", text)
You can also use a comma to join one variable to another:
Example:
text1 <- "R is"
text2 <- "awesome"
paste(text1, text2)
For numbers, the + character works as a mathematical operator:
Example:
num1 <- 5
num2 <- 10
num1 + num2
Note : If you try to combine a string (text) and a number, R will give you an error:
Example:
num <- 5
text <- "Some text"
num + text
Result:
Error in num + text : non-numeric argument to binary operator
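To join a number and text without this error, paste() can be used, since it converts its arguments to character strings. A minimal sketch reusing the values from the example above:

```r
num <- 5
text <- "Some text"

combined <- paste(text, num)             # the number 5 becomes "5"
no_space <- paste0("item", num)          # paste0() joins without a separator
custom   <- paste("a", "b", sep = "-")   # sep sets the separator

print(combined)   # "Some text 5"
print(no_space)   # "item5"
print(custom)     # "a-b"
```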

Multiple Variables :
R allows you to assign the same value to multiple variables in one line.
Example:
# Assign the same value to multiple variables in one line
var1 <- var2 <- var3 <- "Mango"
# Print variable values
var1
var2
var3

Data Types
The kind of data a value holds is known as its data type. Variables can store data of
different types, and different types can do different things. R has a variety of data types
and object classes.
In R, variables do not need to be declared with any particular type, and can even
change type after they have been set:
Example:
my_var <- 30 # my_var is of type numeric
my_var <- "Madhumika" # my_var is now of type character (string)
Basic Data Types
Basic data types in R can be divided into the following types:
• numeric - (10.5, 55, 787)
• integer - (1L, 55L, 100L, where the letter "L" declares this as an integer)
• complex - (9 + 3i, where 3i is the imaginary part)
• character (string) - ("k", "R is exciting", "FALSE", "11.5")
• logical (boolean) - (TRUE or FALSE)
We can use the class() function to check the data type of a variable:
Example:
# numeric
x <- 10.5
class(x)

# integer
x <- 1000L
class(x)

# complex
x <- 9i + 3
class(x)

# character/string
x <- "R is exciting"
class(x)

# logical/boolean
x <- TRUE
class(x)
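Alongside class(), base R offers typeof() and the is.*() test functions. The sketch below shows how they relate for the basic types; the variable names are illustrative only.

```r
x <- 10.5
y <- 1000L

class_x  <- class(x)    # "numeric"
typeof_x <- typeof(x)   # "double": how a numeric value is stored
class_y  <- class(y)    # "integer"

# An integer also counts as numeric, but 10 without the L is not an integer
int_is_numeric   <- is.numeric(y)     # TRUE
plain_is_integer <- is.integer(10)    # FALSE

# A number inside quotes is a character string, not a number
quoted_is_char <- is.character("11.5")   # TRUE

print(c(class_x, typeof_x, class_y))
print(c(int_is_numeric, plain_is_integer, quoted_is_char))
```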
Numbers :
There are three number types in R:
• numeric
• integer
• complex

Variables of number types are created when you assign a value to them:
Example:
x <- 10.5 # numeric
y <- 10L # integer
z <- 1i # complex
Numeric :
A numeric data type is the most common type in R, and contains any number with or
without a decimal, like: 10.5, 55, 787:
Example:
x <- 10.5
y <- 55
# Print values of x and y
x
y

# Print the class name of x and y
class(x)
class(y)
Integer :
Integers are numeric data without decimals. This is used when you are certain that
you will never create a variable that should contain decimals. To create an integer variable,
you must use the letter L after the integer value:
Example:
x <- 1000L
y <- 55L

# Print values of x and y
x
y

# Print the class name of x and y
class(x)
class(y)
Complex :
A complex number is written with an "i" as the imaginary part:
Example:
x <- 3+5i
y <- 5i
# Print values of x and y
x
y
# Print the class name of x and y
class(x)
class(y)
Type Conversion
You can convert from one type to another with the following functions:
• as.numeric()
• as.integer()
• as.complex()
Example:
x <- 1L # integer
y <- 2 # numeric

# convert from integer to numeric:
a <- as.numeric(x)
# convert from numeric to integer:
b <- as.integer(y)
# print values of a and b
a
b
# print the class name of a and b
class(a)
class(b)
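Conversions do not always preserve the value. The following sketch shows a few cases worth knowing; the input values are chosen for illustration, and as.character() is a further base conversion function not listed above.

```r
truncated  <- as.integer(2.9)      # 2: the decimal part is dropped, not rounded
from_text  <- as.numeric("11.5")   # 11.5: text that looks like a number converts
as_complex <- as.complex(2)        # 2+0i
as_text    <- as.character(30)     # "30"

# Text that is not a number becomes NA; R also issues a warning,
# which suppressWarnings() hides here
not_a_number <- suppressWarnings(as.integer("abc"))

print(truncated)
print(from_text)
print(as_complex)
print(as_text)
print(not_a_number)
```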
Booleans (Logical Values):
In programming, you often need to know if an expression is true or false. You can
evaluate any expression in R, and get one of two answers, TRUE or FALSE.
When you compare two values, the expression is evaluated and R returns the logical
answer:
Example:
10 > 9 # TRUE because 10 is greater than 9
10 == 9 # FALSE because 10 is not equal to 9
10 < 9 # FALSE because 10 is greater than 9
You can also compare two variables:
Example:
a <- 10
b <- 9
a>b
You can also run a condition in an if statement:
Example:
a <- 200
b <- 33

if (b > a) {
  print("b is greater than a")
} else {
  print("b is not greater than a")
}
Operators
Operators are symbols that direct the R interpreter to perform various kinds of
operations on operands. Operators carry out mathematical, logical, and
decision operations on input operands such as complex numbers, integers, and other
numeric values.
The main groups of operators in R are described below:

Arithmetic Operators:
These operators help us perform the basic arithmetic operations like addition,
subtraction, multiplication, etc.
Name            Operator   Description                                    Example
Addition        +          Returns the sum of the operands                a = 1; b = 2; c = a+b; c = 3
Subtraction     -          Returns the difference of the operands         a = 5; b = 2; c = a-b; c = 3
Multiplication  *          Returns the product of the operands            a = 3; b = 2; c = a*b; c = 6
Division        /          Divides the left operand by the right operand  a = 10; b = 2; c = a/b; c = 5
Modulo          %%         Remainder from dividing the first operand      a = 11; b = 3; c = a %% b; c = 2
                           by the second
Exponent        ^ (or **)  Raises the left operand to the power of        a = 3; b = 2; c = a**b; c = 9
                           the right operand
# R Arithmetic Operators Example for numbers
a <- 7.5
b <- 2
print ( a+b ) # Addition
print ( a-b ) # Subtraction
print ( a*b ) # Multiplication
print ( a/b ) # Division
print ( a%%b ) # Remainder
print ( a^b ) # Power
Output:
[1] 9.5
[1] 5.5
[1] 15
[1] 3.75
[1] 1.5
[1] 56.25
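R also has an integer-division operator, %/%, which is not listed in the table above. A small sketch, with values chosen for illustration, shows how it pairs with %% and how %% treats a negative operand:

```r
a <- 11
b <- 3

quotient  <- a %/% b                   # 3: whole number of times b fits into a
remainder <- a %% b                    # 2: what is left over
rebuilt   <- quotient * b + remainder  # 11: quotient and remainder give back a

# With a positive divisor the remainder is never negative
neg_remainder <- (-11) %% 3            # 1

# ^ and ** are the same operator
power_caret <- 3^2
power_stars <- 3**2

print(c(quotient, remainder, rebuilt, neg_remainder))
print(c(power_caret, power_stars))
```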
Relational Operators:
These operators help us perform relational operations like checking if a variable is
greater than, less than or equal to another variable. The output of a relational operation is
always a logical value.
Name                      Operator  Description                                     Example
Equal to                  ==        TRUE if both operands are equal                 a = 1; b = 2; a==b; FALSE
Not equal to              !=        TRUE if the operands are not equal              a = 5; b = 2; a!=b; TRUE
Greater than / Less than  > and <   TRUE if the left operand is greater than (>)    a = 3; b = 2; a>b; TRUE
                                    or less than (<) the right operand
Greater than or equal to  >=        TRUE if the left operand is greater than or     a = 3; b = 2; a>=b; TRUE
                                    equal to the right operand
Less than or equal to     <=        TRUE if the left operand is less than or        a = 3; b = 2; a<=b; FALSE
                                    equal to the right operand

# R Relational Operators Example for Numbers


a <- 7.5
b <- 2
print ( a<b ) # less than
print ( a>b ) # greater than
print ( a==b ) # equal to
print ( a<=b ) # less than or equal to
print ( a>=b ) # greater than or equal to
print ( a!=b ) # not equal to
Output:
[1] FALSE
[1] TRUE
[1] FALSE
[1] FALSE
[1] TRUE
[1] TRUE
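Relational operators also work on whole vectors, and == can surprise with decimal numbers. The sketch below uses made-up marks and the base function all.equal(), which compares numbers within a small tolerance.

```r
# Each element is compared in turn, giving one logical value per element
marks  <- c(35, 72, 48, 90)
passed <- marks >= 40        # FALSE TRUE TRUE TRUE
n_passed <- sum(passed)      # 3, because TRUE counts as 1

# Decimal fractions are stored approximately, so == can fail unexpectedly
exact_match  <- (0.1 + 0.2) == 0.3                   # FALSE
close_enough <- isTRUE(all.equal(0.1 + 0.2, 0.3))    # TRUE

print(passed)
print(n_passed)
print(c(exact_match, close_enough))
```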
Assignment Operators:
These operators are used to assign values to variables in R. The assignment can be
performed by using either the assignment operator (<-) or equals operator (=). The value of
the variable can be assigned in two ways, left assignment and right assignment.

The assignment operators are =, <-, ->.


Examples:
10 -> b
a=5
c <- a+b
Logical Operators:
These operators combine or negate Boolean (logical) values; they implement
‘and’, ‘or’ and ‘not’.
Name         Operator  Description                                    Example
Logical AND  &         Returns TRUE if both elements are TRUE         a = TRUE; b = FALSE; a & b; FALSE
Logical OR   |         Returns TRUE if at least one element is TRUE   a = TRUE; b = FALSE; a | b; TRUE
Logical NOT  !         Returns the opposite (negation) of an element  a = TRUE; !a; FALSE
# R Logical Operators Example for basic logical elements
a <- 0 # zero is treated as logical FALSE
b <- 2 # any non-zero number is treated as logical TRUE
print ( a & b ) # logical AND, element-wise
print ( a | b ) # logical OR, element-wise
print ( !a ) # logical NOT, element-wise
print ( a && b ) # logical AND on single values (short-circuit)
print ( a || b ) # logical OR on single values (short-circuit)
Output:
[1] FALSE
[1] TRUE
[1] TRUE
[1] FALSE
[1] TRUE
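The difference between the single and double forms shows up with vectors and with conditions. A minimal sketch with illustrative values:

```r
p <- c(TRUE, FALSE, TRUE)
q <- c(TRUE, TRUE, FALSE)

# & and | and ! work element by element and return a vector
elementwise_and <- p & q    # TRUE FALSE FALSE
elementwise_or  <- p | q    # TRUE TRUE TRUE
negated         <- !p       # FALSE TRUE FALSE

# && and || take single values, which suits conditions in if statements
x <- 5
in_range <- x > 0 && x < 10   # TRUE

# && stops as soon as the answer is known, so stop() is never called here
safe <- FALSE && stop("this is never reached")

print(elementwise_and)
print(elementwise_or)
print(negated)
print(c(in_range, safe))
```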

Miscellaneous Operators:
These operators do not fall into any of the categories mentioned above. These
operators are used for specific purposes, not for logical computation.

Operator  Description                                                    Usage
:         Creates a series of numbers from left operand to right operand a:b
%in%      Identifies if an element (a) belongs to a vector (b)           a %in% b
%*%       Performs matrix multiplication, for example of a matrix        A %*% t(A)
          with its transpose

# R Misc Operators Example
a = 23:31
print ( a )
a = c(25, 27, 76)
b = 27
print ( b %in% a )
M = matrix(c(1,2,3,4), 2, 2, byrow = TRUE)
print ( M %*% t(M) )
Output:
[1] 23 24 25 26 27 28 29 30 31
[1] TRUE
     [,1] [,2]
[1,]    5   11
[2,]   11   25
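It is easy to confuse %*% with the ordinary * operator. The sketch below reuses the matrix M from the example above to show the difference, and also applies %in% and : in slightly different ways.

```r
M <- matrix(c(1, 2, 3, 4), nrow = 2, ncol = 2, byrow = TRUE)

elementwise    <- M * M      # multiplies matching cells: 1 4 / 9 16
matrix_product <- M %*% M    # row-by-column product:     7 10 / 15 22

# %in% accepts a vector on the left and answers for each element
found <- c(25, 30, 27) %in% c(25, 27, 76)   # TRUE FALSE TRUE

# : counts downwards when the left operand is larger
countdown <- 5:1

print(elementwise)
print(matrix_product)
print(found)
print(countdown)
```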

Input & Output Functions


Print:
Unlike many other programming languages, R can display a value at the console without
a print function:
Example:
"Hello World!"
However, R does have a print() function if you want to use it. This may feel familiar if
you know other programming languages, which often use a print() function to produce
output.
Example:
print("Hello World!")
There are also times when you must use the print() function to produce output, for
example inside for loops, where values are not printed automatically.
for (x in 1:10) {
print(x)
}
Note: Text (a string) must be enclosed in single or double quotes: "Hello World!"
Reading & Writing Data
Reading data: To read values typed in by the user, you can use scan(). An empty line ends
the input:
> z <- scan()
1: 12 5
3: 2
4:
Read 3 items
> z
[1] 12 5 2
You can use readline() to read one line from the keyboard as a string:
> w <- readline()
Welcome to all
> w
[1] "Welcome to all"
Printing to the Screen:
In interactive mode, you can print the value of a variable or expression by simply typing
it. In batch mode (for example, inside a script or a function), use the print() function, e.g.
print(x)
The argument may be any R object. It is often a little better to use cat() instead of
print(), because print() can print only one expression and its result is numbered, which
may be a nuisance. Here is an example:
> print("xyz")
[1] "xyz"
> cat("xyz \n")
xyz
The arguments to cat() will be printed out with intervening spaces, for instance
> g <- 62
> cat(g,"xyz","gs\n")
62 xyz gs


Reading a Matrix or Data Frame from a File:


The function read.table() is the usual choice. In versions of R before 4.0.0, character
strings were converted to R factors by default. To turn this "feature" off, include the
argument as.is=T in your call to read.table(); from R 4.0.0 onward, strings are kept as
character data by default.
When you have a spreadsheet export file, i.e. a .csv file in which the fields are
separated by commas instead of spaces, use read.csv() in place of read.table(). The
add-on package gdata also provides read.xls() for reading Excel spreadsheet files directly.
For example, suppose the file x contains the following matrix:
1 0 1
1 1 1
1 1 0
1 1 0
0 0 1
You can read it into a matrix like this:
> x <- matrix(scan("x"),nrow=5,byrow=T)
Reading a File One Line at a Time:
You can use readLines() for this. You first need to create a connection by calling
file(). Here is a simple example:
> c <- file("z","r")
> readLines(c,n=1)
[1] "1 3"
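The connection-based pattern can be tried without an existing file. The sketch below first writes a small temporary file; the file contents and the variable names path, con, first and second are assumptions made for illustration.

```r
# Create a small temporary text file to read (its name is chosen by tempfile())
path <- tempfile(fileext = ".txt")
writeLines(c("1 3", "1 4", "2 6"), path)

con <- file(path, "r")            # open a read connection
first  <- readLines(con, n = 1)   # reads the first line
second <- readLines(con, n = 1)   # the next call continues with the second line
close(con)                        # close the connection when done

print(first)
print(second)
unlink(path)                      # remove the temporary file
```

Because the connection stays open between calls, each readLines(con, n = 1) picks up where the previous one stopped.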
Writing a Table to a File:
R provides the function write.table(), which works very much like read.table() except
that it writes a data frame instead of reading one. When writing a matrix to a file, you
usually do not want row or column names, so turn them off like this:

> write.table(xc,"xcnew",row.names=F,col.names=F)
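Writing and reading can be checked together with a round trip. The following is a small sketch that uses write.csv() and read.csv() on a temporary file; the data frame scores and its column names are made up for the example.

```r
# A small data frame to save (illustrative data)
scores <- data.frame(id = 1:3, score = c(90.5, 72, 88))

path <- tempfile(fileext = ".csv")          # temporary file name
write.csv(scores, path, row.names = FALSE)  # write without row names

back <- read.csv(path)                      # read the file back in
print(back)

unlink(path)                                # remove the temporary file
```

Passing row.names = FALSE keeps the row numbers out of the file, so the data frame read back has the same two columns as the original.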


UNIT-II
Data Structures
Data structures are used to store and organize information. In R, we do not need to
declare a variable as some data type. Variables are assigned R objects, and the data type
of the R object becomes the data type of the variable. The six main data structures in R are:

Vector:
A Vector is a sequence of data elements of the same basic type.
Example:
vtr = c(1, 3, 5, 7, 9) or
vtr <- c(1, 3, 5, 7, 9)
R has six atomic vector types (also called classes of vectors): logical, integer,
double (numeric), complex, character and raw.

List:
Lists are R objects that can contain elements of different types, such as numbers,
strings, vectors and even other lists.
> n = c(2, 3, 5)
> s = c("aa", "bb", "cc", "dd", "ee")
> x = list(n, s, TRUE)
> x
Output:
[[1]]
[1] 2 3 5
[[2]]
[1] "aa" "bb" "cc" "dd" "ee"
[[3]]
[1] TRUE
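Once a list exists, its elements are usually pulled out again. The sketch below shows the difference between [[ ]] and [ ], and access by name with $; the names numbers, letters_vec and flag are chosen only for illustration.

```r
n <- c(2, 3, 5)
s <- c("aa", "bb", "cc", "dd", "ee")
x <- list(n, s, TRUE)

second <- x[[2]]   # [[ ]] extracts the element itself (a character vector)
sub    <- x[2]     # [ ] returns a list that contains the element
print(second)
print(class(sub))

# Elements can also be named and then accessed with $
named <- list(numbers = n, letters_vec = s, flag = TRUE)
print(named$numbers)
```

Using [[ ]] or $ is the usual choice when the element itself is needed for further computation.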


Arrays:
Arrays are R data objects that can store data in more than two dimensions. The array()
function takes vectors as input and uses the values in the dim parameter to create an
array. If the input has fewer values than the array needs, the values are reused from the start.
vector1 <- c(5,9,3)
vector2 <- c(10,11,12,13,14,15)
result <- array(c(vector1,vector2),dim = c(3,3,2))
print(result)
Output:
, , 1
     [,1] [,2] [,3]
[1,]    5   10   13
[2,]    9   11   14
[3,]    3   12   15
, , 2
     [,1] [,2] [,3]
[1,]    5   10   13
[2,]    9   11   14
[3,]    3   12   15

Matrices:
Matrices are R objects in which the elements are arranged in a two-dimensional
rectangular layout. A matrix is created using the matrix() function.
Syntax:
matrix(data, nrow, ncol, byrow, dimnames) where,
data is the input vector which becomes the data elements of the matrix.
nrow is the number of rows to be created.
ncol is the number of columns to be created.
byrow is a logical flag. If TRUE, the input vector elements are arranged by row; by default they are arranged by column.
dimnames is a list of the names assigned to the rows and columns.
Example:
> Mat <- matrix(c(1:16), nrow = 4, ncol = 4 )
> Mat
Output :
[,1] [,2] [,3] [,4]
[1,] 1 5 9 13
[2,] 2 6 10 14
[3,] 3 7 11 15
[4,] 4 8 12 16
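The example above uses the default column-wise fill. As a small sketch of the remaining two arguments, the code below fills a matrix by row and names its rows and columns; the names r1, r2, a, b and c are arbitrary.

```r
# Fill row by row and attach row and column names
m <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE,
            dimnames = list(c("r1", "r2"), c("a", "b", "c")))
print(m)

# Named dimensions allow access by name as well as by position
by_name <- m["r2", "b"]
by_pos  <- m[2, 2]
print(by_name)
print(by_pos)
```

With byrow = TRUE the first row is 1 2 3 and the second row is 4 5 6, so both lookups return 5.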

Factors:
Factors are data objects that are used to categorize data and store it as
levels. They can be built from both strings and integers. They are useful in data analysis
and statistical modeling.
> data <- c("East","West","East","North","North","East","West","West","East")
> factor_data <- factor(data)
> factor_data
Output :
[1] East West East North North East West West East
Levels: East North West
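A factor stores each value as an integer code that points into its set of levels. The following sketch inspects the levels, counts the values per level, and shows the underlying codes for the same data as above.

```r
data <- c("East", "West", "East", "North", "North", "East", "West", "West", "East")
factor_data <- factor(data)

print(levels(factor_data))    # the distinct categories, sorted alphabetically
print(nlevels(factor_data))   # how many categories there are

counts <- table(factor_data)  # number of observations per level
print(counts)

codes <- as.integer(factor_data)  # the integer codes stored internally
print(codes)
```

The table() call is a common first step when summarizing categorical data.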


Data Frames:
A data frame is a table or a two-dimensional array-like structure in which each column
contains values of one variable and each row contains one set of values from each column.
> std_id = c(1:5)
> std_name = c("Raja","Maneesh","ABC","Jessica","Sumaya")
> marks = c(623.3,515.2,611.0,729.0,843.25)
> std.data <- data.frame(std_id, std_name, marks)
> std.data
Output :
  std_id std_name  marks
1      1     Raja 623.30
2      2  Maneesh 515.20
3      3      ABC 611.00
4      4  Jessica 729.00
5      5   Sumaya 843.25
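Typical next steps with a data frame are selecting a column and filtering rows. The sketch below rebuilds the same student data and shows both; the threshold of 700 marks is an arbitrary value chosen for the example.

```r
std.data <- data.frame(
  std_id   = 1:5,
  std_name = c("Raja", "Maneesh", "ABC", "Jessica", "Sumaya"),
  marks    = c(623.3, 515.2, 611.0, 729.0, 843.25)
)

print(std.data$marks)                     # one column, as a vector

high <- std.data[std.data$marks > 700, ]  # rows where marks exceed 700
print(high)

print(nrow(std.data))                     # number of rows
print(ncol(std.data))                     # number of columns
str(std.data)                             # compact summary of the structure
```

The comma in std.data[condition, ] matters: the condition selects rows, and the empty position after the comma keeps all columns.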
String Literals
Character values, or strings, are used for storing text. A string is surrounded by either
single quotation marks or double quotation marks, so
"hello" is the same as 'hello':
Example:
"hello"
'hello'
Assign a String to a Variable:
Assigning a string to a variable is done with the variable followed by the <- operator
and the string:
Example:
str <- "Hello"
str # print the value of str
Multiline Strings:
You can assign a multiline string to a variable like this:
Example:
str <- "R is wonderful language,
R used for graphics,
R used for Statistical Representations."

str # print the value of str


However, note that when the variable is auto-printed, R shows "\n" at each line break.
This is an escape sequence: the backslash is the escape character and n indicates a new line.
If you want the line breaks to be displayed at the same position as in the code, use
the cat() function:
Example:
str <- "R is wonderful language,
R used for graphics,
R used for Statistical Representations."
cat(str)
String Length:
There are many useful string functions in R.
For example, to find the number of characters in a string, use the nchar() function:


Example:
str <- "Hello World!"

nchar(str)
Check a String:
Use the grepl() function to check whether a character or a sequence of characters is
present in a string. It returns TRUE or FALSE:
Example:
str <- "Hello World!"

grepl("H", str)
grepl("Hello", str)
grepl("X", str)
Combine Two Strings:
Use the paste() function to merge/concatenate two strings. By default, paste() inserts a
space between them:
Example:
str1 <- "Hello"
str2 <- "World"
> paste(str1, str2)
[1] "Hello World"
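The separator used by paste() can be changed. The sketch below shows the sep and collapse arguments and the paste0() shortcut; the variable names on the left-hand side are illustrative.

```r
str1 <- "Hello"
str2 <- "World"

joined_default <- paste(str1, str2)             # separated by a space
joined_dash    <- paste(str1, str2, sep = "-")  # custom separator
joined_none    <- paste0(str1, str2)            # no separator at all

# collapse joins the elements of a single vector into one string
collapsed <- paste(c("a", "b", "c"), collapse = "+")

print(joined_default)
print(joined_dash)
print(joined_none)
print(collapsed)
```

Use sep when joining several arguments and collapse when joining the elements of one vector.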
Escape Characters:
• To insert characters that are illegal in a string, you must use an escape character.
• An escape character is a backslash \ followed by the character you want to insert.
An example of an illegal character is a double quote inside a string that is surrounded
by double quotes:
Example:
str <- "We are the Data Science "Professionals", from VSU."

> str
Result:
Error: unexpected symbol in "str <- "We are the Data Science "Professionals"
To fix this problem, use the escape character \":
Example:
str <- "We are the Data Science \"Professionals\", from VSU."
> str
> cat(str)
Note that auto-printing the str variable will print the backslash in the output. You can use
the cat() function to print it without backslash.
Other escape characters in R:
\\ Backslash
\n Newline
\r Carriage Return
\t Tab
\b Back Space


Control Structures
Flow control statements play a very important role, as they allow you to control the
flow of execution of a script or of the code inside a function. The most commonly used
flow control statements are described below.

Conditional statements:
The R language supports four conditional statements:
• if
• if-else
• if-else-if
• switch
If Statement:
The flow of the if statement is as follows: if the condition is true, the code in the if
body is executed; otherwise it is skipped. In both cases, execution then continues with
the statements that come after the if body.
Syntax:
if (condition) {
statements
}
Example:
a <- 20
b <- 30
if (b > a) {
print ("b is greater than a")
}
Output:
[1] "b is greater than a"

If Else Statement:
The flow of the if-else statement is as follows: if the condition is true, the if code is
executed; otherwise the else code is executed. Execution then continues with the
statements that come after the if-else body.
Syntax:
if(condition) {
statements
} else {
statements
}
Example:
a <- 20
b <- 13


if (b > a) {
print ("b is greater than a")
} else {
print("b is not greater than a")
}
Output:
[1] "b is not greater than a"
If Else If Statement:
The flow of the if-else-if statement is as follows: if the first condition is true, the if
code is executed; otherwise the second condition is checked. If that condition is true,
the else-if code is executed; otherwise the else code is executed. Execution then
continues with the statements that come after the if-else-if body.
Syntax:
if(condition) {
statements
} else if (condition){
statements
}else {
statements
}
Example:
a <- 20
b <- 13
if (b == a) {
print("b is equal to a")
} else if (b > a) {
print("b is greater than a")
} else {
print("b is not greater than a")
}
Output:
[1] "b is not greater than a"
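An if-else-if chain is often wrapped in a function so that it can be reused. The following is a minimal sketch: the function grade_for() and its cut-off values of 75 and 50 are hypothetical and are not part of the notes.

```r
# Hypothetical helper: map a numeric mark (0-100) to a grade label
grade_for <- function(mark) {
  if (mark >= 75) {
    "Distinction"
  } else if (mark >= 50) {
    "Pass"
  } else {
    "Fail"
  }
}

print(grade_for(82))
print(grade_for(61))
print(grade_for(35))

# if() tests a single value; ifelse() is its vectorized counterpart
marks <- c(82, 61, 35)
results <- ifelse(marks >= 50, "Pass", "Fail")
print(results)
```

Note that each else is written on the same line as the closing brace of the previous branch, which keeps the statement valid when it is typed at the top level of a script.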

Switch statement:
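A switch is another conditional statement used in R. It selects one of several
alternatives based on the value of an expression: a number selects the alternative at
that position, and a character string selects the alternative with that name. If
statements are generally preferred over switch statements. The basic syntax of the
switch statement is:
Syntax:
switch(expression, case1, case2, ...)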
Example:
y <- 3
x <- switch(
y,
"Good Morning",
"Good Afternoon",
"Good Evening",
"Good Night"
)
print(x)
Output:


[1] "Good Evening"
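switch() can also select by name when the expression is a character string, and an unnamed final argument then serves as the default. The sketch below assumes a hypothetical helper called describe_op().

```r
# Hypothetical helper: describe an arithmetic operator given as a string
describe_op <- function(op) {
  switch(op,
    "+" = "addition",
    "-" = "subtraction",
    "*" = "multiplication",
    "unknown operator"   # unnamed last argument acts as the default
  )
}

print(describe_op("*"))
print(describe_op("^"))

# With a numeric expression that matches no position, switch() returns NULL
out_of_range <- switch(5, "a", "b")
print(is.null(out_of_range))
```

Checking for NULL (or supplying a default in the string form) avoids silently ignoring an unexpected value.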


Looping Statements

Looping statements save the user from writing the same code many times. These
statements execute a segment of code repeatedly until a stopping condition is met. The
R language supports three looping statements:

• for
• while
• repeat

For Loop:
The for loop is the most common looping statement used for repeating a task. A for loop
executes its statements once for each element of a vector or sequence, so the number of
iterations is known in advance. Define a for loop using the following syntax:
Syntax:
for(var in range){
statements
}
Example:
for(x in 1:10){
print(x)
}
Output:
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10
While Loop:
A while loop repeats a statement or group of statements as long as the condition is
true. It tests the condition before executing the loop body. A while loop is created using
the following syntax:
Syntax:
while(condition) {
statements
}
Example:
a=5
while(a>0){
a = a-1
print(a)
}


Output:
[1] 4
[1] 3
[1] 2
[1] 1
[1] 0
Repeat:
The repeat loop in R executes the same block of statements again and again until a
break statement is reached. A repeat loop has no condition of its own for leaving the
loop; without a break statement inside it, the loop runs forever (an infinite loop).
Create a repeat loop using the following syntax:
Syntax:
repeat
{
statements
}
Here, it executes statements repeatedly. But if we want to terminate the loop, we must use
‘break’ statement within the loop.
Example:
m=5
repeat {
m = m+2
print(m)
if(m>15) {
break
}
}
Output:
[1] 7
[1] 9
[1] 11
[1] 13
[1] 15
[1] 17
Control statements
R – Language supports the following control statements,

Break:
A break statement is used to stop a loop. When the break statement is encountered
inside a loop, the loop is immediately terminated and program control resumes at the
next statement following the loop. In R, break is valid only inside for, while and repeat
loops; unlike in C-like languages, switch() does not need break statements. The syntax
of the break statement is:


Syntax:
break
Example:
m=5
repeat {
m = m+2
print(m)
if(m>15) {
break
}
}
Output:
[1] 7
[1] 9
[1] 11
[1] 13
[1] 15
[1] 17
Next:
The next statement is used to skip the rest of the current iteration of a loop without
terminating the loop. The syntax of the next statement is:
Syntax:
next
Example:
for(i in c(1:6)) {
if(i == 3) {
next
}
print(i)
}
Output:
[1] 1
[1] 2
[1] 4
[1] 5
[1] 6

Functions
A function is a set of statements that performs a specific task. R has built-in functions
and also allows users to create their own functions. A function performs a task and
either returns a result, which can be stored in a variable, or prints output to the
console. There are two main types of functions in R:


Pre-defined / Built-in Functions:


These are built-in functions that can be used to make the user's work easier.
Eg: mean(x), sum(x), sqrt(x), toupper(x), median(x), etc.
Built-in Math Functions:
R also has many built-in math functions that allow you to perform mathematical
tasks on numbers. For example, the min() and max() functions can be used to find the
lowest or highest number in a set:
Example:
max(5, 10, 15)

min(5, 10, 15)


sqrt():
The sqrt() function returns the square root of a number:
Example:
sqrt(16)
abs():
The abs() function returns the absolute (positive) value of a number:
Example:
abs(-4.7)
ceiling() and floor():
The ceiling() function rounds a number upwards to its nearest integer, and
the floor() function rounds a number downwards to its nearest integer, and returns the
result:
Example:
ceiling(1.4)
floor(1.4)
User-Defined Functions:
User-Defined functions are defined as per the requirements. Define a function using
the following syntax:
Function definition:
function_name <- function(arg_1, arg_2, ...) {
Function body
}
Store the function definition in a variable, and call the function by writing the
variable name followed by the arguments (if any) inside parentheses ( ).
Example:
factorial <- function(n) {
if (n <= 1) {
return(1)
} else {
return(n * factorial(n-1))
}
}
factorial(3)
Output:
[1] 6
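User-defined functions can also give their arguments default values and can be called with named arguments. The following is a small sketch; the function rect_area() is hypothetical and serves only as an illustration.

```r
# Hypothetical helper: area of a rectangle.
# width defaults to length, so a single argument gives the area of a square.
rect_area <- function(length, width = length) {
  length * width   # the value of the last expression is returned
}

square  <- rect_area(4)                      # uses the default width
rect    <- rect_area(4, 2.5)                 # arguments matched by position
by_name <- rect_area(width = 3, length = 2)  # arguments matched by name

print(square)
print(rect)
print(by_name)
```

An explicit return() is optional in R: a function returns the value of the last expression it evaluates.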


Dates and Times Classes


R has a special representation for dates and times. Dates are represented
by the Date class and times are represented by the POSIXct or the POSIXlt class. Dates are
stored internally as the number of days since 1970-01-01, while POSIXct times are stored
internally as the number of seconds since 1970-01-01.
Base R therefore comes with three date and time classes: POSIXct, POSIXlt and Date.
POSIX Dates and Times Classes:
POSIX dates and times are classic R: brilliantly thorough in their implementation,
navigating all sorts of obscure technical issues, but with awful Unixy names that make
everything seem more complicated than it really is.
The function Sys.time() is used to return the current date and time in POSIXct
notation:
(now_ct <- Sys.time ())
# [1] "2016-10-28 20:48:02 BST"
Here, ct is the short form for calendar time.
When the date is printed, you just see a formatted version of it, so it is not
obvious how the date is stored. By using unclass(), you can see that it is indeed
just a number:
unclass (now_ct)
# [1] 1.478e+09
The Date Class:
The third date class in base R is the Date class. It stores dates as the number of days
since the start of 1970. The Date class is best used when you do not care about the time
of day. Fractional days are possible (they can be generated by computing a mean Date, for
example), but the POSIX classes are better for those situations:
(now_date <- as.Date(now_ct))
## [1] "2016-10-28"
class(now_date)
## [1] "Date"
unclass(now_date)
## [1] 17102
Add-on packages provide further date and time classes, including date, dates,
chron, yearmon, yearqtr, timeDate, ti, and jul.
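Because a Date is stored as a day count, date arithmetic is ordinary arithmetic. The sketch below uses two fixed dates chosen for illustration, so the results do not depend on when the code is run.

```r
d1 <- as.Date("2022-04-28")                        # ISO format needs no format string
d2 <- as.Date("28/10/2016", format = "%d/%m/%Y")   # other layouts need one

gap <- as.numeric(d1 - d2)        # difference in days, as a plain number
print(gap)

later <- d1 + 30                  # adding a number adds that many days
print(later)

shown <- format(d1, "%d-%m-%Y")   # convert a Date back to text
print(shown)
```

The format codes %d, %m and %Y stand for day, month and four-digit year; the same codes are used for both parsing and printing.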
Lubridate:
Lubridate, as the name implies, adds some much-needed lubrication to the practice of
date manipulation. It does not add many new features over base R, but it makes your code
more readable and saves you from thinking too much. The real beauty is that different
elements in the same vector can have different formats (as long as the year is
followed by the month, which is followed by the day). The exact form of the printed
output may differ between lubridate versions:
library(lubridate)
## Attaching package: 'lubridate'
## The following objects are masked from 'package:chron':
## days, hours, minutes, seconds, years
karlos_rays_birth_date <- c(
"1993 - 08 - 28",
"1993/08/28",
"Saturday + 1993.08*28" )
ymd(karlos_rays_birth_date)
## [1] "1993-08-28 UTC" "1993-08-28 UTC" "1993-08-28 UTC"


UNIT-III
VECTORS
Scalars in R:
A scalar is the most basic data structure: it holds only a single atomic
value at a time. In R, a scalar is simply a vector of length one. More complex data types
can be constructed from scalars. The most commonly used scalar types in R are:
• Numeric
• Character
• Integer
• Logical
• Complex
Vectors:
• A vector is simply a sequence of items that are of the same type.
• To combine items into a vector, use the c() function and separate the items
with commas.
• In the example below, we create a vector variable called fruits that combines strings:
Example:
# Vector of strings
> fruits <- c("banana", "apple", "orange")

# Print fruits
> fruits
[1] "banana" "apple" "orange"
In this example, we create a vector that combines numerical values:
Example:
# Vector of numerical values
> numbers <- c(1, 2, 3)

# Print numbers
> numbers
[1] 1 2 3
To create a vector with numerical values in a sequence, use the : operator:
Example:
# Vector with numerical values in a sequence
> numbers <- 1:10
> numbers
[1] 1 2 3 4 5 6 7 8 9 10
You can also create numerical values with decimals in a sequence, but note that if the
last element does not belong to the sequence, it is not used:
Example:
# Vector with numerical decimals in a sequence
> numbers1 <- 1.5:6.5
> numbers1
[1] 1.5 2.5 3.5 4.5 5.5 6.5

# Vector with numerical decimals in a sequence where the last element is not used
> numbers2 <- 1.5:6.3
> numbers2
[1] 1.5 2.5 3.5 4.5 5.5
In the example below, we create a vector of logical values:


Example:
# Vector of logical values
> log_values <- c(TRUE, FALSE, TRUE, FALSE)

> log_values
[1] TRUE FALSE TRUE FALSE

Matrices and Arrays as Vectors


Arrays and matrices are actually vectors too. They merely have an extra attribute, dim,
that stores their dimensions. For example, a matrix has a number of rows and a number of
columns. Consider the following example:
> m
[,1] [,2]
[1,] 1 2
[2,] 3 4
> m + 10:13
[,1] [,2]
[1,] 11 14
[2,] 14 17
The 2-by-2 matrix m is stored as a four-element vector, column-wise, as (1,3,2,4).
We then added (10,11,12,13) to it, yielding (11,14,14,17), but R remembered that we were
working with matrices and thus gave the 2-by-2 result you see in the example.
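The column-wise storage described above can be checked directly. The following sketch rebuilds the same matrix m and inspects it as a plain vector.

```r
# Build the matrix from the example; values are filled column by column
m <- matrix(c(1, 3, 2, 4), nrow = 2)

flat <- as.vector(m)   # drop the dimensions and see the underlying vector
print(flat)
print(length(m))       # a matrix has a length, like any vector
print(dim(m))          # the extra attribute that makes it a matrix
print(m[3])            # single-index access follows the column-wise order

s <- m + 10:13         # element-wise addition on the underlying vector
print(s)
```

Here m[3] is the first element of the second column, which confirms that the elements are stored one column after another.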

Adding and Deleting Vector Elements


Modifying a Vector:
Modifying a vector means changing the value of one or more of its elements. There are
different ways in which we can modify a vector:
# R program to modify elements of a Vector
# Creating a vector
X <- c(2, 7, 9, 7, 8, 2)
# modify a specific element
X[3] <- 1
X[2] <-9
cat('subscript operator', X, '\n')
# Modify using different logics.
X[X>5] <- 0
cat('Logical indexing', X, '\n')
# Modify by specifying
# the position or elements.
X <- X[c(3, 2, 1)]
cat('combine() function', X)
Output:
subscript operator 2 9 1 7 8 2
Logical indexing 2 0 1 0 0 2
combine() function 1 0 2
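Elements can also be added to or removed from a vector. A vector cannot grow or shrink in place, so each step below builds a new vector and assigns it back to the same name. This is a minimal sketch with made-up values.

```r
x <- c(88, 5, 12, 13)

x <- c(x[1:3], 168, x[4])      # insert 168 before the last element
x <- append(x, 99, after = 2)  # insert 99 after the second element
x <- c(x, 7)                   # add 7 at the end
print(x)

x <- x[-2]                     # delete the second element
print(x)
print(length(x))
```

A negative index means "everything except this position", which is the usual way to delete individual elements.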


Deleting a Vector:
Deleting a vector means removing all of its elements. This can be done by assigning
NULL to it. Note that cat() prints nothing for a NULL argument, whereas print() shows NULL.
# R program to delete a Vector
# Creating a Vector
M <- c(8, 10, 2, 5)
# set NULL to the vector
M <- NULL
cat('Output vector', M, '\n')
print(M)

Output:
Output vector
NULL
Obtaining the Length of a Vector
To find out how many items a vector has, use the length() function:
Example:
>fruits <- c("banana", "apple", "orange")
> length(fruits)
[1] 3
Example:
> x <- c(1,2,4,6)
> length(x)
[1] 4
Sort a Vector:
To sort items in a vector alphabetically or numerically, use the sort() function:
Example:
>fruits <- c("banana", "apple", "orange", "mango", "lemon")
> sort(fruits) # Sort a string
[1] "apple" "banana" "lemon" "mango" "orange"
> numbers <- c(13, 3, 5, 7, 20, 2)
> sort(numbers) # Sort numbers
[1] 2 3 5 7 13 20
Common Vector Operations
Some common operations related to vectors are arithmetic and logical operations,
vector indexing, and some useful ways to create vectors.
Vector Arithmetic and Logical Operations:
Remember that R is a functional language. Every operator, including + in the following
example, is actually a function.
> 2+3
[1] 5
> "+"(2,3)
[1] 5
Recall further that scalars are actually one-element vectors. So, we can add vectors,
and the + operation will be applied element-wise.
> x <- c(1,2,4)
> x + c(5,0,-1)
[1] 6 2 3
If you are familiar with linear algebra, you may be surprised at what happens when
we multiply two vectors.


> x * c(5,0,-1)
[1] 5 0 -4
But remember, because of the way the * function is applied, the multiplication is done
element by element. The first element of the product (5) is the result of the first element of
x (1) being multiplied by the first element of c(5,0,-1) (5), and so on.
The same principle applies to other numeric operators. Here’s an example:
> x <- c(1,2,4)
> x / c(5,4,-1)
[1] 0.2 0.5 -4.0
> x %% c(5,4,-1)
[1] 1 2 0
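When the two operands have different lengths, R reuses (recycles) the shorter one. The sketch below illustrates this with a scalar and with a two-element vector; the values are chosen only for illustration.

```r
x <- c(1, 2, 4, 8)

doubled <- x * 2          # the scalar 2 is recycled for every element
shifted <- x + c(10, 20)  # c(10, 20) is used twice: 10, 20, 10, 20
squared <- x ^ 2          # the exponent is recycled as well
bigger  <- x > 3          # comparison operators are element-wise too

print(doubled)
print(shifted)
print(squared)
print(bigger)
```

Recycling is silent when the longer length is a multiple of the shorter one; otherwise R still computes a result but issues a warning.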

Vector Indexing
Access Vectors:
You can access the vector items by referring to its index number inside brackets [].
The first item has index 1, the second item has index 2, and so on:
Example:
>fruits <- c("banana", "apple", "orange")

# Access the first item (banana)


> fruits[1]
[1] "banana"
You can also access multiple elements by referring to different index positions with
the c() function:
Example:
>fruits <- c("banana", "apple", "orange", "mango", "lemon")

# Access the first and third item (banana and orange)


> fruits[c(1, 3)]
[1] "banana" "orange"
You can also use negative index numbers to access all items except the ones
specified:
Example:
>fruits <- c("banana", "apple", "orange", "mango", "lemon")

# Access all items except for the first item


> fruits[c(-1)]
[1] "apple" "orange" "mango" "lemon"
Change an Item:
To change the value of a specific item, refer to the index number:
Example:
>fruits <- c("banana", "apple", "orange", "mango", "lemon")

# Change "banana" to "pear"


> fruits[1] <- "pear"
# Print fruits
> fruits
[1] "pear" "apple" "orange" "mango" "lemon"
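Besides positions, a vector can be indexed with a logical condition or with names. The following sketch uses the same fruits vector; the prices vector and its values are hypothetical.

```r
fruits <- c("banana", "apple", "orange", "mango", "lemon")

# Logical indexing: keep only the elements for which the condition is TRUE
long_names <- fruits[nchar(fruits) > 5]
print(long_names)

# which() returns the positions where a condition holds
pos <- which(fruits == "mango")
print(pos)

# A range of positions
first_three <- fruits[1:3]
print(first_three)

# Named elements can be accessed by name (hypothetical prices)
prices <- c(apple = 30, mango = 55)
print(prices["mango"])
```

Logical indexing is the basis of filtering, which is used heavily when working with data sets.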


Generating Sequenced Vectors


An earlier example showed how to create a vector with numerical values
in a sequence with the : operator:
Example:
>numbers <- 1:10
> numbers
[1] 1 2 3 4 5 6 7 8 9 10
seq(): To make bigger or smaller steps in a sequence, use the seq() function:
Example:
>numbers <- seq(from = 0, to = 100, by = 20)
> numbers
[1] 0 20 40 60 80 100
> numbers<-seq(from=1.1,to=2,length=10)
> numbers
[1] 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0
Note: The most commonly used parameters of seq() are: from, where the sequence starts;
to, where the sequence stops; by, the interval of the sequence; and length.out
(abbreviated to length above), the number of values to generate.
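A few further variations on sequence generation are sketched below, including a decreasing sequence and the helper functions seq_len() and seq_along(), which are convenient in loops.

```r
down   <- seq(10, 1, by = -3)         # a decreasing sequence needs a negative step
parts  <- seq(0, 1, length.out = 5)   # five evenly spaced values from 0 to 1
counts <- seq_len(4)                  # 1, 2, ..., n
index  <- seq_along(c("a", "b", "c")) # one index per element of a vector

print(down)
print(parts)
print(counts)
print(index)
```

seq_along(v) is safer than 1:length(v) in a for loop, because it produces an empty sequence when v is empty.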
Repeating Vectors
To repeat vectors, use the rep() function:
Example:
Repeat each value:
>repeat_each <- rep(c(1,2,3), each = 3)

> repeat_each
[1] 1 1 1 2 2 2 3 3 3
Example:
Repeat the sequence of the vector:
>repeat_times <- rep(c(1,2,3), times = 3)

> repeat_times
[1] 1 2 3 1 2 3 1 2 3
Example:
Repeat each value independently:
> repeat_independent <- rep(c(1,2,3), times = c(5,2,1))
> repeat_independent
[1] 1 1 1 1 1 2 2 3
Using all() and any()
The any() and all() functions are handy shortcuts. They report whether any or all of
their arguments are TRUE.
> x <- 1:10
> any(x > 8)
[1] TRUE
> any(x > 88)
[1] FALSE
> all(x > 88)
[1] FALSE
> all(x > 0)
[1] TRUE


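To see how these functions work, suppose that R executes the following:
> any(x > 8)
It first evaluates x > 8, yielding this:
(FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,TRUE)
The any() function then reports whether any of those values is TRUE. The all() function
works similarly and reports whether all of the values are TRUE.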
Vectorized Operations
One of the most effective ways to achieve speed in R code is to use operations that
are vectorized, meaning that a function applied to a vector is actually applied individually to
each element.
Vector In, Vector Out :
Here is an example of a vectorized operation:
> u <- c(5,2,8)
> v <- c(1,3,9)
> u > v
[1] TRUE FALSE FALSE
Here, the > function was applied to u[1] and v[1], resulting in TRUE, then to u[2] and
v[2], resulting in FALSE, and so on. A key point is that if an R function uses vectorized
operations, it, too, is vectorized, thus enabling a potential speedup. Here is an example:
> w <- function(x) return(x+1)
> w(u)
[1] 6 3 9
Here, w() uses +, which is vectorized, so w() is vectorized as well. As you can see,
there is an unlimited number of vectorized functions, as complex ones are built up from
simpler ones.
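To make the idea concrete, the sketch below computes square roots twice: once with an explicit loop and once with a single vectorized call. The input values are illustrative.

```r
x <- c(4, 9, 16, 25)

# Explicit loop: handle one element at a time
roots_loop <- numeric(length(x))
for (i in seq_along(x)) {
  roots_loop[i] <- sqrt(x[i])
}

# Vectorized call: sqrt() is applied to every element at once
roots_vec <- sqrt(x)

print(roots_loop)
print(roots_vec)
print(identical(roots_loop, roots_vec))
```

Both versions give the same result, but the vectorized form is shorter and, for long vectors, usually much faster.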
Vector In, Matrix Out
The vectorized functions we’ve been working with so far have scalar return values.
Calling sqrt() on a number gives us a number. If we apply this function to an eight-element
vector, we get eight numbers, thus another eight-element vector, as output.
But what if our function itself is vector-valued, as z12() is here:
z12 <- function(z) return(c(z,z^2))
Applying z12() to 5, say, gives us the two-element vector (5,25). If we apply this
function to an eight-element vector, it produces 16 numbers:
> x <- 1:8
> z12(x)
[1] 1 2 3 4 5 6 7 8 1 4 9 16 25 36 49 64
It might be more natural to have these arranged as an 8-by-2 matrix, which we can
do with the matrix function:
> matrix(z12(x),ncol=2)
[,1] [,2]
[1,] 1 1
[2,] 2 4
[3,] 3 9
[4,] 4 16
[5,] 5 25
[6,] 6 36
[7,] 7 49
[8,] 8 64
But we can streamline things using sapply() (short for simplify apply). The call sapply(x,f)
applies the function f() to each element of x and then, when every result has the same
length, converts the results to a matrix.


> z12 <- function(z) return(c(z,z^2))


> sapply(1:8,z12)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 1 2 3 4 5 6 7 8
[2,] 1 4 9 16 25 36 49 64
We do get a 2-by-8 matrix, not an 8-by-2 one, but it’s just as useful this way.
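If the 8-by-2 layout is preferred, the result of sapply() can be transposed with t(). The following is a small sketch; the column names z and z_squared are chosen for illustration.

```r
z12 <- function(z) c(z, z^2)

wide <- sapply(1:8, z12)   # 2-by-8: one column per input value
tall <- t(wide)            # 8-by-2: one row per input value
colnames(tall) <- c("z", "z_squared")

print(dim(wide))
print(dim(tall))
print(tall)
```

Each call to z12() contributes one column to the sapply() result, which is why the untransposed matrix is 2-by-8.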
NA and NULL Values
In R statistical data sets, we often encounter missing data, which we represent in R
with the value NA. NULL, on the other hand, represents that the value in question simply
doesn’t exist, rather than being existent but unknown.
Using NA:
In R’s statistical functions, we can instruct the function to skip over any missing
values, or NAs. Here is an example:
> x <- c(88,NA,12,168,13)
>x
[1] 88 NA 12 168 13
> mean(x)
[1] NA
> mean(x,na.rm=T)
[1] 70.25
> x <- c(88,NULL,12,168,13)
> mean(x)
[1] 70.25
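In the first call, mean() refused to calculate, as one value in x was NA. But by setting
the optional argument na.rm (NA remove) to true (T), we calculated the mean of the
remaining elements. In the last call, the NULL was dropped when the vector was created, so
x had only four elements and mean() worked without any extra argument.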
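Before deciding how to handle missing values, it helps to find them. The sketch below uses is.na() to count, locate and remove the NA values in the same vector.

```r
x <- c(88, NA, 12, 168, 13)

missing <- is.na(x)        # TRUE where a value is missing
print(missing)

n_missing <- sum(missing)  # TRUE counts as 1, so this counts the NAs
where     <- which(missing)
print(n_missing)
print(where)

clean <- x[!missing]       # keep only the values that are present
print(clean)
print(mean(clean))
```

Note that a test such as x == NA does not work for this purpose, because any comparison with NA yields NA; is.na() is the correct test.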
Using NULL:
One use of NULL is to build up vectors in loops, in which each iteration adds another
element to the vector. In this simple example, we build up a vector of even numbers:
# build up a vector of the even numbers in 1:10
> z <- NULL
> for (i in 1:10) if (i %%2 == 0) z <- c(z,i)
>z
[1] 2 4 6 8 10
But the point here is to demonstrate the difference between NA and NULL. If we were
to use NA instead of NULL in the preceding example, we would pick up an unwanted NA:
> z <- NA
> for (i in 1:10) if (i %%2 == 0) z <- c(z,i)
>z
[1] NA 2 4 6 8 10
NULL values really are counted as nonexistent, as you can see here:
> u <- NULL
> length(u)
[1] 0
> v <- NA
> length(v)
[1] 1
NULL is a special R object with no mode.


UNIT-IV
MATRICES AND ARRAYS
Matrices:
A matrix is a vector with two additional attributes: the number of rows and the
number of columns. Since matrices are vectors, they also have modes, such as numeric and
character.
Creating Matrices:
• A matrix is a two-dimensional data set with columns and rows.
• A column is a vertical representation of data, while a row is a horizontal
representation of data.
• A matrix can be created with the matrix() function. Specify
the nrow and ncol parameters to set the number of rows and columns.
Example:
# Create a matrix
> a <- matrix(c(1,2,3,4,5,6), nrow = 3, ncol = 2)
# Print the matrix
> a
Note: Remember the c() function is used to concatenate items together.
You can also create a matrix with strings:
Example:
> fruits <- matrix(c("apple", "banana", "cherry", "orange"), nrow = 2, ncol = 2)
> fruits
Access Matrix Items:
You can access the items by using [ ] brackets. The first number "1" in the bracket
specifies the row-position, while the second number "2" specifies the column-position:
Example:
thismatrix <- matrix(c("apple", "banana", "cherry", "orange"), nrow = 2, ncol = 2)

thismatrix[1, 2]
The whole row can be accessed if you specify a comma after the number in the
bracket:
Example:
thismatrix <- matrix(c("apple", "banana", "cherry", "orange"), nrow = 2, ncol = 2)
thismatrix[2,]
The whole column can be accessed if you specify a comma before the number in the
bracket:
Example:
thismatrix <- matrix(c("apple", "banana", "cherry", "orange"), nrow = 2, ncol = 2)
thismatrix[,2]
Access More Than One Row:
More than one row can be accessed if you use the c() function:
Example:
thismatrix <-
matrix(c("apple", "banana", "cherry", "orange","grape", "pineapple", "pear", "melon", "fig"),
nrow = 3, ncol = 3)
> thismatrix[c(1,2),]
Access More Than One Column:
More than one column can be accessed if you use the c() function:
Example:


thismatrix <- matrix(c("apple", "banana", "cherry", "orange","grape", "pineapple", "pear",
"melon", "fig"), nrow = 3, ncol = 3)
> thismatrix[, c(1,2)]
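The access examples above do not show their results. The following sketch repeats the main access forms on the 3-by-3 matrix and stores each result so that it can be inspected; the variable names on the left-hand side are illustrative.

```r
thismatrix <- matrix(c("apple", "banana", "cherry", "orange", "grape",
                       "pineapple", "pear", "melon", "fig"),
                     nrow = 3, ncol = 3)

one_item   <- thismatrix[1, 2]        # row 1, column 2
second_row <- thismatrix[2, ]         # the whole second row
second_col <- thismatrix[, 2]         # the whole second column
two_rows   <- thismatrix[c(1, 2), ]   # rows 1 and 2, all columns

print(one_item)
print(second_row)
print(second_col)
print(dim(two_rows))
```

Because the matrix is filled column by column, the first column holds apple, banana and cherry, and the second column starts with orange.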
General Matrix Operations
Linear Algebra Operations on Matrices:
You can perform various linear algebra operations on matrices, such as matrix
multiplication, matrix scalar multiplication, and matrix addition.
> y <- matrix(c(1,2,3,4),nrow=2,ncol=2)
> y
     [,1] [,2]
[1,]    1    3
[2,]    2    4
> y %*% y # mathematical matrix multiplication
     [,1] [,2]
[1,]    7   15
[2,]   10   22
> 3*y # mathematical multiplication of matrix by scalar
     [,1] [,2]
[1,]    3    9
[2,]    6   12
> y+y # mathematical matrix addition
     [,1] [,2]
[1,]    2    6
[2,]    4    8
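A common source of confusion is the difference between the * operator and the %*% operator. The sketch below contrasts the two on the same matrix y. The t() call (transpose) is an extra base R function not covered in the notes above.

```r
y <- matrix(c(1, 2, 3, 4), nrow = 2, ncol = 2)

elementwise <- y * y      # multiplies matching cells: 1*1, 2*2, 3*3, 4*4
matrix_prod <- y %*% y    # row-by-column products, as in linear algebra
transposed  <- t(y)       # t() swaps rows and columns

print(elementwise)
print(matrix_prod)
print(transposed)
```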

Matrix Indexing:
> z
     [,1] [,2] [,3]
[1,]    1    1    1
[2,]    2    1    0
[3,]    3    0    1
[4,]    4    0    0
> z[,2:3]
     [,1] [,2]
[1,]    1    1
[2,]    1    0
[3,]    0    1
[4,]    0    0
Here, we requested the submatrix of z consisting of all elements with column numbers
2 and 3 and any row number. This extracts the second and third columns. Here’s an example
of extracting rows instead of columns:
> y
     [,1] [,2]
[1,]   11   12
[2,]   21   22
[3,]   31   32
> y[2:3,]
     [,1] [,2]
[1,]   21   22
[2,]   31   32
> y[2:3,2]
[1] 22 32
You can also assign values to submatrices:
> y
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6
> y[c(1,3),] <- matrix(c(1,1,8,12),nrow=2)
> y
     [,1] [,2]
[1,]    1    8
[2,]    2    5
[3,]    1   12
Here, we assigned new values to the first and third rows of y. And here’s another
example of assignment to submatrices:
> x <- matrix(nrow=3,ncol=3)
> y <- matrix(c(4,5,2,3),nrow=2)
> y
     [,1] [,2]
[1,]    4    2
[2,]    5    3
> x[2:3,2:3] <- y
> x
     [,1] [,2] [,3]
[1,]   NA   NA   NA
[2,]   NA    4    2
[3,]   NA    5    3
Negative subscripts, used with vectors to exclude certain elements, work the same
way with matrices:
> y
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6
> y[-2,]
     [,1] [,2]
[1,]    1    4
[2,]    3    6
In the second command, we requested all rows of y except the second.
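The indexing ideas in this part (negative subscripts and assignment to submatrices) can be tried together in a minimal sketch. The drop = FALSE argument is a base R option not discussed above; it is included here as an assumption about what a reader may want when a single column should stay a matrix.

```r
y <- matrix(1:6, nrow = 3)               # columns are 1:3 and 4:6

without_row2 <- y[-2, ]                  # drop the second row
without_col1 <- y[, -1]                  # drop the first column; result is a plain vector
kept_matrix  <- y[, -1, drop = FALSE]    # drop = FALSE keeps the 3 x 1 matrix shape

y[c(1, 3), ] <- matrix(c(1, 1, 8, 12), nrow = 2)   # overwrite rows 1 and 3
print(without_row2)
print(dim(kept_matrix))
print(y)
```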
Filtering on Matrices:
Filtering can be done with matrices, just as with vectors. You must be careful with the
syntax, though. Let’s start with a simple example:
> x
     x
[1,] 1 2
[2,] 2 3
[3,] 3 4
> x[x[,2] >= 3,]
     x
[1,] 2 3
[2,] 3 4
> j <- x[,2] >= 3
> j
[1] FALSE  TRUE  TRUE
Here, we look at the vector x[,2], which is the second column of x, and determine
which of its elements are greater than or equal to 3. The result, assigned to j, is a Boolean
vector. Now, use j in x:
> x[j,]
     x
[1,] 2 3
[2,] 3 4
• The object x[,2] is a vector.
• The operator >= compares two vectors.
• The number 3 was recycled to a vector of 3s.
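The filtering steps above can be sketched as follows. The matrix is rebuilt with cbind() so the example is self-contained, and the combined condition with & is an added illustration rather than part of the original example.

```r
x <- cbind(c(1, 2, 3), c(2, 3, 4))   # 3 x 2 matrix

keep <- x[, 2] >= 3                  # logical vector: FALSE TRUE TRUE
filtered <- x[keep, ]                # rows where column 2 is at least 3

# Conditions can be combined with & (element-wise AND).
both <- x[x[, 1] > 1 & x[, 2] > 3, , drop = FALSE]

print(keep)
print(filtered)
print(both)
```

Using drop = FALSE in the last line keeps the result a one-row matrix instead of letting R simplify it to a vector.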

Applying Functions to Matrix


One of the most famous and most used features of R is the *apply() family of
functions, such as apply(), tapply(), and lapply().
Using the apply() Function:
This is the general form of apply for matrices:
apply(m,dimcode,f,fargs)
where the arguments are as follows:
 m is the matrix.
 dimcode is the dimension, equal to 1 if the function applies to rows or 2 for columns.
 f is the function to be applied.
 fargs is an optional set of arguments to be supplied to f.
For example, here we apply the R function mean() to each column of a matrix z:
> z
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6
> apply(z,2,mean)
[1] 2 5
In this case, we could have used the colMeans() function, but this provides a simple
example of using apply().
A function you write yourself is just as legitimate for use in apply() as any R built-in
function such as mean(). Here’s an example using our own function f:
> z
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6
> f <- function(x) x/c(2,8)
> y <- apply(z,1,f)
> y
     [,1]  [,2] [,3]
[1,]  0.5 1.000 1.50
[2,]  0.5 0.625 0.75
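Note that the result above is 2 x 3, not 3 x 2: when the applied function returns a vector, apply() stores each result as a column. The following sketch shows this along with the optional extra arguments (fargs). The helper scale_by and its parameter k are hypothetical names used only for illustration.

```r
z <- matrix(1:6, nrow = 3)            # rows: (1,4), (2,5), (3,6)

col_means <- apply(z, 2, mean)        # dimcode 2: one result per column
row_sums  <- apply(z, 1, sum)         # dimcode 1: one result per row

# Extra arguments after the function are passed on to it (the "fargs").
scale_by <- function(v, k) v * k      # hypothetical helper
scaled <- apply(z, 1, scale_by, k = 10)

print(col_means)
print(row_sums)
print(dim(scaled))   # each row's result becomes a column of the output
```

If the original row layout is wanted, the result can be transposed with t(scaled).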
Adding and Deleting Matrix Rows and Columns
Technically, matrices are of fixed length and dimensions, so we cannot add or delete
rows or columns. However, matrices can be reassigned, and thus we can achieve the same
effect as if we had directly done additions or deletions.
Changing the Size of a Matrix:
Analogous operations can be used to change the size of a matrix. For instance, the
rbind() (row bind) and cbind() (column bind) functions let you add rows or columns to a
matrix.
Add Rows and Columns:
Use the cbind() function to add additional columns in a Matrix:
Example:
thismatrix <-
matrix(c("apple", "banana", "cherry", "orange","grape", "pineapple", "pear", "melon", "fig"),
nrow = 3, ncol = 3)
newmatrix <- cbind(thismatrix, c("strawberry", "blueberry", "raspberry"))
# Print the new matrix
> newmatrix
Note: The new column must have as many elements as the existing matrix has rows.
Use the rbind() function to add additional rows in a Matrix:
Example:
thismatrix <- matrix(c("apple", "banana", "cherry", "orange","grape", "pineapple", "pear",
"melon", "fig"), nrow = 3, ncol = 3)
> newmatrix <- rbind(thismatrix, c("strawberry", "blueberry", "raspberry"))
# Print the new matrix
> newmatrix
Note: The new row must have as many elements as the existing matrix has columns.
Remove Rows and Columns:
Use negative indexes inside the [ ] brackets (here wrapped in the c() function) to remove rows and columns in a Matrix:
Example:
thismatrix <- matrix(c("apple", "banana", "cherry", "orange", "mango", "pineapple"), nrow
= 3, ncol =2)
#Remove the first row and the first column
thismatrix <- thismatrix[-c(1), -c(1)]
> thismatrix
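Since matrices cannot be resized in place, every one of these operations returns a new matrix. A small sketch, with illustrative variable names, that checks the resulting dimensions:

```r
m <- matrix(1:6, nrow = 3, ncol = 2)

m_wide  <- cbind(m, c(7, 8, 9))   # new column needs 3 values (one per row)
m_tall  <- rbind(m, c(10, 11))    # new row needs 2 values (one per column)
m_small <- m[-1, ]                # negative index removes the first row

# The original m is unchanged; each call returned a new matrix.
print(dim(m))
print(dim(m_wide))
print(dim(m_tall))
print(dim(m_small))
```

To "change" m itself, assign the result back, for example m <- cbind(m, c(7, 8, 9)).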
Check if an Item Exists:
To find out if a specified item is present in a matrix, use the %in% operator:
Example:
Check if "apple" is present in the matrix:
thismatrix <- matrix(c("apple", "banana", "cherry", "orange"), nrow = 2, ncol = 2)

> "apple" %in% thismatrix


Amount of Rows and Columns:


Use the dim() function to find the amount of rows and columns in a Matrix:
Example:
thismatrix <- matrix(c("apple", "banana", "cherry", "orange"), nrow = 2, ncol = 2)

> dim(thismatrix)
Matrix Length:
Use the length() function to find the total number of cells in a Matrix:
Example:
thismatrix <- matrix(c("apple", "banana", "cherry", "orange"), nrow = 2, ncol = 2)

> length(thismatrix)
The total number of cells in the matrix is the number of rows multiplied by the number of columns.
In the example above: Length = 2*2 = 4.
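The three checks described above (%in%, dim() and length()) can be compared side by side in a brief sketch. The variable names are illustrative only.

```r
m <- matrix(c("apple", "banana", "cherry", "orange"), nrow = 2, ncol = 2)

has_apple <- "apple" %in% m   # TRUE if any cell equals "apple"
shape     <- dim(m)           # c(rows, columns)
cells     <- length(m)        # rows * columns

print(has_apple)
print(shape)
print(cells)
```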
Combine two Matrices:
Again, you can use the rbind() or cbind() function to combine two or more matrices
together:
Example:
# Combine matrices
Matrix1 <- matrix(c("apple", "banana", "cherry", "grape"), nrow = 2, ncol = 2)
Matrix2 <- matrix(c("orange", "mango", "pineapple", "watermelon"), nrow = 2, ncol = 2)

# Add Matrix2 below Matrix1 as new rows
Matrix_Combined <- rbind(Matrix1, Matrix2)
Matrix_Combined

# Add Matrix2 beside Matrix1 as new columns
Matrix_Combined <- cbind(Matrix1, Matrix2)
Matrix_Combined
ARRAYS
Arrays:
 Compared to matrices, arrays can have more than two dimensions.
 We can use the array() function to create an array, and the dim parameter to specify
the dimensions:
Example:
# An array with one dimension with values ranging from 1 to 24
thisarray <- c(1:24)
> thisarray
# An array with more than one dimension
multiarray <- array(thisarray, dim = c(4, 3, 2))
> multiarray
Example Explained:
 In the example above we create an array with the values 1 to 24.
 How does dim=c(4,3,2) work?
The first and second numbers in the bracket specify the number of rows and columns.
 The last number in the bracket specifies how many such matrices (layers) the array contains.
Note: Arrays can only have one data type.
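To make the role of each number in dim clearer, here is a minimal sketch that builds the same 4 x 3 x 2 array and looks at one layer and one cell. The names first_layer and one_cell are illustrative.

```r
values <- 1:24
multiarray <- array(values, dim = c(4, 3, 2))   # 4 rows, 3 columns, 2 layers

# Values fill the first layer column by column, then the second layer.
first_layer <- multiarray[, , 1]    # a 4 x 3 matrix holding 1..12
one_cell    <- multiarray[2, 3, 2]  # row 2, column 3, layer 2

print(dim(multiarray))
print(first_layer)
print(one_cell)
```

The cell at [2, 3, 2] is in the second layer (values 13 to 24), third column (values 21 to 24), second row, so it holds 22.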
Access Array Items:
You can access the array elements by referring to the index position. You can use
the [] brackets to access the desired elements from an array:


Example:
thisarray <- c(1:24)
multiarray <- array(thisarray, dim = c(4, 3, 2))
> multiarray[2, 3, 2]
The syntax is as follows: array[row position, column position, matrix level]
You can also access the whole row or column from a matrix in an array, by using
the c() function:
Example:
thisarray <- c(1:24)

# Access all the items from the first row from matrix one
multiarray <- array(thisarray, dim = c(4, 3, 2))
> multiarray[c(1),,1]
# Access all the items from the first column from matrix one
multiarray <- array(thisarray, dim = c(4, 3, 2))
> multiarray[,c(1),1]
A comma (,) before c() means that we want to access the column.
A comma (,) after c() means that we want to access the row.
Check if an Item Exists:
To find out if a specified item is present in an array, use the %in% operator:
Example: Check if the value "2" is present in the array:
thisarray <- c(1:24)
multiarray <- array(thisarray, dim = c(4, 3, 2))

> 2 %in% multiarray


Amount of Rows and Columns:
Use the dim() function to find the amount of rows and columns in an array:
Example:
thisarray <- c(1:24)
multiarray <- array(thisarray, dim = c(4, 3, 2))

> dim(multiarray)
Array Length:
Use the length() function to find the total number of elements in an array:
Example:
thisarray <- c(1:24)
multiarray <- array(thisarray, dim = c(4, 3, 2))

> length(multiarray)
LISTS
Lists:
 A list in R can contain many different data types inside it. A list is a collection of data
which is ordered and changeable.
 To create a list, use the list() function:
Creating Lists:
Example:
# List of strings
thislist <- list("apple", "banana", "cherry")
# Print the list
> thislist


General List Operations


List Indexing:
You can access a list component in several different ways. You can access the list
items by referring to its index number, inside brackets. The first item has index 1, the
second item has index 2, and so on:

Example:
thislist <- list("apple", "banana", "cherry", "mango", "orange")
> thislist[1]
> thislist[-3]
Range of Indexes:
You can specify a range of indexes by specifying where to start and where to end the
range, by using the : operator:
Example:
Return the second, third, fourth and fifth item:
thislist <- list("apple", "banana", "cherry", "orange", "kiwi", "melon", "mango")

>(thislist)[2:5]
Note: The search will start at index 2 (included) and end at index 5 (included).
Remember that the first item has index 1.
Example:
thislist <- list("apple", "banana", "cherry", "mango", "orange")
> thislist[1:3]
> thislist[-3:-1]
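One detail worth seeing in code is that single brackets on a list always return another list. The sketch below also uses double brackets [[ ]], which are not introduced in this part of the notes; they are a standard base R operator that returns the element itself.

```r
thislist <- list("apple", "banana", "cherry", "mango", "orange")

sub_list    <- thislist[1]       # single brackets: a list of length 1
item        <- thislist[[1]]     # double brackets: the element itself
first_three <- thislist[1:3]     # a range of indexes
drop_three  <- thislist[-3:-1]   # negative range removes items 1 to 3

print(class(sub_list))
print(class(item))
print(length(first_three))
print(length(drop_three))
```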
Change Item Value:
To change the value of a specific item, refer to the index number:
Example:
thislist <- list("apple", "banana", "cherry")
thislist[1] <- "mango"

# Print the updated list


> thislist
Adding and Deleting List Elements
The operations of adding and deleting list elements arise in a surprising number of
contexts. This is especially true for data structures in which lists form the foundation, such
as data frames and R classes. New components can be added after a list is created.
Add List Items:
To add an item to the end of the list, use the append() function:
Example:
Add "orange" to the list:
thislist <- list("apple", "banana", "cherry")
> append(thislist, "orange")
To add an item to the right of a specified index, add "after=index number" in
the append() function:
Example:
Add "orange" to the list after "banana" (index 2):
thislist <- list("apple", "banana", "cherry")
> append(thislist, "orange", after = 2)
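Note that append() returns a new list and leaves the original unchanged, so the result has to be assigned back to keep it. The sketch below shows this, plus two further base R idioms that are assumptions beyond the notes: assigning past the end of the list to add an item, and assigning NULL to delete one.

```r
thislist <- list("apple", "banana", "cherry")

# Store the result of append() to actually update the list.
thislist <- append(thislist, "orange", after = 2)

thislist[[length(thislist) + 1]] <- "kiwi"   # assigning past the end adds an item
thislist[[1]] <- NULL                        # assigning NULL deletes an item

print(unlist(thislist))
```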


Remove List Items:


You can also remove list items. The following example creates a new, updated list
without an "apple" item:
Example:
Remove "apple" from the list:
thislist <- list("apple", "banana", "cherry")
> newlist <- thislist[-1]
# Print the new list
> newlist
List Length:
To find out how many items a list has, use the length() function:
Example:
thislist <- list("apple", "banana", "cherry")
> length(thislist)
Check if Item Exists:
To find out if a specified item is present in a list, use the %in% operator:
Example:
Check if "apple" is present in the list:
thislist <- list("apple", "banana", "cherry")
> "apple" %in% thislist
Loop Through a List:
You can loop through the list items by using a for loop:
Example:
Print all items in the list, one by one:
thislist <- list("apple", "banana", "cherry")

for (x in thislist) {
print(x)
}
Join Two Lists:
There are several ways to join, or concatenate, two or more lists in R. The most
common way is to use the c() function, which combines two elements together:
Example:
list1 <- list("a", "b", "c")
list2 <- list(1,2,3)
list3 <- c(list1,list2)
# print output
> list3
Accessing List Components and Values
If the components in a list do have tags, as is the case with name, salary, and union
for a list j created as j <- list(name="Joe",salary=55000,union=TRUE), you can obtain them via names():

> names(j)
[1] "name" "salary" "union"
To obtain the values, use unlist():
> v <- unlist(j)
> v
   name  salary   union
  "Joe" "55000"  "TRUE"
> class(v)
[1] "character"
The return value of unlist() is a vector—in this case, a vector of character strings. Note
that the element names in this vector come from the components in the original list.
On the other hand, if we were to start with numbers, we would get numbers.
> z <- list(a=5,b=12,c=13)
> y <- unlist(z)
> class(y)
[1] "numeric"
> y
 a  b  c
 5 12 13
So the output of unlist() in this case was a numeric vector. What about a mixed case?
> w <- list(a=5,b="xyz")
> wu <- unlist(w)
> class(wu)
[1] "character"
> wu
    a     b
  "5" "xyz"
Here, R chose the least common denominator: character strings. This sounds like
some kind of precedence structure, and it is. As R’s help for unlist() states:
Where possible the list components are coerced to a common mode during the
unlisting, and so the result often ends up as a character vector. Vectors will be coerced to
the highest type of the components in the hierarchy NULL < raw < logical < integer < real <
complex < character < list < expression: pairlists are treated as lists.
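The coercion hierarchy can be observed directly. In the sketch below, the definition of j is an assumption reconstructed from the output shown earlier, and the remaining lists are made up for illustration.

```r
j <- list(name = "Joe", salary = 55000, union = TRUE)  # assumed definition of j

tags   <- names(j)                        # the component names
mixed  <- unlist(j)                       # character wins over numeric and logical
nums   <- unlist(list(a = 5, b = 12))     # all numeric stays numeric
logint <- unlist(list(a = TRUE, b = 2L))  # logical is promoted to integer

print(tags)
print(class(mixed))
print(class(nums))
print(class(logint))
```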

Applying Functions to Lists


Two functions are handy for applying functions to lists: lapply and sapply.
Using the lapply() and sapply() Functions:
The function lapply() (for list apply) works like the matrix apply() function, calling the
specified function on each component of a list (or vector coerced to a list) and returning
another list. Here’s an example:

> lapply(list(1:3,25:29),median)
[[1]]
[1] 2
[[2]]
[1] 27
R applied median() to 1:3 and to 25:29, returning a list consisting of 2 and 27. In
some cases, such as the example here, the list returned by lapply() could be simplified to a
vector or matrix. This is exactly what sapply() (for simplified [l]apply) does.

> sapply(list(1:3,25:29),median)
[1] 2 27
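The difference between the two functions is easiest to see on a named list. This is a minimal sketch with hypothetical data (the scores list is not from the notes):

```r
scores <- list(math = c(70, 80, 90), art = c(60, 65))  # hypothetical data

as_list   <- lapply(scores, mean)   # always a list, one component per input
as_vector <- sapply(scores, mean)   # simplified to a named numeric vector
ranges    <- sapply(scores, range)  # length-2 results are simplified to a matrix

print(as_list)
print(as_vector)
print(ranges)
```

When every call returns a single value, sapply() gives a vector; when every call returns a vector of the same length, it gives a matrix with one column per list component.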


UNIT-V
DATA FRAMES
Data Frames:
 A data frame stores data in a table format of rows and columns.
 Data Frames can have different types of data inside them. While the first column can
be character, the second and third can be numeric or logical. However, all values within a
single column must have the same type of data.
 Use the data.frame() function to create a data frame:
Creating Data Frames:
Example:
# Create a data frame
Data_Frame <- data.frame (
Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
)

# Print the data frame


Data_Frame
Example:
> Names <- c("Raja","Bhavana")
> ages <- c(12,10)
> d <- data.frame(Names,ages,stringsAsFactors=FALSE)
> d # matrix-like viewpoint
    Names ages
1    Raja   12
2 Bhavana   10

The first two arguments in the call to data.frame() are clear: We wish to produce a
data frame from our two vectors: Names and ages. However, that third argument,
stringsAsFactors=FALSE, requires more comment. In versions of R before 4.0.0, if the named
argument stringsAsFactors was not specified, it defaulted to TRUE. (In those versions you
could also use options() to arrange the opposite default.) This meant that if we created a
data frame from a character vector (in this case, Names), R converted that vector to a
factor. Since R 4.0.0 the default is FALSE, so character vectors stay character unless
stringsAsFactors=TRUE is given explicitly.
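The effect of the stringsAsFactors argument can be checked by setting it explicitly both ways, which gives the same result in any R version. The names d_chr and d_fac are illustrative.

```r
Names <- c("Raja", "Bhavana")
ages  <- c(12, 10)

d_chr <- data.frame(Names, ages, stringsAsFactors = FALSE)
d_fac <- data.frame(Names, ages, stringsAsFactors = TRUE)

print(class(d_chr$Names))    # the character vector is kept as is
print(class(d_fac$Names))    # the character vector is converted to a factor
print(levels(d_fac$Names))   # factor levels are sorted alphabetically
```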
Summarize the Data:
Use the summary() function to summarize the data from a Data Frame:
Example:
Data_Frame <- data.frame (
Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45) )
# Print the data frame and summary
> Data_Frame
> summary(Data_Frame)
Accessing Data Frames:
We can use single brackets [ ], double brackets [[ ]] or $ to access columns from a
data frame:
Example:


Data_Frame <- data.frame (


Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
)
# Print the data frame
> Data_Frame[1]
> Data_Frame[["Training"]]
> Data_Frame$Training
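The three access forms do not return the same kind of object, which the following sketch makes visible. The variable names are illustrative.

```r
Data_Frame <- data.frame(
  Training = c("Strength", "Stamina", "Other"),
  Pulse    = c(100, 150, 120),
  Duration = c(60, 30, 45)
)

as_frame  <- Data_Frame[1]              # single brackets: a one-column data frame
as_vector <- Data_Frame[["Training"]]   # double brackets: the column itself
by_name   <- Data_Frame$Training        # $ is shorthand for the same column

print(class(as_frame))
print(is.data.frame(as_vector))
print(identical(as_vector, by_name))
```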
Other Matrix-Like Operations
Various matrix operations also apply to data frames. Most notably and usefully, we
can do filtering to extract various sub-data frames of interest.
Extracting Sub-data Frames:
As mentioned, a data frame can be viewed in row-and-column terms. In particular, we
can extract sub-data frames by rows or columns. Here’s an example:
> examsquiz[2:5,]
  Exam.1 Exam.2 Quiz
2    3.3      2  3.7
3    4.0      4  4.0
4    2.3      0  3.3
5    2.3      1  3.3
> examsquiz[2:5,2]
[1] 2 4 0 1
> class(examsquiz[2:5,2])
[1] "numeric"
> examsquiz[2:5,2,drop=FALSE]
  Exam.2
2      2
3      4
4      0
5      1
> class(examsquiz[2:5,2,drop=FALSE])
[1] "data.frame"
Note that in the second call, since examsquiz[2:5,2] is a single column, R returned a
vector instead of another data frame. By specifying drop=FALSE, we can keep it as a
(one-column) data frame.
We can also do filtering. Here’s how to extract the subframe of all students whose first
exam score was at least 3.8:
> examsquiz[examsquiz$Exam.1 >= 3.8,]
Exam.1 Exam.2 Quiz
3 4 4.0 4.0
9 4 3.3 4.0
11 4 4.0 4.0
14 4 0.0 4.0
16 4 3.7 4.0
19 4 4.0 4.0
22 4 4.0 4.0
25 4 4.0 3.3
29 4 3.0 3.7
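The examsquiz data set itself is not reproduced in these notes, so the sketch below uses a small hypothetical stand-in with the same column names to show the extraction and filtering patterns.

```r
# Hypothetical stand-in for examsquiz; the real data set is not reproduced here.
examsquiz <- data.frame(
  Exam.1 = c(2.0, 3.3, 4.0, 2.3, 2.3),
  Exam.2 = c(3.3, 2.0, 4.0, 0.0, 1.0),
  Quiz   = c(4.0, 3.7, 4.0, 3.3, 3.3)
)

rows_2_to_5 <- examsquiz[2:5, ]                      # sub-data frame by rows
as_vector   <- examsquiz[2:5, 2]                     # one column drops to a vector
as_frame    <- examsquiz[2:5, 2, drop = FALSE]       # drop = FALSE keeps a data frame
top_scores  <- examsquiz[examsquiz$Exam.1 >= 3.8, ]  # filter rows by a condition

print(class(as_vector))
print(class(as_frame))
print(top_scores)
```

The row names of the filtered result are the original row numbers, which is why the output above shows labels such as 3, 9 and 11.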

Using rbind() and cbind() Functions


The rbind() and cbind() matrix functions introduced earlier also work with data frames,
provided that the sizes are compatible, of course. We can use cbind() to add a new column
that has the same length as the existing columns.
In using rbind() to add a row, the added row is typically in the form of another data
frame or list.
Add Rows:
Use the rbind() function to add new rows in a Data Frame:
Example:
Data_Frame <- data.frame (
Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
)

# Add a new row (given as a list so that the numeric columns stay numeric)
New_row_DF <- rbind(Data_Frame, list("Strength", 110, 110))

# Print the data frame with the new row
> New_row_DF
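The form of the added row matters. A row built with c() that mixes text and numbers is a single character vector, so the numeric columns are converted to character; a row given as a list keeps each value's type. A minimal sketch comparing the two (variable names are illustrative):

```r
Data_Frame <- data.frame(
  Training = c("Strength", "Stamina", "Other"),
  Pulse    = c(100, 150, 120),
  Duration = c(60, 30, 45)
)

with_vector <- rbind(Data_Frame, c("Strength", 110, 110))     # row is all character
with_list   <- rbind(Data_Frame, list("Strength", 110, 110))  # row keeps its types

print(class(with_vector$Pulse))
print(class(with_list$Pulse))
```

After the c() version, calculations such as mean(with_vector$Pulse) no longer work without converting the column back to numbers.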
Add Columns:
Use the cbind() function to add new columns in a Data Frame:
Example:
Data_Frame <- data.frame (
Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
)

# Add a new column


New_col_DF <- cbind(Data_Frame, Steps = c(1000, 6000, 2000))

# Print the new column


> New_col_DF
Remove Rows and Columns:
Use negative indexes inside the [ ] brackets (here wrapped in the c() function) to remove rows and columns in a Data Frame:
Example:
Data_Frame <- data.frame (
Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
)

# Remove the first row and column


Data_Frame_New <- Data_Frame[-c(1), -c(1)]

# Print the new data frame


> Data_Frame_New
Amount of Rows and Columns:
Use the dim() function to find the amount of rows and columns in a Data Frame:
Example:


Data_Frame <- data.frame (


Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
)
> dim(Data_Frame)
You can also use the ncol() function to find the number of columns and nrow() to find
the number of rows:
Example:
Data_Frame <- data.frame (
Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
)
> ncol(Data_Frame)
> nrow(Data_Frame)
Data Frame Length:
Use the length() function to find the number of columns in a Data Frame (similar
to ncol()):
Example:
Data_Frame <- data.frame (
Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
)

# print the output


> length(Data_Frame)
Combining Data Frames:
Use the rbind() function to combine two or more data frames in R vertically:
Example:
Data_Frame1 <- data.frame (
Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
)

Data_Frame2 <- data.frame (


Training = c("Stamina", "Stamina", "Strength"),
Pulse = c(140, 150, 160),
Duration = c(30, 30, 20)
)
# print the output
> New_Data_Frame <- rbind(Data_Frame1, Data_Frame2)
> New_Data_Frame
And use the cbind() function to combine two or more data frames in R horizontally:
Example:
Data_Frame3 <- data.frame (
Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
)


Data_Frame4 <- data.frame (


Steps = c(3000, 6000, 2000),
Calories = c(300, 400, 300)
)

# print the output


> New_Data_Frame1 <- cbind(Data_Frame3, Data_Frame4)
> New_Data_Frame1
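The size requirements for the two functions can be summarized in a short sketch. The data frames df1, df2 and df3 are small hypothetical examples.

```r
df1 <- data.frame(Training = c("Strength", "Stamina"), Pulse = c(100, 150))
df2 <- data.frame(Training = c("Other"), Pulse = c(120))
df3 <- data.frame(Steps = c(3000, 6000))

stacked <- rbind(df1, df2)   # vertical: column names must match
side    <- cbind(df1, df3)   # horizontal: row counts must match

print(dim(stacked))
print(dim(side))
print(names(side))
```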

FACTORS AND TABLES


Factors form the basis for many of R’s powerful operations, including many of those
performed on tabular data. The motivation for factors comes from the notion of nominal, or
categorical, variables in statistics.
Factors:
Factors are used to categorize data. Examples of factors are:
 Demography: Male/Female
 Music: Rock, Pop, Classic, Jazz
 Training: Strength, Stamina
To create a factor, use the factor() function and add a vector as argument:
Example:
# Create a factor
music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic", "Pop", "Jazz", "Rock", "Jazz"))

# Print the factor


> music_genre
Result:
[1] Jazz Rock Classic Classic Pop Jazz Rock Jazz
Levels: Classic Jazz Pop Rock
You can see from the example above that the factor has four levels (categories):
Classic, Jazz, Pop and Rock.
To only print the levels, use the levels() function:
Example:
music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic", "Pop", "Jazz", "Rock", "Jazz"))

> levels(music_genre)
Result:
[1] "Classic" "Jazz" "Pop" "Rock"
You can also set the levels, by adding the levels argument inside the factor() function:
Example:
music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic", "Pop", "Jazz", "Rock", "Jazz"),
levels = c("Classic", "Jazz", "Pop", "Rock", "Other"))

> levels(music_genre)
Result:
[1] "Classic" "Jazz" "Pop" "Rock" "Other"

Factor Length:
Use the length() function to find out how many items there are in the factor:
Example:


music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic", "Pop", "Jazz", "Rock", "Jazz"))

> length(music_genre)
Result:
[1] 8
Access Factors:
To access the items in a factor, refer to the index number, using [] brackets:
Example:
Access the third item:
music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic", "Pop", "Jazz", "Rock", "Jazz"))

> music_genre[3]
Result:
[1] Classic
Levels: Classic Jazz Pop Rock

Change Item Value:


To change the value of a specific item, refer to the index number:
Example:
Change the value of the third item:
music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic", "Pop", "Jazz", "Rock", "Jazz"))

music_genre[3] <- "Pop"


> music_genre[3]
Result:
[1] Pop
Levels: Classic Jazz Pop Rock
Note that you cannot change the value of a specific item to a value that is not already
one of the factor's levels. The following example produces a warning, and the item is set to NA:
Example:
Trying to change the value of the third item ("Classic") to an item that does not
exist/not predefined ("Opera"):
music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic", "Pop", "Jazz", "Rock", "Jazz"))
music_genre[3] <- "Opera"
> music_genre[3]
Result:
Warning message:
In `[<-.factor`(`*tmp*`, 3, value = "Opera") :
invalid factor level, NA generated
However, if you have already specified it inside the levels argument, it will work:
Example:
Change the value of the third item:
music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic", "Pop", "Jazz", "Rock", "Jazz"),
levels = c("Classic", "Jazz", "Pop", "Rock", "Opera"))
music_genre[3] <- "Opera"
> music_genre[3]
Result:
[1] Opera
Levels: Classic Jazz Pop Rock Opera
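If the factor already exists, a new level can also be added afterwards instead of rebuilding the factor. The sketch below uses the levels() replacement form from base R, which is an addition to the approach shown above.

```r
music_genre <- factor(c("Jazz", "Rock", "Classic", "Pop"))

# Add "Opera" to the existing levels; after that the assignment is valid.
levels(music_genre) <- c(levels(music_genre), "Opera")
music_genre[3] <- "Opera"

print(music_genre)
print(table(music_genre))   # counts per level, including unused ones
```

The level "Classic" is still listed after the change even though no item uses it any more, because levels describe the allowed categories, not only the ones present.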


Common Functions Used with Factors


With factors, we have yet another member of the family of apply functions, tapply.
We’ll look at that function, as well as two other functions commonly used with factors: split()
and by().
The tapply() Function:
In a call tapply(x,f,g), x is a vector, f is a factor or a list of factors, and g is a
function. The operation performed by tapply() is to (temporarily) split x into groups, each
group corresponding to a level of the factor (or a combination of levels of the factors in
the case of multiple factors), and then apply g() to the resulting subvectors of x. Here’s a
little example:
> ages <- c(25,26,55,37,21,42)
> affils <- c("R","D","D","R","U","D")
> tapply(ages,affils,mean)
D R U
41 31 21
As an example, suppose that we have an economic data set that includes variables for
gender, age, and income. Here, the call tapply(x,f,g) might have x as income and f as a pair
of factors: one for gender and the other coding whether the person is older or younger than
25. We may be interested in finding mean income, broken down by gender and age. If we
set g() to be mean(), tapply() will return the mean incomes in each of four subgroups:
• Male and under 25 years old
• Female and under 25 years old
• Male and over 25 years old
• Female and over 25 years old
Here’s an example of that setting:
> d <- data.frame(list(gender=c("M","M","F","M","F","F"),
age=c(47,59,21,32,33,24),
income=c(55000,88000,32450,76500,123000,45650)))
> d
  gender age income
1      M  47  55000
2      M  59  88000
3      F  21  32450
4      M  32  76500
5      F  33 123000
6      F  24  45650
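The data frame above can be taken one step further to get the four subgroup means. The sketch below assumes an indicator column named over25 (1 for older than 25, 0 otherwise) built with ifelse(); the column name matches the one used in the split() example, but the way it is built here is an assumption.

```r
d <- data.frame(gender = c("M", "M", "F", "M", "F", "F"),
                age    = c(47, 59, 21, 32, 33, 24),
                income = c(55000, 88000, 32450, 76500, 123000, 45650))

d$over25 <- ifelse(d$age > 25, 1, 0)   # 1 if older than 25, otherwise 0

# Mean income for every gender / over25 combination.
means <- tapply(d$income, list(d$gender, d$over25), mean)
print(means)
```

The cell for males aged 25 or under is NA because this small data set has no such person.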
The split() Function:
In contrast to tapply(), which splits a vector into groups and then applies a specified
function on each group, split() stops at that first stage, just forming the groups.
The basic form, without bells and whistles, is split(x,f), with x and f playing roles
similar to those in the call tapply(x,f,g); that is, x being a vector or data frame and f being a
factor or a list of factors. The action is to split x into groups, which are returned in a list.
(Note that x is allowed to be a data frame with split() but not with tapply().)
Let’s try it out with our earlier example, after adding a column over25 that is 1 for
people older than 25 and 0 otherwise.
> d
  gender age income over25
1      M  47  55000      1
2      M  59  88000      1
3      F  21  32450      0
4      M  32  76500      1
5      F  33 123000      1
6      F  24  45650      0
> split(d$income,list(d$gender,d$over25))
$F.0
[1] 32450 45650
$M.0
numeric(0)
$F.1
[1] 123000
$M.1
[1] 55000 88000 76500
The output of split() is a list, and recall that list components are denoted by dollar
signs. So the last vector, for example, was named "M.1" to indicate that it was the result of
combining "M" in the first factor and 1 in the second.
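The relationship between split() and tapply() can be checked on the small ages example from the tapply() section. This is a minimal sketch; the names groups, via_split and via_tapply are illustrative.

```r
ages   <- c(25, 26, 55, 37, 21, 42)
affils <- c("R", "D", "D", "R", "U", "D")

groups <- split(ages, affils)      # a list with one component per level
print(groups)

# Applying a function to each group afterwards reproduces tapply().
via_split  <- sapply(groups, mean)
via_tapply <- tapply(ages, affils, mean)
print(via_split)
```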
The by() Function:
Suppose we want to run a regression analysis separately within each group of a data set.
With tapply(), the function to be applied can be multivariate (for example, range()), but the
input must be a vector. Yet the input for regression is a matrix (or data frame) with at
least two columns: one for the predicted variable and one or more for predictor variables. In
an application to abalone data, the matrix would consist of a column for the diameter data
and a column for length. The by() function can be used here. It works like tapply() (which it
calls internally, in fact), but it is applied to objects rather than vectors. Here’s how to
use it for the desired regression analyses:
> aba <- read.csv("abalone.data",header=TRUE)
> by(aba,aba$Gender,function(m) lm(m[,2]~m[,3]))
aba$Gender: F
Call:
lm(formula = m[, 2] ~ m[, 3])
Coefficients:
(Intercept) m[, 3]
0.04288 1.17918
Calls to by() look very similar to calls to tapply(), with the first argument specifying
our data, the second the grouping factor, and the third the function to be applied to each
group.
Just as tapply() forms groups of indices of a vector according to levels of a factor, this
by() call finds groups of row numbers of the data frame aba. That creates three sub data
frames: one for each gender level of M, F, and I. The anonymous function we defined
regresses the second column of its matrix argument m against the third column.
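Since the abalone file is not included with these notes, the following sketch uses a small made-up data frame (shells, with hypothetical columns Gender, Length and Diameter) to show the same by() pattern. It names the columns in the formula instead of using column positions.

```r
# Hypothetical measurements standing in for the abalone file.
shells <- data.frame(
  Gender   = c("F", "F", "F", "M", "M", "M"),
  Length   = c(0.50, 0.60, 0.70, 0.40, 0.50, 0.60),
  Diameter = c(0.40, 0.48, 0.56, 0.30, 0.40, 0.50)
)

# Fit Diameter ~ Length separately for the rows of each Gender level.
fits <- by(shells, shells$Gender, function(m) lm(Diameter ~ Length, data = m))

slope_f <- coef(fits[[1]])[["Length"]]   # first group is "F"
slope_m <- coef(fits[[2]])[["Length"]]   # second group is "M"
print(c(slope_f, slope_m))
```

The groups come back in the sorted order of the factor levels, so the first fitted model belongs to "F" and the second to "M".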

Tables
To begin exploring R tables, consider this example:
> u <- c(22,8,33,6,8,29,-2)
> fl <- list(c(5,12,13,12,13,5,13),c("a","bc","a","a","bc","a","a"))
> tapply(u,fl,length)
    a bc
5   2 NA
12  1  1
13  2  1
Here, tapply() again temporarily breaks u into sub vectors, as you saw earlier, and
then applies the length() function to each sub vector. (Note that this is independent of
what’s in u. Our focus now is purely on the factors.) Those sub vector lengths are the counts
of the occurrences of each of the 3 × 2 = 6 combinations of the two factors. For instance, 5
occurred twice with "a" and not at all with "bc"; hence the entries 2 and NA in the first row of
the output. In statistics, this is called a contingency table. There is one problem in this
example: the NA value. It really should be 0, meaning that in no cases did the first factor
have level 5 and the second have level "bc". The table() function creates contingency tables
correctly.
> table(fl)
    fl.2
fl.1 a bc
  5  2  0
  12 1  1
  13 2  1
The first argument in a call to table() is either a factor or a list of factors. The two
factors here were (5,12,13,12,13,5,13) and ("a","bc","a","a","bc", "a","a"). In this case, an
object that is interpretable as a factor is counted as one.
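The difference between the two approaches shows up only in the empty cell. A brief sketch that runs both on the same data and compares the results:

```r
u  <- c(22, 8, 33, 6, 8, 29, -2)
fl <- list(c(5, 12, 13, 12, 13, 5, 13),
           c("a", "bc", "a", "a", "bc", "a", "a"))

counts_tapply <- tapply(u, fl, length)   # empty combinations come back as NA
counts_table  <- table(fl)               # empty combinations are counted as 0

print(counts_tapply)
print(counts_table)
```

The counts in the table add up to 7, the number of observations.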
Typically a data frame serves as the table() data argument. Suppose for instance the
file ct.dat consists of election-polling data, in which candidate X is running for reelection. The
ct.dat file looks like this:
"Vote for X" "Voted for X Last Time"
"Yes" "Yes"
"Yes" "No"
"No" "No"
"Not Sure" "Yes"
"No" "No"
In the usual statistical fashion, each row in this file represents one subject under
study. In this case, we have asked five people the following two questions:
• Do you plan to vote for candidate X?
• Did you vote for X in the last election?
This gives us five rows in the data file.
Let’s read in the file:
> ct <- read.table("ct.dat",header=T)
> ct
Vote.for.X Voted.for.X.Last.Time
1 Yes Yes
2 Yes No
3 No No
4 Not Sure Yes
5 No No
We can use the table() function to compute the contingency table for this data:
> cttab <- table(ct)
> cttab
Voted.for.X.Last.Time
Vote.for.X No Yes
No 2 0
Not Sure 0 1
Yes 1 1


Matrix/Array-Like Operations on Tables:


Just as most (nonmathematical) matrix/array operations can be used on data frames,
they can be applied to tables, too. (This is not surprising, given that the cell counts portion
of a table object is an array.)
For example, we can access the table cell counts using matrix notation. Let’s apply
this to our voting example from the previous section.
> class(cttab)
[1] "table"
> cttab[1,1]
[1] 2
> cttab[1,]
 No Yes
  2   0
In the second command, even though the first command had shown that cttab had
class "table", we treated it as a matrix and printed out its “[1,1] element.” Continuing this
idea, the third command printed the first row of this “matrix.” We can multiply the matrix
by a scalar. For instance, here’s how to change cell counts to proportions:
> cttab/5
Voted.for.X.Last.Time
Vote.for.X No Yes
No 0.4 0.0
Not Sure 0.0 0.2
Yes 0.2 0.2
In statistics, the marginal values of a variable are those obtained when this variable is
held constant while others are summed. In the voting example, the marginal values of the
Vote.for.X variable are 2 + 0 = 2, 0 + 1 = 1, and 1 + 1 = 2. We can of course obtain these
via the matrix apply() function:
> apply(cttab,1,sum)
      No Not Sure      Yes
       2        1        2
Note that the labels here, such as No, came from the row names of the matrix, which
table() produced. But R supplies a function addmargins() for this purpose—that is, to find
marginal totals. Here’s an example:
> addmargins(cttab)
          Voted.for.X.Last.Time
Vote.for.X No Yes Sum
  No        2   0   2
  Not Sure  0   1   1
  Yes       1   1   2
  Sum       3   2   5
Here, we got the marginal data for both dimensions at once, conveniently
superimposed onto the original table. We can get the names of the dimensions and levels
through dimnames(), as follows:
> dimnames(cttab)
$Vote.for.X
[1] "No" "Not Sure" "Yes"
$Voted.for.X.Last.Time
[1] "No" "Yes"
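The marginal totals discussed above can also be computed one dimension at a time. The sketch below compares apply() with the base function margin.table(); the data frame is rebuilt by hand so that the example is self-contained.

```r
# Rebuild the voting table so the example is self-contained.
ct <- data.frame(
  Vote.for.X            = c("Yes", "Yes", "No", "Not Sure", "No"),
  Voted.for.X.Last.Time = c("Yes", "No",  "No", "Yes",      "No")
)
cttab <- table(ct)

# Row marginals: sum over the columns for each level of Vote.for.X.
rowTotals <- apply(cttab, 1, sum)
print(rowTotals)

# Column marginals: margin.table() sums over all other dimensions.
colTotals <- margin.table(cttab, 2)
print(colTotals)

# The labels on the marginals come from the table's dimnames.
print(names(dimnames(cttab)))
```

The second argument of both functions names the dimension to keep: 1 for Vote.for.X and 2 for Voted.for.X.Last.Time.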

Extracting a Subtable:


Let’s continue working with our voting example:
> cttab
          Voted.for.X.Last.Time
Vote.for.X No Yes
  No        2   0
  Not Sure  0   1
  Yes       1   1
The function subtable() below performs subtable extraction. It has two
arguments:
• tbl: The table of interest, of class "table".
• subnames: A list specifying the desired subtable extraction. Each component of this list is
named after some dimension of tbl, and the value of that component is a vector of the
names of the desired levels.

To drop the Not Sure cases in our example, we pass the following list as subnames:
list(Vote.for.X=c("No","Yes"),Voted.for.X.Last.Time=c("No","Yes"))
We can now call the function.
> subtable(cttab,list(Vote.for.X=c("No","Yes"),
+ Voted.for.X.Last.Time=c("No","Yes")))
Voted.for.X.Last.Time
Vote.for.X No Yes
No 2 0
Yes 1 1
Now that we have a feel for what the function does, let’s take a look at its innards.
1 subtable <- function(tbl,subnames) {
2    # get array of cell counts in tbl
3    tblarray <- unclass(tbl)
4    # we'll get the subarray of cell counts corresponding to subnames by
5    # calling do.call() on the "[" function; we need to build up a list
6    # of arguments first
7    dcargs <- list(tblarray)
8    ndims <- length(subnames) # number of dimensions
9    for (i in 1:ndims) {
10      dcargs[[i+1]] <- subnames[[i]]
11   }
12   subarray <- do.call("[",dcargs)
13   # now we'll build the new table, consisting of the subarray, the
14   # numbers of levels in each dimension, and the dimnames() value, plus
15   # the "table" class attribute
16   dims <- lapply(subnames,length)
17   subtbl <- array(subarray,dims,dimnames=subnames)
18   class(subtbl) <- "table"
19   return(subtbl)
20 }

The array holding the new table's cell counts is formed in line 17 via R’s array()
function, which has the following arguments:

• data: The data to be placed into the new array. In our case, this is subarray.
• dim: The dimension lengths (number of rows, number of columns, number of layers, and
so on). In our case, this is the value dims, computed in line 16.
• dimnames: The dimension names and the names of their levels, already given to us by the
user as the argument subnames.

This was a somewhat conceptually complex function to write, but it gets easier once
you’ve mastered the inner structures of the "table" class.
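The central trick in subtable() is calling the "[" function through do.call(). The following sketch, which rebuilds the voting table by hand and uses the illustrative variable names keep, viaDoCall and direct, suggests that do.call("[", list(x, i, j)) retrieves the same cells as writing x[i, j] directly.

```r
# Rebuild the voting table so the example is self-contained.
ct <- data.frame(
  Vote.for.X            = c("Yes", "Yes", "No", "Not Sure", "No"),
  Voted.for.X.Last.Time = c("Yes", "No",  "No", "Yes",      "No")
)
cttab <- table(ct)

# Levels to keep in each dimension (the Not Sure level is left out).
keep <- list(Vote.for.X = c("No", "Yes"),
             Voted.for.X.Last.Time = c("No", "Yes"))

# Argument list for "[": the array of counts, then one index vector per
# dimension. unname() passes the index vectors positionally.
args <- c(list(unclass(cttab)), unname(keep))
viaDoCall <- do.call("[", args)
print(viaDoCall)

# The same cells, indexed directly by level name.
direct <- cttab[c("No", "Yes"), c("No", "Yes")]
print(direct)
```

The do.call() form is what lets subtable() handle tables with any number of dimensions, since the argument list is built at run time rather than written out by hand.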

Maths Functions
R includes a number of other functions that are handy for working with tables and
factors. We discuss two of them here: aggregate() and cut().
The aggregate() Function:
The aggregate() function calls tapply() once for each variable in a group. For example,
in the abalone data, we could find the median of each variable, broken down by gender, as
follows:
> aggregate(aba[,-1],list(aba$Gender),median)
  Group.1 Length Diameter Height WholeWt ShuckedWt ViscWt ShellWt Rings
1       F  0.590    0.465  0.160 1.03850   0.44050 0.2240   0.295    10
2       I  0.435    0.335  0.110 0.38400   0.16975 0.0805   0.113     8
3       M  0.580    0.455  0.155 0.97575   0.42175 0.2100   0.276    10
The first argument, aba[,-1], is the entire data frame except for the first column,
which is Gender itself. The second argument, which must be a list, is our Gender factor as
before. Finally, the third argument tells R to compute the median on each column in each of
the data frames generated by the subgrouping corresponding to our factor. There are three
such subgroups in our example here and thus three rows in the output of aggregate().
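The abalone data set is not included here, so the following sketch uses a small made-up data frame, shells, with the same layout (a grouping column first, numeric columns after it). The values are invented purely for illustration.

```r
# Hypothetical data: a grouping column followed by numeric columns.
shells <- data.frame(
  Gender = c("F", "F", "I", "I", "M", "M", "M"),
  Length = c(0.50, 0.60, 0.30, 0.40, 0.55, 0.45, 0.65),
  Rings  = c(9, 11, 6, 8, 10, 12, 7)
)

# Median of every numeric column, broken down by Gender.
med <- aggregate(shells[, -1], list(shells$Gender), median)
print(med)

# The formula interface gives the same medians; the grouping column is
# then labelled Gender instead of Group.1.
med2 <- aggregate(. ~ Gender, data = shells, FUN = median)
print(med2)
```

There are three distinct Gender values, so each result has three rows, one per subgroup.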

The cut() Function:


A common way to generate factors, especially for tables, is the cut() function. You
give it a data vector x and a set of bins defined by a vector b. The function then determines
which bin each of the elements of x falls into. The following is the form of the call we’ll use
here:

y <- cut(x,b,labels=FALSE)
where the bins are defined to be the semi-open intervals (b[1],b[2]],
(b[2],b[3]],.... Here’s an example:
> z
[1] 0.88114802 0.28532689 0.58647376 0.42851862 0.46881514 0.24226859 0.05289197
[8] 0.88035617
> seq(from=0.0,to=1.0,by=0.1)
[1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
> binmarks <- seq(from=0.0,to=1.0,by=0.1)
> cut(z,binmarks,labels=F)
[1] 9 3 6 5 5 3 1 9
This says that z[1], 0.88114802, fell into bin 9, which was (0.8,0.9]; z[2],
0.28532689, fell into bin 3, which was (0.2,0.3]; and so on.
The call to cut() returns a vector of bin numbers, as seen in the example’s result. But we
can convert it into a factor and possibly then use it to build a table. For instance, you can
imagine using this to write your own specialized histogram function. (The R function
findInterval() would be useful for this, too.)
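As a rough sketch of that idea, the bin numbers from cut() can be turned into a factor with one level per bin and then counted with table(). The values of z are copied from the example above; the names bins, counts and fi are illustrative only.

```r
# The data and bin boundaries from the example above.
z <- c(0.88114802, 0.28532689, 0.58647376, 0.42851862,
       0.46881514, 0.24226859, 0.05289197, 0.88035617)
binmarks <- seq(from = 0.0, to = 1.0, by = 0.1)

# Bin number of each element of z.
bins <- cut(z, binmarks, labels = FALSE)

# One factor level per bin, so that empty bins show up with a count of 0.
counts <- table(factor(bins, levels = 1:10))
print(counts)

# findInterval() uses intervals closed on the left by default, but for
# values that are not exactly on a boundary it gives the same bin numbers.
fi <- findInterval(z, binmarks)
print(fi)
```

The counts vector is essentially the set of bar heights for a histogram with ten equal-width bins.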
