Document (1)

data science

Uploaded by

Akansha S

Here’s a concise set of notes for Unit-2: R Programming Basics:

1. Overview of R

R: A programming language and environment for statistical computing and graphics.

Open-source, supports data manipulation, statistical modeling, and visualization.

Popular IDE: RStudio.

Features:

Comprehensive statistical analysis

Extensive graphical capabilities

Integration with other languages (C, Python)

2. R Data Types and Objects


Basic Data Types:

Numeric: Decimal numbers (e.g., 3.14, 10).

Integer: Whole numbers (e.g., 4L or as.integer(4)).

Character: Strings (e.g., "Hello").

Logical: Boolean values (TRUE, FALSE).

Complex: Complex numbers (e.g., 1+2i).

Objects in R:

Vectors: One-dimensional data (e.g., c(1, 2, 3)).

Matrices: Two-dimensional arrays (e.g., matrix(1:9, nrow=3)).

Lists: Collection of different types of elements (e.g., list(1, "a", TRUE)).

Data Frames: Tabular data (e.g., data.frame(a=1:3, b=c("x", "y", "z"))).

Factors: Categorical data (e.g., factor(c("male", "female"))).


3. Reading and Writing Data

Reading Data:

read.table(): Read tabular data from a file.

read.csv(): Read CSV files.

readLines(): Read text files line-by-line.

scan(): Read raw data.

Writing Data:

write.table(): Write tabular data to a file.

write.csv(): Write data to CSV files.

cat(): Write data to console or files.
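As a minimal sketch of the two text-oriented functions in the lists above, the following writes a few lines with cat() and reads them back with readLines(). The temporary file path is an assumption made only for this illustration.

```r
# Hypothetical temporary file used only for this illustration
path <- tempfile(fileext = ".txt")

# cat() writes its arguments to a file when `file` is given
cat("first line\n", "second line\n", sep = "", file = path)

# append = TRUE adds to the end of the file instead of overwriting it
cat("third line\n", file = path, append = TRUE)

# readLines() returns a character vector with one element per line
lines <- readLines(path)
print(lines)
print(length(lines))

# Remove the temporary file
unlink(path)
```

Without the file argument, cat() prints to the console instead.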


4. Control Structures

Conditional Statements:

if, else, ifelse(condition, true_value, false_value).

Loops:

for: Iteration over a sequence.

while: Loop with condition.

repeat: Infinite loop (use break to exit).

5. Functions

User-defined reusable blocks of code.

Syntax:

my_function <- function(arg1, arg2) {
  # Code
  return(result)
}

Example:

add <- function(x, y) {
  return(x + y)
}

add(3, 5) # Output: 8

6. Scoping Rules

Lexical Scoping:

Variables are looked up in the environment where the function is defined.

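Dynamic Scoping:

Not used by R for variable lookup; R relies on lexical scoping.

Use <<- to assign to an existing variable in an enclosing environment (if none is found, the variable is created in the global environment).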


7. Dates and Times

Date Class:

Sys.Date(): Current date.

as.Date("2025-01-01"): Convert string to date.

Time Class:

Sys.time(): Current date and time.

POSIXct and POSIXlt: Classes for time representation.

8. Loop Functions

apply(): Apply a function to rows/columns of a matrix.

lapply(): Apply a function to each element of a list or vector; always returns a list.

sapply(): Same as lapply(), but returns a simplified result (a vector or matrix where possible).

tapply(): Apply a function to subsets of data defined by a grouping factor.

mapply(): Multivariate version of sapply().

9. Debugging Tools

traceback(): View the call stack after an error.

debug(): Step through a function line by line.

browser(): Pause execution to inspect variables.

trace(): Insert debugging code into a function.

recover(): Interactive debugging on errors.

10. Simulation
Generating Random Numbers:

rnorm(n, mean, sd): Normal distribution.

runif(n, min, max): Uniform distribution.

rbinom(n, size, prob): Binomial distribution.

Random Sampling:

sample(x, size, replace, prob): Draw samples.

11. Code Profiling

Tools:

Rprof(): Profile R code.

summaryRprof(): Summarize profiling output.


Best Practices:

Avoid loops where vectorized operations are possible.

Use efficient data structures.

1. Overview of R

What is R?

R is a statistical computing language designed for data analysis, statistical modeling, and graphical representation.

Features:

Rich library of statistical and graphical functions.

Supports vectorized operations for efficiency.

Extensible through packages (CRAN, Bioconductor).

RStudio IDE:

User-friendly interface with script editor, console, and visualization panels.


Popular tools: Syntax highlighting, debugging, and package management.

2. R Data Types and Objects

Data Types

Numeric: Numbers (e.g., 3.5, -2).

num <- 3.14
typeof(num) # Output: "double"

Integer: Whole numbers.

int <- as.integer(5)
typeof(int) # Output: "integer"

Character: Text or string.

str <- "Hello"
typeof(str) # Output: "character"

Logical: Boolean values (TRUE, FALSE).

log_val <- TRUE
typeof(log_val) # Output: "logical"

Complex: Numbers with imaginary parts.

comp <- 2 + 3i
typeof(comp) # Output: "complex"
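One detail behind the numeric and integer types is that a bare number literal is stored as a double, even when it looks like a whole number. The sketch below illustrates this; the variable names are chosen only for the example.

```r
# A bare literal is a double, even without a decimal point
t_plain  <- typeof(10)    # "double"

# The L suffix requests an integer
t_suffix <- typeof(10L)   # "integer"

# Integers still count as numeric
is_num <- is.numeric(10L) # TRUE
is_int <- is.integer(10)  # FALSE

# class() reports doubles as "numeric"
cls <- class(3.14)        # "numeric"

print(c(t_plain, t_suffix, cls))
print(c(is_num, is_int))
```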

Objects

1. Vector: Homogeneous data.

vec <- c(1, 2, 3, 4)

2. Matrix: 2D, homogeneous.

mat <- matrix(1:9, nrow = 3, ncol = 3)

3. List: Heterogeneous data.

lst <- list(1, "a", TRUE)

4. Data Frame: Tabular data.

df <- data.frame(Name = c("A", "B"), Age = c(25, 30))

5. Factor: Categorical data.

gender <- factor(c("male", "female", "male"))
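To see what a factor stores, it may help to look at its levels and the integer codes behind them. This is a small sketch using the same gender vector; by default the levels are sorted alphabetically.

```r
gender <- factor(c("male", "female", "male"))

# Distinct categories, sorted alphabetically by default
lv <- levels(gender)        # "female" "male"

# Each value is stored as an integer index into the levels
codes <- as.integer(gender) # 2 1 2

# table() counts how often each level occurs
counts <- table(gender)     # female: 1, male: 2

print(lv)
print(codes)
print(counts)
```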

3. Reading and Writing Data

Reading Data

From CSV File:

data <- read.csv("file.csv")

From Text File:

data <- read.table("file.txt", header = TRUE, sep = "\t")

Raw Input:

values <- scan(what = integer())

Writing Data

To CSV:

write.csv(data, "output.csv")

To Text File:

write.table(data, "output.txt", sep = "\t")
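The read and write calls above can be checked with a small round trip. This sketch assumes a temporary file and a two-row data frame invented for the example. It also passes row.names = FALSE, since write.csv() otherwise adds a column of row names to the file.

```r
# Example data frame (values invented for this illustration)
people <- data.frame(name = c("A", "B"), age = c(25, 30))

# Hypothetical temporary file path
path <- tempfile(fileext = ".csv")

# row.names = FALSE keeps the row numbers out of the file
write.csv(people, path, row.names = FALSE)

# Read the file back into a new data frame
people2 <- read.csv(path)
print(people2)

# Remove the temporary file
unlink(path)
```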

4. Control Structures

Conditional Statements

If and else:

x <- 10

if (x > 5) {
  print("x is greater than 5")
} else {
  print("x is less than or equal to 5")
}

ifelse:

result <- ifelse(x > 5, "Greater", "Lesser")
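Unlike if, which expects a single TRUE or FALSE, ifelse() works element by element on a whole vector. A minimal sketch, with a scores vector made up for the example:

```r
# Example vector (values invented for this illustration)
scores <- c(3, 8, 5, 10)

# ifelse() tests every element and returns a vector of the same length
labels <- ifelse(scores > 5, "Greater", "Not greater")

print(labels)
# "Not greater" "Greater" "Not greater" "Greater"
```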

Loops

For loop:

for (i in 1:5) {
  print(i)
}

While loop:

i <- 1
while (i <= 5) {
  print(i)
  i <- i + 1
}

Repeat loop:

i <- 1
repeat {
  if (i > 5) break
  print(i)
  i <- i + 1
}

5. Functions

Defining Functions

Syntax:

my_function <- function(arg1, arg2) {
  # Function body
  return(result)
}

Example

square <- function(x) {
  return(x^2)
}

square(4) # Output: 16
6. Scoping Rules

Lexical Scoping:

Variables are searched in the environment where the function is defined.

x <- 10

my_func <- function() {
  return(x)   # x is not local, so R finds the global x
}

my_func() # Output: 10

Dynamic Scoping: R does not use this approach.

7. Dates and Times

Current Date and Time:

today <- Sys.Date()

now <- Sys.time()

Conversion:

date_val <- as.Date("2025-01-01")

time_val <- as.POSIXct("2025-01-01 12:00:00")
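The two time classes differ in how they store a moment: POSIXct holds a single number of seconds since 1970-01-01 UTC, while POSIXlt holds named components such as year and hour. The sketch below fixes the time zone to UTC, which is an assumption made so the result does not depend on the local machine.

```r
# Time zone fixed to UTC for a reproducible result (an assumption)
t_ct <- as.POSIXct("2025-01-01 12:00:00", tz = "UTC")

# POSIXct: seconds since 1970-01-01 00:00:00 UTC
secs <- as.numeric(t_ct)

# POSIXlt: list-like structure with named fields
t_lt <- as.POSIXlt(t_ct)
yr <- t_lt$year + 1900   # years are stored as an offset from 1900
mo <- t_lt$mon + 1       # months are stored as 0-11
hr <- t_lt$hour

# Subtracting dates gives a time difference in days
gap <- as.Date("2025-01-10") - as.Date("2025-01-01")

print(secs)
print(c(yr, mo, hr))
print(gap)
```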

8. Loop Functions

Apply:

mat <- matrix(1:9, nrow = 3)

apply(mat, 1, sum) # Row sums

lapply:

lapply(1:5, function(x) x^2)

sapply:

sapply(1:5, function(x) x^2)

tapply:

tapply(1:10, c(1,1,2,2,3,3,4,4,5,5), sum)

mapply:

mapply(rep, 1:3, 3)

9. Debugging Tools

Traceback:

traceback()

Debug:

debug(my_function)

Browser:

browser()

Recover:

options(error = recover)
10. Simulation

Random Numbers:

rnorm(5, mean = 0, sd = 1) # Normal distribution

runif(5, min = 0, max = 1) # Uniform distribution

rbinom(5, size = 10, prob = 0.5) # Binomial distribution

11. Code Profiling

Profiling with Rprof():

Rprof("profile.out")

for (i in 1:1000) sqrt(i)

Rprof(NULL)

summaryRprof("profile.out")

Optimization Tips:

Replace loops with vectorized functions.

Use efficient data structures like matrices and data frames.


1. Overview of R

What is R?

R is like a calculator but much more powerful. It can handle complex data
analysis, generate beautiful graphs, and automate tasks.

Example: Plotting a simple graph.

x <- 1:10

y <- x^2

plot(x, y, main = "Graph of y = x^2")

2. R Data Types and Objects

Data Types

1. Numeric:

x <- 10.5
typeof(x) # Output: "double"

2. Integer:

y <- as.integer(7)
typeof(y) # Output: "integer"

3. Character:

z <- "Hello"
typeof(z) # Output: "character"

4. Logical:

flag <- TRUE
typeof(flag) # Output: "logical"

5. Complex:

comp <- 2 + 3i
typeof(comp) # Output: "complex"


Objects

1. Vectors: Store multiple values of the same type.

vec <- c(1, 2, 3, 4)
print(vec) # Output: [1] 1 2 3 4

2. Matrices: 2D array where all elements are of the same type.

mat <- matrix(1:6, nrow = 2, ncol = 3)
print(mat)

3. Lists: Collection of different data types.

lst <- list(num = 10, str = "Hello", vec = c(1, 2))
print(lst)

4. Data Frames: Tabular data.

df <- data.frame(Name = c("Alice", "Bob"), Age = c(25, 30))
print(df)
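Once a list or data frame exists, individual pieces are usually pulled out with $ or [[ ]]. The sketch below reuses the list and data frame from this section. The stringsAsFactors = FALSE argument is an addition, included so the Name column is plain text on older R versions as well.

```r
lst <- list(num = 10, str = "Hello", vec = c(1, 2))

# $ and [[ ]] both extract a single element from a list
n <- lst$num                 # 10
second <- lst[["vec"]][2]    # 2

df <- data.frame(Name = c("Alice", "Bob"), Age = c(25, 30),
                 stringsAsFactors = FALSE)

# A data frame column is a vector
ages <- df$Age               # 25 30

# Rows can be filtered with a logical condition
older <- df[df$Age > 26, "Name"]  # "Bob"

avg_age <- mean(df$Age)      # 27.5

print(c(n, second))
print(older)
print(avg_age)
```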
3. Reading and Writing Data

Reading Data

1. From CSV:

data <- read.csv("file.csv") # Replace "file.csv" with your file path.

2. Raw Input:

values <- scan(what = integer(), nmax = 5) # Reads up to 5 integers from the console.

Writing Data

1. To CSV:

write.csv(data, "output.csv") # Save data to a CSV file.


3. Control Structures

Conditional Statements

1. If and else:

x <- 10

if (x > 5) {
  print("x is greater than 5")
} else {
  print("x is less than or equal to 5")
}

Loops

1. For Loop:

for (i in 1:5) {
  print(i)
}

2. While Loop:

i <- 1
while (i <= 5) {
  print(i)
  i <- i + 1
}

3. Repeat Loop:

i <- 1
repeat {
  print(i)
  if (i == 5) break
  i <- i + 1
}

4. Functions

1. Defining a Function:

square <- function(x) {
  return(x^2)
}

square(4) # Output: 16

2. Functions with Default Arguments:

greet <- function(name = "User") {
  return(paste("Hello,", name))
}

greet() # Output: "Hello, User"

greet("Alice") # Output: "Hello, Alice"

5. Scoping Rules

Lexical Scoping:

R searches for a variable in the environment where the function was defined,
not where it’s called.

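x <- 10

my_function <- function() {
  x <- 20   # Local x, separate from the global x
  return(x)
}

my_function() # Output: 20

print(x) # Output: 10 (Global x remains unchanged)

Global Assignment:

Use <<- to assign to a variable outside the function. R modifies the nearest existing x in an enclosing environment, or creates it in the global environment if none exists.

x <- 5

my_function <- function() {
  x <<- 10
}

my_function()

print(x) # Output: 10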
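Lexical scoping and <<- are often combined in a closure, where an inner function keeps access to the variables of the function that created it. The following is a minimal sketch; make_counter and its variable names are hypothetical and used only for illustration.

```r
# make_counter() returns a function that remembers its own count
make_counter <- function() {
  count <- 0
  function() {
    # <<- updates count in the enclosing environment,
    # not a new local variable
    count <<- count + 1
    count
  }
}

counter_a <- make_counter()
first  <- counter_a()   # 1
second <- counter_a()   # 2

# A second counter has its own separate environment
counter_b <- make_counter()
other <- counter_b()    # 1

print(c(first, second, other))
```

Each call to make_counter() creates a fresh environment, which is why the two counters do not interfere with each other.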

6. Dates and Times

1. Get Current Date and Time:

Sys.Date() # Example Output: "2025-01-06"

Sys.time() # Example Output: "2025-01-06 10:15:00"

2. Convert Strings to Dates:

as.Date("2025-01-01") # Output: "2025-01-01"

7. Loop Functions

1. Apply:

mat <- matrix(1:6, nrow = 2)

apply(mat, 1, sum) # Row sums: Output: 9 12

2. Lapply:

lapply(1:3, function(x) x^2) # Output: a list containing 1, 4, 9

3. Sapply:

sapply(1:3, function(x) x^2) # Output: 1 4 9
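The remaining loop functions follow the same pattern. As a rough sketch, the example below shows column sums with apply(), grouped means with tapply(), and element-wise pairing with mapply(). The heights and group vectors are invented for the illustration.

```r
mat <- matrix(1:6, nrow = 2)

# MARGIN = 2 applies the function over columns instead of rows
col_sums <- apply(mat, 2, sum)       # 3 7 11

# tapply() splits the first vector by group, then applies the function
heights <- c(150, 160, 170, 180)
group   <- c("a", "a", "b", "b")
group_means <- tapply(heights, group, mean)  # a: 155, b: 175

# mapply() walks through several vectors in parallel
pair_sums <- mapply(function(x, y) x + y, 1:3, 4:6)  # 5 7 9

print(col_sums)
print(group_means)
print(pair_sums)
```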


8. Debugging Tools

1. Traceback:

Shows the sequence of calls before an error occurred.

traceback()

2. Debug:

Steps through a function.

debug(square)

square(4)

3. Browser:

Pauses execution at a specific line.

my_function <- function(x) {
  browser()
  return(x^2)
}

my_function(4)

9. Simulation

1. Generate Random Numbers:

rnorm(5, mean = 0, sd = 1) # Generate 5 random numbers from normal distribution.

runif(5, min = 0, max = 10) # 5 random numbers from uniform distribution.

2. Random Sampling:

sample(1:10, 5, replace = TRUE) # Sample 5 numbers with replacement.
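Random results change on every run unless the generator is seeded. A minimal sketch using set.seed() follows; the seed value 42 and the coin-flip probabilities are arbitrary choices for the example.

```r
# Setting the same seed reproduces the same random draws
set.seed(42)
a <- rnorm(3)

set.seed(42)
b <- rnorm(3)

same <- identical(a, b)   # TRUE

# sample() with prob draws with unequal probabilities
coin <- sample(c("H", "T"), size = 10, replace = TRUE,
               prob = c(0.7, 0.3))

print(same)
print(coin)
```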


10. Code Profiling

1. Profiling with Rprof:

Rprof("profile.out")

for (i in 1:10000) sqrt(i)

Rprof(NULL)

summaryRprof("profile.out")

2. Optimization:

Replace loops with vectorized operations.

x <- 1:10000

y <- x^2 # Faster than using a for loop.
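One way to check the claim is to time both versions with system.time(). This is only a rough sketch: timings vary by machine, and for a vector this small both may be reported as close to zero.

```r
x <- 1:10000

# Loop version: fill a pre-allocated vector one element at a time
loop_time <- system.time({
  squares_loop <- numeric(length(x))
  for (i in seq_along(x)) {
    squares_loop[i] <- x[i]^2
  }
})

# Vectorized version: one operation on the whole vector
vec_time <- system.time({
  squares_vec <- x^2
})

# Both approaches give the same values
same_result <- isTRUE(all.equal(squares_loop, squares_vec))

print(loop_time)
print(vec_time)
print(same_result)
```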
