Pre-Placement Workshop
in R and Analytics 
Delhi School of Economics 2014 
Ajay Ohri
Hi, I am Ajay Ohri
Agenda 
• Try and learn R in 12 hours 
• Get an introduction to Analytics 
• Be better skilled for Analytics as a career (?)
Training Plan 
• DAY 1
– Session 1: 2.5 hours
– Session 2: 3.5 hours
• DAY 2
– Session 1: 2.5 hours
– Session 2: 3.5 hours
Instructor 
• Author of R for Business Analytics 
• Author of R for Cloud Computing (An Approach for Data Scientists)
• 10+ yrs in Analytics and 6+ years in R 
• Founder, Decisionstats.com
The Audience 
Breakup – Demographics and Background
Expectations from each other 
• From Instructor 
– Your turn to speak
Expectations from each other 
• From Instructor 
• From Audience 
– Kindly switch off mobile phones
• Yes, this includes WhatsApp
– Ask questions at the end of the session
– Take notes
Day 1 Session 1 
– Introductions 
• Introduction to Analytics 
• Introduction to R 
• Interfaces in R 
– Demos in R (Maths, Objects,etc) 
• Break 1- 
– Installation, Trouble Shooting, Questions
Day 1 Session 2 
– Recap 
• Input of Data 
• Inspecting Data Quality 
• Investigating Data Issues 
– Demos in R 
• Data Input
• Data Quality
• Data Exploration
• Break 2- 
– Questions
Day 2 Session 1 
– Revision 
• Exploring Data 
• Manipulating Data 
• Visualization of Data 
• Demos in R 
• Data Exploration, 
• Data Manipulation, 
• Data Visualizations 
• Break 1 
– Questions
Day 2 Session 2 
– Recap 
• Data Mining 
• Regression Models 
• Advanced Topics 
• Demos in R 
• Data Mining, 
• Model Building, 
• Advanced Topics 
• Summary and Conclusion 
• Break 2 
– Questions
Analytics 
• What is analytics? 
• Where is it used? 
• How is it used? 
• What are some good practices?
Analytics 
• What is analytics? – Study of data for helping 
with decision making using software 
• Where is it used? 
• How is it used? 
• What are some good practices?
Analytics 
• What is analytics? 
• Where is it used? – Industries (like Pharma, 
BFSI, Telecom, Retail) 
• How is it used? –Use statistics and software 
• What are some good practices?
Analytics 
• What is analytics? 
• Where is it used? 
• How is it used? 
• What are some good practices?
– Learn one more new thing than your competition every day. This is a fast-moving field.
– Etc.
What is Data Science
Other Analytics Software 
• SAS (Base) et al 
• JMP 
• SPSS 
• Python 
• Octave 
• Clojure 
• Julia(?)
What is R? 
http://www.r-project.org/ 
• Language 
– Object oriented 
– Open Source 
– Free 
– Widely used 
Object-oriented programming is based on the concept of "objects" that have data fields (attributes that describe the object)
and associated procedures known as methods. Objects, which are
usually instances of classes, interact with one another to build
applications and computer programs.
Pre Requisites 
• Installation of R 
http://cran.rstudio.com/bin/windows/base/ 
• R Studio 
• R Packages
Pre Requisites 
• Installation of R 
– Rtools 
– http://cran.rstudio.com/bin/windows/Rtools/ 
• R Studio 
• R Packages
Pre Requisites 
• Installation of R 
– RTools 
• R Studio 
http://www.rstudio.com/products/rstudio/download/ 
• R Packages
Pre Requisites 
• Installation of R 
– RTools 
• R Studio 
http://www.rstudio.com/products/rstudio/download/ 
• R Packages 
About eight packages are supplied with the R distribution, and many more are available through the CRAN family of Internet
sites, covering a very wide range of modern statistics.
Pre Requisites 
• Installation of R 
– RTools 
• R Studio 
http://www.rstudio.com/products/rstudio/download/ 
• R Packages 
install.packages(), 
update.packages(), 
library() 
Packages are installed once, updated periodically, but loaded every time
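A minimal sketch of the "install once, load every time" pattern. The helper name load_or_install is hypothetical (it is not part of R), and the example uses the stats package only because it ships with R, so nothing is downloaded here.

```r
# Hypothetical helper: install a package only if it is missing, then load it.
load_or_install <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)                # needed once per machine; needs Internet
  }
  library(pkg, character.only = TRUE)    # needed in every new R session
  invisible(TRUE)
}

# "stats" ships with R, so this call never triggers an installation.
loaded <- load_or_install("stats")

# update.packages(ask = FALSE)  # run periodically; left commented out because it needs Internet
```

For the workshop packages, the same call would be used with names such as "Hmisc" or "Rcmdr".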
Pre Requisites 
• R 
• R Studio 
• R Tools (for Windows) 
• JAVA (JRE) 
– R Packages (need Internet connection) 
– Rcmdr 
• All packages asked at startup 
• Epack plugin (RcmdrPlugin.epack)
• KMggplot2 plugin (RcmdrPlugin.KMggplot2)
– rattle 
• A few packages that are asked when using rattle 
• GTK+ (needs internet) 
– Deducer 
– ggmap 
– Hmisc 
– arules 
– MASS
Interfaces to R 
• Console 
Default 
Customization 
• IDE 
• GUI
Demo- 
Basic Math on R Console 
• +
• -
• *
• /
• ()
• log
• exp
• mean
• median
• sum
• sd
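The operators and functions listed above can be tried as follows. This is a small sketch; the numbers in x are made up for illustration.

```r
# Arithmetic follows the usual precedence; brackets change the order.
a <- 2 + 3 * 4      # 14
b <- (2 + 3) * 4    # 20
q <- 10 / 4         # 2.5

# log() is the natural logarithm and exp() is its inverse.
d <- log(exp(1))    # 1

# Summary functions work on a whole vector at once.
x <- c(2, 4, 4, 4, 5, 5, 7, 9)   # made-up numbers
x_sum    <- sum(x)      # 40
x_mean   <- mean(x)     # 5
x_median <- median(x)   # 4.5
x_sd     <- sd(x)       # sample standard deviation (divides by n - 1)
```

Note that R is case sensitive, so Log(10) fails while log(10) works.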
Demo- 
Basic Math on R Console 
• + 
• - 
• log
• exp
• *
• /
• ()
Hint: Ctrl+L clears the screen
Demo- 
Basic Objects on R Console 
• + 
• - 
• log
• exp
• *
• /
• ()
Functions: ls()
– lists the objects in the workspace
rm("foo") removes the object named foo
Assignment
Using = or <- assigns a value to an object name
Hint: the Up arrow recalls the last
typed command
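A short sketch of assignment, ls() and rm() working together. The object names foo and bar are placeholders.

```r
foo <- 10        # <- is the usual assignment operator
bar = foo * 2    # = also assigns at the top level

# ls() returns the names of the objects in the current workspace.
had_foo <- "foo" %in% ls()   # TRUE: foo exists

# rm() removes an object by name.
rm("foo")
has_foo <- "foo" %in% ls()   # FALSE: foo is gone, bar is still there
```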
Functions and Loops 
• Loops 
for (number in 1:5){ print (number) }
Functions and Loops 
• Function 
functionajay=function(a)(a^2+2*a+1) 
Hint: Always match brackets 
Each ( deserves a ) 
Each { deserves a } 
Each [ deserves a ]
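As a sketch, the loop and the function from these slides can be combined. The function is the one shown above; the results vector is introduced here only for illustration.

```r
# The function from the slide: a^2 + 2a + 1, which equals (a + 1)^2.
functionajay <- function(a) (a^2 + 2 * a + 1)

single <- functionajay(3)   # 16

# Call the function inside a for loop and store each result.
results <- numeric(5)
for (number in 1:5) {
  results[number] <- functionajay(number)
}
# results is now 4 9 16 25 36

# Most R functions are vectorised, so the loop can often be skipped.
vectorised <- functionajay(1:5)
```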
Demo- 
Basic Objects on R Console 
• + 
• - 
• log
• exp
• *
This is made clearer in the
next slide
Functions:
class() gives the class
dim() gives the dimensions
nrow() gives the number of rows
ncol() gives the number of columns
length() gives the length
str() gives the structure
Hint: the Up arrow recalls the last
typed command
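A minimal sketch of these inspection functions, assuming the built-in mtcars dataset that the later slides also use.

```r
data(mtcars)

obj_class <- class(mtcars)    # "data.frame"
obj_dim   <- dim(mtcars)      # 32 rows and 11 columns
n_rows    <- nrow(mtcars)     # 32
n_cols    <- ncol(mtcars)     # 11

# For a data frame, length() counts columns; for a vector it counts elements.
len_df  <- length(mtcars)       # 11
len_vec <- length(mtcars$mpg)   # 32

str(mtcars)   # prints the type and first values of every column
```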
Demo- 
Datasets on R Console 
Hint: use data() to list all available
datasets
library(FOO) loads the package "FOO"
R- Basic Functions 
– ls() 
– rm() 
– str() 
– summary() 
– getwd() 
– setwd() 
– dir() 
– read.csv()
Day 1 Session 2 
– Recap 
• Input of Data 
• Inspecting Data Quality 
• Investigating Data Issues 
– Demos in R 
• Data Input
• Data Quality
• Data Exploration
• Break 2- 
– Questions
read.table()
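A small sketch of reading delimited data. It writes a temporary CSV file first so that the example is self-contained; in the workshop you would point read.csv() at your own file instead.

```r
# Create a small CSV file in a temporary location.
csv_path <- tempfile(fileext = ".csv")
write.csv(head(mtcars, 5), csv_path, row.names = FALSE)

# read.csv() is read.table() with defaults suited to comma-separated files.
cars_csv <- read.csv(csv_path)

# The same file read with read.table(): header and separator are given explicitly.
cars_tab <- read.table(csv_path, header = TRUE, sep = ",")

str(cars_csv)   # check that the columns were read with sensible types
```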
Statistical formats 
• read.spss from foreign package 
• read.sas7bdat from sas7bdat package
From Databases 
The RODBC package provides access to databases through 
an ODBC interface. 
The primary functions are 
• odbcConnect(dsn, uid="", pwd="") Open a connection 
to an ODBC database 
• sqlFetch(channel, sqltable) Read a table from an ODBC 
database into a data frame 
Hint- a good site to learn R 
http://www.statmethods.net
A Detour to SQL
From Web (aka Web Scraping) 
• readLines
Hint: R is case sensitive;
readlines is not the same as readLines
Hint: Use head() and tail() to inspect objects
Other packages are XML and RCurl
Case Study- http://decisionstats.com/2013/04/14/using-r-for-cricket-analysis-rstats/
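A minimal sketch of the readLines() workflow. To stay runnable without an Internet connection it reads a small made-up HTML file from a temporary location; readLines() accepts a URL in the same position.

```r
# Stand-in for a web page: a tiny, made-up HTML file.
page_path <- tempfile(fileext = ".html")
writeLines(c("<html>", "<body>", "<p>Score: 250</p>", "</body>", "</html>"),
           page_path)

# readLines() returns one character string per line of the page.
page <- readLines(page_path)

head(page, 2)   # first two lines
tail(page, 1)   # last line

# Keep the line of interest and strip everything that is not a digit.
score_line <- page[grepl("Score", page)]
score <- as.numeric(gsub("[^0-9]", "", score_line))   # 250
```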
Inspecting Data Quality 
• head() 
• tail() 
• names() 
• str() 
• objectname[l,m]
• objectname$variable 
Hint- Try this code please 
data(mtcars) 
head(mtcars,10) 
tail(mtcars,5) 
names(mtcars) 
str(mtcars) 
mtcars[1,] 
mtcars[,2] 
mtcars[2,3] 
mtcars$cyl
Inspecting Data Quality: Demo 
Data Selection 
• object[l,m] gives the value in row l and
column m
• object[l,] gives all the values in row l
• object$varname gives all values of varname
• subset() helps in selection
Data Selection: Demo 
Questions- How do I use multiple conditions (AND, OR)?
Can I do away with the subset function?
How do I select a random sample?
Useful Link- http://decisionstats.com/2013/11/24/50-functions-to-clear-a-basic-interview-for-business-analytics-rstats/
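One possible set of answers to the three questions, sketched on mtcars. The thresholds and the seed are arbitrary choices for illustration.

```r
data(mtcars)

# Multiple conditions: & means AND, | means OR.
both   <- subset(mtcars, cyl == 4 & mpg > 30)
either <- subset(mtcars, cyl == 4 | mpg > 30)

# The same AND selection without subset(), using a logical row index.
both_idx <- mtcars[mtcars$cyl == 4 & mtcars$mpg > 30, ]

# Random sample of 10 rows; set.seed() makes the draw repeatable.
set.seed(2014)
rows    <- sample(nrow(mtcars), 10)
sampled <- mtcars[rows, ]
```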
Day 2 Session 1 
– Revision 
• Exploring Data 
• Manipulating Data 
• Visualization of Data 
• Demos in R 
• Data Exploration, 
• Data Manipulation, 
• Data Visualizations 
• Break 1 
– Questions
Good coding practices 
• Use # for comment 
• Use git for version control 
• Use RStudio for multiple lines of code
Functions in R 
• custom functions 
• source code for a function 
• Understanding help ? , ??
Packages in R 
• CRAN 
• CRAN Views 
• R Documentation
Documentation in R 
• Help ? And ?? 
• CRAN Views 
• Package Help 
• Tips for Googling 
– Stack Overflow 
– Email Lists 
– Twitter 
– R Bloggers
Interfaces to R 
• Console 
• IDE 
R Studio 
• GUI 
Graphical User 
Interface
Graphical Interfaces to R 
• R Commander 
• Rattle 
• Deducer
Installation of R Commander
Overview of R Commander
Demo 
R Commander – 3D Graphs
Installation of Rattle
Installation of Rattle 
• GTK+ Installation Necessary 
• Install other packages when prompted
Overview of Rattle
Demo Rattle
Installation Deducer (with JGR)
Overview of Deducer (with JGR)
Demo Deducer 
• data() 
• data(mtcars)
Data Exploration 
• summary() 
• table() 
• describe() (Hmisc) 
• summarize() (Hmisc)
Hint- Try this code please 
summary(mtcars) 
table(mtcars$cyl) 
library(Hmisc) 
describe(mtcars) 
summarize(mtcars$mpg,mtcars$cyl,mean) 
CLASS WORK-
• Use the table command for two variables
• Summarize mtcars$mpg by two variables (cyl, gear)
• Try and find the min and max for the same
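One way to approach the class work, sketched with base R only. It uses aggregate() in place of Hmisc's summarize() so that no extra package is needed; that substitution is a choice made for this sketch, not the method shown in the session.

```r
data(mtcars)

# Two-way table: counts of cars for every cyl and gear combination.
two_way <- table(mtcars$cyl, mtcars$gear)

# Mean mpg for each cyl and gear combination that actually occurs.
mean_mpg <- aggregate(mpg ~ cyl + gear, data = mtcars, FUN = mean)

# Minimum and maximum mpg for the same groups.
min_mpg <- aggregate(mpg ~ cyl + gear, data = mtcars, FUN = min)
max_mpg <- aggregate(mpg ~ cyl + gear, data = mtcars, FUN = max)

print(two_way)
print(mean_mpg)
```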
Data Exploration 
• missing values are represented by NA in R 
• Demo 
– is.na 
– na.omit 
– na.rm
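A small sketch of the three tools listed above, on a made-up vector with two missing values.

```r
x <- c(12, NA, 7, 3, NA)   # made-up data with missing values

# is.na() flags the missing values; sum() counts them.
missing_flags <- is.na(x)
n_missing     <- sum(missing_flags)       # 2

# Most summary functions return NA unless na.rm = TRUE is given.
mean_with_na    <- mean(x)                # NA
mean_without_na <- mean(x, na.rm = TRUE)  # (12 + 7 + 3) / 3

# na.omit() drops every row that contains a missing value.
df       <- data.frame(id = 1:5, value = x)
complete <- na.omit(df)                   # 3 rows remain
```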
Data Visualization 
Notes- 
Explaining Basic Types of Graphs 
Customizing Graphs 
Graph Output 
Advanced Graphs 
Facets, 
Grammar of Graphics 
Data Visualization Rules
Data Manipulation Demo 
Notes- 
1. gsub
2. gsub with
escape characters
3. as.* functions (for example as.numeric) to convert types
4. is.* functions (for example is.numeric) to test types
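A sketch that strings the four notes together: cleaning text with gsub() and then converting it to numbers. The price strings are made up for illustration.

```r
prices <- c("$1,200", "$350", "$4,075")   # made-up values stored as text

# 1. gsub(pattern, replacement, x) replaces every match.
no_commas <- gsub(",", "", prices)

# 2. $ has a special meaning in patterns, so it is escaped with \\.
no_dollar <- gsub("\\$", "", no_commas)

# 4. is.* functions test the type of an object.
was_text <- is.character(no_dollar)   # TRUE

# 3. as.* functions convert from one type to another.
amounts   <- as.numeric(no_dollar)    # 1200 350 4075
is_number <- is.numeric(amounts)      # TRUE
```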
Text Manipulation 
Functions: nchar
substr
paste
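A quick sketch of the three text functions on an example string.

```r
place <- "Delhi School of Economics"

# nchar() counts characters, including spaces.
n_chars <- nchar(place)               # 25

# substr(x, start, stop) extracts part of a string.
first_word <- substr(place, 1, 5)     # "Delhi"

# paste() joins strings; sep sets what goes between them.
joined <- paste("R", "Analytics", sep = " and ")   # "R and Analytics"
```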
Date Manipulation
Date Manipulation 
Use ? help generously
Press Escape to get out of the + prompt
The + prompt appears when quotes or brackets are left unclosed
Class Work 
What is your age in days as of today? 
What is your age in weeks as of today? 
Hint- 
> age2=difftime(Sys.Date(),dob2,units='weeks') 
> age2 
Time difference of 1959.286 weeks
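A sketch of the class work using the same difftime() call. The date of birth is hypothetical, and today is fixed to an arbitrary date so that the result is repeatable; replace it with Sys.Date() to get your age as of today.

```r
dob2  <- as.Date("1990-01-15")   # hypothetical date of birth
today <- as.Date("2014-08-01")   # fixed for repeatability; use Sys.Date() in practice

# difftime() returns the gap between two dates in the requested units.
age_days  <- difftime(today, dob2, units = "days")
age_weeks <- difftime(today, dob2, units = "weeks")

age_days
age_weeks

# as.numeric() drops the units when a plain number is needed.
weeks_number <- as.numeric(age_weeks)
```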
Data Output 
• Graphical Output 
• Numerical Output (aggregation)
Data Output 
• Graphical Output
Data Output 
• Use objects to summarize 
• Use write.csv 
• Use setwd() to set location of output
Econometrics 
Coming up 
Regression
Correlation
Regression 
Notes- 
Correlation is not causation 
How do we determine which is dependent 
and which are independent variables
Regression
Regression using R Commander
Lies True Lies and Statistics 
• Anscombe -case study
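A minimal sketch of the Anscombe case study, assuming the anscombe dataset that ships with R. It fits the same straight-line model to each of the four x/y pairs; the coefficients come out almost identical even though the four scatter plots look very different.

```r
data(anscombe)

# Fit y1 ~ x1, y2 ~ x2, y3 ~ x3 and y4 ~ x4.
fits <- lapply(1:4, function(i) {
  lm(reformulate(paste0("x", i), response = paste0("y", i)), data = anscombe)
})

# One row per dataset: intercept and slope.
coefs <- t(sapply(fits, coef))
round(coefs, 2)

# Plot the four datasets side by side to see how different they are.
op <- par(mfrow = c(2, 2))
for (i in 1:4) {
  plot(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]],
       xlab = paste0("x", i), ylab = paste0("y", i))
  abline(fits[[i]])
}
par(op)
```

The point of the case study is that summary statistics alone can mislead, so always plot the data.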
Regression Recap 
• cor 
• lm 
• anova 
• summary and plot of lm object 
• residuals 
• p value 
– vif 
– heteroskedasticity
– outliers
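The recap can be walked through on mtcars as a rough sketch. The choice of mpg as the dependent variable and wt and hp as predictors is an assumption made for illustration. The VIF is computed by hand as 1 / (1 - R^2) rather than with a package function, so that only base R is needed.

```r
data(mtcars)

# Correlation between mileage and weight (negative: heavier cars go fewer miles).
r_mpg_wt <- cor(mtcars$mpg, mtcars$wt)

# Linear model, its summary and its ANOVA table.
fit <- lm(mpg ~ wt + hp, data = mtcars)
fit_summary <- summary(fit)
fit_anova   <- anova(fit)

# Residuals and p values of the coefficients.
res      <- residuals(fit)
p_values <- fit_summary$coefficients[, "Pr(>|t|)"]

# Variance inflation factor for wt: regress wt on the other predictor.
r2_wt  <- summary(lm(wt ~ hp, data = mtcars))$r.squared
vif_wt <- 1 / (1 - r2_wt)

# Diagnostic plots (residuals vs fitted, Q-Q plot, and so on).
op <- par(mfrow = c(2, 2))
plot(fit)
par(op)
```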
Propensity Modeling in Industry 
• Response Rates 
• Lift 
• Test and Control groups
Day 2 Session 2 
– Recap 
• Data Mining 
• Regression Models 
• Advanced Topics 
• Demos in R 
• Data Mining, 
• Model Building, 
• Advanced Topics 
• Summary and Conclusion 
• Break 2 
– Questions
Data Mining 
• Rattle 
– association analysis 
– cluster analysis 
– modeling
Rattle 
• Analyze wine
Rattle 
• Cluster Analysis
Data Mining 
• Brief Introduction 
– Affinity analysis is a data analysis and data mining technique that 
discovers co-occurrence relationships among activities performed by (or 
recorded about) specific individuals or groups. In general, this can be 
applied to any process where agents can be uniquely identified and 
information about their activities can be recorded. In retail, affinity 
analysis is used to perform market basket analysis, in which retailers seek 
to understand the purchase behavior of customers. This information can 
then be used for purposes of cross-selling and up-selling.
Rattle 
• Brief Introduction 
– market basket analysis 
– Market basket analysis might tell a retailer that customers often 
purchase shampoo and conditioner together, so putting both items on 
promotion at the same time would not create a significant increase in 
revenue, while a promotion involving just one of the items would likely 
drive sales of the other
Rattle 
• Brief Introduction 
– association rules 
– if butter and bread are bought, customers also buy milk 
Example database with 4 items and 5 transactions 
transaction ID   milk   bread   butter   beer
1                1      1       0        0
2                0      0       1        0
3                0      0       0        1
4                1      1       1        0
5                0      1       0        0
Rattle 
• Brief Introduction 
– association rules 
– the rule {milk, bread} -> {butter} has a support of 20% since milk, bread and butter occur together in 20% of all
transactions (1 out of 5 transactions).
– the rule {milk, bread} -> {butter} has a confidence of 50% since butter is bought in 50% of the
transactions that contain milk and bread (1 out of 2 transactions).
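The support and confidence figures can be checked by hand in base R. This is a sketch of the arithmetic only, using the 5-transaction example database; it is not how Rattle or the arules package compute rules internally.

```r
# The example database: one row per transaction, one column per item.
transactions <- matrix(
  c(1, 1, 0, 0,
    0, 0, 1, 0,
    0, 0, 0, 1,
    1, 1, 1, 0,
    0, 1, 0, 0),
  ncol = 4, byrow = TRUE,
  dimnames = list(as.character(1:5), c("milk", "bread", "butter", "beer"))
)

# Transactions that contain the left-hand side {milk, bread}.
has_lhs <- transactions[, "milk"] == 1 & transactions[, "bread"] == 1

# Transactions that contain milk, bread and butter together.
has_all <- has_lhs & transactions[, "butter"] == 1

# Support: share of all transactions that contain every item in the rule.
support <- mean(has_all)                     # 1 / 5 = 0.2

# Confidence: share of {milk, bread} transactions that also contain butter.
confidence <- sum(has_all) / sum(has_lhs)    # 1 / 2 = 0.5
```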
Rattle 
• Brief Introduction 
– association rules
Regression Models 
• lm function 
• Understanding output 
• Diagnostics 
– homoskedasticity 
– Multicollinearity 
– p value 
– Residuals
Advanced Topics :Demos 
• Time Series Analysis (use epack plugin) 
http://decisionstats.com/2010/10/22/doing-time-series-using-a-r-gui/
Advanced Topics :Demos 
• Advanced Data Visualization ( kmggplot2 
plugin) 
http://decisionstats.com/2012/05/21/new-rcommander-with-ggplot-rstats/
Advanced Topics :Demos 
Social Network Analysis (sna) 
Facebook 
http://decisionstats.com/2014/05/10/analyzing-facebook-networks-using-rstats/ 
Twitter 
http://www.slideshare.net/ajayohri/twitter-analysis-by-kaify-rais
Advanced Topics :Demos 
• Spatial Analysis 
• ggmap demo 
• http://decisionstats.com/2013/08/19/the-wonderful-ggmap-package-for-spatial-analysis-in-r-rstats/ 
• rmaps 
• http://rcharts.io/viewer/?9223554#.Uw4hOPmSySp
Thank You 
• http://linkedin.com/in/ajayohri 
• ohri2007@gmail.com

A Workshop on R
