You're reading from Practical Predictive Analytics Analyse current and historical data to predict future trends using R, Spark, and more

Product type Paperback

Published in Jun 2017

Publisher Packt

ISBN-13 9781785886188

Length 576 pages

Edition 1st Edition

Author (1):

Winters

Predicting cluster assignments

The goal of this exercise is to score the test dataset by assigning each of its rows to a cluster, using the predict method of the model fitted on the training dataset.

Using flexclust to predict cluster assignment

The standard kmeans function does not have a prediction method. However, the flexclust package provides one. Since the prediction method can take a long time to run, we will illustrate it only on a sample of rows and columns. To compare the test and training results, both datasets also need to have the same number of columns. For illustration purposes, we will set this number to 10.
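As a rough illustration of the idea, the sketch below fits a k-means model with flexclust's kcca function and then scores unseen rows with predict. It does not use the OnlineRetail data: the matrices train and test, the helper make.rows, and the term column names are hypothetical stand-ins, and k = 2 is chosen only because the toy data has two groups.

```r
library(flexclust)

set.seed(1)

# Hypothetical toy data: two well-separated groups with 10 numeric columns
make.rows <- function(n, centre) matrix(rnorm(n * 10, mean = centre), ncol = 10)
train <- rbind(make.rows(50, 0), make.rows(50, 5))
test  <- rbind(make.rows(10, 0), make.rows(10, 5))

# Training and test data must share the same columns
colnames(train) <- colnames(test) <- paste0("term", 1:10)

# Fit k-means through flexclust so that the fitted object has a predict method
fit <- kcca(train, k = 2, family = kccaFamily("kmeans"))

# Cluster assignments for the training rows
train.clusters <- clusters(fit)

# Cluster assignments for the unseen test rows
test.clusters <- predict(fit, newdata = test)

table(test.clusters)
```

Each test row is assigned to the nearest centroid learned from the training data, which is why the two datasets need matching columns. If a model has already been fitted with the standard kmeans function, flexclust's as.kcca function may be an alternative way to obtain an object that supports predict.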

To begin, take a sample from the OnlineRetail training data:

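set.seed(1)
sample.size <- 10000   # number of rows to keep
max.cols <- 10         # number of columns to keep in both training and test data

library("flexclust")
# Keep the first sample.size rows of the training data
OnlineRetail <- OnlineRetail[1:sample.size, ]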
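The snippet above keeps the first sample.size rows rather than drawing them at random. If a random sample is preferred, one possible sketch is shown below. It uses a small hypothetical data frame named toy in place of the real data, and a sample size of 5 purely for illustration.

```r
set.seed(1)            # make the random draw reproducible
sample.size <- 5

# Hypothetical stand-in for a larger data frame
toy <- data.frame(id = 1:20, value = rnorm(20))

# Draw row indices at random, without replacement
rows <- sample(nrow(toy), size = sample.size)
toy.sample <- toy[rows, ]

nrow(toy.sample)
```

Calling set.seed before sample means the same rows are selected on every run, which keeps later results comparable.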

Next, create the document-term matrix from the description column of the sampled dataset. We will use the create_matrix function from the RTextTools package, which can create a TDM first without having a separate...