Predicting cluster assignments
The goal of this exercise is to score the test dataset by assigning each of its rows to a cluster, using the predict method of the model fitted on the training dataset.
Using flexclust to predict cluster assignment
The standard kmeans function does not have a predict method. However, the flexclust package provides one. Because prediction can take a long time to run, we will illustrate it on only a sample of rows and columns. To compare the test and training results, both datasets must also have the same number of columns. For illustration purposes, we will set that number to 10.
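Before working with the retail data, the idea can be sketched on toy data. The following is a minimal illustration, not the code used in this exercise: the matrices train and test are hypothetical random data, and the choice of k = 2 is an assumption made only to keep the example small. It fits k-means through flexclust's kcca function and then calls predict on new rows that have the same 10 columns.

```r
library(flexclust)

set.seed(1)

# Hypothetical toy data: two well-separated groups, 10 columns each
train <- rbind(matrix(rnorm(50 * 10, mean = 0), ncol = 10),
               matrix(rnorm(50 * 10, mean = 5), ncol = 10))
test  <- rbind(matrix(rnorm(5 * 10, mean = 0), ncol = 10),
               matrix(rnorm(5 * 10, mean = 5), ncol = 10))

# Fit k-means via flexclust so that a predict method is available
fit <- kcca(train, k = 2, family = kccaFamily("kmeans"))

# Assign each test row to the nearest centroid learned from the training data
test.clusters <- predict(fit, newdata = test)
table(test.clusters)
```

Note that predict only works here because train and test have identical column counts, which is why the exercise fixes the number of columns in advance.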
To begin, take a sample from the OnlineRetail training data:
set.seed(1)
sample.size <- 10000
max.cols <- 10
library("flexclust")
OnlineRetail <- OnlineRetail[1:sample.size, ]
Next, create the document term matrix from the description column in the sampled dataset. We will use the create_matrix function from the RTextTools package, which can create a TDM first without having a separate...