9.5 Build a recommender

We set up the recommender with an evaluation scheme, created with the evaluationScheme() function. Here we provide all the information needed to build a recommender in the next step. Think of it as similar to the trainControl() function from caret. A critical difference is that evaluationScheme() also takes the data, which in our case is movie_small, because it creates the data partitions according to the method that we select.

I personally prefer to use cross-validation while building a machine learning model. In the evaluationScheme() call below, you can change the method argument to other values as given in the documentation. k specifies the number of cross-validation folds. The next argument, given, is a critical parameter. It specifies how many ratings from each test user are shown to the model (or withheld from it) while validating the model. For example, given = 15 means that while testing the model, only 15 randomly picked ratings from every user are used to predict that user's remaining ratings. A negative value of given specifies the number of ratings to withhold. For instance, given = -5 will use all but 5 of each user's ratings to test the model.

All else equal, a model that performs well with lower values of given is desirable because user ratings are sparse.
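To make the sign convention of given concrete, here is a minimal base R sketch of the split for a single test user. It is only an illustration of the idea, not the code recommenderlab runs internally; split_ratings() and the simulated user are hypothetical names introduced for this example.

```r
# Illustrative helper (hypothetical, not part of recommenderlab):
# split one user's ratings into a "known" and an "unknown" part.
split_ratings <- function(ratings, given) {
  n <- length(ratings)
  # Positive given: keep that many ratings as known.
  # Negative given: withhold |given| ratings and keep the rest as known.
  n_known <- if (given > 0) given else n + given
  stopifnot(n_known > 0, n_known < n)
  known_idx <- sample(n, n_known)
  list(known = ratings[known_idx], unknown = ratings[-known_idx])
}

set.seed(12345)
# A simulated user who has rated 40 movies on a 1 to 5 scale
user <- setNames(sample(1:5, 40, replace = TRUE), paste0("movie_", 1:40))

pos <- split_ratings(user, given = 15)  # 15 known, the other 25 withheld
neg <- split_ratings(user, given = -5)  # 5 withheld, the other 35 known

lengths(pos)
lengths(neg)
```

With a positive given, every user exposes the same number of ratings no matter how many movies they rated; with a negative given, the number withheld is what stays constant.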

Finally, pick a threshold for goodRating, which will be used for recommending the movies later on. I have picked 4 in the call below, meaning any movie rated 4 or above is considered to have a good rating.

set.seed(12345)
eval_movies <- evaluationScheme(data = movie_small, 
                      method = "cross-validation", 
                      k = 10,
                      given = 15, 
                      goodRating = 4)
eval_movies
## Evaluation scheme with 15 items given
## Method: 'cross-validation' with 10 run(s).
## Good ratings: >=4.000000
## Data set: 816 x 601 rating matrix of class 'realRatingMatrix' with 80921 ratings.
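As a rough illustration of what a goodRating threshold of 4 implies, the base R sketch below marks one user's withheld ratings as good or not, and counts how many movies from a recommendation list fall in the good group. The movie names, the ratings, and the top-3 list are all made up for the example, and the precision calculation is one common way to use such a threshold rather than a description of the package internals.

```r
# Hypothetical withheld ratings for one test user
withheld <- c(Alien = 5, Babe = 3, Casino = 4, Dumbo = 2, Evita = 4)
good_rating <- 4

# Movies this user rated at or above the threshold
good_movies <- names(withheld)[withheld >= good_rating]

# A hypothetical top-3 recommendation list for the same user
top_n <- c("Alien", "Babe", "Evita")

# Recommended movies that the user did rate as good
hits <- sum(top_n %in% good_movies)
precision <- hits / length(top_n)

good_movies
precision
```

Raising the threshold to 5 would shrink the set of good movies, so the same list would score lower.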

evaluationScheme() creates 3 data sets. It splits the data into train and test sets, and within the test set it further creates a known and an unknown data set. The known test data holds the number of ratings per user specified by given, and unknown holds the remaining ratings, which will be used to validate the predictions made using known.

For ease of exposition below, we save these data sets separately.

train_movies <- getData(eval_movies, "train")
known_movies <- getData(eval_movies, "known")
unknown_movies <- getData(eval_movies, "unknown")
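A quick way to convince yourself of this structure is to count the ratings per user in each piece. The sketch below does so on the MovieLense data that ships with recommenderlab, used here as a stand-in assumption so the example runs on its own; the ml_demo and *_demo names are introduced only for this illustration, and k = 5 is chosen just to keep it fast. With cross-validation, getData() returns the first fold unless its run argument is set.

```r
library(recommenderlab)

# Stand-in data (assumption): MovieLense is bundled with recommenderlab.
# Keeping users with more than 50 ratings ensures each has at least 15.
data("MovieLense")
ml_demo <- MovieLense[rowCounts(MovieLense) > 50, ]

set.seed(12345)
eval_demo <- evaluationScheme(data = ml_demo,
                              method = "cross-validation",
                              k = 5,
                              given = 15,
                              goodRating = 4)

# Extract the three pieces for the first fold
train_demo   <- getData(eval_demo, "train")
known_demo   <- getData(eval_demo, "known")
unknown_demo <- getData(eval_demo, "unknown")

# Number of known ratings per test user; should be 15 for everyone
table(rowCounts(known_demo))

# known and unknown cover the same test users
nrow(known_demo) == nrow(unknown_demo)

# Users and movies available for training in this fold
dim(train_demo)
```

The same three checks can be run on train_movies, known_movies and unknown_movies.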