Chapter 3 Funk’s matrix factorization algorithm

In this section, we consider Funk’s matrix factorization (MF) algorithm (Funk 2006; Koren, Bell, and Volinsky 2009) for rating prediction. We use the model \(Y \sim P + UV^\mathrm{T} + \varepsilon\) where:

\(Y\) is the \(N_\mathrm{U}\times N_\mathrm{M}\) rating matrix, i.e., with \(N_\mathrm{U}\) users and \(N_\mathrm{M}\) movies,
\(P\) represents the predictions from best model of the previous section,
\(U\) and \(V\) are \(N_\mathrm{U} \times k\) and \(N_\mathrm{M} \times k\) matrices, respectively, where \(k\) is the number of latent features to be found.

Unknown ratings \(Y_{u,i}\) can thus be estimated as \(P_{u,i} + U_u V_i^\mathrm{T}\). The parameter \(k\) is also the rank of matrix \(UV\); i.e. \(UV\) is a rank-\(k\) approximation of the residual matrix \(Y-P\).

Funk’s MF estimates \(U\) and \(V\) using gradient descent, but operating only on the known ratings. First, \(U\) and \(V\) are seeded with random values. Then, for each epoch, the algorithm iterates over all known ratings \((u,i)\) in the training set and updates the feature matrices as follows:

\[e_{u,i} = Y_{u,i} - P_{u,i} - U_u V_i^\mathrm{T}\] \[U_u \gets U_u + \gamma(e_{u,i} V_i - \lambda U_u)\] \[V_i \gets V_i + \gamma(e_{u,i} U_u - \lambda V_i)\]

where \(\gamma\) is the learning rate and \(\lambda\) is a regularization parameter. In this report, these are set to 0.02 and 0.001, respectively, in accordance to guidance from Funk (2006), and we will only optimize the rank parameter \(k\).

3.1 Computing the residuals

The following code computes \(Y_{u,i} - P_{u,i}\) for all user-movie pairs \((u,i)\) in the training set, and creates an index mapping eliminating users and movies with no rating pairs.

# Compute residuals from previous best model
previous_train <- genre_biases_reg[
  user_biases_reg[
    movie_biases_reg[
      edx_train, on = 'movieId'],
    on = 'userId'],
  on = 'genres'] |> 
  mutate(pred = mu + b_i + b_u + b_g + f_t[weekNum]) |> 
  pull(pred)

residuals_train <- as.numeric(edx_train$rating - previous_train)

# Generate test set predictions for previous best model
previous_test <- genre_biases_reg[
  user_biases_reg[
    movie_biases_reg[
      edx_test, on = 'movieId'],
    on = 'userId'],
  on = 'genres'] |> 
  mutate(pred = mu + b_i + b_u + b_g + f_t[weekNum]) |> 
  pull(pred)

# Obtain new movie and user indices **without gaps**, and save the mappings
Uidx <- numeric(max(edx_train$userId))
Uidx[unique(edx_train$userId)] = seq(uniqueN(edx_train$userId))

Vidx <- numeric(max(edx_train$movieId))
Vidx[unique(edx_train$movieId)] = seq(uniqueN(edx_train$movieId))

3.2 The `recommenderlab` package: first failure

The recommenderlab package (Hahsler 2021) contains an funkSVD function that accepts a realRatingMatrix object as input. Note that this object is expected to contain the actual ratings rather than a residual matrix. This object is easy to create and does not consume too much memory:

# Create the recommenderlab::realRatingMatrix object
mat <- new(
  className("realRatingMatrix", "recommenderlab"),
  data = sparseMatrix(Uidx[edx_train$userId],
                      Vidx[edx_train$movieId],
                      x = edx_train$rating)
)
format(object.size(mat), units = "auto", standard="SI")

## [1] "97.2 MB"

However, attempting to factor this matrix as follows returns an error, as shown below. This suggests that recommenderlab uses dense matrices in its internal functions.

# NOT RUN

# returns:
#   <simpleError: cannot allocate vector of size 5.6 Gb>

tryCatch(recommenderlab::funkSVD(mat), error = print)

3.3 The `rrecsys` package: second failure

Like recommenderlab, the rrecsys package (Çoba 2019) also contains an implementation of the Funk MF algorithm, again accepting the raw ratings as input. However, the following attempt to convert the training set into a format the rrecsys package can understand results in many GB of memory being requested, suggesting that while rrecsys::defineData understands sparse matrix input in coordinate form, the package does not use sparse matrix representations internally:

# NOT RUN

mat <- rrecsys::defineData(cbind(Uidx[edx_train$userId],
                                        Vidx[edx_train$movieId],
                                        x = edx_train$rating),
                           sparseMatrix = T,
                           binary = F,
                           minimum = 0.5,
                           maximum = 5,
                           intScale = TRUE)

The above operation did not complete after several minutes and was aborted.

3.4 Writing our own Funk MF algorithm

In light of the above failures, a fresh implementation of the Funk MF algorithm, using RCpp, was written. The C++ source code is available at https://yinchi.github.io/harvardx-movielens/svd.cpp and in Appendix A and is loaded into the R environment below:

# Funk matrix factorization. See C++ source for full documentation.
# Default values for regCoef and learningRate are as suggested by [Funk 2006].
Rcpp::sourceCpp("svd.cpp")
funk <- function(Uidx, Vidx, residuals, nFeatures, steps = 500,
                 regCoef = 0.02, learningRate = 1e-3) {
  
  # Change Uidx and Vidx to 0-based, for C++ only.
  funkCpp(Uidx[edx_train$userId] - 1,
          Vidx[edx_train$movieId] - 1,
          residuals_train,
          nFeatures, steps, regCoef, learningRate)
}

3.5 Computing the optimal rank of matrix \(UV\)

The following code plots the prediction error of the new model against the number of latent features in the Funk matrix factorization, i.e. the rank of matrix \(UV\):

# Compute RMSE values for varying number of MF features, if saved file not found.
set.seed(1)
if (!file.exists('funk_tuning.Rdata')) {
  nFeatures <- c(1, 2, 4, 8, seq(12,20), 24, 28, 32)
  rmses <- sapply(nFeatures, \(nF){
    
    message(nF, ' features')
    
    # Run Funk MF
    set.seed(1)
    funkResult <- funk(Uidx, Vidx, residuals_train, nFeatures = nF, steps = 500)
    U <- funkResult$U
    V <- funkResult$V
    
    # Uidx[u] is the row index of user u in matrix U
    # Vidx[v] is the row index of movie v in matrix V
    predicted_ratings_funk <- edx_test |>
      mutate(pred = previous_test +
               map2_dbl(userId, movieId, \(u,v) U[Uidx[u],] %*% V[Vidx[v],])) |>
      pull(pred)
    rmse <- RMSE(predicted_ratings_funk, edx_test$rating)
    
    message(rmse,'\n')
    return(rmse)
  })
  
  save(nFeatures,rmses, file = 'funk_tuning.Rdata')
}
set.seed(1)

# Load RMSE data from file
load('funk_tuning.Rdata')

# Plot RMSE against number of MF features.
par(cex = 0.7)
qplot(nFeatures, rmses, xlab = 'rank(UV)', ylab = 'RMSE', geom = c('point','line'))

The optimal number of latent features is:

nFeaturesOpt <- nFeatures[which.min(rmses)]
nFeaturesOpt

## [1] 14

3.6 Final matrix factorization and RMSE values

Using the new model with \(k=14\) to predict ratings for the edx_test gives the following RMSE values:

# Run Funk MF if saved file not found
set.seed(1)
if (!file.exists('funk.Rdata')) {
  funkResult <-
    funk(Uidx, Vidx, residuals_train, nFeatures = nFeaturesOpt, steps = 500)
  save(nFeaturesOpt, funkResult, file = 'funk.Rdata')
}
set.seed(1)

# Load MF data from file
load('funk.Rdata')
U <- funkResult$U
V <- funkResult$V


# Uidx[u] is the row index of user u in matrix U
# Vidx[v] is the row index of movie v in matrix V
predicted_ratings_funk <- edx_test |>
  mutate(pred = previous_test +
           map2_dbl(userId, movieId, \(u,v) U[Uidx[u],] %*% V[Vidx[v],])) |>
  pull(pred)
rmse <- RMSE(predicted_ratings_funk, edx_test$rating)

# Compute RMSE and add to data.table
RMSEs <- RMSEs |>
  add_row(Method = "Section 2 best model + Matrix factorization",
          RMSE = RMSE(predicted_ratings_funk, edx_test$rating),
          "RMSE (clamped estimates)" =
            RMSE(clamp(predicted_ratings_funk),edx_test$rating))

RMSEs[nrow(RMSEs),] |>
  kable(align='lrr', booktabs = T) |> row_spec(0, bold = T)

Method	RMSE	RMSE (clamped estimates)
Section 2 best model + Matrix factorization	0.7949206	0.7939817

The RMSEs of all models in this report, evaluated using edx_test, are as follows:

RMSEs |> kable(align='lrr', booktabs = T, linesep='') |> row_spec(0, bold = T)

Method	RMSE	RMSE (clamped estimates)
Mean only	1.0600537	1.0600537
Movie effects	0.9429615	0.9429615
Movie + user effects	0.8646843	0.8644818
Movie + user + genre effects	0.8643241	0.8641138
Movie + user + genre + time effects	0.8641266	0.8639174
Movie + user + genre + time effects (regularized)	0.8636151	0.8634932
Section 2 best model + Matrix factorization	0.7949206	0.7939817

We now “submit” our best model, i.e. \[\begin{equation} Y_{u,i} \sim \mu + b_{1;i} + b_{2;u} + b_{3;g(i)} + f(t_{u,i}) + UV^\mathrm{T} + \varepsilon_{u,i}, \tag{3.1} \end{equation}\]

with parameters mu, movie_biases_reg, user_biases_reg, genre_biases_reg, f_t, U, and V, for final validation.

References

Çoba, Ludovik. 2019. Rrecsys: Environment for Evaluating Recommender Systems. https://cran.r-project.org/package=rrecsys.

Funk, Simon. 2006. “Netflix Update: Try This at Home.” http://sifter.org/~simon/journal/20061211.html.

Hahsler, Michael. 2021. Recommenderlab: Lab for Developing and Testing Recommender Algorithms. https://cran.r-project.org/package=recommenderlab.

Koren, Yehuda, Robert Bell, and Chris Volinsky. 2009. “Matrix Factorization Techniques for Recommender Systems.” Computer 42 (8): 30–37. https://doi.org/10.1109/mc.2009.263.