Chapter 3 Funk’s matrix factorization algorithm
In this section, we consider Funk’s matrix factorization (MF) algorithm (Funk 2006; Koren, Bell, and Volinsky 2009) for rating prediction. We use the model \(Y \sim P + UV^\mathrm{T} + \varepsilon\) where:
- \(Y\) is the \(N_\mathrm{U}\times N_\mathrm{M}\) rating matrix, i.e., with \(N_\mathrm{U}\) users and \(N_\mathrm{M}\) movies,
- \(P\) represents the predictions from best model of the previous section,
- \(U\) and \(V\) are \(N_\mathrm{U} \times k\) and \(N_\mathrm{M} \times k\) matrices, respectively, where \(k\) is the number of latent features to be found.
Unknown ratings \(Y_{u,i}\) can thus be estimated as \(P_{u,i} + U_u V_i^\mathrm{T}\). The parameter \(k\) is also the rank of matrix \(UV\); i.e. \(UV\) is a rank-\(k\) approximation of the residual matrix \(Y-P\).
Funk’s MF estimates \(U\) and \(V\) using gradient descent, but operating only on the known ratings. First, \(U\) and \(V\) are seeded with random values. Then, for each epoch, the algorithm iterates over all known ratings \((u,i)\) in the training set and updates the feature matrices as follows:
\[e_{u,i} = Y_{u,i} - P_{u,i} - U_u V_i^\mathrm{T}\] \[U_u \gets U_u + \gamma(e_{u,i} V_i - \lambda U_u)\] \[V_i \gets V_i + \gamma(e_{u,i} U_u - \lambda V_i)\]
where \(\gamma\) is the learning rate and \(\lambda\) is a regularization parameter. In this report, these are set to 0.02 and 0.001, respectively, in accordance to guidance from Funk (2006), and we will only optimize the rank parameter \(k\).
3.1 Computing the residuals
The following code computes \(Y_{u,i} - P_{u,i}\) for all user-movie pairs \((u,i)\) in the training set, and creates an index mapping eliminating users and movies with no rating pairs.
# Compute residuals from previous best model
previous_train <- genre_biases_reg[
user_biases_reg[
movie_biases_reg[
edx_train, on = 'movieId'],
on = 'userId'],
on = 'genres'] |>
mutate(pred = mu + b_i + b_u + b_g + f_t[weekNum]) |>
pull(pred)
residuals_train <- as.numeric(edx_train$rating - previous_train)
# Generate test set predictions for previous best model
previous_test <- genre_biases_reg[
user_biases_reg[
movie_biases_reg[
edx_test, on = 'movieId'],
on = 'userId'],
on = 'genres'] |>
mutate(pred = mu + b_i + b_u + b_g + f_t[weekNum]) |>
pull(pred)
# Obtain new movie and user indices **without gaps**, and save the mappings
Uidx <- numeric(max(edx_train$userId))
Uidx[unique(edx_train$userId)] = seq(uniqueN(edx_train$userId))
Vidx <- numeric(max(edx_train$movieId))
Vidx[unique(edx_train$movieId)] = seq(uniqueN(edx_train$movieId))3.2 The recommenderlab package: first failure
The recommenderlab package (Hahsler 2021) contains an funkSVD function that accepts a
realRatingMatrix object as input. Note that this object is expected to contain the actual
ratings rather than a residual matrix.
This object is easy to create and does not consume too much memory:
# Create the recommenderlab::realRatingMatrix object
mat <- new(
className("realRatingMatrix", "recommenderlab"),
data = sparseMatrix(Uidx[edx_train$userId],
Vidx[edx_train$movieId],
x = edx_train$rating)
)
format(object.size(mat), units = "auto", standard="SI")## [1] "97.2 MB"
However, attempting to factor this matrix as follows returns an error, as shown below.
This suggests that recommenderlab uses dense matrices in its
internal functions.
# NOT RUN
# returns:
# <simpleError: cannot allocate vector of size 5.6 Gb>
tryCatch(recommenderlab::funkSVD(mat), error = print)3.3 The rrecsys package: second failure
Like recommenderlab, the rrecsys package (Çoba 2019) also contains an implementation of the
Funk MF algorithm, again accepting the raw ratings as input. However,
the following attempt to convert the training set into a format the rrecsys package can
understand results in many GB of memory being requested, suggesting that while
rrecsys::defineData understands sparse matrix input in coordinate form, the package
does not use sparse matrix representations internally:
# NOT RUN
mat <- rrecsys::defineData(cbind(Uidx[edx_train$userId],
Vidx[edx_train$movieId],
x = edx_train$rating),
sparseMatrix = T,
binary = F,
minimum = 0.5,
maximum = 5,
intScale = TRUE)The above operation did not complete after several minutes and was aborted.
3.4 Writing our own Funk MF algorithm
In light of the above failures, a fresh implementation of the Funk MF algorithm, using
RCpp, was written. The C++ source code is available at
https://yinchi.github.io/harvardx-movielens/svd.cpp and in Appendix A
and is loaded into the R environment below:
# Funk matrix factorization. See C++ source for full documentation.
# Default values for regCoef and learningRate are as suggested by [Funk 2006].
Rcpp::sourceCpp("svd.cpp")
funk <- function(Uidx, Vidx, residuals, nFeatures, steps = 500,
regCoef = 0.02, learningRate = 1e-3) {
# Change Uidx and Vidx to 0-based, for C++ only.
funkCpp(Uidx[edx_train$userId] - 1,
Vidx[edx_train$movieId] - 1,
residuals_train,
nFeatures, steps, regCoef, learningRate)
}3.5 Computing the optimal rank of matrix \(UV\)
The following code plots the prediction error of the new model against the number of latent features in the Funk matrix factorization, i.e. the rank of matrix \(UV\):
# Compute RMSE values for varying number of MF features, if saved file not found.
set.seed(1)
if (!file.exists('funk_tuning.Rdata')) {
nFeatures <- c(1, 2, 4, 8, seq(12,20), 24, 28, 32)
rmses <- sapply(nFeatures, \(nF){
message(nF, ' features')
# Run Funk MF
set.seed(1)
funkResult <- funk(Uidx, Vidx, residuals_train, nFeatures = nF, steps = 500)
U <- funkResult$U
V <- funkResult$V
# Uidx[u] is the row index of user u in matrix U
# Vidx[v] is the row index of movie v in matrix V
predicted_ratings_funk <- edx_test |>
mutate(pred = previous_test +
map2_dbl(userId, movieId, \(u,v) U[Uidx[u],] %*% V[Vidx[v],])) |>
pull(pred)
rmse <- RMSE(predicted_ratings_funk, edx_test$rating)
message(rmse,'\n')
return(rmse)
})
save(nFeatures,rmses, file = 'funk_tuning.Rdata')
}
set.seed(1)
# Load RMSE data from file
load('funk_tuning.Rdata')
# Plot RMSE against number of MF features.
par(cex = 0.7)
qplot(nFeatures, rmses, xlab = 'rank(UV)', ylab = 'RMSE', geom = c('point','line'))
The optimal number of latent features is:
nFeaturesOpt <- nFeatures[which.min(rmses)]
nFeaturesOpt## [1] 14
3.6 Final matrix factorization and RMSE values
Using the new model with \(k=14\) to predict ratings for the edx_test gives the following RMSE
values:
# Run Funk MF if saved file not found
set.seed(1)
if (!file.exists('funk.Rdata')) {
funkResult <-
funk(Uidx, Vidx, residuals_train, nFeatures = nFeaturesOpt, steps = 500)
save(nFeaturesOpt, funkResult, file = 'funk.Rdata')
}
set.seed(1)
# Load MF data from file
load('funk.Rdata')
U <- funkResult$U
V <- funkResult$V
# Uidx[u] is the row index of user u in matrix U
# Vidx[v] is the row index of movie v in matrix V
predicted_ratings_funk <- edx_test |>
mutate(pred = previous_test +
map2_dbl(userId, movieId, \(u,v) U[Uidx[u],] %*% V[Vidx[v],])) |>
pull(pred)
rmse <- RMSE(predicted_ratings_funk, edx_test$rating)
# Compute RMSE and add to data.table
RMSEs <- RMSEs |>
add_row(Method = "Section 2 best model + Matrix factorization",
RMSE = RMSE(predicted_ratings_funk, edx_test$rating),
"RMSE (clamped estimates)" =
RMSE(clamp(predicted_ratings_funk),edx_test$rating))
RMSEs[nrow(RMSEs),] |>
kable(align='lrr', booktabs = T) |> row_spec(0, bold = T)| Method | RMSE | RMSE (clamped estimates) |
|---|---|---|
| Section 2 best model + Matrix factorization | 0.7949206 | 0.7939817 |
The RMSEs of all models in this report, evaluated using edx_test, are as follows:
RMSEs |> kable(align='lrr', booktabs = T, linesep='') |> row_spec(0, bold = T)| Method | RMSE | RMSE (clamped estimates) |
|---|---|---|
| Mean only | 1.0600537 | 1.0600537 |
| Movie effects | 0.9429615 | 0.9429615 |
| Movie + user effects | 0.8646843 | 0.8644818 |
| Movie + user + genre effects | 0.8643241 | 0.8641138 |
| Movie + user + genre + time effects | 0.8641266 | 0.8639174 |
| Movie + user + genre + time effects (regularized) | 0.8636151 | 0.8634932 |
| Section 2 best model + Matrix factorization | 0.7949206 | 0.7939817 |
We now “submit” our best model, i.e. \[\begin{equation} Y_{u,i} \sim \mu + b_{1;i} + b_{2;u} + b_{3;g(i)} + f(t_{u,i}) + UV^\mathrm{T} + \varepsilon_{u,i}, \tag{3.1} \end{equation}\]
with parameters mu, movie_biases_reg, user_biases_reg, genre_biases_reg,
f_t, U, and V, for final validation.