Chapter 3 Funk’s matrix factorization algorithm
In this section, we consider Funk’s matrix factorization (MF) algorithm (Funk 2006; Koren, Bell, and Volinsky 2009) for rating prediction. We use the model \(Y \sim P + UV^\mathrm{T} + \varepsilon\) where:
- \(Y\) is the \(N_\mathrm{U}\times N_\mathrm{M}\) rating matrix, i.e., with \(N_\mathrm{U}\) users and \(N_\mathrm{M}\) movies,
- \(P\) represents the predictions from the best model of the previous section,
- \(U\) and \(V\) are \(N_\mathrm{U} \times k\) and \(N_\mathrm{M} \times k\) matrices, respectively, where \(k\) is the number of latent features to be found.
Unknown ratings \(Y_{u,i}\) can thus be estimated as \(P_{u,i} + U_u V_i^\mathrm{T}\). The parameter \(k\) is also the rank of the matrix \(UV^\mathrm{T}\); i.e., \(UV^\mathrm{T}\) is a rank-\(k\) approximation of the residual matrix \(Y-P\).
Funk’s MF estimates \(U\) and \(V\) using gradient descent, but operating only on the known ratings. First, \(U\) and \(V\) are seeded with random values. Then, for each epoch, the algorithm iterates over all known ratings \((u,i)\) in the training set and updates the feature matrices as follows:
\[e_{u,i} = Y_{u,i} - P_{u,i} - U_u V_i^\mathrm{T}\] \[U_u \gets U_u + \gamma(e_{u,i} V_i - \lambda U_u)\] \[V_i \gets V_i + \gamma(e_{u,i} U_u - \lambda V_i)\]
where \(\gamma\) is the learning rate and \(\lambda\) is a regularization parameter. In this report, these are set to 0.02 and 0.001, respectively, in accordance with guidance from Funk (2006), and we will optimize only the rank parameter \(k\).
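As a concrete illustration, the update rules above can be sketched in plain R on a tiny made-up residual matrix. Everything below (the toy ratings, dimensions, and epoch count) is invented for illustration; the report's actual C++ implementation appears later in this chapter.

```r
# Minimal sketch of Funk's SGD updates on toy data (all values are made up).
set.seed(1)
k <- 2; gamma <- 0.02; lambda <- 0.001        # rank, learning rate, regularization
# Known residuals Y - P as (user, movie, value) triplets:
train <- data.frame(u = c(1, 1, 2, 3), i = c(1, 2, 2, 1),
                    r = c(0.5, -0.2, 0.3, -0.1))
U <- matrix(rnorm(3 * k, sd = 0.1), nrow = 3) # 3 users  x k features
V <- matrix(rnorm(2 * k, sd = 0.1), nrow = 2) # 2 movies x k features
for (epoch in 1:1000) {
  for (n in seq_len(nrow(train))) {
    u <- train$u[n]; i <- train$i[n]
    e <- train$r[n] - sum(U[u, ] * V[i, ])    # e_{u,i}
    Uu <- U[u, ]                              # old U_u, needed for V's update
    U[u, ] <- U[u, ] + gamma * (e * V[i, ] - lambda * U[u, ])
    V[i, ] <- V[i, ] + gamma * (e * Uu - lambda * V[i, ])
  }
}
# After training, U %*% t(V) approximates the known residuals.
```

Note that the old value of \(U_u\) is saved before updating, so that \(V_i\)'s update uses the same iterate as in the equations above.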
3.1 Computing the residuals
The following code computes \(Y_{u,i} - P_{u,i}\) for all user-movie pairs \((u,i)\) in the training set, and creates index mappings that skip users and movies with no ratings in the training set.
```r
# Compute residuals from previous best model
previous_train <- genre_biases_reg[
  user_biases_reg[
    movie_biases_reg[
      edx_train, on = 'movieId'],
    on = 'userId'],
  on = 'genres'] |>
  mutate(pred = mu + b_i + b_u + b_g + f_t[weekNum]) |>
  pull(pred)
residuals_train <- as.numeric(edx_train$rating - previous_train)
```
```r
# Generate test set predictions for previous best model
previous_test <- genre_biases_reg[
  user_biases_reg[
    movie_biases_reg[
      edx_test, on = 'movieId'],
    on = 'userId'],
  on = 'genres'] |>
  mutate(pred = mu + b_i + b_u + b_g + f_t[weekNum]) |>
  pull(pred)
```
```r
# Obtain new movie and user indices **without gaps**, and save the mappings
Uidx <- numeric(max(edx_train$userId))
Uidx[unique(edx_train$userId)] <- seq(uniqueN(edx_train$userId))

Vidx <- numeric(max(edx_train$movieId))
Vidx[unique(edx_train$movieId)] <- seq(uniqueN(edx_train$movieId))
```
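To illustrate what these mappings do, consider a toy example with hypothetical IDs: non-contiguous IDs are mapped to consecutive row indices, so the factor matrices contain no wasted rows. The sketch uses `seq_along(unique(...))` in place of the `uniqueN`-based version above to stay dependency-free:

```r
# Hypothetical example: only users 2, 5 and 9 appear in the training set.
ids <- c(2, 5, 9, 5, 2)
Uidx <- numeric(max(ids))                # zero-filled lookup table
Uidx[unique(ids)] <- seq_along(unique(ids))
Uidx                                     # 0 1 0 0 2 0 0 0 3
Uidx[c(2, 5, 9)]                         # rows 1, 2, 3 of the factor matrix
```

A zero entry in the lookup table thus marks an ID that never occurs in the training set.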
3.2 The `recommenderlab` package: first failure
The `recommenderlab` package (Hahsler 2021) contains a `funkSVD` function that accepts a `realRatingMatrix` object as input. Note that this object is expected to contain the actual ratings rather than a residual matrix. The object is easy to create and does not consume too much memory:
```r
# Create the recommenderlab::realRatingMatrix object
mat <- new(
  className("realRatingMatrix", "recommenderlab"),
  data = sparseMatrix(Uidx[edx_train$userId],
                      Vidx[edx_train$movieId],
                      x = edx_train$rating)
)
format(object.size(mat), units = "auto", standard = "SI")
## [1] "97.2 MB"
```
However, attempting to factorize this matrix returns an error, as shown below. This suggests that `recommenderlab` uses dense matrices in its internal functions.
```r
# NOT RUN
# returns:
# <simpleError: cannot allocate vector of size 5.6 Gb>
tryCatch(recommenderlab::funkSVD(mat), error = print)
```
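The 5.6 GB figure is consistent with `funkSVD` densifying the full user-movie matrix. A back-of-envelope check, assuming the training set spans roughly 69,878 users and 10,677 movies (the approximate MovieLens 10M dimensions; these counts are an assumption here, not computed from the data):

```r
# Dense double-precision storage for the full user-movie matrix.
# The user/movie counts below are assumed, not derived from edx_train.
n_users  <- 69878
n_movies <- 10677
bytes <- n_users * n_movies * 8   # 8 bytes per double
round(bytes / 2^30, 1)            # ~5.6 GiB, matching the error message
```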
3.3 The `rrecsys` package: second failure
Like `recommenderlab`, the `rrecsys` package (Çoba 2019) also contains an implementation of the Funk MF algorithm, again accepting the raw ratings as input. However, the following attempt to convert the training set into a format the `rrecsys` package can understand requests many gigabytes of memory, suggesting that while `rrecsys::defineData` accepts sparse matrix input in coordinate form, the package does not use sparse matrix representations internally:
```r
# NOT RUN
mat <- rrecsys::defineData(cbind(Uidx[edx_train$userId],
                                 Vidx[edx_train$movieId],
                                 x = edx_train$rating),
                           sparseMatrix = T,
                           binary = F,
                           minimum = 0.5,
                           maximum = 5,
                           intScale = TRUE)
```
The above operation did not complete after several minutes and was aborted.
3.4 Writing our own Funk MF algorithm
In light of the above failures, a fresh implementation of the Funk MF algorithm, using
RCpp
, was written. The C++ source code is available at
https://yinchi.github.io/harvardx-movielens/svd.cpp and in Appendix A
and is loaded into the R environment below:
```r
# Funk matrix factorization. See C++ source for full documentation.
# Default values for regCoef and learningRate are as suggested by [Funk 2006].
Rcpp::sourceCpp("svd.cpp")

funk <- function(Uidx, Vidx, residuals, nFeatures, steps = 500,
                 regCoef = 0.02, learningRate = 1e-3) {
  # Change Uidx and Vidx to 0-based, for C++ only.
  funkCpp(Uidx[edx_train$userId] - 1,
          Vidx[edx_train$movieId] - 1,
          residuals,
          nFeatures, steps, regCoef, learningRate)
}
```
3.5 Computing the optimal rank of matrix \(UV^\mathrm{T}\)
The following code plots the prediction error of the new model against the number of latent features in the Funk matrix factorization, i.e., the rank of the matrix \(UV^\mathrm{T}\):
```r
# Compute RMSE values for varying number of MF features, if saved file not found.
set.seed(1)
if (!file.exists('funk_tuning.Rdata')) {
  nFeatures <- c(1, 2, 4, 8, seq(12, 20), 24, 28, 32)
  rmses <- sapply(nFeatures, \(nF) {
    message(nF, ' features')

    # Run Funk MF
    set.seed(1)
    funkResult <- funk(Uidx, Vidx, residuals_train, nFeatures = nF, steps = 500)
    U <- funkResult$U
    V <- funkResult$V

    # Uidx[u] is the row index of user u in matrix U
    # Vidx[v] is the row index of movie v in matrix V
    predicted_ratings_funk <- edx_test |>
      mutate(pred = previous_test +
               map2_dbl(userId, movieId, \(u, v) U[Uidx[u], ] %*% V[Vidx[v], ])) |>
      pull(pred)
    rmse <- RMSE(predicted_ratings_funk, edx_test$rating)
    message(rmse, '\n')
    return(rmse)
  })
  save(nFeatures, rmses, file = 'funk_tuning.Rdata')
}
set.seed(1)

# Load RMSE data from file
load('funk_tuning.Rdata')

# Plot RMSE against number of MF features.
par(cex = 0.7)
qplot(nFeatures, rmses, xlab = 'rank(UV)', ylab = 'RMSE', geom = c('point', 'line'))
```
The optimal number of latent features is:
```r
nFeaturesOpt <- nFeatures[which.min(rmses)]
nFeaturesOpt
## [1] 14
```
3.6 Final matrix factorization and RMSE values
Using the new model with \(k=14\) to predict ratings for the `edx_test` set gives the following RMSE values:
```r
# Run Funk MF if saved file not found
set.seed(1)
if (!file.exists('funk.Rdata')) {
  funkResult <-
    funk(Uidx, Vidx, residuals_train, nFeatures = nFeaturesOpt, steps = 500)
  save(nFeaturesOpt, funkResult, file = 'funk.Rdata')
}
set.seed(1)

# Load MF data from file
load('funk.Rdata')
U <- funkResult$U
V <- funkResult$V

# Uidx[u] is the row index of user u in matrix U
# Vidx[v] is the row index of movie v in matrix V
predicted_ratings_funk <- edx_test |>
  mutate(pred = previous_test +
           map2_dbl(userId, movieId, \(u, v) U[Uidx[u], ] %*% V[Vidx[v], ])) |>
  pull(pred)
rmse <- RMSE(predicted_ratings_funk, edx_test$rating)

# Compute RMSE and add to data.table
RMSEs <- RMSEs |>
  add_row(Method = "Section 2 best model + Matrix factorization",
          RMSE = RMSE(predicted_ratings_funk, edx_test$rating),
          "RMSE (clamped estimates)" =
            RMSE(clamp(predicted_ratings_funk), edx_test$rating))
RMSEs[nrow(RMSEs), ] |>
  kable(align = 'lrr', booktabs = T) |> row_spec(0, bold = T)
```
| Method | RMSE | RMSE (clamped estimates) |
|:---|---:|---:|
| Section 2 best model + Matrix factorization | 0.7949206 | 0.7939817 |
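The `clamp` helper used above is defined earlier in the report; a minimal equivalent, assuming it simply clips predictions to the valid MovieLens rating range of 0.5 to 5, would be:

```r
# Hypothetical stand-in for the report's clamp(): clip out-of-range
# predictions to the rating scale (assumed to be [0.5, 5]).
clamp <- function(x, lo = 0.5, hi = 5) pmin(pmax(x, lo), hi)
clamp(c(-1, 3.7, 6.2))  # 0.5 3.7 5.0
```

Clamping can only reduce the squared error of out-of-range predictions, which explains why the clamped RMSE column is never worse than the raw one.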
The RMSEs of all models in this report, evaluated using `edx_test`, are as follows:
```r
RMSEs |> kable(align = 'lrr', booktabs = T, linesep = '') |> row_spec(0, bold = T)
```
| Method | RMSE | RMSE (clamped estimates) |
|:---|---:|---:|
| Mean only | 1.0600537 | 1.0600537 |
| Movie effects | 0.9429615 | 0.9429615 |
| Movie + user effects | 0.8646843 | 0.8644818 |
| Movie + user + genre effects | 0.8643241 | 0.8641138 |
| Movie + user + genre + time effects | 0.8641266 | 0.8639174 |
| Movie + user + genre + time effects (regularized) | 0.8636151 | 0.8634932 |
| Section 2 best model + Matrix factorization | 0.7949206 | 0.7939817 |
We now “submit” our best model, i.e. \[\begin{equation} Y_{u,i} \sim \mu + b_{1;i} + b_{2;u} + b_{3;g(i)} + f(t_{u,i}) + UV^\mathrm{T} + \varepsilon_{u,i}, \tag{3.1} \end{equation}\]
with parameters `mu`, `movie_biases_reg`, `user_biases_reg`, `genre_biases_reg`, `f_t`, `U`, and `V`, for final validation.