Saturday, May 05, 2018

Singular Value Decomposition for Recommendations with Tensorflow


I recently completed (auditing) the Matrix Factorization and Advanced Techniques course on Coursera, conducted by the same people from the University of Minnesota who gave us the Lenskit project and an earlier awesome course on Recommender Systems on Coursera. Since I am auditing the course, Coursera no longer allows me to submit answers to quizzes and assignments. That is a good and a bad thing - while I can no longer check my understanding of the material by submitting the quizzes, it also frees me to do the assignment using any toolset that I choose.

One other interesting insight I got from the course was that Matrix Factorization could be formulated as a Gradient Descent problem. That is actually what led me to Tensorflow (TF) as my toolset of choice for this task. While TF is best known for building Deep Learning (DL) models, at its core it's a symbolic math library for high performance numerical computions that can be used to solve many kinds of numerical optimization problems that are common in Machine Learning (ML), with Gradient Descent being the central algorithm that is most (all?) of TF's built-in optimizers are based on.

The assignment was to investigate the Singular Value Decomposition (SVD) algorithm as a Matrix Factorization technique to do recommendations. The ratings matrix is de-biased using different techniques, and each de-biased matrix is decomposed an approximation reconstructed using various values of k. In this post, I describe how I did this using TF and Python. We start by setting up some imports and constants to refer to data files locations.

1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
import matplotlib.pyplot as plt
import numpy as np
import os
import pandas as pd
import tensorflow as tf

DATA_DIR = "../data"
MOVIE_FILE = os.path.join(DATA_DIR, "movies.csv")
RATINGS_FILE = os.path.join(DATA_DIR, "ratings.csv")
RESULTS_FILE = os.path.join(DATA_DIR, "eval_results.csv")
CONSOLE_FILE = os.path.join(DATA_DIR, "eval_console.txt")

The first step is extracting the data to a matrix format. We are supplied a file movies.csv which contains the movie_id and title, and a file ratings.csv which contains the user_id, movie_id and a 4-point rating. The movies.csv file is needed to report back recommendations to the user in terms of movie names. We use pandas to read the files into memory, and create few lookup tables that we will need later.

1
2
movies_df = pd.read_csv(MOVIE_FILE)
movies_df.head()


1
2
ratings_df = pd.read_csv(RATINGS_FILE)
ratings_df.head()


We then set up various lookup tables that we will need to translate ratings into matrix elements and recommendations from the matrix factorization back to human readable form.

1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
movies_id2name, movies_name2id = {}, {}
ids = movies_df["movieId"].values
titles = movies_df["title"].values
for id, title in zip(ids, titles):
    movies_id2name[id] = title
    movies_name2id[title] = id

movies_id2idx, movies_idx2id = {}, {}
for idx, movie_id in enumerate(ratings_df["movieId"].unique()):
    movies_id2idx[movie_id] = idx
    movies_idx2id[idx] = movie_id

users_id2idx, users_idx2id = {}, {}
for idx, user_id in enumerate(ratings_df["userId"].unique()):
    users_id2idx[user_id] = idx
    users_idx2id[idx] = user_id

num_users = len(users_id2idx)
num_movies = len(movies_id2idx)

Finally, we are ready to construct the matrix. We choose to use a dense matrix full of zeros to start with, where rows represent users and columns represent movies. We have 862 users and 2500 items, so our matrix R_val is of shape (862, 2500). We use the lookup tables we just generated to fill out the ratings for each (user, movie) pair that we have information for.

1
2
3
4
5
6
7
8
9
def construct_original_matrix(num_users, num_movies, ratings_df, 
                          users_id2idx, movies_id2idx):
    X = np.zeros((num_users, num_movies), dtype=np.float32)
    for user_id, movie_id, rating in ratings_df[["userId", "movieId", "rating"]].values:
        X[users_id2idx[user_id], movies_id2idx[movie_id]] = rating
    return X

R_val = construct_original_matrix(num_users, num_movies, ratings_df,
                                  users_id2idx, movies_id2idx)

The next step is to compute different kinds of bias and remove it from the matrix. This allows SVD to be more effective, as we will see in the results later. Bias can be split up into global bg, user bu and user-item bui, which represent the bias across all users, bias per user and bias per item per user.


Since this is a sparse matrix, we will treat all zero entries as unknown, and only consider non-zero entries for computing the bias. This is done using the code below. As you can see, this is essentially computing and removing average values of all non-zero entries along different axes.

1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
def compute_bias(X, bias_type):
    Xc = X.copy()
    Xc[Xc == 0] = np.nan
    if bias_type == "global":
        return np.nanmean(Xc)
    elif bias_type == "user":
        return np.mean(np.nanmean(Xc, axis=0))
    elif bias_type == "item":
        return np.mean(np.nanmean(Xc, axis=1))
    else:
        raise Exception("invalid bias type")


def remove_bias(X, bias):
    Xc = X.copy()
    Xc[Xc == 0] = np.nan
    Xc = np.subtract(Xc, bias)
    Xc = np.nan_to_num(Xc)
    return Xc

        
bg = compute_bias(R_val, "global")
Rg_val = remove_bias(R_val, bg)

bu = compute_bias(Rg_val, "user")
Ru_val = remove_bias(Rg_val, bu)

bi = compute_bias(Rg_val, "item")
Ri_val = remove_bias(Rg_val, bi)

bui = compute_bias(Ru_val, "item")
Rui_val = remove_bias(Ru_val, bui)

Now that we have our different debiased input matrices, the next step is to decompose it into its constituents using SVD, and recompose it to an approximation using smaller dimensions. This will give us a matrix whose reduced columns correspond to user-tastes rather than actual movies, so we can make predictions of how a user will rank the movies they haven't seen already, or rank movies according to how the user would like them. We use TF's built-in svd() call here.


The graph will factorize the input matrix R using SVD into matrices U, S and V, recompose reduced versions of the three matrices to get an approximation R'. It then computes the reconstruction error (RMSE) between R and R', and also the proportion of explained variance R2 and returns it.

1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
def reconstruct_using_svd(X, k):
    
    if k == 0:
        return X, 1., 0.
    
    graph = tf.Graph()
    with graph.as_default():
        
        # input arg
        R = tf.placeholder(tf.float32, shape=(num_users, num_movies), name="R")
        
        # run SVD
        S, U, Vt = tf.svd(R, full_matrices=True)
        
        # reduce dimensions
        Sk = tf.diag(S)[0:k, 0:k]
        Uk = U[:, 0:k]
        Vk = tf.transpose(Vt)[0:k, :]

        # reconstruct matrix
        Rprime = tf.matmul(Uk, tf.matmul(Sk, Vk))

        
        # compute reconstruction RMSE
        rsquared = tf.linalg.norm(Rprime) / tf.linalg.norm(R)
        rmse = tf.metrics.root_mean_squared_error(R, Rprime)[1]
        
    with tf.Session(graph=graph) as sess:
        
        tf.global_variables_initializer().run()
        tf.local_variables_initializer().run()
        
        [Rprime_val, rsquared_val, rmse_val] = sess.run(
            [Rprime, rsquared, rmse], feed_dict={R: X})
        return Rprime_val, rsquared_val, rmse_val


R_rec, rsquared, rec_err = reconstruct_using_svd(R_val, 10)

print("reconstruction error (RMSE):", rec_err)
print("percentage of variance explained: {:.3f}".format(rsquared))
print("shape of reconstructed matrix: ", R_rec.shape)

We also create a method predict for predicting ratings that a user might give to a movie he hasn't seen yet, and a method recommend for recommending movies for a given user. Both methods take in a de-biased matrix and the bias to apply to the final results.

1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
def predict(user_id, movie_ids, X_rec, bias, 
            users_id2idx, movies_id2idx, movies_id2name):
    predictions = []
    for movie_id in sorted(movie_ids):
        user_idx = users_id2idx[user_id]
        movie_idx = movies_id2idx[movie_id]
        movie_name = movies_id2name[movie_id]
        prediction = bias + X_rec[user_idx, movie_idx]
        predictions.append((user_id, movie_id, movie_name, prediction))
    return predictions


def recommend(user_id, X_rec, top_n, bias, 
              users_id2idx, movies_idx2id, movies_id2name):
    user_idx = users_id2idx[user_id]
    rec_movies_idxs = np.argsort(-1 * X_rec[user_idx])[0:top_n]
    recommendations = []
    for movie_idx in rec_movies_idxs:
        movie_id = movies_idx2id[movie_idx]
        movie_name = movies_id2name[movie_id]
        pred_rating = bias + X_rec[user_idx, movie_idx]
        recommendations.append((user_id, movie_id, movie_name, pred_rating))
    return recommendations


R_rec, _, _ = reconstruct_using_svd(Rui_val, 50)
preds = predict(320, [260, 153, 527, 588], R_rec, bg + bu + bui, users_id2idx, 
                movies_id2idx, movies_id2name)
for pred in preds:
    print(pred)
print("---")
R_rec, _, _ = reconstruct_using_svd(Ri_val, 50)
recs = recommend(320, R_rec, 10, bg + bi, users_id2idx, movies_idx2id, movies_id2name)
for rec in recs:
    print(rec)

Here, we invoke these methods with parameters that are shown in the examples in the instructions. Unfortunately, my results don't match the example results provided. My suspicion is that there may be implementation differences between the Lenskit stack and the TF stack that are causing these differences, but I can't say for sure. If you notice something I am doing wrong, please let me know and I will fix it and report my findings again crediting you with the fix.

(320, 153, 'Batman Forever (1995)', 3.5883482)
(320, 260, 'Star Wars: Episode IV - A New Hope (1977)', 4.031598)
(320, 527, "Schindler's List (1993)", 3.769835)
(320, 588, 'Aladdin (1992)', 3.5995538)
---
(320, 2571, 'Matrix, The (1999)', 4.281472)
(320, 2959, 'Fight Club (1999)', 4.2387533)
(320, 4993, 'Lord of the Rings: The Fellowship of the Ring, The (2001)', 4.0973997)
(320, 296, 'Pulp Fiction (1994)', 4.089973)
(320, 1196, 'Star Wars: Episode V - The Empire Strikes Back (1980)', 4.0807734)
(320, 1136, 'Monty Python and the Holy Grail (1975)', 4.055152)
(320, 5952, 'Lord of the Rings: The Two Towers, The (2002)', 4.0527725)
(320, 7153, 'Lord of the Rings: The Return of the King, The (2003)', 4.0478134)
(320, 260, 'Star Wars: Episode IV - A New Hope (1977)', 4.031599)
(320, 4973, "Amelie (Fabuleux destin d'Amélie Poulain, Le) (2001)", 4.020437)

Now that we have all our pieces in place, we construct our evaluation loop, where we re-run the matrix factorization using different bias models to de-bias the input ratings matrix R, and reconstruct approximations with different values of k, and report the reconstruction error (RMSE) and the proportion of explained variance (R2) for each bias-model and k combination. I also write out the output of the predict and recommend functions above for a specific user and a set of movies for each combination into an external file where they can be read from.

1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
def print_console(fconsole, X, k, bias_model, bias,
                  users_id2idx, movies_id2idx, movies_idx2id, 
                  movies_id2name):
    fconsole.write("#### predict -PuserId=320 -PitemIds=260,153,527,588 -PbiasModel={:s} -PfeatureCount={:d}\n"
                   .format(bias_model, k))
    predictions = predict(320, [260, 153, 527, 588], X, bias, 
                          users_id2idx, movies_id2idx, movies_id2name)
    for user_id, movie_id, movie_name, prediction in predictions:
        fconsole.write("\t{:d}\t{:d}\t{:s}\t{:.3f}\n"
            .format(user_id, movie_id, movie_name, prediction))
    fconsole.write("#### recommend -PuserId=320 -PbiasModel={:s} -PfeatureCount={:d}\n"
                   .format(bias_model, k))
    recommendations = recommend(320, X, 10, bias, users_id2idx, 
                                movies_idx2id, movies_id2name)
    for user_id, movie_id, movie_name, pred_rating in recommendations:
        fconsole.write("\t{:d}\t{:d}\t{:s}\t{:.3f}\n"
            .format(user_id, movie_id, movie_name, pred_rating))
    fconsole.write("\n")


kvalues = [0, 1, 5, 10, 15, 20, 25, 30, 40, 50]
bias_models = ["none", "global", "user", "item", "user-item"]

fresults = open(RESULTS_FILE, "w")
fconsole = open(CONSOLE_FILE, "w")
for k in kvalues:
    print("running evaluation for k={:d}".format(k))
    # reconstruct original matrix
    R_val = np.zeros((num_users, num_movies), dtype=np.float32)
    for user_id, movie_id, rating in ratings_df[["userId", "movieId", "rating"]].values:
        R_val[users_id2idx[user_id], movies_id2idx[movie_id]] = rating
    # no bias
    bias_model = bias_models[0]
    R_rec, rsquared, rmse = reconstruct_using_svd(R_val, k)
    print(".. k={:d}, bias-model={:s}, reconstruction: R^2={:.3f}, RMSE={:.3f}"
          .format(k, bias_model, rsquared, rmse))
    fresults.write("{:d}\t{:s}\t{:.7f}\t{:.7f}\n".format(k, bias_model, 
                                                         rsquared, rmse))
    print_console(fconsole, R_rec, k, bias_model, 0, users_id2idx, 
                  movies_id2idx, movies_idx2id, movies_id2name)
    # global bias
    bias_model = bias_models[1]
    bg = compute_bias(R_val, bias_model)
    Rg_val = remove_bias(R_val, bg)
    R_rec, rsquared, rmse = reconstruct_using_svd(Rg_val, k)
    print(".. k={:d}, bias-model={:s}, reconstruction: R^2={:.3f}, RMSE={:.3f}"
          .format(k, bias_model, rsquared, rmse))
    fresults.write("{:d}\t{:s}\t{:.7f}\t{:.7f}\n".format(k, bias_model, 
                                                         rsquared, rmse))
    print_console(fconsole, R_rec, k, bias_model, bg, users_id2idx, 
                  movies_id2idx, movies_idx2id, movies_id2name)
    # global + user bias
    bias_model = bias_models[2]
    bu = compute_bias(Rg_val, bias_model)
    Ru_val = remove_bias(Rg_val, bu)
    R_rec, rsquared, rmse = reconstruct_using_svd(Ru_val, k)
    print(".. k={:d}, bias-model={:s}, reconstruction: R^2={:.3f}, RMSE={:.3f}"
          .format(k, bias_model, rsquared, rmse))
    fresults.write("{:d}\t{:s}\t{:.7f}\t{:.7f}\n".format(k, bias_model, 
                                                         rsquared, rmse))
    print_console(fconsole, R_rec, k, bias_model, bg + bu, users_id2idx, 
                  movies_id2idx, movies_idx2id, movies_id2name)
    # global + item bias
    bias_model = bias_models[3]
    bi = compute_bias(Rg_val, bias_model)
    Ri_val = remove_bias(Rg_val, bi)
    R_rec, rsquared, rmse = reconstruct_using_svd(Ri_val, k)
    print(".. k={:d}, bias-model={:s}, reconstruction: R^2={:.3f}, RMSE={:.3f}"
          .format(k, bias_model, rsquared, rmse))
    fresults.write("{:d}\t{:s}\t{:.7f}\t{:.7f}\n".format(k, bias_model, 
                                                         rsquared, rmse))
    print_console(fconsole, R_rec, k, bias_model, bg + bi, users_id2idx, 
                  movies_id2idx, movies_idx2id, movies_id2name)
    # global + user bias + item bias (user-item)
    bias_model = bias_models[4]
    bui = compute_bias(Ru_val, "item")
    Rui_val = remove_bias(Ru_val, bui)
    R_rec, rsquared, rmse = reconstruct_using_svd(Rui_val, k)
    print(".. k={:d}, bias-model={:s}, reconstruction: R^2={:.3f}, RMSE={:.3f}"
          .format(k, bias_model, rsquared, rmse))
    fresults.write("{:d}\t{:s}\t{:.7f}\t{:.7f}\n".format(k, bias_model, 
                                                         rsquared, rmse))
    print_console(fconsole, R_rec, k, bias_model, bg + bu + bui, users_id2idx, 
                  movies_id2idx, movies_idx2id, movies_id2name)

fresults.close()
fconsole.close()

As can be seen on the chart on the left below, the error (RMSE) between the original and reconstructed for the raw matrix is much higher than the de-biased models. On the chart on the right, we have removed the raw matrix in order to see the RMSE values for the different bias models more clearly. As you can see, the performance of the different bias models are really close, with the global bias model performing the best in this case.






Similarly, the R2 values are also plotted for different values of k and exhibit an increase as the rank k of the reconstructed matrix increases. The user bias model seems to do the best here at explaining the variance between the input and reconstructed matrix.


That's all I had for today. This was the first time I attempted to use TF for something other than Deep Learning. While it is not as comprehensive as standard ML libraries like scikit-learn, it does have functionality for k-means clustering, decision trees, CRF, etc, so definitely something to look at further in the future.