Collaborative Filtering

Collaborative filtering is a method of predicting the interests of a single user from the preferences collected from many users.

Machine Learning Models used for collaborative filtering include:

The algorithm in the example below uses a cosine similarity function to measure the similarity between vectors of an inner product space.
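As a rough illustration of that measure, the sketch below computes cosine similarity in plain Python. The function name `cosine_similarity` is an assumption made for illustration and is not part of any library. The two sample vectors hold the ratings that Joe, Sue, Fred, and Jane give to items 1 and 2 in the example data on this page.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Ratings from the four users who rated both items (Joe, Sue, Fred, Jane).
item_1_ratings = [2, 2, 3, 4]
item_2_ratings = [3, 4, 1, 5]

similarity = cosine_similarity(item_1_ratings, item_2_ratings)
print(similarity)  # approximately 0.902
```

A value near 1 means the two rating vectors point in almost the same direction. Because ratings are never negative, the cosine similarity between rating vectors always falls between 0 and 1.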

Python Example

This example uses the KNNWithMeans nearest-neighbors algorithm from the scikit-surprise library as the collaborative filtering model.

"""
collaborative_filtering_with_scikit-surprise.py
creates and tests a collaborative filtering model
"""

# Import the needed modules and classes.
import pandas
from surprise import Dataset
from surprise import Reader
from surprise import KNNWithMeans

# Define parameters.
lowest_rating = 1
highest_rating = 5
similarity_function = "cosine"
# False computes similarities between items; True would compare users.
user_based_similarities = False
similarity_options = {
    "name": similarity_function,
    "user_based": user_based_similarities}
data_frame_columns = ["user", "item", "rating"]
ratings_dictionary = {
    "item": [1, 2, 1, 2, 1, 2, 1, 2, 1],
    "user": ['Joe', 'Joe', 'Sue', 'Sue', 'Fred', 'Fred', 'Jane', 'Jane', 'Tom'],
    "rating": [2, 3, 2, 4, 3, 1, 4, 5, 1]}
prediction_user = "Tom"
prediction_item = 2

# Create a pandas data frame using the ratings dictionary.
data_frame = pandas.DataFrame(ratings_dictionary)

# Define a data reader.
reader = Reader(rating_scale=(lowest_rating, highest_rating))

# Load data from the data frame using the reader.
data = Dataset.load_from_df(data_frame[data_frame_columns], reader)

# Define a K Nearest Neighbors algorithm.
knn_algorithm = KNNWithMeans(sim_options=similarity_options)

# Create a training dataset.
training_data = data.build_full_trainset()

# Train the algorithm.
knn_algorithm.fit(training_data)

# Predict the rating for a user-item pair that is not in the training data.
prediction = knn_algorithm.predict(prediction_user, prediction_item)
predicted_rating = prediction.est
predicted_rating_rounded = round(predicted_rating, 0)
print("Predicted Rating:")
print(predicted_rating)
print("Predicted Rating Rounded:")
print(predicted_rating_rounded)

The results are shown below:

Computing the cosine similarity matrix...
Done computing similarity matrix.
Predicted Rating:
1.85
Predicted Rating Rounded:
2.0
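
To see where the 1.85 comes from, the prediction can be reproduced by hand. The sketch below is a simplified stand-in for the library, not its actual implementation. It assumes the mean-centered, item-based formula documented for KNNWithMeans: the mean rating of the target item, plus a similarity-weighted average of how far the user's ratings of neighboring items deviate from those items' means. The helper names (`item_mean`, `item_cosine`, `predict`) are hypothetical, and the sketch omits the library's neighbor limit, minimum-support check, and clipping to the rating scale.

```python
import math

# The same ratings as in the example above, keyed by user and then by item.
ratings = {
    "Joe": {1: 2, 2: 3},
    "Sue": {1: 2, 2: 4},
    "Fred": {1: 3, 2: 1},
    "Jane": {1: 4, 2: 5},
    "Tom": {1: 1},
}


def item_mean(item):
    """Mean of all ratings given to an item."""
    values = [r[item] for r in ratings.values() if item in r]
    return sum(values) / len(values)


def item_cosine(item_a, item_b):
    """Cosine similarity between two items over the users who rated both."""
    common = [u for u, r in ratings.items() if item_a in r and item_b in r]
    dot = sum(ratings[u][item_a] * ratings[u][item_b] for u in common)
    norm_a = math.sqrt(sum(ratings[u][item_a] ** 2 for u in common))
    norm_b = math.sqrt(sum(ratings[u][item_b] ** 2 for u in common))
    return dot / (norm_a * norm_b)


def predict(user, item):
    """Item mean plus the weighted mean deviation of the user's other ratings."""
    neighbors = [j for j in ratings[user] if j != item]
    weighted = sum(
        item_cosine(item, j) * (ratings[user][j] - item_mean(j))
        for j in neighbors
    )
    total = sum(item_cosine(item, j) for j in neighbors)
    return item_mean(item) + weighted / total


estimate = predict("Tom", 2)
print(round(estimate, 2))  # 1.85
```

Tom has rated only item 1, so the similarity weight cancels out: the estimate is the mean of item 2 (3.25) plus Tom's deviation from the mean of item 1 (1 - 2.4 = -1.4), which gives 1.85.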