Collaborative Filtering
Collaborative Filtering is a method of making predictions about the interests of a single user by collecting preferences from many users.
Machine learning models used for collaborative filtering include nearest neighbors algorithms, such as the one used in the example below.
The algorithm in the example below uses a cosine similarity function, which measures the similarity between two vectors of an inner product space as the cosine of the angle between them.
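As a rough illustration of the cosine similarity function, the sketch below computes it for two short rating vectors using only the Python standard library. The function name cosine_similarity and the two vectors are assumptions made for illustration (each vector is taken to hold one item's ratings from the same four users); this is not the code of any particular library.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors.

    Hypothetical helper for illustration; assumes neither vector is all zeros.
    """
    dot_product = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot_product / (norm_u * norm_v)


# Illustrative ratings of two items by the same four users (assumed data).
item_1_ratings = [2, 2, 3, 4]
item_2_ratings = [3, 4, 1, 5]

similarity = cosine_similarity(item_1_ratings, item_2_ratings)
print("Cosine similarity:")
print(similarity)
```

A value close to 1 means the two vectors point in nearly the same direction, and a value of 0 means they are orthogonal. With non-negative ratings the result always falls between 0 and 1.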
Python Example
This example code uses the KNNWithMeans nearest neighbors algorithm from the scikit-surprise library for the collaborative filtering model.
"""
collaborative_filtering_with_scikit-surprise.py
creates and tests a collaborative filtering model
"""

# Import needed functions.
import pandas
from surprise import Dataset
from surprise import Reader
from surprise import KNNWithMeans

# Define parameters.
lowest_rating = 1
highest_rating = 5
similarity_function = "cosine"
user_based_similarities = False
similarity_options = {
    "name": similarity_function,
    "user_based": user_based_similarities}
data_frame_columns = ["user", "item", "rating"]
ratings_dictionary = {
    "item": [1, 2, 1, 2, 1, 2, 1, 2, 1],
    "user": ['Joe', 'Joe', 'Sue', 'Sue', 'Fred', 'Fred', 'Jane', 'Jane', 'Tom'],
    "rating": [2, 3, 2, 4, 3, 1, 4, 5, 1]}
prediction_user = "Tom"
prediction_item = 2

# Create a pandas data frame using the ratings dictionary.
data_frame = pandas.DataFrame(ratings_dictionary)

# Define a data reader.
reader = Reader(rating_scale=(lowest_rating, highest_rating))

# Load data from the data frame using the reader.
data = Dataset.load_from_df(data_frame[data_frame_columns], reader)

# Define a K Nearest Neighbors algorithm.
knn_algorithm = KNNWithMeans(sim_options=similarity_options)

# Create a training dataset.
training_data = data.build_full_trainset()

# Train the algorithm.
knn_algorithm.fit(training_data)

# Process a prediction for an unknown user item rating.
prediction = knn_algorithm.predict(prediction_user, prediction_item)
predicted_rating = prediction.est
predicted_rating_rounded = round(predicted_rating, 0)
print("Predicted Rating:")
print(predicted_rating)
print("Predicted Rating Rounded:")
print(predicted_rating_rounded)

The results are shown below:

Computing the cosine similarity matrix
Done computing similarity matrix.
Predicted Rating:
1.85
Predicted Rating Rounded:
2.0
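The predicted rating of 1.85 can be checked by hand. The sketch below uses the same ratings and applies a mean-centered, item-based nearest neighbors formula: the target item's mean rating plus the similarity-weighted average of how far the user's other ratings sit from their own item means. This is an assumption about what the model computes for this small data set, not the library's own code, and the helper names item_mean, item_cosine, and predict are hypothetical.

```python
import math

# Ratings from the example above, keyed by (user, item).
ratings = {
    ("Joe", 1): 2, ("Joe", 2): 3,
    ("Sue", 1): 2, ("Sue", 2): 4,
    ("Fred", 1): 3, ("Fred", 2): 1,
    ("Jane", 1): 4, ("Jane", 2): 5,
    ("Tom", 1): 1,
}


def item_mean(item):
    """Mean of all ratings given to an item."""
    values = [r for (_, i), r in ratings.items() if i == item]
    return sum(values) / len(values)


def item_cosine(item_a, item_b):
    """Cosine similarity of two items over the users who rated both.

    Assumes at least one user rated both items.
    """
    users_a = {u for (u, i) in ratings if i == item_a}
    users_b = {u for (u, i) in ratings if i == item_b}
    common_users = users_a & users_b
    dot = sum(ratings[(u, item_a)] * ratings[(u, item_b)] for u in common_users)
    norm_a = math.sqrt(sum(ratings[(u, item_a)] ** 2 for u in common_users))
    norm_b = math.sqrt(sum(ratings[(u, item_b)] ** 2 for u in common_users))
    return dot / (norm_a * norm_b)


def predict(user, item):
    """Mean-centered, item-based estimate of a user's rating for an item."""
    # Neighbors are the other items this user has already rated.
    neighbors = [i for (u, i) in ratings if u == user and i != item]
    numerator = 0.0
    denominator = 0.0
    for neighbor in neighbors:
        similarity = item_cosine(item, neighbor)
        numerator += similarity * (ratings[(user, neighbor)] - item_mean(neighbor))
        denominator += similarity
    if denominator == 0:
        # No usable neighbors: fall back to the item's mean rating.
        return item_mean(item)
    return item_mean(item) + numerator / denominator


predicted_rating = predict("Tom", 2)
print("Predicted Rating:")
print(round(predicted_rating, 2))
```

Tom has rated only item 1, so item 1 is the single neighbor and the similarity weight cancels out. The estimate reduces to the mean of item 2 (3.25) plus Tom's deviation from the mean of item 1 (1 - 2.4 = -1.4), which gives 1.85.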