Implementation and evaluation of a collaborative-filtering recommender system with Spark on the MovieLens dataset, with a comparison to a single-machine implementation and the application of an approximate nearest-neighbor method for accelerated search at query time.

Report Repository

Abstract

We build and evaluate a collaborative-filtering recommender system using Spark’s alternating least squares (ALS) implementation on the MovieLens dataset. We compare Spark’s parallel ALS model to the LensKit single-machine implementation in terms of efficiency and model performance. Finally, we implement accelerated search at query time using the Annoy spatial data structure, and compare this fast search method to the brute-force approach in terms of query efficiency and recommendation quality.
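To make the query-time comparison concrete, the following is a minimal sketch of the brute-force baseline and of one way to score an approximate result against it. It is not the project's code: the random factor vectors stand in for latent factors learned by ALS, the random-subset search is only a placeholder for an approximate index such as Annoy, and the names `brute_force_top_k` and `recall_at_k` are hypothetical.

```python
import heapq
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def brute_force_top_k(user_vec, item_vecs, k):
    """Score every item exactly and return the ids of the k highest-scoring items."""
    scored = ((dot(user_vec, vec), item_id) for item_id, vec in item_vecs.items())
    return [item_id for _, item_id in heapq.nlargest(k, scored)]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k that the approximate result recovered."""
    exact = set(exact_ids)
    return len(exact.intersection(approx_ids)) / len(exact) if exact else 0.0


# Assumption: random vectors stand in for ALS user/item latent factors.
random.seed(0)
rank = 8
item_vecs = {i: [random.gauss(0, 1) for _ in range(rank)] for i in range(200)}
user_vec = [random.gauss(0, 1) for _ in range(rank)]

exact = brute_force_top_k(user_vec, item_vecs, k=10)

# Placeholder for an approximate index: search only a random half of the items.
subset_ids = random.sample(sorted(item_vecs), 100)
subset = {i: item_vecs[i] for i in subset_ids}
approx = brute_force_top_k(user_vec, subset, k=10)

print("recall@10 of the placeholder search:", recall_at_k(approx, exact))
```

In this framing, query efficiency is the time per call of the search function, and recommendation quality can be summarized by how much of the exact top-k list the faster method recovers.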

Authors

  • Luigi Noto
  • Giacomo Bugli
  • Guilherme Albertini