Recommender Systems: An Overview of the State of the Art

In the context of the TWIRL project, we performed a study on the current state of the art in recommender systems.

Simply stated, a recommender system is an algorithm that tries to predict the preference that a user would give to an item (such as a book, a movie, or even a person) that the user has not yet considered. In this way, such systems try to overcome the problem of "information overload": instead of making the user responsible for finding the relevant items in an overwhelming amount of information, the system filters the information and recommends only those items that it predicts the user would like. Such systems have become common practice among a large majority of Internet content providers and retailers, of which Facebook ("People you may know"), Amazon ("Customers Who Bought This Item Also Bought") and YouTube ("Suggested videos") are the most prominent examples.

Predicting user preferences is of course a complex challenge, and is generally approached in two ways:

  • Content-based recommenders recommend items similar to the ones a user preferred in the past, based on the characteristics of these items' content. A movie recommender, for example, compares a movie profile, consisting of characteristics such as actors, directors, genres, and subject matter, to a user profile constructed from the characteristics shared by the movies the user has rated highly in the past.
  • Collaborative filtering recommenders recommend to a user the items that users with similar tastes and preferences liked in the past. They derive user similarity from information about the users' past behavior and activities, such as buying or rating an item. A movie recommender, for example, finds users with similar tastes in movies, based on the fact that they rate the same movies similarly. The movies that these "similar" users like most and that the target user has not yet seen are then recommended to the target user.
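
To make the two approaches concrete, the following is a minimal sketch in Python, not a production implementation. All data, names and parameters (RATINGS, FEATURES, liked_threshold, the users and their ratings) are hypothetical and chosen only for illustration. It also assumes cosine similarity as the similarity measure and treats unrated movies as zeros, which is one common simplification among many possible choices.

```python
from math import sqrt

# Hypothetical toy data: user -> {movie: rating on a 1-5 scale}.
RATINGS = {
    "ann": {"Alien": 5, "Blade Runner": 4, "Notting Hill": 1},
    "bob": {"Alien": 5, "Blade Runner": 5, "Heat": 4},
    "cat": {"Notting Hill": 5, "Amelie": 4, "Alien": 1},
}

# Hypothetical movie profiles: movie -> set of content characteristics.
FEATURES = {
    "Alien": {"sci-fi", "horror", "Ridley Scott"},
    "Blade Runner": {"sci-fi", "noir", "Ridley Scott"},
    "Notting Hill": {"romance", "comedy"},
    "Heat": {"crime", "thriller"},
    "Amelie": {"romance", "comedy", "french"},
    "The Martian": {"sci-fi", "Ridley Scott"},
}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def content_based(user, liked_threshold=4):
    """Rank unseen movies by similarity of movie profile to user profile."""
    seen = RATINGS[user]
    # User profile: how often each characteristic occurs among the
    # movies the user rated at or above the (assumed) threshold.
    profile = {}
    for movie, rating in seen.items():
        if rating >= liked_threshold:
            for feature in FEATURES[movie]:
                profile[feature] = profile.get(feature, 0) + 1
    scores = {
        movie: cosine(profile, {feature: 1 for feature in features})
        for movie, features in FEATURES.items()
        if movie not in seen
    }
    return sorted(scores, key=scores.get, reverse=True)


def collaborative(user):
    """Rank unseen movies by the similarity-weighted ratings of other users."""
    own = RATINGS[user]
    scores = {}
    for other, theirs in RATINGS.items():
        if other == user:
            continue
        # Users who rate the same movies similarly get a high similarity.
        similarity = cosine(own, theirs)
        for movie, rating in theirs.items():
            if movie not in own:
                scores[movie] = scores.get(movie, 0.0) + similarity * rating
    return sorted(scores, key=scores.get, reverse=True)


print(content_based("ann"))
print(collaborative("ann"))
```

With this toy data, the content-based ranking for "ann" puts "The Martian" first, because it shares characteristics with the movies she rated highly. The collaborative ranking puts "Heat" first, because "bob", whose ratings resemble hers most, liked it. The collaborative ranking cannot contain "The Martian" at all, since nobody has rated it yet.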

Both approaches face challenges. These include the cold start problem, which occurs when new users or new items are added to the system and no information or ratings about them are available yet; overspecialization, where the user is recommended items that are too similar to those already rated; and scalability, where dealing with a large number of users or items requires significant computational resources.

Our report on the state of the art does not present an exhaustive overview of all technologies that have been presented in the literature. Rather, we present a general introduction to the topic and discuss major emerging challenges. For a more in-depth look, we refer the reader to a number of qualitative surveys. Feel free to contact Tom Tourwé or Elena Tsiporkova to learn more.