Let's turn to the task of actually implementing collaborative filtering and look at the complexity of the implementation. The most expensive step is finding the k most similar users, or the k most similar items: the neighborhood of a user or an item. Recall what we had to do to find the k most similar items: take the item and compute its similarity, for example a cosine or centered-cosine similarity, with respect to every other item. Doing this at query time would mean looking at every entry in the utility matrix, so the complexity of one neighborhood computation is on the order of |U|, the size of the utility matrix. Since the utility matrix can be very large, with millions of users and millions of items, it is clearly impractical to do this at query time. So what we really have to do is precompute: for every item, its set of similar items, or for every user, its set of similar users. But even naive precomputation can take a really long time. Suppose we have n items and we're doing item-item collaborative filtering. For each item, we'd have to compute its similarity to every other item, so the complexity is on the order of n times |U|, the product of the number of items and the size of the utility matrix. Since the utility matrix can be very large and the number of items can be very large, the product of the two is clearly a very large number, and even naive precomputation can be too expensive in practice. So how do we deal with this? In previous lectures, we looked at techniques for near-neighbor search in high dimensions, for example locality-sensitive hashing.
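To make that cost concrete, here is a minimal sketch of the naive precomputation (the function name and matrix layout are illustrative assumptions, not from the lecture), assuming a NumPy utility matrix where 0 marks a missing rating. The nested loop is the expensive step the lecture warns about: every item is compared against every other item.

```python
import numpy as np

def centered_cosine_sims(R):
    """Naively precompute all item-item centered-cosine similarities.

    R is an (n_users x n_items) utility matrix where 0 means "not rated".
    The doubly nested loop is the expensive part: its cost is on the order
    of n_items * |U|, since each item is compared with every other item.
    """
    R = R.astype(float)
    mask = R != 0
    counts = mask.sum(axis=0)
    means = np.divide(R.sum(axis=0), counts,
                      out=np.zeros(R.shape[1]), where=counts > 0)
    C = np.where(mask, R - means, 0.0)   # center each item's ratings
    n_items = R.shape[1]
    S = np.eye(n_items)
    for i in range(n_items):
        for j in range(i + 1, n_items):
            denom = np.linalg.norm(C[:, i]) * np.linalg.norm(C[:, j])
            S[i, j] = S[j, i] = (C[:, i] @ C[:, j] / denom) if denom > 0 else 0.0
    return S
```

Even this tiny sketch makes the scaling visible: doubling the number of items roughly quadruples the work, which is why precomputation has to be made cheaper.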
We can use something like LSH to take an item and quickly find a set of near neighbors to that item, or take a user and find a set of near neighbors to that user, and do that in advance. In practice, you might redo this, say, once a day or once every few hours. We can also use clustering to group users and items into smaller clusters, and thereby speed up the process by restricting the search to within a cluster, as opposed to the entire set of items or users. And finally, we could use a set of techniques called dimensionality-reduction methods, which we'll cover in an upcoming lecture. What are the advantages and disadvantages of collaborative filtering? The biggest advantage of collaborative filtering is that it works for any kind of item: it doesn't matter whether the items are books, music, videos, news articles, or people. Collaborative filtering works on any kind of item without requiring any feature selection, which is actually its biggest advantage, because it turns out to be a tough problem to find the right features for something as complicated as a movie or a piece of news. This also explains the huge popularity of collaborative filtering: it obviates the need for feature selection for complex things such as images, movies, music, and so on. That said, collaborative filtering has a number of disadvantages. The first of these is cold start. For collaborative filtering to work, you need enough users in the system who have rated enough items. Remember, given an item we need to find a set of other similar items, or given a user, a set of other similar users; if there are not enough users in the system, it's hard to find a match. The second problem is sparsity.
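As a rough illustration of the clustering idea, here is a sketch (a hypothetical helper, not from the lecture) that restricts the neighbor search to items sharing a cluster label, so each query compares against one cluster rather than the whole catalog. The labels are assumed to come from some prior clustering step, for example k-means over the item columns.

```python
import numpy as np

def neighbors_within_cluster(C, labels, item, k):
    """Return the k items most similar to `item`, searching only its cluster.

    C: (n_users x n_items) matrix of centered ratings (missing entries are 0).
    labels: one precomputed cluster label per item.
    """
    candidates = np.where(labels == labels[item])[0]
    candidates = candidates[candidates != item]   # exclude the item itself
    sims = np.empty(len(candidates))
    for idx, j in enumerate(candidates):
        denom = np.linalg.norm(C[:, item]) * np.linalg.norm(C[:, j])
        sims[idx] = (C[:, item] @ C[:, j] / denom) if denom > 0 else 0.0
    order = np.argsort(sims)[::-1][:k]            # most similar first
    return candidates[order], sims[order]
```

The trade-off is the usual one for approximate search: if the true nearest neighbor landed in a different cluster, this search will miss it, in exchange for only scanning a fraction of the items.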
Typically there are millions of users and millions of items, and most users have not rated most items, so the user-ratings matrix is very, very sparse indeed. As a result, it's hard to find users who have actually rated the same set of items. Remember, to predict the rating of user x for item i, I need to find other users who have also rated item i, and because of sparsity it can be really hard to find such a set of users. The next problem is the first-rater problem. Suppose we add a new item to the catalog: we cannot recommend this new item to anybody, because it doesn't have any ratings yet. Esoteric items suffer similarly: there may be a few people who really love such an item, but because the number of people who rate it is small, it's hard to make recommendations for those items too. And the final problem of collaborative filtering is popularity bias. Take a really popular movie or book like Harry Potter. Lots of people tend to give high ratings to such a popular book or movie, and collaborative filtering will therefore tend to recommend it to almost everyone in the system, regardless of whether they'd actually like it or not, just because the neighborhood of any item will include popular items. At Amazon we used to call this problem the Harry Potter effect. The Harry Potter effect need not be a bad thing; recommending popular items generally works out well. The problem is that if every recommended item is a popular item, the popular items can completely crowd out the unique recommendations that could be made for a specific user. You have to take special steps to keep popularity bias from washing out those specific recommendations.
Now that we've seen some of the difficulties with collaborative filtering, we can design hybrid methods to work around them. For example, we can add content-based methods to collaborative filtering: we can add item profiles to deal with the new-item problem and make recommendations of new items to users, or we might take new users and use demographic information about them to build synthetic profiles, which deals with the new-user problem. Another approach is to implement two or more different recommender systems and combine their predictions, perhaps using a linear model. For example, you could use a global-baseline recommender and combine it with collaborative filtering. Let's see how to do this. Here's the problem: we want to estimate Joe's rating for the movie The Sixth Sense, but Joe has not rated any movies similar to The Sixth Sense according to our measure of similarity. So, using the collaborative-filtering formula we saw in the previous slides, we can make no prediction for Joe's rating for The Sixth Sense. This is a problem that arises from the sparsity of the ratings matrix. Here's an approach called the global baseline, and it's very, very simple. Suppose in our system the mean movie rating is 3.7 stars; that is, 3.7 is the average rating across all movies and all users. We observe that the average rating for The Sixth Sense is 0.5 stars above the mean movie rating: people like The Sixth Sense, on average, more than the average movie. We also notice that Joe rates 0.2 stars below the average user. Remember, this is not Joe's rating for The Sixth Sense or for any specific movie; rather, if we take the average of Joe's ratings, Joe turns out to be a tough rater, and his average rating is 3.5 stars as opposed to the mean rating of 3.7 stars.
So Joe's ratings are, in general, 0.2 stars below average. Now we can use these three numbers to come up with a baseline estimate for the user Joe and the movie The Sixth Sense. We start with the mean rating, 3.7. The Sixth Sense is 0.5 stars above average, so we add 0.5, and then we subtract 0.2 because Joe's ratings are on average 0.2 stars below the average rating. So we predict that Joe will give The Sixth Sense 3.7 + 0.5 - 0.2 = 4 stars. Notice that in this case we have not used any specific information about the movies Joe has rated: we don't need Joe to have rated any movies similar to The Sixth Sense in order to make this prediction. All we need is enough users who have actually rated The Sixth Sense, so that we can compute an average rating for it. Now what we'd like to do is combine this global baseline rating with collaborative filtering, so let's look at an example. The global baseline estimated that Joe will give The Sixth Sense 4 stars. Now suppose we use a collaborative-filtering nearest-neighbor approach and find that Joe didn't like the movie Signs, which happens to be very similar to The Sixth Sense; it's by the same director, and let's say the similarity between Signs and The Sixth Sense is 1.0. In fact, Joe rated Signs one star below his average rating for all movies. So now we can combine the global baseline estimate and the collaborative-filtering refinement to come up with the final estimate. The global baseline estimates that Joe will rate The Sixth Sense 4 stars, while collaborative filtering contributes a refinement of negative one star relative to his average; we add those two numbers and predict that Joe will rate The Sixth Sense 3 stars.
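The arithmetic in this example is just a sum of deviations. A minimal sketch, with the numbers taken from the example above:

```python
mu = 3.7             # mean rating over all users and movies
b_movie = 0.5        # The Sixth Sense averages 0.5 stars above the mean movie
b_joe = -0.2         # Joe rates 0.2 stars below the average user

baseline = mu + b_movie + b_joe          # global-baseline estimate: 4.0 stars

cf_refinement = -1.0                     # Joe rated the very similar Signs
                                         # one star below his own average
prediction = baseline + cf_refinement    # combined estimate: 3.0 stars
```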
Notice that this approach takes a linear combination of two independent predictors, a global-baseline predictor and a collaborative-filtering predictor, and sums their predictions. If we wanted, we could weight these predictions differently as well; in this case, we weighted both of them equally. So finally, here is the formula we can use to implement the combination of the global baseline and collaborative filtering. Let s_ij be the similarity of items i and j. Given an item i, we find its k nearest neighbors, restricted to items that have also been rated by user x; call this neighborhood N(i; x). Our goal is to estimate the rating of user x for item i. Before, we had the simple formula that was just a similarity-weighted average of the ratings of all the items in the neighborhood N(i; x). Now we add in the global baseline idea, and here's what the new formula looks like. We estimate the rating r_xi as the sum of a baseline estimate and a collaborative-filtering refinement. Here b_xi is the baseline estimate, and the baseline estimate for user x and item i is itself the sum of three components: mu, the overall mean movie rating in the system; b_x, the rating deviation of user x, which is just the average rating that user x gives across all movies he has rated, minus mu; and b_i, which is similarly the rating deviation of movie i. Adding those three up gives b_xi, the baseline rating for user x and item i. Then we add in the collaborative-filtering piece, which is the same weighted-average formula we had before, weighted by the similarity of item i and each item j in the neighborhood; however, instead of using the raw rating r_xj of user x for item j, we use a slightly different quantity.
We subtract out the baseline piece b_xj, so we look at the deviation of the rating for each neighboring item from its baseline. Since we've already added in the baseline piece, we don't want to double-count it in the second term as well; that's why we subtract the baseline ratings from the collaborative-filtering piece. This gives us the final formula that combines collaborative filtering and the baseline approach.
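Written out, the combined formula is r_xi = b_xi + [sum over j in N(i; x) of s_ij * (r_xj - b_xj)] / [sum over j in N(i; x) of s_ij], with b_xi = mu + b_x + b_i. Here is a minimal sketch of it (function name and matrix layout are illustrative assumptions, not the lecture's code), assuming a NumPy utility matrix R with 0 for missing ratings and a precomputed item-item similarity matrix S:

```python
import numpy as np

def predict_rating(R, S, x, i, k=2):
    """Estimate r_xi = b_xi + sum_j s_ij*(r_xj - b_xj) / sum_j s_ij,
    where j ranges over the k items most similar to i that user x has rated.

    R: (n_users x n_items) utility matrix, 0 = missing rating.
    S: (n_items x n_items) item-item similarity matrix.
    """
    mask = R != 0
    mu = R[mask].mean()                        # overall mean rating
    b_x = R[x, mask[x]].mean() - mu            # user x's rating deviation
    b = np.array([R[mask[:, j], j].mean() - mu if mask[:, j].any() else 0.0
                  for j in range(R.shape[1])]) # per-item rating deviations
    b_xi = mu + b_x + b[i]                     # baseline estimate

    rated = np.where(mask[x])[0]               # items user x has rated
    rated = rated[rated != i]
    if rated.size == 0:
        return b_xi                            # no neighbors: baseline only
    top = rated[np.argsort(S[i, rated])[::-1][:k]]   # k nearest rated items
    num = sum(S[i, j] * (R[x, j] - (mu + b_x + b[j])) for j in top)
    den = sum(S[i, j] for j in top)
    return b_xi + num / den if den > 0 else b_xi
```

Note the fallback: when user x has rated no neighbors of item i, the function returns just b_xi, which is exactly the global-baseline-only case from the Joe example.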