So we are starting a new topic. The topic we will discuss today is called dimensionality reduction. The idea is that we will learn techniques that will become very handy later, when we talk about recommender systems, and in particular latent factor recommender systems.
So let me give you an idea of what the problem of dimensionality reduction is all about. Our assumption is that we have a set of data points. Think of them as points in a plane or points in a three-dimensional space. And the idea is that these points are not just randomly scattered through the space, but lie in a subspace of it. Here I have two cases of this. In the first case, the data sits in a two-dimensional plane, but it is not scattered all over the plane; the data points all lie along this particular line. So maybe a better representation of this data is not in the two-dimensional space, but simply where along the length of the line a given data point is. In the second case, the points are embedded in a three-dimensional space, but again they are not randomly scattered through the space; they all lie on a single plane that is embedded in this space.
So the question for us is: can we go and discover such a representation of the data? If I give you a set of data, can we identify the main axes along which the original data is represented or embedded? In particular, in this second case, we have two axes along which all the data lies. So our goal, in some sense, will be to find a subspace that effectively represents all the data that we are given.
So let me give you a concrete example. Our goal, in a sense, is to compress or reduce the dimensionality, or the size, of the data representation. The way we can think of this is that we are given a big table with a large number of rows, let's say millions of rows, and also a large number of columns. Every row represents a different data point, and every column represents a different coordinate, or a different dimension. Our goal is to take this set of data and identify a more compact, lower-dimensional representation. So in a sense, we would like to keep all the rows but shrink the number of columns, while still preserving the richness of the data set.
For example, let's look at the table that I have here. Every row is a different customer and every column is a different day, and every entry stores how many transactions a particular customer made, or how many products that customer bought, on that day. What we see in this particular case is that even though we have five different days, so five different columns, our data is not really five-dimensional; it is only two-dimensional. What do I mean by this? In the first four rows, restricted to the first three columns, the rows are basically all multiples of one another, right? I have a set of customers who all buy products in the first three columns and do nothing in the last two. And then I have another set of customers who make transactions over the weekend and don't do anything during the week. So, in some sense, rather than representing every customer with a set of five values, I can simply represent this data with a set of two coordinate vectors.
That is, two coordinate vectors, plus a value that says which dimension, or which cluster, a customer belongs to, right? So the matrix that I showed you is really two-dimensional, where every row is simply a multiple of one of the two vectors of 1s and 0s. So the question for us will be: can we identify this kind of low-dimensional representation of the data?
So let me explain a concept that will be very important for thinking about this. Our data comes in the form of a matrix. We can think of every row of the matrix as giving us the coordinates of a point in some d-dimensional space. So we have some number of rows, which are the data points, and some number of columns, which corresponds to the dimensionality of the data. And now the question is: what is the real, intrinsic dimensionality of that data set?
The concept we need is the rank of a matrix. We will say that the rank of a matrix A is simply the number of linearly independent columns of A. This is always the same as the number of linearly independent rows, which is how I will count it here. So let me give you an example. Here is a matrix A that has three rows and three columns, and the rank of this matrix equals 2. Why is the rank equal to 2? Because it has two linearly independent rows. What we notice is that row number 3 is simply the sum of rows 1 and 2, so the third row of this matrix can be represented as a linear combination of rows 1 and 2. So in this case our matrix is really two-dimensional. Even though I have data in three dimensions, I have three columns, this matrix is really two-dimensional. So how can we think about this?
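As a quick check of the two rank claims, here is a minimal sketch in plain Python. The transcript does not give the actual entries of either matrix, so the values of `A` and `T` below are hypothetical: they are chosen only so that the third row of `A` is the sum of the first two, and so that `T` has four "weekday" customers and three "weekend" customers across five days. The helper `matrix_rank` is an illustrative implementation using Gaussian elimination, not something from the lecture.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank via Gaussian elimination with exact rational arithmetic."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Find a row at or below position `rank` with a nonzero entry here.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from every row below the pivot row.
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


# Hypothetical 3x3 matrix: row 3 is the sum of rows 1 and 2.
A = [
    [1, 2, 1],
    [-2, -3, 1],
    [-1, -1, 2],
]

# Hypothetical customer-by-day table (7 customers, 5 days).
# The first four customers buy only on days 1-3, the last three only on days 4-5.
T = [
    [1, 1, 1, 0, 0],
    [2, 2, 2, 0, 0],
    [1, 1, 1, 0, 0],
    [5, 5, 5, 0, 0],
    [0, 0, 0, 2, 2],
    [0, 0, 0, 3, 3],
    [0, 0, 0, 1, 1],
]

rank_A = matrix_rank(A)
rank_T = matrix_rank(T)
print(rank_A, rank_T)  # 2 2
```

Both matrices come out with rank 2, even though one has three columns and the other five. If NumPy is available, `numpy.linalg.matrix_rank` computes the same quantity numerically.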
I can basically think that there are really two basis vectors, or two coordinate vectors, in my space. The first one corresponds to the first row, and the second one corresponds to the second row. What I can do now is represent every data point as a linear combination of these two vectors. For example, the first row can simply be represented as the vector [1 0], which means that I take only the first basis vector and zero of the second. The second row of my matrix can be represented as the vector [0 1], because I am taking only the second of my two basis vectors. And the last row, which is the sum of rows 1 and 2, can simply be represented with the vector [1 1].
So why is this intuition interesting? It is important because I can now think of the data as points in a high-dimensional space. I can think of the data as being represented as a matrix where, as I mentioned before, every data point is a row and every column is a separate dimension. And I can think of what we just did as dimensionality reduction, right? If I'm given the matrix on the top, I can rewrite the coordinates of these points using only two coordinates instead of three. If I use my original coordinate space, I have axis-aligned vectors that describe the coordinates of my space: [1 0 0], [0 1 0] and [0 0 1]. These are the x, y and z coordinates. In this coordinate system, every data point simply corresponds to a row of my matrix. But what I can also do is invent a new coordinate system, a second one, where I have only two vectors. So basically, I want to represent every data point with two coordinates.
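The change of coordinates can be sketched in a few lines of plain Python. The two basis vectors below are hypothetical values (the transcript does not state the matrix entries); the coordinate vectors [1 0], [0 1] and [1 1] are the ones named in the talk. The helper `reconstruct` is an illustrative name, not part of any library.

```python
# Hypothetical basis: the two linearly independent rows of the example matrix.
basis = [
    [1, 2, 1],
    [-2, -3, 1],
]

# Two-number coordinates of the three data points in that basis.
coords = [
    [1, 0],  # first row: one of basis vector 1, none of basis vector 2
    [0, 1],  # second row: only basis vector 2
    [1, 1],  # third row: the sum of both basis vectors
]


def reconstruct(coords, basis):
    """Rebuild each original point as a linear combination of the basis vectors."""
    dim = len(basis[0])
    return [
        [sum(c * b[j] for c, b in zip(point, basis)) for j in range(dim)]
        for point in coords
    ]


A = reconstruct(coords, basis)
print(A)  # [[1, 2, 1], [-2, -3, 1], [-1, -1, 2]]
```

Each point is stored with two numbers instead of three, and the three original coordinates are recovered exactly from the two basis vectors.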
This means that I want to represent every data point as a linear combination of the two vectors. As I mentioned before, in this new coordinate space I can represent the coordinates of every point using only two values, right? And I can still reconstruct the original coordinate values. So in some sense we reduced the dimensionality, or compressed the data, in the sense that I now need fewer coordinates to describe the location of every point. And this is what the role of dimensionality reduction is.
So really, the way we can think of dimensionality reduction is that we have a set of data points embedded in some high-dimensional space. In this picture I have only a two-dimensional space, but the idea is that the data is given in many dimensions and spans only a small, low-dimensional part of them. In this case I have a set of points that I am given in two-dimensional space, but in reality these points essentially fall on a line. I would like to discover that these points are embedded in a small subspace, and then compress the representation of every point down to the coordinates of that subspace.
What is important in this particular case is the following. I can think of representing every point using two new coordinates: its position along the red line, and a coordinate that tells me how far away from the red line the point is. What is interesting is that instead of still using two coordinates, I could use only one. Meaning, I would forget about how far from the red line a point is, and I would only care about the location on the red line where the point is projected.
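Keeping only the position along the line can be sketched as follows. This is a minimal illustration under stated assumptions: the points are made up, and the "red line" is assumed to be known in advance and to be the line y = x through the origin. How to find the best line from the data is not covered in this part of the talk and is not attempted here.

```python
import math

# Hypothetical 2-D points lying close to the line y = x (the assumed "red line").
points = [(1.0, 1.2), (2.0, 1.9), (3.0, 3.1), (4.0, 3.8)]

# Unit vector along the assumed line direction.
direction = (1 / math.sqrt(2), 1 / math.sqrt(2))


def project(point, u):
    """Return (position along the line, signed distance from the line)."""
    x, y = point
    along = x * u[0] + y * u[1]     # the coordinate we keep
    across = -x * u[1] + y * u[0]   # the coordinate we drop
    return along, across


positions = []  # one number per point: the reduced representation
errors = []     # how far each point is from the line: the error incurred
for p in points:
    along, across = project(p, direction)
    positions.append(along)
    errors.append(abs(across))

print(positions)
print(errors)
```

Each point is now described by a single number in `positions`, and `errors` holds the distance that is thrown away by forgetting the second coordinate.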
And this way I would be able to represent every point with a single number, basically its position along the red line, and I would incur a bit of an error. So what we will be doing is, in some sense, trying to use as small a representation of our data as possible, so as few columns as possible, while also incurring as little error as possible. The game we will be playing is a trade-off between having a smaller data representation and incurring as little error as possible.
So why would I want to do dimensionality reduction? The first thing is that I may want to discover hidden correlations in my data. Sometimes I would like to discover the latent dimensions along which the data varies. This is particularly useful if I think of my data points as documents. I can take every document and represent it as a very long vector of zeros and ones, where a zero means that a given word, say the k-th word, does not appear in the document, and a one means that the word appears in the document. My goal, for example, would be to identify the axes along which the documents are spread in this space of all possible words. What we would find is that documents align themselves along different axes that correspond to topics like sports, politics, technology and so on.
Another useful thing is that many times we can take a large data set and represent it as a much smaller data set, in the sense that we are able to remove noisy features, or noisy columns, because our data does not vary much along them.
So we can get rid of that part of the data while still preserving most of it. So the idea here is to remove noisy and redundant features, that is, noisy and redundant columns, from the data. Another reason we may want to do this is to be able to interpret or visualize data. We can have very high-dimensional data and reduce its dimensionality, maybe to just two or three dimensions, and plotting two or three dimensions is very easy, right? We can just plot it on the screen. So that's another case. And, of course, one important application is that many times we want to reduce the dimensionality of the data so that the data size also shrinks, which means it is easier to store, process and analyze the data afterwards, right? So these are all the reasons why I would want to find as low-dimensional a representation of a given set of data as possible.