## Subtitles

• So we are starting with a new topic.

• The topic we will discuss today is called dimensionality reduction.

• And the idea here is basically that we will learn about techniques that will

• later become very handy when we will talk about recommender systems, and

• in particular latent factor recommender systems.

• So let me give you an idea of what the problem of dimensionality reduction is.

• So basically our assumption is that we have a set of data points.

• Think of them as points in a plane or points in a three-dimensional space.

• And the idea is that these points are not just randomly scattered through the space,

• but they lie in a subspace of it.

• So for example, here I have two cases of this.

• You could imagine that you have a set of data in a two-dimensional plane, but

• the data is not randomly scattered through this plane;

• it is only scattered across a small subspace of it.

• So for example, in the first case, we have data points that

• are embedded on this particular line, so maybe a better representation of

• this data is not in this two-dimensional space; it's basically just

• where along the length of the line a given data point is.

• Or, for example, in the second case, we are drawing a case where we

• have points embedded in a three-dimensional space, but again, these

• points are not randomly scattered through space; basically,

• they all lie on this single plane that is embedded in this space.

• So basically the idea for us is: can we go and discover such a data representation?

• So if I give you a set of data, can we go and identify the main

• axes along which the original data is represented or embedded?

• So in particular, in this second case,

• we have these two axes along which all the data lies.

• So our goal in some sense will be that we want to find a subspace

• that effectively represents all the data that we are given.

• So, let me just give you a concrete example, right.

• So, our goal, in a sense, would be that we want to compress or

• reduce the dimensionality or the size of the data representation.

• So the way we can think of this is that we are given a big table with a large number

• of rows, let's say millions of rows, and also a large number of columns.

• And the way we can think of

• this kind of table is that every row represents a different data point.

• And every column represents a different coordinate or a different dimension.

• And our goal is that we take this set of data and

• identify a more compact, lower-dimensional representation.

• So in a sense, we would like to keep all the rows.

• But we would like to shrink the number of columns,

• while still preserving the richness of the data set.

• So, for example, let's look at the table that I have here.

• I have, for example, a table where every row is a different customer and

• every column is a different day, where every entry stores how many

• transactions a particular customer made, or

• how many products that customer bought, on that day.

• And for example what we see in this particular case is that even though we

• have five different days so five different columns,

• our data is not really in some sense five dimensional but it's only two dimensional.

• What I mean by this is that, for example, the first four rows, which are nonzero only in

• the first three columns, are basically all multiples of one another, right?

• So I have a set of customers that all buy products in

• the first three columns and do nothing in the last two.

• And then I have another set of, let's say, customers

• that make transactions over the weekend,

• and they don't do anything during the week.

• Right? So, in some sense,

• rather than representing every customer with a set of five values,

• I can simply represent this data with a set of

• two coordinate vectors, plus a value that says,

• in some sense, which dimension or which cluster each customer belongs to, right?

• So for example, this matrix that I showed you is really two dimensional, where every

• row is simply a multiple of one of the two vectors of 1s and 0s.

• So basically the idea for

• us will be: can we identify this kind of low-dimensional representation of data?
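
The customer table described above can be sketched in a few lines of Python. The actual numbers on the slide are not in the transcript, so the `purchases` values below are invented for illustration, and the helper names `compress` and `reconstruct` are hypothetical. The sketch only shows that each five-value row can be stored as a pattern index plus one scale factor and then recovered exactly.

```python
# Hypothetical customer-by-day purchase counts (values invented for illustration).
# Columns are five days: the first three are weekdays, the last two the weekend.
purchases = [
    [1, 1, 1, 0, 0],
    [2, 2, 2, 0, 0],
    [1, 1, 1, 0, 0],
    [5, 5, 5, 0, 0],
    [0, 0, 0, 2, 2],
    [0, 0, 0, 3, 3],
    [0, 0, 0, 1, 1],
]

# The two vectors of 1s and 0s that every row is a multiple of.
patterns = [
    [1, 1, 1, 0, 0],  # buys only on the first three days
    [0, 0, 0, 1, 1],  # buys only on the last two days
]


def compress(row):
    """Return (pattern_index, scale) with row == scale * patterns[pattern_index]."""
    for index, pattern in enumerate(patterns):
        # Values where the pattern has a 1 must all be equal (that is the scale).
        scales = {value for value, bit in zip(row, pattern) if bit == 1}
        # Values where the pattern has a 0 must all be zero.
        zeros_ok = all(value == 0 for value, bit in zip(row, pattern) if bit == 0)
        if zeros_ok and len(scales) == 1:
            return index, scales.pop()
    raise ValueError("row is not a multiple of any pattern")


def reconstruct(index, scale):
    """Rebuild the original five-value row from its compact form."""
    return [scale * bit for bit in patterns[index]]


compressed = [compress(row) for row in purchases]
restored = [reconstruct(index, scale) for index, scale in compressed]

print(compressed)
print(restored == purchases)
```

In this toy case nothing is lost: two numbers per customer, plus the two shared pattern vectors, are enough to rebuild all five columns.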

• So let me explain a concept that will be very important.

• So we are thinking that our data comes in the form of a matrix, right?

• So we can think of a matrix basically as every row

• giving us the coordinates of a point in some d-dimensional space.

• So we have some number of data points, and we have some

• number of columns, which corresponds to the dimensionality of the data.

• And now the question is, what is the real intrinsic dimensionality of that data set?

• And the concept we need to explain is the concept of the rank of a matrix.

• And we will say that the rank of a matrix A is simply the number of

• linearly independent columns of A.

• So let me give you an example.

• So here is an example.

• You can see the matrix A, which has three rows and three columns.

• And the rank of this matrix equals 2.

• Why is the rank of this matrix equal to 2?

• It is because it has two linearly independent rows in this case.

• What we notice, for example, is

• that row number three is simply the sum of rows one and two.

• So the third row

• of this matrix can be represented as a linear combination of rows one and two.

• So in this case our matrix is really two dimensional.

• Even though I have data in three dimensions,

• I have three columns, this matrix is really two dimensional.
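
To make the rank idea concrete, here is a minimal sketch that counts linearly independent rows with Gaussian elimination. The entries of the matrix on the slide are not given in the transcript, so `A` below is a hypothetical 3x3 matrix chosen so that its third row is the sum of the first two; `matrix_rank` is an illustrative helper, not a library function. Counting independent rows gives the same number as counting independent columns, which is why the definition (columns) and the example (rows) agree.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank via Gaussian elimination, using exact rational arithmetic."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Look for a row at or below position `rank` with a nonzero entry here.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        m[rank], m[pivot] = m[pivot], m[rank]
        # Zero out this column in every row below the pivot row.
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


# Hypothetical matrix: row 3 = row 1 + row 2, so only two rows are independent.
A = [
    [1, 2, 1],
    [-2, -3, 1],
    [-1, -1, 2],
]

identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

print(matrix_rank(A))         # 2
print(matrix_rank(identity))  # 3
```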

• I can basically think that there are really two basis vectors, or

• two coordinate vectors, in my space. The first one corresponds to

• the first row, the second one corresponds to the second row, and then what I can do

• now is represent every data point as a linear combination of these two vectors.

• So for example,

• the first row can simply be represented as a vector of one and zero,

• which means that I only take

• the first vector and I take zero of the second vector.

• The second row of my matrix, say,

• can now be represented as a vector of zero, one, because I'm only taking

• the second of my two basis vectors.

• And for example, the last row, which is the sum of rows one and

• two, can simply be represented with a vector one, one.

• So why is this intuition interesting?

• This intuition is important because I can now think of

• data as being some points in a high-dimensional space.

• I can think of the data being represented as a matrix where, as I mentioned before,

• every data point is a row in this matrix, and every column is a separate dimension.

• And what I can do now, I can think of this as doing dimensionality reduction, right?

• So for example, if I'm given the matrix on the top, I can basically take and

• rewrite the coordinates of these points,

• using only two coordinates instead of three, right?

• So suppose I use my original coordinate space, where basically I have axis-aligned

• vectors that describe the coordinates of my space.

• So I have one, zero, zero; and zero, one, zero; and zero, zero, one.

• So these are the x, y, and z coordinates.

• Then, in this coordinate system,

• every data point simply corresponds to a row of my matrix.

• But what I can also do is come and invent a new coordinate system.

• Imagine I invent a second one, where I only have two vectors.

• So basically, I want to represent every data point

• with two coordinates, and what this means is that I want to

• represent every data point as a linear combination of the two vectors.

• And as I mentioned before, now in this new coordinate space I

• can represent the coordinates of every point using only two values, right?

• And I can still reconstruct the original coordinate values.
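
The change of coordinates can be sketched as follows. The basis vectors here are hypothetical stand-ins for the first two rows of the matrix on the slide (whose entries the transcript does not give), and `reconstruct` is an illustrative helper. Each point is stored as two coefficients, and the three original coordinates come back as the corresponding linear combination of the basis vectors.

```python
# Hypothetical basis: stand-ins for rows one and two of the slide's matrix.
basis = [
    [1, 2, 1],    # first basis vector
    [-2, -3, 1],  # second basis vector
]

# New two-value coordinates of the three data points:
# row 1 -> [1, 0], row 2 -> [0, 1], row 3 (their sum) -> [1, 1].
new_coords = [
    [1, 0],
    [0, 1],
    [1, 1],
]


def reconstruct(coords, basis):
    """Linear combination of the basis vectors weighted by the coordinates."""
    n_dims = len(basis[0])
    return [
        sum(weight * vector[j] for weight, vector in zip(coords, basis))
        for j in range(n_dims)
    ]


original = [reconstruct(coords, basis) for coords in new_coords]
print(original)  # [[1, 2, 1], [-2, -3, 1], [-1, -1, 2]]
```

Storing two numbers per point instead of three only pays off when there are many more points than basis vectors, since the basis itself also has to be kept.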

• So what this means is, in some sense, that we reduced the dimensionality, or

• we compressed the data, in the sense that now I need a smaller

• number of coordinates to describe the location of every point, right? And

• this is what the role of dimensionality reduction is.

• So, really the way we can think of dimensionality reduction is that we have

• a set of data points embedded in some high-dimensional space. In this case I

• have a two-dimensional space, but in general the data is in high dimensions and only spans

• a low-dimensional part of it. So in this case, I have a set of points

• that I'm given in two-dimensional space, but in reality

• these points simply fall on a line, and I would like to discover that these points

• are embedded in a small subspace, and I would like to represent, or

• compress, every point in this small coordinate subspace.

• And what is important here, for example in this particular case, is that

• I can now think of representing the coordinates of every point

• using two dimensions.

• I can represent its position along the red line,

• and I can add a coordinate that tells me how far away

• from the red line a given data point is.

• And what is interesting now

• is that, instead of still using two coordinates,

• I could represent each point using only one coordinate.

• Meaning, I would forget about how far from the red line a point is, and I would

• only care about the location on the red line where the point is projected.

• And this way I would be able to represent every point

• with a single number, basically its position along the red line, and

• I would incur a bit of an error.
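
The projection onto the red line can be illustrated with a small sketch. The line direction and the sample points are assumptions made up for this example (the lecture does not give numbers), and the line is assumed to pass through the origin. For each point the code computes the position along the line, the signed distance from the line, and the error made when only the first of the two is kept.

```python
import math

# Hypothetical direction of the "red line", assumed to pass through the origin.
direction = (3.0, 4.0)
length = math.hypot(*direction)
unit = (direction[0] / length, direction[1] / length)  # unit vector along the line
normal = (-unit[1], unit[0])                           # unit vector perpendicular to it

# Hypothetical 2D data points lying on or near the line.
points = [(3.0, 4.0), (6.5, 7.5), (-2.0, -3.5)]


def to_line_coords(point):
    """Return (position along the line, signed distance from the line)."""
    along = point[0] * unit[0] + point[1] * unit[1]
    off = point[0] * normal[0] + point[1] * normal[1]
    return along, off


def from_along_only(along):
    """Rebuild a 2D point from the single kept coordinate."""
    return (along * unit[0], along * unit[1])


results = []
for point in points:
    along, off = to_line_coords(point)
    approx = from_along_only(along)
    error = math.dist(point, approx)  # equals the dropped distance |off|
    results.append((along, off, error))
    print(f"point={point} along={along:.2f} off={off:.2f} error={error:.2f}")
```

The error for each point is exactly the distance that was thrown away, so points that sit on the line are reconstructed without any loss.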

• Right, so what we will be doing is, in some sense, trying to use

• as small a representation of our data as possible,

• so as few columns as possible, while also incurring as little error as

• possible, right? So

• the game we will be playing is a trade-off between having a smaller data

• representation and trying to incur as little error as possible.

• So why we would want to do this is the following,

• right? Why would I want to do dimensionality reduction?

• So the first thing is, I would want to, for

• example, discover hidden correlations in my data.

• And sometimes I would like to discover

• the latent dimensions along which the data varies.

• So this is particularly useful if I think of my data points,

• if I think of them, as documents.

• Right, so I can take every document and represent it as a very long vector,

• where this vector has only the values zero and one. Zero means that a given word,

• you know, the kth word, does not appear in the document, and

• one means the word appears in the document.
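
The zero/one document representation can be sketched like this. The vocabulary and the two sample sentences are invented for illustration, and `to_binary_vector` is a hypothetical helper that uses naive whitespace tokenization; a real vocabulary would have many thousands of entries.

```python
# Hypothetical tiny vocabulary; position k in a vector refers to the kth word.
vocabulary = ["ball", "election", "goal", "software", "vote"]


def to_binary_vector(text, vocabulary):
    """1 if the kth vocabulary word appears in the text, otherwise 0."""
    words = set(text.lower().split())
    return [1 if word in words else 0 for word in vocabulary]


documents = [
    "The goal came after a long ball",
    "Every vote counts in the election",
]

vectors = [to_binary_vector(doc, vocabulary) for doc in documents]
print(vectors)  # [[1, 0, 1, 0, 0], [0, 1, 0, 0, 1]]
```

Each document becomes one row of a matrix with one column per vocabulary word, which is the kind of table the dimensionality reduction discussion assumes.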

• And my goal, for example, would be to identify the axes along

• which the documents are spread in this

• space of all possible words, and what we would find out is that

• documents basically align themselves along different axes that

• correspond to topics like sports, politics, technology, and so on.

• Another useful thing that we would want to do is, for example, many times we

• can take a large data set and represent it as a much smaller data set,

• in the sense that basically

• we are able to remove or get rid of noisy features, or noisy columns, because

• there our data is not varying too much.

• So we can get rid of that part of the data while still preserving

• most of it.

• So this is the idea, in some sense: to remove noise from

• the data, to remove noisy and redundant features, or noisy and redundant columns.

• Another reason why we may want to do this

• is that we want to, for example, be able to interpret or visualize data.

• What this means is that we can have very high dimensional data and

• we can reduce the dimensionality of it, maybe just to two or three dimensions.

• And plotting two or three dimensions is very easy, right?

• We can kind of plot it on the screen.

• So, that's another case.

• And, of course, one important application is that, many times, we want to

• reduce the dimensionality of the data so that the data size also shrinks,

• which means it's easier to store, process, and analyze the data afterwards, right?

• So these are all the reasons why I would want to, in some sense, find as

• low-dimensional a representation of a given set of data as possible.

# 5 6 Dimensionality Reduction Introduction 12 01

Published by HaoLang Chen on January 14, 2021