Hello world. It's Siraj,
and what's the deal with vectors?
You're going to see this word a lot in machine learning,
and it's one of the most crucial concepts to understand.
A huge part of machine learning is
finding a way to properly represent data sets
programmatically.
Let's say you're a manager at Tesla,
and you're given a data set of measurements
for each car that was produced in the past week.
Each car on the list has three measurements,
or features: its length, width, and height.
So a given car can then be represented as a point
in three-dimensional space, where the value in each dimension
corresponds to one of the features we're measuring.
This same logic applies to data points that have 300 features.
We can represent them in
300-dimensional space.
While this is intuitively hard for us to understand as three-dimensional beings,
machines can do this very well.
Robot: Right, what do you want t..... Mother *****
This data point, X, is considered a vector.
A vector is a one-dimensional array.
Think of it as a list of values, or a row in a table.
A vector of n elements is an n-dimensional vector,
with one dimension for each element.
So for a four-dimensional data point,
we can use a 1-by-4 array to hold its four feature values,
and because it represents a set of features,
we call it a feature vector.
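For instance, here's a minimal sketch of such a feature vector in NumPy; the feature names and values are made up for illustration:

```python
import numpy as np

# A 4-dimensional data point stored as a 1-by-4 feature vector.
# Feature names and values are hypothetical.
x = np.array([[4.7, 1.8, 1.4, 2.1]])  # e.g. length, width, height, weight
print(x.shape)                         # (1, 4)
```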
More general than a vector is a matrix.
A matrix is a rectangular array of numbers,
and a vector is a row or column of a matrix.
So each row in a matrix could represent a different data point,
with each column being one of its features.
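Here's a small, made-up example of such a data matrix in NumPy, with one row per car and one column per feature:

```python
import numpy as np

# Three hypothetical cars, each described by the same three features:
# length, width, and height (values are made up).
cars = np.array([
    [4.7, 1.8, 1.4],
    [4.9, 1.9, 1.4],
    [5.0, 2.0, 1.6],
])
print(cars[0])     # the feature vector (row) for the first car
print(cars[:, 2])  # the height column across all cars
```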
Less general than a vector is a scalar,
which is a single number.
The most general term for all of these concepts is a tensor.
A tensor is a multi-dimensional array:
a first-order tensor is a vector,
a second-order tensor is a matrix, and
tensors of order three and higher are
called higher-order tensors.
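A quick NumPy sketch of tensors of different orders; the values are arbitrary, and ndim reports the order:

```python
import numpy as np

scalar = np.array(3.0)                        # order 0: a single number
vector = np.array([1.0, 2.0, 3.0])            # order 1: a 1-D array
matrix = np.array([[1.0, 2.0], [3.0, 4.0]])   # order 2: a 2-D array
cube   = np.zeros((2, 2, 2))                  # order 3: a higher-order tensor
print(scalar.ndim, vector.ndim, matrix.ndim, cube.ndim)  # 0 1 2 3
```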
So if a 1-D tensor looks like a line...
Stop.
Who are you?
I think they get it.
You could represent a social graph that contains
friends of friends of friends as a higher-order tensor.
This is why Google built a library called
TensorFlow.
It allows you to create a computational graph
where Tensors created from data sets can
flow through a series of mathematical operations that optimize for an objective.
It's also why they built an entirely new type of chip called a TPU, or Tensor Processing Unit.
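Here's a minimal sketch of tensors flowing through a couple of operations, assuming TensorFlow 2.x is installed; the values and operations are purely illustrative:

```python
import tensorflow as tf

# Two small tensors (values are illustrative).
a = tf.constant([[1.0, 2.0], [3.0, 4.0]])  # a second-order tensor
b = tf.constant([[1.0], [0.5]])

# Tensors "flow" through a series of mathematical operations.
c = tf.matmul(a, b)
objective = tf.reduce_sum(c ** 2)  # something we could optimize for
print(objective.numpy())
```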
As computational power and the amount of data we have
increase, we're becoming more capable of processing
multi-dimensional data.
Vectors are typically represented in a multitude of ways,
and they're used in many different fields of science,
especially physics, since vectors act as a bookkeeping tool to keep track of two pieces of information:
typically a magnitude and a direction for a physical quantity.
For example, in Einstein's general theory of relativity,
the curvature of spacetime,
which gives rise to gravity,
is described by what's called
the Riemann curvature tensor,
which is a tensor of order four.
So badass.
So we can represent not only the fabric of reality this way,
but the gradient of our optimization problem as well.
During first-order optimization,
the weights of our model are
updated incrementally after each pass over the training data set.
Given an error function like the sum of squared errors,
we can compute the magnitude and
direction of the weight update by taking a step in the opposite direction of the error gradient.
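Here's a rough sketch of a single gradient-descent weight update for a linear model under the sum of squared errors; the data, learning rate, and variable names are all illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((100, 3))               # 100 data points, 3 features each
y = X @ np.array([2.0, -1.0, 0.5])     # targets from a made-up "true" model

w = np.zeros(3)                        # current weights
learning_rate = 0.001

# Gradient of SSE = sum((X @ w - y) ** 2) with respect to w.
gradient = 2 * X.T @ (X @ w - y)

# The update: a step in the opposite direction of the error gradient.
w = w - learning_rate * gradient
print(w)
```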
This all comes from linear algebra.
Algebra
roughly means relationships,
and it explores the relationships between unknown numbers.
Linear algebra roughly means line-like relationships.
It's a way of organizing information about vector spaces
that makes manipulating groups of numbers
simultaneously easy.
It defines structures like vectors and matrices
to hold these numbers and introduces new rules for how to add,
multiply, subtract and divide them.
So given two arrays,
the plain algebraic way to multiply them would be to do it like this,
and the linear algebraic way would look like this:
we compute the dot product
instead of
multiplying each pair of numbers one at a time.
The linear algebraic approach is
three times faster in this case.
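For example, here's a sketch of both approaches in Python, with toy values; the exact speedup depends on array sizes and hardware:

```python
import numpy as np

a = [1, 2, 3]
b = [4, 5, 6]

# Plain approach: multiply each pair of numbers one at a time, then sum.
total = 0
for x, y in zip(a, b):
    total += x * y        # 1*4 + 2*5 + 3*6 = 32

# Linear algebraic approach: a single dot product.
total_np = np.dot(a, b)   # also 32
print(total, total_np)
```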
Any type of data can be represented as a vector:
images, videos, stock indices,
text, audio signals,
dougie dancing.
No matter the type of data,
it can be broken down into a set of numbers.
The model is not really accepting the data.
It keeps throwing errors.
Let me see.
Oh it looks like you got to vectorize it.
What do you mean?
The model you wrote expects tensors of a certain size as its input,
so we basically have to reshape the input data
so it's in the right vector space.
And then once it is,
we can compute things like the cosine distance between data points and
the vector norm.
Is there a Python library to do that?
You gotta love NumPy.
Vectorization is essentially just a matrix operation,
and I can do it in a single line.
Awesome.
Well, you vectorize it up.
I've gotta back-propagate out for today.
Cool, where to?
Tinder date
All right, yeah. See ya ...
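In case you're curious, here's roughly what that exchange describes in NumPy; the shapes and values are hypothetical:

```python
import numpy as np

# Hypothetical raw input: six values the model expects as a 2-by-3 tensor.
raw = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x = raw.reshape(2, 3)     # reshape the input data in a single line

# Once it's in the right shape, compute a vector norm and the
# cosine distance between the two data points (rows).
a, b = x[0], x[1]
norm_a = np.linalg.norm(a)                    # Euclidean norm of the first row
cosine_sim = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
cosine_dist = 1 - cosine_sim
print(norm_a, cosine_dist)
```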
A researcher named Mikolov
used a machine learning model called a neural network
to create vectors for words,
a technique called Word2Vec.
Given some input corpus of text,
like thousands of news articles,
it would try to predict the next word
in a sentence given the words around it.
So a given word is
encoded into a vector.
The model then uses that vector to try and predict the next word.
If its prediction doesn't match the actual next word,
the components of this vector are adjusted.
Each word's context in the corpus
acts as a teacher,
sending error signals back to adjust the vector.
The vectors of words that are judged
similar by their contexts are iteratively nudged closer together
by adjusting the numbers in those vectors, and so after training, the model has learned
thousands of vectors for words.
Give it a new word
and it will find its associated word vector,
also called a word embedding.
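Here's a minimal sketch of training word vectors with the gensim library, assuming it's installed; the tiny corpus and parameter values are illustrative, and parameter names follow gensim 4.x:

```python
from gensim.models import Word2Vec

# A tiny, made-up corpus; real training uses thousands of documents.
corpus = [
    ["sweden", "is", "a", "wealthy", "northern", "european", "country"],
    ["norway", "is", "a", "wealthy", "northern", "european", "country"],
]

model = Word2Vec(sentences=corpus, vector_size=50, window=2, min_count=1)

vector = model.wv["sweden"]              # the learned word embedding
print(model.wv.most_similar("sweden"))   # nearby words in the vector space
```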
Vectors don't just represent data.
They help represent our models too.
Many types of machine learning models represent what they learn as vectors.
All types of neural networks do this.
Given some data, a neural network will learn dense
representations of that data.
These representations are essentially
categories: if you have a data set of pictures of differently colored eyes,
it will learn a general
representation for all eye colors.
So given a new, unlabeled eye picture,
it would be able to recognize it as an eye.
I see vectors
Good.
Once data is vectorized,
we can do so many things with it.
A trained Word2Vec model turns words into vectors,
and then we can perform mathematical
operations on those vectors.
We can see how closely related words are
by computing the distance between their vectors.
The word Sweden, for example,
is closely related to other wealthy Northern European countries,
because the distance between their vectors
is small when plotted on a graph.
Word vectors that are similar tend to
cluster together, like types of animals do.
Associations can be built, like Rome is to Italy as
Beijing is to China,
and operations like hotel plus motel give us Holiday Inn.
Incredibly, vectorizing words is able to
capture their semantic meanings numerically.
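Here's a toy sketch of that kind of analogy arithmetic, using tiny made-up embedding vectors rather than a real trained model:

```python
import numpy as np

# Tiny hypothetical embedding vectors, just to illustrate the arithmetic.
vec = {
    "rome":    np.array([1.0, 0.2]),
    "italy":   np.array([1.1, 1.0]),
    "beijing": np.array([0.2, 0.1]),
    "china":   np.array([0.3, 0.9]),
    "sweden":  np.array([0.9, 0.15]),
}

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# "Rome is to Italy as Beijing is to ?" becomes vector arithmetic.
query = vec["italy"] - vec["rome"] + vec["beijing"]

candidates = [w for w in vec if w not in ("rome", "italy", "beijing")]
best = max(candidates, key=lambda w: cosine_similarity(query, vec[w]))
print(best)  # "china" for these toy vectors
```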
The way we're able to compute the distance
between two vectors
is by using the notion of a vector norm.
A norm is any function g that maps vectors to real numbers
and satisfies the following conditions:
lengths are never negative,
a length of zero implies the zero vector,
scalar multiplication
scales lengths in a predictable way,
and distances add up
reasonably (the triangle inequality).
So in the most basic vector space, the real number line, the norm of a number
would be its absolute value,
and the distance between two numbers would be the absolute value of their difference.
Usually the length of a vector is
calculated using the Euclidean norm,
which is defined like so,
but this isn't the only way to define length.
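For reference, that's the standard definition: for a vector x with n components, the Euclidean norm is

```latex
\|x\|_2 = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}
```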
There are others.
You'll see the terms L1 norm
and L2 norm used a lot in machine learning.
The L2 norm is the Euclidean norm.
The L1 norm is also called the Manhattan distance.
We can use either to normalize a vector to get its unit vector
and use that to compute the distance.
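A quick NumPy sketch of both norms, normalization, and a distance computation; the vectors are arbitrary:

```python
import numpy as np

v = np.array([3.0, -4.0])

l2 = np.linalg.norm(v)            # Euclidean (L2) norm: sqrt(3^2 + 4^2) = 5.0
l1 = np.linalg.norm(v, ord=1)     # Manhattan (L1) norm: |3| + |-4| = 7.0

unit_v = v / l2                   # normalize to get the unit vector

w = np.array([0.0, 1.0])
distance = np.linalg.norm(v - w)  # Euclidean distance between the two vectors
print(l2, l1, unit_v, distance)
```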
Computing the distance between vectors is useful
for showing users recommendations.
Both of these norms are also used in the process of regularization.
We train models to fit a set of training data,
but sometimes a model gets so fit to the training data
that it doesn't have good prediction performance:
it can't generalize well to new data points.
To prevent this overfitting,
we have to regularize our model.
The common method for finding the best
model is to define a loss function that describes
how well the model fits the data, and then add a regularization term, built from the L1 or L2 norm of the weights, that penalizes overly complex models.
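Here's a minimal sketch of what an L2-regularized (ridge) loss for linear regression could look like; the names and the regularization strength are illustrative:

```python
import numpy as np

def ridge_loss(w, X, y, lam=0.1):
    """Sum of squared errors plus an L2 penalty on the weights."""
    residuals = X @ w - y
    data_fit = np.sum(residuals ** 2)   # how well the model fits the data
    penalty = lam * np.sum(w ** 2)      # discourages overly large weights
    return data_fit + penalty

# Swapping the penalty for lam * np.sum(np.abs(w)) gives L1 (lasso) regularization.
```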
To sum things up: feature vectors are used to represent the numeric
or symbolic characteristics of data, called features,
in a mathematical way.
They can be represented in
multi-dimensional vector spaces, where we can perform
operations on them,
like computing their distances and adding them,
and we can do this by computing the vector norm,
which describes the size of a vector
and is also useful for
preventing overfitting.
The Wizard of the Week award goes to Vishnu Kumar.
He implemented both gradient descent and
Newton's method to create a model
able to predict the amount of calories burned for cycling a certain distance
The plots are great, and the
code is architected very legibly.
Check it out. Amazing work, Vishnu.
And the runner-up, for a last-minute
entry, is Hamad Shaikh.
I loved how detailed your notebook was.
This week's challenge is to implement both
L1 and L2 regularization on a linear regression model.
Check the GitHub readme in the description for details, and winners will be announced in a week.