Welcome, everybody.
It's a great pleasure to welcome you to our CC Mei Distinguished
seminar series.
This is a series that is sponsored
by the Department of Civil and Environmental Engineering
and the CC Mei Fund, and this is our first Distinguished seminar
of the term.
It's a great pleasure to see it's a full house.
Hopefully for the people that will be late,
they will still find some seats.
And so for today's inaugural talk of the term,
we will be hearing from Professor George Sugihara,
and George Sugihara is a Professor
of Biological Oceanography at the Physical Oceanography
Research Division, Scripps Institution of Oceanography
at UC San Diego.
I'm co-hosting Professor George Sugihara with Professor Serguei
Saavedra here in CEE.
So professor Sugihara is a data-driven theoretician
whose work focuses on developing minimalist inductive theory,
extracting information from observational data
with minimal assumptions.
He has worked across many scientific domains,
including ecology, finance, climate science, medicine,
and fisheries.
He's most known for topological models in ecology,
empirical dynamic forecasting models, research
on generic early warning signs of critical transitions,
methods of distinguishing correlation
from causal interaction in time series,
and has championed the idea that causation
can occur without correlation.
He provided one of the earliest field demonstrations of chaos
in ecology and biology.
Professor Sugihara is the inaugural holder
of the McQuown Chair of Natural Science at the Scripps
Institution of Oceanography at UCSD.
He has won many other awards and recognitions,
including being a member of the National Academies'
Board on Mathematical Sciences and Their Applications
for a few years.
And today, he will discuss understanding nature
holistically and without equations.
And that's extremely intriguing for all of us.
And so without further ado, please join me
in welcoming Professor Sugihara.
[APPLAUSE]
This is in my presenter notes, so I'm reading it
off of the screen here.
I want to make a disclaimer, however.
In the abstract, it says that these ideas are intuitive.
Are you good?
Are we good?
OK.
So the abstract says that the ideas that I'm going to present
are intuitive, but this is not entirely true.
In fact, for whatever reason, at one point,
the playwright Tom Stoppard approached me,
and he said that he was interested in writing something
about these ideas and wondered if it
would be possible to explain these to a theater audience.
And just read the dark black there.
His response was that if he tried to explain it
to a theater audience, they'd probably
be in the lobby drinking before he
got through the first sentence.
So the ideas are in fact decidedly counter-intuitive.
And this is a fact that in a sense
goes against how we usually try to understand things.
So I'll explain what that means in a second.
So we're all familiar with Berkeley's famous dictum,
but despite this warning, correlation
is very much at the core of Western science.
Untangling networks of cause and effect
is really how we try to understand nature.
It's essentially what the business of science
is all about.
And for the most part and very much
despite Berkeley's warning, correlation
is very much at the core of how we try to get a grasp on this.
It's an unspoken rule, in fact, that within science
and with how we normally operate,
correlation is a reasonable thing to do.
It's innocent until it's proven guilty.
Thus, distinguishing this intuitive correlation
from the somewhat counter-intuitive causation
is at the crux, and it's the topic of this talk today.
So I'm going to develop a discussion
for making this distinction that hinges on two main elements.
First, the fact that nature is
dynamic and the temporal sequence matters.
Meaning that nature is better understood as a movie than as
snapshots, OK?
And secondly is the fact that nature is nonlinear,
that it consists of interdependent parts that
are basically non-separable, that context really matters.
That nature can't be understood as independent pieces
but rather each piece needs to be studied
in the context surrounding it.
So let's start with a nice, simple example.
All right.
Consider these two time series.
One might be a species, or these might
be two species interacting, or one
might be an environmental driver and responding species,
or a driver and a physiological response,
or money supply and interest rates, something like that.
So if you look at 10 years of data,
you say your first hypothesis is that these things are
positively correlated.
You have this kind of working model for what's going on.
If you roll forward another dozen years,
you find your hypothesis holds, but then it
falls apart a little bit here and in the middle,
right in here.
And then it sort of flips back on here towards the end.
So out of 18 years of observations, actually
more like 22 years of observations,
we find that our hypothesis that these things are correlated
is a pretty good one.
If this was an ecology pattern, if this
was a pattern from ecology, we'd say that this
is a really good hypothesis.
So we might make an adaptive caveat here, kind of an excuse
for what happened when it became uncorrelated, but more or less,
this looks like a pretty good hypothesis.
This is, however, what we see if we roll forward
another couple of decades.
In fact, for very long periods of time,
these two variables are uncorrelated.
They're totally unrelated.
However, they appear from a statistical sense
to be unrelated, but they were actually
generated from a coupled two-species difference
equation.
So this is a simple example of nonlinear dynamics.
We see that two things can appear to be coupled
for short periods of time, uncoupled,
but for very long periods of time,
there's absolutely no correlation.
So not only does correlation not imply causation,
but with simple nonlinear dynamics, lack of correlation
does not imply lack of causation.
That's actually something that I think is fairly important.
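The point above can be reproduced with a small simulation. The sketch below iterates a coupled two-species difference equation and compares correlations over short windows with the correlation over the whole record. The talk does not give the equations or parameters, so the logistic form, the growth rates, the coupling strengths, and the starting values are all assumptions chosen for illustration.

```python
import numpy as np

def coupled_logistic(n_steps, x0=0.4, y0=0.2,
                     rx=3.8, ry=3.5, bxy=0.02, byx=0.1):
    """Iterate a coupled two-species difference equation.

    All parameter values are illustrative assumptions, not taken from the talk.
    """
    x = np.empty(n_steps)
    y = np.empty(n_steps)
    x[0], y[0] = x0, y0
    for t in range(n_steps - 1):
        # Each species follows logistic growth, reduced by the other species.
        x[t + 1] = x[t] * (rx - rx * x[t] - bxy * y[t])
        y[t + 1] = y[t] * (ry - ry * y[t] - byx * x[t])
    return x, y

def windowed_correlation(x, y, window):
    """Pearson correlation of x and y in consecutive non-overlapping windows."""
    n_windows = len(x) // window
    return np.array([
        np.corrcoef(x[i * window:(i + 1) * window],
                    y[i * window:(i + 1) * window])[0, 1]
        for i in range(n_windows)
    ])

x, y = coupled_logistic(1000)
short_term = windowed_correlation(x, y, window=20)
long_term = np.corrcoef(x, y)[0, 1]
print("short-window correlations range from %.2f to %.2f"
      % (short_term.min(), short_term.max()))
print("correlation over the whole record: %.2f" % long_term)
```

Comparing the spread of the short-window values with the whole-record value is one way to see how a relationship can look strong, vanish, or change sign depending on when the system is observed.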
In retrospect, what I just showed you,
you might think this is obvious, but apparently this
is not well known, and it contradicts a currently held
view that correlation is a necessary condition
for causation.
So this was Edward Tufte who said
that empirically observed covariation is
a necessary condition for causation.
OK.
So the activity of correlation, I think,
reflects the physiology of how we learn.
And one can argue that it's almost wired
into our cognitive apparatus.
So the basic notion behind Hebbian learning
is that cells that fire together wire together.
So the mechanism of how we learn is really
very sort of supportive of the whole notion of correlation.
So I think it's very fundamental to how we perceive things
as human beings.
OK.
The picture that emerges is not only
that correlation does not necessarily imply causation,
but that you can have causation without correlation.
OK, and this is the realm of nonlinear systems.
This is interesting, because this is also
the realm of biological systems.
So within this realm, there's a further consequence
of non-linearity that was demonstrated in the model
example, and that's this phenomenon
of mirage correlation.
So correlations that come and go and that even change sign.
So here is a nice, simple example of mirage correlation.
This is an example not from finance but from ecology.
This is a study by John McGowan, and it was an attempt
to try to explain harmful algal blooms at Scripps,
these red tides.
So these spikes here are spikes in chlorophyll
found at Scripps Pier.
And what we see in the blue at the bottom
are sea surface temperature anomalies.
And so the idea was that the spikes in chlorophyll
were really caused by the sea surface temperature anomalies.
This is about a decade's worth of observations.
They were about to publish it, but they
were kind of slow in doing so.
And in the meantime, this correlation reversed itself.
And not only did it reverse itself,
it then became completely uncorrelated.
So I think this is a classic example
of a mirage correlation.
OK.
So here's another example from Southern California.
Using data up to 1991, there's a very significant relationship
between sea surface temperature here,
and this is a measure of sardine production,
so-called recruitment.
So this was reported in '94 and was subsequently written
into state law for managing harvest.
So if you are above 17 degrees, the harvest levels are higher.
If you're below 17 degrees, they were lower.
However, when data-- if you add it to this existing data,
data from '94 up to 2010, this is what you find.
The correlation seemed to disappear in both cases.
So these are two different ways of measuring productivity,
and the correlation disappeared in both of them.
So this statute that was written into state law
has now been suspended.
And this is where it now stands.
All right, so another famous example from fisheries
was this meta-analysis on 74 environment-recruitment
correlations that were reported in the literature.
So these correlations were tested
subsequent to the publication of each original paper
by adding additional data to see if they were upheld.
And only 28 out of the 74 were.
And among the 28 that were upheld
was the sardine, so we know what happened there.
OK, so relationships that we thought we understood
seemed to disappear.
This sort of thing is familiar in finance
where relationships are uncovered but often disappear
even before we try to exploit them.
OK.
So how do we address this?
The approach that I'm going to present today
is based on nonlinear state space reconstruction, which
I refer to here with a little less technical
but I think more descriptive name,
which is empirical dynamics.
So EDM, Empirical Dynamic Modeling,
is basically a holistic data-driven approach
for studying complex systems from their attractors.
It's designed to address nonlinear issues
such as mirage correlation.
I'm now going to play a brief video that I
think is going to explain all.
This is something that my son actually made for me when
I tried to explain it to him.
And he said, no, no, no, you can do this--
it doesn't take three hours to explain this to someone.
You can do this in like two minutes
with a reasonable video.
So he made this nice video for me.
The narration is by Robert May.
[VIDEO PLAYBACK]
- This animation illustrates the Lorenz attractor.
The Lorenz is an example of a coupled dynamic system
consisting of three differential equations, where each--
[END PLAYBACK]
Oh, technical difficulties.
Sorry.
Let me start it again.
Hold on.
[VIDEO PLAYBACK]
- This animation illustrates the Lorenz attractor.
The Lorenz is an example of a coupled dynamic system
consisting of three differential equations
where each component depends on the state and dynamics
of the other two components.
Think of each component, for example, as being species--
foxes, rabbits, grasses.
And each one changes depending on the state of the other two.
So these components shown here as the axes
are actually the state variables or the Cartesian coordinates
that form the state space.
Notice that when the system is in one lobe,
X and Z are positively correlated.
And when the system is in the other lobe,
X and Z are negatively correlated.
The other wing of the butterfly.
We can view a time series thus as a projection
from that manifold onto a coordinate axis of the state
space.
Here we see the projection onto axis X and the resulting time
series recording displacement of X.
This can be repeated on the other coordinate axes
to generate other simultaneous time series.
And so these time series are really
just projections of the manifold dynamics
on the coordinate axes.
Conversely, we can recreate the manifold
by projecting the individual time series back into the state
space to create the flow.
On this panel, we can see the three time series, X, Y,
and Z, each of which is really a projection
of the motion on that manifold.
And what we're doing is the opposite here.
We are taking the time series and projecting them back
into the original three-dimensional state space
to recreate the manifold.
It's a butterfly attractor.
[END PLAYBACK]
OK.
To summarize, these time series are really observations
of motion on an attractor.
Indeed, the jargon term in dynamical systems
is to call a time series an observation function.
Conversely, you can actually create attractors
by taking the appropriate time series,
plotting them in the right space,
and generating some kind of a shape.
OK, this is really the basis of this empirical dynamic
approach.
What is important, I think, to understand here
is that the attractor and the equations
are actually equivalent.
Both contain identical information,
and both represent the rules governing the relationships
among variables.
And depending on when they are viewed,
these relationships can appear to change.
And this is what can give rise to mirage correlations.
So over the short term here, there might be correlations.
But over a longer term--
so for example, if it's in this lobe--
I'm very bad with machines.
All right.
If it's in that lobe, you'll get a positive relationship.
If it's in the lobe on this side,
you'll get a negative correlation.
If you sample the system sparsely
over long periods of time, you'd find no apparent correlation
at all, OK?
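The lobe-dependent sign of the correlation can be checked numerically. The sketch below integrates the Lorenz equations with a fixed-step Runge-Kutta scheme and computes the X-Z correlation separately for each wing and for the whole record. The classic parameter values (sigma = 10, rho = 28, beta = 8/3), the step size, the starting point, and the use of the sign of X as a crude label for the lobe are all assumptions made for illustration.

```python
import numpy as np

def lorenz_rhs(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the three coupled Lorenz equations."""
    x, y, z = state
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def integrate_lorenz(n_steps, dt=0.01, start=(1.0, 1.0, 1.0)):
    """Fixed-step fourth-order Runge-Kutta integration."""
    traj = np.empty((n_steps, 3))
    s = np.array(start, dtype=float)
    for i in range(n_steps):
        k1 = lorenz_rhs(s)
        k2 = lorenz_rhs(s + 0.5 * dt * k1)
        k3 = lorenz_rhs(s + 0.5 * dt * k2)
        k4 = lorenz_rhs(s + dt * k3)
        s = s + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
        traj[i] = s
    return traj

traj = integrate_lorenz(20000)[2000:]      # drop the initial transient
x, z = traj[:, 0], traj[:, 2]
right = x > 0                              # crude label: which wing we are in
corr_right = np.corrcoef(x[right], z[right])[0, 1]
corr_left = np.corrcoef(x[~right], z[~right])[0, 1]
corr_all = np.corrcoef(x, z)[0, 1]
print("right lobe: %.2f, left lobe: %.2f, whole record: %.2f"
      % (corr_right, corr_left, corr_all))
```

The two wings give correlations of opposite sign, and pooling them over a long record largely cancels the two, which is the mirage effect described above.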
OK, let's look at another real example of this.
So this is an application that I was initially skeptical about,
mainly because I couldn't see how to get time series.
But luckily, I was wrong here.
These are experimental data obtained
by Gerald Pao from the Salk Institute
on expression levels of transcription factor SWI4
and cyclin CLN3.
This is in yeast.
If you view it statistically, so this is viewed statistically,
the relationship between these two variables,
there's absolutely no statistical relationship.
There's no cross-correlation.
However, if you connect these observations in time,
they're clearly inter-related.
So we see the skeleton of an attractor emerging.
So the way that they generated this data, actually, which--
so when I was originally approached about this,
and they said, well, we want to apply these methods
to gene expression.
And I said, but you can't make a time series
for gene expression.
And they said, oh, yes, we can.
And what they did in this case, because it was yeast,
they were able to shock cells, which synchronizes them
in their cell cycle, and then sample them
every 30 minutes for two days.
And so at each sample, they would
sequence several thousands of genes
and do this every 30 minutes for two days.
You can do a lot if you have post-docs and graduate
students, all right?
OK.
So we were able to get this thing to actually reflect
an attractor.
Very interesting.
Of course, if you randomize these observations in time,
you get absolutely nothing.
You still get singularities.
So you get these crossings in two dimensions.
However, if you include the cyclin CLB2,
the crossings disappear, OK?
So we have this nice cluster of three things
that, if you looked at them statistically,
appear to be uncorrelated and are essentially invisible
to bioinformatics techniques, but that are, in fact, dynamically
interacting.
So here is another short video clip
that I think presents what I consider
to be a really important basic theorem that
supports a lot of this empirical dynamics work.
[VIDEO PLAYBACK]
- There's a very powerful theorem proven by [INAUDIBLE].
It shows generically that one can reconstruct a shadow
version of the original manifold simply by looking at one
of its time series projections.
For example, consider the three time series shown here.
These are all copies of each other.
They are all copies of variable X.
Each is displaced by an amount tau.
So the top one is unlagged, the second one is lagged by tau,
and the blue one at the bottom is lagged by two tau.
Takens' theorem then says that we
should be able to use these three time
series as new coordinates and reconstruct
a shadow of the original butterfly manifold.
This is the reconstructed manifold produced
from lags of a single variable, and you
can see that it actually does look
very similar to the butterfly attractor.
Each point in the three-dimensional
reconstruction can be thought of as a time segment
with different points capturing different signals
of [INAUDIBLE] of variable X.
This method represents a one-to-one map
between the original manifold, butterfly attractor,
and the reconstruction, allowing us
to recover states of the original dynamic system
by using lags of just a single time series.
[END PLAYBACK]
OK.
So to recap, the attractor really
describes how the variables relate to each other
through time.
And Takens' theorem says quite powerfully
that any one variable contains information about the others.
This fact allows us to use a single variable basically
to construct a shadow manifold using
time lags as proxy coordinates that has
a one-to-one relationship with the original manifold.
So constructing attractors, again, from time series data
is the real basis of the empirical dynamic approach.
And as we see, we can do this univariately
by taking time lags of one variable.
We can do this multivariately with a set
of native coordinates, and we can also
make mixed embeddings that have some time lags as well as
some multivariate coordinates.
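As a concrete picture of the univariate case, the sketch below builds lagged coordinates from a single series. The function name `delay_embed` and the sine-wave input are hypothetical, chosen only to show the bookkeeping.

```python
import numpy as np

def delay_embed(series, E, tau=1):
    """Return rows [x(t), x(t - tau), ..., x(t - (E - 1) * tau)].

    E is the embedding dimension and tau the lag, in samples.
    """
    series = np.asarray(series, dtype=float)
    start = (E - 1) * tau                  # first time with a full history
    n_rows = len(series) - start
    if n_rows <= 0:
        raise ValueError("series too short for this E and tau")
    return np.column_stack(
        [series[start - k * tau: start - k * tau + n_rows] for k in range(E)]
    )

x = np.sin(0.1 * np.arange(200))
shadow = delay_embed(x, E=3, tau=5)
print(shadow.shape)  # (190, 3)
```

A multivariate embedding simply stacks the native variables as columns, and a mixed embedding stacks some lag columns next to some native columns, as long as every row refers to the same time.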
So let's look at some examples.
So this is an example of using lags with the expression time
series.
This is a mammalian model.
Mouse fibroblast production of an insulin-like growth factor
binding protein.
And again, this is the case of synchronizing and then sampling
over a number of days.
So clearly gene expression is a dynamic process, which
is quite a radical departure, I think,
from normal bioinformatics approaches,
which are essentially static.
OK.
Here we have another ecological example.
These are attractors constructed for sockeye salmon returns,
and this is for the Fraser River in Canada, which is
like the iconic salmon fishery.
And you can see for each one of these different spawning lakes,
you get an attractor that looks relatively similar.
They all look like Pringle chips, basically.
And what's interesting about this--
and I'll talk about this a little bit more later--
is that you can use these attractors
that you construct from data to make very good predictions.
And the fact that you can make predictions and make
these predictions out of sample, I think,
should give you some confidence that this is reasonable.
So again, I'm talking about a kind of modeling
where there really are almost no free parameters.
There's one in this case, right?
I'm assuming that I can't adjust the fact that I'm
observing this once a year.
So that's given.
Tau is given.
The time lag is given.
The only variable that I'm using here
that I need to kind of estimate is the number
of dimensions, so the number of embedding dimensions
that we need for this.
In this case, I'm showing it in three dimensions.
Not all of these attractors, of course,
are going to be three-dimensional.
The ones that I'll show you tend to be,
only because you can see them and they're
easy to understand what's going on.
So the basic process is really involving very few
assumptions and with only one fitted parameter,
with that fitted parameter being the embedding dimension.
OK.
So the fact that I'm able to get to using--
this is again, just using lags--
something coherent in three dimensions
means that I might be able to construct a mechanistic model
that has three variables.
So maybe sea surface temperature, river discharge,
maybe spawning, smolts going into the ocean, something
like that.
OK.
So again, one of the most compelling features, I think,
of this general set of techniques
is that it can be used to forecast.
And the fact that you could forecast
was something that originally got
me interested in this area or this set of techniques.
And it kind of led me into finance,
so I worked for like half a decade as a managing
director for Deutsche Bank.
And things like this were used to manage
on the order of $2 billion a day in notional risk.
So it's very bottom line, it's very pragmatic, and verifiable
with prediction, all of which I find--
plus it's extremely economical.
There are very few moving parts.
OK.
So I'm going to quickly show you two
basic methods for forecasting.
There are many other possibilities that exist,
but these are just two very simple ones, simplex projection
and S-maps.
So simplex projection is basically a nearest neighbor
forecasting technique.
Now you can imagine having the number of nearest neighbors
to be a tunable parameter, but the idea here is to be minimal,
and the nearest neighbors are essentially determined
by the embedding dimension.
So if you have an embedding dimension of e,
you can always--
a point in an e dimensional space
can be an interior point in e plus one dimensions,
which means you just need e plus one neighbors.
And so e plus one-- so the number of neighbors
is determined.
It's not a free variable in this, OK?
So the idea then is to take these nearest neighbors
in this space, which are analogs,
project them forward, and see where they went,
and that'll give you an idea for where the system is headed.
OK.
So again, each point on this attractor
is a history vector or a history fragment, basically.
And so here is this point that I'm trying to predict from.
And I look at the nearest neighbors, and then I--
these are points in the past, right?
And now I say, where do they go next?
And so I get a spread of points going forward,
and I take the center of mass of that spread,
the exponentially weighted center of mass,
and that gives me a prediction.
So how do you predict the future?
You do it by looking at similar points in the past.
But what do you mean by similar?
What you mean by similar is that the points
have to be in the correct dimensionality.
So for example, if I'm trying to predict the temperature
at the end of Scripps Pier tomorrow,
the sea surface temperature, and it's
a three-dimensional process, and let's say the right lag should
be a week, then I'm not just going
to look at temperatures that are similar to today's temperature.
I'm going to look at temperatures where today's
temperature, the temperature a week ago,
and the temperature two weeks ago are most similar, right?
And so knowing the dimensionality
is quite important for determining what the nearest
neighbors are, all right?
So you take the weighted average and that
becomes your prediction.
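The procedure just described can be sketched in a few lines. This is a minimal version under stated assumptions, not the speaker's code: the library is the first half of the series, each prediction uses the E + 1 nearest library vectors, and the weights fall off exponentially with distance scaled by the nearest distance. The logistic map used as test data is also an assumption.

```python
import numpy as np

def delay_embed(series, E, tau=1):
    """Rows are [x(t), x(t - tau), ..., x(t - (E - 1) * tau)]."""
    start = (E - 1) * tau
    n_rows = len(series) - start
    return np.column_stack(
        [series[start - k * tau: start - k * tau + n_rows] for k in range(E)]
    )

def simplex_forecast(series, E, tau=1, tp=1):
    """Predict tp steps ahead on the second half, using the first half as library."""
    series = np.asarray(series, dtype=float)
    split = len(series) // 2
    vectors = delay_embed(series, E, tau)
    times = np.arange((E - 1) * tau, len(series))   # time index of each row
    in_lib = times + tp < split                     # vector and target in first half
    in_pred = (times + tp < len(series)) & (times - (E - 1) * tau >= split)
    lib_vecs = vectors[in_lib]
    lib_next = series[times[in_lib] + tp]           # where each analog went
    predicted, observed = [], []
    for vec, t in zip(vectors[in_pred], times[in_pred]):
        dist = np.linalg.norm(lib_vecs - vec, axis=1)
        nearest = np.argsort(dist)[:E + 1]          # E + 1 neighbors
        d_min = max(dist[nearest[0]], 1e-12)
        w = np.exp(-dist[nearest] / d_min)          # exponential weighting
        predicted.append(np.sum(w * lib_next[nearest]) / np.sum(w))
        observed.append(series[t + tp])
    return np.array(predicted), np.array(observed)

# Test data: a chaotic logistic map (an assumption for illustration).
x = np.empty(600)
x[0] = 0.3
for t in range(599):
    x[t + 1] = 3.9 * x[t] * (1.0 - x[t])

predicted, observed = simplex_forecast(x, E=2, tp=1)
skill = np.corrcoef(predicted, observed)[0, 1]
print("out-of-sample correlation: %.3f" % skill)
```

The number of neighbors is tied to E rather than tuned, which keeps the embedding dimension as the only fitted quantity.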
Here's an example.
This looks like white noise.
What I'm going to do is cut this data in half,
and I'm going to use the first half to build a model,
I'm going to predict on the second half.
So if I take time lag coordinates, and in this case,
again, I'm choosing on purpose three-dimensional things,
because they're easy to show.
This is like taking a fork with three prongs,
laying it down on the time series,
and calling one x, the other one y, the other one z.
So I'm going to plot all those points going forward,
and this is the shape I get.
So what looked like white noise
and totally random actually was not.
In fact, I generated it from first differences
of [INAUDIBLE], OK?
So if we now use this simple zeroth order technique
and we try to predict that second half of the time series
that looked totally noisy, you can do quite well.
This is actually predicting two points
into the future, two steps into the future.
OK.
So again, how did I know to choose three dimensions?
Basically you do this by trial and error.
You try like one, two, three.
And it peaks. So this is, again, how well you can predict.
This is the Pearson correlation coefficient.
And this is trying different embedding dimensions,
trying a two-pronged fork, a three-pronged fork, so on.
And again, so the embedding with the best predictability
is the one that best unfolds the attractor, the one that best
resolves the singularities.
And this relies basically on the Whitney embedding theorem.
So if the attractor actually was a ball of thread, OK,
and I tried to embed this ball of thread in one dimension,
that would be like shining a light down across over a line.
Then at any point, I could be going right or left.
So there's singularities everywhere.
If I shine it down on two dimensions, I now have a disk.
At any point I can go right, left, up, down, so forth.
Everywhere is a singularity.
If I now embed it in three dimensions--
so the thread is one-dimensional, right?
If I embed it in three dimensions, all of a sudden,
I can see that I have individual threads.
And if you have these individual threads,
that allows you to make better predictions, right?
So this is how you can tell how well you've
embedded the attractor, how well you
can predict with the attractor.
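The trial-and-error search over embedding dimensions can be sketched as a loop that scores each candidate E by out-of-sample forecast skill. The nearest-neighbor forecaster below is a compact stand-in with a lag of one sample, and the Henon map used as test data is an assumption, picked because its X series needs two lags to unfold.

```python
import numpy as np

def henon_x(n, a=1.4, b=0.3):
    """X coordinate of the Henon map (assumed test data)."""
    x, y = 0.0, 0.0
    out = np.empty(n)
    for i in range(n):
        x, y = 1.0 - a * x * x + y, b * x
        out[i] = x
    return out

def forecast_skill(series, E, tp=1):
    """Fit on the first half, score on the second half (Pearson correlation)."""
    n = len(series)
    rows = np.arange(E - 1, n - tp)                 # times with history and a target
    vecs = np.column_stack([series[rows - k] for k in range(E)])
    target = series[rows + tp]
    lib = rows + tp < n // 2
    test = rows - (E - 1) >= n // 2
    lib_vecs, lib_target = vecs[lib], target[lib]
    preds = []
    for v in vecs[test]:
        d = np.linalg.norm(lib_vecs - v, axis=1)
        nn = np.argsort(d)[:E + 1]                  # E + 1 nearest analogs
        w = np.exp(-d[nn] / max(d[nn[0]], 1e-12))
        preds.append(np.dot(w, lib_target[nn]) / w.sum())
    return np.corrcoef(preds, target[test])[0, 1]

series = henon_x(1000)
skills = {E: forecast_skill(series, E) for E in range(1, 7)}
best_E = max(skills, key=skills.get)
for E, rho in skills.items():
    print("E = %d  skill = %.4f" % (E, rho))
print("best embedding dimension:", best_E)
```

Plotting the skill against E gives the kind of curve described above, and the E at the peak is the one carried forward.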
OK.
All right.
So the other-- sort of the next order of complexity
is basically a first-order map, which
is a weighted autoregressive model where you're effectively
computing a plane along the manifold along this attractor
and using the coefficients of the Jacobian matrix
that you compute for this hyperplane,
basically, to give you predictions.
But when you're computing this plane,
there's a weighting function.
It's this weighting function that we're calling theta here.
And that weighting function determines how heavily you
weight points that are nearby on the attractor versus points
that are far away, OK?
So if theta is equal to zero, then all points
are equally weighted.
That's just like fitting a standard AR model
to a cloud of points, right?
All points are equally valid.
But if the attractor really matters,
then points nearby should be weighted more heavily
than points far away, OK?
So if there's actual curvature in there,
then if you weight more heavily, you're
taking advantage of that information, OK?
So this is if you crank theta up to 0.5,
you're weighting points nearby more heavily,
so forth and so on.
OK.
This is a really simple test for non-linearity.
You can actually try increasing that theta,
the tuning parameter.
And if as you increase it the predictability goes up,
then that's an indication that you get an advantage
by acknowledging the fact that the function is
different at different parts on the attractor, which
is another way of saying the dynamics are state dependent,
which is another way of saying the manifold
has curvature to it, OK?
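The nonlinearity test can be sketched as a sweep over theta. The version below fits a locally weighted linear map at each prediction point, with weights exp(-theta * d / mean distance). That kernel, the function name `smap_skill`, and the logistic-map test data are assumptions for illustration, since the talk does not spell out the weighting function.

```python
import numpy as np

def smap_skill(series, E, theta, tp=1):
    """Out-of-sample skill of a locally weighted linear forecast."""
    n = len(series)
    rows = np.arange(E - 1, n - tp)
    vecs = np.column_stack([series[rows - k] for k in range(E)])
    target = series[rows + tp]
    lib = rows + tp < n // 2                        # first half: library
    test = rows - (E - 1) >= n // 2                 # second half: predictions
    lib_vecs, lib_target = vecs[lib], target[lib]
    design = np.column_stack([np.ones(len(lib_vecs)), lib_vecs])
    preds = []
    for v in vecs[test]:
        d = np.linalg.norm(lib_vecs - v, axis=1)
        w = np.exp(-theta * d / d.mean())           # theta = 0 weights all points equally
        coef = np.linalg.lstsq(design * w[:, None], lib_target * w, rcond=None)[0]
        preds.append(coef[0] + np.dot(coef[1:], v))
    return np.corrcoef(preds, target[test])[0, 1]

# Test data: a chaotic logistic map (an assumption for illustration).
x = np.empty(400)
x[0] = 0.3
for t in range(399):
    x[t + 1] = 3.9 * x[t] * (1.0 - x[t])

skills = {theta: smap_skill(x, E=2, theta=theta) for theta in (0.0, 0.5, 2.0, 8.0)}
for theta, rho in skills.items():
    print("theta = %.1f  skill = %.3f" % (theta, rho))
```

With theta = 0 this reduces to a single global linear autoregression, so skill that rises as theta grows is the signature of state-dependent dynamics.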
So curvature is actually ubiquitous in nature.
This is a study that my student [? Zach ?] [? Shee ?] did.
And if you look at 20th century records
for specific biological populations,
you find all of them exhibit non-linearity.
We didn't find non-linearity, actually,
for some of the physical measurements.
But again, we were just looking at the 20th century,
and it might've been too short to pick that up.
Other examples include other fish species, sheep, diatoms,
and an assortment of many other kinds of phenomena.
All show this kind of non-linearity.
It seems to be ubiquitous.
Wherever you look for it, it's actually rare
that you don't find it, OK?
So the fact that things are nonlinear is pretty important,
I think.
It affects the way that you should think about the problem
and analyze it.
And in fact, the non-linearity is a property
that I believe can be exploited.
This is an example of doing just that.
So this paper appeared last year in PRSB,
and it used S-maps, this technique that we just
saw, to show how species interactions vary
in time depending on where on the attractor they are, OK?
So it really showed how we can take
real-time measurements of the interactions that
are state dependent, OK?
And the basic idea is as follows.
So the S-map involves calculating a hyperplane
or a surface at each point as the system travels
along its attractor.
So this involves calculating the Jacobian matrix, whose elements
are partial derivatives that measure the effect of one
species on another.
So note that the embeddings here are multi-variate.
So these aren't lags of one variable,
but they're native variables, right?
So I want to know how the relationship
of each native variable affects the other variable
and how that changes through time.
So what I do is at each point, I compute a Jacobian matrix.
If this was an equilibrium system,
there would just be one point, and I
would be looking at the-- it's like
the standard linear stability analysis for an equilibrium
system.
But what I'm doing is I'm taking that analysis,
but I'm applying it successively at each point
as the system travels along the attractor.
So the coefficients are in effect
fit sequentially as the system travels along its attractor.
And they vary, therefore, according to the location
on the attractor.
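A rough sketch of that sequential fit follows. At every time point it fits a locally weighted linear map from the native variables at time t to one variable at time t + 1 and keeps the coefficients, which play the role of one row of the Jacobian. The exponential kernel, the value of theta, and the toy two-species model standing in for real data are all assumptions.

```python
import numpy as np

def coupled_logistic(n_steps, x0=0.4, y0=0.2):
    """Toy two-species model (assumed parameters, not the mesocosm data)."""
    x = np.empty(n_steps)
    y = np.empty(n_steps)
    x[0], y[0] = x0, y0
    for t in range(n_steps - 1):
        x[t + 1] = x[t] * (3.8 - 3.8 * x[t] - 0.02 * y[t])
        y[t + 1] = y[t] * (3.5 - 3.5 * y[t] - 0.1 * x[t])
    return x, y

def smap_coefficients(states, target, theta):
    """Locally weighted linear fit at every time point.

    states: (n, k) native variables at time t
    target: (n,) one variable at time t + 1
    Returns an (n, k + 1) array: intercept, then one coefficient per variable.
    """
    n, k = states.shape
    design = np.column_stack([np.ones(n), states])
    coefs = np.empty((n, k + 1))
    for i in range(n):
        d = np.linalg.norm(states - states[i], axis=1)
        w = np.exp(-theta * d / d.mean())
        w[i] = 0.0                                  # leave the focal point out of its own fit
        coefs[i] = np.linalg.lstsq(design * w[:, None], target * w, rcond=None)[0]
    return coefs

x, y = coupled_logistic(400)
states = np.column_stack([x[:-1], y[:-1]])
coefs = smap_coefficients(states, x[1:], theta=8.0)
effect_of_y_on_x = coefs[:, 2]          # estimate of d x(t+1) / d y(t) at each time
model_effect = -0.02 * x[:-1]           # the same derivative taken from the toy equations
print(effect_of_y_on_x[:5])
print(model_effect[:5])
```

Because the toy equations are known, the estimated coefficient series can be compared against the analytic derivative. With field data there is no such reference, and the coefficient series itself is the result.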
So what's really nice about this is that it's something
that you can actually accomplish very easily on real data.
And here's an example.
This is data from a marine mesocosm that
was collected by Huisman, and what you want to focus on
is the competition between copepods and rotifers.
These are the two main consumers in this.
So these are both zooplanktons that eat phytoplankton.
And this is basically the partial
of how the calanoids vary with the rotifers.
And so you can see that the competition--
so this shows how the coefficients
are changing as you compute them while the system is
traveling along its attractor.
So what's the interesting thing, what
I think is interesting here is that I was totally surprised.
Competition is not a fairly smooth and long-term
relationship, right?
In classical ecology, it's regarded as a constant.
So two species compete, you compute their alpha_ij,
and that's the constant.
In fact, it's very episodic.
It seems to only occur like in these little bottlenecks, which
I think is-- so I mean, this is nature.
This is not my model.
This is what nature is telling me,
that you get competition in these little bottlenecks.
So that fact I found fairly surprising.
But what's even more interesting is
to ask the question, what is it about the system when
it does occur that causes this competition?
And it turns out that what you can do
is make a graph basically of how that coefficient--
this is terrible.
I think I got this when I talked at Stanford last fall.
OK.
All right.
All right, it's broken.
So you can make a plot of what the competition coefficient--
how the competition coefficient varies as a function of food
abundance.
And the obvious thing that you get here
is that when do you get competition?
When food is scarce.
I mean, duh.
That seems like it should be obvious.
But what wasn't clear before is how episodic this all is.
It's not sort of a gradual constant affair.
It's something that happens in these sudden bottlenecks.
So what we have then is a pretty good tool for probing changing
interactions.
And I can see other potential for this
in terms of looking for--
you can compute the matrix and maybe
compute something like an eigenvalue for the matrix
as it changes to look for changes where--
to look for instances where you were
about to enter a critical transition.
So this stuff really hasn't been written up yet.
You should go ahead and do it.
But I see a lot of potential for just
using this fairly simple approach, which again,
is very empirical, and it allows the data to tell you
what's actually happening.
OK.
So let's see how EDM deals with causation.
OK.
This is the formal statement of Granger causality.
So basically he's saying, I'm going
to try to predict Y2 from the universe
of all possible variables.
And this is the variance, my uncertainty in my prediction.
And it says that if however I remove Y1
and I'm trying to predict Y2, and this variance is greater,
then I know that Y1 was causal.
So it says if I exclude a variable
and I don't do as well at predicting, then
that variable was causal.
That's the formal definition of Granger causality.
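For a linear stochastic system the criterion is easy to sketch. The version below is a simplification: the "universe" is reduced to two variables, prediction is a least-squares autoregression with one lag, and the simulated series, coefficients, and function names are assumptions for illustration.

```python
import numpy as np

def residual_variance(target, predictors):
    """Variance left after a least-squares fit of target on the predictors."""
    design = np.column_stack([np.ones(len(target))] + predictors)
    coef = np.linalg.lstsq(design, target, rcond=None)[0]
    return np.var(target - design @ coef)

def granger_ratio(cause, effect, lag=1):
    """Prediction variance without the candidate cause, divided by variance with it."""
    now = effect[lag:]
    own_past = [effect[lag - k: len(effect) - k] for k in range(1, lag + 1)]
    cause_past = [cause[lag - k: len(cause) - k] for k in range(1, lag + 1)]
    restricted = residual_variance(now, own_past)
    full = residual_variance(now, own_past + cause_past)
    return restricted / full

# Simulated stochastic system in which y1 drives y2 but not the reverse.
rng = np.random.default_rng(0)
n = 2000
y1 = np.zeros(n)
y2 = np.zeros(n)
for t in range(1, n):
    y1[t] = 0.6 * y1[t - 1] + rng.normal()
    y2[t] = 0.5 * y2[t - 1] + 0.8 * y1[t - 1] + rng.normal()

forward = granger_ratio(cause=y1, effect=y2)
reverse = granger_ratio(cause=y2, effect=y1)
print("dropping y1 when predicting y2: variance ratio %.2f" % forward)
print("dropping y2 when predicting y1: variance ratio %.2f" % reverse)
```

A ratio well above one says the excluded variable carried information that nothing else supplied. This works here because the noise terms make the variables separable, which is the assumption that a deterministic coupled system violates.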
The problem, however, is that this seems
to contradict Takens' theorem.
So Takens' theorem says the information
about other variables in the system
is contained in every other variable, OK?
So how can you remove a variable if that variable's information
is contained in the others?
So there is a little bit of a problem.
What's interesting is if you look at Granger's '68 paper
where he describes this, he says explicitly,
this may not work for dynamic systems.
So--
[LAUGHTER]
He was covered.
OK.
So I think this is a useful criterion
sort of as a kind of a rule of thumb, practical rule of thumb.
But it really is intended more for stochastic systems rather
than dynamic systems.
OK.
So in dynamic systems, time series variables
are causally related again if they're coupled and belong
to the same dynamic system.
If X causes Y, then information about X
must be encoded in this shadow manifold of Y.
And this is something that you can test with cross-mapping.
This was the paper that was published at the end of 2012
that describes the idea.
And I have one final video clip.
It's not narrated by Bob May.
I had my student Hao Ye do the narration
on this one.
But it'll explain it.
[VIDEO PLAYBACK]
- Takens' theorem gives us a one-to-one mapping between
the original manifold and reconstructed shadow manifolds.
Here we will explain how this important aspect of attractor
reconstruction can be used to [INAUDIBLE] two time series
variables belong to the same dynamic system
and are thus causally related.
This particular reconstruction is based on lags of variable x.
If we now do the same for variable y,
we find something similar.
Here we see the original manifold M, as well as
the shadow manifolds, Mx and My, created from lags of x and y
respectively.
Because both Mx and My map one-to-one
to the original manifold M, they also
map one-to-one to each other.
This implies that the points that are nearby
on the manifold My correspond to points that are also nearby
on Mx.
We can demonstrate this principle
by finding the nearest neighbors in My
and using their time indices to find
the corresponding points in Mx.
These points will be nearest neighbors on Mx
only if x and y are causally related.
Thus, we can use nearby points on My
to identify nearby points on Mx.
This allows us to use the historical record of y
to estimate the states of x and vice versa,
a technique we call cross-mapping.
With longer time series, the reconstructed manifolds
are denser, nearest neighbors are closer,
and cross-map estimates increase in precision.
We call this phenomenon convergent cross-mapping
and use this convergence as a practical criterion
for detecting causation.
[END PLAYBACK]
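The steps in the video (build My from lags of y, find nearest neighbors on My, use their time indices to estimate x) can be sketched roughly as below. This is an illustration, not the code behind the video. The toy system in `simulate`, its parameter values, the embedding dimension E=2, and the exponential neighbor weighting are all assumptions chosen for the example.

```python
import numpy as np

def simulate(n, alpha, rx=3.8, ry=3.5):
    """Hypothetical toy system: x runs on its own, y is forced by x."""
    x, y = np.empty(n), np.empty(n)
    x[0], y[0] = 0.4, 0.2
    for t in range(n - 1):
        x[t + 1] = x[t] * (rx - rx * x[t])
        y[t + 1] = y[t] * (ry - ry * y[t] - alpha * x[t])
    return x, y

def embed(series, E, tau=1):
    """Shadow manifold: the row for time t is (s[t], s[t-tau], ..., s[t-(E-1)tau])."""
    start, n = (E - 1) * tau, len(series)
    return np.column_stack([series[start - k * tau: n - k * tau] for k in range(E)])

def cross_map(source, target, E=2, tau=1):
    """Estimate target from the shadow manifold of source; return (truth, estimate)."""
    M = embed(source, E, tau)
    truth = target[(E - 1) * tau:]
    dist = np.linalg.norm(M[:, None, :] - M[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)               # a point is not its own neighbor
    nn = np.argsort(dist, axis=1)[:, :E + 1]     # time indices of the E+1 nearest neighbors
    d = np.take_along_axis(dist, nn, axis=1)
    w = np.exp(-d / np.maximum(d[:, :1], 1e-12)) # closer neighbors get more weight
    w /= w.sum(axis=1, keepdims=True)
    estimate = (w * truth[nn]).sum(axis=1)       # weighted average of target at those times
    return truth, estimate

x, y = simulate(1200, alpha=0.1)
x, y = x[200:], y[200:]                          # drop the initial transient
x_true, x_hat = cross_map(source=y, target=x)    # estimate x from My
rho = np.corrcoef(x_true, x_hat)[0, 1]
print(f"cross-map skill for x estimated from My: rho = {rho:.2f}")
```

The correlation between the estimated and the actual x is one common way to score how well the cross-map works.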
OK.
So with convergent cross-mapping,
what we're trying to do is we're trying to recover states
of the affected variable--
we're trying to recover states of the causal variable
from the affected variable.
And so this is basic.
Let's see.
The idea is that instead of looking specifically
at the cause, we're looking at the effect
to try to infer what the cause was.
So basically from the victim, we can find something
about the aggressor or the perpetrator, right?
OK.
This little piece, I think, will give you
a little bit of intuition.
So these two time series are what you get if alpha is zero.
So y is red and x is blue.
And you can see that with alpha equal to zero,
they're independent.
If I crank up alpha, and then this is what I get.
So again, you can see that the blue time series is not
altered, but the red one, y, actually is.
And it's in this alteration of the time series
that I'm able, from the reconstructed manifold,
to be able to backtrack the values of the blue time series.
And so that shows that x was causal on y.
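To make the alpha picture concrete, here is a small sketch under assumptions. The transcript does not give the equations behind the slide, so the pair of logistic maps below, with y forced by x at strength alpha, is a stand-in, and the growth rates and starting values are made up.

```python
import numpy as np

def simulate(n, alpha, rx=3.8, ry=3.5):
    """Hypothetical one-way coupling: alpha controls how strongly x pushes on y."""
    x, y = np.empty(n), np.empty(n)
    x[0], y[0] = 0.4, 0.2
    for t in range(n - 1):
        x[t + 1] = x[t] * (rx - rx * x[t])                 # x never sees y
        y[t + 1] = y[t] * (ry - ry * y[t] - alpha * x[t])  # y sees x when alpha > 0
    return x, y

x_off, y_off = simulate(300, alpha=0.0)   # independent series
x_on, y_on = simulate(300, alpha=0.1)     # coupling turned up

x_unchanged = np.array_equal(x_off, x_on)
max_shift_in_y = np.max(np.abs(y_on - y_off))
print(x_unchanged, max_shift_in_y)
```

The x series is identical in both runs, while y carries the imprint of x once alpha is nonzero, which is the signature cross-mapping looks for.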
OK.
A necessary condition for a cross-map estimate for--
a necessary condition for a convergence
is to show that the cross-map estimate improves
with data length.
And so that's basically what we see here.
So as points get closer in the attractor,
your estimates should get better,
and so predictions should get better.
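The convergence check described here can be sketched as a loop over library lengths. As before, the toy system, its parameters, E=2, and the library sizes are assumptions for illustration, not the analysis shown on the slide.

```python
import numpy as np

def simulate(n, alpha, rx=3.8, ry=3.5):
    """Hypothetical toy system: x runs on its own, y is forced by x."""
    x, y = np.empty(n), np.empty(n)
    x[0], y[0] = 0.4, 0.2
    for t in range(n - 1):
        x[t + 1] = x[t] * (rx - rx * x[t])
        y[t + 1] = y[t] * (ry - ry * y[t] - alpha * x[t])
    return x, y

def cross_map_skill(source, target, E=2):
    """Correlation between target and its estimate from lags of source."""
    n = len(source)
    M = np.column_stack([source[E - 1 - k: n - k] for k in range(E)])
    truth = target[E - 1:]
    dist = np.linalg.norm(M[:, None, :] - M[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    nn = np.argsort(dist, axis=1)[:, :E + 1]
    d = np.take_along_axis(dist, nn, axis=1)
    w = np.exp(-d / np.maximum(d[:, :1], 1e-12))
    w /= w.sum(axis=1, keepdims=True)
    estimate = (w * truth[nn]).sum(axis=1)
    return np.corrcoef(truth, estimate)[0, 1]

x, y = simulate(1000, alpha=0.1)
x, y = x[200:], y[200:]                  # drop the initial transient

lib_sizes = [25, 50, 100, 200, 400, 800]
# Use only the first L points as the library each time.
skills = [cross_map_skill(y[:L], x[:L]) for L in lib_sizes]
for L, s in zip(lib_sizes, skills):
    print(L, round(s, 3))
```

If x really forces y, the skill should climb as the library grows. A skill that stays flat with more data would argue against a causal link.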
So let's look at some examples.
This is a classic predator/prey experiment
that Gause made famous.
So Didinium is the rotifer predator,
Paramecium is the prey.
And you can see, you can get cross-mapping
in both directions, sure.
The predator is affecting the prey,
the prey is affecting the predator.
This sort of looks like maybe the predator
is affecting the prey more than the prey is
affecting the predator.
But if you look at this in a time lag way,
so this is looking at different prediction lags
for doing the cross-mapping, you find
that the effect of the predator on the prey
is almost instantaneous, which you kind of expect.
These are rotifers eating paramecia.
But the effect of the paramecia itself on the predator
is delayed, and it's delayed, it looks
like, by about a day or so.
So you get sort of a sensible time delay here.
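A rough sketch of the time-lag version: cross-map the target at several offsets from the source and see where the skill peaks. The toy system, the range of lags, and the helper name `lagged_skill` are assumptions. In this toy the push from x takes one step to show up in y, so one would expect the best skill at a zero or negative lag.

```python
import numpy as np

def simulate(n, alpha, rx=3.8, ry=3.5):
    """Hypothetical toy system: x runs on its own, y is forced by x."""
    x, y = np.empty(n), np.empty(n)
    x[0], y[0] = 0.4, 0.2
    for t in range(n - 1):
        x[t + 1] = x[t] * (rx - rx * x[t])
        y[t + 1] = y[t] * (ry - ry * y[t] - alpha * x[t])
    return x, y

def lagged_skill(source, target, lag, E=2):
    """Skill of estimating target(t + lag) from the lag vector of source at time t."""
    n = len(source)
    M = np.column_stack([source[E - 1 - k: n - k] for k in range(E)])
    times = np.arange(E - 1, n)                    # time index of each row of M
    ok = (times + lag >= 0) & (times + lag < n)    # keep rows whose shifted target exists
    M, truth = M[ok], target[times[ok] + lag]
    dist = np.linalg.norm(M[:, None, :] - M[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    nn = np.argsort(dist, axis=1)[:, :E + 1]
    d = np.take_along_axis(dist, nn, axis=1)
    w = np.exp(-d / np.maximum(d[:, :1], 1e-12))
    w /= w.sum(axis=1, keepdims=True)
    estimate = (w * truth[nn]).sum(axis=1)
    return np.corrcoef(truth, estimate)[0, 1]

x, y = simulate(1200, alpha=0.1)
x, y = x[200:], y[200:]                            # drop the initial transient

skills = {lag: lagged_skill(y, x, lag) for lag in range(-3, 4)}
best_lag = max(skills, key=skills.get)
print(skills, best_lag)
```

A negative best lag means the effect variable encodes the past of the cause, which is the sensible direction for a delayed influence.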
OK.
This is a field example.
These are sardines and anchovies that
have been sort of a mystery for quite a while.