AI Won't Plateau - if We Give It Time To Think | Noam Brown | TED

AI auto-generated subtitles

The incredible progress in AI over the past 5 years can be summarized in one word, scale.

Yes, there have been algorithmic advances, but the frontier models of today are still based on the same transformer architecture that was introduced in 2017.

And they are trained in a very similar way to the models that were trained in 2019.

The main difference is the scale of the data and compute that goes into these models.

In 2019, GPT-2 cost about $5,000 to train.

Every year since then, for the past 5 years, the models have gotten bigger, trained for longer, on more data.

And every year they've gotten better.

But today's frontier models can cost hundreds of millions of dollars to train.

And there are reasonable concerns among some that AI will soon plateau or hit a wall.

After all, are we really going to train models that cost hundreds of billions of dollars?

What about trillions of dollars?

At some point, the scaling paradigm breaks down.
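
A rough sketch of the arithmetic behind this concern, using the round figures quoted above. It assumes $100 million as a stand-in for "hundreds of millions of dollars" and assumes the cost kept growing by the same factor every year; both are illustrative assumptions, and the function names are made up for this example.

```python
import math

# Figures quoted in the talk.
GPT2_COST_USD = 5_000   # "about $5,000" in 2019
YEARS_ELAPSED = 5       # "for the past 5 years"

# Assumption: $100M as the low end of "hundreds of millions of dollars".
FRONTIER_COST_USD = 100_000_000


def yearly_growth_factor(start_cost: float, end_cost: float, years: float) -> float:
    """Constant yearly multiplier that turns start_cost into end_cost."""
    return (end_cost / start_cost) ** (1.0 / years)


def years_to_reach(current_cost: float, target_cost: float, factor: float) -> float:
    """Years of continued growth at `factor` per year needed to reach target_cost."""
    return math.log(target_cost / current_cost) / math.log(factor)


growth = yearly_growth_factor(GPT2_COST_USD, FRONTIER_COST_USD, YEARS_ELAPSED)
to_100b = years_to_reach(FRONTIER_COST_USD, 100e9, growth)
to_1t = years_to_reach(FRONTIER_COST_USD, 1e12, growth)

print(f"implied growth: about {growth:.1f}x per year")
print(f"years until a $100B training run: about {to_100b:.1f}")
print(f"years until a $1T training run: about {to_1t:.1f}")
```

Under these assumptions the implied growth is roughly 7x per year, which would reach a $100 billion training run in about three and a half more years. That is why the question of where the paradigm breaks down is a practical one.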

This is, in my opinion, a reasonable concern.

And in fact, it's one that I used to share.

But today, I am more confident than ever that AI will not plateau.

And in fact, I believe that we will see AI progress accelerate in the coming months.

To explain why, I want to tell a story from my time as a PhD student.

I started my PhD in 2012, and I was lucky to be able to work on the most exciting project I could imagine: developing AIs that could learn, on their own, how to play poker.

Now, I had played a lot of poker when I was in high school and college, so for me, this was basically my childhood dream job.

Now, contrary to its reputation, poker is not just a game of luck.

It's also a game of deep strategy.

You can kind of think of it like chess with a deck of cards.

When I started my PhD, there had already been several years of research on how to make AIs that play poker.

And the general feeling among the research community was that we had figured out the paradigm, and now all we needed to do was scale it.

So, every year, we would train larger poker AIs for longer on more data.

And every year, they would get better, just like today's frontier language models.

By 2015, they got so good that we thought they might be able to rival the top human experts.

So we challenged four of the world's top poker players to an 80,000-hand poker competition with $120,000 in prize money to incentivize them to play their best.

And unfortunately, our bot lost by a wide margin.

In fact, it was clear even on day one that our bot was outmatched.

But during this competition, I noticed something interesting.

You see, leading up to this competition, our bot had played almost a trillion hands of poker over thousands of CPUs for about three months.

But when it came time to actually play against these human experts, the bot acted instantly.

It took about 10 milliseconds to make a decision, no matter how difficult it was.

Meanwhile, the human experts had only played maybe 10 million hands of poker in their lifetimes.

But when they were faced with a difficult decision, they would take the time to think.

If it was an easy decision, they might only think for a couple seconds.

If it was a difficult decision, they might think for a few minutes.

But they would take advantage of the time that they had to think through their decisions.

In Daniel Kahneman's book, Thinking, Fast and Slow, he describes this as the difference between System 1 thinking and System 2 thinking.

System 1 thinking is the faster, more intuitive kind of thinking that you might use, for example, to recognize a friendly face or laugh at a funny joke.

System 2 thinking is the slower, more methodical thinking that you might use for things like planning a vacation or writing an essay or solving a hard math problem.

After this competition, I wondered whether this System 2 thinking might be what's missing from our bot.

It might explain the difference in the performance between our bot and the human experts.

So I ran some experiments to see just how much of a difference this System 2 thinking makes in poker.

And the results that I got blew me away.

It turned out that having the bot think for just 20 seconds in a hand of poker got the same boost in performance as scaling up the model by 100,000x and training it for 100,000 times longer.

Let me say that again.

Spending 20 seconds thinking in a hand of poker got the same boost in performance as scaling up the size of the model and the training by 100,000x.

When I got this result, I literally thought it was a bug.

For the first three years of my PhD, I had managed to scale up these models by 100x.

I was proud of that work.

I had written multiple papers on how to do that scaling.

But I knew pretty quickly that all of that would be a footnote compared to just scaling up System 2 thinking.

So based on these results, we redesigned the poker AI from the ground up.

Now we were focused on scaling up System 2 thinking in addition to System 1.

And in 2017, we again challenged four of the world's top poker pros to a 120,000-hand poker competition, this time with $200,000 in prize money.

And this time, we beat all of them by a huge margin.

This was a huge surprise to everybody involved.

It was a huge surprise to the poker community.

It was a huge surprise to the AI community.

And honestly, even a huge surprise to us.

I literally did not think it was possible to win by the kind of margin that we won by.

In fact, I think what really highlights just how surprising this result was is that when we announced the competition, the poker community decided to do what they do best and gamble on who would win.

When we announced the competition, the betting odds were about 4 to 1 against us.

After we had won each of the first three days of the competition, the betting odds were still about 50-50.

But by the eighth day of the competition, you could no longer gamble on which side would win.

You could only gamble on which human would lose the least by the end.

This pattern of AI benefiting by thinking for longer is not unique to poker.

And in fact, we've seen it in multiple other games as well.

For example, in 1997, IBM created Deep Blue, an AI that plays chess.

And they challenged the world champion Garry Kasparov to a tournament and beat him in a landmark achievement for AI.

But Deep Blue didn't act instantly.

Deep Blue thought for a couple minutes before making each move.

Similarly, in 2016, DeepMind created AlphaGo, an AI that plays the game of Go, which is even more complicated than the game of chess.

And they too challenged a world champion, Lee Sedol, and beat him in a landmark achievement for AI.

But AlphaGo also didn't act instantly.

AlphaGo took the time to think for a couple minutes before making each move.

In fact, the authors of AlphaGo later published a paper where they measured just how much of a difference this thinking time makes for the strongest version of AlphaGo.

And what they found is that when AlphaGo had the time to think for a couple minutes, it would beat any human alive by a huge margin.

But when it had to act instantly, it would do much worse than top humans.
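
The gap between acting instantly and taking time to think can be illustrated with a toy sketch. This is not how Deep Blue, AlphaGo, or the poker bot search. It is a minimal Monte Carlo example with made-up win probabilities, where "thinking time" is simply the number of simulated outcomes per candidate action.

```python
import random

# Hypothetical toy decision: three actions with hidden win probabilities.
# The agent can only learn about them by simulating noisy outcomes.
TRUE_WIN_PROB = [0.50, 0.55, 0.60]
BEST_ACTION = 2


def decide(num_rollouts: int, rng: random.Random) -> int:
    """Pick the action with the highest simulated win rate.

    num_rollouts plays the role of thinking time: simulations per action.
    """
    win_rates = []
    for p in TRUE_WIN_PROB:
        wins = sum(rng.random() < p for _ in range(num_rollouts))
        win_rates.append(wins / num_rollouts)
    # Break ties at random so no action is favored by its position.
    top = max(win_rates)
    return rng.choice([a for a, w in enumerate(win_rates) if w == top])


def accuracy(num_rollouts: int, trials: int = 300, seed: int = 0) -> float:
    """Fraction of trials in which the truly best action is chosen."""
    rng = random.Random(seed)
    hits = sum(decide(num_rollouts, rng) == BEST_ACTION for _ in range(trials))
    return hits / trials


for budget in (1, 10, 100, 1000):
    print(f"{budget:>5} rollouts per action -> best action chosen {accuracy(budget):.0%}")
```

With a single rollout the choice is close to a guess, and with a thousand rollouts the best action is picked almost every time. The underlying "model" (the hidden probabilities) never changes, only the time spent evaluating.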

In 2021, a paper was published that tried to measure, a bit more scientifically, just how much of a difference this thinking time made.

In it, the authors found that in these games, scaling up thinking time by 10x was roughly the equivalent of scaling up the model size and training by 10x.

So you have this very clear, clean relationship between scaling up System 2 thinking time and scaling up System 1 training.
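
One way to write that relationship down, as an illustrative reading rather than the paper's actual model: suppose playing strength S depends only on the product of training scale C and thinking time T, through some unspecified increasing function f. The symbols S, C, T and f are assumptions introduced here.

```latex
\[
  S(C, T) = f\bigl(\log_{10} C + \log_{10} T\bigr)
  \quad\Longrightarrow\quad
  S(10\,C,\; T) \;=\; S(C,\; 10\,T) \;=\; f\bigl(\log_{10} C + \log_{10} T + 1\bigr)
\]
```

Under this reading, every combination with the same product C·T gives the same strength, so a 10x increase in either factor buys the same improvement.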

Now, why does this matter?

Well, remember I mentioned at the start of this talk that today's frontier models cost hundreds of millions of dollars to train.

But the cost of querying them, the cost of asking a question and getting an answer, is fractions of a penny.

So this result says that if you want an even better model, there are two ways you could do it.

One is to keep doing what we've been doing for the past five years and scale up System 1 training.

Go from spending hundreds of millions of dollars on a model to billions of dollars on a model.

The other is to scale up System 2 thinking and go from spending a penny per query to 10 cents per query.

At a certain point, that trade-off becomes well worth it.
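
The trade-off can be made concrete with a small break-even calculation. The sketch below takes $100 million and $1 billion as stand-ins for "hundreds of millions" and "billions", and assumes both routes yield the same quality (the 10x-for-10x equivalence described earlier). These are illustrative assumptions, and the function names are hypothetical.

```python
# Assumed illustrative figures, based on the round numbers in the talk.
TRAIN_COST_NOW = 100e6     # assumption: "hundreds of millions" -> $100M
TRAIN_COST_SCALED = 1e9    # assumption: "billions" -> $1B
QUERY_COST_NOW = 0.01      # "a penny per query"
QUERY_COST_SCALED = 0.10   # "10 cents per query"


def total_cost(train_cost: float, query_cost: float, num_queries: float) -> float:
    """One-off training cost plus the cost of answering num_queries queries."""
    return train_cost + query_cost * num_queries


def break_even_queries() -> float:
    """Query volume at which both routes cost the same in total."""
    extra_training = TRAIN_COST_SCALED - TRAIN_COST_NOW
    extra_per_query = QUERY_COST_SCALED - QUERY_COST_NOW
    return extra_training / extra_per_query


print(f"break-even at about {break_even_queries():.1e} queries")

for n in (1e6, 1e9, 1e12):
    more_training = total_cost(TRAIN_COST_SCALED, QUERY_COST_NOW, n)
    more_thinking = total_cost(TRAIN_COST_NOW, QUERY_COST_SCALED, n)
    cheaper = "more thinking" if more_thinking < more_training else "more training"
    print(f"{n:.0e} queries: cheaper route is {cheaper}")
```

With these numbers the two routes cost the same at roughly ten billion queries. Below that volume, paying more per query is the cheaper way to get the better answers, which matters most for rare, high-value questions.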

Now, of course, all of these results are in the domain of games.

And there was a reasonable question about whether these results could be extended to a more complicated setting like language.

But recently, my colleagues and I at OpenAI released o1, a new series of language models that think before responding.

If it's an easy question, o1 might only think for a few seconds.

If it's a difficult decision, it might think for a few minutes.

But just like the AIs for chess, Go, and poker, o1 benefits by being able to think for longer.

This opens up a completely new dimension for scaling.

We're no longer constrained to just scaling up System 1 training.

Now we can scale up System 2 thinking as well.

And the beautiful thing about scaling up in this direction is that it's largely untapped.

Remember, I mentioned that the frontier models of today cost less than a penny to query.

Now, when I mention this to people, a frequent response that I get is that people might not be willing to wait around for a few minutes to get a response from a model or pay a few dollars to get an answer to their question.

And it's true that o1 takes longer and costs more than other models that are out there.

But I would argue that for some of the most important problems that we care about, that cost is well worth it.

So let's do an experiment and see.

Raise your hand if you would be willing to pay more than a dollar for a new cancer treatment.

All right, basically everybody in the audience.

Keep your hand up.

How about a thousand dollars?

How about a million dollars?

What about for more efficient solar panels?

Or for a proof of the Riemann hypothesis?

The common conception of AI today is chatbots.

But it doesn't have to be that way.

This isn't a revolution that's 10 years away or even 2 years away.

It's a revolution that's happening now.

My colleagues and I have already released o1-preview, and people, including researchers at top universities, have come to me and said that it has saved them days' worth of work.

And that's just the preview.

I mentioned at the start of this talk that the history of AI progress over the past 5 years can be summarized in one word.

Scale.

So far, that has meant scaling up the System 1 training of these models.

Now, we have a new paradigm.

One where we can scale up System 2 thinking as well.

And we are just at the very beginning of scaling up in this direction.

Now, I know that there are some people who will still say that AI is going to plateau or hit a wall.

And to them, I say: want to bet?

Thank you.

Thank you.
