What is DeepSeek? AI Model Basics Explained

Chances are you've heard about the newest entrant to the very crowded and very competitive realm of AI models, DeepSeek.
It's a startup based in China, and it caught everyone's attention by taking over OpenAI's coveted spot as the most downloaded free app in the US on Apple's App Store.
So how?
Well, by releasing an open-source model that it claims can match or surpass the performance of other industry-leading models, and at a fraction of the cost.
Now, the specific model from DeepSeek that's really making a splash is called DeepSeek R1.
And the R here implies reasoning: DeepSeek R1 is their reasoning model.
Now, DeepSeek R1 performs as well as some of the other models, including OpenAI's own reasoning model, which is called o1, and it can match or even outperform o1 across a number of AI benchmarks for math and coding tasks.
That is all the more remarkable because, according to DeepSeek, R1 was trained with far fewer chips and is approximately 96% cheaper to run than o1.
Now, unlike previous AI models, which produced an answer without explaining the why, a reasoning model solves complex problems by breaking them down into steps.
So before answering a user query, the model spends time thinking.
"Thinking" in air quotes here.
And that thinking time could be a few seconds or even minutes.
Now, during this time, the model is performing step-by-step analysis through a process known as chain of thought.
And unlike other reasoning models, R1 shows the user that chain-of-thought process as it breaks the problem down, generates insights, backtracks when it needs to, and ultimately arrives at an answer.
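To make the visible chain of thought concrete, here is a minimal sketch of separating the reasoning from the final answer in a raw response. It assumes the reasoning arrives wrapped in `<think>...</think>` tags ahead of the answer; the function name `split_reasoning` and the sample text are hypothetical and made up for illustration.

```python
import re

# Assumption: the reasoning is wrapped in <think>...</think> ahead of the answer.
THINK_PATTERN = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(response: str) -> tuple[str, str]:
    """Return (chain_of_thought, final_answer) from a raw model response."""
    match = THINK_PATTERN.search(response)
    if match is None:
        # No visible reasoning block: treat the whole response as the answer.
        return "", response.strip()
    reasoning = match.group(1).strip()
    answer = response[match.end():].strip()
    return reasoning, answer


# Hypothetical response text, invented for this example. Note the backtracking.
sample = (
    "<think>12 * 12 is 144. Adding 6 gives 150. "
    "Wait, the question asks for 12 * 13, so 144 + 12 = 156.</think>\n"
    "The answer is 156."
)

reasoning, answer = split_reasoning(sample)
print("Reasoning:", reasoning)
print("Answer:", answer)
```

An application could show the reasoning in a collapsible panel and the answer in the main view, or discard the reasoning entirely when only the result matters.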
Now, I'm going to get into how this model works, but before that, let's talk about how it came to be.
DeepSeek R1 seems to have come out of nowhere, but there are in fact many DeepSeek models that brought us to this point.
A model avalanche, if you like.
And my colleague Aaron can help dig us out.
Thanks, Martin. There is certainly a lot to dig out here.
There are a lot of these models, but let's start from the very beginning.
So we begin with DeepSeek version 1, a 67-billion-parameter model that was released in January of 2024.
Now, this is a traditional transformer with a focus on the feed-forward neural networks.
This gets us to DeepSeek version 2, which really put DeepSeek on the map.
This is a very large 236-billion-parameter model that was released not long after the original, in June 2024.
To put it into perspective, there are really two novel aspects to this model.
The first was multi-head latent attention.
And the second was the DeepSeek mixture of experts.
That just made the model really fast and performant.
And it set us up for success with DeepSeek version 3, which was released in December of 2024.
Now, this one is even bigger.
It has 671 billion parameters.
But this is where we began to see reinforcement learning being used with the model.
Another contribution of this model is that it was able to balance load across many GPUs, because they used a lot of H800s within their infrastructure.
And that was also built on top of DeepSeek V2.
So all these models build on top of each other, which gets us to DeepSeek R1-Zero, which was released in January of 2025.
So this is the first of the reasoning models now, right?
It is, yeah.
And it's really neat how they began to train these types of models.
So it's a type of fine-tuning.
But for this one, they exclusively used reinforcement learning, which is an approach where you have a policy and you reward or penalize the model for some action it has taken or some output it has produced.
And it self-learns over time.
And it was very performant. It did well.
But it got even better with DeepSeek R1, which was, again, built on top of R1-Zero.
And this one used a combination of reinforcement learning and supervised fine-tuning, the best of both worlds, so that it could be even better.
And on many standards and benchmarks, its performance is very close to that of some of the OpenAI models we have now.
And this gets us to distilled models, which is a whole other paradigm.
Distilled models. Okay. So tell me what that is all about.
Yeah, great question.
So first of all, distillation is where you have a student model, which is a very small model, and a teacher model, which is very big.
And you want to distill, or extract, knowledge from the teacher model down into the student model.
In some respects, you could think of it as model compression.
But one interesting aspect here is that this is not just compression or knowledge transfer; it's model translation, because we're going from R1-Zero, which is one of those mixture-of-experts models, down into, for example, a Llama series model, which is not a mixture of experts but a traditional transformer.
So you're going from one architecture type to another, and we do the same with Qwen.
Right?
So there are different series of models that serve as the foundation that we then distill into from R1-Zero.
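One common way to express the teacher-to-student idea in code is the classic soft-label distillation loss, sketched below. This is an illustration of the general technique, not a claim about how DeepSeek produced its distilled models; the logits, the temperature value, and the function names are all assumptions made up for the example. Note that the loss only compares output distributions, which is why the teacher and student can have different architectures.

```python
import math


def softmax(logits: list[float], temperature: float = 1.0) -> list[float]:
    """Convert logits to probabilities, softened by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    teacher_logits: list[float],
    student_logits: list[float],
    temperature: float = 2.0,
) -> float:
    """KL(teacher || student) over temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical logits over a three-token vocabulary.
teacher = [4.0, 1.0, 0.5]
close_student = [3.5, 1.2, 0.4]   # roughly agrees with the teacher
far_student = [0.5, 1.0, 4.0]     # prefers a different token

print("close student loss:", distillation_loss(teacher, close_student))
print("far student loss:  ", distillation_loss(teacher, far_student))
```

Training the student to minimize this loss pulls its predictions toward the teacher's; a student that already agrees with the teacher gets a loss near zero.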
Well, thanks. It's really interesting to get the history behind all this.
It didn't come from nowhere.
But with all of these distilled models coming, I think you might need your shovel back to dig your way out of those.
Thank you very much.
There are going to be a lot of distilled models, so you're exactly right.
I think I'm going to go dig.
Thanks.
So R1-Zero didn't come from nowhere.
It's an evolution of other models.
But how does DeepSeek operate at such comparatively low cost?
Well, by using a fraction of the highly specialized NVIDIA chips that its American competitors use to train their systems.
In fact, I can illustrate this in a graph.
So let's consider different models and the number of GPUs that they use.
Well, DeepSeek engineers, for example, said that they needed only 2,000 GPUs, that's graphics processing units, to train the DeepSeek V3 model.
Now, in isolation, what does that mean? Is that good? Is that bad?
Well, by contrast, Meta said that it was training its latest open-source model, Llama 4, on a computer cluster with over 100,000 NVIDIA GPUs.
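The gap between the two GPU counts is easy to quantify. The short sketch below uses only the figures quoted in the talk; because the Llama 4 cluster is described as "over 100,000" GPUs, the results are bounds rather than exact values, and the variable names are made up for illustration.

```python
# Figures as quoted in the talk; the Llama 4 number is a lower bound.
deepseek_v3_gpus = 2_000
llama4_cluster_gpus = 100_000

ratio = llama4_cluster_gpus / deepseek_v3_gpus   # how many times larger
share = deepseek_v3_gpus / llama4_cluster_gpus   # DeepSeek's count as a fraction

print(f"The Llama 4 cluster is at least {ratio:.0f}x larger")
print(f"DeepSeek V3 used at most {share:.0%} as many GPUs")
```

GPU count alone says nothing about chip type or training duration, so this compares cluster sizes, not total compute.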
So that brings up the question: how is it so efficient?
Well, DeepSeek R1 combines chain-of-thought reasoning with a process called reinforcement learning.
This is the capability that Aaron mentioned just now, which arrived with the V3 model of DeepSeek.
Here, an autonomous agent learns to perform a task through trial and error, without any instructions from a human user.
Now, traditionally, models improve their ability to reason either by being trained on labeled examples of correct or incorrect behavior, which is known as supervised learning, or by extracting information from hidden patterns, which is known as unsupervised learning.
But the key hypothesis here with reinforcement learning is to reward the model for correctness, no matter how it arrived at the right answer, and let the model discover the best way to think all on its own.
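The idea of rewarding only correctness and letting trial and error do the rest can be sketched with a toy example. This is not DeepSeek's training procedure: the two "strategies", their success rates, and the epsilon-greedy update are all hypothetical stand-ins chosen to keep the sketch small. The point is that `outcome_reward` looks only at the final answer and never inspects how it was produced.

```python
import random


def outcome_reward(model_answer: str, reference: str) -> float:
    """Reward 1.0 for a correct final answer, 0.0 otherwise.

    The reasoning that led to the answer is never inspected.
    """
    return 1.0 if model_answer.strip() == reference.strip() else 0.0


# Hypothetical "ways of thinking" with made-up success rates.
STRATEGIES = {"guess": 0.2, "step_by_step": 0.9}


def attempt(strategy: str, rng: random.Random) -> str:
    """Simulate one attempt at a question whose correct answer is '156'."""
    return "156" if rng.random() < STRATEGIES[strategy] else "150"


def train(episodes: int = 500, epsilon: float = 0.1, seed: int = 0) -> dict[str, float]:
    """Learn by trial and error which strategy earns the most reward."""
    rng = random.Random(seed)
    value = {name: 0.0 for name in STRATEGIES}  # running average reward
    count = {name: 0 for name in STRATEGIES}
    for _ in range(episodes):
        # Explore occasionally; otherwise use the best strategy found so far.
        if rng.random() < epsilon:
            choice = rng.choice(list(STRATEGIES))
        else:
            choice = max(value, key=value.get)
        reward = outcome_reward(attempt(choice, rng), "156")
        count[choice] += 1
        value[choice] += (reward - value[choice]) / count[choice]
    return value


learned = train()
print(learned)
```

Nothing tells the learner that working step by step is better; it only sees rewards, and the estimated value of each strategy drifts toward how often that strategy produces the right answer.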
Now, DeepSeek R1 also uses a mixture-of-experts architecture, or MoE.
And a mixture-of-experts architecture is considerably less resource-intensive to train.
Now, the MoE architecture divides an AI model up into separate entities, or sub-networks, which we can think of as individual experts.
So in my little neural network here, I'm going to create three experts.
A real MoE architecture would probably have quite a few more than that.
But each one of these is specialized in a subset of the input data.
And the model activates only the specific experts needed for a given task.
So a request comes in, we activate the experts that we need, and we use only those rather than activating the entire neural network.
Consequently, the MoE architecture reduces computational costs during pre-training and achieves faster performance at inference time.
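The routing step described above can be sketched as a toy top-k mixture-of-experts layer with three experts, matching the drawing in the talk. Everything here is a simplified assumption: the experts are trivial scaling functions, the router weights are hand-picked, and real MoE layers learn both and add load-balancing on top.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(scale: float):
    """A stand-in expert: a real one would be a feed-forward sub-network."""
    return lambda x: [scale * v for v in x]


# Three hypothetical experts and hand-picked router weights (one row per expert).
EXPERTS = [make_expert(1.0), make_expert(2.0), make_expert(3.0)]
ROUTER = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]


def moe_forward(x: list[float], top_k: int = 1) -> tuple[list[float], list[int]]:
    """Route x to the top_k highest-scoring experts and mix their outputs."""
    scores = [sum(w * v for w, v in zip(row, x)) for row in ROUTER]
    chosen = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    gates = softmax([scores[i] for i in chosen])
    output = [0.0] * len(x)
    for gate, idx in zip(gates, chosen):
        expert_out = EXPERTS[idx](x)  # only the chosen experts ever run
        output = [o + gate * e for o, e in zip(output, expert_out)]
    return output, chosen


out, used = moe_forward([2.0, 0.5], top_k=1)
print("experts used:", used, "output:", out)
```

With `top_k=1`, two of the three experts are never evaluated for this input, which is where the compute savings come from: the total parameter count can grow while the work per request stays roughly constant.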
And look, the MoE architecture isn't unique to models from DeepSeek.
There are models from the French AI company Mistral that also use it.
And in fact, the IBM Granite model is also built on a mixture-of-experts architecture.
So it's a commonly used architecture.
So that is DeepSeek R1.
It's an AI reasoning model that matches other industry-leading models on reasoning benchmarks while being delivered at a fraction of the cost in both training and inference.
All of which makes me think that this is an exciting time for AI reasoning models.
Thank you.