The Engineering Unlocks Behind DeepSeek | YC Decoded

There's a new AI model in town.

Chinese AI company DeepSeek recently made waves when it announced R1, an open-source reasoning model that it claimed achieved comparable performance to OpenAI's o1 at a fraction of the cost.

The announcement unleashed a wave of social media panic and stock market chaos.

NVIDIA lost nearly 600 billion dollars in market cap in a single day.

But for those following AI developments closely, DeepSeek and R1 didn't come out of nowhere.

The company has been publishing its research and releasing its model weights for months, following a path similar to Meta's Llama models.

This is in contrast to other major AI labs like OpenAI, Google DeepMind, and Anthropic that have closed weights and publish more limited technical details.

What's changed is just that now the broader public is actually paying attention.

So let's decode what the real developments here are, where they come from, and why they matter.

First of all, it is important to distinguish between two relevant models here, DeepSeek R1 and DeepSeek V3.

DeepSeek V3, which was actually released this past December, is a general-purpose base model that achieves comparable performance to other base models like OpenAI's GPT-4o, Anthropic's Claude 3.5 Sonnet, and Google's Gemini 1.5.

DeepSeek R1, which was released at the end of January, is a reasoning model built on top of DeepSeek V3.

In other words, DeepSeek took V3 and applied various algorithmic improvements to it in order to optimize its reasoning ability, resulting in R1, a model that achieves comparable performance to OpenAI's o1 and Google's Gemini 2.0 Flash on certain complex reasoning benchmarks.

But many of the algorithmic innovations responsible for R1's remarkable performance were actually discussed in the V3 paper from this past December, or even before that in DeepSeek's V2 paper, which was published in May 2024, or the DeepSeekMath paper, which came out in February 2024.

V3 stitches together many of these innovations, which were designed primarily with compute and training efficiency in mind.

One way DeepSeek optimized for efficiency and got more floating-point operations per second, or FLOPS, from its GPUs was by training V3 natively in 8-bit floating-point formats, rather than the usual 16-bit or 32-bit formats.

This is not a new idea.

Many other labs are doing it too.

But it was key for getting such massive memory savings without sacrificing performance.

A crucial enhancement is their FP8 accumulation fix, which periodically merges calculations back into a higher-precision FP32 accumulator to prevent small numerical errors from compounding.

The result?

Far more efficient training across thousands of GPUs, cutting costs while maintaining model quality.

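To make the accumulation fix concrete, here is a minimal sketch of the idea, not DeepSeek's kernel. Python has no FP8 type, so the hypothetical helper `round_to_bits` stands in for a limited-precision accumulator by keeping only 10 mantissa bits; the bit width and the flush interval of 128 are assumptions chosen for illustration.

```python
import math

def round_to_bits(x: float, mantissa_bits: int) -> float:
    """Round x to a toy float format that keeps only `mantissa_bits` mantissa bits."""
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)  # x == m * 2**e with 0.5 <= |m| < 1
    scale = 2 ** mantissa_bits
    return math.ldexp(round(m * scale) / scale, e)

def dot_low_precision(a, b, mantissa_bits=10):
    """Keep the whole running sum in the limited-precision format."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = round_to_bits(acc + x * y, mantissa_bits)
    return acc

def dot_with_promotion(a, b, mantissa_bits=10, interval=128):
    """Accumulate in limited precision, but flush into a full-precision total
    every `interval` terms so rounding errors cannot keep compounding."""
    total = 0.0    # stands in for the FP32 accumulator
    partial = 0.0  # stands in for the limited-precision accumulator
    for i, (x, y) in enumerate(zip(a, b), start=1):
        partial = round_to_bits(partial + x * y, mantissa_bits)
        if i % interval == 0:
            total += partial
            partial = 0.0
    return total + partial

ones = [1.0] * 4096
print(dot_low_precision(ones, ones))   # stalls once the sum outgrows the format
print(dot_with_promotion(ones, ones))  # recovers the exact sum
```

In this toy setup the purely low-precision sum stops growing once single increments fall below the format's resolution, while the periodically promoted sum stays exact.
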
But why does this efficiency matter?

Given its hardware constraints and U.S. export controls on the sale of GPUs to China, DeepSeek needed to find a way to get more training and more bandwidth from its existing cluster of GPUs.

You see, at AI labs, these GPUs, which do number crunching and matrix multiplication to train these models, are actually sitting idle most of the time.

At FP8, it is typical to only see around 35% model FLOPs utilization, or MFU, meaning GPUs are only being utilized at peak potential about a third of the time.

The rest of the time, these GPUs are waiting for data to be moved, either between caches or to other GPUs.

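As a rough illustration of how MFU is computed, the sketch below divides the useful model FLOPs achieved per second by the hardware's theoretical peak. The function name and the example numbers are hypothetical and are not measurements from any real cluster.

```python
def model_flops_utilization(model_flops: float, seconds: float,
                            peak_flops_per_s: float) -> float:
    """MFU = (useful model FLOPs per second) / (theoretical peak FLOPs per second)."""
    if seconds <= 0 or peak_flops_per_s <= 0:
        raise ValueError("seconds and peak_flops_per_s must be positive")
    achieved_flops_per_s = model_flops / seconds
    return achieved_flops_per_s / peak_flops_per_s

# Hypothetical GPU: 1e15 FLOPS peak, 3.5e15 useful FLOPs done in 10 seconds.
mfu = model_flops_utilization(model_flops=3.5e15, seconds=10.0, peak_flops_per_s=1e15)
print(f"MFU = {mfu:.0%}")
```
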
This is NVIDIA's key advantage.

It is not just about GPUs.

It is about an integrated solution they've been building for over a decade that includes networking with InfiniBand, software with CUDA, and developer experience.

Essentially, NVIDIA provides a deeply integrated system that lets AI researchers program GPU clusters less as a distributed system and closer to what Jensen Huang describes as one giant GPU.

Another clever way DeepSeek makes the most out of its hardware is its particular implementation of a mixture-of-experts architecture.

DeepSeek V3 has 671 billion model parameters, but only 37 billion are activated for a given token prediction.

By contrast, the largest and most capable Llama 3 model doesn't use a mixture-of-experts architecture, so it activates its full 405 billion parameters for each token prediction.

In other words, V3 activates 11x fewer parameters for each forward pass, saving tons of computation.

Mixture of experts isn't a new concept, but it's been challenging to train models with this architecture efficiently.

DeepSeek introduced novel techniques that stabilize performance and increase GPU utilization.

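The sparse-activation idea can be sketched with a generic top-k router. This is a textbook softmax router, assumed here for illustration; it is not DeepSeek's gating or load-balancing scheme, and `route_top_k`, `moe_forward`, and the toy experts are hypothetical names. The last lines reproduce the parameter arithmetic quoted above.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]

def route_top_k(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts,
    with weights renormalized to sum to 1."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

def moe_forward(x, experts, router_logits, k=2):
    """Run only the selected experts and mix their outputs by router weight."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_logits, k))

# Four toy experts; only two of them run for this token.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
print(moe_forward(1.0, experts, router_logits=[0.1, 2.0, -1.0, 1.5], k=2))

# Arithmetic from the transcript: active parameters per token.
print(round(405 / 37, 1))        # dense Llama 3 405B vs. V3's 37B active
print(round(37 / 671 * 100, 1))  # percent of V3's own parameters that are active
```
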
Additionally, to overcome key performance bottlenecks, V3 makes use of multi-head latent attention, or MLA, which DeepSeek first revealed in its V2 paper, published in May 2024.

MLA is a solution designed to tackle KV cache storage limitations, one of the biggest sources of VRAM overhead in large models.

Instead of storing full key and value matrices, MLA manages to compress them down into a latent representation, reconstructing them only when needed.

This helped the V2 model reduce its KV cache size by 93.3% and boost its maximum generation throughput to 5.76 times that of its predecessor.

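The compress-then-reconstruct idea can be sketched as a cache that stores one small latent vector per token and rebuilds keys and values through up-projections on demand. This is a simplified illustration under assumed shapes, not the full MLA design (it leaves out positional-encoding handling and per-head structure); the class name `LatentKVCache` and the weight matrices are hypothetical.

```python
def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

class LatentKVCache:
    """Stores a compressed latent per token instead of full key/value vectors."""

    def __init__(self, w_down, w_up_k, w_up_v):
        self.w_down = w_down  # d_latent x d_model: compresses the hidden state
        self.w_up_k = w_up_k  # d_model x d_latent: rebuilds keys
        self.w_up_v = w_up_v  # d_model x d_latent: rebuilds values
        self.latents = []

    def append(self, hidden):
        self.latents.append(matvec(self.w_down, hidden))

    def keys(self):
        return [matvec(self.w_up_k, c) for c in self.latents]

    def values(self):
        return [matvec(self.w_up_v, c) for c in self.latents]

    def floats_stored(self):
        return sum(len(c) for c in self.latents)

# Toy sizes: d_model = 4, d_latent = 2.
w_down = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
w_up = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
cache = LatentKVCache(w_down, w_up, w_up)
cache.append([1.0, 2.0, 3.0, 4.0])
cache.append([0.5, 0.5, 0.5, 0.5])
full_kv_floats = 2 * 2 * 4  # keys and values, 2 tokens, d_model floats each
print(cache.floats_stored(), "floats cached instead of", full_kv_floats)
```
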
Finally, unlike traditional models that predict only the next token, V3 makes use of multi-token prediction, or MTP.

MTP enables V3 to anticipate multiple future tokens at each step.

This densifies training signals, providing more feedback per step for better data efficiency and faster learning.

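One way to picture the denser training signal is to look at the targets each position is asked to predict. The helper below is a hypothetical illustration of target construction only; it does not reproduce V3's MTP modules or loss.

```python
def multi_token_targets(tokens, depth):
    """For each position i, return the next `depth` tokens as prediction targets.
    Positions without `depth` future tokens are dropped."""
    return [tokens[i + 1 : i + 1 + depth] for i in range(len(tokens) - depth)]

tokens = [10, 11, 12, 13, 14]
next_token_only = multi_token_targets(tokens, depth=1)
two_ahead = multi_token_targets(tokens, depth=2)
print(next_token_only)  # one target per position
print(two_ahead)        # two targets per position
```

With `depth=2`, each remaining position contributes two supervised predictions instead of one, which is the sense in which the signal per step becomes denser.
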
It also improves representation planning, allowing the model to pre-plan sequences for smoother, more coherent outputs.

During inference, MTP modules can be repurposed for speculative decoding, reducing sequential processing steps and significantly speeding up generation.

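A minimal sketch of greedy speculative decoding shows why this cuts sequential steps: a cheap drafter proposes several tokens, and the main model keeps the longest prefix it agrees with. The callables `draft_next` and `target_next` are hypothetical stand-ins for the MTP module and the main model. A real system verifies all proposals in one batched forward pass, whereas this sketch calls the target once per token for clarity.

```python
def speculative_step(prefix, draft_next, target_next, k):
    """Propose k tokens with the drafter, then keep the longest prefix the
    target model agrees with, plus one token chosen by the target itself."""
    proposal, ctx = [], list(prefix)
    for _ in range(k):
        token = draft_next(ctx)
        proposal.append(token)
        ctx.append(token)

    accepted, ctx = [], list(prefix)
    for token in proposal:
        expected = target_next(ctx)
        if expected != token:
            accepted.append(expected)  # target's correction ends the step
            return accepted
        accepted.append(token)
        ctx.append(token)
    accepted.append(target_next(ctx))  # every draft accepted: one bonus token
    return accepted

# Toy models: the target counts upward; the drafter is wrong right after a 3.
def target_next(ctx):
    return (ctx[-1] + 1) % 10

def draft_next(ctx):
    return 0 if ctx[-1] == 3 else (ctx[-1] + 1) % 10

print(speculative_step([1], draft_next, target_next, k=4))
```
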
Taken all together, this makes V3 one of the most impressive base models on the market, and it's been out for some time now.

However, the recent release of DeepSeek's R1 reasoning model is what really made waves.

Most LLMs can be improved by being prompted to think step-by-step, but what sets reasoning models apart is that they are specifically trained to break down hard problems and think about them for paragraphs at a time.

In September, OpenAI showed the power of this new approach with o1.

It achieved state-of-the-art results on math, coding, and science benchmarks.

With R1, DeepSeek took a similar approach and published the secret sauce.

OpenAI and DeepSeek achieved their impressive results through reinforcement learning, a technique to shape an LLM's behavior based on feedback and reward signals.

Modern LLMs use some variation of reinforcement learning from human feedback, aka RLHF, or reinforcement learning from AI feedback, aka RLAIF, to improve their usefulness and alignment.

But reasoning models apply RL specifically towards the task of thinking step-by-step through complex problems.

So how did DeepSeek apply RL to get a reasoning model?

At a high level, they assembled a bunch of problems with verifiable outputs, especially in math and coding, and then designed a training pipeline to get the model to think for a bit and output the correct answers.

But they didn't give the model any external examples of how to think, whether from humans or AI.

And their grading process was extremely simple.

Rather than using a complex AI to give the model fine-grained feedback, DeepSeek used simple rules to evaluate the model's final output on accuracy and formatting.

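A rule-based grader of this kind can be sketched in a few lines. The `<think>`/`<answer>` tag layout, the exact-match comparison, and the equal weighting of the two scores are assumptions made for illustration, not DeepSeek's published grading code.

```python
import re

# Assumed output template: reasoning inside <think>, final answer inside <answer>.
TEMPLATE = re.compile(
    r"^<think>(.+?)</think>\s*<answer>(.+?)</answer>\s*$", re.DOTALL
)

def format_reward(completion: str) -> float:
    """1.0 if the completion follows the expected tag layout, else 0.0."""
    return 1.0 if TEMPLATE.match(completion.strip()) else 0.0

def accuracy_reward(completion: str, reference: str) -> float:
    """1.0 if the extracted final answer exactly matches the reference, else 0.0."""
    match = TEMPLATE.match(completion.strip())
    if match is None:
        return 0.0
    return 1.0 if match.group(2).strip() == reference.strip() else 0.0

def total_reward(completion: str, reference: str) -> float:
    return accuracy_reward(completion, reference) + format_reward(completion)

print(total_reward("<think>2 + 2 is 4</think><answer>4</answer>", "4"))
print(total_reward("<think>2 + 2 is 5</think><answer>5</answer>", "4"))
print(total_reward("The answer is 4", "4"))
```

Because both checks are plain string rules, every score is cheap to compute and can be verified without a second model in the loop.
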
They used these output scores to update the model through a novel technique they published in February 2024 called Group Relative Policy Optimization, or GRPO.

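The "group relative" part of GRPO can be sketched as follows: several answers are sampled for the same prompt, and each answer's reward is normalized against the group's mean and spread, so no separate value model is needed. The sketch covers only this advantage computation; the clipped policy update and KL penalty of the full method are omitted, and using the population standard deviation with a small `eps` is an assumption.

```python
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each sampled answer's reward against its own group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Rewards for four sampled answers to the same prompt (hypothetical scores).
advantages = group_relative_advantages([2.0, 0.0, 1.0, 1.0])
print([round(a, 3) for a in advantages])
```

Answers that beat their group's average get a positive advantage and are reinforced; answers below average are pushed down.
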
Remarkably, with this process alone, DeepSeek saw reasoning emerge over thousands of RL steps.

The model learned skills like extended chain of thought and even experienced an aha moment where it recognized its own mistakes and backtracked to correct its reasoning.

This model was R1-Zero, one of the first large models to achieve top-tier results purely through reinforcement learning.

Pure RL has long been a subject of investigation in Western research labs, as with DeepMind's AlphaGo, which simulated thousands of random games of self-play to beat the world's top Go player in 2016.

In 2019, OpenAI achieved notable success using reinforcement learning to train a robotic hand to solve a Rubik's Cube and to beat a top human team in competitive Dota 2.

But unconstrained by human examples, R1-Zero's thinking steps suffered from poor readability, switching between English and Chinese at random.

So DeepSeek introduced a cold start phase, fine-tuning on structured reasoning examples before RL to get R1.

This eliminated the language mixing issues and made outputs far more comprehensible.

The results are impressive.

R1 achieves comparable performance to o1 on certain math and coding benchmarks.

But the pace of innovation is speeding up.

Just two weeks after R1 was released, OpenAI released o3-mini, which outperforms R1 on key benchmarks.

So if R1 didn't actually come out of nowhere, what explains the hype cycle?

One explanation is the sheer accessibility of DeepSeek's model.

R1 is freely accessible through their website and app, and it is free to download, run locally, and customize.

Also, because of all the efficiency improvements, it offers near state-of-the-art performance at a fraction of the price of other reasoning models.

Another explanation is that a lot of the hype cycle didn't actually have to do with the specific algorithmic improvements that we described, but with misconceptions around V3's alleged $5.5 million in training costs.

There's some important fine print here.

The $5.5 million figure refers only to the cost of the final training run for V3.

It doesn't include any of the training costs of R1 or the associated R&D or hardware operating expenses, which are presumably in the hundreds of millions.

Given the extreme algorithmic optimizations here, that $5.5 million training run number actually seems perfectly possible.

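For context on where a figure like this comes from, the arithmetic is simply GPU-hours multiplied by an assumed rental price. The inputs below (about 2.788 million H800 GPU-hours at $2 per GPU-hour) are the values reported in the V3 technical report as best recalled here; they are not stated in this transcript, so treat them as assumptions.

```python
def training_run_cost(gpu_hours: float, usd_per_gpu_hour: float) -> float:
    """Cost of a single training run at a flat rental price per GPU-hour."""
    return gpu_hours * usd_per_gpu_hour

# Assumed inputs, not taken from the transcript.
cost = training_run_cost(gpu_hours=2_788_000, usd_per_gpu_hour=2.0)
print(f"${cost / 1e6:.3f} million")
```

As the transcript notes, a number computed this way covers only the final run and excludes research, earlier experiments, and hardware ownership.
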
And it is worth noting that this work is reproducible.

A UC Berkeley lab recently applied R1-Zero's key techniques to produce complex reasoning in a smaller model for just $30.

What DeepSeek really proves is that there is still room for new players on the frontier.

In particular, there's room for rebuilding the stack for optimizing GPU workloads, improving software and tooling at the inference layer, and developing AI-generated kernels.

Ultimately, this is fantastic news for AI applications in consumer or B2B, since it means the cost of intelligence keeps going down.

So the big takeaway here: this is the best possible time to be building a startup.

If you're accepted, you'll receive $500,000 in investment plus access to the best startup community in the world.

So apply now and come build the future with us.