DeepSeek's surprisingly inexpensive AI model, DeepSeek V3, is shaking up the AI industry, and its release was followed by a major drop in NVIDIA's stock price. While DeepSeek initially claimed a training cost of only $6 million using 2,048 GPUs, a closer look reveals a far more substantial investment.
Image: ensigame.com
DeepSeek V3's innovative architecture is key to its performance. It utilizes:
- Multi-token Prediction (MTP): Predicting multiple tokens simultaneously, rather than one at a time, for improved speed and accuracy.
- Mixture of Experts (MoE): Employing 256 expert sub-networks and activating only eight of them for each token, which boosts training speed and performance.
- Multi-head Latent Attention (MLA): Compressing the attention keys and values into a compact latent representation, which reduces memory use while keeping the risk of overlooking crucial details low.
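To make the Mixture of Experts idea concrete, here is a minimal sketch of top-k routing: a router scores every expert for a token, only the eight best-scoring experts run, and their outputs are blended. The expert counts come from the article. Everything else (the function names, the softmax over the selected scores, and the toy scalar "experts") is an assumption made for illustration and is not DeepSeek's actual router.

```python
import math
import random

NUM_EXPERTS = 256   # experts per MoE layer, as stated in the article
TOP_K = 8           # experts activated per token, as stated in the article


def route_token(scores, top_k=TOP_K):
    """Pick the top_k highest-scoring experts and normalize their weights.

    `scores` holds one hypothetical router score per expert.
    Returns a list of (expert_index, weight) pairs whose weights sum to 1.
    """
    # Indices of the top_k largest scores, best first.
    chosen = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    # Softmax over the selected scores only (subtract the max for stability).
    peak = max(scores[i] for i in chosen)
    exps = [math.exp(scores[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


def moe_output(token, experts, scores):
    """Combine the outputs of the selected experts, weighted by the router."""
    return sum(weight * experts[i](token) for i, weight in route_token(scores))


# Toy experts (an illustrative stand-in): each scales a scalar input by a fixed factor.
rng = random.Random(0)
experts = [(lambda x, s=rng.uniform(0.5, 1.5): s * x) for _ in range(NUM_EXPERTS)]
scores = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

routing = route_token(scores)
print(len(routing), "of", NUM_EXPERTS, "experts active")
print("output:", moe_output(2.0, experts, scores))
```

Because only 8 of the 256 experts run for any given token, most of the layer's parameters sit idle on each step, which is where the compute savings come from.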
However, SemiAnalysis uncovered DeepSeek's use of approximately 50,000 NVIDIA Hopper GPUs (including 10,000 H800s, 10,000 H100s, and additional H20s) spread across multiple data centers. This infrastructure represents a total server investment of roughly $1.6 billion, with operational costs estimated at $944 million. DeepSeek, a subsidiary of the Chinese hedge fund High-Flyer, owns these data centers, which gives it direct control over its hardware and lets it iterate quickly. Because the company is self-funded, it can also make decisions without waiting on outside investors.
DeepSeek also pays high salaries (some researchers earn over $1.3 million annually), which helps it attract top Chinese talent. The initial $6 million figure reflects only pre-training GPU costs and omits research, refinement, data processing, and infrastructure. DeepSeek's actual investment in AI development exceeds $500 million. Despite this, its lean structure lets it innovate more efficiently than larger, more bureaucratic competitors.
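For a sense of scale, the short sketch below compares the $6 million headline figure with the larger amounts quoted above. It uses only the numbers given in this article. The variable and function names are illustrative, and the categories may overlap, so the script compares them one by one rather than adding them up.

```python
# Figures as reported in the article, in US dollars.
headline_pretraining = 6e6
reported = {
    "server investment": 1.6e9,
    "operating costs": 944e6,
    "AI development (lower bound)": 500e6,
}


def share(part, whole):
    """Return `part` as a percentage of `whole`."""
    return 100.0 * part / whole


for label, amount in reported.items():
    pct = share(headline_pretraining, amount)
    print(f"{label}: the $6M headline figure is {pct:.2f}% of it")
```

Even against the most conservative of these figures, the $500 million lower bound, the headline number is only about 1.2% of the total.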
DeepSeek's story highlights the potential of well-funded, independent AI companies to compete with established giants. While the "budget-friendly" narrative is somewhat inflated, the cost advantage over competitors (e.g., DeepSeek's $5 million for R1 versus OpenAI's $100 million for GPT-4o) remains significant. The company's success ultimately stems from substantial investment, technological advances, and a highly skilled team.