DeepSeek Just Broke the CUDA Monopoly (Jensen Huang Saw It Coming)

DeepSeek V4 optimized for Huawei's Ascend chips challenges Nvidia's CUDA dominance. The US-China AI gap is closing as two tech stacks emerge.

MantraVid Admin

April 27, 2026


Jensen Huang is not a man known for understatement. But when he said on the Dwarkesh Patel podcast that DeepSeek optimising for Huawei first would be "a horrible outcome for our nation," you should have listened.

Because it's happening. And it's not even that dramatic.

The Warning That Nobody Heard

On April 15, 2026, Nvidia's CEO dropped a bombshell. His exact words: "The day that DeepSeek comes out on Huawei first, that is a horrible outcome for our nation."

Huang also said: "If the next years are critical, then we have to make sure that all of the world's AI models are built on the American tech stack."

I would argue for a slight alternative: "We have to make sure that all of the world's AI models are built on the best tech stack, and America should build that stack."

Huang's concern wasn't just geopolitical; it was also technical. He pointed out that if "future AI models are optimized in a very different way than the American tech stack," then as China's standards diffuse worldwide, that rival stack "will become superior to" the American one.

Hong Kong media outlet HK01 correctly identified what most people missed: Nvidia's true "moat" isn't the raw performance of its GPUs. It's the CUDA software ecosystem. CUDA has been the de facto standard for GPU computing for nearly two decades, sustained by a self-reinforcing cycle: developers write for CUDA, CUDA gets better, so more developers write for CUDA.

If DeepSeek were to first optimize its flagship model on Huawei's CANN framework rather than CUDA, it would break that cycle. Losing a single chip sale is trivial by comparison; losing the ecosystem is the real strategic damage.

DeepSeek V4: The Model Behind the Headlines

On April 24, 2026, DeepSeek launched previews of two new models. Both are released under an MIT open-source licence. Not a restrictive one. An MIT one.

| Model | Total Parameters | Activated Parameters | Context Window |
|---|---|---|---|
| V4‑Pro | 1.6 trillion | 49 billion | 1 million tokens |
| V4‑Flash | 284 billion | 13 billion | 1 million tokens |

Both are specifically optimised for agentic coding and complex multi‑step tasks.
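Both V4 models are sparse Mixture‑of‑Experts designs, so only a small fraction of the parameters fires for each token. Here is a quick sketch of that sparsity, using the common rule of thumb of roughly 2 FLOPs per active parameter per token (an approximation for illustration, not a DeepSeek figure):

```python
# How sparse are the V4 MoE models? Only the "activated" parameters
# participate in each token's forward pass.
models = {
    "V4-Pro":   {"total": 1.6e12, "active": 49e9},
    "V4-Flash": {"total": 284e9,  "active": 13e9},
}

for name, p in models.items():
    ratio = p["active"] / p["total"]   # fraction of weights used per token
    tflops = 2 * p["active"] / 1e12    # rule-of-thumb forward FLOPs per token
    print(f"{name}: {ratio:.1%} of parameters active, ~{tflops:.2f} TFLOPs/token")
```

V4‑Pro activates only about 3% of its 1.6 trillion parameters per token, which is part of how a model of that size can be served at aggressive prices.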

The Benchmarks

DeepSeek claims V4‑Pro is competitive with leading closed‑source models such as GPT‑5.4, Gemini 3.1 and Claude Opus 4.6/4.7. Third‑party evaluations are mixed but largely positive:

  • Arena.ai: V4‑Pro (thinking mode) ranks 5th in the coding arena and 3rd overall among open‑source models. Described as "a major leap over V3.2."

  • Vals AI: V4‑Pro takes 1st place among open‑weight models on its Vibe Code Benchmark, outperforming even some closed models such as Gemini 3.1 Pro, with a ~10× improvement over V3.2.

DeepSeek's own official documentation is measured. It acknowledges a remaining gap of about 3–6 months behind the very best closed‑source systems. Which is to say: close enough to matter.

The Pricing That Breaks Everything

DeepSeek is pursuing an aggressive cost‑leadership strategy. Here's what they're charging:

| Model | Input (per 1M tokens) | Output (per 1M tokens) | Comparison |
|---|---|---|---|
| V4‑Pro | $1.74 | $3.48 | ~15% of GPT‑5.5 |
| V4‑Flash | $0.14 | $0.28 | >99% cheaper than Claude Opus 4.7 |
| GPT‑5.5 | $5.00 | $30.00 | |
| Claude Opus 4.7 | $5.00 | $25.00 | |

China's Stockstar notes that this pricing, combined with domestic chip adaptation, represents an order‑of‑magnitude cost advantage in core inference. DeepSeek has since applied a temporary 75% promotional discount on V4‑Pro, further lowering the input price to $0.03 per million tokens—roughly 1/700th the weighted price of GPT‑5.5 Pro.

They're not just competing. They're pricing themselves like they're trying to put everyone else out of business.
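To make the table concrete, here's a back‑of‑the‑envelope comparison on a hypothetical workload of one million input tokens and one million output tokens (the workload is an assumption for illustration; the prices are the list prices above, before any promotional discount):

```python
# Back-of-the-envelope API cost for a hypothetical workload of
# 1M input tokens + 1M output tokens, using list prices (USD per 1M tokens).
prices = {
    "V4-Pro":          (1.74, 3.48),
    "V4-Flash":        (0.14, 0.28),
    "GPT-5.5":         (5.00, 30.00),
    "Claude Opus 4.7": (5.00, 25.00),
}

input_m, output_m = 1.0, 1.0  # millions of tokens (illustrative workload)
for model, (inp, out) in prices.items():
    cost = inp * input_m + out * output_m
    print(f"{model}: ${cost:.2f}")
```

On this workload V4‑Pro comes out at $5.22 against GPT‑5.5's $35.00, roughly 15%, matching the comparison column above; the exact ratio shifts with the input/output mix.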

Huawei's Ascend Hardware: Closing the Gap

Huawei's new Ascend 950PR chip, which powers the Atlas 350 AI accelerator, was announced in March 2026 and is now being produced at scale.

| Specification | Atlas 350 (Ascend 950PR) | Nvidia H20 |
|---|---|---|
| FP4 Throughput | 1.56 PFLOPS | N/A (Hopper lacks native FP4) |
| Memory | 112 GB HBM (Huawei HiBL 1.0) | Varies |
| Memory Bandwidth | Up to 1.4 TB/s | Varies |
| Power | 600 W | ~400 W |
| Interconnect | 2 TB/s (LingQu protocol) | |

The Atlas 350 is designed primarily for inference workloads and is the first Chinese accelerator to support native FP4 precision—a format Nvidia introduced only with the Blackwell generation. Because FP4 reduces memory requirements, it enables larger models to be deployed on the same hardware, boosting throughput in real‑world conditions.
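A rough way to see why native FP4 matters for inference is weight memory alone. The sketch below counts only model weights (ignoring KV cache and activations, which add substantially more) and uses a V4‑Flash‑scale parameter count purely for illustration:

```python
# Approximate weight-only memory footprint at different precisions.
# Real deployments also need KV-cache and activation memory on top of this.
def weight_gb(params: float, bits: int) -> float:
    """Gigabytes (decimal) needed to store `params` weights at `bits` bits each."""
    return params * bits / 8 / 1e9

params = 284e9  # illustrative: a V4-Flash-scale weight count
for bits in (16, 8, 4):
    print(f"FP{bits}: {weight_gb(params, bits):,.0f} GB of weights")
```

At FP4, 284 billion parameters need roughly 142 GB of weight storage, about two 112 GB Atlas 350 cards, versus 568 GB at FP16; each halving of precision roughly doubles the model size a fixed memory budget can hold.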

Huawei announced that its full Ascend SuperNode product line—including A2, A3 and 950 series chips—is fully adapted to DeepSeek V4, with support for both Flash and Pro variants. The collaboration involved rewriting the underlying code from CUDA to Huawei's proprietary CANN framework.

DeepSeek acknowledged that V4 may face throughput issues until the Ascend 950PR supernodes "ship at scale" in the second half of 2026. This reflects the practical challenges of migrating from a mature CUDA‑based ecosystem to a still‑developing CANN stack.

The commercial logic is compelling. Reports indicate that the Ascend 950PR is priced at roughly one‑third to one‑quarter the cost of an Nvidia H200 while delivering roughly half the absolute compute, yielding highly favourable price‑performance for inference. Major Chinese technology firms, including Alibaba, ByteDance and Tencent, have already placed orders for hundreds of thousands of Ascend chips.

US Export Controls: Policy Uncertainty

The US policy on AI chip exports to China remains fluid and unpredictable:

  • January 2026: The US government approved limited exports of Nvidia's H200 chips to China, but subject to stringent licensing requirements, third‑party testing and end‑use restrictions.

  • March 2026: Reports indicated Nvidia had halted production of H200 chips destined for China and redirected TSMC capacity to its next‑generation Vera Rubin platform.

  • April 2026: US Commerce Secretary Howard Lutnick confirmed that no H200 chips have yet been sold to China, citing approval hurdles on both sides and Chinese policy focused on domestically developed alternatives.

Pending legislation: The US House Foreign Affairs Committee advanced the MATCH Act, which would extend controls to non‑US suppliers and potentially require embedded tracking capabilities in advanced chips.

Chinese analysts point to distrust driven by the instability of US export rules and security concerns, including reported "backdoor" risks in Nvidia chips, as factors stalling purchases. Nvidia itself warned that sales conditions must be "commercially practical" or the market will continue to shift to foreign alternatives.

The US–China AI Gap Narrows to Near Zero

Stanford University's 2026 AI Index Report, published in April 2026, finds that the performance gap between the best US and Chinese AI models has effectively closed.

The Elo score gap between the leading US model (Claude Opus 4.6 at 1503) and the top Chinese model stood at just 2.7% as of March 2026, down from 17.5% in May 2023.

Chinese and US models have repeatedly swapped the top spot in rankings since early 2025.
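For intuition about how small that gap is, the standard Elo expected‑score formula translates a rating difference into head‑to‑head win odds. The sketch below is my own reading of the figure (treating the 2.7% as a relative gap in Elo score), not a calculation from the Stanford report:

```python
# Standard Elo expected-score formula:
# P(A beats B) = 1 / (1 + 10 ** ((R_B - R_A) / 400))
def elo_win_prob(r_a: float, r_b: float) -> float:
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

us_elo = 1503                    # Claude Opus 4.6, per the AI Index figure above
cn_elo = us_elo * (1 - 0.027)    # assumption: 2.7% read as a relative Elo gap
p = elo_win_prob(us_elo, cn_elo)
print(f"Top Chinese model Elo ~ {cn_elo:.0f}; P(US model wins head-to-head) ~ {p:.1%}")
```

On that reading the gap is roughly 40 Elo points, so the stronger model would be expected to win only about 56% of pairwise comparisons: close to a coin flip.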

Other indicators:

| Metric | US | China |
|---|---|---|
| AI Patents | | 74% of global total |
| Research Publications & Citations | | World leader |
| Industrial Robot Installations | | 54% of global total |
| Data Centers | 5,427 (10× any other country) | |
| AI Investment | $650 billion (mega‑cap hyperscalers, 2026E) | Alibaba: $53 billion over 3 years |
| AI Talent | Historical leader, but inflow declining 89% since 2017 | Growing domestic pool |

The US and China have structural asymmetries:

  • The US retains an edge in foundational model innovation and capital deployment.

  • China leads in research output, industrial deployment and cost‑efficient model development.

Implications and Strategic Assessment

For Nvidia

Nvidia faces a multi‑faceted threat:

Ecosystem risk: The migration of a flagship open‑source model from CUDA to CANN is the first public counter‑example to the assumption that frontier models must target CUDA first. It could catalyze a gradual loss of developer mindshare.

Market access risk: Even when export licences are granted, security concerns, policy unpredictability and competition from domestic alternatives are limiting actual sales.

Long‑term structural risk: If China successfully builds a parallel, lower‑cost AI hardware‑software stack, it could set global standards and erode Nvidia's pricing power in the inference market.

For China

The DeepSeek‑Huawei collaboration signals China's progress toward AI self‑sufficiency, but challenges remain:

  • The CANN ecosystem is still maturing; toolchain support and developer experience lag behind CUDA.

  • Ascend chips remain behind Nvidia's most advanced products in absolute performance, especially for large‑scale training.

  • Throughput constraints persist until manufacturing scales further.

For the Global AI Landscape

The global AI landscape may bifurcate into two parallel technology stacks: one anchored on CUDA and US‑centric models, the other on CANN and Chinese models. That split would have profound consequences:

Diffusion of standards: As Huang warned, AI models optimized for Chinese hardware could spread globally, particularly in developing economies seeking affordable AI infrastructure.

Cost re‑benchmarking: DeepSeek's aggressive pricing (50× cheaper than comparable closed models) resets expectations for the entire industry.

Geopolitical dimension: AI becomes not just a technology race but a contest over whose infrastructure and standards will underpin global AI deployment. Brookings notes that China's strategy focuses not on achieving AGI, but on integrating AI broadly into manufacturing, healthcare and government services, a path that may be less compute‑intensive and more scalable with domestic hardware.

Conclusion

Jensen Huang's "horrible outcome" warning accurately identifies the stakes. DeepSeek V4 running on Huawei's Ascend chips represents the first concrete demonstration that a top‑tier open‑source model can be optimised outside the Nvidia CUDA ecosystem.

While the US still maintains advantages in capital, foundational research and the most advanced training hardware, the cumulative effect of narrowing model performance, cost disruption and ecosystem migration suggests that the center of gravity in AI is shifting.

The coming years will likely see an acceleration of the two‑stack world that Huang fears—with far‑reaching consequences for industry structure, national competitiveness and global technology governance.
