China’s Plan for Winning the AI Race Hinges on the Token Economy, Not Chips
China is translating the old concept of “encircling the cities from the countryside” for the digital age.
On its surface, the U.S. chip sanctions regime appears to have locked in an American victory in the AI race. As of late 2025, the best U.S. AI chips were roughly five times more powerful than China’s leading chips; according to one analysis, that gap is projected to widen to 17 times by the second half of 2027. Yet this single-axis framing sits in striking tension with the assessment offered by U.S. industry leaders themselves. Testifying before the U.S. Senate Commerce Committee in May 2025, AMD CEO Lisa Su stated explicitly that maintaining the U.S. competitive edge in AI innovation “actually requires excellence at every layer of the stack.” AI competitiveness, in other words, is a multi-layered system spanning silicon, software, models, energy, and ecosystems – and a chokepoint at any single layer is insufficient to secure the whole.
China’s response operates precisely on this multi-layered logic: rather than confront the American chip fortress head-on, it circumvents it – replicating the strategy of “encircling the cities from the countryside” that it has already deployed successfully in solar panels and consumer electronics, among other sectors. The logic is straightforward: forgo a frontal assault on the high-end market, and instead penetrate the global mid-to-low-end application market through algorithmic efficiency, energy advantage, and aggressive pricing, until scale dynamics begin to compress the high-end fortress in reverse.
The Three Layers of Chinese Advantage
The central terrain of this contest is what Jensen Huang, Nvidia’s CEO, termed token factory economics – a metric cluster anchored on tokens per watt and complemented by cost per token. Speaking at NVIDIA GTC 2026, Huang framed AI factories as fundamentally power-constrained systems: capacity does not scale with demand, so efficiency becomes decisive, and tokens per watt, token speed, and cost per token emerge as the core metrics. On both tokens per watt and cost per token, China is now constructing a structural advantage.
At the algorithmic level, Chinese companies can glean more tokens from fewer chips. DeepSeek reportedly trained its V3 model for $6 million – compared to roughly $100 million for OpenAI’s GPT-4 – using approximately one-tenth of the compute consumed by Meta’s comparable LLaMA 3.1 model. The Mixture-of-Experts (MoE) architecture allows Chinese developers to compensate for their generational silicon disadvantage with structural efficiency.
At the hardware level, Chinese domestic chips are now rapidly closing the gap with the H20, Nvidia’s China-specific export variant. According to research by Guosen Securities, Baidu’s third-generation Kunlun P800 chip reaches roughly 345 TFLOPS at FP16, on par with Nvidia’s A100, with interconnect bandwidth approaching that of the H20. In September 2025, Alibaba T-Head’s Parallel Processing Unit (PPU) accelerator was demonstrated on Chinese state television as performing on par with the H20; China Unicom has since deployed over 16,000 PPUs at its Qinghai data center. Crucially, on the cost dimension, the PPU’s domestic 7nm process and 2.5D packaging make a single card 40 percent cheaper than the imported H20. Together, these developments are reshaping the competitive landscape on both tokens per watt and cost per token simultaneously.
China is also driving down cost per token through its energy strategy. By the end of 2025, China’s installed power generation capacity reached 3.89 billion kilowatts, with wind and solar contributing 1.84 billion kW – 47.3 percent of the total. Chinese electricity costs run 30–50 percent below those in the United States. Changjiang Securities has gone so far as to characterize tokens as a “power derivative,” noting that electricity accounts for 60–70 percent of large-model operating costs. Tokens, in effect, allow China to export the economic value of its domestic electricity globally – without exporting a single kilowatt.
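As a rough illustration of the “power derivative” framing, the two ranges quoted above can be combined into a back-of-envelope estimate of how much cheaper electricity alone might lower operating cost. This is a sketch under simplifying assumptions that are not in the source: the electricity share is taken at the U.S. price, every other cost is held equal, and the function name is hypothetical.

```python
def operating_cost_saving(electricity_share: float, price_discount: float) -> float:
    """Fraction by which total operating cost falls when only electricity gets cheaper.

    Assumes electricity_share is measured at the higher (U.S.) price and that
    all non-electricity costs are unchanged; an illustrative simplification.
    """
    return electricity_share * price_discount


# Ranges quoted in the text: electricity is 60-70% of operating cost,
# and Chinese electricity costs run 30-50% below U.S. costs.
low = operating_cost_saving(0.60, 0.30)
high = operating_cost_saving(0.70, 0.50)

print(f"Implied operating-cost saving: {low:.0%} to {high:.0%}")
```

Under these assumptions the electricity gap alone would account for an operating-cost saving of somewhere between 18 and 35 percent, before any hardware or algorithmic effects are counted.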
At the market level, pricing itself becomes the weapon. MiniMax M2.5 and Zhipu GLM-5 charge $0.30 per million input tokens on OpenRouter, compared with $5 for Anthropic’s Claude Opus 4.6 – roughly one-sixteenth the price. The true force of this differential, however, lies in the fact that it does not come at the expense of performance.
On SWE-Bench Verified – the industry’s gold-standard coding benchmark – MiniMax M2.5 scores 80.2 percent, trailing Claude Opus 4.6’s 80.8 percent by a mere 0.6 percentage points. Both models complete benchmark tasks in nearly identical time (22.8 minutes for M2.5 versus 22.9 minutes for Opus 4.6), yet the per-task cost differs by a factor of 20: roughly $0.15 for M2.5 against $3.00 for Opus 4.6. For a mid-sized engineering team, this translates into monthly costs of $225 versus $4,500 for substantively equivalent output.
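The per-task and monthly figures above are consistent with a single implied workload. A minimal sketch of that arithmetic follows, using only the numbers quoted in the paragraph; the tasks-per-month figure is an inference from those numbers rather than something the source states, and the variable names are illustrative.

```python
# Per-task costs quoted in the text (USD).
COST_PER_TASK = {"MiniMax M2.5": 0.15, "Claude Opus 4.6": 3.00}

# Monthly totals quoted in the text (USD).
MONTHLY_COST = {"MiniMax M2.5": 225.0, "Claude Opus 4.6": 4500.0}

# Ratio of per-task costs: the "factor of 20" in the text.
cost_ratio = COST_PER_TASK["Claude Opus 4.6"] / COST_PER_TASK["MiniMax M2.5"]

# Workload implied by each monthly total (an inference, not a sourced figure).
implied_tasks = {
    model: round(MONTHLY_COST[model] / COST_PER_TASK[model])
    for model in COST_PER_TASK
}

print(f"Per-task cost ratio: {cost_ratio:.0f}x")
for model, tasks in implied_tasks.items():
    print(f"{model}: about {tasks} tasks per month")
```

Both monthly totals correspond to roughly 1,500 benchmark-sized tasks a month, which is the workload the “mid-sized engineering team” comparison appears to assume.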
To be analytically honest, this near-parity is concentrated in coding and agentic tool use; on pure mathematical reasoning (AIME) and abstract reasoning (ARC-AGI), the flagship models from OpenAI and Google retain clear leads. Yet the dimensions where China has reached parity – coding, document processing, office automation, customer-service agents – are precisely the most commercially salient enterprise workloads. Chinese tokens, in other words, are penetrating the global enterprise AI market by providing 99 percent of the capability at 5 percent of the price.
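The “99 percent of the capability at 5 percent of the price” shorthand can be traced back to figures quoted earlier in the article. The sketch below simply recomputes the ratios; treating a single SWE-Bench Verified score as a stand-in for “capability” is an assumption made here for illustration, and the variable names are hypothetical.

```python
# Figures quoted earlier in the article.
swe_bench_score = {"M2.5": 80.2, "Opus 4.6": 80.8}  # percent, SWE-Bench Verified
cost_per_task = {"M2.5": 0.15, "Opus 4.6": 3.00}    # USD per benchmark task
input_price = {"M2.5": 0.30, "Opus 4.6": 5.00}      # USD per million input tokens

# Each ratio expresses the M2.5 figure as a fraction of the Opus 4.6 figure.
capability_ratio = swe_bench_score["M2.5"] / swe_bench_score["Opus 4.6"]
task_cost_ratio = cost_per_task["M2.5"] / cost_per_task["Opus 4.6"]
input_price_ratio = input_price["M2.5"] / input_price["Opus 4.6"]

print(f"Relative SWE-Bench score: {capability_ratio:.1%}")
print(f"Relative per-task cost:   {task_cost_ratio:.1%}")
print(f"Relative input price:     {input_price_ratio:.1%}")
```

On these inputs the score ratio comes to roughly 99 percent and the per-task cost ratio to 5 percent, while the list price for input tokens works out to about 6 percent. The comparison rests on one coding benchmark, as the paragraph itself acknowledges.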
This is the essence of the “encircling the cities from the countryside” playbook: victory does not require producing the most advanced commodity, but producing a good-enough commodity at structurally lower cost until the opponent’s premium pricing model loses its market sustainability.
The strategic effect is already visible. According to OpenRouter, the world’s largest LLM API aggregation platform, Chinese models surpassed U.S. models in weekly token call volume for the first time during the week of February 9-15, 2026, reaching 4.12 trillion tokens against 2.94 trillion for U.S. models; the following week, Chinese volume climbed further to 5.16 trillion – a 127 percent increase in just three weeks. During the week of February 16-22, four of the top five most-used models on the platform were Chinese – MiniMax M2.5, Moonshot’s Kimi K2.5, Zhipu’s GLM-5, and DeepSeek V3.2 – collectively accounting for 85.7 percent of total top-five call volume. By February 24, Chinese models had captured 61 percent of OpenRouter’s........