Summary: Google is building the AI industry’s most diverse custom chip supply chain, with four design partners (Broadcom, MediaTek, Marvell, Intel) and a roadmap running from the Ironwood TPU, ramping toward millions of units this year, to TPU v8 chips on TSMC’s 2nm process by late 2027. Detailed ahead of Google Cloud Next, the strategy splits the next generation: Broadcom’s “Sunfish” for training and MediaTek’s “Zebrafish” for inference at 20-30% lower cost, with Marvell in talks to add a memory processing unit and a further inference TPU. Together, the programme positions Google’s custom silicon as the primary challenge to Nvidia’s dominance in AI inference.
Google is assembling the most diverse custom chip supply chain in the AI sector, with four design partners, a fabrication relationship with TSMC, and a roadmap extending from today’s inference chips to 2-nanometre processors expected by late 2027. Highlighted in a Bloomberg feature before Google Cloud Next this week, Google’s silicon program is positioned as the main competitor to Nvidia in AI inference, the computing phase where models serve rather than learn.
Ironwood, Google’s seventh-generation TPU, is the first designed specifically for inference and ten times more powerful than TPU v5p, offering 192GB of HBM3E memory per chip with 7.2 terabytes per second of bandwidth, scaling to 9,216 liquid-cooled chips in a single superpod producing 42.5 FP8 exaflops. Ironwood is now accessible to Google Cloud customers. Google plans to produce millions this year, with Anthropic committing to up to one million TPUs and a rental deal with Meta.
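The headline superpod figures above imply a per-chip rating. As a back-of-envelope check (assuming the reported counts and peak ratings are exact; real-world utilisation would be lower), the numbers can be worked through like this:

```python
# Superpod arithmetic from the reported Ironwood figures.
# All constants come from the article; derived values are estimates.
CHIPS_PER_POD = 9_216          # liquid-cooled chips per superpod
HBM_PER_CHIP_GB = 192          # HBM3E capacity per chip
POD_FP8_EXAFLOPS = 42.5        # aggregate FP8 compute per pod

# Total pod memory in terabytes (decimal units).
total_hbm_tb = CHIPS_PER_POD * HBM_PER_CHIP_GB / 1_000

# Implied per-chip FP8 peak, in petaflops (1 exaflop = 1,000 petaflops).
per_chip_pflops = POD_FP8_EXAFLOPS * 1_000 / CHIPS_PER_POD

print(f"Pod HBM capacity: {total_hbm_tb:,.0f} TB")
print(f"Implied per-chip FP8 peak: {per_chip_pflops:.2f} PFLOPS")
```

This gives roughly 1,770 TB of HBM per pod and an implied peak of about 4.6 FP8 petaflops per chip.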
The four-partner supply chain
Google’s chip program involves four design partners, each managing different product line segments.
Broadcom, under a long-term agreement announced on 6 April and running to 2031, supplies TPUs and networking components and handles the high-performance chip variants. It is designing “Sunfish,” the next-generation TPU v8 training chip, targeting TSMC’s 2-nanometre process for late 2027. Broadcom holds more than 70% of the custom AI accelerator market and projects $100 billion in AI chip revenue by 2027.
MediaTek designs “Zebrafish,” the cost-optimised TPU v8 inference variant, also targeting TSMC 2nm for late 2027. MediaTek’s designs, which began with Ironwood’s I/O modules, come in 20-30% cheaper. With Broadcom building the training chip and MediaTek the inference chip, each partner knows the other is in the programme, giving Google pricing leverage over both.
Marvell Technology is negotiating to develop a memory processing unit and a new inference TPU, and would become the third design partner if a contract is agreed. Google plans to deploy nearly two million memory processing units, with the design expected to be finalised next year. Marvell’s custom silicon business runs at a $1.5 billion annual rate with 18 cloud-provider design wins; Nvidia invested $2 billion in the company in March.
Intel joined on 9 April with a multi-year deal to supply Xeon processors and custom infrastructure processing units for Google’s AI data centre infrastructure, covering the networking and general-purpose compute layers around the TPUs rather than the AI accelerators.
TSMC fabricates all of Google’s custom silicon, making it a structural dependency: every Google-designed chip, whoever designs it, is built in TSMC’s fabs.
Why inference changes the economics
The shift from training to inference as the major AI compute cost is Google’s strategic premise. Training is a singular, intensive event, while inference is continuous and scales with users, queries, and AI products. Google manages billions of AI-augmented search queries, Gemini conversations, and Cloud AI API calls daily. At this scale, inference cost determines the economics of AI business.
Nvidia’s GPUs dominate training workloads thanks to their programmability and the CUDA software ecosystem, which create switching costs custom chips cannot easily replicate. Inference workloads, by contrast, are predictable and repetitive, exactly the profile that suits fixed-function optimisation in custom silicon. An inference chip that costs less per query than an Nvidia GPU wins on the metric that matters at Google’s scale, even if it lacks the GPU’s versatility.
Google is investing in multiple
