Hyperscalers want to replace InfiniBand with Ethernet – Here’s why

  • Cisco, DriveNets and Arista are all working on Ethernet-based replacements for InfiniBand

  • The need to support massive AI workloads without congestion, while also avoiding vendor lock-in, is driving the trend

  • An analyst says it’s too early to predict how market share might shake out

If you haven’t heard the steady drumbeat of vendors talking up enhanced Ethernet as the networking technology of choice for AI workloads, allow us to turn up the volume for you. DriveNets has been one of the loudest in this space, but Arista and Cisco are also pushing the same agenda. And the latter just announced, on its fiscal Q4 2023 earnings call, that it has secured $500 million worth of orders for its AI Ethernet fabrics from Tier 1 hyperscalers.

But what exactly is Ethernet supposed to be replacing, and what’s driving demand?

As 650 Group co-founder and analyst Alan Weckel previously told Silverlinings, the industry has been using NVIDIA’s InfiniBand networking solution for early deployments of artificial intelligence (AI) and machine learning (ML) technology because of its ability to support massive scale. But InfiniBand comes with one major caveat – NVIDIA is the only major vendor, which creates lock-in.

So, hyperscalers are looking for alternatives to meet rapidly rising demand for AI workloads.

Quickly moving toward Ethernet

Rakesh Chopra, Cisco Fellow, told Silverlinings that the industry is “quickly moving towards Ethernet as the medium” of choice. But Ethernet has its own issues – namely that it can become congested as more jobs are added to the infrastructure. That’s why Cisco and DriveNets are working on versions of enhanced Ethernet.

“With the growth of AI and other bandwidth and power-hungry use cases, hyperscalers must look at adopting the most efficient technologies that run multiple simultaneous workloads,” Chopra explained.

“Fully scheduled and enhanced Ethernet are ways to improve the performance of an Ethernet-based network and significantly reduce job completion time,” he continued. “With enhanced Ethernet, customers can reduce their job completion time by 1.57x, making their AI/ML jobs complete quicker and with less power.”
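For a rough sense of what that 1.57x figure would mean in practice, here is a minimal back-of-the-envelope sketch. Only the 1.57x factor comes from Cisco’s claim; the baseline job time is a made-up placeholder for illustration.

```python
# Back-of-the-envelope illustration of the quoted 1.57x improvement in
# job completion time (JCT). The baseline value is hypothetical; only
# the 1.57x factor comes from Cisco's stated claim.

BASELINE_JCT_HOURS = 10.0   # hypothetical training-job completion time on plain Ethernet
IMPROVEMENT_FACTOR = 1.57   # claimed reduction factor with enhanced Ethernet

enhanced_jct_hours = BASELINE_JCT_HOURS / IMPROVEMENT_FACTOR
time_saved_hours = BASELINE_JCT_HOURS - enhanced_jct_hours

print(f"Baseline JCT:  {BASELINE_JCT_HOURS:.2f} h")
print(f"Enhanced JCT:  {enhanced_jct_hours:.2f} h")
print(f"Time saved:    {time_saved_hours:.2f} h ({time_saved_hours / BASELINE_JCT_HOURS:.0%})")
```

On these assumed numbers, a 10-hour job would finish in roughly 6.4 hours, a saving of about 36% in wall-clock time (and the power consumed during it).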

For its part, Cisco is doing this through the Ultra Ethernet Consortium, which also counts AMD, Arista, Broadcom, HPE, Intel, Meta and Microsoft among its members. (Though Cisco also offers a scheduled fabric solution today based on its Silicon One networking chip.)

Chopra said the Consortium is “in the process of creating the Ultra Ethernet Transport (UET) protocol, a new transport-layer protocol for Ethernet that will better address needs of AI workloads.” According to the group’s website, it expects the first products based on its standards to hit the market in 2024.

On Cisco’s recent earnings call, CEO Chuck Robbins said the company expects to be trialing Ethernet networks for AI over the next 12 months or so. He added, “So, we think, you know, into FY '25 and beyond, this thing will begin to shift to more of an Ethernet-based infrastructure.”

Weckel said the Tier 1 interest Cisco highlighted on its call “shows the urgency and immense market opportunity of AI Networking for existing vendors like Cisco and new vendors like DriveNets.”

“Hyperscalers are aggressively trialing products across all vendors to build training and inference clusters to support new workloads,” he continued. “While it is too early to project market share, we can expect a robust vendor ecosystem to support AI, as every workload and hyperscaler is a new opportunity and won't have the same performance between all the vendors' offerings.”