Nvidia’s 1,000-watt chips will be hot – who can cool them?

  • Cooling insanely hot chips coming down the pike is a big challenge for data center operators

  • Liquid cooling is going to be key, according to a Dell’Oro analyst

  • Iceotope is taking a precision liquid cooling route

Dell’s COO really stepped in it during the company’s recent earnings call, accidentally spilling the beans about Nvidia’s plans to release a chip next year called the B200 – an ultra GPU which is expected to consume somewhere in the realm of 1,000 watts per GPU. But perhaps more important than the leak itself are the questions it raised.

One in particular stands out: how will data center operators cool these insanely hot chips?

Dell’Oro Group Research Director Lucas Beran told Silverlinings that as far as he’s concerned, some kind of liquid cooling will basically be a requirement. While you could in theory deploy one or two of these chips with just air cooling, he said that would likely limit performance.

Beran’s assertion flies in the face of Dell COO Jeff Clarke’s claims that “you really don’t need direct liquid cooling to get to the energy density of 1,000 watts per GPU.”

Beran said Clarke could have been referring to the ability to use liquid-assisted air cooling (like the system JetCool offers), but he insisted that in order to deploy 1,000+ watt chips efficiently and sustainably at scale, liquid cooling will be the way to go.

We’ve written plenty about the expected rise of liquid cooling in response to the proliferation of high-power GPUs. So, you already know about direct-to-chip and immersion cooling (right? RIGHT?!). But it seems there’s another option on the scene – Iceotope’s precision liquid cooling tech.

New kid on the iceblock

The company’s solution is kind of like direct liquid cooling on steroids.

As Iceotope Chief Commercial Officer Nathan Blom explained, the company took a systemic approach to direct liquid cooling, making sure each potential hotspot – CPUs, GPUs, SSDs and power supply components – is hit by the coolest possible liquid.

A recent whitepaper released by the company showed that Iceotope’s system was able to effectively cool chips up to 1,000 watts. Blom said Iceotope has already successfully tested well beyond that power level in its labs. And as far as he can see in terms of future generations of chips, “there’s nothing on the roadmap that can’t be supported” by Iceotope’s technology as it stands today.

The company’s approach is already gaining traction with cloud providers and telecoms alike. Blom noted Iceotope has deployed its technology in test facilities with “sizable companies, hyperscalers,” though he couldn’t share more because these deployments are covered by non-disclosure agreements. He added Iceotope’s technology was featured in Intel, HPE, Samsung and Vodafone’s booths at Mobile World Congress last month.

“The whole industry, up until very recently, has been in testing and proof-of-concept areas and I think we’re seeing that shift now to production level,” Blom said. “So, I think late 2024 and 2025 are going to be massive.”

Cooling doubts

Pressed on how quickly direct liquid cooling (or precision liquid cooling, in Iceotope’s case) will hit its limit, Blom said, “The upper limit in liquid cooling will be dictated by the liquid,” meaning the cooling properties of the liquid used.

He added a lot of companies are working to develop new dielectric fluids (i.e. those that don’t conduct electricity) with greater cooling capacity.

Indeed, Beran said Iceotope is far from the only single-phase liquid cooling company working to meet the need to cool higher-power chips.

He added there are two primary ways to increase cooling capacity: lowering the temperature of the fluid used in the system or increasing the flow rate.

In Iceotope’s case, Beran said they seemed to have gone for a higher flow rate in the short term. The aforementioned whitepaper, for instance, indicated Iceotope used a flow rate of 7 liters per minute. That compares to the usual one to two liters per minute that most other direct liquid cooling systems use.
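To see why flow rate matters so much, a rough back-of-the-envelope sketch helps. The snippet below uses the standard steady-state heat balance (heat removed = mass flow × specific heat × coolant temperature rise) to estimate how much a coolant warms while carrying away 1,000 watts. The function name and the choice of water as the fluid are our own illustrative assumptions, not figures from Iceotope's whitepaper. Dielectric fluids generally hold less heat per kilogram than water, and the 7 liters per minute cited may not all pass over a single chip.

```python
def coolant_temp_rise(power_w, flow_l_per_min, density_kg_per_l, specific_heat_j_per_kg_k):
    """Estimate steady-state coolant temperature rise in kelvin.

    Rearranges the heat balance Q = m_dot * c_p * dT to dT = Q / (m_dot * c_p).
    Hypothetical helper for illustration only; ignores pumps, piping and
    heat-exchanger losses.
    """
    if flow_l_per_min <= 0:
        raise ValueError("flow rate must be positive")
    # Convert volumetric flow (L/min) to mass flow (kg/s).
    mass_flow_kg_per_s = flow_l_per_min * density_kg_per_l / 60.0
    return power_w / (mass_flow_kg_per_s * specific_heat_j_per_kg_k)


# Assumed fluid: water at roughly room temperature (not Iceotope's dielectric fluid).
WATER_DENSITY_KG_PER_L = 1.0
WATER_SPECIFIC_HEAT_J_PER_KG_K = 4186.0

if __name__ == "__main__":
    for flow in (1.0, 2.0, 7.0):
        rise = coolant_temp_rise(
            1000.0, flow, WATER_DENSITY_KG_PER_L, WATER_SPECIFIC_HEAT_J_PER_KG_K
        )
        print(f"{flow:.0f} L/min: coolant warms by about {rise:.1f} K")
```

Under those assumptions, the coolant warms by roughly 14 degrees at one liter per minute but only about 2 degrees at seven. The temperature rise falls in direct proportion to the flow rate, which is the trade-off Beran describes: more flow keeps the liquid cooler, at the cost of pushing more fluid through the hardware.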

Why does the flow rate matter? Well, Beran said higher flow rates can increase wear and tear, potentially damaging or eroding cold plates or other cooling system components faster. So, at a certain point, he said, it might make more sense to just go with a two-phase liquid cooling system (one in which the liquid changes to gas to transfer heat before being condensed back into a liquid) rather than a strained single-phase one.

“That’s why it’s super important to single-phase liquid cooling technologies to have innovations in fluid, innovations in coldplate technologies or manufacturing to keep up with the demands of these next generations of processors, because it doesn’t seem like the TDPs are going to stop going up incredibly fast anytime soon,” Beran concluded.