Immersion Cooling Dunks Servers to Cut Power

Submerging computers in a synthetic nonconductive oil may prove far better at cooling them than other strategies

Charles Q. Choi is a Contributing Editor for IEEE Spectrum.

Dina Genkina is the computing and hardware editor at IEEE Spectrum.

[Photo: A computer server is submerged in liquid at Sandia National Laboratories. Credit: Craig Fritz]

Electronics and fluids don’t generally mix. But teams from different corners of the globe are showing that immersing data-center gear in specialized fluids could be the best way to keep it cool.

Computers can fail if they get too hot, so they typically rely on power-hungry fans to stay cool. More recently, engineers have cooled supercomputers by circulating water through pipes that run near the processors. Liquids are far denser than air, which makes them much more efficient at drawing heat away from electronics. That efficiency is increasingly important: a 2023 study found that keeping servers from overheating accounts for 30 to 40 percent of the total energy that data centers consume.
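To get a feel for why liquids beat air, compare how much heat a given volume of each can carry per degree of temperature rise. Here is a minimal back-of-the-envelope sketch, using typical textbook property values rather than figures from the article:

```python
# Back-of-the-envelope comparison of volumetric heat capacity:
# how much heat one cubic meter of coolant carries per kelvin.
# Property values are typical textbook numbers, not from the article.

AIR_DENSITY = 1.2           # kg/m^3, room temperature
AIR_SPECIFIC_HEAT = 1005.0  # J/(kg*K)

OIL_DENSITY = 850.0         # kg/m^3, typical synthetic cooling oil
OIL_SPECIFIC_HEAT = 2000.0  # J/(kg*K), typical hydrocarbon oil

air_vol_heat = AIR_DENSITY * AIR_SPECIFIC_HEAT  # J/(m^3*K)
oil_vol_heat = OIL_DENSITY * OIL_SPECIFIC_HEAT  # J/(m^3*K)

print(f"Air: {air_vol_heat:,.0f} J per m^3 per K")
print(f"Oil: {oil_vol_heat:,.0f} J per m^3 per K")
print(f"Oil carries roughly {oil_vol_heat / air_vol_heat:,.0f}x more heat")
```

With these assumed values, the oil moves on the order of a thousand times as much heat per unit volume as air, which is why a slow trickle of oil can replace a wall of fans.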

However, water cooling faces problems of its own. The water carrying heat from computers is typically piped to cooling towers, where that heat turns a separate supply of water into mist that evaporates into the atmosphere. In 2022, Google’s data centers worldwide consumed about 19 billion liters of freshwater for cooling.

Sandia researchers are testing out cooling computers by submerging them entirely in nonconductive oil.

Now, two separate results are putting a different technology on the map: immersion cooling, or dunking entire servers in oil. The oil is nonconductive and noncorrosive, so it can be in direct contact with electronics without short-circuiting or damaging them. The technology holds the potential to cut energy usage in half, says Oliver Curtis, co-CEO of the immersion-cooled data-center company Sustainable Metal Cloud.

“We’ve proven that you can get the same amount of performance, but for half the amount of energy, and if you can do that, it’s our social responsibility to proliferate this technology,” Curtis says.

Dunking an AI Factory

Yesterday, the MLPerf AI training competition announced a new benchmark: energy consumption. As the name suggests, it measures the energy each submitting machine consumes while running each of the suite’s other benchmarks, such as training a large language model or a recommender system. This new category had only one submitting organization, Singapore-based Sustainable Metal Cloud (SMC).

SMC was looking to show off the efficiency gains that result from its immersion-based cooling system. The system’s fluid is an oil called polyalphaolefin, which is a commonly used automotive lubricant. The oil is forced slowly through the dunked servers, allowing for efficient heat transfer.

The SMC team has figured out what modifications servers need to be compatible with this cooling method over the long term. Beyond removing the built-in fans, they switch out the thermal interface materials that connect chips to their heat sinks, because some of those materials degrade in the oil. Curtis says the modifications are small but important to the functioning of the setup.

“What we’ve done there is we’ve created the perfect operating environment for a computer,” Curtis says. “There’s no dust, there’s no movement, no vibration, because there’s no fans. And it’s a perfect operating temperature.”

SMC’s systems, which it calls HyperCubes, consist of 12 or 16 oil tanks, each housing a server. Servers in adjacent tanks are connected by ordinary interconnects, which loop out of the oil in one tank and into the next. Curtis claims that this approach saves 20 to 30 percent of total energy usage at the server level.

In addition, SMC builds sitewide heat-exchange systems, one for each HyperCube. In a traditional data center, in addition to fans attached directly to servers, centralized air conditioning is needed to keep servers cool. Curtis says the system-level heat exchanger does the job of the A/C more efficiently, supplying a further 20 percent energy reduction.
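It is worth checking how those two figures stack up against the headline claim of halving energy use. A minimal sketch of the arithmetic, under the assumption (mine, not SMC's) that the server-level and facility-level reductions apply independently and multiplicatively:

```python
# Stack the two claimed reductions from the article: 20-30 percent
# at the server level, plus a further 20 percent at the facility
# level. Treating them as independent multiplicative factors is an
# assumption, not something SMC has specified.

FACILITY_SAVING = 0.20

for server_saving in (0.20, 0.30):
    remaining = (1 - server_saving) * (1 - FACILITY_SAVING)
    total_reduction = 1 - remaining
    print(f"server {server_saving:.0%} + facility {FACILITY_SAVING:.0%}"
          f" -> total reduction of about {total_reduction:.0%}")

# server 20% + facility 20% -> total reduction of about 36%
# server 30% + facility 20% -> total reduction of about 44%
```

Under that assumption the stacked savings land in the 36 to 44 percent range, approaching, though not quite reaching, the headline figure of one half.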

SMC calls its combined HyperCubes and dedicated heat exchangers “AI Factories.” The company deployed its first HyperCube in Tasmania in 2019, and subsequently built and delivered more than 14 others in Australia. In 2022, SMC installed its first AI Factory in Singapore, accessible via the cloud for commercial use in Asia.

Benchmark | SMC energy (kJ) | SMC time to train (min) | Best time to train (min)
Natural language processing | 1,793 | 5.39 | 5.31 (Supermicro)
Recommender systems | 1,266 | 3.84 | 3.84 (SMC)
GPT-3 | 1,676,757 | 56.87 | 50.73 (Nvidia)
Image recognition | 7,757 | 2.55 | 2.49 (Oracle)
Object detection | 21,493 | 6.31 | 6.08 (Nvidia)
Medical imaging | 5,915 | 1.83 | 1.83 (SMC)
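Dividing each energy figure by its training time gives the average power the system drew during the run. A rough sketch, taking the times to be in minutes as MLPerf customarily reports them:

```python
# Average power draw implied by the table above: energy divided by
# time to train. Times are assumed to be in minutes, per MLPerf
# convention; the article's table does not state the unit.

results = {
    "Natural language processing": (1_793, 5.39),
    "Recommender systems": (1_266, 3.84),
    "GPT-3": (1_676_757, 56.87),
    "Image recognition": (7_757, 2.55),
    "Object detection": (21_493, 6.31),
    "Medical imaging": (5_915, 1.83),
}

for benchmark, (energy_kj, minutes) in results.items():
    avg_power_kw = energy_kj / (minutes * 60)  # kJ/s = kW
    print(f"{benchmark}: about {avg_power_kw:,.1f} kW average")
```

The GPT-3 run implies an average draw of roughly 490 kW, versus about 5 to 57 kW for the smaller benchmarks, a quick sanity check that the reported energies are on a sensible scale for the hardware involved.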

Because SMC was the only company to enter MLPerf’s new energy category, it is hard to validate its exact energy-saving claims. However, the performance of its platform on various benchmarks was on par with comparable competitors—that is, other systems that, like SMC, use Nvidia’s H100 GPUs in the same numbers. And its energy results are now out there as a gauntlet, thrown down for other companies to beat.

Researching Oil for the Chill

Separately, Sandia National Laboratories, in New Mexico, is testing immersion cooling with the aim of providing an independent, publicly available assessment. So far, immersion cooling “has a lot of advantages, and it’s really hard for me to see any disadvantages that would sway me to other technologies,” says Dave Martinez, engineering program project lead for Sandia’s infrastructure computing services.

The liquid Sandia is using comes from Submer Technologies, in Barcelona. It is a synthetic, biodegradable, nontoxic, nonflammable, noncorrosive fluid made from food-grade components. The fluid has one-eighth the electrical conductivity of air and roughly the viscosity of cooking oil, Martinez says.

In tests, Sandia is placing entire computers—server racks and their power cables—in immersion tanks loaded with the fluid. This strategy aims to capture all of the heat the electronics generate to provide even cooling. The coolant gives up its heat to the open air, given the right difference in temperature.

According to Submer, its immersion cooling system is 95 percent more efficient than traditional cooling technologies. Martinez suggests it may cut energy consumption by 70 percent compared with standard methods. In addition, after the coolant absorbs heat, it can be used to warm buildings during winter months, he says.

When it comes to replacing a component—say, a chip on a board—a gantry system above the tank can lift out a server rack. “We just let it drip until there’s no oil left,” Martinez says. “We might have to clean it all up a tiny bit, not a whole lot. It is just one more step than a normal system. But my assumption is that the failure rate of these parts will go down a lot because the cooling is more effective than a fan-based system.”

In partnership with Albuquerque-based data company Adacen, Martinez and his colleagues began testing Submer’s fluid and equipment in May.

“Right now, we’re seeing a lot more pros than cons,” Martinez says. “It’s not just the energy saved, which is pretty tremendous. Without all the fans, there’s virtually no noise, too. You might not even know there’s a data center there.”

Sandia’s tests involve checking temperatures inside and outside the immersion tank, measuring the energy that cooling requires, tracking the reliability of the hardware, examining whether some coolant-flow patterns work better than others, calculating infrastructure costs, and figuring out how best to use fans or water to remove what heat the coolant does release. The lab also plans to overclock the computers to see how much of a performance boost the coolant might allow without damaging the electronics, Martinez says.
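The energy-measurement part of that plan reduces to standard calorimetry: the heat a coolant loop removes equals the mass flow rate times the fluid's specific heat times the temperature rise across the tank. A minimal sketch follows; the flow rate, temperatures, and fluid properties below are illustrative assumptions, not Sandia's numbers:

```python
def heat_removed_kw(flow_lpm: float, density_kg_m3: float,
                    specific_heat_j_kgk: float,
                    t_in_c: float, t_out_c: float) -> float:
    """Heat carried away by a coolant loop, in kilowatts (Q = m_dot * c * dT)."""
    mass_flow_kg_s = (flow_lpm / 60.0 / 1000.0) * density_kg_m3
    return mass_flow_kg_s * specific_heat_j_kgk * (t_out_c - t_in_c) / 1000.0

# Illustrative numbers only: a 60 L/min oil loop warming from 35 C
# to 45 C removes about 17 kW, enough for a few dense servers.
print(f"{heat_removed_kw(60, 850, 2000, 35, 45):.1f} kW")
```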

Submer notes that its coolant faces potential challenges. For instance, plasticizer compounds in PVC cables may leach into the coolant, potentially leaving the cables stiff and brittle. However, the company notes that cables with outer sheaths made of materials like polyurethane resin do not show this problem.

Sandia plans to finish its tests in July and write up its results in August. “Sandia is exploring what our next data center is going to look like,” and immersion cooling could play a part, Martinez says. “Right now this is looking pretty good as a player in our future.”

The Conversation (3)
Anjan Saha, 29 Jun 2024

If we recover and reuse the waste heat from a liquid-immersion data center through heat exchangers for useful purposes, such as hot water for hotels, restaurants, dairies, dye-processing industries, kitchens, and washrooms, or room heating in cold countries, it will have social, economic, and cost benefits for everybody. In normal air cooling, 90 to 98 percent of the electricity used to operate servers is wasted as heat and removed by precision air conditioning. Liquid cooling of electrical equipment is not uncommon: large transformers are cooled by mineral oil, and generators are cooled by hydrogen or water.

Rafael Hernandez, 14 Jun 2024

Submerging computers in a synthetic nonconductive oil is an idea worth researching in depth, and the authors presented a general view of the main issues. However, I would like to comment on two issues the authors should clarify: 1) The authors wrote: "Google’s data centers consumed about 19 billion liters of freshwater for cooling." This statement is meaningless if you do not state the period of time (a year?) and the scope (worldwide?); and 2) the authors use the word efficiency (the ratio of useful work to energy expended) in a metaphorical manner to describe an undefined ratio; please define the ratio.

Richard Benjamin, 14 Jun 2024

Texas Instruments explored immersion cooling of its supercomputers in Austin, Texas, 50 years ago, using fluorocarbons. I was a part of that team. Nice to see history repeating itself.