Research — 18 Oct, 2023

What is an 'AI datacenter'?

AI is the hottest topic in the tech industry in 2023. With the popularization of OpenAI tools after years of development, there are many questions about the long-term implications of this technology, as well as what immediate changes it might bring. In the datacenter business, the focus of discussion is the infrastructure required for AI. The general consensus is that AI will intensify demand for datacenter space, particularly from hyperscale customers. Beyond that, industry media outlets seem to insist on the emergence of a new class of facilities — "AI datacenters" — set to reshape the overall market.

On a broader level, AI seems to have fallen prey to the typical dynamic that emerges once a new technical breakthrough takes shape: hype. Although AI is poised to cause significant changes in the global technology landscape, we believe a lot of hype has emerged surrounding the possible implications of AI in the datacenter industry specifically. In order to discern the long-term impacts of AI, we must first take a step back and define an AI datacenter — or more simply, define what the datacenters of the future will look like.


The fun part about a giant wave of hype, as we have seen with AI, is that it causes us to examine what is possible, what is reasonable and what is simply make-believe. When it comes to AI's impact on datacenters, what we are really asking is, "What will datacenters of the future look like?" As of now, it seems that AI has finally brought to fruition the high-density deployments that the datacenter industry has been talking about for over a decade. The hype seems to suggest that a separate class of facilities will emerge — an AI datacenter, as it were — but for the short term, most datacenter providers are able to accommodate the higher density requirements with their existing designs. In the longer term, GPU-based deployments will likely continue to push densities higher, and some providers are already considering redesigns that will allow them to better meet those power requirements. It is possible, however, that an AI datacenter will have other implications. AI could significantly disrupt components such as cooling, networking and datacenter management. Looking at these elements could provide a better vision of what the datacenter of the future might actually look like.


A matter of density

What many industry leaders perceive as a boom in demand for datacenter space caused by AI in the last 12-18 months was preceded by years of growth in the datacenter industry led by cloud service providers. Since 2020, cloud providers have seen a sharp rise in demand for their services and have consequently grown their infrastructure to better serve end users, taking up large chunks of space in leased datacenters. However, we have observed another surge in demand starting in late 2021 to early 2022. The main difference between this wave of demand and the previous one is that customers' density requirements are also shifting, as those companies look for facilities offering more power per square foot. Datacenter and cloud providers alike agree that AI workloads will require high-density infrastructure at the datacenter level, making it easy to attribute all of this new demand to the surge of interest in AI.

One might expect that these higher density requirements would send datacenter providers scrambling to completely redesign their facilities and build new sites from scratch to house them. However, at least in the short term, that does not appear to be the case. Most providers have not had to re-architect their facilities for these workloads: they can accommodate high-density deployments by deploying fewer racks and offsetting the higher density with lower-density deployments elsewhere in the same facility. For instance, some datacenter providers claim they can already accommodate deployments as high as 50 kW per rack. Nevertheless, most datacenter providers agree that some form of redesign will be required, likely within the next five years, as rack densities are expected to keep rising with the release of ever more power-hungry systems.
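The offsetting described above is, at its core, a power-budget calculation. The sketch below illustrates the arithmetic with purely hypothetical figures (the hall budget, rack densities and counts are illustrative assumptions, not provider data):

```python
# Illustrative sketch of density offsetting within a fixed power budget.
# All numbers are hypothetical assumptions, not figures from this article.

def racks_remaining(hall_budget_kw: float, high_kw: float, n_high: int,
                    low_kw: float) -> int:
    """How many low-density racks still fit after placing n_high high-density racks."""
    remaining = hall_budget_kw - n_high * high_kw
    if remaining < 0:
        raise ValueError("high-density deployment exceeds the hall's power budget")
    return int(remaining // low_kw)

# A hall provisioned for 1,500 kW of IT load, designed around ~5 kW/rack.
# Ten 50 kW AI racks consume 500 kW, leaving 1,000 kW for 5 kW racks.
n_low = racks_remaining(1500.0, high_kw=50.0, n_high=10, low_kw=5.0)
print(n_low)  # 200
```

The point of the sketch is that the facility-level power envelope, not the per-rack figure alone, determines whether a provider must re-architect: a few dense racks can be absorbed as long as the rest of the hall runs lighter.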

Cooling concerns

One of the main concerns surrounding high-density datacenters is the excess heat generated by the latest generation of high-performance servers. As per-rack densities rise, new cooling methods will necessarily become more widespread, and conversations about cooling alternatives are already taking place. Liquid cooling is one alternative that has drawn attention in the industry in recent years, although it remains far behind air cooling in terms of adoption.

In 451 Research's recent Voice of the Enterprise: Datacenters, Datacenter Infrastructure 2023 survey, 1,012 IT decision-makers were asked about infrastructure trends. Forty-three percent of respondents stated they use an air-cooling system in their facility and plan to continue using it for the next five years. Fifty-one percent indicated they use air-cooling systems but plan to move to liquid cooling within the next five years; of those, 31% plan to do so in the next 12 months, 17% in two to four years and 4% in five years. Only 3% of respondents stated they had already moved to liquid cooling. Among respondents whose organizations are considering or have already moved to liquid cooling, 34% favor immersion, 19% prefer direct-to-chip (cold plate) and 47% are interested in both. This is similar to what we have seen from datacenter providers globally, with all the major players experimenting with various liquid-cooling methods (including rear-door heat exchangers); however, providers seem to be leaning toward options other than immersion. This makes sense: immersion cooling requires more rethinking of current datacenter designs. That is not to say immersion has been ruled out; it simply seems that other methods may be adopted more widely, more quickly.

No matter the cooling approach, AI datacenters will need to support densities of more than 50 kW per rack across an entire facility, not just in isolated deployments.

Networking

AI relies on cross-traffic. Specifically, the work involves parallel processing, where resources including GPUs, CPUs and memory work independently, and the job must wait for the slowest of these elements. No wonder, then, that enterprises cite networking as the top bottleneck to AI performance. In our Voice of the Enterprise: AI & Machine Learning, Infrastructure 2023 survey, 45% of respondents said they needed higher-performance networking in order to improve the performance of their AI/machine-learning workloads. Networking got a stronger response than any other infrastructure resource, including accelerators in the cloud (37%), faster x86 servers (36%) and memory capacity (34%).
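The "wait for the slowest element" dynamic is worth making concrete, because it explains why networking tops the bottleneck list. In a synchronous parallel job, every step finishes only when the last worker does, so a single network-bound straggler paces the whole cluster. The figures below are illustrative, not survey data:

```python
# Toy illustration of the straggler effect in synchronous parallel work:
# a step completes only when the slowest worker finishes, so one
# network-bound GPU sets the pace for the entire cluster.

def step_time(worker_times_ms):
    """Wall-clock time of one synchronous step across all workers."""
    return max(worker_times_ms)

balanced = [10.0, 10.0, 10.0, 10.0]   # all four workers keep pace
straggler = [10.0, 10.0, 10.0, 25.0]  # one worker stalled on network I/O

print(step_time(balanced))   # 10.0
print(step_time(straggler))  # 25.0
```

Three fast workers buy nothing here: the step takes 25 ms instead of 10 ms, a 2.5x slowdown caused by one slow link, which is why higher-performance networking moves the needle more than faster servers or more memory.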

Most of the networking required is within a GPU cluster, over GPU interconnects such as NVIDIA's NVSwitch, and is not necessarily a datacenter-level concern, although customers will require dense, high-speed connections between racks. A central question for enterprises, or anyone setting up infrastructure to be consumed as a service, is whether to base that internal network on InfiniBand, which can operate at massive scale, or Ethernet, which is ubiquitous and less expensive. The key challenge with InfiniBand, and a concern for datacenter providers, is its limits on cable length. This becomes a problem because customers will require contiguous space for their GPU-based infrastructure to keep the cluster functional. This, in turn, necessitates adequate cooling, as mentioned above, since all of these high-density racks must sit in close proximity to one another.

That said, connectivity beyond the datacenter walls could be crucial if an AI inference task relies on data residing elsewhere. In the case of hyperscale tenants of leased datacenters, an enterprise customer might store data in a different public cloud, at an edge location or on-premises. This suggests datacenter operators will need a strong interconnection story that serves enterprises' hybrid and multi-cloud strategies.

Implications for datacenter management

As demand for higher-density racks increases and the methods to cool those racks evolve, the ability to properly manage the floor space within the datacenter, particularly monitoring for and responding to hotspots and greater load demands, becomes ever more important. As we envision the datacenter of the future, could AI be leveraged as part of the solution to the "problem" it is creating? AI is already being deployed in the industry to some extent to identify infrastructure issues inside a datacenter. Cloud providers Google and Huawei, for example, deploy datacenter infrastructure management, or DCIM, software that controls air cooling on the datacenter floor, identifying cold and hot spots in the facility. This helps maintain a uniform temperature across the facility and ensures that cooling units run more efficiently.
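The hot-and-cold-spot identification such tooling performs can be sketched simply. The sensor names, readings and thresholds below are hypothetical assumptions for illustration (the thresholds loosely echo the commonly cited 18-27°C recommended inlet range), not details of any vendor's DCIM product:

```python
# Hypothetical sketch of hotspot/coldspot classification a DCIM platform
# might perform. Sensor names, thresholds and readings are illustrative.

HOT_THRESHOLD_C = 27.0   # inlet air warmer than this risks equipment stress
COLD_THRESHOLD_C = 18.0  # colder than this suggests wasted cooling energy

def classify(readings):
    """Group sensor locations into hot spots, cold spots and normal zones."""
    result = {"hot": [], "cold": [], "normal": []}
    for location, temp_c in readings.items():
        if temp_c > HOT_THRESHOLD_C:
            result["hot"].append(location)
        elif temp_c < COLD_THRESHOLD_C:
            result["cold"].append(location)
        else:
            result["normal"].append(location)
    return result

readings = {"row1-rack3": 29.5, "row1-rack4": 24.0, "row2-rack1": 16.5}
print(classify(readings))
```

Today this classification typically feeds a dashboard for a human operator; the step the article anticipates is closing the loop, with the system adjusting cooling setpoints itself rather than merely flagging the anomaly.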

The path to greater adoption of AI in datacenter management faces some roadblocks. The first is that DCIM is often not used as intended: datacenter operators today either stitch together multiple DCIM platforms or forgo DCIM entirely, relying on spreadsheets and other old-school techniques. The second is that DCIM itself needs to evolve. Today, DCIM generates data while people manually adjust equipment; the next logical step for AI in datacenter management is to go beyond identifying issues and actually resolve them.

The benefits of incorporating AI into these processes are many. Besides streamlining internal processes, the technology could help datacenter providers achieve their sustainability goals by making power consumption more efficient. It could also reduce the number of employees needed to operate a datacenter, a welcome prospect given that many providers currently struggle with a shortage of skilled workers in the sector.

This article was published by S&P Global Market Intelligence and not by S&P Global Ratings, which is a separately managed division of S&P Global.

451 Research is part of S&P Global Market Intelligence. 
