
This is a thought leadership report issued by S&P Global. This report does not constitute a rating action, nor was it discussed by a rating committee.

Highlights

AI data center power demand in the US is growing at ~19% annually (global growth is 16%), with grid interconnection queues stretching six or more years in key markets.

AI infrastructure efficiency in models and hardware is improving fast. Google cut the energy cost of a Gemini prompt 33-fold in a single year. The paradox is that efficiency isn't reducing consumption; it is enabling much more of it.

Data centers can function as grid assets rather than pure loads. NVIDIA's DSX Flex Architecture has been shown to reduce power consumption by 25% during grid stress events without interrupting critical workloads.

Introduction

Power is becoming the binding constraint on AI infrastructure growth. Dominion Energy — the utility at the center of the US's Data Center Alley — now has 70 GW of pending connection requests, a figure that has tripled since early 2025 and continues to grow at 2–3 GW per month. Interconnection timelines in the most active markets have now stretched beyond six years. Supply and demand are moving further apart.

But the defining issue here is not AI’s energy consumption. It is whether AI changes the overall energy equation.

At its core, the math is straightforward: AI's net energy impact equals what it consumes minus what it helps save across the broader economy. What is not straightforward is whether those savings will materialize fast enough and in the right geographies to keep pace with demand. That gap, not gross consumption, is the defining challenge of the next decade. Exhibit 1 forecasts how much electricity, broken down by region, will be available to run data centers through 2030.
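The arithmetic can be made explicit. A minimal sketch in Python, using hypothetical figures purely for illustration (these are not S&P Global estimates):

```python
# Illustrative sketch of the net energy equation described above.
# All figures are hypothetical placeholders, not forecasts.

def net_energy_impact_twh(ai_consumption_twh: float,
                          enabled_savings_twh: float) -> float:
    """Net impact = what AI consumes minus what it helps save elsewhere.

    A positive result means net new demand on the grid.
    """
    return ai_consumption_twh - enabled_savings_twh

# Example: 400 TWh of AI load vs. 150 TWh of AI-enabled savings
print(net_energy_impact_twh(400.0, 150.0))  # 250.0 TWh of net new demand
```

The open question in the text is not the subtraction itself but whether the second term grows fast enough, and in the right regions, to matter.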

Efficiency gains within the AI ecosystem

AI models have become vastly more efficient. Research teams from Epoch AI, MIT FutureTech, MIT CSAIL and Northeastern University found that the compute resources needed to reach a given pretraining performance level have halved roughly every eight months since 2012. That rapid progress hasn't just come from advances in chips and overall hardware; smarter algorithms have played a large part.

The adoption of more resource-efficient machine-learning approaches such as mixture-of-experts architectures (which activate only the relevant model parameters for a given input), along with advanced AI optimization techniques like quantization, kernel fusion and speculative decoding, has sharply reduced how much energy each model output consumes. Google, for example, says its Gemini models used 33 times less energy per text prompt in May 2025 than they did a year earlier. These strides have mainly benefited inference workloads, which now make up the majority of AI compute use in enterprise settings.

Meanwhile, AI hardware efficiency is also improving exponentially. Semiconductor equipment vendor Applied Materials — whose deposition, etch and metrology systems are embedded in virtually every advanced logic fab — has revised its long-term performance-per-watt roadmap from a 1,000x target (set in 2018) to 10,000x by 2040, driven by materials engineering advances. If that trajectory holds, it represents a compounding efficiency gain at the materials layer that sits beneath the headline GPU benchmarks.

GPU generations were historically released on a two-year cycle, but since 2024, NVIDIA and AMD, pressured by the huge demand for AI infrastructure, have both moved toward a one-year update rhythm for their flagship data center architectures. NVIDIA claims that its new Vera Rubin platform, which began sampling to customers in February 2026, will deliver 10 times the performance per watt of the current-generation Blackwell platform, and can train mixture-of-experts models using only 25% of the GPUs required by Blackwell for the same outcome. For inference work, token costs could also drop by 10 times.

The problem is that a Rubin GPU draws up to 2.3 kW of power, which is roughly double that of Blackwell. A full NVL72 rack consumes 350–600 kW, requiring infrastructure upgrades that existing facilities weren't built to support.

Despite this, NVIDIA cites Blackwell system-level efficiencies of up to 50x performance per watt and 35x performance per dollar compared with the prior Hopper generation, achieved through holistic co-design of chips, interconnect and software, which enables more work within the same power and cost budget. The biggest energy savings come from system consolidation, though. For example, a 2,000-GPU Blackwell training system using only 4 MW can effectively replace a cluster of 8,000 Hopper GPUs consuming 15 MW.
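The consolidation figures quoted above imply the following arithmetic, assuming the two clusters deliver equivalent training throughput as NVIDIA claims:

```python
# Energy comparison for the Blackwell-vs-Hopper consolidation example.
# GPU counts and megawatt figures come from the text; the 30-day run
# length is a hypothetical assumption for illustration.
hopper_gpus, hopper_power_mw = 8_000, 15.0
blackwell_gpus, blackwell_power_mw = 2_000, 4.0

power_reduction = 1 - blackwell_power_mw / hopper_power_mw  # ~73%
gpu_reduction = 1 - blackwell_gpus / hopper_gpus            # 75%

# Energy saved over a hypothetical 30-day training run
hours = 30 * 24
saved_mwh = (hopper_power_mw - blackwell_power_mw) * hours

print(f"{power_reduction:.0%} less power, {gpu_reduction:.0%} fewer GPUs")
print(f"{saved_mwh:,.0f} MWh saved over a 30-day run")
```

At these ratios, the facility-level savings dwarf anything achievable through incremental cooling or distribution tweaks.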

Compared with traditional CPUs, GPUs are optimized for massive parallel processing and matrix operations, making them far more efficient for running AI workloads. They also use integrated high-bandwidth memory to avoid data bottlenecks. Specialized AI chips (such as Google's TPU and Amazon's Inferentia) are further optimized for specific algorithms and cloud stacks. These efficiencies can be used to reduce total energy consumption or to increase computational rack density. They generate less heat per calculation. And they enable system consolidation, saving on surrounding infrastructure such as cabling and power distribution.

However, the overall power requirement continues to rise with each new GPU generation. This is not new — GPU graphics cards for gaming do the same. While performance per watt rises, so does the absolute power draw required to meet the extreme demands of massive AI models. At the same time, demand for larger clusters (concentrated for training or distributed for inference) continues unabated and is forecast to continue doing so for at least the next few years.

GPU utilization rates still leave considerable room for improvement: they could rise from the current 30%-40% range toward the 70%-80% typical of CPU servers running virtualization software. Various orchestration, scheduling and cluster management technologies aimed at pushing utilization rates higher are now under development.
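Why utilization matters for energy: an idle accelerator still draws meaningful power, so the energy cost per unit of useful work falls sharply as utilization rises. A stylized sketch with hypothetical power figures (the 700 W active and 150 W idle numbers are illustrative, not vendor specifications):

```python
# Energy per unit of useful work at different GPU utilization rates.
# Power figures are hypothetical placeholders for illustration.

def energy_per_useful_hour_wh(utilization: float,
                              active_w: float = 700.0,
                              idle_w: float = 150.0) -> float:
    """Average energy consumed per hour of useful compute delivered.

    Idle draw is amortized over a shrinking share of useful output
    as utilization falls, so low utilization is doubly wasteful.
    """
    avg_draw_w = utilization * active_w + (1 - utilization) * idle_w
    return avg_draw_w / utilization  # Wh per useful compute-hour

low = energy_per_useful_hour_wh(0.35)   # today's typical range
high = energy_per_useful_hour_wh(0.75)  # virtualization-era target
print(f"{low:.0f} Wh vs. {high:.0f} Wh per useful compute-hour")
```

Under these assumptions, moving from 35% to 75% utilization cuts energy per unit of useful work by roughly a quarter, before any hardware upgrade.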

Data center power and cooling efficiencies on the rise

The shift toward higher-density rack architectures is driven by the rising power demands of AI accelerators rather than efficiency alone. Modern AI GPUs and networking equipment have pushed rack densities from 5–15 kW historically to more than 100 kW today, with roadmaps heading toward 300–500 kW. Removing this heat exceeds the limits of conventional air cooling, accelerating adoption of liquid-based solutions such as direct-to-chip cold plates and immersion cooling. These technologies provide better heat transfer, reduce fan energy and enable warmer-water operation with more efficient heat rejection.

When combined with optimized coolant distribution units and warm-water economization that reduces or eliminates compressor use, liquid cooling can significantly lower facility energy overhead. Heat reuse and thermal capture offer additional benefits, including district heating or industrial applications, although feasibility depends greatly on location.

At the same time, electrical system architecture is evolving rapidly to support the extreme power density and dynamic load profiles of AI clusters. A major shift is toward higher-voltage DC distribution (e.g., 400–800 VDC), which reduces conversion stages compared with traditional AC chains, lowering losses and copper requirements, and enabling megawatt-scale rack delivery.
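The benefit of fewer conversion stages compounds multiplicatively. A stylized comparison with hypothetical per-stage efficiencies (illustrative only; real chains vary widely by design):

```python
# Why fewer power conversion stages matter: losses compound at each step.
# Per-stage efficiency values are hypothetical assumptions for illustration.

def chain_efficiency(stage_efficiencies: list[float]) -> float:
    """Overall efficiency of a series of power conversion stages."""
    eff = 1.0
    for stage in stage_efficiencies:
        eff *= stage
    return eff

# Traditional AC chain: UPS, transformer, PSU rectifier, DC-DC steps
ac_chain = chain_efficiency([0.96, 0.98, 0.94, 0.97])
# Higher-voltage DC distribution: fewer stages between utility and rack
dc_chain = chain_efficiency([0.98, 0.975])

print(f"AC chain: {ac_chain:.1%} vs. HVDC chain: {dc_chain:.1%}")
```

Every point of conversion loss at a 500 kW rack is heat that the cooling plant must then remove, so distribution and cooling gains reinforce each other.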

Emerging solid-state transformer technologies further this approach by converting medium-voltage utility power directly to usable DC in compact, power-electronic platforms that support modular, scalable deployments and easier integration with on-site storage and renewables. Redundancy strategies are also becoming more workload-aware, moving away from heavily layered legacy topologies. In parallel, advanced power management and AI-assisted controls help manage fast GPU load transients, stabilize grid interaction and improve utilization of electrical infrastructure.

All of these technology advancements can collectively push power usage effectiveness (PUE) toward 1.1 or below in leading deployments, compared with legacy facilities typically stuck in the 1.5–2.0 range.
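PUE itself is a simple ratio of total facility power to IT load, which makes the gap between legacy and leading facilities easy to quantify (the load figures below are hypothetical):

```python
# Power usage effectiveness: total facility power divided by IT load.
# A PUE of 1.0 would mean zero overhead for cooling, conversion, etc.

def pue(total_facility_kw: float, it_load_kw: float) -> float:
    """Standard PUE ratio; lower is better, 1.0 is the theoretical floor."""
    return total_facility_kw / it_load_kw

# Legacy air-cooled facility: heavy cooling and conversion overhead
print(round(pue(total_facility_kw=1_600, it_load_kw=1_000), 2))  # 1.6

# Liquid-cooled facility with warm-water economization
print(round(pue(total_facility_kw=1_080, it_load_kw=1_000), 2))  # 1.08
```

For the same IT load, the difference between those two figures is hundreds of kilowatts of overhead per megawatt of compute.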

Ultimately, these efficiency gains reduce energy per computation rather than total energy consumption. AI workloads exhibit strong rebound effects: As compute becomes more efficient and cheaper per token or training run, demand expands through larger models, more inference usage and new applications.

Historical evidence from the worlds of computing and networking suggests that efficiency improvements tend to enable new growth rather than cap total demand. Consequently, infrastructure efficiency can meaningfully moderate the trajectory of overall power growth — potentially avoiding tens of percent of incremental capacity versus a counterfactual baseline — but is unlikely to fully offset it.

AI as a grid asset

The view that data centers will be nothing other than a net strain on the grid is starting to change. AI infrastructure could behave more as a grid-friendly asset and less as a large-load liability. This vision positions AI as a fast-responding resource that helps stabilize power systems overall. The notion shifted from a largely dismissed theory to a real-world proof of concept throughout 2025 and into early 2026 and was a major theme at CERAWeek. Hyperscalers have shown openness to playing their part and making the requisite behavior changes, and innovators like Emerald.AI are proposing solutions that blunt the usual trade-offs.

AI training workloads don't behave like traditional industrial processes. A steel mill or semiconductor plant can't just pause its operations without risking physical or financial loss. In contrast, large-scale AI training is computational and modular. Tasks can be throttled or redistributed across sites without losing progress, subject to important technical caveats such as limits on how far apart participating sites can be. Batch inference jobs can be delayed briefly, and fine-tuning runs can be rescheduled with minimal cost.

NVIDIA's Vera Rubin DSX reference architecture, introduced in early 2026, was the first major AI platform to formally integrate grid responsiveness. Its DSX Flex software allows data centers to adjust GPU power dynamically in line with real-time grid conditions. One demonstration showed a 256-GPU cluster cutting power use by 25% for three hours during a grid stress event without interrupting critical workloads. The coalition behind the "grid-responsive AI factory" announcement at CERAWeek 2026 suggests this concept is moving rapidly from pilot to deployment. Ultimately, a network of these facilities acts as a virtual power plant, stabilizing the grid by reducing peak demand and the need to fire up natural gas peaker plants.
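The control concept can be sketched simply. The snippet below is a hypothetical illustration of a grid-responsive power cap, not NVIDIA's actual DSX Flex implementation, whose internals are proprietary; the 25% curtailment figure mirrors the demonstration cited above:

```python
# Hypothetical sketch of a grid-responsive power cap for an AI cluster.
# This is an illustration of the concept, not the DSX Flex software itself.

def target_cluster_power_kw(baseline_kw: float, grid_stress: bool,
                            curtailment: float = 0.25) -> float:
    """Return the power target a cluster should run at right now.

    curtailment=0.25 mirrors the 25% reduction in the cited demo;
    workloads are throttled or rescheduled, not interrupted.
    """
    if grid_stress:
        return baseline_kw * (1 - curtailment)
    return baseline_kw

# A 256-GPU cluster at a hypothetical 300 kW baseline
print(target_cluster_power_kw(300.0, grid_stress=True))   # 225.0 kW
print(target_cluster_power_kw(300.0, grid_stress=False))  # 300.0 kW
```

Aggregated across dozens of facilities, caps like this are what turn a fleet of data centers into something a grid operator can dispatch.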

Even with a standardized signaling system like DSX Exchange, the regulatory and contractual side of utility work will remain a bottleneck. The industry is currently calling these facilities "flexible AI factories," but it could take a few years before support for this approach becomes a standard requirement for new large-scale builds.

AI is also advancing grid modernization well beyond the data center. According to 451 Research's Voice of the Enterprise: IoT, OT Perspective survey, most utilities expect AI to play a crucial role in areas like load forecasting (cited by 72% of respondents), virtual power plants (70%), grid monitoring (60%), microgrid segmentation (59%) and digital twin modeling (58%). (See Exhibit 2.)

The Electric Reliability Council of Texas (ERCOT), which manages 90% of the state's electric load, claims that improved renewable forecasting is already cutting both reserve requirements (mandated capacity margins for grid operators to ensure system reliability) and curtailment rates (the percentage of available renewable energy intentionally not used because it exceeds grid capacity or demand).

Dynamic line rating systems — which replace conservative static assumptions with real-time weather and conductor data — have demonstrated capacity increases of 10–40% or more on monitored lines in US and European deployments without adding a single watt of net new capacity.

Rebound risk

When machines become more efficient, they get cheaper to run. And when demand is highly elastic, lower costs spur so much additional use that total consumption rises. This is the Jevons paradox. When steam engines became more efficient, coal use didn't fall; it soared, because cheaper engines spread everywhere. The same pattern is showing up in AI. Google's 33-fold drop in energy use per Gemini prompt cut costs and made it easier for companies to scale usage. So far, there's no sign of compute usage slowing down.

The rebound effect doesn't make efficiency worthless, but it limits what it can solve. The real question now is whether AI can do more than consume power wisely — whether it can help change how energy is produced and managed from the start.

Looking forward

It’s still too early to know whether AI will ever "pay for itself" in energy terms — and framing the question that way risks missing the point. The more relevant test is directional: Is AI pushing the energy system toward greater efficiency and flexibility, or is it simply amplifying demand faster than the system can adapt?

Efficiency gains across models and hardware are arriving at a fast and furious pace. Early evidence points to a near-term future where data centers evolve from rigid, always-on loads into flexible grid participants. Hyperscalers will be on board with this approach if the technology proves effective; that shift could materially ease one of the biggest bottlenecks in power infrastructure today — interconnection capacity. By adopting flexible load arrangements, hyperscalers can potentially gain access to grids that would be otherwise capacity-constrained.

But there is a real risk that the industry is underestimating the counterforce. Lower costs are already accelerating adoption, and inference-heavy workloads are expanding into every corner of the economy. In that environment, efficiency doesn't reduce demand — it enables it. The historical pattern is clear, and AI is not exempt from it.

The next phase of market development will not be decided by model innovation alone. It will hinge on four variables: the pace of efficiency gains, the speed at which grid capacity can be expanded or made more flexible, the magnitude of the Jevons paradox rebound, and the ability of policy and market structures to keep up. Today, those systems are not moving in sync.

Ultimately, the tension between AI’s insatiable energy demand and its increasing efficiency will not be resolved by a simple net-zero equation. Instead, AI is acting as an accelerant, forcing a decade of grid modernization to happen in just a matter of years. The "AI energy problem" may never truly disappear, but the infrastructure built to solve it, including flexible factories, virtual power plants and AI-optimized distribution, will leave behind a more resilient and responsive energy system than the one we have today. The goal for the industry is no longer just to offset a footprint, but to lead the transition into an era where high-density compute and a stable grid are no longer mutually exclusive.