14 Nov, 2024

Energy, chips, quality data emerge as major bottlenecks for GenAI

The explosive interest in generative AI is reshaping industries, yet this growth is hampered by significant bottlenecks.

Enterprises across sectors are eager to invest in GenAI initiatives. For the past several quarters, AI has been a top spending priority among organizations, according to the Tech Demand Indicator, a measure of organizations' spending intent from S&P Global Market Intelligence 451 Research. Spending intent on AI stood at 57.70 in the third quarter — with a reading above 50 signaling expanding tech spending.

Despite organizations' willingness to invest, the momentum of GenAI's expansion is being slowed by three key constraints: chips, energy and quality data.

Struggling to chip in

NVIDIA Corp., a pivotal player in the AI chip market, is grappling with unprecedented demand. The company has already sold out its production capacity for 2025, highlighting the strain on supply chains.

Taiwan Semiconductor Manufacturing Co. Ltd., NVIDIA's primary manufacturer, faces production bottlenecks, particularly in advanced packaging processes.

"Front-end capacity (Taiwan Semiconductor Manufacturing Co.'s foundry) can be used for both smartphone and AI chips," Leping Huang, an analyst at Huatai Securities, told S&P Global Market Intelligence. "The bottleneck is on the back end, mainly in the advanced packaging process." Huang said that this is why back-end equipment manufacturers such as Disco Corp. and Advantest Corp. continue to experience strong demand.

Taiwan Semiconductor Manufacturing Co. has announced that it doubled its advanced packaging capacity this year and plans to double it again in 2025. However, CEO C.C. Wei cautioned that even this expansion might not suffice.

"We are putting a lot of effort to increase the capacity of the [chip-on-wafer-on-substrate]," the CEO said. Chip-on-wafer-on-substrate, or CoWoS, is an advanced packaging technology used to make high-performance computing and AI components. "Today's situation is that our customers' demand far exceeds our ability to supply. So, even though we work very hard and increase the capacity ... more than 2x as of this year compared with last year, and probably double again, but [it's] still not enough."
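As a rough illustration of what two consecutive doublings imply, the sketch below normalizes 2023 advanced packaging capacity to 1.0 and applies a multiplier of exactly 2 for each year. Both the baseline and the exact multipliers are assumptions for illustration; the company described this year's increase only as "more than 2x" and next year's doubling as probable.

```python
from math import prod

# Assumption: 2023 capacity is normalized to 1.0 (no absolute figure is given).
baseline_2023 = 1.0

# Assumption: each year's expansion is treated as exactly 2x.
yearly_multipliers = {2024: 2.0, 2025: 2.0}

# Cumulative capacity relative to the 2023 baseline after both expansions.
capacity_2025 = baseline_2023 * prod(yearly_multipliers.values())

print(f"2025 capacity relative to 2023: {capacity_2025:.0f}x")
```

Under these assumptions, capacity would reach about four times its 2023 level by the end of 2025, which the CEO indicated would still fall short of demand.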

Another bottleneck in the chip sector is high-bandwidth memory. High-bandwidth memories (HBMs) are produced by only three companies: SK hynix Inc. and Samsung Electronics Co. Ltd. from South Korea, and Micron Technology Inc. in the US. The emerging HBM market is projected to grow at double-digit rates in the coming years as the three suppliers expand their capacity.

SK hynix's HBM revenue has been minimal in recent years within its Dynamic RAM division, but it is expected to constitute the majority of revenue in five years, according to Visible Alpha estimates. SK hynix announced that it has already sold out its capacity for 2025 and is working to increase supply.

"It is ... true that our production capacity is facing limitations in meeting all of the increased demand in excess of our original plan," a spokesperson for SK hynix said on the latest earnings call.

Running out of power

Energy has frequently been identified as a major bottleneck in the development of GenAI. AI chips consume about 10x more energy than traditional datacenter chips. The new Blackwell NVIDIA chips are expected to consume two to three times that amount. Although Blackwell chips are more energy-efficient, they consume more power per datacenter rack due to their enhanced performance.

Training new GenAI models also requires chips to be located in a single datacenter or a cluster of datacenters, rather than scattered across many sites. As a result, certain markets have seen a surge in demand for energy. Dominion Energy Inc., which serves Northern Virginia, the largest datacenter market in the world, has seen significant load growth in its Virginia service territory.

"We continue to see strong datacenter growth in Virginia and have already connected 14 new datacenters year to date," Dominion Chair, President and CEO Robert Blue said Nov. 1 on an earnings call. Blue added that the company expects to connect 16 datacenters this year.

"In aggregate, we have datacenter demand of over 21 gigawatts, as of July 2024, which compares to about 16 gigawatts as of July 2023," Blue said.
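For a sense of scale, the year-over-year change implied by those two figures can be worked out directly. This is a minimal sketch that treats the quoted values as exactly 16 and 21 gigawatts, which is an assumption, since Blue described them as "about 16" and "over 21."

```python
def percent_change(old: float, new: float) -> float:
    """Return the percentage change from old to new."""
    return (new - old) / old * 100.0

# Assumption: the rounded figures from the earnings call are used as exact values.
demand_july_2023_gw = 16.0
demand_july_2024_gw = 21.0

growth_pct = percent_change(demand_july_2023_gw, demand_july_2024_gw)
print(f"Datacenter demand growth, July 2023 to July 2024: about {growth_pct:.0f}%")
```

On these rounded inputs, demand grew by roughly 31% in 12 months.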

Hyperscalers are rapidly forming partnerships with energy firms to secure new renewable energy contracts. According to S&P Global Commodity Insights, Amazon.com Inc. leads in renewable energy contracted capacity, followed by Meta Platforms Inc. and Alphabet Inc. Nuclear energy is experiencing a resurgence, as it is the only energy source that is both clean and stable.

However, nuclear energy is not anticipated to power AI workloads until at least the end of the decade, according to Dan Thompson, an analyst at 451 Research.

"More likely, the insatiable energy demand will be met by utilities adding additional natural gas generation. Natural gas is plentiful in the US, is less dirty than coal, but is unfortunately not emissions-free, which will have a negative impact on companies' sustainability efforts," Thompson said.

"The number of requests to connect gas-based electricity generation facilities has skyrocketed," he noted.

In June, S&P Global Commodity Insights raised its forecast for total US power growth between 2024 and 2030 to 2.1% from 1.2%. A significant portion of this growth is expected to come from datacenters, which are projected to consume double or triple their current energy levels by 2030. That growth depends on the energy grid avoiding significant issues.

"Unlike Europe, the US does not face an energy problem, but it does have an energy infrastructure issue," said S&P Global Ratings analyst Aneesh Prabhu.

Ratings analysts estimate that the US energy grid would require about $15 billion in capital expenditure for transmission to support growth through 2030. While this sum may not seem substantial, the problem generally cannot be resolved by financial investment alone.

"Expanding transmission infrastructure assets is a long-term planning process requiring permitting and siting, typically conducted at a measured pace. It necessitates regulatory approvals that often involve numerous filings and considerable time," Prabhu said.

The waiting time for grid connection ranges from three to five years at US regional transmission organizations. The quickest solution to this issue is to construct datacenters closer to energy generation facilities. While this approach could work for GenAI training, it may not be feasible for GenAI inference, which requires proximity to the end user.

Data is the new oil

The phrase "data is the new oil" has taken on a new dimension in the era of GenAI. Quality data is crucial for training foundation models, yet it is often unavailable, uncaptured, or unprepared for AI.

"Many enterprises possess vast amounts of documents, but they are neither tagged nor indexed," said Si Chen, head of strategy at data management platform Appen Ltd.

Even more challenging are organizations that fail to capture all internal data or have data scattered across various departments. "The abandonment rates for AI projects are significant," said Eric Hanselman, chief analyst at 451 Research. "The major bottleneck is the digital maturity of the organization."

A survey by 451 Research found that data quality was the primary bottleneck to AI adoption after budget constraints. Numerous companies have emerged to provide data management services for GenAI applications, assisting organizations in preparing data for model ingestion. When data is unavailable, companies often resort to synthetic data, though this approach has drawbacks.

"We often find synthetic data lacks common sense compared to human-generated data," Chen said, adding that models perform best when human-generated data is combined with synthetic data.

When organizations fail to capture data, increased investment in infrastructure, including cloud and on-premise solutions, is necessary. According to 451 Research's Tech Demand Indicator, cloud infrastructure spending is prioritized over AI technologies. IT infrastructure priorities rose significantly in the third quarter, suggesting that companies realized they needed to become more digitally mature before successfully applying GenAI technologies.