Research — Sept 2025

Familiar Signal, New Context: The Evolution of Earnings Call Sentiment Analysis from Lexicons to LLMs

By Mengmeng Ao, Frank Zhao, Ronen Feldman, Ilan Attar, Leonid Hatskin, and Benjamin Rozenfeld


Large language models (LLMs) have garnered widespread attention for their ability to understand natural language. Their application in equity investing, however, remains in its infancy owing to their novelty and cost. This study leverages LLM-extracted features from earnings call transcripts and transforms them into actionable stock selection signals. These features are significantly correlated with their traditional (rules-based) NLP counterparts — the lexicon-driven approach that flags and scores individual phrases as positive or negative — confirming that both approaches measure the same ‘ground truth’. The added computational cost and opacity of LLMs appear justified: a fine-tuned LLM-based sentiment strategy would have delivered twice the long-short return of the lexicon benchmark (8.4% versus 4.2% per year), with a notable advantage in recent years as mispricing opportunities have narrowed.
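The lexicon-driven baseline mentioned above can be sketched in a few lines: match each word in a transcript against positive and negative word lists and compute a net score. This is a minimal illustration only; the word lists below are tiny placeholders, not the actual lexicon used in the study.

```python
# Minimal sketch of a lexicon-based sentiment score: count matches
# against positive/negative word lists and return the net fraction.
# The word sets here are illustrative placeholders, not the study's lexicon.

POSITIVE = {"growth", "strong", "improved", "record", "exceeded"}
NEGATIVE = {"decline", "weak", "loss", "headwinds", "missed"}

def lexicon_sentiment(text: str) -> float:
    """Net sentiment: (positive hits - negative hits) / total hits."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    matched = pos + neg
    return (pos - neg) / matched if matched else 0.0

score = lexicon_sentiment("Revenue growth was strong despite macro headwinds.")
# 2 positive hits, 1 negative hit -> score = 1/3
```

A production lexicon approach would add phrase-level matching and negation handling, but the scoring idea is the same.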

Key findings in the US market since 2010 are:

  • LLM-based sentiment strategy doubles return: A long-short strategy utilizing an LLM-based sentiment signal yielded 8.4% per year, effectively doubling the performance of a lexicon-based benchmark, which stood at 4.2%.
  • Financial performance is (still) king: Sentiment tied to direct financial results produced the strongest excess long-short returns (6.4%), outpacing operations (4.5%), competition (3.7%) and macro factors (2.8%).
  • Importance is important: When the LLM flagged events surrounding financial performance as high importance, the sentiment signals delivered 6.4% per year in excess long-short returns — double the payoff from medium-importance ones (3.2%) and nearly four times that of low-importance ones (1.7%).
  • Recent performance remains robust: An LLM-based financial sentiment strategy has continued to deliver strong returns in recent years, while the lexicon-based benchmark has weakened, squeezed from both sides by mass buyside adoption (which increases price efficiency) and by corporates refining their language to game the algorithms.
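The long-short returns quoted above rest on a standard construction: rank stocks by the sentiment signal each period, go long the top bucket, short the bottom bucket, and measure the return spread. The sketch below illustrates that mechanic with synthetic data; the study's actual universe, weighting, and rebalancing rules are not specified here.

```python
# Hedged sketch of a long-short quantile spread: rank by signal, hold the
# top quantile long and the bottom quantile short, equal-weighted.
# Tickers and numbers below are synthetic, for illustration only.

def long_short_return(signals: dict[str, float],
                      returns: dict[str, float],
                      quantile: float = 0.2) -> float:
    """Equal-weighted top-minus-bottom quantile return spread."""
    ranked = sorted(signals, key=signals.get, reverse=True)
    n = max(1, int(len(ranked) * quantile))
    long_leg = sum(returns[t] for t in ranked[:n]) / n
    short_leg = sum(returns[t] for t in ranked[-n:]) / n
    return long_leg - short_leg

signals = {"A": 0.9, "B": 0.4, "C": 0.1, "D": -0.2, "E": -0.8}
rets = {"A": 0.03, "B": 0.01, "C": 0.0, "D": -0.01, "E": -0.02}
spread = long_short_return(signals, rets)  # long A, short E -> 0.05
```

Compounding this spread across periods gives the annualized figures reported above.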

Explore the data used to conduct this research:

ProntoNLP Transcript Analytics

The ProntoNLP Transcript Analytics dataset is a powerful new tool for analyzing Machine-Readable Transcripts data. ProntoNLP efficiently processes and extracts insights around performance metrics, generating key indicators for future corporate performance across essential KPIs. By using Natural Language Processing and an optimized Large Language Model, ProntoNLP accurately recognizes and scores important phrases, helping you easily identify valuable information and separate it from the noise.

Machine-Readable Transcripts

Machine-Readable Transcripts is a global data set that was added to S&P Global Market Intelligence's Xpressfeed product in September 2017. Among its key features, the data set captures the different segmentations of earnings calls in the following ways:

  • Sections (e.g., prepared remarks, sell-side analyst questions, responses to questions)
  • Speaker types (e.g., executives, sell-side analysts, shareholders)
  • Professionals (e.g., Tim Cook), where individual professional identifiers serve as a unique key connecting the transcripts data set with S&P Global Market Intelligence's Professionals and Sell-side Estimates data sets.

Textual Data Analytics

TDA was launched in October 2019 and is productized from Quantitative Research & Solutions' previous publications, with an advanced suite of analytics and metrics added in May 2022. It is an off-the-shelf NLP solution tailored to our Machine-Readable Transcripts that outputs over 800 predictive and descriptive analytics for equity investing and various data science workflows. The analytics can be accessed via SQL, Snowflake, or Databricks Workbench.
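Since the analytics are delivered as SQL-queryable tables, a typical workflow is a simple filter-and-rank query. The sketch below uses an in-memory SQLite stand-in; the table and column names are hypothetical placeholders, not the real TDA schema.

```python
import sqlite3

# Hypothetical illustration of SQL access to transcript analytics.
# Table and column names are placeholders, not the actual TDA schema.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE transcript_analytics (
    company_id TEXT, call_date TEXT, net_sentiment REAL)""")
conn.executemany(
    "INSERT INTO transcript_analytics VALUES (?, ?, ?)",
    [("AAA", "2025-01-30", 0.42), ("BBB", "2025-01-28", 0.31),
     ("CCC", "2025-02-03", -0.15)],
)

# Pull calls with positive net sentiment, most positive first.
rows = conn.execute(
    "SELECT company_id, net_sentiment FROM transcript_analytics "
    "WHERE net_sentiment > 0 ORDER BY net_sentiment DESC"
).fetchall()
# rows -> [("AAA", 0.42), ("BBB", 0.31)]
```

The same query shape runs unchanged on Snowflake or Databricks once pointed at the delivered tables.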

Want to replicate this research?

GO DEEPER IN OUR WEBINAR

Learn how to apply LLM transcript analysis in practice.