Date: 2022-04-20

Context of Strategy

Following S&P Global Market Intelligence’s release of the new Machine Readable Filings (MRF) dataset in 2020, the Quantamental Research team published several papers[1][2]. These papers examined the strong relationship between the historical consistency of commonly shared sections of Forms 10-K and 10-Q and companies’ performance, as measured by their historical stock prices.

As an additional use case, this exercise showcases an investment management workflow in which sentiment analysis techniques are applied to the 10-K and 10-Q filings in a manner analogous to the already available Textual Data Analytics product. To that end, a simple historical strategy simulation is run with the goal of generating excess returns relative to the S&P 1500 and S&P 500 benchmarks between 2007 and June 2021.

Machine Readable Filings

The dataset comprises textual data sourced from publicly traded companies’ filings with the U.S. Securities and Exchange Commission (SEC), starting in 2006. Its value proposition is to facilitate the parsing of textual data by providing:

Metadata specific to each filing.

Structured raw text, classified by the sections of the underlying companies’ filings.

Omission of non-textual information, such as tables and images.

Text cleansing to enable NLP use cases. Key examples of these steps are maintaining consistency across reporting periods regardless of structure changes, reclassifying section headings for standardization purposes, and removing irrelevant elements such as table headers and page numbers.

A Simple Investment Strategy

What does it entail to leverage Machine Readable Filings (MRF) in an investment management workflow?

Natural Language Processing[3] (NLP) is leveraged to transform and quantify the MRF dataset, and the Loughran-McDonald Master Dictionary is chosen to support this paper’s sentiment-based analysis[4]. It should be noted that this is also the dictionary used in the Textual Data Analytics (TDA) product, which is built on the Machine Readable Transcripts dataset.

In this example, the positive-to-negative metric, one of many within the bag-of-words category of sentiment-based signals, is selected. Though the suite of tools[5] used to perform these analyses allows the efficacy of several dozen metrics to be assessed, for simplicity this document focuses solely on the positive-to-negative metric[6]. As an initial step, the metric’s coverage is reviewed for the available common document sections, measured as quarterly averages per year with the S&P 1500 as the universe[7].
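As a rough illustration of a bag-of-words score of this kind, the sketch below counts dictionary hits in a section's text and takes their ratio. It rests on several assumptions not taken from this document: the metric is assumed to be the count of positive words divided by the count of negative words, the two word sets are tiny placeholders rather than the Loughran-McDonald Master Dictionary, and the function names are hypothetical.

```python
import re
from collections import Counter

# Placeholder word lists for illustration only; these are NOT the
# Loughran-McDonald Master Dictionary categories.
POSITIVE = {"gain", "improve", "strong", "benefit"}
NEGATIVE = {"loss", "decline", "adverse", "impairment"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def positive_to_negative(text, positive=POSITIVE, negative=NEGATIVE):
    """Assumed definition: positive word count / negative word count.

    Returns None when the text contains no negative words, since the
    ratio is undefined in that case.
    """
    counts = Counter(tokenize(text))
    pos = sum(counts[word] for word in positive)
    neg = sum(counts[word] for word in negative)
    return pos / neg if neg else None


score = positive_to_negative("Strong gain despite one loss")
```

In practice the word lists would be loaded from the chosen dictionary, and the handling of sections with no negative words (returning None here) is a design choice that affects coverage.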

The Management Discussion & Analysis (MD&A) and Risk Factors (RF) sections are selected, while the Quantitative and Qualitative Disclosures about Market Risk (Q&QDMR) section is excluded given its limited coverage.

Using the positive-to-negative scores for the aforementioned sections, logic is built to calculate each company’s quarter-over-quarter percentage change in the positive-to-negative score. Moreover, to avoid look-ahead bias, the associated sentiment scores are taken as of companies’ filing dates.
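The quarter-over-quarter change and the as-of-filing-date rule might be sketched in pandas as follows. The column names (`company_id`, `period_end`, `filing_date`, `pos_to_neg`), the sample values, and both helper functions are hypothetical and are not taken from the MRF schema; the point is only that a score becomes usable once its filing date has passed.

```python
import pandas as pd

# Hypothetical per-filing scores for one section (e.g. MD&A).
scores = pd.DataFrame({
    "company_id": ["A", "A", "A", "B", "B"],
    "period_end": pd.to_datetime(
        ["2020-03-31", "2020-06-30", "2020-09-30", "2020-03-31", "2020-06-30"]),
    "filing_date": pd.to_datetime(
        ["2020-05-05", "2020-08-04", "2020-11-03", "2020-05-10", "2020-08-12"]),
    "pos_to_neg": [1.0, 1.2, 0.9, 2.0, 2.5],
})


def qoq_change(scores):
    """Add each company's percentage change versus its previous filing."""
    out = scores.sort_values(["company_id", "filing_date"]).copy()
    previous = out.groupby("company_id")["pos_to_neg"].shift(1)
    out["qoq_change"] = out["pos_to_neg"] / previous - 1
    return out


def signals_as_of(changes, rebalance_date):
    """Latest change per company among filings made on or before the date.

    Filings published after the rebalance date are ignored, which is what
    prevents look-ahead bias.
    """
    known = changes[changes["filing_date"] <= rebalance_date]
    latest = known.sort_values("filing_date").groupby("company_id").tail(1)
    return latest.set_index("company_id")["qoq_change"]


changes = qoq_change(scores)
signal = signals_as_of(changes, pd.Timestamp("2020-09-30"))
```

At the 2020-09-30 rebalance in this toy data, company A's third-quarter score is not yet usable because it was filed in November, so its signal is still the change observed in the August filing.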

Next, utilizing the positive-to-negative signals, the following strategy is simulated on both the S&P 1500 and S&P 500 for each section (MD&A and RF) separately:

Long only

Quarterly rebalancing, i.e., rebalancing at the end of each calendar quarter

Equally weighted holdings

Invest in companies in the top quintile based on the quarter-over-quarter positive-to-negative score change

A company must remain in the top quintile to be held in subsequent quarter(s)

Portfolio Results

The simulated strategies for the period 2007 through June 2021 outperform the S&P 1500 benchmark:

The Management Discussion & Analysis section returns 461.57%, compared with the S&P 1500 total return of 311.79%. The strategy’s Sharpe ratio of 0.51 is also higher than the benchmark’s 0.49 [8][9].

The Risk Factors section returns 564.41%, compared with the S&P 1500 total return of 311.79%. Its Sharpe ratio is 0.52, compared with the benchmark’s 0.49 [10][11].

[Figure: MRF Pos-To-Neg Strategy vs S&P 1500]

The proposed strategy also outperforms the S&P 500 on total return, though with smaller excess returns, when it is run on the quintile groups derived from the S&P 500 universe. In this case:

The Management Discussion & Analysis section returns 314.30%, compared with the S&P 500 total return of 309.48%, with a Sharpe ratio of 0.43 compared with the benchmark’s 0.496 [12][13].

The Risk Factors section returns 316.56%, compared with the S&P 500 total return of 309.48%, with a Sharpe ratio of 0.40 compared with the benchmark’s 0.496 [14][15].

[Figure: MRF Pos-To-Neg Strategy vs S&P 500]

Summary

This document described the steps to leverage the Machine Readable Filings dataset to create a signal that guides a simple investment strategy. The use case presented was purposefully simple and narrow in focus, but it showed that excess returns could be generated.

Readers may attempt to replicate this strategy. The implementation addressing the NLP requirements to produce the metric used in this document is available upon request. Alternatively, sample, customizable open-source code may be explored as a starting point. This code is available via the S&P Global Marketplace Workbench.

Overall, the ever-expanding NLP dictionaries and techniques, coupled with MRF’s large number of investment management applications, make this dataset a valuable proposition to practitioners, either as the sole driver of a strategy or as a complement to existing strategies.

References

Coverage for U.S. MRF using S&P 1500 yearly averages of quarterly filings as of fiscal end-period for 10-K and 10-Q common sections

Data

S&P Global Market Intelligence Machine Readable Filings

S&P Global Compustat®

Reference Documentation

S&P Global Market Intelligence TDA User Guide

S&P Global Market Intelligence Machine Readable Filings

Python Sample Code

Refer to Workbench Notebook
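For readers without access to the notebook, the selection rules of the strategy described earlier (long only, equal weights, top quintile by quarter-over-quarter score change, quarterly rebalancing) can be approximated with the short sketch below. This is not the Workbench notebook's code: the function names, the toy signal and return values, and the convention that each row of `forward_returns` holds the return earned over the quarter following the rebalance are all assumptions made for illustration.

```python
import pandas as pd


def top_quintile_weights(signal):
    """Equal long-only weights for names in the top fifth of the signal."""
    signal = signal.dropna()
    if signal.empty:
        return pd.Series(dtype=float)
    n_top = max(1, len(signal) // 5)
    ranks = signal.rank(method="first", ascending=False)
    selected = ranks[ranks <= n_top].index
    return pd.Series(1.0 / len(selected), index=selected)


def simulate(signals, forward_returns):
    """Re-select the top quintile at every rebalance date.

    Because membership is recomputed each quarter, a company is held in
    the next quarter only if it is still in the top quintile.
    """
    results = {}
    for quarter in signals.index:
        weights = top_quintile_weights(signals.loc[quarter])
        returns = forward_returns.loc[quarter].reindex(weights.index)
        results[quarter] = float((weights * returns).sum())
    return pd.Series(results)


# Toy inputs: rows are rebalance quarters, columns are hypothetical companies.
quarters = ["2020Q1", "2020Q2"]
signals = pd.DataFrame(
    {"A": [0.1, 0.6], "B": [0.5, 0.1], "C": [-0.2, 0.0],
     "D": [0.0, -0.1], "E": [0.3, 0.2]},
    index=quarters,
)
forward_returns = pd.DataFrame(
    {"A": [0.01, -0.02], "B": [0.04, 0.03], "C": [0.0, 0.01],
     "D": [0.02, 0.0], "E": [-0.01, 0.02]},
    index=quarters,
)

quarterly = simulate(signals, forward_returns)
total_return = (1 + quarterly).prod() - 1
```

The sketch ignores transaction costs, index membership changes, and missing returns, all of which a fuller simulation would need to handle.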

Tools

Alteryx Designer

Python

S&P Global Market Intelligence ClariFI®

S&P Global Marketplace Workbench

