Blog — 8 Jul, 2021

Effectively Summarizing Text in Legislative Bills Using FiscalNote

Much like 10-Ks and 10-Qs, government bills are getting longer and longer. The average length of a law passed in the 1947-48 Congress was 2.5 pages. Today, the average length is 17.9 pages. A famous example is the 5,593-page pandemic relief package passed in late December 2020, the longest bill ever passed by Congress.

Members of Congress rely on specialists to read these bills. These are typically lawyers who can understand the technicalities of the language, and who then provide the Member of Congress with a summary and insight ahead of a vote.

So how is everyone else able to read through these texts, summarize them, and understand their impact? Automated summarization techniques can help make this possible. For example, these techniques could assist a fundamental analyst who wants to stay ahead of upcoming bills affecting the sector they cover.

For demonstration purposes, we use FiscalNote to analyze an industry that experienced above-average growth in the number of bills discussed in Congress. FiscalNote’s proprietary technology aggregates laws and regulations from Congress and federal agencies in real time to help assess potential risks within an industry or portfolio. Data is tagged at the fourth level of the GICS industry classification (the sub-industry level), and we used this level of granularity throughout the analysis.

Congressional terms are two years long and Presidential terms are four years, so we calculated the growth in the number of bills by comparing congressional session 113, which started on January 3, 2013, to session 115, which started on January 3, 2017. Both sessions started in the year of a President's inauguration, which we assume makes them suitable for comparison. The overall increase in the number of bills between the two sessions was 19.9%, counting bills in all stages of the legislative process. The top growth rates by industry are shown below in Table 1.

Table 1: Growth in the Number of Bills by Industry

| Industry | Increase, session 113 to 115 |
| --- | --- |
| Fertilizers and Agricultural Chemicals | 64.6% |
| Airport Services | 61.1% |
| Multi-Sector Holdings | 60.0% |
| Internet and Direct Marketing Retail | 54.1% |
| Diversified Chemicals | 47.0% |
| Personal Products | 43.6% |
| Investment Banking and Brokerage | 42.6% |
| Drug Retail | 41.7% |
| Asset Management and Custody Banks | 38.8% |
| Health Care Distributors | 38.5% |

Source: FiscalNote, as of June 14, 2021.
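The growth figures in Table 1 are simple percentage changes in bill counts between the two sessions. A minimal sketch of that calculation follows; the function name and the counts are hypothetical and are not FiscalNote data.

```python
def bill_growth(counts_base, counts_later):
    """Percentage growth in bill counts per industry, largest first."""
    growth = {}
    for industry, base in counts_base.items():
        later = counts_later.get(industry)
        if later is None or base == 0:
            continue  # growth is undefined without a non-zero baseline
        growth[industry] = round(100.0 * (later - base) / base, 1)
    return dict(sorted(growth.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical counts for illustration only (not FiscalNote data).
session_113 = {"Industry A": 80, "Industry B": 50}
session_115 = {"Industry A": 100, "Industry B": 75}
ranked = bill_growth(session_113, session_115)
```

Sorting by growth rate alone can favor industries with very few bills, which is why the absolute counts are worth checking alongside the percentages.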

The first three industries had a lower absolute number of bills over the time period in question, so we concentrated on the fourth: Internet and Direct Marketing Retail. For congressional session 117, we have 99 bills for this industry in the FiscalNote database at various stages of discussion. FiscalNote provides an industry score along with the GICS tagging, which captures the relevance to the industry, and bill S.1274 had the highest score.

There are two main types of automatically generated summaries: extractive and abstractive. Extractive techniques take the existing sentences of the document, rank them by importance using one of several methods (e.g., topic, TF-IDF, and Latent Semantic Analysis (LSA)), and retain only the highest-ranking ones, which are supposed to represent the essence of the text. Abstractive techniques, on the other hand, read through the text and then create a summary from scratch, without being constrained to use only existing words or sentences. Here we use an extractive method based on cosine similarity, which measures how similar two texts are irrespective of their length.

Looking at S.1274, the full text of the bill is here (2,169 words), and a summary compiled by a human is here (82 words). This is what the piece of legislation is about:

The Remote and Mobile Worker Relief Act would:

  • Provide uniformity in state and local income tax assessment and withholding obligations for employees that may travel on behalf of their employer to work in a state that is different from the state where they reside.
  • Preserve the status quo by allowing employers to continue to assign the income for employees temporarily working remotely due to the COVID-19 pandemic at their pre-pandemic work location versus the location where they may have been working remotely.

A simple extractive summarizing method gets quite close. This method splits the whole text into sentences, calculates the similarity between sentences, ranks them, and picks the top one. While there are clearly some differences, and the language in the code-generated summary remains somewhat complex, an analyst could use summaries such as this one to go through records in a fraction of the time it would take to read every bill in full. This summary contains 94 words:

Limitations on withholding and taxation of employee income (a) In general No part of the wages or other remuneration earned by an employee who is a resident of a taxing jurisdiction and performs employment duties in more than one taxing jurisdiction shall be subject to income tax in any taxing jurisdiction other than; (1) the taxing jurisdiction of the employee's residence; and (2) any taxing jurisdiction within which the employee is present and performing employment duties for more than 30 days during the calendar year in which the wages or other remuneration is earned.
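The method described above (split into sentences, score pairwise similarity, rank, keep the top one) can be sketched in a few lines. This is a minimal illustration, not the code used for the summary above: the post does not say how sentences were vectorized or ranked, so the bag-of-words vectors, the "sum of similarities to all other sentences" score, and the function names are all assumptions. The sentence splitter is also naive and would need adapting to legislative text, which relies heavily on semicolons and enumerated clauses.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def vectorize(sentence):
    """Bag-of-words term counts, lowercased (an assumed representation)."""
    return Counter(re.findall(r"[a-z']+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def summarize(text, top_n=1):
    """Return the top_n most central sentences, in document order."""
    sentences = split_sentences(text)
    vectors = [vectorize(s) for s in sentences]
    # Score each sentence by its total similarity to every other sentence.
    scores = [
        sum(cosine(v, other) for j, other in enumerate(vectors) if j != i)
        for i, v in enumerate(vectors)
    ]
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:top_n])]


sample = ("Wages are taxed. Wages are taxed where employees work. "
          "Employees work remotely.")
top_sentence = summarize(sample)
```

In the toy sample, the middle sentence shares vocabulary with both of its neighbors, so it scores highest and is returned as the one-sentence summary.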

A quick and easy count of the words in the summary (using Excel®), after removing stop words,[1] gives us a clear picture of a bill that wants to tackle employee taxation and looks specifically at jurisdiction, as shown in Table 2.

Table 2: Word Count

| Word | Count |
| --- | --- |
| jurisdiction | 5 |
| taxing | 5 |
| employee | 3 |
| duties | 2 |
| employment | 2 |
| income | 2 |
| remuneration | 2 |
| wages | 2 |

For illustrative purposes only.
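The same kind of count can be scripted instead of done in a spreadsheet. The sketch below is only an illustration: the stop-word list is a tiny hand-picked assumption (real lists are much longer), the tokenizer is a simple regular expression, and the excerpt is adapted from the generated summary above, so the counts will not match Table 2.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list; an assumption, not the list used for Table 2.
STOP_WORDS = {"a", "an", "and", "any", "by", "in", "is", "of", "or",
              "the", "which", "who", "within"}


def count_words(text, stop_words=STOP_WORDS):
    """Count lowercased word tokens, skipping stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in stop_words)


excerpt = ("the taxing jurisdiction of the employee's residence; and "
           "any taxing jurisdiction within which the employee is present")
counts = count_words(excerpt)
top_words = counts.most_common(2)
```

Note that this tokenizer treats "employee's" and "employee" as different words; whether to merge such forms is a design choice that changes the totals.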

There are a few factors that an analyst would want to look at to assess the content of the bill:

  • How many Congress members are supporting the bill and what parties do they represent?
  • Is the bill targeting a single specific industry, or more?
  • Is this a simple resolution, or a bill introduced in the House/Senate?
  • How do the words compare to other bills proposed by the same body, in the same congressional session?

These issues are addressed below.

What Type of Bill is This?

The type of the bill is embedded in its name, so the prefix at the beginning of the bill name is enough to identify it. In this case, the "S" in S.1274 shows that the bill was introduced in the Senate. Generally speaking, bills introduced in the House or Senate carry more weight than simple resolutions.
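Extracting the prefix can be automated with a small lookup. The sketch below assumes bill names follow the standard congressional prefixes (S., H.R., S.Res., and so on) followed by a number; the function name and the regular expression are illustrative, and FiscalNote may expose the bill type differently.

```python
import re

# Standard prefixes for U.S. congressional measures, normalized to
# uppercase letters with dots and spaces removed.
BILL_TYPES = {
    "S": "Senate bill",
    "HR": "House bill",
    "SRES": "Senate simple resolution",
    "HRES": "House simple resolution",
    "SJRES": "Senate joint resolution",
    "HJRES": "House joint resolution",
    "SCONRES": "Senate concurrent resolution",
    "HCONRES": "House concurrent resolution",
}


def bill_type(name):
    """Map a bill name such as 'S.1274' or 'H.Res. 5' to its type."""
    match = re.match(r"\s*([A-Za-z.\s]+?)\s*(\d+)\s*$", name)
    if not match:
        raise ValueError(f"Unrecognized bill name: {name!r}")
    prefix = re.sub(r"[.\s]", "", match.group(1)).upper()
    return BILL_TYPES.get(prefix, "unknown")


sample_type = bill_type("S.1274")
```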

Who is Supporting the Bill?

Our sample bill is associated with only two Senators, its main sponsors: Senator John Thune (R) from South Dakota and Senator Sherrod Brown (D) from Ohio. On average, each bill in congressional session 117 has approximately 10 sponsors. This already tells us that, while the bipartisan nature of the bill might make it more likely to pass, the low number of sponsors makes it much less likely to pass and, ultimately, to have an impact on the targeted industries.

How Many Industries are Impacted?

This bill impacts 94 different sub-industries, with an average score of 0.86. When the data is aggregated to the first level of GICS, only one out of the 11 sectors is potentially not affected by the bill (Information Technology). To put these numbers into perspective, bills introduced in congressional session 117 have an average of 7.45 GICS codes tagged to each one, with an average industry score of 0.65.

With these numbers, it is clear that this bill is not uniquely targeting the Internet and Direct Marketing Retail industry, and looks to touch a large number of other industries. This is quite common for bills that consider issues such as taxation and minimum wages.
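The roll-up to the first GICS level relies on the structure of the codes: a sub-industry has an 8-digit code whose first two digits identify the sector. A minimal sketch of that aggregation follows. The input layout (pairs of sub-industry code and industry score) and the scores are assumptions for illustration and do not reflect the actual FiscalNote schema or data.

```python
# The 11 GICS sectors, keyed by the first two digits of the GICS code.
GICS_SECTORS = {
    "10": "Energy",
    "15": "Materials",
    "20": "Industrials",
    "25": "Consumer Discretionary",
    "30": "Consumer Staples",
    "35": "Health Care",
    "40": "Financials",
    "45": "Information Technology",
    "50": "Communication Services",
    "55": "Utilities",
    "60": "Real Estate",
}


def sector_coverage(tags):
    """Summarize the (sub_industry_code, score) pairs tagged to one bill.

    Returns the number of tags, the average score, and the sectors
    that none of the tags fall into.
    """
    tags = list(tags)
    covered = {str(code)[:2] for code, _ in tags}
    missing = sorted(GICS_SECTORS[c] for c in GICS_SECTORS.keys() - covered)
    avg_score = sum(score for _, score in tags) / len(tags) if tags else 0.0
    return len(tags), avg_score, missing


# Hypothetical tags and scores, for illustration only.
example_tags = [("25502020", 0.9), ("40203020", 0.8), ("10102010", 0.7)]
n_tags, avg_score, untouched = sector_coverage(example_tags)
```

Applied to the full set of tags on a bill, the `untouched` list would show which sectors the bill does not reach.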

How Long is this Bill?

This bill has 13,393 characters compared to an average of 13,237 for all bills in this congressional session. Generally speaking, lengthier bills have a higher probability of becoming law, as they usually have larger bipartisan sponsorship and often combine smaller bills that couldn't get enough support as standalone pieces of legislation.

All this information paints a clear picture: We're looking at a bill of average length, with a lower than average number of sponsors and a higher than average number of industries being impacted. Based on the automated summary of the bill, we can see the bill is dealing with employee taxation and jurisdiction.

Using the FiscalNote dataset can help save analysts hours of work and provide a framework to quickly scan through many legislative proposals.




[1] Stop words are words which are filtered out before or after processing of natural language data or text.
