


Credit Analysis
A Perspective On Machine Learning In Credit Risk


Co-Authored by Danny Haydon

Aug. 20 2018 — The application of Machine Learning (ML) has seen major advances in the recent past, driven by a number of industry forces that have revolutionized the use of these techniques in the risk management sphere and beyond. In this primer we cover the key transformational drivers behind these high adoption rates, some of the techniques themselves, and how to assess their utility within credit risk.


Firstly, data in general has expanded along several dimensions: size, velocity and variety. Simultaneously, the ability to record, store, combine and then process large datasets from many disparate sources has improved wholesale. This is not limited to traditional sources; alternative data has also grown, fueling the need to extract information value from these sources. A side effect of this data expansion, however, is an elevated level of data pollution that must be contended with: noisy, conflicting and difficult-to-link datasets.
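As a hypothetical illustration of "difficult to link" data, consider the same obligor appearing under differently formatted names in two sources. The sketch below normalizes names so the records can be joined; the suffix list is an illustrative assumption, not an exhaustive rule set.

```python
import re

def normalize_name(name):
    """Lowercase, strip punctuation and common legal suffixes for linking."""
    name = re.sub(r"[^a-z0-9 ]", "", name.lower())
    for suffix in (" incorporated", " inc", " corp", " llc"):
        if name.endswith(suffix):
            name = name[: -len(suffix)]
    return name.strip()

key_a = normalize_name("ACME Corp.")
key_b = normalize_name("Acme Corp")
# both normalize to "acme", so the two records can be joined on this key
```

Real-world record linkage also has to handle misspellings and conflicting attributes, but even this simple normalization step resolves a large class of formatting mismatches.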

Secondly, easy access to enhanced computational efficiency, through hardware that can run specialized operations at large scale and through coding-language enhancements that have moved toward functional programming, has transformed the game in terms of integrating Machine Learning techniques. Languages such as R have become hubs for numerical computing using functional programming, leveraging a lengthy history of providing interfaces to numerical computing libraries. Supervised and unsupervised algorithms allow data scientists to process these datasets into actionable insights with relative ease and to run their code on cheaply available hardware.

Thirdly, reproducible research and analysis has been widely adopted by the data science community. This is a set of principles for quantitative, data-science-driven analysis under which the data and code that lead to a decision or conclusion can be replicated in an efficient and clear way.
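A minimal sketch of one such principle: record the random seed alongside the code, so that any sampling step in the analysis can be replayed exactly. The obligor IDs below are illustrative placeholders.

```python
import random

def sample_obligors(ids, n, seed):
    """Draw a random sample that is fully determined by the recorded seed."""
    rng = random.Random(seed)
    return rng.sample(ids, n)

ids = list(range(1000))
first_run = sample_obligors(ids, 10, seed=2018)
second_run = sample_obligors(ids, 10, seed=2018)
# first_run == second_run: the sampling step replays identically
```

The same discipline extends to versioning the data snapshot and the code itself, so a reviewer can rerun the full analysis and reach the same conclusion.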

Finally, the pervasiveness of open-source libraries, packages and toolkits has opened doors for the community to contribute via teams of specialists, sharing code bases and packaging them into simple, modular functions.

ML Techniques in Risk and Considerations in Their Application

The typical phases of applying ML within a Risk context include the following pipeline:

Fig 1: Generalized Machine Learning Pipeline
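As a hypothetical sketch of such a pipeline (the stage names and data fields below are illustrative assumptions, since the figure itself is not reproduced here), the phases can be composed as plain functions applied in sequence:

```python
def clean(records):
    """Drop polluted records with missing fields (data preparation stage)."""
    return [r for r in records if None not in r.values()]

def engineer(records):
    """Add a derived leverage ratio (feature engineering stage)."""
    for r in records:
        r["leverage"] = r["debt"] / r["assets"]
    return records

def run_pipeline(records, stages):
    """Pass the data through each stage in order."""
    for stage in stages:
        records = stage(records)
    return records

loans = [
    {"debt": 50.0, "assets": 200.0},
    {"debt": None, "assets": 120.0},  # polluted record, removed by clean()
]
scored = run_pipeline(loans, [clean, engineer])
```

A production pipeline would add model training, validation and scoring stages in the same composable style, which keeps each phase individually testable.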

Assessing which ML techniques to use and when is an important step that needs to be done thoughtfully with the target context in mind. There is no prescriptive method that is purely tied to a particular class of algorithms; the risk context always needs to be kept in mind in order to assess the tradeoffs.

A simple example to consider is the variance-bias tradeoff. Variance reflects the instability of the model with respect to the data: if small changes to the data result in big changes to the model, the technique has high variance. Bias reflects how far the model's fitted pattern departs from the true underlying pattern. See Fig 2 for a simple example of this.

Fig 2: Demonstration of model fit comparison visualization

In the above figure we see that Random Forest exhibits low bias but high variance on this dataset, while Quadratic Regression exhibits low variance but high bias. Nonlinear regression, in this trivial example with an ex-ante known data-generating process, achieves both low bias and low variance and provides an appropriate fit. In the real world, however, finding the sweet spot between over-fitting and under-fitting is less trivial and requires an appropriate definition of model selection criteria and exploration of different levels of model complexity. The key takeaway is that none of these techniques is categorically wrong; it depends on what tradeoffs we must make to get as close to low bias and low variance as possible. We need the model to adapt as the real world changes, to contend with polluted information with minimal supervision, and to remain as transparent as possible. These are competing objectives and all need to be accounted for within the applied risk domain.
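The tradeoff can be reproduced in a toy experiment. Below, polynomial fits of increasing degree stand in for models of increasing complexity (an assumption for illustration; numpy is assumed available), fit to noisy data from a known quadratic process:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data with a known quadratic data-generating process plus noise.
x = np.linspace(-3, 3, 60)
y = x**2 + rng.normal(scale=1.0, size=x.size)

# Interleaved train/test split so both halves cover the same range.
x_tr, y_tr = x[::2], y[::2]
x_te, y_te = x[1::2], y[1::2]

def fit_and_score(degree):
    """Least-squares polynomial fit; returns (train MSE, test MSE)."""
    coefs = np.polyfit(x_tr, y_tr, degree)
    mse = lambda xs, ys: float(np.mean((np.polyval(coefs, xs) - ys) ** 2))
    return mse(x_tr, y_tr), mse(x_te, y_te)

underfit = fit_and_score(1)  # high bias: a line cannot capture the curvature
good = fit_and_score(2)      # matches the true process: low bias, low variance
overfit = fit_and_score(9)   # low training error, but it chases the noise
```

The degree-1 fit has a large test error (bias), while the degree-9 fit beats the degree-2 fit on training data only; out of sample, the model matched to the true complexity generalizes best.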

Within a risk scoring context a simple example of being able to communicate to the business the supervision and complexity tradeoff is shown below.

Fig 3: Supervision and Complexity Trade-offs

Here we see that given the characteristics of the dataset, there is a trade-off between coupling the model with the data and the level of transparency of the ultimate model.

Another application of ML in credit risk is within sentiment analysis. A generalized sentiment analysis pipeline is provided below:

Fig 4: Generalized Sentiment Analysis Pipeline

Sentiment analysis methods can generally be split into deterministic models that rely on a dictionary (bag of words) and neural network models that typically involve a deep learning exercise. The task can be further divided into 'classification', where, given a target variable, a sentiment polarity label is assigned to an article as a whole, and 'attribution', where polarity is assigned to the segments within an article that are actually relevant and would impact the target variable.
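The dictionary-based approach can be sketched in a few lines. The word lists below are illustrative placeholders, not a real credit-risk lexicon:

```python
# Toy bag-of-words sentiment classifier: count dictionary hits per article.
POSITIVE = {"upgrade", "growth", "strong", "beat", "improved"}
NEGATIVE = {"default", "downgrade", "loss", "weak", "missed"}

def sentiment_polarity(text):
    """Assign a polarity label to a whole article by counting word matches."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

label = sentiment_polarity("Strong quarter as earnings beat expectations")
# label == "positive": two positive hits, no negative hits
```

This is the fully deterministic, transparent end of the spectrum; neural models trade that transparency for the ability to capture context, negation and word order.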

Fig 5: Usage in Sentiment Analysis

Once again we see considerable tradeoffs between supervision and complexity. Depending on the risk context, any of these techniques may be applicable.

We have covered the key drivers of ML adoption within a credit risk context and shown a few simple examples of its uses. It is important to consider the tradeoffs, which depend largely on the final application. ML offers a complementary class of techniques, but it is not a panacea for every use case within credit risk. Ultimately, being able to communicate to the business audience what value these techniques add and why they are being used in a given context is of critical importance.

Moody Hadi
Senior Director – Innovation & Product Research
Risk Services
S&P Global Market Intelligence

Danny Haydon
Head of Relationship Management, Americas
Risk Services
S&P Global Market Intelligence

All figures are for illustrative purposes only. Source: S&P Global Market Intelligence as of July 2018. Content including credit-related and other analyses are statements of opinion as of the date they are expressed and are not statements of fact, investment recommendations or investment advice. S&P Global Market Intelligence and its affiliates assume no obligation to update the content following publication in any form or format.

The authors would like to express their thanks to Max Kuhn and Jonathan Regenstein from R Studio who provided their expertise and input into the article contents. R Studio is not affiliated with S&P Global or its divisions.

Learn more about Market Intelligence
Request Demo

Technology, Media & Telecom
Can ComScore Break Nielsen's Near-Monopoly On Ratings?


The following post comes from Kagan, a research group within S&P Global Market Intelligence.

To learn more about our TMT (Technology, Media & Telecommunications) products and/or research, please request a demo.

Sep. 17 2018 — Advertising agencies are becoming increasingly frustrated with the inability of Nielsen Holdings PLC's Nielsen Media Research to convince the major media companies to embrace its new cross-platform measurement system, called Total Audience Measurement. This creates a huge opportunity for comScore Inc., formerly Rentrak.

ComScore is trying to reinvent itself after an accounting scandal led to its delisting in 2017; it was relisted June 1. The company's stock has fallen from $65 per share intraday on Aug. 17, 2015, to close at just $18.06 per share on Sept. 6. It currently has a total enterprise value of less than $1.2 billion, paltry in comparison to Nielsen Holdings' $12.9 billion.

In April, comScore named Bryan Wiener, previously executive chairman of Dentsu's digital media agency 360i LLC, as its CEO. On Sept. 5, the company announced that it hired Sarah Hofstetter to serve as president and head up commercial strategy, including sales and marketing.

Wiener and Hofstetter have worked together for two decades, most recently at 360i, where Hofstetter was CEO and chairwoman. The two executives' deep ties to the advertising community may be just what is needed to bring a competing cross-platform measurement system to the broadcast and cable network industries.

Cable network ad revenue grew for decades before stumbling, albeit modestly, during the last recession. More recently, despite a booming economy the cable network ad industry has faltered, in part due to cord cutting and cord shaving but also because current ratings do not include all of online viewing and out-of-home viewing.

Currently, the ratings only include online viewing within a three-day period, which includes the exact same commercial load as linear. Many media companies do not believe that online viewers will tolerate the huge ad load that exists on linear TV and do not include the same commercial pods that appear on linear TV when serving up the shows online.

Although negotiations between Nielsen Media Research and the major media companies have been going on for some time, many in the industry are tired of the delays in adopting a new system and are looking at alternate ways to measure viewing.


Technology, Media & Telecom
Most TV Everywhere Viewing Is Live TV In The Home


The following post comes from Kagan, a research group within S&P Global Market Intelligence. To learn more about our TMT (Technology, Media & Telecommunications) products and/or research, please request a demo.

Summary: Subscribers to telco operators were more likely to indicate they streamed TV Everywhere content compared to cable and DBS subscribers.

Sep. 17 2018 — Streaming live TV Everywhere to a mobile device inside the home is the TV Everywhere activity performed most often, cited by 52% of multichannel TV respondents, according to data from Kagan’s MediaCensus online consumer survey.

While 58% of respondents surveyed in multichannel homes viewed TV Everywhere in the last three months, just 46% did so out of their home. Click here for the full Kagan report.

Viewing live TV inside the home was not only the TV Everywhere activity performed by the most respondents; it was also the most frequently performed.

Subscribers to telco operators were more likely to indicate they streamed TV Everywhere content compared to cable and DBS subscribers. Likelihood also varies by operator, with AT&T U-verse (64%) the highest and WOW! (42%) the lowest among operator subscribers surveyed.

Younger subscribers, especially Millennials, were more likely to stream TV Everywhere content compared to older subscribers.

For more information about the terms of access to the raw data underlying this survey, please contact

Data presented in this article is from the MediaCensus survey conducted in February 2018. The online survey included 20,035 U.S. internet adults matched by age and gender to the U.S. Census, with additional respondents subscribing to the top multichannel video operators in the U.S. The survey results have a margin of error of +/-0.7 ppts at the 95% confidence level. Generational segments are as follows: Gen Z: 18-20, Millennials: 21-37, Gen X: 38-52, Boomers/Seniors: 53+.

Consumer Insights is a regular feature from Kagan, a group within S&P Global Market Intelligence's TMT offering, providing exclusive research and commentary.


Technology, Media & Telecom
Consumer Insights Online Video User Overview


49% of survey respondents use more than one SVOD service.

Sep. 14 2018 — Data from Kagan’s U.S. online consumer surveys shows that 23% of respondents exclusively use one service, while almost half (49%) use more than one SVOD service. The service which is most commonly used exclusively is Netflix, while users of smaller services almost always use at least one other service.

Netflix is so universally used that it is both the most exclusively used service and the service most often used in conjunction with another service. In terms of demographics, Netflix users are very similar to the general population compared to smaller services that tend to have a younger user base.

With the exception of Netflix, most respondents indicated they have never subscribed to the other top services: Hulu, Amazon Prime Video and HBO NOW. Among those who indicated they dropped one of the top services, price was a principal reason for dropping, although content-specific reasons differed by service. Content is one of the most defining characteristics of online streaming services, which can be seen in what is viewed and most enjoyed on each service. In large part, users of Netflix, Hulu and Amazon Prime Video most enjoy the content each service is known for.

A broader overview of this data was presented in a recent webcast.

Data presented in this blog is from U.S. Consumer Insights surveys conducted in September 2017 and March 2018. The online survey included 2,526 (2017) and 2,523 (2018) U.S. internet adults matched by age and gender to the U.S. Census. The survey results have a margin of error of +/-1.9 ppts at the 95% confidence level.


Capital Markets
Public Companies Going Private

Sep. 14 2018 — The recent tweet from Elon Musk has understandably made big news, but it is worth pointing out that the appetite for taking public companies private has been a key area of activity this year. S&P Global Market Intelligence’s data shows that 2018YTD deal value already stands at 39% of the full-2017 figure, at €17.8bn across 32 completed deals globally. The closed going-private deal count is at a healthy 49% of full-2017 numbers.

In terms of the most popular sectors for going-private deals since 2013, Information Technology has been leading the pack with €108.9bn of aggregate deal value recorded across 104 deals, while Consumer Discretionary* trends a distant second with €49.7bn of total deal value.

The top target location for going-private deals is the US; interestingly, China comes in second, with the UK following. The three regions have seen total deal value of €218.8bn over 2013 through 2018YTD. The popularity of these locations is further supported by the fact that after going private, targets' average EBITDA values have increased compared to when those companies were public. US-based going-private targets grew their EBITDA by an average of 56% since leaving the public market, while China- and UK-located companies grew EBITDA by 10% and 38%, respectively. Overall, the going-private moves proved successful for ex-public companies globally within the 2013-2018YTD time frame, with average Net Income growing by 58% while EBITDA grew by a smaller but still attractive 29%.

In terms of the deal pipeline, 18 going-private deals have been announced globally since Jan. 1, 2018; once completed, they would add €25.8bn of aggregate deal value to the €17.8bn already closed.

The following was originally published on Angel News on August 16, 2018: Public companies going private, S&P Global comment
