We talk a lot about the theoretical aspects of ML/AI, but this episode brings in Giorgio Baldassarri and Kieran Shand from S&P Global's Credit Modeling and DMS teams to talk with host Eric Hanselman about the practical aspects of Machine Learning (ML) implementations. They are tackling a complex problem in tracking credit risk and building ML models to identify higher-risk credit instruments. It's a situation where the data can be incomplete and, in a regulated industry, model explainability is key.
Transcript provided by Kensho.
Eric Hanselman
Welcome to Next In Tech, an S&P Global Market Intelligence podcast where the world of emerging tech lives. I'm your host, Eric Hanselman, Principal Research Analyst for the 451 Research arm of S&P Global Market Intelligence. Today, we're going to be picking up from some of the work we've been talking about around data sets and data delivery and moving into some actual applications. I've got two members from our credit risk team who are going to go into some details about how they're using data and analytics in credit risk capabilities.
We've got Giorgio Baldassarri, the Group Manager for our quantitative modeling team; and Kieran Shand, an application specialist with our data management solutions group. Welcome to both of you.
Giorgio Baldassarri
Thank you. Hi.
Kieran Shand
Thank you.
Eric Hanselman
It's great to have you on the podcast, especially to go dig into real applications, the idea that you're actually starting to put a lot of these capabilities to work and really how they've been progressing. So, before we really dive into the data aspects, I wanted to first start by looking at really what the problem is that you're trying to address in credit risk.
Giorgio Baldassarri
Yes, Eric. I think that this is a very relevant question, pretty important to address also from the point of view of our customers and users. So the major problem is really about trying to understand whether a counterparty will be able to service or to repay its debt. In simple terms, it's just that.
Eric Hanselman
So when you're looking to really assess that level of risk, clearly, there are a lot of different factors that come into play in terms of how you address the problem. So how are you looking at building models to address this? And what kinds of analytics and technologies are you putting in place?
Giorgio Baldassarri
There are several aspects that we need to take into account when we are building statistical models to get an understanding of the ability of a counterparty, for example a company, to repay its debt. In particular, we use statistical techniques to try to establish a statistical link between the historical behavior of companies and certain characteristics from a fundamentals standpoint. So we look at company financials, for example, as a good measure of the ability of companies to repay their debt.
The challenge is that sometimes these data are not coherent or consistent, and the coverage is limited. Private companies, for example, do not always report their financials regularly, and when they do report them, the financials are not always complete. So one of the challenges in building statistical models is collecting all these data sets and then making sense of them, taking into account the fact that sometimes there is not enough coverage, and ultimately using these statistical models to generate an automated way of assessing the credit risk of these companies.
One advantage, obviously, is that one does not need to spend too much time analyzing company financials in order to get an understanding of a company's ability to repay its debts. At the same time, it's a challenge because credit risk assessments done by risk analysts, especially in banks, often involve a very in-depth analysis of the credit profile of these companies, an understanding that goes well beyond a purely quantitative assessment of credit risk. This is really something that we try to address, to simplify the workflow of our users, who may not have enough resources and time to spend analyzing every individual company that they are dealing with.
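To make the idea concrete, here is a minimal, purely illustrative sketch of the kind of automated assessment Giorgio describes: sparse private-company financials filled in with a simple median imputation, then pushed through a logistic link to produce a probability of default. All company names, ratios and weights are invented for illustration; S&P's actual Credit Analytics methodology is proprietary and far richer than this.

```python
# Hypothetical sketch: scoring credit risk from incomplete company
# financials. Companies, ratios and coefficients are invented; this is
# NOT S&P's CreditModel methodology, just the general shape of the idea.
import math

# Toy fundamentals for private companies; None marks unreported fields.
companies = {
    "Alpha Ltd": {"leverage": 0.8, "coverage": 1.2, "margin": None},
    "Beta GmbH": {"leverage": 0.3, "coverage": 4.5, "margin": 0.15},
    "Gamma SpA": {"leverage": None, "coverage": 2.0, "margin": 0.05},
}

def impute_medians(data):
    """Fill missing ratios with the cross-sectional median, one crude
    but common way to handle sparse private-company filings."""
    fields = {f for row in data.values() for f in row}
    medians = {}
    for f in fields:
        vals = sorted(v for row in data.values() if (v := row[f]) is not None)
        medians[f] = vals[len(vals) // 2]
    return {
        name: {f: (row[f] if row[f] is not None else medians[f]) for f in fields}
        for name, row in data.items()
    }

def probability_of_default(row, weights, bias):
    """Logistic link between fundamentals and default probability."""
    z = bias + sum(weights[f] * row[f] for f in weights)
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative weights: higher leverage raises PD; coverage and margin lower it.
weights = {"leverage": 3.0, "coverage": -0.8, "margin": -4.0}
filled = impute_medians(companies)
for name, row in filled.items():
    print(f"{name}: PD = {probability_of_default(row, weights, -1.0):.1%}")
```

In a real model the weights would be estimated from historical default data rather than asserted, and imputation would be far more careful, but the workflow is the same: fill the coverage gaps, then score every counterparty automatically.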
Eric Hanselman
So it's really a matter of navigating a certain amount of uncertainty and filling in some of the gaps where data is not available.
Giorgio Baldassarri
Absolutely. And I think one other aspect is related to the type of signals that we get out of these statistical models. What I mean is that as we build our quantitative models, the challenge is that it's not always easy to understand why the output of a statistical model comes out the way it does; in other words, why we expect a certain company to have a certain credit risk profile based on its financials.
So another challenge is interpretability: how to interpret and justify the outputs of the model in light of the underlying data that are used. There is this idea that models are very quantitative, statistical, automated. That is really cool, but sometimes they are perceived as black boxes. So the challenge is what type of analytical features to add side by side with the outputs of the statistical models to enable a user to make sense of the output of the model itself.
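One simple version of the "side by side" analytics Giorgio describes is to decompose a score into per-factor contributions against a peer benchmark. The sketch below does this for a linear score; every number and factor name is invented for illustration, and this is only one of many explainability techniques, not a description of S&P's actual tooling.

```python
# Hypothetical explainability sketch: attribute the gap between a
# company's linear risk score and its peer-group benchmark to each
# input factor. Figures and factor names are invented for illustration.

weights = {"leverage": 3.0, "coverage": -0.8, "margin": -4.0}
company = {"leverage": 0.9, "coverage": 1.1, "margin": 0.04}
peer_median = {"leverage": 0.4, "coverage": 3.2, "margin": 0.12}

def contributions(company, benchmark, weights):
    """Attribute the score gap vs. the peer benchmark to each factor.
    For a linear model this decomposition is exact, which is one reason
    simple models are easier to defend than opaque black boxes."""
    return {f: weights[f] * (company[f] - benchmark[f]) for f in weights}

contrib = contributions(company, peer_median, weights)
for factor, delta in sorted(contrib.items(), key=lambda kv: -abs(kv[1])):
    direction = "raises" if delta > 0 else "lowers"
    print(f"{factor}: {direction} the risk score by {delta:+.2f} vs. peers")
```

Because the contributions sum exactly to the overall score gap, a user can see, factor by factor, why this company looks riskier or safer than companies with similar characteristics.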
Eric Hanselman
Well, presumably, that's an area in which regulators want to know how the process actually took place -- the explainability of modeling technologies.
Giorgio Baldassarri
Absolutely. There is this element, Eric, that you rightfully mentioned: there is, in fact, a requirement, especially for financial institutions, to be able to justify the outputs of their models, particularly when they are using internal models or models provided by third-party vendors like our own company. The important aspect is this transparency, which cannot be achieved just by showing numbers but by equipping users with tools that allow them, for example, to benchmark the company they are looking at against other companies with similar characteristics, drawn from a large database of companies such as the one our company offers.
Eric Hanselman
So interesting aspects in terms of when you're actually building the model: once you get it fully trained, you then have to look into how it got trained, what the model learned and how you're going to manage it. But pulling the model apart is not exactly an easy thing to do, simply because there is a level of opacity in terms of what the model's internal function is.
Giorgio Baldassarri
Yes, Eric. The aspect you mentioned about the opacity of these models is very, very relevant. Another one is about how to bring models to the next level. What I mean by this is that we try to build a statistical model that assesses the credit risk of companies as of a certain date based on certain financial characteristics.
And we have several models that my team and I have been building over the past few years as part of a product within S&P Global Market Intelligence called Credit Analytics. Some of these models try to establish a statistical relationship between the financials of a company and its S&P Global rating, the rating that is provided by another subsidiary of S&P Global. The challenge is not just to estimate these ratings in a quantitative manner but also to add value for our customers by going to the next level, that is, to try to generate early signals of a potential future deterioration of the credit profile of these companies. And this is why we recently engaged in an exercise with Kieran, who has been analyzing a set of data provided by S&P Global Ratings, the actual ratings, to see if there is a way to, in essence, generate early warning signals of a future credit risk deterioration.
And Kieran, probably you can talk a bit through some of these things as well.
Kieran Shand
Yes. No, thank you, Giorgio. So we have a model within the Credit Analytics suite called the CreditModel. In practice, what the CreditModel is attempting to do is, in many ways, mimic the methodology of a formal rating, of course, in a numeric way. So of course, there's no qualitative input. But as much as we can do, we attempt to mimic the rating methodology and use that within Credit Analytics to produce probability of default percentages and also rating outputs.
However, when we work with the CreditModel, a huge percentage of outputs fall within 2 notches of the like-for-like rating -- 88%, in fact. But when we move away from that, there will always be scenarios where the CreditModel output is not in line with the formal rating.
When we speak with clients, the first question is why. Why -- if you have a model which is attempting to, in many ways, mimic what the rating is, why do we have scenarios where it's significantly out, say, 3 notches or above or below? And that's a very difficult question for us to tackle because the immediate assumption is either that the input of the data is not good or the model is perhaps not reliable. In either scenario, there's perhaps work that we need to do.
So Giorgio and I took on this question and decided to dig into it a little bit. What we were trying to understand is whether there are any insights into those scenarios where the rating and the CreditModel output differ by 3 notches or more. What we found is that where the difference is that wide, the probability that the rating will transition within 1 year is circa 80%. So it's extremely high.
This is where we can take the group of CreditModel outputs which are in that band and use it to our advantage. Because if the probability that the rating will transition within 1 year is quite high, then we know that where there is a significant difference between the rating and the CreditModel, these are companies which are highly suitable for surveillance purposes, because we believe that they could transition in the near-term future.
So take, say, the workflow of an asset manager who has a portfolio that they are managing. Part of their responsibility, part of their mandate, is often to stay within the bounds of their investable universe, which is often defined by credit ratings. So if the portfolio manager can only invest in investment-grade ratings and above, they have a significant reason to have a good surveillance process on those investments which could drop out of the investment-grade range. And if they do drop out, then, of course, they're going to have to rebalance their portfolio.
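The surveillance screen Kieran describes can be sketched very simply: map rating symbols to numeric notches and flag any name where the model-implied score sits 3 or more notches away from the current rating. The notch scale below follows the standard rating ladder, but the portfolio names, ratings and model outputs are invented, and the 3-notch threshold is just the figure quoted in the conversation.

```python
# Hypothetical surveillance screen: flag names where the model-implied
# score differs from the current rating by 3+ notches. Portfolio data
# is invented for illustration.

# Simplified notch scale (investment grade is BBB- and above).
SCALE = ["AAA", "AA+", "AA", "AA-", "A+", "A", "A-",
         "BBB+", "BBB", "BBB-", "BB+", "BB", "BB-", "B+", "B", "B-"]
NOTCH = {sym: i for i, sym in enumerate(SCALE)}

# (name, current rating, model-implied score)
portfolio = [
    ("Alpha Ltd", "BBB", "BB"),     # model 3 notches below the rating
    ("Beta GmbH", "A", "A-"),       # within tolerance
    ("Gamma SpA", "BBB-", "BBB+"),  # model 2 notches above the rating
]

def surveillance_list(portfolio, threshold=3):
    """Return names whose model score differs from the rating by
    `threshold` notches or more -- candidates for closer monitoring,
    since historically such wide gaps often preceded transitions."""
    return [
        name for name, rating, model_score in portfolio
        if abs(NOTCH[model_score] - NOTCH[rating]) >= threshold
    ]

print(surveillance_list(portfolio))
```

For the portfolio manager's workflow, the output is simply a watchlist: the names most likely to migrate, and therefore the ones worth reviewing before a drop below investment grade forces a rebalance.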
Eric Hanselman
And so that's a situation, in fact, where they've got a strong motivation to be on the lookout for anything that's potentially going to change, just because of the effort required to rebuild the portfolio at that point.
Kieran Shand
And it's a difficult thing to do. It's a very difficult challenge to know when a rating might transition. What we have here is that where we find a difference of 3 notches or more between the rating and the CreditModel score, there is a significant chance that the rating will transition. So that was really the big piece of work that Giorgio and I looked at.
Eric Hanselman
That's fascinating, because what you're really identifying, when you're contrasting the model to the analysts' rating, is that it sounds like there are signals the model is picking up that are driving that shift. And I don't know, is that a pointer to something that might potentially be integrated into the ratings model down the road? And again, that's a process that's fairly tightly controlled.
Giorgio Baldassarri
I think this is a very relevant question. I don't think that the rating analysts will ever be using our statistical models per se. There is a kind of Chinese wall, let's call it that, between our side of the business, Market Intelligence, and the Ratings side, for regulatory reasons, obviously.
The ratings analysts do have their own versions of statistical models that they use themselves to get an idea of when they may need to review the credit rating that they have assigned to a certain company. So probably, what we are picking up here with our own statistical models, which have been developed independently from the Ratings side, is potentially part of the process that they also follow on their side. But I would like to underline what Kieran was mentioning before.
The statistical analysis that he has performed is not trying to say that we are able to anticipate or predict when and if a rating by an S&P Global rating analyst will change. It's more of a statistical analysis based on historical data that points to early signals of potential deterioration, based on what we have seen historically. We are not claiming, obviously, that we will ever be able to predict what a rating analyst in reality will do.
Eric Hanselman
Once again, AI and ML techniques don't provide a crystal ball -- how unfortunate. But to your point, it's that interesting idea that you've been able to at least identify probabilities for a ratings change, something that can help surveil that particular potential risk, hopefully catch it a little earlier in the process and save everyone a significant amount of effort. So when you're putting this together, what do you think organizations should plan for, based on your experiences, when they're looking at technologies like this?
Kieran Shand
Yes. When I speak with clients out in the market, often the first big hurdle is the delivery of data. Of course, different clients will have different technical requirements for the data. Some will need API access, whereas others will prefer a database-based system. Fortunately, S&P has quite a few different ways to get the data to clients at this point, whether that be cloud-based databases or traditional on-premises databases.
But we also have a new delivery platform called Workbench, and this is where we did our piece of work. Workbench is a notebook-based environment where users can work in a series of different languages within the tool itself, be that Python, Scala or SQL. So you can get the data into your Workbench with a query, and you can also do analysis pieces on top of that.
So if you prefer to do your analysis pieces in Python, like I do, you can do it all in one place, which is really a good thing, because often what I find with clients is that it takes several steps to get to the analysis piece. First, they often have to build a database, then integrate the data that we're sending over, write a SQL query to extract exactly what they need and, finally, export the data and send it up to a notebook environment. So there are several steps along the way.
With Workbench, the data is already there. Workbench, in fact, sits on top of the database, if you will. So the data is there, ready to be queried and ready for analysis pieces straightaway.
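The pattern Kieran describes, SQL extraction and Python analysis living in one notebook with no export/import hop, can be sketched with an in-memory SQLite table standing in for the hosted database. Workbench's actual interfaces and schemas are not shown in this conversation, so the table, columns and figures below are entirely illustrative.

```python
# Loose stand-in for the Workbench pattern: the data already sits in a
# queryable store, so SQL and Python analysis happen in one place.
# SQLite plays the role of the hosted database; everything here is
# illustrative, not Workbench's actual API or schema.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (name TEXT, rating TEXT, model_pd REAL)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?, ?)",
    [("Alpha Ltd", "BB", 0.042), ("Beta GmbH", "A-", 0.004),
     ("Gamma SpA", "BBB+", 0.009)],
)

# Step 1: SQL does the extraction...
rows = conn.execute(
    "SELECT name, model_pd FROM scores WHERE model_pd > ? "
    "ORDER BY model_pd DESC",
    (0.005,),
).fetchall()

# Step 2: ...and Python does the analysis, with no export/import hop.
watchlist = [name for name, pd in rows]
avg_pd = sum(pd for _, pd in rows) / len(rows)
print(f"watchlist: {watchlist}, average PD {avg_pd:.1%}")
```

The point is not the specific query but the absence of the intermediate steps Kieran lists: no database build-out, no data integration, no export to a separate notebook environment.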
Eric Hanselman
That's one of the things that we find to be a critical part of making modeling work more effectively. It's one of the things that we see in our 451 Voice of the Enterprise data: the data access and prep pieces are such a big part of any effort, and access to data is one of the greatest challenges and biggest stumbling blocks for organizations. Justine Iverson was on just a couple of episodes ago, talking about some of those challenges and some of the things that her team has been doing to put this in place.
But I think the thing that's really interesting about this aspect, and the point you made, is that before you can get to what Giorgio was talking about, being able to derive signals from a lot of the data, there's so much legwork that has to happen beforehand. And we're now getting to a point where there are tools and platforms and capabilities to take out a lot of that upfront heavy lifting, to bring it all together to the point where you've actually got a notebook with data and you can start to do real analysis.
Kieran Shand
Yes, I would completely agree with that. It's almost lost time, having to transform data, build out a database and insert the data into the database. These are several steps which take a long time before you can even write your query to extract what you want and then send the data up to a notebook environment where you can perform analysis.
Those are steps which ultimately take a lot of time and resources and money at the end of the day. That's really where Workbench comes into it, because it is such a flexible environment where the data is already sitting there, ready to be queried and ready to be worked with in multiple languages. So it's a great addition to the delivery suite of tools that S&P has.
Eric Hanselman
Well, this is one of those things that Next In Tech listeners have heard me say over and over again. The most important power that we've got in transforming technology is the power of abstraction: the ability to work with higher-level capabilities and not have to do all of the ditch-digging and upfront work to put these things together, when we can leverage them and put them to work at those much higher levels of abstraction and usability.
Kieran Shand
Definitely agree with that, yes.
Eric Hanselman
Well, this has been great. Thank you both. I appreciate the perspectives on this. We've talked a lot about the data, the capabilities and how we put them together, but here you've got real-world examples of what you can actually do and some of the challenges faced in putting all of this together. So many thanks.
Giorgio Baldassarri
Thank you, Eric, for having me here.
Kieran Shand
Thanks, Eric. Really enjoyed the conversation, and thanks, Giorgio, as well.
Eric Hanselman
And that is it for this episode of Next In Tech. Thanks to our audience for staying with us. And I hope you'll join us for our next episode where we're going to be talking about the Women in Technology study that's just recently been published. I hope that you'll join us then because there is always something Next in Tech.
No content (including ratings, credit-related analyses and data, valuations, model, software or other application or output therefrom) or any part thereof (Content) may be modified, reverse engineered, reproduced or distributed in any form by any means, or stored in a database or retrieval system, without the prior written permission of Standard & Poor's Financial Services LLC or its affiliates (collectively, S&P).