With all of the furor surrounding generative AI, a more fundamental understanding of the technology can provide better insights. Peter Licursi and Chris Tanner of Kensho return to look at the history and technology on which all of the AI hype rests with host Eric Hanselman. Language models have been with us for years, but the combination of lower-cost computing and the ready availability of large volumes of training data has transformed the types of problems that we’re able to address.
Welcome to Next in Tech, an S&P Global Market Intelligence podcast where the world of emerging tech lives. I'm your host, Eric Hanselman, Chief Analyst for Technology, Media and Telecom at S&P Global Market Intelligence. And today, we're going to be doubling back to talk more about the basics of artificial intelligence with my returning guests, Peter Licursi and Chris Tanner. Welcome back to both of you.
Thanks, Eric. Happy to be back.
Thank you, Eric. Good to talk to you.
And it's great to have you back. When you were last on, we were talking about a lot of the headline aspects of AI, a lot of the buzzy, hypey kinds of pieces. And I thought it would be really useful to take a step back from all of the crazy, hypey ChatGPT and OpenAI buzz of today and really dig into what these things are actually built on top of, because we start kicking around a lot of terms like large language models and generative AI.
What does that actually mean? And what are the foundations on which that happens to be built? I mean these are things that from the Kensho side, and actually for those listeners who didn't catch the previous episode, both of you are part of the Kensho team. These are things that you're working with. These are the foundations of the Kensho technology, right?
Yes, that's right. I mean, from our perspective, it really couldn't be a more exciting time to be in our business. And I think we see kind of Kensho and S&P as particularly well set up to thrive in this kind of environment. Obviously, Kensho has a very small team. So from the perspective of kind of competing with some of the larger AI companies, we may not be able to compete in terms of the raw number of people that we can commit to one of these experiments in terms of large language models.
And we might not have the same type of unfettered access to computational power. But when it comes to differentiated data, there really is no better partner in our domain than S&P Global. And so we're extremely excited at Kensho for this opportunity. And some of the amazing work that Chris and his team are doing on large language models, experimenting with S&P data and how that can be a differentiator is really, really exciting to us.
And I would just add, because part of your prompt is talking about what is hype and what is real, I think Chris will be the real expert there. But from my perspective, what's exciting about Kensho's position is we've always been really laser-focused on how AI can create real concrete value for an enterprise like S&P and S&P's clients. I know Chris approaches R&D from a similar perspective, right?
Like how can what we build not just be really intellectually intriguing but also create real concrete value downstream. And so from our perspective, we're really well situated in this environment to, I think, thrive and make S&P Global not just a leader in the application of this technology, but also a leader in the development of this technology.
And there's really no doubt that large language model technology has this really kind of enormous transformative potential not just for our society, but for businesses in particular. And I think in our industry, S&P Global's industry, there's no doubt that that's the case as well. We see S&P Global's position, given the relationship that we have with S&P and Kensho, as having an opportunity not just to deploy this technology, but to really be a leader in its development and application.
When it comes to any AI initiative here at Kensho, we always have to kind of ask the question, why us? And the reason is we exist in a really competitive field that's constantly changing and it's rapidly innovating. So we always kind of need to be cognizant of our competitive advantages. We are a small team, but we also have a team that has really enormous expertise.
Our research team has some of the top researchers in this field, in natural language processing, in particular. We have a lot of momentum in the sense that we've been working on this problem actually for more than a year now, specifically developing a large language model. We also have a great operating model with S&P Global. So we're giving that room to do that research and development and experimentation, take some really big swings to impact the entire enterprise.
And we also, at Kensho, have a really great proven track record of deploying AI successfully at S&P. So I think we're really well situated to take advantage of this moment. And one of the most fundamental dynamics that gives us an advantage, I would say, is our access to S&P Global data and also subject matter expertise. This is something that we see as a vital differentiator that really does give us that competitive advantage against even enormous tech companies and well-funded start-ups with kind of unlimited access to computational power and even enormous teams when it comes to machine learning.
Those are the kinds of advantages that we feel are core to our positioning. And it makes us really excited to be a part of this moment in the broader development of this technology.
Well, because I think practically, a lot of the questions that we get from clients today are all focused around, all right, this is amazing, this is fascinating, what do you do with it? How do you actually put it to work? And what are the things that it can accomplish?
But I guess before we get too far down the road, I'd actually just like to start with the basics of what a language model is. People talk about large language models. So if we could dig into what that language model is, and Chris, give us a little background in terms of what they are, where they come from.
Yes, my pleasure. I mean it is exciting that these things, these terms are becoming household phrases, which has never been the case before. But the reality is those of us who have been researching natural language processing, I mean this dates back decades actually.
So to zoom really far out and make sure we're all on the same page: natural language processing is a field that is strictly concerned with trying to understand and leverage human language, right? We're trying to get our computers to understand our language. So historically, ever since the '60s, folks have focused on a few dozen problems, AKA tasks, in order to try to accomplish this, right?
It's too lofty a thing to just immediately get computers to interact with us and to understand our language. So we carved out a few dozen problems, and we've worked on them for decades: things such as translating from one language to another, or trying to identify the sentiment within a given sentence or paragraph, or automatically discovering the topics that are discussed, things like that. And language modeling has always been one of those tasks, right? It's just one of the things that people focus on.
And to get really specific, language modeling is completely concerned with trying to estimate the probability of any sequence of words. That's all it is. It turns out it's pretty hard, right? And to be really clear, when I say the probability of any sequence of words, I'm not saying the probability that that thing is actually true or not.
Just how probable, like how realistic is that given sentence based on the data that you've encountered? And so much of that, I guess, as Peter was saying, depends on the data that the model has been trained on and setting it up for what does the model expect based on the data?
And that's why it's really important to have data that is pertinent to your eventual use case, to how you plan to use this thing. An example that I like to give in class when I introduce language models is that you could build a language model over anything you want, from the dialogue of characters on some TV show, to language from an entire particular domain such as finance or business, to the entire English language.
So it's really important to make sure that whatever you're trying to do eventually, right, whether for a business use case or for research, the domain of data going into your language model is pertinent to you.
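To make Chris's definition concrete, here's a minimal sketch of a bigram language model in Python: it estimates the probability of a word sequence purely from counts over a tiny, made-up corpus. The corpus and the counting approach are illustrative assumptions; modern large language models use neural networks, but the objective, scoring how realistic a sequence looks given the training data, is the same.

```python
from collections import Counter, defaultdict

def train_bigram_lm(corpus):
    """Count word pairs to estimate P(next word | current word)."""
    bigrams = defaultdict(Counter)
    for sentence in corpus:
        words = ["<s>"] + sentence.split() + ["</s>"]
        for prev, cur in zip(words, words[1:]):
            bigrams[prev][cur] += 1
    return bigrams

def sequence_probability(bigrams, sentence):
    """Multiply conditional probabilities along the sequence."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    prob = 1.0
    for prev, cur in zip(words, words[1:]):
        total = sum(bigrams[prev].values())
        if total == 0:
            return 0.0
        prob *= bigrams[prev][cur] / total
    return prob

# A tiny illustrative corpus (made up for this sketch).
corpus = [
    "the index rose today",
    "the index fell today",
    "the market rose today",
]
lm = train_bigram_lm(corpus)

# A word order seen in training scores higher than an unseen one.
print(sequence_probability(lm, "the index rose today"))  # about 1/3
print(sequence_probability(lm, "today rose the index"))  # 0.0
```

Note that the second sequence scores zero not because it's false, but because nothing like it appeared in the training data, exactly the distinction Chris draws between probability and truth.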
Well, and that, again, really speaks to Peter's point about the data. I mean, Peter, if we think about a lot of the early work around things like transcription, so much of understanding the great volume of financial data that we deal with day in, day out is so critical.
But I thought Chris' point about sentiment analysis was interesting; that's getting relatively sophisticated. And that's especially sensitive to the kind of data that you feed into it and the way you train the model, in an area in which, strangely enough, S&P's got a lot of data.
What's kind of interesting about these kind of use cases is we oftentimes think of large language models understandably as primarily a technology problem. They certainly are, right? It is a technology. It requires a lot of development. And Kensho, that's kind of our specialty.
Where we partner with S&P is on figuring out how these things can actually create value when they're applied as products or integrated into workflows, interfaces, that kind of thing. And one thing that we have not seen entirely proven out from a business standpoint, especially given all of the excitement around the technology, which is, in my view, absolutely warranted, is exactly how that value is going to be produced. I think to your point around data, our customers have expectations around how they're going to interact with that data and how that process will change, but what remains to be seen is in exactly what way. So there are some obvious ones, right, being able to have more conversational AI interfaces with the data.
But the thing that I keep coming back to is, a lot of this is a bit easier said than done, right? There's the actual development of the model, which is extremely challenging and unbelievably sophisticated. But there's also lots of stuff that normally kind of plagues these enterprises when it comes to data engineering and infrastructure challenges, right?
I think one misconception is this idea that, and I think this is true with a lot of AI and machine learning in general, is that there's this kind of magic. And once you have the model and kind of release it into the wild, it's going to organize the whole world for you and then present everything with a nice cherry on top.
And I think there's actually an enormous amount of difficult hard work and decisions that are made not just in terms of R&D or in terms of machine learning and model development, but also in terms of infrastructure, data engineering, legal and ethical considerations. And so that's kind of the exciting challenge that we have, I think that all businesses have when they think about how they're going to deal with large language models and deploy them at scale.
And I guess that gets us to that next question of when we get to a large language model, I mean, Peter, you were talking about what the language models are in general. How have language models evolved and how did we get to where we are today?
Yes, it's a great question. So as I was saying earlier, historically, language modeling was just one of the tasks that folks focused on. And it was only recently, in the past 7 years, that we've been able to see that if you build these language models large enough, meaning with enough parameters, and if you feed them enough data, only then are these models truly powerful enough to serve as the crux, the backbone, to fuel state-of-the-art performance on pretty much any of the tasks that we've focused on for decades.
And not only that, but there's pretty much no ceiling towards being able to at least do something reasonable for any other task in the world. As long as that task can be represented in some form of human language, even things as complicated as playing chess, a language model can do something with it. No guarantee that it's great. Definitely no guarantee that it's perfect. But if you build a large enough model and you give it relevant enough data, it's probably going to be able to do something useful.
So you've hit on what I think is an interesting point, which is if it can be represented as a language. And I think maybe that's the crux of that issue of using a language model for things that are not, strictly speaking, language applications, like playing chess.
Yes. And usually, though, we, and when I say we, I mean anybody who's looking to use language models, are restricting it to something reasonable, right? To zoom out again and piggyback off of what Peter was saying earlier in terms of companies looking and trying to figure out how they can leverage this amazing technology.
The short answer is, if any company or any organization aims to do something meaningful with human language, it's definitely the case that a language model will be the main mechanism you should probably consider using. That will probably give you the best results. But then, yes, that opens up the big can of worms of how you use it. And even if you do have an incredible language model, maybe the organization themselves didn't make it, but some third party did, one of the large software companies, there are many ways that you can go about it, right?
Without going down too many technical details here, there are 3 canonical ways that one can use a language model. One, you can access an existing one via an API, right? You pay some small fee, you give it data, you get data out, just like what we see with ChatGPT made available by OpenAI. There are pros and cons to each of the approaches, but that's one approach.
Another approach is that you take one of these existing large language models. And if the organization that made that model, if they make it completely transparent and available to use, then you can adjust it to your own use case by giving it more data, right? You're fine-tuning the model.
And the third approach is you build an entire language model from scratch using your own data. So there are pros and cons to each of these. And to Peter's earlier point, regardless of which option one chooses, there's a lot of engineering work to actually use it in a meaningful way towards one's actual use cases. So it provides a lot of opportunity, but nothing is completely for free, right?
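As a rough illustration of the second option, fine-tuning, here's a toy sketch, not any real vendor's API: a count-based model "pretrained" on generic sentences is given additional domain data, and its predictions shift toward the domain. Real fine-tuning updates a neural network's parameters with gradient descent, but the effect, adapting an existing model with your own data, is analogous.

```python
from collections import Counter, defaultdict

def train(corpus, model=None):
    """Build a bigram count model, or continue training an existing one."""
    model = model if model is not None else defaultdict(Counter)
    for sentence in corpus:
        words = sentence.split()
        for prev, cur in zip(words, words[1:]):
            model[prev][cur] += 1
    return model

def most_likely_next(model, word):
    """The model's single most likely next word."""
    return model[word].most_common(1)[0][0]

# "Pretrained" on generic text (made-up sentences), where
# "interest" is usually followed by "in".
general = ["great interest in science", "strong interest in art"]
model = train(general)
print(most_likely_next(model, "interest"))  # -> "in"

# "Fine-tune" by continuing training on finance-domain data,
# where "interest rates" dominates: the prediction shifts.
finance = ["interest rates rose", "interest rates fell", "interest rates held"]
model = train(finance, model)
print(most_likely_next(model, "interest"))  # -> "rates"
```

The third option, training from scratch, would simply mean calling `train` on your own data alone, with no pretrained counts to start from.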
That's true with everything in technology, whether you need massive amounts of computational power to ingest everything on the Internet, which seems to sort of be the OpenAI approach, or something that's a bit more targeted.
Yes. And to each their own, right? Every organization has different needs. So there's not necessarily one size fits all, but it really is compelling what these large language models are able to do these days.
So what's different about this next stage in the use of large language models, getting to things like generative capabilities out of auto-regressive models? Are they leveraging a large language model differently? How are they actually putting that to work?
Yes. So there are 2 main paradigms of implementing a language model. And these aren't new. These 2 paradigms have existed for decades, basically. Going back to my earlier definition, a language model's entire responsibility is to estimate the probability of any sequence. Well, it turns out that's a very difficult thing to do, right?
There are an intractable number of combinations that generate any sequence of words, right? Say you have a very small vocabulary, just as a cartoon example, 10 vocabulary items in your entire dictionary. Well, there are 10 options for the first word, 10 options for the second word, 10 options for the third word and so on. So it's 10 to the power of however many words you have in your sequence. It turns out that's a very large number.
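The arithmetic behind that explosion is easy to check; the 50,000-word vocabulary below is an assumed, roughly realistic figure, since real vocabularies vary.

```python
# A cartoon 10-word vocabulary: 10 choices per position, so a
# sequence of n words has 10**n possible combinations.
vocab_size = 10
for n in (2, 5, 20):
    print(f"{n}-word sequences: {vocab_size ** n:,}")

# With a more realistic vocabulary (an assumed 50,000 words),
# even ten-word sequences explode:
print(f"{50_000 ** 10:.3e}")
```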
So zooming all the way out and saying that language modeling is very difficult, there are 2 common approaches that people have taken. One is an auto-regressive approach, whereby you're just trying to predict the next word in a sequence, right? Some word comes in; what is the most likely next word? What is the next most likely word after that? And so on.
And the other approach is very similar, but instead of trying to predict what the next word is, you give it the entire sequence and you mask out, you hide, some randomly selected word in that sequence. And you try to predict what that word was, the word that was missing. This is called a masked language model. They each have their own pros and cons, but these are the 2 main approaches.
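Those two paradigms can be sketched with simple counts over a toy corpus, an illustrative assumption rather than how neural models actually work: the auto-regressive version predicts the next word from what came before, while the masked version predicts a hidden word from the context on both sides.

```python
from collections import Counter

# A toy corpus (made up for this sketch, not real training data).
corpus = "the index rose and the index rose and the market fell".split()

# Auto-regressive: predict the most likely next word given the previous one.
def next_word(prev):
    candidates = Counter(b for a, b in zip(corpus, corpus[1:]) if a == prev)
    return candidates.most_common(1)[0][0]

# Masked: predict a hidden word from the words on either side of it.
def fill_mask(left, right):
    candidates = Counter(
        b for a, b, c in zip(corpus, corpus[1:], corpus[2:])
        if a == left and c == right
    )
    return candidates.most_common(1)[0][0]

print(next_word("the"))          # -> "index" (most common word after "the")
print(fill_mask("the", "fell"))  # -> "market" (word hidden in "the ___ fell")
```

The masked model gets to look at context on both sides of the gap; the auto-regressive model only looks left, which is exactly why it can generate text one word at a time.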
And the reason that we're seeing such incredible performance these days is simply what I said earlier. We just have tons of data now, and we have the computational power. We, meaning the entire field, or we as a society, have the computational power now. But the amazing thing is that a lot of these ideas are really old.
That's not to discredit a lot of the research that has taken place over the last few decades. There are definitely some advances and tricks that people have learned along the way. But it is interesting to me that, overall, it's the same 2 paradigms that folks have focused on for decades.
But it sounds like now, with computational power available at reasonable cost and in reasonable quantities, access to data, and mechanisms to constrain the computation required, that trio of intersecting capabilities starts to get us to a point at which we can do some really fascinating things.
Yes, exactly. And I don't mean to sell short the innovations that we've made along the way. In 2017, a new architecture came about. So that has not existed for decades; it was a new model, a new architecture called the transformer.
And it really is profound. Transformers are incredibly powerful. They've fueled tons of innovation within the field. So there are definitely advances that have happened. But in terms of kind of the fundamentals of language modeling, a lot of these principles have been in place for decades.
Now we touched on it in the previous episode and we've sort of hinted at it in the conversations here, but one of the big questions is how much computational power is required to come up with the result that you want? And that if you're OpenAI, then you've got access to vast quantities of computational power and their approach has been to pull in vast quantities of data.
But if you're trying to do something where you actually have budgetary constraints or computational constraints, you've got to figure out how to target what you're trying to do to the specific capabilities. Whether it's the computational power you've got, the speed with which you want to execute, or the total volume of data, you've actually got to be able to match those appropriately. And that's a lot of what, Peter, you and Chris have been doing in terms of really matching the appropriate levels of capability to the outcomes you're trying to achieve.
Yes, absolutely. I think from our perspective, the narrowness of our use case compared to all of human knowledge on the entire Internet, if you want to call that narrow, but I guess any domain is narrow compared to that, is actually a huge advantage for us strategically.
And Chris can talk more about the kind of technical implications of our kind of domain and use case specific kind of verticalization of this technology. But from a kind of strategic and business perspective, we see it as a huge advantage because it means for us lower costs in terms of compute, a little bit more certainty in terms of the accuracy because we can kind of leverage the data that we're extremely confident in to produce more accurate results.
And so from a business standpoint, especially for a business like S&P Global, where accuracy and ground truth are the absolute bedrock of what we provide to customers and the market more broadly, this is both an advantage from a cost standpoint and an advantage from a quality standpoint.
Yes. I mean because realistically, you're trying to come up with an environment in which you can trust what's actually being created. And a lot of that comes from understanding what's the data that it got trained on and then having confidence intervals in that, that are going to allow you to understand to the extent to which you can actually trust the output.
Yes, exactly. And I love how you framed it, Eric. It really is the case that it's kind of a balancing act. Depending on what one is trying to do, you always need to focus on: How much data are you giving the language model? How big is this language model going to be in terms of number of parameters?
And for those who are outside the field, I'll clarify that a parameter is like a little knob. The language model, or any computational model, has parameters. Its entire goal is to automatically adjust its own parameters. This is why it's called machine learning. The model is learning automatically, based on the data, to adjust its parameters. So the size of these models is oftentimes measured in terms of how many parameters they have.
And earlier, when I was saying that these large language models start to exhibit these emergent abilities to do tasks such as play chess, what we're seeing is that those properties only really become possible once you have a large language model on the order of at least 100 billion parameters. And that's a lot. This is all connected to the question of how much data you have, because it's all a trade-off. Even if you can afford it in terms of computational power, even if you can afford a 100 billion parameter model, if you don't have enough data, then it's not going to serve you well.
You're going to effectively memorize your data and will not have the ability to generalize well to other data that you see in the future. So it's definitely a constant balancing act. But to Peter's point, we have a wealth of data, thanks to S&P Global, and we know exactly the types of use cases and business cases that we want to solve. So it allows us to have a laser-sharp focus and to provide very specific solutions.
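Chris's "knob" picture can be shown with the smallest possible model, a single parameter fitted by gradient descent. The data and learning rate here are made up for illustration, but the mechanism, a model adjusting its own parameters to reduce error on training data, is the same one that plays out across 100-billion-parameter models.

```python
# A one-parameter "model": predict y = w * x. Learning means the model
# adjusts its own knob w to reduce error on the training data.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # true relationship: y = 2x

w = 0.0   # the knob starts untuned
lr = 0.01  # learning rate: how far to turn the knob each step
for _ in range(500):
    # Gradient of mean squared error with respect to w.
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= lr * grad

print(round(w, 3))  # -> 2.0: the knob settles on the true relationship
```

With only one knob and three data points, the model can fit perfectly; at 100 billion knobs, the same procedure memorizes rather than generalizes unless the data is plentiful enough, which is the trade-off Chris describes.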
And maybe the other side of the coin when it comes to data is the output of the model, the interpretability of the results. And I know this is something that Chris and team have invested a lot in and actually really innovated around.
And this is really from a kind of product and business standpoint, something that we are really excited about, which is this dynamic where when you're interacting with some of the other large language models and the interfaces associated with them, the most popular right now being ChatGPT, it's a proverbial black box. You make a request, you prompt the system. The system gives you a response.
There's really little understanding or interpretability when it comes to the results. And something that we are foregrounding in our work on large language models, which Chris can elaborate more on without giving too much away, of course, is making sure that the end user of our systems, and of whatever model we develop, will be able to understand how the model arrived at the result or prediction that it has produced.
And from a product standpoint, that's really important, because the clients that we work with and the domains that S&P Global and its various divisions work in require that kind of transparency in order for a system like this to be trusted. That is definitely something that I think we will increasingly start to see, and have already started to see, as companies begin imposing certain regulations on the use of this type of technology. But Chris, I'm not sure if there's anything else to add in terms of interpretability.
Yes. Yes, there's a lot to add here. Interpretability is critically important, right? We're all aware of this in part because these large language models are often viewed as a black box, as they usually are, right? Like usually you don't have complete access to them. And even if you do, the reason why the language model has generated a particular thing is not clear.
So yes, there are 2 important characteristics that we should definitely focus on: interpretability and then the accuracy of the model, right? All language models have the ability sadly to hallucinate information, to make stuff up because they're just trained to predict the next word. So there's nothing safeguarding it. There's nothing kind of guardrailing it from saying anything, right? It could go off the rails and generate anything. So it's really important to us.
As we've seen in a lot of prompt responses.
Exactly. So it's really important to us to try to minimize that, right, and to maximize the accuracy. So thanks to having access to incredible databases and knowledge graphs within S&P, we have some clever approaches to use those, right? Because we know that the information contained within the knowledge graphs or databases is accurate. Within CapIQ, we know that those are facts. As much as we can make them completely accurate, those are facts.
We have some clever ideas, and we've prototyped this, we have demonstrations of this, where we can leverage the language model and prompt it in a certain way that it can use those knowledge graphs. And our method of doing so provides some interpretability, so that you can see why the model is doing what it's doing and how it's accessing those knowledge graphs.
Is interpretability different than explainability?
Yes, a great question. It actually is. They are very related fields. Interpretability is more about having a glimpse into what the model did and why. It doesn't necessarily have to explain itself to you, but you have some transparency into it. Like, what were the mechanisms within the model doing? How were they affected by the data that was coming into it?
What does it actually know or not know? There are certain things that you can do with certain models; you can inspect them in certain ways, and that is a form of interpretability. But in that situation, it's not explainability. It's not explicitly explaining anything to you; it's up to you. It's your responsibility to make sense of that.
Identifying confidence levels, something like that, or...
Yes. Yes, exactly. That would be one thing one is trying to interpret: how confident was it in this? It should be pretty obvious how related it is to explainability. Sometimes it's impossible to make a distinction between the 2, but they do have different focuses.
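One simple form of confidence a model can expose is the probability it assigns to its top prediction, obtained by pushing raw scores through a softmax. The candidate words and scores below are hypothetical, chosen only to illustrate the calculation.

```python
import math

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    exps = [math.exp(s - max(scores)) for s in scores]  # shift for stability
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores a model might assign to candidate next words.
candidates = ["rates", "in", "rose"]
scores = [4.0, 1.0, 0.5]

probs = softmax(scores)
best = max(zip(candidates, probs), key=lambda p: p[1])
print(best)  # the top candidate and the model's confidence in it
```

Reporting that top probability alongside the prediction gives a user one concrete signal to interpret, though, as the discussion above notes, a probability alone is not a full explanation of why the model chose what it did.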
Fascinating. Well, this has been great and has certainly gotten me to a higher level of understanding about a lot of the background for a subject that is certainly getting a lot of conversation. But hopefully, we've moved the ball forward with a little better understanding. So thank you both.
Yes. Thanks, Eric. Happy to be here again.
Thanks so much, Eric. Always a pleasure.
And that is it for this episode of Next in Tech. Thanks to our audience for staying with us, and thanks to our production team, including Carolyn Wright, Ethan Zimman and Syed Wajih Abbas on the marketing and events teams, and our studio team, Kyle Cangialosi, Derek Brown, and Darren Rose.
I hope you'll join us for our next episode where we're going to be looking into a whole range of aspects about data security, data privacy around customer experience and employee experience and how those are starting to merge together. I hope you'll join us then because there's always something Next in Tech.
No content (including ratings, credit-related analyses and data, valuations, model, software or other application or output therefrom) or any part thereof (Content) may be modified, reverse engineered, reproduced or distributed in any form by any means, or stored in a database or retrieval system, without the prior written permission of Standard & Poor's Financial Services LLC or its affiliates (collectively, S&P).