Listen: Next in Tech | Episode 134: Observability and new operational models

Managing complex application environments successfully requires not only insight into application behavior and the infrastructure supporting it, but also the ability to correlate all of that data. Analyst Mike Fratto returns to the podcast to explore observability, an approach to managing new application patterns that steps beyond traditional monitoring. Born in the cloud native world, it is gaining wider traction and simplifying operations with observability pipelines.


Presentation

Eric Hanselman

Welcome to Next in Tech, an S&P Global Market Intelligence podcast where the world of emerging tech lives. I'm your host, Eric Hanselman, Chief Analyst for Technology, Media and Telecom at S&P Global Market Intelligence. And today, we're going to be discussing observability and what it means in the wider context of IT and IT operations. And we're going to be doing it with returning guest analyst, Mike Fratto. Mike, welcome back to the podcast.

Mike Fratto

Hey, Eric. Thanks for having me back on. I guess the first one went well.



Question and Answer

Eric Hanselman

We never grade our guests. But when it comes to interesting things like observability -- well, actually, before we get too far down the road, I think it's probably worth working through what observability is, because it came out of a particular chunk of the world, a lot of the cloud native thinking. It came out of a need for greater levels of visibility into operations. But it's probably a term that there's some confusion around, and maybe not all of our listeners are familiar with it.

Mike Fratto

Yes. No, there is some confusion around what this observability stuff is. Some people think that it's monitoring 2.0, and that's not really accurate, right? So before the -- you mentioned cloud native, right, and microservices. Before that, when applications lived on servers, you could point to a server and say, that's my ERP, that's my database.

We had monitoring software and monitoring systems. We had application performance monitoring for our applications. We had network performance monitoring to monitor the network. We had log management and real user monitoring and all of these monitoring platforms, and each one of them gives IT a slice of data.

And then it was up to the operator, the administrator, to play swivel chair back and forth across all these consoles and be the glue. And it was really hard even then to have an understanding of the application -- to say what was happening, and if something broke, why it broke, where it broke and where to go to fix it.

What observability attempts to do -- what observability platforms attempt to do -- is provide that end-to-end, top-to-bottom view of the application IT estate. So it takes in data from every data source it possibly can, whatever IT is feeding it. It can be logs coming in. It could be data coming in from a CMDB. It's data coming in from a cloud service, from the infrastructure as well as from the applications themselves. Applications can be instrumented or can have an agent running on them, which is collecting data.

And all of this basically goes into a data pool -- I guess folks can call it a data lake if it gets large enough. And then analytics algorithms run, process it, create visualizations of the application and dynamically create a topology. So now you have context: when you see an application component, you see where it sits in the application stack and in the infrastructure stack, and when there's a problem, it can highlight not only that this application has a problem, but which component has the problem.
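To make that ingestion step concrete, here is a minimal Python sketch of normalizing records from a couple of the sources mentioned (syslog, a CMDB) into one common shape before they land in a shared data pool. The field names and source labels are hypothetical illustrations, not any particular platform's schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any

@dataclass
class TelemetryEvent:
    """One normalized record in the shared data pool."""
    source: str                    # e.g. "syslog", "cmdb", "cloud-api", "apm-agent"
    kind: str                      # "log" | "metric" | "trace" | "inventory"
    resource: str                  # the component the record describes
    timestamp: datetime
    attributes: dict[str, Any] = field(default_factory=dict)

def normalize_syslog(line: str, host: str) -> TelemetryEvent:
    """Wrap a raw syslog line in the common shape (hypothetical parser)."""
    return TelemetryEvent(
        source="syslog",
        kind="log",
        resource=host,
        timestamp=datetime.now(timezone.utc),
        attributes={"message": line},
    )

def normalize_cmdb(record: dict) -> TelemetryEvent:
    """Wrap a CMDB inventory record in the common shape."""
    return TelemetryEvent(
        source="cmdb",
        kind="inventory",
        resource=record["ci_name"],
        timestamp=datetime.now(timezone.utc),
        attributes=record,
    )

# Two very different sources end up as comparable records in one pool.
data_pool: list[TelemetryEvent] = []
data_pool.append(normalize_syslog("db connection refused", host="orders-db-1"))
data_pool.append(normalize_cmdb({"ci_name": "orders-db-1", "owner": "platform-team"}))
```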

And then other algorithms will further elevate this by providing more context. This component is having an issue: what are all the things it depends on, and are they causing the issue? And then, on the other side, what are all the applications that depend on that component?

So just from a visualization standpoint and from understanding the application state, it helps operators understand these very complex applications, especially when we're talking about cloud native applications and microservices, which are made up of hundreds or thousands of components. So it's a completely different set of capabilities that pulls together raw data and already processed data, enriches it and then presents it. But we can do a bunch of other things with it besides just break-fix checks.
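As a rough illustration of the dependency analysis described above, here is a small Python sketch over a hypothetical topology: walking the dependency graph upstream surfaces possible causes, and walking the inverted graph downstream surfaces the applications that would be impacted.

```python
from collections import defaultdict, deque

# Hypothetical topology: edges point from a component to what it depends on.
depends_on = {
    "checkout-ui": ["cart-svc", "auth-svc"],
    "cart-svc": ["orders-db", "pricing-svc"],
    "pricing-svc": ["orders-db"],
    "auth-svc": ["user-db"],
}

# Invert the graph so we can also ask "who depends on me?"
depended_by = defaultdict(list)
for component, deps in depends_on.items():
    for dep in deps:
        depended_by[dep].append(component)

def reachable(start: str, graph: dict) -> set[str]:
    """Walk the graph breadth-first and return everything reachable from start."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

troubled = "cart-svc"
print("possible causes:", reachable(troubled, depends_on))    # everything it depends on
print("impacted apps:  ", reachable(troubled, depended_by))   # everything that depends on it
```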

Eric Hanselman

Well -- and doing the thing that we've always tried to do but didn't have the ability to really sort of integrate it because application performance monitoring would tell you, oh, hey, wow, suddenly, this part of the app has suddenly become really slow. Refresh times, responses back to end users are suddenly slow.

And then you go through the process of taking a look at all the various monitoring tools you had, and each of the teams supporting them was really trying to minimize their mean time to innocence: hey, it wasn't me; no, no, the network looks great; storage is really running well; servers are all really well -- underutilized; hey, couldn't have been us. But now you're doing the thing that raises that operational visibility to a new level of abstraction -- again, one of my favorite words -- to be able to actually look at infrastructure performance as a much larger whole, which is something that's been really hard to do.

Mike Fratto

It has been hard to do. I mean, I've seen over the years -- just take infrastructure monitoring -- there have been a number of attempts to build dependency trees and dependency graphs of how the infrastructure is put together. And I remember early attempts at even just mapping the network and creating a dynamic map of what's actually attached and how it's attached to the ports. These were really hard problems to solve.

They've been solved now, so that kind of graphing is done. But you're right, all of that siloed monitoring led to a lot of finger pointing. The data wasn't comparable across different systems; how things are measured can vary a great deal. So that left the burden on IT to do all of that glue work and all of that analysis.

And so with observability, depending on the platform or product, it could have a different personality for each team, which is helpful, but they're all operating on the same data. So it starts to remove a whole lot of the roadblocks to collaboration across IT teams, because you're working on the same sets of data. You don't have to worry about how it's being processed and presented and so forth.

Eric Hanselman

Well, I mean, the comparability point, I think, is a really important one, in that we were collecting data in different ways, in different spheres of the infrastructure that we were trying to manage. And so it was really hard to correlate all of this. And I think the point you made about the shift to cloud native patterns is really the thing that started to exacerbate this problem. We got to environments that were even more disconnected, with harder-to-understand interactions, once you got into all of the microservices patterns and the things that we're now starting to bust apart.

It's not just that the app as a whole wasn't doing well. Now you had interactions and dependencies that are much more complicated because you have all of these independent services, any one of which could have potentially been causing a problem. And now you've got this proliferation of things you've got to figure out. So I guess, observability is something that we get to blame on the cool kids for making all this complicated microservices business.

Mike Fratto

Well...

Eric Hanselman

It's their fault.

Mike Fratto

It's their fault, exactly. It's trying to make it simpler, right? I mean, one of the aspects you didn't touch on, which is really important, is the way microservices run: the components that are running today probably aren't going to be running tomorrow, and they probably weren't running yesterday, right?

So if you have a problem and you want to do a postmortem, the application that was causing that issue, or the component that was causing that issue -- that's gone. It's been replaced by a new one. And the way microservices systems -- container systems -- operate is, if a container is behaving badly, you just kill it and start a new one, right?

You just keep spinning these things up and tearing them down, and so there's that ephemerality. You have to be able to go back and relate to what was running, and traditional monitoring tools just didn't have a really good capability for that. The expectation was that when an application stopped running, you could still go back and look at it. But if the instance is gone, you can't interrogate it.

Eric Hanselman

Yes. I mean, traceability gets really broken by the ephemeral nature of most container implementations in that, as you said, a container is not around all that long. And hey, if you killed it and restarted it, was it a problem with the container? Was it a problem with the resources it needed? How do you actually figure that stuff out and have enough history and the ability to correlate enough of these different aspects to understand what went wrong and why that container had to die?

Mike Fratto

Yes, exactly. I'll raise my coffee cup to it.

Eric Hanselman

Well, so is observability one of these things that we can overlay? I'm curious -- really, as always, the hard part in any kind of shift in technology is the transition effort to make it actually work. How do you move from where you are today, and what sort of migration paths exist to get to this better state? My take is that it's not something that's easy to overlay; there are some fundamental transformations you've got to work through in order to collect the information. There's a lot that's got to be done to make this happen.

Mike Fratto

Yes, there is. I mean, when organizations start using observability platforms, when they start migrating to them, they start changing their workflows and their processes. This is actually good, because you can change the workflows and processes in ways that are more efficient, so they can get to work faster and work much more reliably. Typically, the migration path is incremental, or it can be incremental. So I'm going to tee up a survey that we just completed -- we just got the data today. It is fresh off the farm, and we asked...

Eric Hanselman

Fresh-picked data.

Mike Fratto

Fresh-picked data. Yes, I'm thinking strawberries. So this was a survey of professionals who use or are affiliated with observability tools, so it's very focused on this domain and the topic we're talking about. And while we haven't quite vetted the numbers yet, so I can't really get into specifics, we did ask: what are the observability tools that you're currently using? And that's a jumping-off point for what they're using now and where they're going to move.

And so it seems like application performance and network performance monitoring, log management and analytics are some of the top sets of tools they cited most often. Now what's interesting is we have a set of subcategories -- there are 12, 14 different subcategories that we classify under observability, and we've been doing this for years.

And pretty much across the board, the trend has been that vendors in all of these segments have been adding what I'll call observability-like features. So really, it's more around analytics -- whether it's static algorithms, machine learning or perhaps AI -- across all of these different subsegments.

So for example, vendors that do alerting would have utilization and analysis for how workflows were being processed over time -- that kind of time series data and so forth. But then they're adding more analytical capabilities to look at success rates and failures and so on. But that's within a single subsegment, a single technology or product, so it doesn't have that end-to-end, top-to-bottom observability capability. But it's...

Eric Hanselman

It's that sort of observability-esque implementation.

Mike Fratto

Yes. Yes. And then last year, when we did our Market Monitor, which is a sort of forecasting tool, we added observability platforms as a separate segment, because there is actually a class of products and services that do provide that end-to-end, top-to-bottom view of the world, and these products can take in both raw data and data from these other platforms, right?

So they can consume it all, as long as it's text-based, right? And that's the big challenge: taking in all of this data and then being able to do something with it. So there's a lot of integration work that vendors are doing to pull in data feed formats from various locations. There are open-source initiatives like OpenTelemetry, which are attempting to standardize the data collection and the data formats, and which are going to simplify the integration, or the importing, of data.
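As an illustration of the standardization Mike mentions, here is a minimal tracing sketch using the OpenTelemetry Python SDK (assuming the opentelemetry-sdk package is installed). The service and span names are made up, and a console exporter stands in for whatever backend would actually receive the data; swapping in an OTLP exporter would send the same standardized spans to any compatible platform.

```python
# Requires: pip install opentelemetry-sdk
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

# Name the service so every span carries a consistent resource attribute.
provider = TracerProvider(resource=Resource.create({"service.name": "checkout-svc"}))
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer(__name__)

def place_order(order_id: str) -> None:
    # Spans follow the OpenTelemetry data model, so any backend that
    # speaks OpenTelemetry can ingest them without custom integration work.
    with tracer.start_as_current_span("place_order") as span:
        span.set_attribute("order.id", order_id)
        with tracer.start_as_current_span("charge_card"):
            pass  # payment logic would go here

place_order("A-1001")
```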

And so those kinds of challenges are going to be reduced as the products are developed. And then the other kinds of challenges organizations are seeing are around cost-based issues, like the sheer amount of storage, particularly when they're using a cloud service or a SaaS service and it's licensed based on the volume of data that's collected, versus the number of agents or the number of collectors or what have you.

Eric Hanselman

And that's a problem that we see in all aspects of IT operations: how long can you afford to hang on to that data? Especially, to your point, as we look at cloud services that, in many cases, have a default of 30 days' worth of retention and are going to charge you for longer time frames. Is that going to be enough? And it starts raising some questions about how you manage those costs.

Mike Fratto

Yes. And not all data needs to be stored at a 5-second interval, right? After a certain point in time, you can start summarizing data up into larger blocks of time, for example, or you can get rid of a lot of metadata. The notion of store it all and we'll figure out what we can do with it later -- that was really good when you were storing it all on-premises and disks were relatively cheap, right? When you're using a cloud service, those costs just continue to grow. And it's not only storing it; even with longer-term storage, when you're pulling that data back out, there are performance issues and so forth.
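A small sketch of the summarization Mike describes: rolling 5-second samples up into 5-minute averages once they age out of the high-resolution window. The bucket size and sample data are arbitrary illustrations.

```python
from collections import defaultdict

def summarize(samples: list[tuple[float, float]],
              bucket_seconds: int = 300) -> list[tuple[float, float]]:
    """Roll fine-grained (timestamp, value) samples up into bucket averages.

    Samples arrive at roughly 5-second resolution; past a retention cutoff
    you keep only one averaged point per bucket_seconds window.
    """
    buckets: dict[float, list[float]] = defaultdict(list)
    for ts, value in samples:
        bucket_start = ts - (ts % bucket_seconds)
        buckets[bucket_start].append(value)
    return [(start, sum(vals) / len(vals)) for start, vals in sorted(buckets.items())]

# 180 five-second CPU samples collapse to three 5-minute averages.
raw = [(t, 40.0 + (t % 60) / 10) for t in range(0, 900, 5)]
print(summarize(raw))
```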

Eric Hanselman

Hence, storage starts to become an operating cost as opposed to a fixed capital cost, where you had that big array you could dump everything into and it didn't cost you more until you needed to expand the array. Well, so that raises the question of how enterprises should approach this. Are there some operational shifts that have to take place, and what does that mean? I guess you have to go into this with an understanding of how you're going to manage data growth, what you really need to retain and what pieces you need in order to actually make intelligent decisions.

Mike Fratto

Yes. It becomes a data management issue rather than a data collection issue. IT can deal with data collection; they've been doing it for years. But it's the data management side of it -- just what you said, right? What do we collect? How long do we collect it for? At what regularity do we collect it? Where do we store it? Do we have to do stuff with it? Do we have to mask it if it has PII? Do we have to store it long term -- in financial services, for example, recorded client conversations, that kind of thing? So there are a bunch of data management issues.

And the thing with observability, because it's collecting all of this data from all of these locations and because of how these applications are built -- there are a couple of factors. You have a lot of data sources from a lot of places being collected in one place, and you have more dynamism in the IT infrastructure and the application infrastructure, so things are changing much more quickly.

It's not like you can install an application, set up its logging and be done for the life of that particular application. Because those components may be in different places, you have to be collecting data from all of these different locations. And then one of the things we saw from our survey last year is that organizations have more than one destination, right? They might have multiple monitoring platforms or observability platforms that they're using. They might be taking the same data that's being collected and sending it to both the IT ops team and the security team.

They may be sending data off to long-term storage. It may be going off to some kind of escrow. There's a lot of places where data ends up going. And so imagine this mesh of data sources and data destinations that are dynamic and they have to change in real time.

Eric Hanselman

And this is the good news and bad news, right? Hey, we've got more flexibility. Oh, no, we've got more flexibility.

Mike Fratto

That's right.

Eric Hanselman

And in a multi-cloud world -- as we keep saying over and over again, it's important to be ready to expand to whatever that next cloud platform is. But that comes with a whole new set of telemetry thrown off by the new environment, and whether it's on-prem, off-prem, cloud or what have you, being able to support lots of different varieties is useful but presents some challenges.

Mike Fratto

Yes. Yes. And so there's a class of products that has been around for 5, 6 years called observability pipelines. The quick description of an observability pipeline is: think of it conceptually as the central point that collects and then redistributes observability data, telemetry data -- whether it's metrics, events, logs, traces or whatever it happens to be.

And so the idea is you have this concept of an observability pipeline. It could be one or more instances. It could be distributed or centralized -- it doesn't really matter; those are details. But all of your applications, all of your infrastructure, cloud services, what have you -- all of those data sources point at the observability pipeline.

And then it's responsible for redistributing that traffic to all of the various destinations it needs to go to. So you get data routing, which is really handy. I can tell you, back when I was touching IT -- I'm talking 20 years ago -- I was doing data routing with syslog messages, and it made my life so much simpler in my data center.

So you can do data routing. You can do data filtering, so you don't have to send all of your data to one place -- that's part of the cost issue but also part of the performance issue. You can route data to different locations. Some products allow you to do masking, so you can remove, say, social security numbers, identification numbers, card numbers and so forth. You can encrypt. There are a lot of functions you can perform within the observability pipeline, and you do it in one place, which is really easy to manage.
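A minimal Python sketch of the routing, filtering and masking functions described here; the destinations, event fields and masking pattern are hypothetical, not any product's configuration.

```python
import re

SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def mask(event: dict) -> dict:
    """Redact anything that looks like a US social security number."""
    event = dict(event)
    event["message"] = SSN_PATTERN.sub("***-**-****", event["message"])
    return event

def keep(event: dict) -> bool:
    """Filter: drop debug-level noise before it reaches any backend."""
    return event.get("level", "info") != "debug"

def route(event: dict) -> list[str]:
    """Route: decide which destinations should receive this event."""
    destinations = ["it-ops-observability"]
    if event.get("category") == "auth":
        destinations.append("security-siem")
    if event.get("retain_long_term"):
        destinations.append("archive-bucket")
    return destinations

def pipeline(event: dict) -> dict[str, dict]:
    """One pass through the pipeline: filter, then mask, then fan out."""
    if not keep(event):
        return {}
    cleaned = mask(event)
    return {dest: cleaned for dest in route(cleaned)}

print(pipeline({"level": "warn", "category": "auth",
                "message": "login failed for ssn 123-45-6789"}))
```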

And so now the process is, when you bring up a new application, you just point the logging, the event data, whatever is being sent, at the pipeline. And then there's either a policy that's going to be able to deal with it, or you create one.

Eric Hanselman

In the everything-old-is-new-again department, it's sort of like getting back to a lot of the ideas we had back in the enterprise service bus days and publish-and-subscribe models that allow you to pick and choose what you want, and to have a pipeline that provides it for you in an appropriate format to make sharing that much easier.

Mike Fratto

Yes. Yes. Exactly -- because you can transform the data at the pipeline, you don't have to worry about whether this destination supports this format or that protocol. Just the pipeline. That's it. And just...

Eric Hanselman

And the subscribers basically get to grab the data they want -- subscribe to source types, take your pick. And now you've got yet another abstraction set to manage this for you, to deliver what is this wild and woolly assortment of different data streams, and to do it in a way that's much easier to consume and, of course, then make intelligent decisions on.

Mike Fratto

Yes. It's an abstraction at work. It's the technology I'm really kind of jazzed about. But some of the side effects, which are also interesting, are that you can actually start instrumenting some pretty interesting cost controls, right? The classic example is you have 2 applications. One is a high-value application; you want to know everything about it. One is a low-value application; you don't really care whether it's up or down, right? But what happens -- and this happens occasionally in organizations -- is that the low-value application is consuming most of the cost of data collection and retention.

You may not actually see that -- you just see your storage costs growing -- but the observability pipeline makes it really easy, because it can see all the data coming in and going out, and you can relate it to users and groups, applications, whatever you want.

All of a sudden, you can start managing costs really, really well. You can go, hey, why is this obscure wiki over here consuming 50% of our storage costs at the expense of our high-value, customer-facing, revenue-generating application, right?
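A rough sketch of that cost-attribution idea: tallying ingest volume per application as it passes through the pipeline and flagging the share consumed by apps outside a high-value set. The applications and byte counts are invented for illustration.

```python
from collections import Counter

# Hypothetical ingest records the pipeline has already seen: (app, bytes).
ingest = [
    ("internal-wiki", 9_000_000_000),
    ("checkout", 1_500_000_000),
    ("internal-wiki", 8_000_000_000),
    ("checkout", 2_000_000_000),
]
high_value = {"checkout"}

bytes_by_app = Counter()
for app, size in ingest:
    bytes_by_app[app] += size

total = sum(bytes_by_app.values())
for app, size in bytes_by_app.most_common():
    share = size / total
    flag = "" if app in high_value else "  <- low-value app, review retention"
    print(f"{app:15s} {share:5.1%}{flag}")
```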

Eric Hanselman

And you can do that in a way that was really hard to do with traditional means, because if, in fact, you've got this one chunk of an application sitting there spewing out vast quantities of log information in excruciating detail -- because somebody turned on detailed or verbose logging at some point and forgot to turn it off -- now you've got a tool to throttle that and ensure it's not chewing up lots of expensive capacity on so many fronts: storage and all the processing and the handling.

Mike Fratto

Yes. Because really nobody sits on that data, right? So nobody knows.

Eric Hanselman

Well -- and hey, maybe it's a test environment. Maybe it's something you don't care as much about, or, for that matter, only certain teams are interested in it and you don't need retention for longer than a week. And now you've got the ability to craft exactly how much you retain and how you manage it.
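A tiny sketch of that retention idea, assuming a hypothetical environment tag on each event that maps to a retention window.

```python
# Hypothetical retention policy: how long each class of environment keeps data.
RETENTION_DAYS = {
    "prod": 90,
    "staging": 14,
    "test": 7,       # the "don't need more than a week" case Eric mentions
}

def retention_for(event: dict) -> int:
    """Pick a retention window from the event's environment tag."""
    return RETENTION_DAYS.get(event.get("env", "test"), 7)

print(retention_for({"env": "test", "app": "load-gen"}))   # 7
print(retention_for({"env": "prod", "app": "checkout"}))   # 90
```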

Mike Fratto

Yes. Yes. So for someone in operations who's managing the observability infrastructure -- and I hesitate to use that term, observability infrastructure, because it sounds big and complex, really cumbersome or creaky -- it simplifies tasks like data management. Then you don't have to go and tell your developers, hey, make sure you turn your machine off and turn down the logging, please, when you're done, because they're busy writing code, which is what you want them to do.

You don't necessarily want to burden them with having to manage their infrastructure. It doesn't sound like a big deal, but I know when your head is down on a project, you're not thinking about those details. You're thinking, I need to get this application written, I need to get this thing debugged and fixed, whatever I'm doing. I don't need to go over here turning software switches on and off. I don't care about that.

And so it just isn't going to get lost. You have somebody who goes in and says, hey, why don't you turn this down, or we can turn it off. He or she can collect these locally instead of them being sent off to these [ 4, 5 ] [indiscernible] and consuming resources.

Eric Hanselman

Well -- and it's one more way in which it's possible to provide guardrails for teams, so there are more things they don't have to worry about. You manage the cost and all the other operational hassle and toil associated with that, and you get the ability to integrate this into one more operationally simple whole -- or at least a simpler whole.

Mike Fratto

Yes. Interestingly, from that survey we just completed, we asked a whole set of questions around observability pipelines. Growth is increasing, which is promising, and we'll continue to see that growth over time. It's very exciting.

Eric Hanselman

Wow, good stuff. So many motivations to get into the new and wonderful world of observability.

Mike Fratto

Yes.

Eric Hanselman

Cool. Well, thank you, Mike. This has been great. And I will point our listeners to the Voice of the Enterprise study that you're referring to when it actually gets published. It's been great to get a little preview of where that's going, and there's a lot of good information to dig into once the report's actually out.

Mike Fratto

Yes. I'm looking forward to getting up close to the data over the next couple of days and putting out reports over the next 6 to 8 months. We'll have the initial draft out hopefully within 4 weeks. So yes, look forward to it.

Eric Hanselman

Cool. Yes, and hey, maybe we'll have you back on to actually talk about the details and dig into what the data shows us.

Mike Fratto

That would be great.

Eric Hanselman

But that is it for this episode because we are at time. Thanks for being on, and I want to thank our audience for staying with us. Thanks to our production team, including Caroline Wright, Ethan Zimman on the Marketing and Events teams; and our agency partner, the One Nine Nine.

I hope you'll join us for our next episode where we're going to be talking about the upcoming Hosting and Cloud Transformation Summit in a new format, a new location now in New York. I hope you'll join us then because there is always something next in tech.
