Open Source Database Performance, Dynamic Tracing, bpftrace - Percona Podcast 27

by Valerii Kravchuk, Matt Yonkovit

Link to listen and subscribe: PodBean

Join the HOSS Matt Yonkovit and MySQL & MariaDB expert Valerii Kravchuk on the latest episode of the HOSS Talks FOSS. Valerii is passionate about all things performance, open source, and databases. Listen to the database performance and bug guru talk about performance tools and methods like dynamic tracing, flame graphs, bpftrace, and more! Valerii is a Principal Support Engineer at MariaDB Corporation with over 20 years of experience in the database field.



Valerii Kravchuk

Principal Support Engineer, MariaDB Corporation

Valerii Kravchuk has been helping MySQL and MariaDB users and DBAs resolve their problems since 2005. He has worked at MySQL AB, Sun, Oracle, Percona and, since 2016, MariaDB Corporation. He was named MySQL Community Contributor of the Year in 2019.


Matt Yonkovit

The HOSS, Percona

Matt is currently working as the Head of Open Source Strategy (HOSS) for Percona, a leader in open source database software and services. He has over 15 years of experience in the open source industry, including over 10 years of executive-level experience leading open source teams. Matt’s experience merges the technical and business aspects of the open source database world, combining a passion for hands-on development with management and the leadership needed to build strong teams. During his time he has created or managed business units responsible for service delivery (consulting, support, and managed services), customer success, product management, marketing, and operations. He currently leads Percona’s OSPO, community, and developer relations efforts. He hosts the HOSS Talks FOSS podcast, writes regularly, and shares his MySQL and PostgreSQL knowledge as often as possible.



Matt Yonkovit: Hi, everybody. Welcome to another HOSS Talks FOSS here. We’re here with Valerii Kravchuk from MariaDB Corporation. Valerii, how are you today?

Valerii Kravchuk: Nice. Thank you, Matt. Nice to meet you again, online. Not during the conference, but during some personal kind of talk. I’m actually proud to be a person of interest for these podcasts.

Matt Yonkovit: Well, Valerii, you’ve been in the community for as long as I have. We’ve both been around since, you know, we fought dinosaurs together. I’m pretty sure we did. Woolly mammoths. And we went hunting; there are those evil dinosaurs out there. We’ve been around for a really long time. And your talks at Percona Live recently were really well attended. But that’s no surprise, because when we were in person (and I can’t wait to get back to an in-person conference, it’s going to be so nice) you always had great attendance at your talks as well, because your talks touch on things that impact so many people in the open source space. And that is typically finding those really difficult-to-find issues, those bugs, the performance problems that people can’t normally find. It’s really about those deep problems that no one else seems to be able to figure out in a timely fashion. And that’s really valuable to a lot of people in the community. And both your talks this time were about that as well.

Valerii Kravchuk: So yeah. I was trying to speak about different things: about bugs, for example (successfully or not, it’s questionable), about Performance Schema, about replication and whatever, many things. But for the last six years or so I have been following an idea that our common friend LeFred expressed during the Oracle OpenWorld days we spent together in 2014. He said, “If you want to talk about something, talk about GDB, your talk will be accepted for FOSDEM instantly”. So I followed his kind advice. And I had like two, maybe three talks about GDB at FOSDEM and in many other places. Then I naturally moved on to profilers, more advanced profilers and stuff like that. So, a common tool set that is available on modern operating systems like Linux and can be used not only by me, for Percona Server or for MySQL; it can be used by any open source developer or DBA, if we speak about databases, PostgreSQL or whatever, MongoDB. So it turned out that the audience is quite happy, and even advanced people, including key developers of MySQL, for example, can find out something that they had forgotten for a long time, or had never cared to study in detail in these tools. So it’s getting popular just because of these topics. So I study myself, I become more advanced in the tools I use every day, I am planning to use even more tools. And somehow it’s becoming interesting to other people as well. That’s because of the open source infrastructure we all work on. So I can speak to PostgreSQL people with examples from MySQL and, even though the systems are quite different, they find something interesting for themselves. In flame graphs, for example, as it happened at Percona, or in bpftrace. I am trying to put it in context. And they understand the context. The details are different, but they clearly understand the context. And it’s something I’ve worked on in practice; I’m trying to solve practical problems.
They feel it and they try to understand how you can move from just reading some manual or somebody’s blog to the practical usage of the tool. So that’s why, probably, these talks, I can’t say they are very popular, but they are well attended. The last Percona Live in Amsterdam in 2019 was really the case for my last GDB talk to date. People were really excited, even though it was the very last slot of a long conference. We stayed around for one hour talking with people of different origins and different backgrounds, not only MySQL. So this is how it works. It was great advice by LeFred; I’m thankful to him for this and for many other things. And that’s what I have been trying to do recently, for like six years. More and more tools appeared; the ecosystem, the Linux ecosystem, evolved over this period. eBPF became popular; bpftrace, for example, became popular and widely used, maybe outside of the MySQL context. So I am trying to be more or less up to date with what’s cool, or at least with my own interests, or at least with the interests of my fellow MariaDB developers. Because what’s cool in working for MariaDB Corporation, if anything, is working directly with the key developers. There are many, they are working on different stuff, and I have known them for decades, in person and not in person. So they may have practical problems, they may have questions, and I have practical problems. So we are cooperating. And in the process of this cooperation, it turns out that being current with the Linux performance monitoring tools is something both they and I are interested in, can use, and can mutually benefit from. And I’m happy when they use these tools, and they are happy when they can make a point with these tools.

Matt Yonkovit: And that’s one of the challenges, especially with developers, I see two things there, developers nowadays, there’s so much code, there’s so many things that they spend on features, a lot of times they don’t know how their code operates in the wild, right? It gets out into the ecosystem, and people do crazy things that the developers never intended them to do. And sometimes that exposes some really crazy bugs, or some really crazy performance slowdowns that are really difficult to find.

Valerii Kravchuk: Yes, but at the same time developers need evidence. They will probably trust my words, but it will not help them to fix the problem. They need evidence. So the tools I’m interested in may provide the evidence they need. And I try to provide it in a way they like. So I would say that over the last five years, the habits and preferences of MariaDB developers shaped the main tools I am trying to study. So it’s not like, out of the blue, I decided to study something new, no. I was asked a specific question, I was suggested a specific tool. They would like to get stack traces, for example, for everything, not the Performance Schema outputs, for whatever reason; we can argue about that. It’s clearly visible in MariaDB specifically. So I am trying to provide what they want to see. If it’s Windows, I am trying to study some Windows tools. Speaking about open source, luckily, most of them will be happy if the problem is repeatable on Linux, most of them, almost all of them. So that’s why I have the benefit of using some older knowledge and studying the really key things that are found by others, that are created by kernel developers for themselves. And maybe they are not yet so popular among MySQL DBAs and experts, so I have a chance to make them popular. And they use it.

Matt Yonkovit: Yeah, and I think that’s one of the things with a lot of companies that have evolved in the last five to 10 years. They’ve started this movement of more and more databases; whether it’s in the open source space or an open source adjacent space, we’ve started to see more companies develop really purpose-built databases. And as they do that, I think that their developers are always looking for that additional feedback, they’re looking for ways to find those problems. And I think that the position that you fill is critical to that, which is understanding both the operations and how-it’s-used side and the development side, and being able to take those tools that you mentioned and kind of merge them in so they work better together. And you can kind of be the bridge between the core developers and everyone else.

Valerii Kravchuk: Make them a part of the daily workflow of both sides. So they can speak the same language after all, without an intermediate person like me. I surely can figure out things and put them in words that are acceptable for developers in many cases, but I would prefer to be out of this picture. I would like every MySQL DBA to be able to provide a GDB backtrace for a core dump, and I am fighting for this, and they do it in production environments in many cases. Because if you can make a point and you can show how efficiently it helps to resolve a performance problem, to get the bug quickly verified, processed and fixed, people start to trust this and they use the best of the tools that open source provides. And as long as they use an open source database, for example, there is no reason not to use the tools that work best, that can be easily related to a specific line of the code. So it’s a mutual benefit for both sides. And we do not play games like “show me this or that”. I can always try to explain, show by examples, and they can verify what they send to us. We do not ask for too many details, for too much information that is irrelevant. I try to educate DBAs as well: why I’m asking, what it looks like, why it’s important, what it shows, what it does not contain. So they can be confident that they do not share their confidential customer data as well.

Matt Yonkovit: Yeah, so from what I know, it’s always easier with the full amount of detail; if you can get a GDB trace, it’s awesome. But there are companies that are hesitant to provide them sometimes. Right? I mean, it’s often a give and take, because I think there’s a lot of misinformation about what’s available in some of these traces. People, as you mentioned, do think, oh, you’re going to have proprietary data I don’t want you to have access to, or they think it’s going to severely slow down the system or even cause a stall. And are those things true?

Valerii Kravchuk: Yes, they are true. And that’s one of the reasons why I started from GDB, a very intrusive tool that really stops the process from working, maybe for a long time, and that can get into every byte of memory, and then moved from this tool, which people would reasonably hesitate to apply in production, to more lightweight tools with less performance impact, and to tools that they can quickly use themselves, verifying the data and clearly controlling that nothing confidential is put there. So if they know the tool, if they know a language as simple as bpftrace’s, for example, they can actually code their own way to collect the data, the data that they are ready to share. So even though in many cases people are already happy to share GDB backtraces, very few are happy to share the core dumps themselves. And there is some work done in the background in the MySQL community for core dumps to not include things like the InnoDB buffer pool, with all the data rows there, in many cases not compressed, in many cases not encoded in any way, just clear strings. So we do not include that in core dumps anymore. And that was a great step towards people being less hesitant to share core dumps, but the tools that we get more and more of every year from Linux are based on similar production experience from people who care about security. Then the next step will come from people who use databases in containers; it’s still a long way forward to that. So we would like them to have all kinds of tools and decide on the best tool for the job, so that they do not share what they can never make public and still provide a useful summary. This is one of the reasons behind the steps I am trying to take, from GDB, to real profilers, to lightweight profilers, and to the fine-grained control that lets a DBA write down his own program, verify it, use it, play with it, and only then share the results with me, instead of just giving away everything he has.

Matt Yonkovit: So what tools are you seeing that are starting to show up that you’re excited about? Like what are you testing now? What are you looking at right now?

Valerii Kravchuk: Based on the talks (you probably attended a couple of them recently), I am really happy that I was able to dedicate an entire talk to bpftrace. This is a tool that, on one side, is based on the eBPF framework, so it has options to access everything in a uniform way, from the processor registers to all libraries, to the kernel, to any application software, to almost every line in any function of any application software or the kernel itself. At the same time, there are many eBPF-based tools, and they are quite complicated. It is a bit hard to develop them, they end up very specific, many of them were actually created for studying the operating system, and very few are targeted towards databases, while I understand DBAs are interested in that. They are a bit complicated because you have to write down LLVM C code inside a Python or whatever script, follow a lot of conventions, and use built-in functions that you have to study. So you have to become a programmer. It’s normal, it’s OK, and everybody will have to study that. But bpftrace allows you to use a much simpler approach, with some price - it might be less efficient, less flexible. But it allows you to actually code like you do in the shell, with text processing languages or tools like sed or AWK… Familiar tools that every DBA should already have in his tool set, just to automate backup, just to automate the data collection or whatever. So AWK is not rocket science. It’s quite simple. bpftrace is modeled the same way. And it has a lot of C origins, like common constructs that everybody’s familiar with. But with that approach of applying a specific action if you see a specific pattern, it’s easier to program than writing everything down in Perl or in C. You should not care about too many details, because you have a set of predefined built-in functions. So I’m excited about that, because it will allow people to start tracing their specific things easily. First, with less impact than with perf and raw tracing.
Second, with some ways to program it. And without much risk of having a program that does not compile, or a program that does not pass verification. eBPF doesn’t allow wrong code to be executed, because it’s executed in kernel context. But when you get some obscure error message that some map cannot be resolved, well, bpftrace should prevent most of these cases. It disallows some things by having some automated workflow inside. But it makes it easy to use, both for operating system monitoring and custom monitoring and for one-time digging into a specific performance problem. That’s the goal. We would like DBAs to do what Linux kernel developers do, and what major network service providers do, giving them a tool to dig into the very core of their current performance problem whenever they need, with some coding and some flexibility. So that’s what I’m excited about. Honestly, and I said that in my talk as well, I have yet to have a real external use case, where it was really suggested to a real-life customer to apply bpftrace and they applied it and got very useful results. But we have a lot of internal success with that, I have done a lot of experiments myself, I am preparing myself. As soon as they start to use recent kernels in production, 5.x at least, I should be ready for that moment. And they should be ready as well. Also, I would like to get a collection of good and bad approaches, a collection of useful scripts, at least one per problem solved. So far, I am basing this on the requirements, requests, and questions from developers. Recently we, for example, discussed if bpftrace can be used as a base for code coverage procedures. Like, can we instrument every function in the code and count how many times it was called? Theoretically it’s possible and there are different ideas on how to do that. And I believe even that is doable with bpftrace itself. And we can have a less serious performance impact than when tracing it with perf, for example, where it’s also surely doable.
But we would like to get faster results. We don’t want to generate gigabytes of data on disk to be summarised by something else; we would like to get a simple chart that shows, for example, the specific functions we care about, out of hundreds, that have never been executed during the days of testing. So we need to monitor for days. Do not slow things down, do not produce gigabytes of output to post-process, but still get the answer: does the test set, for example, cover the functions we intended, or might we be missing some tests? So even such things, and it might be as simple as just sampling the content of the instruction pointer once in a while, and not resolving it, if possible, until you summarise everything. So just get a list of addresses and how many times you were there, and then somehow resolve it at the end to show just the result. So we even get into such things that are not at the moment interesting to any DBA. But in the process, we work on the priorities, we understand the limitations of the tools, we understand that each of the tools I already mentioned is the best for some specific part of the job.
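The "collect addresses, resolve at the end" idea from this answer can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any tracing tool or the MariaDB team implements it: the sampled addresses, the symbol table and the function names (`parse_query`, `optimize`, `execute`) are all hypothetical. Note also that sampling can only show that a function was never observed, not prove that it never ran.

```python
from bisect import bisect_right
from collections import Counter


def count_samples(addresses):
    """Cheap step done while sampling: count raw instruction pointer values."""
    return Counter(addresses)


def resolve(counts, symbols):
    """Expensive step done once at the end: map addresses to function names.

    `symbols` is a list of (start_address, name) pairs sorted by address;
    an address belongs to the last function starting at or below it.
    """
    starts = [start for start, _ in symbols]
    per_function = Counter()
    for addr, hits in counts.items():
        i = bisect_right(starts, addr) - 1
        name = symbols[i][1] if i >= 0 else "[unknown]"
        per_function[name] += hits
    return per_function


def never_seen(per_function, functions_of_interest):
    """Functions we care about that did not show up in any sample."""
    return sorted(f for f in functions_of_interest if per_function[f] == 0)


# Hypothetical symbol table and samples, for illustration only.
symbols = [(0x1000, "parse_query"), (0x2000, "optimize"), (0x3000, "execute")]
samples = [0x1004, 0x1010, 0x3008, 0x0500]

per_function = resolve(count_samples(samples), symbols)
missing = never_seen(per_function, {"parse_query", "optimize", "execute"})
```

Keeping only a counter per address during the run is what keeps the output small; the symbol lookup happens once per distinct address rather than once per sample.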

Matt Yonkovit: And that’s an interesting use case that you mentioned...

Valerii Kravchuk: The time on a call with, for example, breaks cases when it’s slow, unacceptably slow, at least the way we were able to know.

Matt Yonkovit: The thing you mentioned is actually a very interesting use case. Because if you think about it from a development perspective, as you write your test cases, you make assumptions that certain functions are going to be called and used. And they might not always be. So there are some things that are only triggered under certain events or certain combinations. And it’s useful to know that, as you go through your test suite, you never hit some core functionality.

Valerii Kravchuk: Sometimes developers are surprised. Or, even more, users are surprised: how is it possible that in mature software, in a part of it that has been there for, I don’t know, 15 or 20 years probably, like core protocol parts or whatever, we find some new bug? How is it even possible? Because the test coverage for that specific part of the code had never been 100%, or it was at some stage, but then this coverage was removed, for example, while working on some new features, while speeding things up, while supporting different platforms. Things might be removed without anybody noticing until they break. We would like to prevent that. So we are going even that far. Surely, in the process, what MySQL and MariaDB DBAs will get, at least, is that they will be prevented from going the wrong ways. For example, as I did, and I speak about that, trying to quickly trace every memory allocation with bpftrace. That’s doable, but it slows things down to the point of not being practical. It’s great that I tried it myself way before I ever suggested it to anyone in production, because getting your results in 10 minutes while sacrificing three times the performance, roughly like that, is awful. I would not ever want to suggest that to anyone. So this is what I’m working on. I’m trying to be ready. Yeah. There are good things and bad things about the tools I already used. So that’s the point.

Matt Yonkovit: I think there are two things there, right, because you’ve got the benefits of these tools for developers themselves. And you’ve got the benefits for DBAs. From the developer perspective, it’s a whole different thought process. I remember back in the day, when I was doing performance testing with InnoDB, I would actually include things in the code to output the time spent on disk, like individual disk reads and things, or how long it took in certain functions, so I could try and optimise certain things. I had to compile custom code in order to make that work. So having those tools available to you is a real benefit from a development perspective.

Valerii Kravchuk: That’s the simplest thing you can do with bpftrace: you can measure the time spent in each function you want, from entry to return. It does not matter if it’s instrumented with Performance Schema or not, you just need to know the function name. And you just add two user probes and measure time. So you can do it instantly, without a single byte of code changed; you can do it on the fly. And the impact is there only while your test is running. There is only some minor impact if you instrument one function, as long as it’s not something like malloc that is called thousands of times per second.
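The logic behind "two probes and a time difference" is simple enough to sketch in Python: remember the entry timestamp per thread, and subtract it when the matching return is seen. This is a minimal simulation of the idea over an already captured event list, assuming a hypothetical `(timestamp_ns, thread_id, kind)` event shape; it is not bpftrace syntax and it ignores recursive calls.

```python
def function_latencies(events):
    """Pair entry/return events per thread and return durations in ns.

    `events` is an iterable of (timestamp_ns, thread_id, kind) tuples,
    where kind is "entry" or "return". A return with no recorded entry
    (tracing attached in the middle of a call) is skipped.
    """
    started = {}    # thread_id -> entry timestamp
    durations = []
    for ts, tid, kind in events:
        if kind == "entry":
            started[tid] = ts
        elif kind == "return" and tid in started:
            durations.append(ts - started.pop(tid))
    return durations


# Hypothetical events for one traced function, for illustration only.
events = [
    (100, 1, "entry"),
    (150, 2, "entry"),
    (400, 1, "return"),
    (450, 3, "return"),   # no matching entry was seen for thread 3
    (900, 2, "return"),
]
durations = function_latencies(events)
```

Keying the start time by thread ID is what lets concurrent calls from different threads be measured independently.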

Matt Yonkovit: So let’s think about this from the DBA perspective, though. So you have a system that’s slow and you can’t find the cause. You’ve got millions of things running, and it’s really difficult to find the needle in the haystack. You can use this as well. How do you go about figuring out what you want to trace?

Valerii Kravchuk: These new tools do not replace the historical procedures and common sense that DBAs have. So we should understand that in the majority of cases, it’s about slow queries, queries that are executed somehow wrong. So the first thing you need to understand for a problematic load is what kind of queries are executed. In most cases, you will see the very bad ones in the slow query log. But what if you need to see each of them, or each of them for a specific short period of time? There is the general query log, and there is a cost of enabling it and writing to disk. You can try to capture packets on the wire with tcpdump; there is some small cost, a lot of preconditions, and a lot of work to do. Thanks to Percona, some of it is automated already with pt-query-digest. But you need that in some cases, and I cover that case: the very first test with every dynamic tracing tool I try is to capture the queries executed. You may want to go as far as a specific thread. And with bpftrace you can: if you can identify the thread ID, you can instrument only a specific thread, isn’t it cool? It’s like having the query log very focused in time. So, my idea is that you still should understand what may go wrong. You need to know how resources are used from the operating system side, if you’re starving on something, if you are writing so much to disk that there are unacceptable disk waits for even a simple, single page read, right? But again, this set of tools, and bpftrace itself, can be used to study that as well, on the operating system side. So it has the potential to be the one tool for all aspects of your job, from the kernel to the SQL query and everything in between.
So it does not replace anything. Actually, it’s augmenting an existing set of Linux and Unix operating system utilities with the ability to do fine-grained tracing and measuring of specific things in your code. So you can try to see the problem at the operating system level first, like too high CPU usage, then trace it down to specific slow queries that do a lot. And then try to figure out when these queries are executed, what they are waiting for, for example, and what they are spending their computing time on: get stack traces, aggregate them and count how much time is spent in a specific function call. And it can be your stored procedure computing something, or you can hit mutex waits. And there are other tools to trace them as well. So it’s not like Linux dynamic tracing is replacing everything for every purpose. No. Just recently my customer and friend was quoting my blog post about new features in Performance Schema, finally added to MariaDB, as something that he found very useful in the context of MySQL to trace a performance problem. So, I understand that as well. So I’m trying to use different levels of tools. I switched to dynamic tracing, GDB and stuff personally, for my studies, when I was not able to get down to the source of the problem with Performance Schema. Some things are not instrumented by default. Some things I was probably not good enough at using. So I hadn’t got the answer. But I got the answer from perf, and that was a kind of new perspective. So I was still looking at the same problem. I have seen this slow query, for example; it’s a classical example that I have used for four years already in my talks: a primary key lookup takes 13 seconds. On an integer field, the primary key, it’s 13 seconds. How is that even possible? Sometimes not, sometimes it’s instant. Sometimes it’s 20 seconds, or whatever. And it can be reproduced, but how to see where the time is spent? I wasn’t able to do it until I dug into the source code,
for the simple reason that some part of the code, getting statistics via an index dive in InnoDB, was not instrumented in enough detail to show where the time is really spent. It was showing that it was spent by the optimizer. What might the optimizer really do for 13 seconds for a primary key lookup? There is nothing to do there. But it was trying to read the value from the system, and the system was preventing this request from being answered in time. So it’s quite practical, but none of the new tools redefines the approaches. Usually there are classical approaches: you try to see the performance problem from two sides, from the application side (what is slow for your user) and from the operating system side (how many resources are used, are they used at all), and then somewhere in the middle you dig into a specific part of the code. So that’s my point. And I’m trying to show that it’s not something new to you; you did it before, you can just do it more efficiently.
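The "get stack traces, aggregate them and count" step mentioned in this answer can be sketched as follows. The semicolon-joined "folded" form is the input format commonly fed to flame graph scripts; the stacks and function names below are hypothetical, and the counts are numbers of samples, which only approximate time when sampling happens at a fixed rate.

```python
from collections import Counter


def fold_stacks(samples):
    """Collapse identical stacks into 'outer;...;inner' -> number of samples."""
    return Counter(";".join(stack) for stack in samples)


def inclusive_counts(folded):
    """Samples in which each function appears anywhere on the stack."""
    totals = Counter()
    for stack, hits in folded.items():
        # set() so a recursive function is counted once per sample
        for frame in set(stack.split(";")):
            totals[frame] += hits
    return totals


# Hypothetical stack samples, outermost frame first, for illustration only.
samples = [
    ["main", "do_command", "optimize"],
    ["main", "do_command", "optimize"],
    ["main", "do_command", "execute"],
]
folded = fold_stacks(samples)
totals = inclusive_counts(folded)
```

With real data, a function such as the index dive in the 13-second primary key lookup example would stand out as the frame that appears in most of the samples taken while the query is stuck.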

Matt Yonkovit: And so the recommendation here, what I’m hearing you say, is that there are lots of tools that are already available. And that’s where you should start, with those easy ones; whether that’s Performance Schema or the slow query log, or just the statistics, you start there. And then as you find those instances that are a bit more problematic, there are tools like bpftrace. And then from there, you could go even a step further and start looking at GDB. So you have this toolset that narrows things down as you eliminate the easier things, but always start with the easier things first.

Valerii Kravchuk: Well, finally, the easiest thing: there is some bad query in the database. If we speak about the database, there is some query that is too big, too slow, with a very bad plan, or just a query that is not needed or is executed too often. In the majority of cases, it’s about the bad queries. And what’s even better, it’s about some recent changes, either in the volume of data, or in the plan, or in the settings, that made these queries bad. Because it’s hardly the intention of any developer to create a very bad query. No. We should assume that they are humans who are capable of checking stuff. And they checked it, and in their case it worked acceptably. But then something changed. And that’s what we are looking for in support. We are looking for what changed. The world was beautiful, everything was okay till yesterday, but then it was broken badly. What changed in between? Yeah, it does not happen out of nothing. If we do not know, we move further. Do you know how long it takes, or what you have to do and for how long, to see things changing from bad to worse? And if there is an answer to that, okay, then we will set up some careful monitoring to try to pinpoint it. And we just have a new set of monitoring tools, nothing more. As for the problem-solving approach, you can go top down, you can make wild guesses, but at some stage you will have to verify your theory, and bpftrace specifically, and dynamic tracing tools in general, are good for checking theories. Fast enough and in a non-intrusive way. If you’re a developer, you are checking a theory by changing the code. Right? You can afford that, but we can’t.

Matt Yonkovit: That’s true. Well, Valerii, thank you for sitting down with me today and chatting about this. I know that your sessions are available on YouTube right now, they’re available for everyone. I also know you have a director’s cut of your talk. It’s a little longer.

Valerii Kravchuk: Yeah, we’ve never discussed it before. But I made an assumption for all my talks at online conferences, which have been happening for two years now: if some recording of mine was not accepted by the conference organizers, because they considered it to be too short or whatever, I am free to share it. I never share the real thing; that is up to the conference committee to show. But these recordings are a bit different, you know. Sometimes they show my way of thinking, sometimes they show how bad the talks were initially, and in some cases they have details that were missed. I can even make a later recording one day. But yeah, I am doing it. I have my channel and, for online conferences, I am sharing the recordings there as well. Not those that you will share for Percona Live.

Matt Yonkovit: I encourage people to watch the full version, especially if they’re interested. It’s really good content. Also, you can check out Valerii’s blog over at MySQL Entomologist, which always has some excellent deep dives into some of the things we’ve talked about.

Valerii Kravchuk: I probably have to rename it one day: it’s less about MySQL now than ever, and I’m less about bugs now than ever. And it was actually Percona who turned me from a person who always looks for problems into a person who has to look for solutions at times, because that’s what Percona was about. They never cared for “Okay, you are great at explaining why things cannot be done at all that way for the customer”; it was “but what solution can we offer?” That was a different experience for me while at Percona, and I keep it. My blog started to change. People are interested in both sides, and I’m also interested in both sides. And I’m quite thankful for this wide change in mindset. It helps.

Matt Yonkovit: Good, good. Well, Valerii, thank you very much for being on the podcast today. I appreciate it. And everyone, do check out those videos and check out Valerii’s blog. And don’t forget to subscribe for more awesome content like this. Bye, Valerii.

Wow, what a great episode that was. We really appreciate you coming and checking it out. We hope that you love open source as much as we do. If you like this video, go ahead and subscribe to us on the YouTube channel. Follow us on Facebook, Twitter, Instagram and LinkedIn. And of course tune into next week’s episode. We really appreciate you coming and talking open source with us.
