Percona Community Live Stream for MySQL will focus on benchmarking a MySQL database. This is a part of our bi-weekly meetup to develop open-source database skills. Come up with your questions and get an answer straight forward to all your questions.
VIDEO
SLIDES
Check out Marcos’ slides HERE.
TRANSCRIPT
Mario Garcia:
Hello, everyone. Thank you for joining us today. I am Mario Garcia, I’m a Technical Evangelist at Percona. Today we have Marcos our way, Principal Support Engineer. And the topic for today is how to benchmark your MySQL database. Thanks, Marcos, for joining us today.
Marcos Albe:
Yes, sir. My pleasure. And yeah, I’m going to be talking a bit more about how to think about benchmarks in general. And then a little bit about the MySQL part. So let me share my screen. I always forget how you do this here. I don’t use restroom so often. So welcome, everybody.
Again, I’m going to be focusing more on how to think about metrics. And then we’re going to see some of the actual commands. And I will show you some I’ve done in the past. So let’s see.
What is the benchmark? According to the new hackers dictionary, also known as The Jargon File, it’s an accurate measure of computer performance. And so I started with these as an intro, because I want you to be aware that benchmarks can be very, very misleading, very misleading.
And there are two ways they can be misleading. Either they were prepared and executed, and presented by someone who didn’t have the technical expertise to properly design and interpret the results of a benchmark, or because someone in the marketing department thought it was better to, you know, create a benchmark that favorites your product versus a competence. So, you know, again, always take benchmarks with a big pinch of salt. And also, you know, distrust your own benchmarks. Like you have to be critical of your own benchmarks and make sure you’re, testing the right things. And looking at the right things. That’s why I wanted to start with this. Because, you know, it’s very common to see benchmarks that are shenanigans. So with that, in the web, let’s start. I’m going to talk, why do we benchmark, what is that we actually do the benchmark on, we’re going to tell things that we should keep in mind when doing benchmarks. And we’re going to talk about things we should avoid when doing benchmarks. We’re gonna see how actually what types of benchmarks there are and how they are done. And then, like one or two examples that I have around that I think are worthy. So why do we embark on spending time on something like benchmarking? Well, marketing, of course, like, it’s a valid reason. And, you know, when done properly, it’s not horrible, but you know, to say, okay, MySQL is faster than Postgres, or, you know, Postgres is faster than Oracle, whatever, you know, there is always the price involving, you know, you have to include pricing in your benchmarks. Like, if I will compare, you know, cloud instances, pricing will be a very big factor. So, you know, I will not only talk about the row performance, but I will certainly talk about the cost of things. We also perform benchmark to understand the effect of tuning, so when I tune something, or if I upgrade MySQL, if I upgrade my OS, if I change the setting in my OS, does it change anything? Is it positive? Does it have a positive impact, or is it a negative impact? So, understanding the differences of our future operations, again, upgrades, tuning, etc. It’s one of the big reasons to do benchmarking. Also, for scalability, when we want to understand the limits of our hardware, the limits of our software, and we want to perform so-called capacity planning, then benchmarking will certainly help us it, it is probably one of the difficulties, the most difficult type of benchmarks because for proper capacity planning and for proper scalability analysis, you really have to have very good benchmark that is close to your production application. So, you know, if, if the benchmark I run is nothing like my production, then it’s hard to actually make any sense of the results, even if I do everything proper, if it’s not like a good equivalent of my production workload, then the benchmark is going to be meaningless. But if occasion, and with this, what we’re trying to say is, let’s imagine I’m suffering some performance issue. And you know, I want to make sure my disk is in good shape, or my network is in good shape, or you know, that my memory, it’s okay. That it’s providing me the adequate response times and throughput. So, you know, sometimes you run your software, and you are not sure if it’s the software or the resources that are suffering performance degradation. So benchmarking the resources, it’s a very good way to understand that. And also benchmarking your software on a different set of resources on a separate machine. It’s also a good way to understand and troubleshoot. So troubleshooting, it’s another good reason. Reliability, is something few teams do, I guess. But it’s so useful. And it’s basically, when you’re going to release your next application version, part of your testing should be benchmark of your application, it should be a real replay, or a real simulation of your application. And those simulations, you know, even when they’re not going to be identical to the production workload, they’re at least going to be identical among themselves. So I can always compare with the previous iteration of that testing. And so I can say, Okay, this version of my libraries is still as good as the previous version of my libraries. And I will, you know, have a good understanding of whether there are regression in performance or not, or if, you know, at least, I’m going to discard the obvious issues, there’s always the chance that, you know, once you’re in production, things are going to be somewhat different. But having a base reference, it’s, I will say, important, and very desirable. And, you know, also for fun and curiosity, like, what does this and, you know, if you are actually in that group of people that you’d like to do benchmarks for fun, you can email me we have a job for you, just let us know. And then, we’re going to talk about what, what exactly is that we’re going to be benchmarking asides of the funny definition I tried to give when we started, it’s on your workbench, when you’re a carpenter. You have marks on your bench that help you very quickly measure the length of a piece of wood. And this is where the word comes from. And actually, the meaning is compare, right? Like, benchmark is Let’s compare. So what we benchmark is, what we compare is the performance of the two things that exist in computer basically, which are resources and software, so hardware and software, and we’re always going to be testing dos very much. And basically, the only thing I can think you could benchmark, again, there is always the cost. So, you know, the relationship between performance and cost is part of a benchmark, for sure, you should always consider that. There are stories of vendors that make, you know, not false claims, because the benchmark is correct that they never tell you they spent a million dollars in hardware to get those results. And they showcase those results as a generic, you know, thing that will be running on whatever hardware most people have. But in the end is going to be, you know, only true in the 1 million or hardware. So, costs hardware and software are the things we are going to be measuring. So the resources, CPU, file memory, and also when I say resources that are things like buffers, or threads, things that are facilities or things that every piece of software will use, again, the P threads subsystem, Linux is something every piece of software around there, or at least multi-threaded software reviews. So understanding if the P threads implementation in my OS is more reliable or more responsive than other implementation is interesting. mutexes are the same, like the fast mutex implementation, is it better on the service or the other? Those are also called resources in our world? And then the code, which is my own code, like the code I wrote that will use File CPU memory, threads, mutexes, etc. So what else? Let’s talk about a bit about what should we do and what should we do not. And here are the key things, I believe, should be in every benchmark that wants to be successful in terms of providing useful, meaningful results. So, first of all, you have to make it repeatable and scripted, make it a script and collect the details and put them in a way anybody can go ahead and recreate that benchmark that is really important, I think that’s the most important piece, like, I want to repeat this benchmark. Next time I have a new piece of hardware next time I have a new library version, next time I have a new version of my software. So repeating, being able to repeat the benchmark easily is crucial, then, of course, observe it, you run a benchmark, and you don’t collect data to date, nothing like you just spent a lot of electricity, you know, burning CPU time. But what is the result? So to actually make useful benchmarks, we need to set up proper and consistent monitoring across all the benchmarks we run. And if I run different benchmarks, to measure the same thing, I should always monitor the same metrics and try to present the same results. So monitoring with something like you know, could be PMM, or it could be SATA, or it could be you know, anything you have that will collect metrics about the US performance, this is the resources and that it will also collect metrics about the software, for example, in MySQL. I will try to collect if I didn’t have PMM I will collect show global status samples, I will collect show engine InnoDB status, I will sample the performance schema for mutexes information. And you know, I will just in general gather metrics on time sampling basis, so I can later understand the effect of the different settings or hardware or versions. Again, related to scripting it you have to make it easy to rerunю If I have to spend 30 minutes of my own time, preparing the benchmark is going to be a very unpopular benchmark among your peers. Ideally, it should really be like a single script with a configuration file or a few parameters that we can quickly type in and say, Okay, this is what we need. Just run it, and I can turn around, let it run for a couple of hours, come back and see the results. So you know, that’s an important part is making it easy to read one. Another crucial bit, and this is really important, is test one thing at a time, don’t change 20 variables, and do another benchmark, then change another thing variables and do another benchmark, change one variable tested, change another variable tested, so you can understand the individual effects of things. Otherwise, it’s going to be very confusing. And it’s going to be super hard to reach conclusions as of what produces what perfect. Benchmarking is hard. It’s a long, cumbersome road. And, you know, like, perhaps you need to test the same things, even in different order to really understand what is going on. And, but, you know, if you test more than one thing at a time, you’re never gonna find out what happened.
Then be realistic, try to make the benchmark. So that is, to some extent, comparable with your target load, or, you know, again, if I want to benchmark for my dB, and I benchmark on the most expensive t instance. Then, you know, I will be running on an instance that is 1/10 of the size, will, you know, I won’t have the same results. So I need to be realistic and choose the right instance type. And do I expect to have 20,000 customers, and how many application servers will you know, like, I will try to dimension the clients for the benchmark, similar to the numbers, we project for threads coming from the application servers and the clients. So, trying to not over dimension or under-dimension things, or use different resources than what we expect to use, in our encase, in our real production use case.
Then sanity checks, this is somewhat related to being realistic is like, in my benchmark, I see numbers that exceed the capacity of my storage, for example, you know, like, I know, my storage can do 1000 IOPS of 16 kilobytes per second of random read rights. And I see 5000 It was like, something is wrong, like that number is wrong. And the likely reason I am hitting some cash. And so I’m not testing the IO, I’m testing the cache. So sanity checks, help us.
You know, avoid lying to yourself. Again, you know, it’s a way to not fall into your own trap. I’ve seen this dozens of times precisely with that example, people trying to test their IO. And thinking they were testing their IO, when in fact, they were testing the file system cache that Linux puts in front of the IO, and showed they were doing like 10,000 I ops, but in reality, they were doing only, you know, 20 I ops and the rest were hitting the cache. As soon as we disabled the cache, they could see the real numbers. And then those real numbers verify if the problem we were telling them the customer had so this is really important. And then, again, as you observe the data and gather all these and you have fancy graphs, it that’s all fine and dandy, but then we have to reach conclusions.
And it’s important that when we set to do benchmark we have goals, I guess I should have put that here as well have clear goals of what you’re testing and avoid trying to have expectations. So my goal is to understand the different performance of memory allocators. But I will have blank expectations about that I will I will not set myself to say, Oh, I believe J malloc. Should be better, or TC malloc is gonna cause more fragmentation. No, I rather will say, you know, okay,
I got a test. And I’m going to analyze the allocations with different block size, and I’m gonna see how they behave with rates, see how they behave when I do sequential writes. And, and then I’m going to compare those and arrive to some conclusions. So arrive into conclusions based on the data you collected is important, and it’s holistic part of benchmarking, and is probably the most difficult part of benchmarking, because it requires deeper expertize on the software, or the resources you’re benchmarking.
So if you don’t truly understand how MySQL works, you’re gonna see something, a variable change, you’re gonna say, Okay, I’m going to enable the credit cash, bam, and I will, you know, run a few benchmarks. And when I see the results, it’s gonna be hard to interpret, like, why does the query cache show, so very good results for reads when it’s running on, you know, one or two threads, and then it doesn’t work for, you know, the 200 threads. So that type of conclusions like understanding that requires investigation and requires understanding of the system, you’re benchmarking. So if it’s your own application, it’s easier if it’s a big application, like MySQL might be harder. Finally, presented or not finally, but presented, once you have your those conclusions, you should present them in a way that they are clear that the methodology is clear, the objective of the benchmark is clear. And the results are clear and easy to interpret by anyone.
And finally, of course, share it. Again, you already have scraped, you have your presentation, it makes for a great blog post, go ahead and share it. So not only can others benefit from it, but also others can do independent verifications of your own benchmarks like is the benchmark I wrote, reasonable or has some mistake. And sharing with the community, especially in the open source community is going to give you answers people is eager to take a look and is eager to try on their side. So this is also, especially in the in our open source world. It’s a crucial part of benchmarking, I believe, to have independent modifications, it makes it look more like science. And all right, and these are the big dues of benchmarking. Let’s go with the don’ts. Oh, sorry, no, one more thing.
I’m going to show you an example of how when I say test one thing at a time, like a very basic idea. And I think we will have time to do a demo of this in a moment. But basically when I want to test one thing at a time, what I will do is oops, sorry is I will create a loop like this, where I have the variables I want to change in my benchmark. And if I and I will set the different values for those variables. For example, here, I am going to test this benchmark I’m going to run it seven times with different threads counts 148 1632 etc. Since and I also want to exercise the dictionary, the data dictionary, so I’m gonna test with 110 100,010 1000. And then I’m also want to test with different IO schedulers. And so some stuff is from the database, other stuff is from the clients and other stuff is from the OS. So I’m, you know, changing things from three different plays, but I’m changing things one at a time, because all these tests that the combination of all these tests is gonna, in the end, I’m gonna have 105 results for this, I’m gonna have seven times five is 35, times three is 105. So I’m gonna do 105 different benchmarks.
That and each is going to be all the possible combinations of those, which again, is going to allow me to test one thing at a time. And again, while I will just set those variables in these loops, and then I will just use them in the body of my test, I will just, you know, finally use these variables I O scheduler, I will set the Q scalar, then I will take tables, and it will sit, for example, for sysbench. If instead of my test, I will use sysbench, I will use threads and tables. And I will also have neatly organized results folder. You know, in this case, I decided, Okay, I’m going to categorize my testing on the schedulers because that’s my main target for this benchmark was the schedulers. And then I put my test name, the name of the test, I’m running, and I’m saying threads, and then I will hear it’s off the screen, I’m sorry, but basically, I also have the tables variable after that dollar sign. So then I can, I’m gonna have 105 of these files. And I can extract data from there. And I can create plots, and graphs and summaries and averages and anything I want to help me understand the results and right to conclusions. Often number of results. This simple one, which this is a very simple benchmark is giving you over 100 results, it’s going to take time to go through that stuff. And the right thing to do is have some scripts that help you extract the data from all those results. And to help you present it. So scripts that automate the initial presentations for your analysis. That’s important to have as well.
Usually, it’s just, you know, grep, extracting aberrations, percentiles and totals. So you can present those in graphs. And with the plotting, you can say, Oh, look, you know, like this or this and that.
Mario Garcia: There’s, a question. If, well, if there is a way to keep the database running when doing major version upgrade, if there’s a way to keep the database running, when doing a major version upgrade, for example, or when upgrading MySQL. Oh, but You mean you want to do, so you want to view the to keep your database up and running while doing a benchmark? Yes.
Marcos Albe:
Yes, you can. There is a way to do that. I will mention it is we’ll see it in a minute. I don’t have a demo of it. But we have good blog posts in our blog about that subject. It’s absolutely possible. And I see Maxim Larin is here. We are pleased to see you here, man. Total hammer.
Okay. Again, hereю I’m just showing you things you shouldn’t do. So this is things we should avoid. And that if you do these, you’re gonna not going to have a good benchmark or you’re gonna not be very popular in the open source community. The first one and it’s common, it’s something we are humans. And we are tendencies to satisfy ourselves. So sometimes we try to specialize. Or, you know, we introduce a BS in our benchmark. You know, if I have a very fast hard drive that I don’t know, to x, the speed of sequential reads you know, I will try to create a benchmark that is all about sequential reads. But in reality, like reality is that most people suffer with random rates, and that most databases will do random rates and not run sequencer rates. So creating a benchmark that purposefully focuses only on sequential, it’s actually a BS, and sure, it might be technically correct, but it’s meaningless for the larger audience, and it’s not really going to result in a positive experience. If I tried to use your hard drives, you know, continue with example, if I use those hard drives, and I put them on my database, I’m not going to finally have a positive experience. So not a good idea. The other one is during the benchmark, you might see errors, you know, I don’t know you depleted your file handlers, you will see too many files open or too many connections, or you will see the disk full, you know, ignoring those. It’s a bad idea. And you should always work to understand why you have errors. And then the other one is outliers. Like why do I have like 10% of my load that returns in, you know, 10 nanoseconds, and then 80% of my load returns in like 1.5 milliseconds, and then another 10% of my load runs in like one second? Well, you know, those outliers, the small ones and the big ones, it’s important to seek an understanding of what mechanisms are behind them, like you said, because I’m hitting a cache, is it because there’s another workload, disturbing my benchmark? Is it because I buffer fills up, and I am not properly configuring things еo match a realistic environment? In any case, we need to look for the outliers. And, you know, for things that are way under, let’s say, the five percentile and way over the 95th percentile, you know, like, yeah, why, like, try to see why. And, of course, try to not test the wrong thing. So if I’m running Sysbench, for testing IO, I need to make sure I create a dataset that is large enough, that is not going to fit in memory, because otherwise I’m not testing the IO. I’m just testing the memory on the CPU. And at the same time, if you only want to test the IO, you should probably try to reduce the things in MySQL that will prevent the IO from being saturated. Because we do want to reach saturation points. Not not immediately, we want to ramp up towards saturation. But we want to reach saturation. So like, I don’t know, I will say don’t set a thread concurrency limit on MySQL, or perhaps you know, if you see that concurrency is so heavy that is causing mutex contention, then do set some thread concurrency to avoid the mutex has been a problem or you know,
things that will reduce like if you want to test I will try to set full durability for MySQL instead of relaxing durability, which will avoid the most expensive Fsync calls. Again, try to design and think out your benchmark in a way that reflects the reality of production as close as you can.
Okay, types of benchmark, this is just a very simple list is basically, that the micro benchmarking things like synthetic benchmark things that have some inspiration in a generic reality, but that they don’t necessarily match the reality of your database workload. These are things synthetic, and these are the least similar to your production, you know. So, for this, we have tools like Sysbench, MySQL, slapp, and all the TPC suite.
And these are easy to run, easy to configure, like, you know, you can just run a loop, like the one I showed you earlier. And with that, you will, you know, have a reasonable idea, super useful to compare things like hardware, or OS settings, you know, like, because that test should be the same everywhere.
And, you know, I can say, Okay, this hard word was better for this test. And that’s going to be realistic for that test. So, they are not useless, but they are very far away from being realistic. So, you know, this is to understand individual components of a system. And the difference in tuning single components, then simulation, you know, that will be something new, right? This is usually done with custom scripts, we have people that has, you know, API’s to access their database, and they will create sequence of API requests that closely match what production looks like. So they this is actually really nice is basically, you first monitored your through production database, and you will categorize is not the word you will try to understand how your load is composite, like it’s it, mostly inserts, mostly deletes, what type of queries.
And with those characterizations, that is the word. So you will characterize your workload. And you know, again, if you have an API, you will say, okay, during the regular normal workload in my website, or in my site is I hit this API endpoint 1000 times per second, this other endpoint 25 times per second on this other endpoints 500 times per second. And so I will create a benchmark that will do that it will hit a 1000 times per second, B 25, and C 500. And that will be a fairly reasonable emulation of production workload. And that’s actually much more helpful to help you understand your reality, the reality of your application. And then we have like direct replay for web servers. And I guess for Mongo as well, I don’t remember truly, but we’re talking about MySQL. You could do TCP capture and replay in MySQL is not so easy to do TCP replay as far as I know. So what we do is we capture everything in the slow query log. And then we use credit playback.
So query playback, it’s in our Percona Lab GitHub repository. I can post the link later. But basically, that the, the tool, you know, will try to
replay the queries in the same order they arrived with the same rate they arrived with, and they’re going to be exactly all the same queries you got in production. So it could be pretty close to reality. You know, when you’re then replaying on a different configuration, it’s going to happen that queries are not going to replay exactly at the same speed. They want overlap the same way. And the difference, there could be very different effects on locking contention and everything. But it’s a pretty good testing and then production. And here we can to the IT viewer question.
Mario Garcia: And how do you test in production?
Marcos Albe: Well, you could, you know, say, Okay, if I have a couple of hours at night, I can Ranch, I can run read only benchmarks, I can actually use Sysbench to test the individual memory and the individual CPU file your components of my production environment, but testing realistic workload on your production machine is not 100 persons possible because that will include rights, and those rights will be distracted for your data. So I don’t think you want to do that. Rather, what we have is ProxySQL mirroring. And with enough resources with a big enough ProxySQL mirroring will result in a very decent testing.
Basically, this goes like you have your clients and you have your proxy, and the proxy is going to send the queries to both the production and the testing the benchmarking instance. And you’re going to have exactly the same traffic on both. And again, that’s like the most that’s as close as you can get to being realistic, without actually, you know, changing parameters in your production database life. This allows you to have two machines, two identical machines, same versions, or different versions, if you want to test versions, or different machines, if you wanted this hardware. But again, it allows you to have one additional instance, that is going to get the same traffic with the exact same arrival rate. And that is really the best testing we can probably do with MySQL, it’s expensive, in a sense, because you need to have ProxySQL machine. But you know, it’s worth the effort and the cost because it will allow you to test very nicely. And again, you can test different hardware, different versions, different settings, and have PMM monitoring that testing instance. And you know, it’s really useful, like, if you’re proficient with data analysis, that can help you bring your actual real instance up, you improve the performance in your real instance a lot, because you could do a lot of experimentation and verify that things are running better on the mirroring instance.
And finally, help No, so that was that. So, finally, let me show you a quick benchmark I did some time ago, I was bound to present this Sunday and you know, I never have enough time. Basically, when NVMe cards when SSD cards were introduced it we change it one piece of the IO stack for a new one and with this Linux introduces ban bunch of new so called IO scheduler and we were bound to give it a try give the the scheduler set right. So, you know basically, here is my basic presentation of the benchmarks we did back then. So I present the hardware. So if you want to verify, you know, you will try to get similar hardware to what I use to make sure you know we are doing equivalent testing. For whatever reason I failed to keep the configuration I must have the configuration in the folder. I couldn’t find it. So I apologize. Basically here. I should present the MySQL configuration I was I had it used back then. And then I present the very simple script. I use it for the benchmark. I have some settings. All adjust. And then I, like I showed you before I will have my variables, you know, I have my variable scheduler, I have my variable Sysbench threads. So I want to see, my goal is to see how the IO schedulers react to different levels of concurrency. That is the goal of this benchmark. So, I go, I iterate over these four types of benchmarks. And then I will iterate over these six thread concurrency levels. I do some preamble here, I configured the scheduler, here, I dropped the caches to make the testing fair. So every test I do is going to start from the same clean slate, there is not going to be a cache that was generated by the previous run instance, this is important again, this will introduce BS, right, this will introduce an anomaly like the first test was very slow because it didn’t have a cache. And the second test will be very fast, because it already had the cache enabled the cache warm it up, I mean, so to make sure all the tests are the same, we drop the caches in this case. In other cases, you might need to do other stuff. But what it’s important is clean up and make sure every test is starting from the same clean slate, then I write some message, I have some human-readable string that is going to help us understand what test is currently running in the system in the bench. I was using PMM, I will show you in a minute. And PMM has this beautiful functionality, which is called annotate. And it allows me to introduce a marker in the graphs when there is an event. So I was using that to I was using annotations to know in my graphs where one test started, and the next one finishing when one test finished it and the next one is started. And finally I have my TPCC a script a bunch of connection parameters, then I will start configuring the test is going to run for one hour, it’s going to gonna run with the number of threads. So this is one of the variables. And this was the other variable. So this is this variable, right systems threads. And then tables is this variable. So I’m always running 100 tables, I, we determined it that was a fair amount, and that it will not introduce much variance in the in the benchmark, so it was fine. Then I’m taking one second interval reporting, which is going to allow me to create a nice graph of one second apart events if I want. Then Sysbench scale is the ratio at which things arrive to the database. And then I run and again, I am careful of sending the output to a file that late meet that will later allow me to identify to which test it belongs. So I will keep my variable names attached to the file in some way, either inside the file, or in the file name. I prefer the file name, you can do whatever you like. And let’s say that this is the whole benchmark and this benchmark was actually useful for us. So you can see it’s not very hard. But it allowed us to build this plot. At some point none was showing good results. But it quickly went up. We arrived at two sample like back then we were investigating why this change it I recall was doing it with me. All I can’t remember if it was months ago, but he arrived to some conclusion on why we never got to publish this, like it was suddenly abandoned due to lack of time. But again, it was useful because we were able to tell if you’re running on Amazon EBS volumes, deadline shows consistently better performance for reads and for writes. And they are more reliable in the sense that, there is no lots of variance, right? Like you can see it’s very dots are very compressive, meaning that there is not much variance between requests. This shows somewhat better performance on some.
But it’s not truly it’s too scattered. So sometimes it’s very good. Sometimes it’s not so good or really bad.
And, you know, again, this allows us to reach conclusions like okay, if I’m going to have more than 256 threads, then perhaps Yeah, I want to stay with non, which look at more stable overall, after a lot of threads. These look very good up to 128 threads. But after that it was no longer good with non it continue to have the same latency as it had with 128 threads. So you know, we can arrive to that kind of conclusions and take decisions on how to configure our instances based on this. We also took the throughput the disk throughput to compare. And we can see that this throughput was brutally Hager with MQ deadline, up to 256 threads where it literally diet known was never as good, but it never died as badly. So again, it’s more stable for very high thread count. And then we also measure it the MySQL throughput to understand how MySQL behave with with the different schedulers and different threads concurrences. So this is like classic example of simple micro benchmark to help us understand IO scheduler performance in MySQL with different concurrences. And let me show you another simple testing I wrote for some customer. So a customer came and asked me how do I test my IO? And so what I did is I use again, very simple. First of all, I use the file IO test in Sysbench. I said okay, I want to test files of 256 gigs. I’m always going to be running. Okay, I prepared the thread the test with a threads. And I said I want to split these 256 gigs in 64 files. I’m going to use blocks of 16 kilobytes for this test because that’s what that that’s what InnoDB uses for vast majority of its operations. So I focused on that size. And I ran the prepare and after I have my prepare running I will, again do a very simple loop in this case, I’m just going to test the number of frets. And I’m always going to stick to the rest of the variables is going to be always the same.
And I carefully crafted the settings in, again, the test mode, the read read rate to the IO mode, how much a synchronous backlog, the extra flags. So all these is crafted in a way that reflected reality for the customer. You know, if I wouldn’t have said this to one, I would have made the test much heavier. But it didn’t reflected what the customer had, because in his settings, he was not doing F sync all the time, it was just letting the OS take care of when to do F sync. So I did the same here. So again, this is not a universal testing for IO is rather an IO test, I crafted for an individual customer, looking at his own settings. And also depending on your settings, you will adjust differently. And here I have a reasonable explanation of each of the settings. And I will share the links of these documents in a minute with colors. So with Mariano sorry. So so he can post them later on our blog. But basically, I don’t want to set a limit on the number of requests,
I will be changing the number of threads because this will change on any system. As the day goes by, you’re gonna have different number of people working on MySQL.
Okay, how often I want to put in metrics.
This was about the size of this guy’s database, the guy has 200 gigabytes database is it 256, just to make it round number.
And then, um, and also, I added this note where, you know, if you have two terabytes of data, but you only ever read 200 gigabytes, you could as well test with 200 keywords, you might as well test with the two terabytes, it’s a good idea to also test with two terabytes it’s going to be make it far excruciating on the slower.
Okay, number of files. Again, this was similar to number of tables they have. Then, again, block size, this reflects the InnoDB default page size, we’re going to do random, our n is random, sorry, our nd is random. And our W is read, right. So we’re going to be doing random read rights, which is what you really want to test with a database, mostly because again, you could go ahead, look at your PMM or your monitoring solution, and characterize your workload, so that you can better understand how much of your read workload is sequential and how much of your read workload is random. And also for the rights. So once you understand that you can tune this better. Again, random IO is the problematic thing in IO. So I tend to test random IO first or mostly. Then the ratio of reads and writes. So you know, like, how many reads am I going to do forever, right? And I’m going to do like, almost two reads for every right. Give or take a bit list. Again, we took this number by dividing InnoDB rows read by the number of InnoDB rows inserted deleted date. And you know, it’s that’s not a perfect number. We could have use it pages read and pages written. That’s also fair, again, well, here I suggest in the DB data array that needed to be data written. But you again, you have to characterize your workload to understand what’s an appropriate ratio that will reflect your reality.
Mario Garcia: How often should benchmarking be checked on an application? And the other one? What would? What would the recommended frequency be? So, looks like the same question. So frequency, and how often is the same in my mind?
Marcoa Albe: So how often? Well, whenever you’re about to do changes, I mean, if you benchmark, again, is to compare, remember, benchmark is comparing. So if you have something to compare, you want to do a benchmark, like, Okay, you say, I’m going to upgrade my operating system, there, you should run a benchmark, I’m going to upgrade my database, you know, especially if you’re going to upgrade a major version upgrade, like from five, seven to eight, zero, then you should run a benchmark. Or, you know, I tell you, Oh, look, you’re having Aiyo problems, you know, like you have some IO difficulties, you should use these settings. And you don’t trust me, or you, you know, you’re super concerned about effects of changes in the settings or you have changes committee that demands proof that the change is a good idea, then you will do a benchmark and you know, you will create comparison, you will compare between your current production instance and the instance with the change. So, you know, you will check check with an upgraded OS, you will check with the upgraded MySQL, or you will check with the change it’s setting. And I don’t I don’t see the purpose of running any more often than when do you need to compare things, I think that is the only time we need to run benchmarks. Oops. So, I will say, again, I use it you know, to be in or to be uses AI to a synchronous IO. So I said IO mode, a synchronous. This is the default queue depth, and just putting it here for completeness. But so if you want to tune this, you also need to tune the queue, skip the queue, for your block device. And then we normally run in ODB, with our direct InnoDB, flush method or direct. So if you have that set, you also want to have extra flex equals direct. So again, you can see how I am trying to look at InnoDB to tune my benchmarks, so it more closely reflects reality.
And then again, if you have InnoDB flush method, equal InnoDB. Sorry, flush log at TRX commit equals zero, you probably want to do this equal zero. If you have InnoDB. Flush, like a T Rex committee equals one, then you should make this one.
And we’ll finally just run. And that was for table, you know if I want to test myself, but InnoDB does other types of IO, for example, the IO it does for the redo log is very different. It’s done in blocks of 512 bytes. And it’s always sequential writes, it’s always old, if you use all direct all, then you can use direct here. Otherwise, you should not use any extra flags. And again, how often we think and, you know, again, do it every time because we have this. So for the different and for binary login again, I did 1k I’m going to do sequential writes is going to be this should be remove it because binary logs don’t have direct. And if you have sync binlog equals one, you should again use F sync frequency equals one. So I’m in every case, like I’m reusing most of the stuff but with small variations. I can emulate other parts of you know the beat of MySQL.
So again, this is just an example of how I will craft sysbench to do some micro benchmarking. And with that, I conclude my presentation, I’m not sure if you guys have questions or anything else that you want me to show you, or answer. If not, we can wrap up around here, I will share my links with Mario, who will later publish them. So you can find them later on our website. If you find this full, we are here every other Friday. And I’m not sure what’s our next topic, Mario?
Mario Garcia:
Let me check. How to keep the database running while doing an upgrade to MySQL.
Marcos Albe:
So there is no zero downtime upgrades in MySQL. Unless you are using a cluster of beautiful a synchronous cluster, like Percona extra db cluster, or Percona server with group replication. So an individual instance of MySQL cannot be upgraded without some downtime. If you want to keep at zero downtime, or like very, very near zero downtime, you need to have a visually synchronous cluster.
If you have an as synchronous replication, you will also need some downtime to do a clean failover between the active primary and the replica. And at the time you do that failover, you need to induce some downtime to guarantee consistency. So again, the most important thing to do sudo downtime upgrades is to have the appropriate clustering solution running. Once you have the clustering solution, with a beautifully synchronous replication solution, you can just point the rights to any of the nodes because all the rights or all the nodes can take rights and be certified to be free of conflicts. So you don’t need to take downtime to move the rights from one node to the other. And then you just you know, upgrade one node, like the other two notes will still be taking rights. And once you’re done with that upgrade, you can move to the next node and once and finally, when you’re going to upgrade the last node, you move the traffic to one of the already upgraded nodes. And that’s it like you don’t have to suffer any downtime. But there is a solid requirement, which is that you must have the fully synchronous replication solution in place. I hope that answers your question.
Mario Garcia: Okay, well, our next stream will be on November 11. And the topic is how to secure MySQL. Oh, I’m not sure, that’s my specialty. But I’m not a security guy, but I know MySQL enough. So I will help you with that part. And yes, there are dozens of I see a question, which is, if there are any job positions available at Percona, there are plenty of them, including in our support department, I can tell you support at Percona is not the classic call center support that does things by the manual. And we have percona.com/careers you will find all the open positions there, including the geographical location and apply. So please go there and check if you want to join us. And I’m not sure answer any other questions. No, I think that was the last one. There are no other questions in the comments.
Marcos Albe: Oh, right. Okay, gentlemen, was a pleasure. And I’ll see you November 11. Have a good one, everybody.
Mario Garcia: Thanks, everybody, for joining us. Thank you, Marco for all you have to share today. And see you next time. See you next time. Bye-bye
Follow us on Twitter and stay tuned to all upcoming meetups.
Any comments, feedback, and suggestions are welcome to help Percona Community improve upcoming events! ∎