Overview of MySQL Project, Orchestrator, Ghost, Vitess and PlanetScale - Percona Podcast 19

by Shlomi Noach, Matt Yonkovit

Link to listen and subscribe: PodBean

Listen in as the HOSS talks with long-time MySQL community contributor Shlomi Noach. Shlomi has been developing and enhancing some of the MySQL community’s most popular tools for years. You may know his MySQL work from projects like orchestrator, gh-ost, and Vitess. He shares his experiences and his motivations behind some of these projects and gives us a preview of what he is up to next!

YouTube

Link: https://youtu.be/CmA2MZSBTgw

Shlomi Noach

Engineer, PlanetScale

Engineer and database geek, works at PlanetScale as a maintainer for open source Vitess. Previously at GitHub. Interested in database infrastructure solutions such as high availability, reliability, enablement, automation and testing. Shlomi is an active MySQL community member, is the author of orchestrator, gh-ost, common_schema and other open source tools, and blogs at http://openark.org. He is the recipient of MySQL Community Member of the Year, Oracle ACE (Alumni) & Oracle Technologist of the Year awards.

Matt Yonkovit

The HOSS, Percona

Matt is currently working as the Head of Open Source Strategy (HOSS) for Percona, a leader in open source database software and services. He has over 15 years of experience in the open source industry including over 10 years of executive-level experience leading open source teams. Matt’s experience merges the technical and business aspects of the open source database experience with both a passion for hands-on development and management and the leadership of building strong teams. During his time he has created or managed business units responsible for service delivery (consulting, support, and managed services), customer success, product management, marketing, and operations. He currently leads Percona’s OSPO, community, and developer relations efforts. He hosts the HOSS talks FOSS podcast, writes regularly, and shares his MySQL and PostgreSQL knowledge as often as possible.

Transcript

Matt Yonkovit: I’m here today with a longtime community contributor and a good friend of everyone in the MySQL space. Shlomi, how have things been for you?

Shlomi Noach: Hi, I’m fine, everything’s fine. Thank you. We’re in good health. And that’s all we can ask for these days.

Matt Yonkovit: So, yeah, I mean, it’s been tough all over the world, but it looks like, with everything going on, hopefully many of us will get vaccinated, and hopefully we’ll get back to being in person. And so we’ll be able to have conversations in person again, we’ll be able to drink a beer together, we’ll be able to have a coffee, whatever else we need to do. So I look forward to that. So Shlomi, you have been in the community and in the MySQL space basically for as long as MySQL has been around. I’ve known you since the early days of Percona, when we used to do consulting gigs; you were on the other side of the world, so we were like ships passing in the night once in a while. But since then you have done a tonne of things that have brought awesome tools and awesome knowledge to the community. For those who are listening who haven’t met you before, can you give them a little bit of your background and tell them a little bit about your journey? I know that there’s some really cool stuff that you’ve done, and I think everyone would love to hear about some of it.

Shlomi Noach: Sure. So I’m Shlomi, I’m based in Israel, and I am a software developer; I’m a developer by origin. Through my work as a developer, I was kind of drawn into databases; you have to cross paths with databases at some point. I ended up using MySQL for a few years, and I’ve always appreciated the open source ecosystem around it. Back in those days, it was phpMyAdmin to help you manage your database. Those were the things that were appealing to me: there’s something out there, it’s working, it’s working well, and there’s an ecosystem around it. And I’ve always wanted to contribute to that. Later on, I co-founded a small company, and I became both the developer of the product and also the DBA. So I began to see things more from the DBA side, and the hassle of the operations, etc. That’s where things clicked for me, because as a developer, I’m able to apply software programming methodology and solutions to DBA problems. A lot of DBA operations are automated using crontabs, a few scripts, etc. So my move was to start writing software to help you manage your databases. And so I began to contribute to open source. I started a blog, and I also did a little bit of consulting; I worked with Percona for just a very brief time. So I was getting more and more into the MySQL community, which I appreciate to this day. I started to write some open source tools. Some of them came and went fast, and no one noticed; others lingered on. The most commonly known today are orchestrator, which began as a topology management tool, a replication topology management tool, but is now used more as a high availability solution (failovers, promotions, detection, etc.), and then, during my time at GitHub, I authored gh-ost, which is a schema migration tool. I think I’ve been around schema migrations throughout my career.
And actually, the things that I’m doing today are very relevant to that, so maybe we’ll discuss that later. So I released a bunch of open source projects, some in my spare time, some during my work at Booking.com. Now, obviously, I’m at PlanetScale, working on open source Vitess. So those are the things I do.

Matt Yonkovit: Yeah, and I know that so many people have been helped by both gh-ost and orchestrator. I see them constantly being brought up as solutions in many different organisations. They’ve really been adopted quite well, and that’s exciting to see. But you talked about coming from that developer space, and my theory, and what I’m seeing, is that there really is this merging of DBAs and developers now, where, honestly, it’s hard to know where one ends and the other begins. Is there really a dedicated DBA position in a lot of companies anymore? Because we’ve moved to this space where everything is infrastructure as code, you’ve got a lot more development overlap with what used to be DBA functions, and they’ve kind of merged.

Shlomi Noach: Yeah, absolutely. In the old days, you could just be the DBA who guards the database, doesn’t let people in, pushes back on requests: don’t touch my database. And you’d have to operate things manually, and that’s what you did. But those days, I think, are long gone. Today, for most DBAs, it’s not even enough to know Bash scripting. You really need to be aware of, and collaborate with, tooling and automation, starting from Puppet and Chef: how to deploy your database and how to configure a database automatically. You also need to get to know the tooling around it: if you need to run XtraBackup, then you need to understand how that works and where the output goes, because it’s no longer just a bunch of scripts. You really need to integrate: if you have monitoring solutions, you need to figure those out, and you need to integrate everything together. So DBAs don’t necessarily need to be full-blown developers, right? But there’s a lot of overlap between the two worlds nowadays.

Matt Yonkovit: And that’s why we’re now seeing more of that SRE and DBRE kind of role forming from what used to be a lot of DBA and infrastructure folks, pulling over some of the developer skills and merging into this new role. I’ve seen that quite a bit, and there’s been quite a bit of talk about where that position is going. I’ve even written and talked about it: is the DBA dead? Personally, I think it has just evolved, but it’s definitely an evolution and a change. Now, as you went through that evolution, coming from the developer space, at what point did you look back, after you gained some of your more core database skills, and say, Oh my God, I can’t believe I did that as a developer, I didn’t know this about the database, and now I regret writing that code? Did you reach a point, as you transitioned and started to grow into the database space, where you said, Oh, yeah?

Shlomi Noach: I mean, here’s the question: do I regret some software that I’ve written? That happens all the time! For sure. DBA-wise, I used to do a lot of things manually back in the day. You had those database pets, right? You had this server and that server, and it just made sense to do things manually, and to wake up to some phone call telling you that things don’t work. And you’d figure out, yeah, my automation is really not very good. I don’t get to solve problems without waking up in the middle of the night or getting a phone call while I drive, etc. So those are the things that scare me the most: is there anything that I haven’t automated that’s going to come at me when I’m least prepared to handle it? And for sure, I wrote many things in the past like cron jobs and Bash scripts, things that would just get lost in the noise. Something would fail, and then this cron notification would go somewhere; it’s not really audited, it’s not really collected in a very good manner, and there’s no centralised place where you can go check for the pending errors, etc. So there’s a bunch of things. There’s a bunch of open source software that I wrote that wasn’t high quality, but that’s the way you learn how to make progress.

Matt Yonkovit: Yeah, of course. It’s always that constant evolution, but that evolution helps us grow, right? As we make mistakes, as things happen, you learn what to do and what not to do, and eventually you start developing things that are really cool and interesting. I know orchestrator was a project near and dear to your heart. Tell us, how did that come about? What was the inspiration for orchestrator? You mentioned that it kind of evolved from its original purpose. So when you started out and said, I’m gonna work on this, what was going on? What did you think?

Shlomi Noach: Yeah, so I was working at the time for Outbrain, which had a fairly medium-sized database deployment. It was nice, everything was working smoothly, but we were kind of flying blind, right? We had this complex topology, lots of replicas, across three data centres. At any given point in time, we didn’t really know what the topology actually looked like right now. We used to have network failures between different DCs, and that would really impact some of the replication streams, and we still wouldn’t know why the system behaved as it did. So one of the first objectives for orchestrator was just to be able to visualise: hey, give me the visuals. What does the topology look like? What’s the primary? What are the replicas, and in which data centre? And if the topology doesn’t make sense, or if I’m going to do some maintenance work on the link between two devices, can I refactor it? That was the second purpose of orchestrator: to be able to refactor your topologies. Back in the day, it was pre-GTID, and so you had to work very carefully, detaching one replica from here and assigning it to a new primary. At the time, Maatkit had a tool for that which was then abandoned, and orchestrator kind of revived that functionality. And that’s how it began. I was actually inspired by a lecture I saw at Percona Live, by GitHub, by Sam Lambert, who today is my boss. He was my boss, and he’s kind of my boss as well, yeah. So it’s kind of a funny game, how you get inspired by one tool, and you inspire someone else to use something, and someone else uses your code or your functionality back. I just love that about open source. So anyway, that’s how orchestrator began.
Then it grew out of that position into: okay, let’s try and do at least partial failovers, let’s try and figure out how to overcome the fact that we don’t have GTIDs yet. 5.6 was only just coming out; it was very early days for GTID, and it wasn’t easy to migrate to GTID. And then we realised orchestrator could be the high availability solution for Booking.com topologies, and that’s what we did. During my time at Booking.com, my main focus was to turn orchestrator into a failure detection and recovery mechanism that could promote new primaries, take some servers out of the game and completely rethink the existing topologies we had at Booking.com.
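The two early goals described here, seeing the topology and then refactoring it, can be made concrete with a toy model. The sketch below is hypothetical and is not orchestrator’s code or API: the `Topology` class, the host names and the data-centre labels are invented for illustration, and it only tracks who replicates from whom. A real pre-GTID relocation must also line up binary log coordinates, which this sketch does not model.

```python
class Topology:
    """Toy replication topology: each host points at the host it replicates from."""

    def __init__(self):
        self.source = {}  # host -> its source host (None for the primary)
        self.dc = {}      # host -> data centre label

    def add(self, host, dc, source=None):
        self.source[host] = source
        self.dc[host] = dc

    def replicas_of(self, host):
        return sorted(h for h, s in self.source.items() if s == host)

    def render(self, host, depth=0):
        """Return an indented tree of `host` and everything replicating below it."""
        lines = [f"{'  ' * depth}{host} [{self.dc[host]}]"]
        for replica in self.replicas_of(host):
            lines.extend(self.render(replica, depth + 1))
        return lines

    def relocate(self, replica, new_source):
        """Move `replica` below `new_source`, refusing moves that would create a cycle."""
        node = new_source
        while node is not None:
            if node == replica:
                raise ValueError(f"moving {replica} under {new_source} would create a replication cycle")
            node = self.source[node]
        self.source[replica] = new_source


topo = Topology()
topo.add("db-a", "dc1")
topo.add("db-b", "dc2", source="db-a")
topo.add("db-c", "dc1", source="db-a")
topo.add("db-d", "dc2", source="db-b")
print("\n".join(topo.render("db-a")))

# Maintenance on the dc2 link: hang db-d off a dc1 replica instead
topo.relocate("db-d", "db-c")
print("\n".join(topo.render("db-a")))
```

The cycle check is the kind of safety rule a refactoring tool needs before touching a live topology: asking to move `db-a` under `db-d` raises `ValueError` instead of producing a server that replicates from its own descendant.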

Matt Yonkovit: Yeah. And as that was developed, you started to get more contributors and more people started to look at it. Being a project that, at least at first, was owned by one person and really maintained by one person, was there a point you reached where it was like, Wow, I’ve got a lot more work than I can handle? There are a lot of requests coming in for features and a lot of ideas. It’s always hard, as a project starting out, to reach the point where it’s like, Hey, this is successful. Oh my God, it’s successful, what do I do? Are there any tips you have for people whose own projects are starting out, who’ve got a project like orchestrator that’s starting to pick up steam? How do you start to handle the PRs and the bug reports? How do you keep on top of that? Because it’s really a labour of love at a certain point, and then you eventually have to build something a little greater.

Shlomi Noach: Absolutely, those are excellent points. One of the things is that I never took orchestrator too much into my personal space or my personal life. I mostly worked on orchestrator during work hours; I was allocated to working on this open source project. So I always kept two questions in mind: is this going to eat away my weekend? And is this a good use of time for my employer? I prioritised pull requests and issues based on whether they made sense for the business that pays my bills right now, because I’m using their time to work on this. So it was: what am I working on today? If I’m working on high availability today, then that’s my focus, and I will be more attentive to pull requests and issues regarding high availability than to those about fancy dashboards. That was my primary drive. The other thing is to be really focused about what you want your product or project to do, and to try to limit it. The community is always going to come up with requests and use cases that you’ve never thought of, and it’s really important to know how to push back and say, I’m sorry, that’s not the main business of my product. It’s worthwhile that you invest in integrating that into your Grafana, or into your fancy dashboard, whatever it is you want to do; here’s an API. I can give you everything through the API, I’m happy to push things through the API, and you’ll consume that and do whatever you want with it. So that’s the other thing. And the third…

Matt Yonkovit: Boundaries are really important, right?

Shlomi Noach: Yeah, absolutely. And perhaps the most important of them all: I appreciate people’s desire to contribute, but anything that’s contributed into an open source project or product that I maintain, I’m going to be the one who has to support it. If something doesn’t feel right to me, like if it’s too big, and someone’s going to ask me a question about it and I will have no answer because I really don’t understand the code or its purpose, then unfortunately it’s going to be rejected. Sadly so, but that’s my way to keep my sanity.

Matt Yonkovit: Understood, yeah. Those boundaries are super important, because there are people who have very specific niche use cases for their contributions; they’re running some oddball platform or stack, and only 1% would use it. And you have to look at it: how do we cover most of the people who are going to download it? Adding that might bloat it, it might break things, it might force a whole re-architecture. And it’s interesting, because some people are okay with that and other people get very upset. It’s a fine line. But I think it’s a good message to folks who have their own projects: focus on those boundaries, and make sure that not only are you saying, hey, I’m not going to let this consume all my free time, but also that you’re careful about the purpose. Otherwise, this could just take on a life of its own and end up as something that we don’t want.

Shlomi Noach: Absolutely. And I was very fortunate and lucky to have community engagement that was always very positive, appreciative and polite; people just genuinely wanted to contribute. So I think if you remain on that level and explain to people why this doesn’t work out, people will understand. I was very fortunate to never get into a situation where someone was terribly upset; it just never went there, so I’m really happy. And, frankly, there were a few contributions where I thought they were, like you’re saying, in the 1% margin; it seemed a long tail, I’m never gonna be there. And one year later, I was there. So sometimes it’s also good to accept those seemingly long-tail proposals, because you’re going to enjoy them eventually.

Matt Yonkovit: Right. So fast forward past orchestrator: you moved to GitHub and you started working on gh-ost. Schema migrations have long been an issue, especially for people either doing migrations or having CI/CD pipelines that require additional work. How did that project come about? How did gh-ost begin, and what was its purpose for you when you created it?

Shlomi Noach: Yeah. So early on at GitHub, when several of us onboarded into the database infrastructure team at the same time, we realised we were using pt-online-schema-change, which is a great tool and very much appreciated, but at some point in GitHub’s growth it just didn’t work for us; it just didn’t scale. The triggers, the parallel load, the fact that it wasn’t auditable or interactive, made it infeasible to actually migrate some tables. There were literally several tables which were marked as risky, and we couldn’t migrate them, or the migration was so heavyweight that we used to postpone changes to collect them, like, here, let me gather two or three requests for this table and run everything at once. Obviously, this hinders the development flow, and it’s not a healthy situation to be in. And so it came up during a summit we had in New York: Jonah Berquist, my colleague and later my manager, said, can we do that, like, tail the binary log and do asynchronous schema migrations? To which I originally said, well, no, you can’t do that because… huh, hey, I think we can do that. Later, we took a long walk in Central Park, and that’s where gh-ost was born: we basically laid out the principles and the general design. It took about four or five months to create gh-ost, and since we first deployed it, we never looked back; it worked very well for us. It answered many of our needs, general-purpose as well as our internal development flow. It was very lightweight, as opposed to trigger-based and heavyweight; it was localised, which really helped with GitHub’s very write-intensive workload; and it was very interactive. It integrated well with our automation, and with ChatOps: we could communicate with gh-ost through chat channels and say, Hey, gh-ost, what’s up? What’s the status? Hey, suspend, throttle, abort, abort. Please continue.
Right. So it was really fun for our developers. We kind of handed the keys to a lot of developers: hey, do you want to see what the status of your migration is? You don’t need to talk to us, the DBAs. It’s all there for you in chat; you can interact with it, and you can actually control it if you really have to during an incident. So that gave us more lightweight migrations and fewer outages, and it also alleviated a lot of work on the database infrastructure side and gave more power to the developers.
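A minimal sketch of the triggerless idea described above, copying rows into a new table while replaying the binary log, assuming SQLite as a stand-in for MySQL and a Python list as a stand-in for the row-based binlog stream. The `users` and `_users_gho` tables and the new `email` column are hypothetical, and the cut-over is a plain pair of renames rather than gh-ost’s locking cut-over; this illustrates the idea, not gh-ost’s implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(i, f"user{i}") for i in range(1, 11)])
# Ghost table: the table as it should look after the ALTER (hypothetical new `email` column)
conn.execute("CREATE TABLE _users_gho (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")

binlog = []  # stand-in for the row-based binary log: (op, id, name)


def app_write(op, row_id, name=None):
    """Simulated application traffic: writes hit the original table and get logged."""
    if op == "delete":
        conn.execute("DELETE FROM users WHERE id = ?", (row_id,))
    else:
        conn.execute("REPLACE INTO users (id, name) VALUES (?, ?)", (row_id, name))
    binlog.append((op, row_id, name))


def apply_events(events):
    """Replay logged changes onto the ghost table; the log carries the newest row image."""
    for op, row_id, name in events:
        if op == "delete":
            conn.execute("DELETE FROM _users_gho WHERE id = ?", (row_id,))
        else:
            conn.execute("REPLACE INTO _users_gho (id, name) VALUES (?, ?)", (row_id, name))


def copy_chunk(lo, hi):
    """Copy one primary-key range; INSERT OR IGNORE never overwrites a row the replay already wrote."""
    conn.execute(
        "INSERT OR IGNORE INTO _users_gho (id, name) "
        "SELECT id, name FROM users WHERE id BETWEEN ? AND ?",
        (lo, hi),
    )


def migrate(chunk_size, traffic):
    lo, hi = conn.execute("SELECT MIN(id), MAX(id) FROM users").fetchone()
    applied = 0
    for n, start in enumerate(range(lo, hi + 1, chunk_size)):
        copy_chunk(start, start + chunk_size - 1)
        for write in traffic.get(n, []):  # writes arriving while the copy is running
            app_write(*write)
        apply_events(binlog[applied:])    # tail the log
        applied = len(binlog)
    # Cut-over: swap the tables once the log is drained
    conn.execute("ALTER TABLE users RENAME TO _users_del")
    conn.execute("ALTER TABLE _users_gho RENAME TO users")
    conn.commit()


migrate(chunk_size=5, traffic={0: [("update", 2, "renamed"), ("insert", 11, "user11"),
                                   ("delete", 7), ("update", 9, "late")]})
print(conn.execute("SELECT id, name FROM users ORDER BY id").fetchall())
```

Copied rows use `INSERT OR IGNORE` while replayed events use `REPLACE`, so whichever path writes a row first, the version from the log ends up in the ghost table once the log has been drained; that asymmetry is what lets the two streams interleave safely in this toy.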

Matt Yonkovit: Yeah, that’s great. And I’ve heard of a lot of people who have made use of that, like I said, as part of those pipelines on the development side. I think right now we’re in a phase, as an industry and as database professionals, where we’re about enabling developers to help themselves, and I think that’s one of the critical things. You mentioned ChatOps, being able to let developers check the status of their migration and kick things off. Everyone I have talked to in the industry is looking at ways to basically develop their own Database-as-a-Service-type features and functionality for their infrastructure. And this is just another component of it: how do you do schema migrations? How do you do backups? How do you do this, that and the other thing, and make it so the individual can kick those off? So I see those as important things, and as the trends where the industry is going now. I know that post-GitHub you’ve moved on to Vitess; I see you’ve got your shirt on, your jacket on. So what are you working on now at PlanetScale? You helped the community with orchestrator and gh-ost; what challenges are you tackling now with PlanetScale and on the Vitess project? Maybe give us a heads-up here.

Shlomi Noach: So you really handed me the ball here, because you talked about how every company tries to build the database infrastructure on its own, and that’s exactly what I’m doing at Vitess. To shed more light: I work at PlanetScale, which is the company that helps build Vitess. Vitess is a CNCF open source project; it came from YouTube, but was then contributed to the CNCF. PlanetScale doesn’t actually own Vitess, but we are the main contributors to it. We have a co-author of Vitess, a dedicated team lead and quite a few developers. Vitess has maintainers outside of PlanetScale, but we’re kind of influencing it the most, because that’s what we do, right? We write Vitess. We also have a commercial offering; we’re kind of in stealth mode, so I can’t elaborate too much about that, but obviously it builds on top of Vitess, so it’s in our main interest for Vitess to be a stable and capable tool. Now, what exactly is Vitess? People used to refer to Vitess as a sharding solution for MySQL. I like to describe it more as a framework, a database framework on top of MySQL. It’s a massive-scale solution that you run on top of your database servers. It masquerades as a database server: it speaks the MySQL protocol, and you talk to Vitess as if you’re talking to a MySQL server. But behind the scenes, there are many components, like vtgate, which is a proxy, vttablet, which is like an agent running on your database servers, and vtctld, which is a daemon that operates things in the background. The magic of Vitess is that because it masquerades as a database, you can make believe that it’s running something very simple, while at the same time it’s managing a complex operation in the background. So that’s most of my work, though also a little bit on the commercial side. On the open source side, everything is visible: Vitess is public and visible. Let’s talk about schema migrations.
You know, one of the biggest problems with schema migrations is that they’re not very accessible to developers, right? We discussed this: you want to make things more available to developers. And still, if a developer wants to run pt-online-schema-change or gh-ost in production, they need to SSH or log into some server, run the script or tool, supply all these command-line variables, and know how to throttle correctly. So it becomes complex. In a small company that may be sustainable, but the larger the company is, the less you can give all the developers all the keys to all these database operations. But what if we could just masquerade everything and make believe that you’re actually running a simple ALTER TABLE, and let Vitess figure out all the complexity behind that? That’s exactly what we’re doing today. A developer using Vitess today can set a session variable that says, hey, I’m running an online DDL, don’t tell anyone, and then issue a normal ALTER TABLE, CREATE or DROP statement. Vitess will take that and break it apart: okay, here’s an ALTER statement for this table; I have four shards for this table, so I’m going to spread this request across all the shards. Each of these shards will say, oh, okay, I have this pending migration; I’m going to queue this request for a while, and then I’m going to run it with gh-ost or with pt-online-schema-change, or using the Vitess mechanisms themselves, and I’m just going to make that happen. And I’m going to auto-throttle, because Vitess knows the exact topology of the database servers: no one needs to tell me who the primary is and who the replicas are, I can figure this out myself, and I have my own heartbeat mechanism, so you don’t need to set up a heartbeat mechanism. And I can do the cut-over manually or automatically, and do whatever it is you want to do.
And at the end, a developer just issued an ALTER TABLE, but everything ran online, everything ran asynchronously, and the developer is able to audit, control, cancel or retry the migration. That’s kind of phenomenal. And we’ve added things on top of that. Just recently, Vitess added support for reverting migrations. So, has that ever happened to you? Oh, really? Yeah. Has it ever happened to you that you dropped the wrong index in production, or dropped a column, and everything was supposed to be fine in a perfect world, but it turned out to be an incident, and pagers are now ringing, and then you realise, oh no, this was a mistake, I have to revert that? In fact, I just read the GitHub incident report for this month, and apparently there was just this type of incident two weeks ago. And we had those, I was part of that team, we had those during my days. For sure. This just happens; we live in an imperfect world. So with Vitess, you figure out 10 minutes later, oh no, I made a terrible mistake here, let me revert that migration. And given some constraints (you need to use the Vitess schema changes, etc.), you can alter the table back into its previous state without losing all the data that you accumulated during those 10 minutes. Well, that’s like insanity, right? That’s insanity. Yeah. How about recovering migrations from failovers? You run this migration, and it’s been running for two or three weeks, and it’s like, please don’t let there be a failover, or else everything goes to the garbage and I need to restart. Vitess can solve that for you: it can keep running that migration from the same point where it broke. Oh, like, so it’ll just pick up and go. So it’s mind-boggling, the amount of complexity that would otherwise be impossible for a normal developer or DBA to deal with in the middle of the night, during an incident. And now there’s a bunch of automation that just makes it smooth.
So those are the things I work on, schema-migration-wise, for example, and there’s more coming. I’m really excited about that, because it really changes the game. It makes schema migrations a no-brainer. There’s much more…
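To illustrate the flow described above (one DDL statement fanned out to every shard, queued per shard, held back while replicas lag, then completed), here is a small Python model under stated assumptions. Everything in it is hypothetical, including `submit_online_ddl`, `ShardQueue` and the migration states; it is not how Vitess implements online DDL, and the shard names only mimic Vitess-style key ranges. It shows why tracking a migration by ID lets a developer audit, hold back or cancel it without logging into each server.

```python
import uuid
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Migration:
    migration_id: str
    shard: str
    sql: str
    status: str = "queued"  # queued -> running -> complete, or cancelled


@dataclass
class ShardQueue:
    shard: str
    pending: List[Migration] = field(default_factory=list)

    def tick(self, can_write: Callable[[str], bool]) -> None:
        """Advance the oldest unfinished migration on this shard by one step."""
        for m in self.pending:
            if m.status in ("complete", "cancelled"):
                continue
            if not can_write(self.shard):
                return  # throttled: leave it for the next tick
            m.status = "running" if m.status == "queued" else "complete"
            return


def submit_online_ddl(sql: str, shards: Dict[str, ShardQueue]) -> str:
    """Fan one DDL statement out to every shard under a single migration ID."""
    migration_id = str(uuid.uuid4())
    for q in shards.values():
        q.pending.append(Migration(migration_id, q.shard, sql))
    return migration_id


def status(migration_id: str, shards: Dict[str, ShardQueue]) -> Dict[str, str]:
    return {q.shard: m.status for q in shards.values()
            for m in q.pending if m.migration_id == migration_id}


def cancel(migration_id: str, shards: Dict[str, ShardQueue]) -> None:
    for q in shards.values():
        for m in q.pending:
            if m.migration_id == migration_id and m.status != "complete":
                m.status = "cancelled"


shards = {name: ShardQueue(name) for name in ("-80", "80-")}
ddl_id = submit_online_ddl("ALTER TABLE users ADD COLUMN email VARCHAR(255)", shards)
lagging = {"80-"}  # pretend the replicas of shard 80- are behind
for _ in range(2):
    for q in shards.values():
        q.tick(can_write=lambda shard: shard not in lagging)
print(status(ddl_id, shards))
```

After two ticks, shard `-80` has finished while `80-` is still queued because its (pretend) replicas are lagging; once the lag clears, further ticks complete it, and the whole time the developer only needs the one migration ID to check or cancel it.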

Matt Yonkovit: What I can tell you, before we get into the rest, is that over the last few years I’ve talked with several people at different companies, different users. I like to talk to DBAs, and one of the questions I always like to ask is: what’s the biggest problem you’re facing? What’s the biggest challenge that you run into? And constantly, I get: it’s deployment-related. We deployed bad code into production, we had to roll it back, it caused a mess, it caused these issues. That is a constant thing that people struggle with, because, as you said, as you start altering complex systems and complex data structures, there’s potentially a cascading effect. So while reverting code might be fairly straightforward (sometimes there are challenges there as well, of course), in the past it’s been: oh yeah, I’ve committed the changes, I’ve added the columns, I’ve dropped the columns. Oh my God, now I’ve got a big operation. So that’s actually a really big deal, and a really cool deal. How does it work, if you don’t mind me asking? I don’t know how technical you want to get on this particular call, but what are you doing behind the scenes there?

Shlomi Noach: Sure. So the logic is based on VReplication. Let me give a quick brief of what VReplication is; it’s one of the hidden secret sauces, whatever you call it, in Vitess. If you know how gh-ost runs, it’s kind of a similar mechanism, where VReplication allows you to migrate data from here to there, online. What migrating data means exactly is that you can move a table from one cluster to another, live, and then be able to switch reads, switch writes, or switch your entire traffic to use the new table. Or you can build a materialised view of one table, implemented as if it were a normal new table, or you could reshard tables. All of these are things that VReplication does. The way it works is like gh-ost or pt-online-schema-change: it needs to copy the entire data set of one table into another table space, but it also needs to catch up with the changelog, with the binlog events that take place during that copy, which could take hours and hours. So we reimplemented schema changes using VReplication, right? The preferred way in Vitess today is going to be: don’t run pt-online-schema-change, don’t run gh-ost, run VReplication, run the internal Vitess-supported flow. And VReplication is different from gh-ost and pt-online-schema-change in that it does everything transactionally and keeps a state of its progress. So you copy a bunch of rows from the original table into the new table, and you apply some binlog events during that time. Any time you apply binlog events, you write a transaction into the target table. But then, in that same transaction, in another backend table which keeps track of your state, you indicate: this is the last binlog event, or the last GTID set, that I’ve applied thus far. And as you copy rows, you write down in that table, or in a different table, the rows that you’ve copied thus far, transactionally.
So that if the server goes down, or if you promote a new primary, the same changelog is transactionally persisted in those tables, and you’re able to pick up from that exact location and continue copying rows from that location, or you can continue applying billows from that location, that’s for failover. If you want to revert, then you know how, for our viewers or listeners, you know how all online schema change tools work is by actually creating a new table. I like to call it the ghost table or the shared table. And you know, you slowly migrate data, you copy data over from the original table into the ghost table. And at the end, you flip the tables right and you lock them if you need to. And then you rename the original table into an old table and you renamed the ghost table in place of the original table. So during that time, during that atomic rename with freeze the table, right, we put a lock on the table we evaluate the rename, but then we also take note of what the exact gtld coordinates are at this time, what’s the security ID at this time. If 10 minutes later, you want to revert your migration, we treat it as kind of a new migration, we say, okay, you want that schema right here. We already have that table from 10 minutes ago, we kept, we put it for safekeeping. We know that schema, we’re going to reevaluate what the migration statement would look like. And then we would kind of run a new migration, only, most of the data is already there, we just need to keep up from that GD ID set, which we recorded. Okay, previous failover. So all you need to pay for those 10 minutes is the time it will take to catch up with those 10 minutes for that specific table, which is likely to end up with maybe a minute or two. So that’s really magical. And it builds upon. It was literally just patching things. I didn’t need to write a lot of code to make that happen. 
I was just utilising existing VReplication logic, a lot of building blocks which were there for me to just reassemble and tailor to create that. And that's the amazing part.
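The revert idea (keep the original table, remember the position at cut-over, then replay only what happened since) can be sketched with in-memory dicts. Everything here is hypothetical and simplified: `Event`, `write`, `cut_over` and `revert` are illustrative names, integer positions stand in for GTID sets, and the sketch assumes rows map one-to-one between the old and new schema, whereas the real flow re-evaluates the migration statement.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Event:
    pos: int              # stand-in for a GTID; assumed to increase monotonically
    row_id: int
    val: Optional[str]    # None means the row was deleted


def write(tables: Dict[str, dict], log: List[Event], row_id: int,
          val: Optional[str]) -> None:
    """Apply a write to the live table and append it to the change log."""
    pos = log[-1].pos + 1 if log else 1
    log.append(Event(pos, row_id, val))
    if val is None:
        tables["live"].pop(row_id, None)
    else:
        tables["live"][row_id] = val


def cut_over(tables: Dict[str, dict], log: List[Event]) -> int:
    """Swap the shadow table into place, keep the original, record the position."""
    tables["old"], tables["live"] = tables["live"], tables.pop("shadow")
    return log[-1].pos if log else 0


def revert(tables: Dict[str, dict], log: List[Event], cutover_pos: int) -> int:
    """Catch the kept original table up from the recorded position, then swap back."""
    old = tables["old"]
    replayed = 0
    for ev in log:
        if ev.pos <= cutover_pos:
            continue  # already reflected in the kept table
        if ev.val is None:
            old.pop(ev.row_id, None)
        else:
            old[ev.row_id] = ev.val
        replayed += 1
    tables["live"], tables["old"] = old, tables["live"]
    return replayed
```

The cost of a revert is proportional to the number of events since the cut-over (here, `revert` returns that count), not to the size of the table.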

Matt Yonkovit: Well, that's cool, because yeah, any time you have an open source project, you've got the building blocks there. A lot of times the coolest features are the glue between those, right? It's often not the deep engineering, that's already done; it's the glue that makes it so much easier for everyone. And a lot of times when I talk with people about, hey, have you ever thought about contributing code, people are scared, because they're like, oh, I'm not a C programmer, I don't know deep algorithms, I don't know that. And it's like, you don't have to in a lot of cases. There are some really cool, powerful things that you can do. And this sounds like a cool feature that I could see a lot of people making use of, definitely. Now, you alluded to other cool things. I think you were going to tell me about some of the other cool things that you're working on.

Shlomi Noach: Yeah, I mean, it's kind of endless. While I discussed schema migrations, I mentioned throttling. One of the other big problems in relational databases, or with a MySQL replication topology, is that you have one primary that takes all the writes, and then you have all those replicas which play catch-up with the primary, right? And one of the big problems is for them to be able to catch up. You have an ETL, this massive job, that's now uploading tonnes of data into some table, and all of a sudden the replicas are lagging by 90 seconds. That creates stale reads, and it's generally unhealthy. So one of the other things is that we have a built-in throttling mechanism. It's based on work that I did at GitHub with my colleague Miguel Fernandez, but now we made it an integral part of Vitess. So if you are going to do a massive write, hint-hint, a schema migration, or, hint-hint, some kind of pt-archiver job where you need to purge or update massive amounts of data, there is a throttler service that you can consult and ask: is this a good time to write to the primary? Are all the replicas in good health? Yes? Okay, cool, let's write some stuff. No? Let me check again, and then again, and then again. So the different apps don't need to figure out what the list of replicas is, whether they're lagging, what the threshold is. There's this one source of truth that tells everyone: now's a good time, now's not a good time. And so it becomes an authoritative throttling mechanism. The other thing…
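The "ask before every write" pattern can be sketched as follows. This is a hypothetical illustration, not the Vitess throttler API: `Throttler`, `get_lags`, `threshold` and `run_job` are made-up names, and the 1-second lag threshold is an assumed value. The point it shows is that the job only asks one question, while the throttler owns the replica list and the threshold.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass
class Throttler:
    """Hypothetical central throttler: the one place that knows replica health."""
    get_lags: Callable[[], Dict[str, float]]  # replica name -> lag in seconds
    threshold: float = 1.0                    # assumed max tolerable lag

    def check(self) -> bool:
        lags = self.get_lags()
        if not lags:
            return False  # no health data: be conservative and hold off
        return max(lags.values()) <= self.threshold


def run_job(batches: Iterable, throttler: Throttler,
            write_batch: Callable, sleep: Callable[[float], None] = time.sleep,
            interval: float = 0.5) -> None:
    """Write batches to the primary, asking the throttler before each one."""
    for batch in batches:
        while not throttler.check():
            sleep(interval)  # not a good time; check again, and again
        write_batch(batch)
```

Passing `sleep` as a parameter keeps the job testable; in production it would simply default to `time.sleep`.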

Matt Yonkovit: Oh, so that would be like, during the holiday shopping season, somebody decides to archive half the data. You could say, oh, well, we want to throttle that, because we do not want to impact the users buying stuff.

Shlomi Noach: Exactly. Right.

Matt Yonkovit: Because those backend jobs, yeah, that's a problem.

Shlomi Noach: Yeah, exactly. And so you can actually have priorities for backend jobs. Some of them could be high priority: this is really part of an online retail action right now, this data needs to move from here to there, this has to be online. But this schema migration can wait three more minutes, right? It doesn't have to run right now. Nothing bad happens if it takes one hour and three minutes as opposed to one hour. So this one gets lower priority and can wait a little longer.
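One way to express job priorities is to give each priority its own lag tolerance, so urgent work keeps flowing under lag that pauses deferrable work. This is an assumed design for illustration only; the tolerance values and the names `LAG_TOLERANCE` and `may_write` are hypothetical, and Vitess may implement prioritisation differently.

```python
from typing import Dict

# Hypothetical per-priority lag tolerances, in seconds (assumed values).
LAG_TOLERANCE = {"high": 5.0, "normal": 1.0, "low": 0.5}


def may_write(priority: str, replica_lags: Dict[str, float]) -> bool:
    """Let higher-priority jobs keep writing under lag that pauses lower ones."""
    if not replica_lags:
        return False  # no health data: hold off regardless of priority
    return max(replica_lags.values()) <= LAG_TOLERANCE[priority]


jobs = [("move-retail-data", "high"), ("schema-migration", "low")]
lags = {"replica-1": 0.8, "replica-2": 2.0}
ready = [name for name, prio in jobs if may_write(prio, lags)]
```

With two seconds of lag on one replica, only the high-priority data move proceeds; the schema migration waits until replicas catch up.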

Matt Yonkovit: I was going to ask, and we've got about 15 minutes left in our allocated time: I'd love to ask everybody I talk to what they see coming on the horizon. What are some of the trends? What are they hearing, as they talk with people in the community, about the cool new things or new challenges we're facing out in the space? So I'm just curious what you're seeing, what you're excited about, maybe what you're concerned about, things like that.

Shlomi Noach: So, you know, I'm not a prophet, and I can't always see the trends, right. But you kind of mentioned it before: everyone is trying to run their own database cloud service, everyone wants to have their own database service, everyone's trying to automate away the most difficult problems that you have to deal with in relational databases, which are the data model, the schema changes, the high availability, the consistency, all these big problems. And everyone is trying to do that on top of Kubernetes these days, which is in itself a whole new level of problems. I kind of want to say that the things we work on today at Vitess are exactly at those crossroads, exactly targeting those pain points. And that's where my real passion is: we're working on what could be the generic solution, the Holy Grail if you will, or the grand unified theory of database infrastructure. At least I'm hoping we'll get there. But yeah, absolutely, a lot more companies today realise that they operate at scale, and a lot more companies today realise that they can't throw more people at the job. DBAs are scarce; their efforts cannot forever be spent on the manual labour of configuring this replication, or creating that cluster, or provisioning that service. Things have to be automated. It's just not sustainable to let people keep doing these things manually. And so I believe a lot of companies understand today that they have to find automation at scale. Previously, people would say: not everyone is Facebook, not everyone is Google. But you don't need to be Facebook or Google to hit the scale problems of relational databases, right. A lot of companies hit them.

Matt Yonkovit: I mean, even now, more than ever. You see people doing cloud native architectures; they have microservices, everybody wants their own database, right. So a lot of architectures now aren't just about petabyte systems. Yeah, sure, you might have a petabyte system here or there, but you're going to have 10,000 databases of a few gigs each that end up here and there for everybody's little side projects, and you have to manage them. And that becomes difficult.

Shlomi Noach: Yeah, absolutely. So I think that's where people will have to be looking: how can we automate? How can we make MySQL run on Kubernetes? What are good paradigms for high availability, for durability, for consistency? Those are the things people will focus on next.

Matt Yonkovit: But you said something earlier about the Kubernetes thing. So now I've got my little widget here, and it's got a switch. So for Kubernetes, it's not just like, oh, Kubernetes is on, so now it all scales? You mean there's more to it than just flipping that switch?

Shlomi Noach: I'm by far not a Kubernetes expert. But we all know how difficult it is, in particular to run stateful applications on a Kubernetes cluster, and in particular to run a database server on top of a Kubernetes cluster. I think there's so much parallel work going on today, people trying to figure this out, and not enough collaboration in that space, if I may say so. People are just trying to figure out whatever works for their company, but we need to understand what a good pattern is, what's the general-purpose good practice to just run a cluster, and from there on, how you proceed to fit your specific needs. So this is quite a challenge.

Matt Yonkovit: Yeah, definitely. Well, Shlomi, thank you for taking some time today to chat with us. I really appreciate hearing about your past projects and some of the cool stuff you've got going on in Vitess; it sounds like it's going to be pretty awesome. I know you've got a session that you submitted at Percona Live, and we're excited to hear all about that in a few weeks as well. So I hope to sit down and chat with you again as new things come out. And as you wa

Wow, what a great episode that was! We really appreciate you coming and checking it out. We hope that you love open source as much as we do. If you like this video, go ahead and subscribe to us on the YouTube channel. Follow us on Facebook, Twitter, Instagram and LinkedIn. And of course, tune into next week’s episode. We really appreciate you coming and talking open source with us.
