Link to listen and subscribe: PodBean
Pandit Prasad, Principal Product Manager at Amazon, and Vijay Karumajji, Sr. Solutions Architect at Amazon Web Services, Inc., join the HOSS in this episode to talk about the cloud, AWS services, Aurora, and RDS. Both Prasad and Vijay are presenting at Percona Live on Multi-Zone failover in Aurora and RDS. Get a preview of the feature and a bit more background on these experts.
Pandit Prasad, Principal Product Manager at Amazon
Pandit Prasad is a Principal Product Manager at Amazon RDS. He brings with him 28 years of Hi-Tech industry experience.
Prior to this role, he led product management for IBM’s Db2, Big Data, and Cloud portfolio of products (Monitoring, Automation, and IT Analytics). Prior to that he led the global product marketing team at IBM. Earlier, as an IT Architect working for PwC and other consulting companies, he implemented ERP, CRM, and data warehouse applications for large enterprises, including several Fortune 100 companies.
Matt Yonkovit, The HOSS, Percona
Matt is currently working as the Head of Open Source Strategy (HOSS) for Percona, a leader in open source database software and services. He has over 15 years of experience in the open source industry, including over 10 years of executive-level experience leading open source teams. Matt’s experience merges the technical and business aspects of the open source database world with a passion for hands-on development, management, and building strong teams. During his time he has created or managed business units responsible for service delivery (consulting, support, and managed services), customer success, product management, marketing, and operations. He currently leads Percona’s OSPO, community, and developer relations efforts. He hosts the HOSS Talks FOSS podcast, writes regularly, and shares his MySQL and PostgreSQL knowledge as often as possible.
Matt Yonkovit: Hey, everybody, welcome to another HOSS Talks FOSS. I’m the head of open source strategy, the HOSS here at Percona, Matt Yonkovit. And today I’m joined by Vijay and Prasad from AWS to talk to me about all the fun things happening in AWS land and about them personally. How are you today?
I’m good. How are you doing?
Matt Yonkovit: Excellent. Excellent.
So, nice to meet you today. And then excited to talk about all the features we’re working on. And then let’s have a coffee, ticking the two.
Matt Yonkovit: Yes, yes, indeed. And, Prasad, how are things here?
That’s awesome. And I am excited to be here with you.
Matt Yonkovit: Okay, excellent. So I am curious. I have lots of different folks on the show. I’ve had folks from AWS; I’ve had folks from the open-source space and other database companies. So maybe I’ll start with you, Prasad: how did you get started in the database space? What did that look like? Where did you begin your career, and what got you interested specifically in databases?
My career goes quite a number of years back. I started as an application consultant for ERP and CRM, so I was mostly doing the front-end applications. But it always has to do with data, right? How do you make sure the data is presented to customers in the right way? Especially with CRM applications, it’s all about presenting the data so that it’s easy for the customers to find, whether you’re presenting to salespeople, to marketing people, or to manufacturing people. So I had that background. Then I spent a number of years at IBM working in the database space, both on the RDBMS side, in terms of Db2, and in IBM’s big data portfolio. So I had a number of years of background in that space, and I’m excited to be part of the journey here at AWS with RDS.
Matt Yonkovit: So it’s interesting. I hear “big data,” and everybody has a slightly different version of it. IBM’s version of big data is probably very different from that of other people I know. Maybe give us your definition of big data, because it’s one of those things that everyone treats a little differently. I’m curious, in your experience, what have you seen out there that is, quote unquote, big data? Is it just a lot of data? Or is it unstructured data? Maybe give us a bit of insight into that.
So to be precise, I was part of the Hadoop org at IBM. If you put on that lens and look at big data, it’s everything that you mentioned, right? It is structured data, unstructured data, streaming data, data at rest, what you call NoSQL data. Any type of data, even just files, and you want to mix them all together. The way I used to think about it comes from my background in the CRM world. For example, at one company I was an architect trying to help the support group. When a customer called, the support staff would look up their account, and at that time it was great when I could provide them a bird’s-eye view of the account. The customer is calling, so: what products have they bought? What tickets have they raised with you? What opportunities are pending? Just bringing this data together was awesome at that point.
Matt Yonkovit: Yeah. And it’s so critical now because everybody has so many disparate systems, right? It’s no longer just a single database. When I started, I did Oracle, and we had the one Oracle database in the back, as opposed to many, and then it eventually grew. But now it’s 20 different sources, or 100 different sources, and they all have to come together at some point.
Then it gradually evolved into: hey, I’m going to add more richness to this data. So now I’m going to incorporate the weather data, the social media data, and so on. And then it started getting into the sentiments of the customer. For example, take your insurance companies now. Multiple insurance companies all know your driving history, they all know how many tickets you’ve got, and they all know what activities you are up to. And now their premium is based on all that data. So this is the richness of data that comes from multiple sources, not just a single source. And that’s what big data enables: putting together information from so many disparate sources.
Matt Yonkovit: Yeah, it’s both great and frightening at the same time, because there is so much data out there that we become digital packrats a little bit, where everyone wants to keep everything, regardless of whether they need it or not. They might not know why, but in the future they might see a reason. And that means that not only are we getting more data in more places, but we’re also getting bigger data, right, bigger systems of record, which is a challenge in and of itself. Vijay, I don’t want to leave you out of this conversation. Maybe tell us a little bit about your background. Right now you’re working as a Solutions Architect at AWS. But how did that come about? Where did you start? How did you get involved in databases?
Oh, I started my database journey about 15 years ago. I started with SQL Server and stayed with it for a while, around 12 years, from the early versions all the way up to the later ones, SQL Server 2012 and 2014. That’s how I started my journey, and I worked for various customers. Throughout this SQL Server journey I did consulting work: I worked with Southwest Airlines, with Blue Cross Blue Shield, with Verizon, with multiple customers. Then at AWS I started as a Cloud Support Engineer, and because of the experience I had with databases, I happened to support SQL Server at AWS. Then I slowly transitioned. Once you know one relational database, you mostly know all databases; the technology behind the scenes is similar. You have a data file and a log file, and they look the same. The features are different, and the way you access them is different, right? So I slowly started picking up the open-source engines, MySQL and Postgres, while I was working as a Cloud Support Engineer, and then I became an SME in MySQL. I started enjoying MySQL and open-source databases, and having been exposed to both MySQL and Postgres, I started loving them. That’s how I started, and I slowly transitioned into a Solutions Architect role, solving problems with customers.
Matt Yonkovit: Very cool. And while there are similarities between MySQL and SQL Server, there are also differences, and it’s not always easy to make that transition. You scratch your head and go, why did they do it this way? I’m expecting this to work, and it doesn’t. So I get that journey. I started as an Oracle and SQL Server DBA and transitioned into open source 15 years ago, so it’s been a while for me as well. But I do know that it takes some getting used to, and there is a learning curve. So kudos to you for being able to pick that up and go that route. Now, what’s really interesting is your journey through the customer space, working with customers and helping to resolve issues. This is one of my favorite topics when talking with the folks here at Percona. We have our own support staff and consultants, and I always like to hear: hey, what are we seeing customers struggle with? What are the topics we wish we could educate customers on or help them with, whether it’s new software, new tooling, or just “please don’t do this thing anymore”? I say that a lot. I go out there and I beg them: please, please pay attention to your data types. Please don’t neglect these indexes. Maybe if I get enough people to say similar things, they will listen. So are there things you’re seeing in that customer space that are starting to evolve, that are really important and critical, that either you’re trying to address or you would like to tell people: hey, you might want to look into this thing or avoid this problem?
My journey with AWS is a little different in terms of how I see customers, because working as a support engineer you handle customers at all levels. Some customers are pretty good; they know about databases. Other customers are still learning, because they don’t want to deal with databases; they only want to concentrate on their business. So you have different kinds of customers, right? For customers who already know about databases, the challenges are totally different: they want to do scaling activities, something else entirely. Some customers I see are new to the database world, and they really don’t want to handle those databases; they just want their business to run. Those customers you need to educate, and you have to be patient in telling them: hey, I know you’re building your application, but there are some things you still need to take care of when you’re creating tables. You need to take care of the data types that you’re using and the indexes that you put in. Don’t just blindly create hundreds of indexes on a single table, because that is going to impact performance as well. I know an index is going to improve some things, but it is also going to impact performance. These are the kinds of things I normally see with customers. So I have seen two different sets of customers.
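To illustrate the point about indexes, here is a minimal sketch using Python’s built-in sqlite3 module as a stand-in for MySQL or Postgres. The table, column names, and the `pages_used` helper are hypothetical. It loads the same rows with and without secondary indexes and compares how many storage pages the database ends up using, since every index is an extra structure that each insert has to maintain.

```python
import sqlite3


def pages_used(index_count: int, rows: int = 5000) -> int:
    """Load `rows` rows into a table with `index_count` secondary indexes
    (0 to 4) and return the number of database pages in use afterwards."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, a INTEGER, b INTEGER, c INTEGER, d INTEGER)"
    )
    # One single-column index per selected column.
    for col in "abcd"[:index_count]:
        conn.execute(f"CREATE INDEX idx_{col} ON orders ({col})")
    conn.executemany(
        "INSERT INTO orders (a, b, c, d) VALUES (?, ?, ?, ?)",
        ((i, i * 2, i * 3, i * 4) for i in range(rows)),
    )
    conn.commit()
    pages = conn.execute("PRAGMA page_count").fetchone()[0]
    conn.close()
    return pages


if __name__ == "__main__":
    for n in (0, 2, 4):
        print(f"{n} secondary indexes -> {pages_used(n)} pages")
```

The exact page counts depend on the SQLite build, but the indexed table should always occupy more pages than the unindexed one. The same write amplification applies, in engine-specific ways, to other relational databases.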
Matt Yonkovit: Yeah, yeah. And it’s interesting having both of you from different backgrounds, right? The big data space for Prasad, and MySQL and SQL Server for you. I don’t know if you’re familiar with Facebook’s internals, but they have a lot of MySQL and a lot of relational systems. And I always found it funny: the team there manages petabytes’ worth of relational databases, but because “big data” had been taken, they call themselves the small data team. So they’re like, we’re part of the small data team, but it’s bigger than anybody else’s systems out there. And you mentioned, Vijay, that there are two different kinds of customers, and I’ve experienced this as well. Some are very keen and understand the database, and they know exactly what they want. They tend to be able to handle most things on their own. They’re looking for guidance, but once you point them in the right direction, they’ll just go. But I think the biggest battleground for all of us in the database space going forward is the other group, which says: I’m so busy, I want to focus on developing my apps. I don’t want to worry about availability, I don’t want to worry about performance, I just want it to work. And a lot of times they make mistakes in the design, which cause scalability issues later on. This is where I think there’s a lot of tooling and automation out there that is starting to cater to them. And that’s one of the cool things about RDS and Aurora: it enables a lot of those folks who might not want to do that work to get a basic start and progress in the right way. It gives them something that’s easy to start with.
And then they can progress at their own pace and their own skill level and grow into it. And I know both of you are coming to Percona Live to talk about some of the new features, one of them being Multi-AZ. For those who aren’t aware, you can do high availability, and you should if you have a mission-critical application, here in Aurora and RDS. But what we’re talking about here is the ability to have faster failover and better response time. It’s a slight architectural change. So, Prasad, why don’t you tell us a little bit about this change and what we’re going to be seeing within the system.
Sure. As you said, we introduced a new option for our existing Multi-AZ feature. The current model is called Multi-AZ with a single standby. The new deployment option that we introduced is called Multi-AZ with two readable standby instances, or Multi-AZ DB cluster for short. So essentially, we are going from two instances to three instances, where the two standby instances are also available for read workloads. There are a couple of differences here. First, in Multi-AZ with a single standby, the standby instance is not available for reads or any kind of application access. It’s just there in active-passive standby mode. In the newer option, you have two readable instances; as the name implies, they are available to you, so it’s more of an active-active mode. In addition, we are providing two different cluster-level endpoints. One endpoint is for write workloads; you can put both write and read workloads on that endpoint. The other endpoint is strictly for read workloads, where the two readable standby instances are load balanced and serve the reads. That provides scalable read capacity for the cluster as a whole.
Matt Yonkovit: And Vijay, with that type of setup you’ve got applications, and when you’re migrating applications, that read-write split is sometimes difficult from an application perspective, because most applications just have a connection. It either works or it doesn’t. So are there best practices or things to help enable that read-write split? Does that make sense here?
Yeah, definitely. As Prasad pointed out, whenever you create this Multi-AZ cluster with two readable standbys, RDS by default provides you two endpoints, and in your application you need to use these two endpoints. One is the cluster endpoint, which always points to the writer. In case of a failover, it is updated for you, so you don’t need to make any application changes at all if you use the cluster endpoint. And if you have any workload that requires read activity, say monthly or weekly reports, any other application with this kind of activity, or your regular OLTP workload with some read activity, then you need to use the second endpoint, which is the reader endpoint. It points to both the readable standbys within that cluster and distributes the workload between those two readers in a round-robin fashion. So you don’t need to do anything; you just need to use that endpoint, and behind the scenes we automatically distribute the load between both reader instances. That improves the read throughput, because behind the scenes you are leveraging the two reader instances. So all you need to do in your application is make sure you use the reader endpoint for your reporting applications. And as part of the Percona Live session, I’m showing this in a demo: how I used the cluster endpoint and the reader endpoint in a containerized application, and how the load is distributed between the two reader instances. That’s what we are showing as part of the Percona Live session.
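The read-write split described above can be sketched at the application level as follows. This is a minimal illustration, not the demo from the Percona Live session: the endpoint hostnames are placeholders, and the `choose_endpoint` helper and its `needs_fresh_data` flag are hypothetical names. It only decides which endpoint a statement should go to; opening the actual connection is left to whatever database driver the application uses.

```python
# Placeholder hostnames: a real Multi-AZ DB cluster exposes its own endpoints.
WRITER_ENDPOINT = "mycluster.cluster-example.us-east-1.rds.amazonaws.com"
READER_ENDPOINT = "mycluster.cluster-ro-example.us-east-1.rds.amazonaws.com"

# Statement types treated as read-only in this sketch.
READ_ONLY_PREFIXES = ("select", "show", "explain")


def choose_endpoint(sql: str, needs_fresh_data: bool = False) -> str:
    """Return the endpoint a statement should be sent to.

    Reads go to the reader endpoint unless the caller cannot tolerate
    replica lag (needs_fresh_data=True) or the statement takes locks
    (SELECT ... FOR UPDATE). Everything else goes to the writer.
    """
    text = sql.strip().lower()
    first_word = text.split(None, 1)[0] if text else ""
    is_read = first_word in READ_ONLY_PREFIXES and "for update" not in text
    if is_read and not needs_fresh_data:
        return READER_ENDPOINT
    return WRITER_ENDPOINT


if __name__ == "__main__":
    print(choose_endpoint("SELECT * FROM monthly_report"))
    print(choose_endpoint("UPDATE accounts SET balance = 0 WHERE id = 1"))
```

Routing on the first keyword is deliberately simple. Many applications instead keep two connection pools and let the calling code pick one explicitly, for example a reporting module that only ever uses the reader pool.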
Matt Yonkovit: Awesome. Awesome. Now, I’m curious about one of the challenges, and I don’t know how you approached solving this. One of the challenges with read-write splitting in most applications or most databases is the possibility that the replicas get out of sync. Is that something you’ve spent some time trying to resolve? Or is it still a best practice to go to the primary if you have something that is very time-sensitive?
Yeah, sure. As you said, we have two cluster endpoints, the writer and the reader, and the reader takes the read workload. But replica lag is an especially important concept in this setup, for a couple of reasons. The reader endpoint is serving the read workloads, but under the covers the readers are also committing the transactions that come from the primary, to keep all the instances in sync. So essentially each reader is doing two types of activity, while the write activity is hidden from the user. This may cause replica lag under certain circumstances, which we need to be aware of, so we need to distribute the load accordingly. For example, say I have a lot of write transactions coming in, and at the same time I’m overloading the read replicas with tons of read workload. The read replica is going to be totally occupied. All its resources are busy serving those reads, and it may not have enough CPU cycles to apply the transactions coming from the primary. That can cause replica lag. In other cases it’s the other way around: I’m serving a normal read workload, but I have a huge volume of concurrent transactions coming in from the primary, which take more time to apply. Finally, in other use cases I’m running huge volumes of DDL statements or doing manual data uploads into the primary. All of these could cause replica lag. It’s important that we are aware of it in order to make sure that replica lag stays within tolerable limits. We have also added a mechanism that you can use to control the replica lag between the primary and the secondary instances. It’s called flow control, and it is exposed as a parameter that you can set to define the maximum tolerable lag.
And once you hit that limit, this mechanism will kick in and start slowing down the transactions into the primary so that readers can catch up with the primary.
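The flow control idea can be sketched with a toy model. This is a simplified illustration under stated assumptions, not the RDS implementation: lag is measured as a backlog of unapplied transactions rather than in time units, and the `simulate` function and its parameter names (including `max_lag`) are hypothetical. Once the backlog would exceed the limit, the primary admits fewer writes so the replica can keep up.

```python
def simulate(write_rate, apply_rate, max_lag, seconds, flow_control=True):
    """Toy model of replica lag with and without flow control.

    write_rate: transactions per second the application tries to commit
    apply_rate: transactions per second a replica can apply
    max_lag:    backlog at which the primary starts throttling (assumed knob)
    Returns (total transactions committed, peak backlog on the replica).
    """
    backlog = 0
    committed = 0
    peak_backlog = 0
    for _ in range(seconds):
        admitted = write_rate
        if flow_control and backlog + admitted - apply_rate > max_lag:
            # Throttle: admit only as many writes as keep the backlog at the limit.
            admitted = max(0, max_lag - backlog + apply_rate)
        backlog = max(0, backlog + admitted - apply_rate)
        committed += admitted
        peak_backlog = max(peak_backlog, backlog)
    return committed, peak_backlog


if __name__ == "__main__":
    print("with flow control:   ", simulate(1000, 800, 500, 10, True))
    print("without flow control:", simulate(1000, 800, 500, 10, False))
```

With these numbers, flow control caps the backlog at 500 transactions at the cost of committing 8,500 instead of 10,000 transactions over ten seconds, while the unthrottled run lets the backlog grow to 2,000. That is the trade-off described above: slower writes on the primary in exchange for bounded lag on the readers.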
Matt Yonkovit: No, excellent. Pretty cool. So I’m curious. We’ll move on a little bit here. I am a nerd. I like technical stuff, and I like to learn new technology. So what sort of new things are you learning? Take your AWS hat off; maybe it’s something that you’re excited about. I like to ask this question just because I like to learn myself. Sometimes people have some interesting technology I’ve never heard of, or some new thing that they’re trying to implement or figure out. I can show you stuff that I’ve worked on recently. But Vijay, maybe tell me a little bit about some of the things that you’re interested in, that you’re starting to learn and maybe get excited about. And then we’ll go over to Prasad.
Yeah, definitely. As you know, I started my journey as a SQL Server DBA and then slowly transitioned into MySQL. Now I’m learning Postgres, and I’m enjoying it. The Postgres documentation is really good. The product is really good. So every day I try to learn, read, and do some experiments with Postgres, and try to understand how things work there. What are these extensions? So I’m learning about those extensions and all those things. Besides that, as an individual interest, I’m exploring ML services. As we talked about big data, right: how do these ML services help with big data, how do you analyze it, and all those kinds of things. I’m trying to write my own models to see how I can analyze the data that I have, how I can leverage these ML services with our relational databases, and how I can bring value to the relational world. That’s what I’m learning.
Matt Yonkovit: Excellent, cool. And Prasad, what’s going on? What are you interested in? What can you recommend that I start thinking about?
So as we discussed earlier, I come from the database and big data world, but I am relatively new to AWS, and here I find that we offer multiple databases: SQL Server, Postgres, Aurora, Oracle, MySQL, and so forth. So it’s interesting to see and understand the choices that customers make, and the nuances they are interested in when choosing a specific database for their applications. The second important aspect is the scale at which we need to operate. You have tons of customers with so many different databases, and they want them to be highly available and managed all the time. How do you manage all that? That is another important aspect that I’m looking into.
Matt Yonkovit: Well, Vijay and Prasad, I want to thank you for coming on to chat with me, helping me understand a little bit about both of you, talking about your Percona Live talk, and sharing a little bit about what you’re learning and thinking about. I appreciate you coming on and chatting with me today.
Thank you, Matt.
Thank you, Matt. Thanks for your time. Yeah. All right.
Matt Yonkovit: Thank you very much. And for those who are watching or listening, go ahead and subscribe, like, follow us, and come to Percona Live, where you can hear from Vijay and Prasad in person, which is always fun. It’s good to see people virtually, but it’s better to be in person.