Uber Technologies, Inc and Uber’s Custom Built Database DocStore - Percona Podcast 36

August 11, 2021 by Ovais Tariq, Himank Chaudhary, Matt Yonkovit

Link to listen and subscribe: PodBean

Uber Technologies, Inc., commonly known as Uber, is an American technology company. Its services include ride-hailing, food delivery, package delivery, couriers, freight transportation, and more. It is one of the largest firms in the gig economy. In Episode 36, the Percona HOSS Matt Yonkovit invited Ovais Tariq, Sr. Engineering Manager - Core Storage @Uber, and Himank Chaudhary, Staff Software Engineer, Uber Technologies, to talk more about Uber’s custom-built database Docstore. The pair also recently teamed up at Percona Live to deliver the talk “Docstore - Uber’s Highly Scalable Distributed SQL Database”, which goes into more detail about the technology.

YouTube: https://youtu.be/7ujfyr6h_sc

Ovais Tariq

Sr. Engineering Manager - Core Storage @Uber

Engineering leader with 15 years of experience in database infrastructure and site reliability engineering. Currently building the core storage platform and leading production engineering for storage at Uber.

Ovais is a Sr. Manager in the Core Storage team at Uber. He leads the Operational Storage Platform group with a focus on providing a world-class platform that powers all the critical business functions and lines of business at Uber. The platform serves tens of millions of QPS with an availability of 99.99% or more and stores tens of petabytes of operational data.

Himank Chaudhary

Staff Software Engineer, Uber Technologies

Himank is the Tech Lead of Docstore at Uber. His primary focus area is building distributed databases that scale along with Uber’s hyper-growth. Prior to Uber, he worked at Yahoo on the mail backend team, building a metadata store. Himank holds a master’s degree in Computer Science from the State University of New York with a specialization in distributed systems. He has a proven ability to learn challenging concepts quickly and has developed competencies in diverse areas. He adapts easily to change and welcomes the opportunity of a new environment.

Specialties: Application Development, Algorithms, Operating Systems, Distributed Systems, Data Intensive Computing, Big Data, Networking, Multithreading, Android Platform, Parallel Architecture, OpenMP/MPI, GPU Programming. Languages: C, C++, Java, Python

Matt Yonkovit

The HOSS, Percona

Matt is currently working as the Head of Open Source Strategy (HOSS) for Percona, a leader in open source database software and services. He has over 15 years of experience in the open source industry, including over 10 years of executive-level experience leading open source teams. Matt’s experience merges the technical and business aspects of the open source database experience with both a passion for hands-on development and management and the leadership of building strong teams. During his time he has created or managed business units responsible for service delivery (consulting, support, and managed services), customer success, product management, marketing, and operations. He currently leads Percona’s OSPO, community, and developer relations efforts. He hosts the HOSS Talks FOSS podcast, writes regularly, and shares his MySQL and PostgreSQL knowledge as often as possible.

Transcript

Matt Yonkovit: Hi, everybody. Welcome to another HOSS Talks FOSS. I am here with our two famous engineers from Uber: Ovais Tariq and Himank Chaudhary. Hi, and welcome to the HOSS Talks FOSS. I appreciate you guys stopping by. How are you doing today?

Ovais Tariq: Doing good. Thank you, Matt, for having us here. It’s great to chat with you. Normally we would meet up at Percona Live in person, but, you know, the pandemic didn’t really allow us to do that. So I’m looking forward to talking to you in person as well. But great to be here, great talking with you.

Matt Yonkovit: Well, the good news is online conferences are still here, but in-person ones are starting to show up. This year, for instance, the Open Source Summit and AWS re:Invent are both going to be in person. So there are going to be opportunities, especially towards the second half of this year and next year, for all of us to potentially meet up at various conferences. That’s going to be really exciting. I don’t know about you, but I’m really looking forward to getting out of the house; I’ve been stuck in here so long, it’s not a great thing. But yes, really looking forward to seeing you again in person. For those listening who don’t know, Ovais used to work at Percona, so I’ve known Ovais for years and years now, and it’s great to see him do all the awesome things that he’s doing. Ovais, why don’t you tell us a little bit about your background, for those listening who might not know who you are?

Ovais Tariq: Sure. I have been in the database industry for more than 15 years, including the Percona time that you mentioned. Apart from that, I spent some time at a startup called Lithium, also doing database infrastructure there. And for the last five years, I’ve been at Uber, again in the database world. Here at Uber, I lead the storage org; my team is responsible for providing fully managed database, blob storage, and caching solutions. You can think of it as a private cloud PaaS offering, if you were to compare it to AWS or Google, but for internal users: the internal services get the same kind of feel that they would get if they were using Amazon. They don’t have to worry about operational procedures or operational burden; everything is fully managed for them. So that, in summary, is what I’m doing at Uber.

Matt Yonkovit: Okay, and just to clarify, when you talk about users, you’re not talking about Uber users; you’re talking about developers within Uber, those who are developing applications or using the data for analysis or whatever.

Ovais Tariq: Yes, thank you for clarifying that. “Users” is a term that we use internally as well. When I talk about users, I am mainly referring to the developers at Uber who are developing all of the services that make up the Uber ecosystem.

Matt Yonkovit: Now, what’s funny for us database folks is that we’ve seen this evolution around who our users are. We used to build things specifically for application groups, you know, things like that. But now we’re building apps that serve people building apps, instead of having that one-to-one connection. It’s about building at scale.

Ovais Tariq: Yes, I think we should really thank the cloud for that kind of transition. In the past, we used to think more like a DBA: you think of your database, you think of a DBA, right? But we have moved away from that, and that is due to the explosion of services. Especially as we move towards a microservice environment, it’s not really possible to be a DBA for all of these services. That’s where you start thinking about how to provide a managed database solution which is not just database infrastructure, but which can also do most of the things that a DBA would do, automatically: prevent anti-patterns, provide an abstraction layer, restrict bad usage, and be smart enough to suggest, for example, how to write more performant queries, what kind of schema you should have, and what indexes you could add. I think that is the way to really scale the DBA role in this microservices world.

Matt Yonkovit: Yeah. And being in charge of that PaaS solution and that infrastructure, you need people who understand and share that vision to implement a lot of what you’re trying to build. That’s why we have Himank on the phone here with us as well. Hi, Himank. You’re the one architecting Docstore and some of the other backend features that you’ve talked about at Percona Live. Why don’t you give us a little bit of background about yourself and tell us about your role?

Himank Chaudhary: Yeah. First of all, thanks, Matt, for inviting us to this talk; I’m looking forward to it. About me: my main focus area since school has been distributed systems, mainly distributed databases. Prior to Uber, I was working at Yahoo Mail, where I also worked on the backend system, with similar databases like HBase. Since I joined Uber, over the last five years, I’ve been part of the storage org, working on all the different storage systems that we offer to our internal users. Lately, I’m tech-leading the Docstore project, working along with Ovais on that. So the journey so far has mainly been in distributed databases.

Matt Yonkovit: Okay, yeah. I think right now we’re seeing this wave of database evolution towards a distributed database mindset. As you mentioned, with microservices and how developers are designing applications, they have to be resilient, they have to be portable, they have to conform to some sort of standard. A lot of people are looking at deploying via Kubernetes or via containers, and that changes the thought process around how you interact with the databases themselves. It’s no longer a single monolithic database that everyone connects to. Now it’s a new challenge. And that leads us to your infrastructure: you’ve got a really big, complicated data footprint, don’t you?

Ovais Tariq: Yeah, we definitely have a huge footprint. If you look at Uber’s businesses, the number of rides we’re doing, the number of orders going through and getting processed by the system, we have to deal with a massive amount of data and a massive amount of QPS. That automatically means that we have a massive infrastructure footprint as well. And that, again, is why I was mentioning that we have moved away from the individual DBA mindset: at this scale, it’s not really possible for us to have human operators manually managing all of the systems. We need systems in place that are self-managed. That’s where we’re going with what we’re developing at Uber, and that’s exactly why we started Docstore as well. And to your point about microservices and the shift from monolithic to microservice architectures, that’s one part of the boom for distributed databases. The other part is the amount of data that gets processed these days. Nobody thinks about how much data they want to store, because storage is kind of cheap, right? So that’s another reason why we invested in Docstore: we wanted to build a system that scales with the application, that works on small, medium, and very large datasets, and that saves users from having to migrate from one offering to another, so they can focus on whatever their business logic is.

Matt Yonkovit: What’s funny is that the idea that storage is cheap only extends so far, because when it’s 10 times cheaper but you store 10 times more data, it costs the same. Right? You’ve just got more, and it introduces a litany of other problems, because you actually have to process that data. A lot of times the performance implications of having 10 petabytes versus one petabyte or one terabyte of data are vastly different. That’s something a lot of people don’t consider; they just think, we’re going to store it forever in whatever storage mechanism, whatever data lake we have out there, and eventually we’ll use it. But there are all kinds of cascading implications. Now, you mentioned Docstore, and that’s really what I wanted to talk to you about, because both of you presented on it at Percona Live, and it was an exciting topic. I like to learn about why people are building new databases and new features. So maybe give us a quick overview, for those listening who haven’t seen the presentation: what is Docstore?

Ovais Tariq: Okay, sure. First I’ll talk a little bit about it, and then I would like Himank to talk a little from the technical perspective. Docstore has been designed as a multi-model database. It provides the best of both worlds, the relational world and the document world: the same level of flexibility that you would get from the document model, and at the same time the kind of structured approach that you would see in the relational model. When we thought about building Docstore, we did not want to enforce any restrictions on what kind of data model an application goes for, because we really wanted to build a database that can serve almost all of the operational database use cases at Uber. So we wanted to support a wide variety of data models, whether a relational model, a hierarchical model, or a document model similar to MongoDB. That is what I would say from a usability perspective. From a technical perspective, it’s a distributed database built on top of MySQL; we use MySQL as the storage engine. On top of that, we have a data distribution layer that’s responsible for sharding the dataset, then a query processing layer and a caching layer. All of these layers come together to form the Docstore product. And I’ll let Himank talk about the technical side of it.
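To make the layering described above a little more concrete, here is a minimal Python sketch of a request passing through a caching layer, a query processing layer, and a data distribution (sharding) layer before reaching storage. It is an illustration under stated assumptions, not Docstore’s code: every class name is hypothetical, an in-memory dict stands in for each MySQL shard, and hash-mod sharding on the partition key is an assumed placement scheme.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class StorageNode:
    """Stand-in for one MySQL-backed shard; a dict plays the role of a table."""
    rows: dict = field(default_factory=dict)

    def get(self, key):
        return self.rows.get(key)

    def put(self, key, value):
        self.rows[key] = value


class DistributionLayer:
    """Maps a partition key to one shard (hypothetical hash-mod placement)."""

    def __init__(self, shards):
        self.shards = shards

    def shard_for(self, key: str) -> StorageNode:
        # crc32 is deterministic across processes, unlike Python's salted hash().
        return self.shards[zlib.crc32(key.encode()) % len(self.shards)]


class QueryLayer:
    """Validates requests and forwards them to the shard that owns the key."""

    def __init__(self, distribution: DistributionLayer):
        self.distribution = distribution

    def get(self, key):
        if not key:
            raise ValueError("partition key is required")
        return self.distribution.shard_for(key).get(key)

    def put(self, key, value):
        if not key:
            raise ValueError("partition key is required")
        self.distribution.shard_for(key).put(key, value)


class CachingLayer:
    """Read-through cache in front of the query layer; writes invalidate."""

    def __init__(self, query: QueryLayer):
        self.query = query
        self.cache = {}
        self.misses = 0

    def get(self, key):
        if key not in self.cache:
            self.misses += 1
            self.cache[key] = self.query.get(key)
        return self.cache[key]

    def put(self, key, value):
        self.query.put(key, value)
        self.cache.pop(key, None)  # the next read goes back to storage
```

Wiring the layers as `CachingLayer(QueryLayer(DistributionLayer([StorageNode() for _ in range(4)])))` keeps each concern in one place, so a cache policy or a sharding scheme can change without touching the other layers.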

Himank Chaudhary: Yeah, and Matt, when we started Docstore, the things we kept in mind were what type of functionality we needed to support. We talked to multiple users, and we figured out that this database itself has to accommodate functionality like transactions, and consistency, which is very important for our developers as well. So we implemented the Raft protocol to ensure transactions are durable across a quorum of nodes. Then we have different layers, as Ovais mentioned: how to route requests, and how to provide client drivers to users, because at Uber we have Go and Java, so we need to ensure that we support both. And then how routing needs to happen, how to have secondary indexes, replication, and all the typical functionality that we need to implement in our database.
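The durability guarantee mentioned here rests on Raft’s majority rule: a log entry is committed once a majority of the replica group has persisted it. The Python sketch below shows only that commit rule, with hypothetical function names; it leaves out leader election, terms, and log repair, and it is not Docstore’s Raft implementation.

```python
from typing import Sequence


def quorum_size(cluster_size: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return cluster_size // 2 + 1


def is_committed(acks: int, cluster_size: int) -> bool:
    """An entry is durable once a majority (leader included) has persisted it."""
    return acks >= quorum_size(cluster_size)


def commit_index(match_index: Sequence[int]) -> int:
    """Highest log index persisted on a majority of replicas.

    match_index holds, for every replica in the group (leader included), the
    highest log index known to be persisted there. Full Raft additionally
    requires that the entry at this index was written in the leader's current term.
    """
    ranked = sorted(match_index, reverse=True)
    return ranked[quorum_size(len(ranked)) - 1]
```

With five replicas, for example, an entry survives the loss of any two nodes, because every majority of three overlaps the three nodes that acknowledged it.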

Matt Yonkovit: Okay, and so as you were building this, I think you had to implement a lot of these different details. And you mentioned that MySQL is the underlying layer. So does that mean that there’s a proxy or a routing layer that actually handles most of the interactions with the developers and has sort of an API, and then from there, you call down to MySQL?

Himank Chaudhary: Yeah, that’s correct. We have implemented different layers. The topmost layer is our client, which talks to our gateways; we have our own gateways. We provide a structured API to our clients, and all they need to do is call that structured API. For example, if they need to insert a row, they simply provide the columns, that’s it. The client driver automatically figures out how to route to our gateways, and the gateway knows where the lower layer, the one responsible for running the Raft protocol, is running. It routes the request to those nodes, and they figure out which shard it belongs to. That layer talks directly to MySQL. So everything is hidden behind all these layers, and users simply talk to the client.
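The client-to-gateway-to-leader path can be sketched as follows. This is a hypothetical Python illustration, not Docstore’s driver or gateway: the `Gateway` class, the `NotLeaderError` redirect, and the `send` transport callback are all assumptions, and CRC32 modulo the shard count is an assumed placement scheme. It shows the gateway hashing the partition key to a shard, sending to that shard’s current Raft leader, and updating its routing table when a replica reports that leadership has moved.

```python
import zlib
from dataclasses import dataclass


class NotLeaderError(Exception):
    """Raised by a replica that is no longer the leader for a shard."""

    def __init__(self, new_leader: str):
        super().__init__(f"not leader; try {new_leader}")
        self.new_leader = new_leader


@dataclass
class Gateway:
    num_shards: int
    leaders: dict  # shard id -> address of the node currently leading it

    def shard_of(self, partition_key: str) -> int:
        # Deterministic hash so every gateway agrees on placement.
        return zlib.crc32(partition_key.encode()) % self.num_shards

    def route(self, partition_key: str, send, max_retries: int = 3):
        """Send a request to the leader of the key's shard, following redirects.

        `send(address, shard)` is a hypothetical transport callback.
        """
        shard = self.shard_of(partition_key)
        for _ in range(max_retries):
            try:
                return send(self.leaders[shard], shard)
            except NotLeaderError as err:
                self.leaders[shard] = err.new_leader  # learn the new leader
        raise RuntimeError(f"no stable leader for shard {shard}")
```

Caching leader locations in the gateway keeps the common path to a single hop, while the redirect handles the case where a Raft election has moved leadership since the table was last updated.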

Matt Yonkovit: Now, is the API a REST API that is just very basic, or does it accept SQL directly?

Himank Chaudhary: Right now, these are REST APIs. We have also built query APIs that are similar to what MySQL supports, but they are still structured APIs.

Matt Yonkovit: Okay. So, Ovais, as you were building this out and this vision formed, why not look at some of the databases that are already out there? It seems like every week a new distributed database pops up; over the last several years there have been dozens, and certainly some have become very popular. Why build your own versus using something that’s already out there?

Ovais Tariq: Yeah, that’s a great question. We actually have experience building databases since 2015. We had an early incarnation of a distributed database, again on top of MySQL, called Schemaless; there are a few posts about it on the Uber Engineering blog. Then Cassandra also got introduced into the mix. Schemaless was an append-only database with more data modelling restrictions, so from a usability perspective it was a bit hard to use: it was scalable, but not as usable. That’s how Cassandra came into the picture. But having developed an initial incarnation of a distributed database, and having operated one of the largest Cassandra deployments, what we realised, from the Cassandra perspective for example, is that users do care about strong consistency, which is not something that Cassandra was providing. Without strong consistency, we saw a lot of complexity in the application architecture. For example, there is no real notion of transactions; lightweight transactions exist, but they’re not scalable. So applications typically end up plugging in other systems: they would use ZooKeeper as a distributed lock manager and use that to coordinate writes to Cassandra. This was happening in many different applications that were using Cassandra. And if you look at the Uber workload, it is critical because of its physical nature: someone wants to get a ride, and you don’t want multiple people to get the same ride; imagine what kind of response we would have there. So strong consistency is really important. We found so many problems from the increased complexity on the application side that we said we need a system that can provide strong consistency.

Matt Yonkovit: So is that a function of... You know, I’ve seen this in a lot of other companies and a lot of other infrastructures: you start off with a technology that fits a specific need, and then you start to evolve it to match other needs. You might like it because it has certain features, and then you start to bolt technology on top of technology to overcome its limitations, as opposed to using the right tool for that particular use case, which might be completely different. With Cassandra, you mentioned consistency; there might also need to be more reporting, or other things it just doesn’t handle. So you’re bolting solution on top of solution until you have a Frankenstein monster, and at some point you need to walk back and say, okay, we chose the wrong tool initially, based on what our needs are now.

Ovais Tariq: Yeah, that’s exactly what I’m saying. Consistency is just one part of it. If you have an operational database that is responsible for processing or storing all of the transactions related to business operations, then for the important use cases you need to make that data available for business reporting, that is, let it flow into the data warehouse. That’s another area where Cassandra is difficult: there is no consistent log that can be used to move the data from Cassandra to the data lake. Yes, there’s a CDC feature that was introduced a while back, but it’s not really production ready. So there were these types of challenges. Where Cassandra really fits in well is when you write the data and don’t really care whether it is made available elsewhere or whether it’s consistent, and you’re fine reading something that may not be consistent. That is not something we need from a business perspective, and those are the limitations that made us move away from Cassandra.

Matt Yonkovit: So from a Docstore perspective, you want flexibility, consistency, and ease of use; that’s what you’re looking for, to give your developers a universal interface that can handle a wide variety of different workloads. So you’re looking for the One Ring, the One Ring to rule them all, right? You need all of the data to flow through and have some conformity, at least for the operational dataset.

Ovais Tariq: It might seem like that’s not possible. But if you look at how applications typically use a database, what do they need? They need to be able to store a large amount of data, they need transaction processing, and they need to be able to query the dataset as and when they need to. And they need high availability, reliability, consistency, and performance. What we have found is that it is definitely possible to build a database that can offer all of that, and that’s where Docstore comes into the picture. Now, to your other question about why we ended up building something ourselves: as I mentioned earlier, we had a previous incarnation, Schemaless, which had some of the building blocks we could use to build a new database, Docstore. Yes, there were issues from a usability perspective; for example, it didn’t have a notion of transactions. But it still had other building blocks that could be reused. We also had operational expertise in running and managing a MySQL-based system and some of the components that existed in Schemaless. If you bring a completely new system into the mix, one that hasn’t been proven at our scale, then we would be at the mercy of that system. I think that would be a very risky decision. And if you think about the life of a product, where do you end up spending most of the time? Running it in production. You spend 20% of the time developing it and 80% running it in production: maintaining it, understanding and debugging performance issues, fixing them.
And if you have expertise around a system, if you have already built complex components that you can reuse, then I would say that is the way to go, because that is where the 80% of the time would be best applied.

Matt Yonkovit: Building a database, though, or even a distributed system like this, is not for the faint of heart. Let’s be honest, most companies probably couldn’t pull it off, because they don’t have the engineering background and depth that you guys have at Uber; they might not even have the workload. It’s an expensive endeavour in terms of time, resources, and knowledge, and it’s potentially fraught with challenges that you have to overcome. So, Himank, as you’ve been designing and building this out, what are some of the challenges where you thought, this is really difficult, we’re happy we overcame it, but it was tough?

Himank Chaudhary: Yeah, I think one of the challenges we had was supporting transactions. It sounds simple, but when it comes to production, when you have to scale it, you see a lot of challenges, and we ended up spending a lot of time there. But I think it was one of the best decisions, because now we are solving it at the database layer, and applications don’t have to care about it. Earlier, every application developer had to implement it on the application side. So transactions were definitely one of the features we found early on that we really needed, but scaling them is challenging, because there are a lot of corner cases we have to deal with. The other part was teaching our developers how to use transactions in production and how not to misuse them, because when you expose this functionality, chances are that people may misuse it. And, I think we can wait until towards the end of the talk for this, but we are also implementing 2PC. There, too, there are a lot of challenges and a lot of corner cases that we have to figure out. These are some of the challenges that we faced while developing Docstore.

Matt Yonkovit: So from a transaction perspective, the challenge you’re talking about is that you’ve got this distributed system with shards, and you might have a transaction that crosses a dozen different shards. How do you ensure consistency across those dozen shards? Is that the kind of thing? Okay. Yeah. And that is a challenge.

Himank Chaudhary: Yeah. And on top of that, we also need to understand the nuances: what happens if a node goes down, or what happens in the corner cases when the leader is not up. These types of things don’t come up during development, but when we were scaling it, when we had traffic on the instances, we had to figure all of them out in production.
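For the cross-shard case raised here, the textbook answer is two-phase commit (2PC), which Himank says the team is implementing. The Python sketch below is the generic protocol, not Docstore’s implementation: `Participant` and `two_phase_commit` are hypothetical names, each shard is an in-memory stand-in, and it deliberately omits the hard parts discussed above, such as coordinator crashes, participants failing after voting yes, and durable logging of the commit decision.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False keeps identity hashing so participants can be dict keys
class Participant:
    """One shard taking part in a distributed transaction (in-memory stand-in)."""
    name: str
    data: dict = field(default_factory=dict)
    staged: dict = field(default_factory=dict)  # txn id -> pending writes
    fail_prepare: bool = False                  # simulate a shard voting "no"

    def prepare(self, txn_id, writes) -> bool:
        if self.fail_prepare:
            return False
        # A real shard would also lock the keys and log the vote durably.
        self.staged[txn_id] = dict(writes)
        return True

    def commit(self, txn_id):
        self.data.update(self.staged.pop(txn_id))

    def abort(self, txn_id):
        self.staged.pop(txn_id, None)


def two_phase_commit(txn_id, writes_by_participant) -> bool:
    """Textbook 2PC: commit only if every participant votes yes in phase one."""
    prepared = []
    for participant, writes in writes_by_participant.items():
        if participant.prepare(txn_id, writes):
            prepared.append(participant)
        else:
            # Any "no" vote aborts the transaction on every shard that prepared.
            for p in prepared:
                p.abort(txn_id)
            return False
    # Phase two: the decision is commit (a real coordinator logs it durably first).
    for participant in prepared:
        participant.commit(txn_id)
    return True
```

The corner cases Himank mentions live exactly in the gaps of this sketch: if the coordinator dies between the two loops, prepared shards are left holding locks until some recovery process learns the decision.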

Matt Yonkovit: Okay. And that’s a big challenge; I know that in a distributed system, that’s a key area for a lot of people. Now, you mentioned something interesting that most people don’t think about, which is developer education. Whether you’re choosing a database that you built or a new technology that you’re implementing, it’s those pesky developers who often cause a lot of the issues. I’ve seen people who don’t necessarily understand the technology, so it’s very easy for them to misuse it. I like the Spider-Man phrase: with great power comes great responsibility. You’ve developed this system with an immense amount of power. How are you going about training developers and making sure that they don’t do things that are going to hurt the system? Are there checks and balances in place?

Ovais Tariq: By not giving them the power.

Matt Yonkovit: Okay, explain.

Ovais Tariq: I mean, yeah. While Docstore is feature-rich, we have restrictions in place that prevent developers from using bad design patterns. Think about the different types of bad design patterns we see. Maybe someone chooses a field that is not very selective as a key, for example, gender or a city name. Those don’t have many distinct values, which means the data is not going to be well sharded. Another example is people using transactions the wrong way: keeping a transaction open for too long, doing some other remote operations, and then committing, which causes a huge amount of lock contention. We have a restriction in place for that. Or storing very large blobs in the database, which is another anti-pattern. So we have restrictions in place that prevent these types of anti-patterns. And that’s one thing I wanted to add to what I said earlier: whether you’re building your own database or using something open source that already exists, you need to have an abstraction layer in front of it; you cannot simply expose the full feature set of that particular system. A database is generally developed for a wide variety of use cases, so it has a large feature set, and not all features are meant to be scalable. If you give people the full power, as you mentioned, to do anything, then they are going to misuse it; we cannot expect developers to know the nitty-gritty of each and every feature of the database. That’s where you need an abstraction layer, which not only abstracts away the database but also restricts the functionality that is available.
And the less obscure functionality is used, the easier it is for other developers who might end up working on the same codebase. So the way we handle it is not just by teaching. Yes, teaching is important for sure, but we also restrict the API, think about the anti-patterns, and take our experience and build protections into the system so that these anti-patterns don’t get applied, for example preventing hot shards. That’s where we feel Docstore excels compared to some of the other systems: even if developers do not have full knowledge of the system, it’s not easy to make it misbehave.
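The guardrails listed above (unselective keys, oversized blobs, long-running transactions) can be enforced at the API layer before a request ever reaches the database. Here is a minimal Python sketch; every limit, function, and class name is hypothetical, and the distinct-value ratio is only a crude stand-in for real key-selectivity analysis, not how Docstore decides.

```python
import time
from dataclasses import dataclass

# Hypothetical limits; real values would be tuned per deployment.
MAX_VALUE_BYTES = 256 * 1024   # reject large blobs
MAX_TXN_SECONDS = 5.0          # reject long-open transactions
MIN_KEY_DISTINCT_RATIO = 0.5   # distinct key values / sampled rows


class GuardrailError(ValueError):
    pass


def check_partition_key(sample_keys):
    """Reject a key column with too few distinct values (e.g. gender, city)."""
    if not sample_keys:
        return
    ratio = len(set(sample_keys)) / len(sample_keys)
    if ratio < MIN_KEY_DISTINCT_RATIO:
        raise GuardrailError(f"partition key not selective enough (ratio {ratio:.2f})")


def check_value_size(value: bytes):
    """Reject values above the blob size limit."""
    if len(value) > MAX_VALUE_BYTES:
        raise GuardrailError(f"value of {len(value)} bytes exceeds {MAX_VALUE_BYTES}")


@dataclass
class Transaction:
    started_at: float  # time.monotonic() when the transaction began

    def commit(self, now=None):
        """Refuse to commit a transaction held open past the limit; the caller must abort."""
        now = time.monotonic() if now is None else now
        if now - self.started_at > MAX_TXN_SECONDS:
            raise GuardrailError("transaction open too long; move remote calls outside it")
        return "committed"
```

Putting these checks in the abstraction layer rather than in documentation means a developer who has never heard of the anti-pattern still gets a clear error instead of a hot shard or a lock pile-up.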

Matt Yonkovit: So narrow the feature set to what is really needed, keep it simple and straightforward, and keep those edge cases from being exposed to everyone until they’re needed. Then you can work on how best to implement them. Okay, that makes sense.

Now, I know you both are on the team responsible for all of the management of operational data at Uber, so you’re potentially talking petabytes of data here, right? So even outside of Docstore, I always like to hear about the challenges of managing an environment that size. It’s a massive environment. What sort of problems do you see crop up more often than you’d like? And what are your recommendations for people whose environments aren’t Uber scale yet, but maybe will be in a couple of years? What should they look out for?

Ovais Tariq: Most of our time is spent on operational issues. So the first thing I would really recommend is to focus on platform automation: make sure that everything is fully automated. As I mentioned earlier in the discussion, as you scale, it’s not really possible to throw humans at the problem. So think about things like: How do I scale out? How do I add new nodes? How does leader election happen? How do I provision these databases? How do users interact with the system? The more self-service the developers can do themselves, the better it is, so that the team can focus on maintaining and developing the system and taking it forward. So automation is really the key. The other part is: don’t make the database a dumpster. Make sure that developers understand what kind of data they are storing. That’s why one of the key features of Docstore is schema enforcement. As soon as you enforce the schema, or force the developers to think about schemas, they start thinking about the relevance of the data and what they are going to store in it. That is one way to make sure that the database doesn’t become a dump for all kinds of different types of data.
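Schema enforcement of the kind described here can be as simple as rejecting rows that don’t match declared columns. The Python sketch below is a hypothetical illustration (the `Table` and `Column` classes are invented for this example and are not Docstore’s schema API): unknown columns, missing required columns, and wrong types are all refused at write time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    py_type: type
    nullable: bool = False


class SchemaError(ValueError):
    pass


class Table:
    """A table that only accepts rows matching its declared columns."""

    def __init__(self, name, columns, primary_key):
        self.name = name
        self.columns = {c.name: c for c in columns}
        self.primary_key = primary_key
        self.rows = {}

    def validate(self, row: dict):
        unknown = set(row) - set(self.columns)
        if unknown:
            raise SchemaError(f"unknown columns: {sorted(unknown)}")
        for col in self.columns.values():
            value = row.get(col.name)
            if value is None:
                if not col.nullable:
                    raise SchemaError(f"column {col.name!r} is required")
            elif not isinstance(value, col.py_type):
                raise SchemaError(
                    f"column {col.name!r} expects {col.py_type.__name__}, "
                    f"got {type(value).__name__}")

    def insert(self, row: dict):
        self.validate(row)  # nothing reaches storage without passing the schema
        self.rows[tuple(row[k] for k in self.primary_key)] = row
```

Because the schema has to be declared before the first write, the team owning the table is forced to decide what belongs in it, which is the point Ovais makes about keeping the database from becoming a dump.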

Matt Yonkovit: Database design matters. And I think this is what's interesting as we look at all these automation tools, whether it's in the cloud, whether it's systems you're building yourself, whether it's trying to run via Kubernetes or whatever: the automation and the operations only take you so far. From a database design perspective, a crappy application or a crappy system design is still going to perform poorly, no matter how much awesome infrastructure you put behind it. I think that's a key lesson a lot of people don't take to heart, and in a lot of cases it causes more issues than the infrastructure does.

Ovais Tariq:
Yeah, yeah. Yeah, for sure. And the other thing I would say is that it's necessary to have the right abstractions in place. Even if you're using an OSS system, make sure that you don't expose the internal APIs of the system, because at some point in your journey (we have had a migration journey as well) you will realize that what you have doesn't really work, doesn't really scale with you. And especially in a microservices world, where there is no monolithic application and there are thousands of applications out there, it's really challenging to move people from one technology to another. So having the right abstraction in place actually makes it easy to evolve the system in the future and to seamlessly migrate to another technology, so that part is also really important. The final piece I would add is around efficiency. I think that is the biggest challenge as you scale. As you grow bigger and bigger, you can't just throw hardware at the problem; you need to think about options, you need to think about how data is getting stored. Especially in today's world, there are a lot of regulations which mean you need to make sure the data is properly protected, you need to keep a history of, for example, all the changes on a user account, or you need to store a large amount of data. That's where, from the get-go, you should think about tiering: think about how you can bisect your data into what is hot and frequently used versus what needs to be kept around for regulatory reasons and is not accessed that frequently. Thinking about these things right from the get-go makes it easier as you grow, as companies transition from a small-scale company to, let's say, an Uber-scale company.

Matt Yonkovit: Great. Well, Himank and Ovais, thank you for showing up and talking to me a little bit about this. I know that you have a Percona Live session floating around out there. I believe it's on YouTube, so if you're interested in more details on Docstore, it should be available there for free. And I know, Ovais, you're always hiring. So if you're interested in working on Uber-scale problems, Ovais is always hiring DBAs and engineers for the storage team and would probably welcome some awesome engineers to the team.

Ovais Tariq:
Thank you so much, Matt, for mentioning that and bringing it up. We definitely are always looking for engineers with database experience, both from a management perspective and from a development perspective. Uber is a great place to work, and working with Himank and whoever else comes in, we learn a lot on the distributed database side as well. So I would say this is a huge opportunity. As you mentioned earlier, not every company gets to build a database. So come join this journey and help us take Docstore forward.

Matt Yonkovit: All right. Great. Thank you both. I appreciate the time that you spent today, and we look forward to seeing how Docstore grows and evolves over the next few years. Thank you, man.

Ovais Tariq:
Thank you so much, Matt, for having me here. It was great chatting with you. I look forward to seeing you in person as well, at one of the conferences soon.

Matt Yonkovit: Wow, what a great episode that was! We really appreciate you coming and checking it out. We hope that you love open source as much as we do. If you like this video, go ahead and subscribe to us on the YouTube channel. Follow us on Facebook, Twitter, Instagram and LinkedIn. And of course tune into next week’s episode. We really appreciate you coming and talking open source with us. ∎
