Link to listen and subscribe: PodBean
Bruce Momjian, Vice President, Postgres Evangelist at EDB, sat down with the HOSS, Matt Yonkovit, to talk through the current trends in database technology and to share details on the release of Postgres 14. Bruce has been around the database space for more than 30 years: he is a co-founder and core team member of the PostgreSQL Global Development Group as well as a known speaker, author, and contributor in the PostgreSQL community who has appeared at many international open-source conferences. Recently he visited Percona Live and talked to us about how to choose the best hardware for PostgreSQL in the cloud.
Bruce Momjian, Vice President, Postgres Evangelist at EDB
Bruce Momjian is co-founder and core team member of the PostgreSQL Global Development Group, and has worked on PostgreSQL since 1996. He has been employed by EDB since 2006. He has spoken at many international open-source conferences and is the author of PostgreSQL: Introduction and Concepts, published by Addison-Wesley. Prior to his involvement with PostgreSQL, Bruce worked as a consultant, developing custom database applications for some of the world’s largest law firms. As an academic, Bruce holds a Masters in Education, was a high school computer science teacher, and lectures internationally at universities.
Matt Yonkovit, The HOSS, Percona
Matt is currently working as the Head of Open Source Strategy (HOSS) for Percona, a leader in open source database software and services. He has over 15 years of experience in the open source industry, including over 10 years of executive-level experience leading open source teams. Matt’s experience merges the technical and business aspects of the open source database world, combining a passion for hands-on development with the management and leadership needed to build strong teams. During his time he has created or managed business units responsible for service delivery (consulting, support, and managed services), customer success, product management, marketing, and operations. He currently leads Percona’s OSPO, community, and developer relations efforts. He hosts the HOSS Talks FOSS podcast, writes regularly, and shares his MySQL and PostgreSQL knowledge as often as possible.
Matt Yonkovit: Hi, everybody. Welcome to another HOSS Talks FOSS, today I’m here with Bruce Momjian from EnterpriseDB and from the Postgres community. Now, Bruce, you have been around the database space for a long time, longer than I have. I’ve been around for about 25 years in the database space. And you have several years on me. So I read on your website. And recently in one of your blogs, you mentioned 32 years.
Bruce Momjian: Yeah, yeah, I started with databases in ’89 as a developer, so I was writing database applications. But pretty soon after that, I actually started trying to write my own database in C. And I kind of got to the point where you write your own database using SQL commands, but then there’s this recursion problem, and I gave up, because you end up writing the bootstrap code in SQL, but then you don’t have any SQL to bootstrap it. So it was just too confusing. So then I wrote an SQL engine in a shell script called SHQL, which is, I think, linked to on my website. And that was my claim to fame in SQL until I started with Postgres. So yeah, for some reason, that has just always resonated with me. I remember flying to Australia with my wife on vacation, and I was reading an SQL book by Graf, that’s the author’s last name. And I remember just being really fascinated about how this SQL thing worked and being really excited. And my wife complained that I wasn’t talking to her, that I was reading a book on the plane and stuff like that. But I said, I’m on vacation, but I want this opportunity to kind of do research on the plane. So yeah, for some reason, I’ve always been interested in how it worked. I got interested in Postgres because I was curious how the optimizer worked. How does this thing work? I was an Informix and Ingres guy at that time and, of course, you can’t see the code. But in Postgres, you could. So yeah, just kind of digging in there and seeing how it worked. And then working with one of our early developers, Tom Lane, who is a core member now, kind of dissecting the optimizer and figuring out why it was so slow. I remember some long phone calls on the weekends doing that. So yeah, it’s been a long, long time. For me, it just resonated. I don’t know why.
The funniest thing was I was at a Linux event, or was it [inaudible], probably 12 years ago, and someone came up to me and said, I always thought people couldn’t get excited about databases. Operating systems, they can get excited about, but databases… I didn’t think that was possible. You showed me I was wrong.
Matt Yonkovit: A great compliment, right? Getting excited about something that most people find boring.
Bruce Momjian: Exactly. It’s not for everybody.
Matt Yonkovit: Well, this is an interesting topic, though. Because I’ve noticed that we, or the developer community, I guess, have thought databases are uncool for a while, right? And so they don’t want to deal with them. And so they continually put in ORMs or abstraction layers, or try to get out of thinking about databases as much as possible. And I think that comment kind of shows that, for a lot of developers, databases are just not cool. How can you make that exciting? Now, I’m like you, I do get excited about performance things and database features and changes, so it’s weird to me that other people don’t. I don’t know if you have the same view, where you’re like, you don’t find this interesting? This is really interesting to me.
Bruce Momjian: So my wife took a computer class when she was in college, I think, and she hated it. So I’m kind of used to being around people who aren’t really excited about what I’m excited about. I’ll give you a really crazy story. I was in high school, and we used to wait for the bus, and there used to be a railroad line that went by near where we waited. And it was early morning, so the trains were going back and forth. And the train would go by and we’d stand as a group, and I’d say, Hey, look, there’s a train, you know. And I guess I did that every day, like, well, there’s a train. And it never occurred to me that somebody wouldn’t be interested in pointing out that there’s a train going by. Like, for me, that’s exciting, right? There’s a train. Look at that. Cool, yeah. But I remember once, the train goes by and one of the other people says, Look, there’s the train, and everyone laughs. And I’m like, I don’t understand why that’s funny. But they were laughing at me, because I think it’s interesting and nobody else does. So yeah, I have always been the oddball. I watch NHK World from Japan. I watch stuff where people come in and say, What are you watching? And I’m like, Well, it’s interesting to me, you know. So I guess I’ve just been that kind of person.
Matt Yonkovit: But isn’t this where it’s interesting, because we’ve started to enter this space, from a database perspective and a technology perspective, where the interest is in application development. And so we’ve done a lot to automate, to build out the technology of the databases to make it more self-managing, with more tools and more automation. But at the same time, the developers care less and less about databases, because they don’t have to. Isn’t this creating a kind of gap between the knowledge that people have when they need to troubleshoot or fix things, which honestly still happens, and their capability to actually manage? And doesn’t that make people who do care more valuable?
Bruce Momjian: Yeah, I think it’s really a question of how you view the lifecycle of what you’re doing, right? So if you’re just trying to get from A to B, which is what most developers are trying to do, then the database can seem to be this impediment, right, because you’ve got to create the tables, you’ve got to define the indexes, you’ve got to set up the schema. And developers just want to get from A to B: I just want to get that job done, and I want to see the pretty stuff on the screen. But when you think of the life of the data, the time it’s going to need to be used over the life of the application, then you really do need to go a little slower at the beginning to set up what you’re doing. I think the big difference is that for most things, if you’re playing computer games, or you’re making a spreadsheet or whatever, you’re just like, I’m just gonna do it, I’m gonna get it done for my project, and I’m gonna leave. And what we find, I think, is that because databases, by definition, are stateful, that’s the term, there’s just this life of what you do that lasts far beyond the time you need it. So it’s really the life of data, for me, that I think makes people scared. And people who don’t understand the life of that data fall into these kinds of pitfalls, whether they’re using ORMs or just throwing the data into NoSQL. They don’t care about what the schema is, they don’t care about what the future application is going to need the data for. They don’t care, they just want to get the job done. And if the application isn’t stateful, you can live that way, right? You know, like I tell people, if your text editor dies, you just restart it, right? No problem, there’s no state in the thing. But when you’re talking about something that’s gonna live for a long time, then you’ve got to do the upfront work, I think.
Matt Yonkovit: So you think this is short term thinking? Like just because they’re like, I have a deadline to develop this to get this out the door to get it out to the users? I’m gonna do whatever I need to do to get it out there.
Bruce Momjian: Yeah, I think a lot of it is. Let’s look at the ORM case, or the NoSQL case. They’re great at getting the data in one way, how the application wants to see the data today. That’s how I’m going to put it in, and that’s how I’m going to get it out, right? But you’re not thinking about how other applications are going to get that data. How am I going to get it for analytics? How am I going to handle cases where the application needs change, and I have to change what data I’m storing? So then over time, things become a lot more complicated. I mean, the classic case for me is object databases. You might not have been in the space when these were big, but they were supposed to be this sort of mapping between object languages, like Java, and databases. So you were just gonna take the Java object, you’re gonna put it in, you’re gonna take it out, right? Not like JSON, but the actual objects. If you look at where we are now, we don’t really hear about object databases anymore, because in those cases the data was stored literally exactly as the application stored it. So you couldn’t really get at the data in an object database very easily; you really had to write a program. And if you had to write a program, the program had to know all the intricacies of the application to even get the data out of the database. So it was the kind of case where it sounded great: it maps directly from object languages to the database. But because the data is stateful, and because the needs of what you want to do with that data change, whether it’s analytics or application changes or just other applications, the whole object database thing kind of fell apart.
So that’s why you start to use ORMs, because at least that gives you a layer over the actual data. You can change the mapping layer as the application changes, but the data is still in a form that can be digested and analysed by other applications, because it’s in a sort of discrete form that’s not tied directly to the application. I think that’s why people aren’t excited about it. I feel for application programmers, who have to learn SQL and build an application where people are changing specifications all the time. That’s why I think NoSQL got popular for a while there.
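The contrast Bruce draws between application-shaped storage and discrete relational storage can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Order` class, the table names, and the sample values are all hypothetical, SQLite stands in for a real database, and JSON text stands in for the language-native serialization an object database would use.

```python
import json
import sqlite3

# Hypothetical application object; the class and its fields are illustrative only.
class Order:
    def __init__(self, customer, amount):
        self.customer = customer
        self.amount = amount

orders = [Order("acme", 120), Order("acme", 30), Order("globex", 50)]

conn = sqlite3.connect(":memory:")

# Application-shaped storage: each row is one serialized object.
# The database sees a single opaque value per row.
conn.execute("CREATE TABLE order_blobs (body TEXT)")
conn.executemany(
    "INSERT INTO order_blobs VALUES (?)",
    [(json.dumps(vars(o)),) for o in orders],
)

# Relational storage: the same data as discrete, named columns.
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(o.customer, o.amount) for o in orders],
)

def totals_from_blobs(conn):
    """Analytics over blobs: a program has to decode every row itself."""
    totals = {}
    for (body,) in conn.execute("SELECT body FROM order_blobs"):
        record = json.loads(body)
        totals[record["customer"]] = totals.get(record["customer"], 0) + record["amount"]
    return totals

def totals_from_columns(conn):
    """Analytics over columns: any SQL client can ask the question directly."""
    rows = conn.execute(
        "SELECT customer, SUM(amount) FROM orders GROUP BY customer"
    )
    return dict(rows)

blob_totals = totals_from_blobs(conn)
column_totals = totals_from_columns(conn)
```

Both functions produce the same totals, but the first one only works for a program that knows how the application encoded its objects, while the second is a plain query that a reporting tool or another application could issue without sharing any code.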
Matt Yonkovit: Well, I mean, I think you’re seeing this whole new generation of purpose-built databases. You’ve got graph databases, you’ve got time series databases, you’ve got NewSQL, which is kind of old SQL but distributed. You’ve got several different types of databases that are starting to crop up in these companies. And really, what I see over and over again is developers end up choosing the one that is easiest for the job. So maybe it’s the short-term thinking: they put it out there, and then unfortunately somebody has to deal with 15 different database engines that all have different formats and all these different things. So now we have the whole new data lakes, and all of these different technologies to try and wrangle all these different database engines, which is an interesting space, because we’ve gone from, hey, I put everything in a centralised system and try to keep everything uniform, to use whatever you can to get the job done. And now we’re going to create extra layers of complexity on top of extra layers of complexity to try and deal with all of the sprawl.
Bruce Momjian: Yeah, I just gave a talk at Postgres Vision, I guess it was Tuesday, talking about data horizons with Postgres, because in the early days, data was coming into a database either through punch cards or dumb terminals, or teletypes way back. But if you look at the data ingestion methods that people have now, it’s web browsers, it’s mobile apps, it’s GIS data, GPS data, it’s telemetry, Internet of Things, social media, right? And you need analytics to come out of it, too. So the idea is that I’m going to be able to use the same database for GIS as I’m using for JSON, as I’m using for full text search, as I’m using for Internet of Things, right? A lot of relational systems can handle that. If they can’t, then you end up with all these separate data sources. But then you’ve got to govern those data sources, you’ve got to figure out how to integrate them, you’ve got to figure out how to back them up consistently, you’ve got to figure out how to delete the old data you don’t want anymore. So you get that data silo problem, which I’ve heard a number of people complain about. And one of the interesting aspects of the talk I gave was to try and say that Postgres, being object relational, has special abilities: special indexing, special data types, and so forth, for GIS, for JSON, for full text search, for high ingestion. So you’ve now got a database which is relational, but it has special code to handle all of these other things. And a lot of the problems do go away; you don’t have as much of an integration problem, right?
Matt Yonkovit: It’s the Swiss Army Knife approach, right? It’s to try and give you enough to get the jobs you need done without having to break out and go off in completely different directions, then try and bring them all back together at the end. Yeah, it’s where you want to pay the pain, right? You know, it might be a little bit more upfront work to actually design a schema and think about how to use Postgres to solve these problems, as opposed to just saying, Well, I’m gonna go get this off-the-shelf solution that is built for this thing, and then later on have to figure out how to get the data back together.
Bruce Momjian: And I think I had another blog entry about this, where a lot of the innovation in IT usually starts in home-grown or custom discrete applications, right? So, I’m doing AI, okay, I’m gonna have a separate AI tool chain for that, and it’s going to be a separate thing. Or I’m doing full text search, that’s going to be a separate thing. Or I’m going to do GIS, that’s gonna be a separate thing, right? But over time, as the data needs mature, you start to see them moving into the database, because then we kind of know what full text search needs, we kind of know what GIS needs, and having this separate thing flying around to store that is just a headache, because I’ve got to synchronise it, I’ve got to make two calls to get the data and bring it together, and I can’t do data governance very easily across the two. So yeah, I think that’s basically the case: over time, relational systems sort of subsume a lot of the satellite needs, because a lot of times it’s more efficient to have it in there.
Matt Yonkovit: Well, and also, probably people look at it, like, this really looks like data that should be stored in a database. And they eventually say, Oh, yeah, we should do that. That’s a good idea.
Bruce Momjian: But you have to have special indexing; you can’t just use a B-tree for GIS data, for example. You have to have special data types, you have to be able to store really non-relational data, right? You’re bundling an x, y coordinate or a longitude and latitude, you’re bundling a text document in a field, and you’ve got to be able to index it efficiently. And if you can’t, then you end up trying to smoosh it all into some kind of pure relational structure. It’s going to be a complete headache, you’re going to lose so much of your data integrity, and you’re not really going to be able to get at it or index it cleanly. And then you say, Oh, I need a separate thing, right? So you’re kind of going back and forth: do I want a separate thing to do this? Sometimes you need it. For example, even with Postgres, we don’t have 100 foreign data wrappers by accident. There are legitimate cases where you might need to use a Mongo or a Cassandra with multiple servers, or you might need to use Hadoop, or you just might have legacy data out there, or you might use Internet of Things and your data ingestion is so high that you just need a separate data store [inaudible], and that’s okay. But again, you can communicate with that, typically through Postgres at least, and kind of bring that data together in a unified way.
Matt Yonkovit: Yeah, and I mean, I think as the developers start to look at and choose these different technologies, they’re looking for the easier solutions. We talked a little bit about the short-term thinking, but that’s made a lot of the vendors and a lot of the projects that are working on these databases look at ways to make things easier, and to automate, and to drive towards making sure that the data is not only secure by default, but also available and performing. And I know that you posted a blog recently that asked whether this drive for zero downtime is actually making the downtime issues more complicated. And this is where it’s kind of interesting, as I see this push for easy, this push for these individual systems to handle more and more, because developers just want to choose the thing that’s going to get their project out the door quicker. You know, we’ve pushed more towards availability and automation to keep things moving. But you had talked about what makes things more complicated when things do go awry.
Bruce Momjian: Yeah. So for some reason, I guess I’ve always been interested in risk analysis. I’m not sure why that is. But I remember being on a Usenet forum, I think it was called comp.risks or something like that. This was obviously in the 90s. And they used to post a sort of risk analysis, not only of IT stuff, but risk analysis in general of why disasters happen, why things happen. And then probably a big turning point for me was an article in The Atlantic Monthly, again in the 90s, about the ValuJet crash out of Florida. You might not remember that one. So it was a case where they had the oxygen canisters that are used to give oxygen to the passengers if they lose pressure in the cabin, and they needed to move them from one facility to another. And they put them in the cargo hold of the plane, which is a no-no. And the plane took off, and within a couple of minutes it exploded and everyone died. In the article, the author did some risk analysis of why that happened. What you basically had was a safety device, oxygen canisters, killing everyone on the plane. And you’re thinking, well, is that really a safety device, or is it not a safety device? And the fact is that you can’t really label something unequivocally as a safety device or not, because every safety device technically has the potential to cause some kind of harm. If you look at it, you know, the Chernobyl accident happened during a test of the reactor at low power. You look at the Three Mile Island problem, there was a safety device malfunction. So you start to look and you say, well, okay, in the nuclear disasters, for example, safety systems seemed like a negative, right? They’re actually causing problems. And that’s when I started to extrapolate that out to pretty much all actions. There’s another great story about somebody in a fighter jet. Pilots are obviously heavily trained.
And when they see a problem in the plane, the first thing they’re taught is to do nothing. Right, which seems crazy, right? But basically, the first thing you should do is figure out whether your plane is still in the air. If the plane is still in the air, then you’re probably okay, and you should look at what’s going on and take appropriate action. But what we don’t want you to do is to just react right away. In certain circumstances, you have to eject or whatever. But in a lot of cases, the plane is just fine, and there are some things you can do to mitigate the problem. If you overreact, you can make the problem a lot worse, right? So in the same way with databases, I have a bunch of, I think two or three, blog entries talking about this particular topic. The most recent one talks about multi-master replication, which a lot of people feel is a great uptime capability: one system can go down, the other one keeps going, and so forth. But those systems are very complicated. A lot of times, you have more problems with a multi-master system than with a single-master system with a replica or standby, because multi-master systems have weird scaling and performance characteristics, and one system going down can bring down the other one. So it’s not a panacea. Another classic case for me in the IT world is RAID cards. In the 90s and 2000s they were the big thing. Everyone had these fancy RAID cards, and oh, we have memory on them, and there’s a battery. And a lot of times the disk drive problems were caused by the RAID cards; a lot of times the systems went down because of the RAID cards. But the RAID cards were there to keep the system up. They’re very complicated. The software there was very complicated. They’re dealing with different drives, different manufacturers, different failure modes on the drives.
So I guess, yeah, what I was kind of trying to get at is that simple, a lot of times, is better. Because you can analyse a simple system; you can figure out, if something goes wrong, how to recover it. That is not so easy as the systems get more complicated. So sometimes you make the system complicated to try and reduce downtime, but the complexity itself can cause a lot more unplanned downtime than you actually saved.
Matt Yonkovit: Yeah, there’s definitely a balance there. I mean, especially now, when you look at these companies, their data centres and applications might have hundreds or thousands of databases that they rely on. And they’re not all the same; they’re all potentially unique. They might all be different technologies. And there’s been this drive to try and simplify and automate things. And what’s really interesting is, if you look at a lot of major cloud provider outages or SaaS outages over the last couple of years, a lot of them are caused because someone takes something that is seemingly innocuous, and it has a bug because it wasn’t fully tested, and then they automate that across 10,000 nodes. And now, instead of having a single system that might have gone down or gone through testing, where you know, hey, I need to fix this one while 99% of the infrastructure stays up, the whole infrastructure goes down, because that one change was cascaded through. And it’s an interesting space, because there is a lot of that automation, which is needed to manage at scale in a lot of cases, but it’s also making things much more complicated. And at the same time, and this gets back to the original point I was mentioning about developers caring about databases, I’ve seen the skill level kind of drop among people who understand even some of the basics around performance tuning or optimization or database design, because these new applications, this new functionality, these new technologies kind of hide the complexity from them. Oh, we’re gonna spin this up, click the button, hey, look, it’s off and running, I’m done. And now, when those problems do occur, they might have been fixable if you knew the systems better and understood the technology, but now it takes longer, or you have to have specialised skill sets in order to fix them.
Bruce Momjian: Yeah, there’s no question that things are getting more layered and more complicated. You know, what we call containerization, virtualization, cloud, obviously, are almost guaranteed to add complexity to whatever you’re doing, because you’re not dealing with physical hardware anymore; you’re dealing with this abstraction that’s kind of down there. And a lot of people fix it by throwing additional hardware at it. The best example, and I have another blog entry about this, is that you really don’t write a whole lot in assembler anymore, right? Because even though it might be a little faster, and arguably it isn’t anymore, because compilers understand CPU pipelining better than any human could, it’s not necessary. So languages get more abstract, they get more powerful, but then you can go too far. I think a lot of the Java criticism is that Java was starting to be placed into areas that it really wasn’t made for; it had gotten a little too far into the abstract to do some of the stuff. I think at EDB, when we started, we had a JBoss web server. And every Sunday night it would be rebooted or something, instead of fixing it. This was back in the mid 2000s. There were some crazy things, and eventually they got rid of it, but trying to fix that thing and diagnose what was going on, compared to an Apache situation, it’s like night and day, right? So, yeah, there’s a sense that some people fix it by throwing additional hardware at it. Sometimes you have to say, Hey, I took a wrong turn here, I need to go back and retool what I’m doing. And where’s the right level of abstraction for people? You know, is it something like SQL, something like Python, something like Java, or something like C? Where do you go on that continuum?
And how much do you understand it? I mean, in a lot of senses, the SQL language allows you to abstract out so much, because it’s a declarative language. You just say what you want, and you let the computer, the optimizer, figure out how to give it to you, in a way where the data can be changing and you can have new indexes. And I think a lot of people miss that idea, that you can just ask for the data and not have to worry about how it’s getting to you. I think a lot of application developers kind of miss that. And they end up making their applications harder, because they have to do joins in the application and do analysis in the application, because they maybe don’t understand how to get at the data in a way that pushes the problem to someone else. Pushing the problem to the SQL engine is usually the best bet if you’re dealing with lots of data. I think that’s why the SQL language has maintained that sort of pristine position: you’re dealing with a lot of data, the data is changing all the time, and you’ve got a lot of concurrency. These are very hard problems to solve in an application, but they are solvable in the database. And that’s why I think we’re still talking about SQL, you know, what, 50 years after it was originally specced out.
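The difference between joining in the application and letting the SQL engine do it can be shown with a small Python sketch. It assumes a hypothetical two-table schema (`customers` and `orders`) with made-up sample rows, and it uses SQLite from the Python standard library purely as a convenient stand-in for any SQL database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema and sample rows, used only for illustration.
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (customer_id INTEGER, amount INTEGER);
    INSERT INTO customers VALUES (1, 'acme'), (2, 'globex');
    INSERT INTO orders VALUES (1, 120), (1, 30), (2, 50);
""")

def totals_joined_in_application(conn):
    """Imperative version: fetch both tables, then join and aggregate by hand."""
    names = dict(conn.execute("SELECT id, name FROM customers"))
    totals = {}
    for customer_id, amount in conn.execute(
        "SELECT customer_id, amount FROM orders"
    ):
        name = names[customer_id]
        totals[name] = totals.get(name, 0) + amount
    return totals

def totals_joined_in_sql(conn):
    """Declarative version: state the result wanted and let the engine plan it."""
    rows = conn.execute("""
        SELECT c.name, SUM(o.amount)
        FROM customers AS c
        JOIN orders AS o ON o.customer_id = c.id
        GROUP BY c.name
    """)
    return dict(rows)

app_totals = totals_joined_in_application(conn)
sql_totals = totals_joined_in_sql(conn)

# A new index may change how the engine executes the query,
# but the query text and its result stay the same.
conn.execute("CREATE INDEX orders_customer_idx ON orders (customer_id)")
sql_totals_after_index = totals_joined_in_sql(conn)
```

The application-side version has to pull every row across the connection and hard-codes one join strategy, while the SQL version leaves the strategy to the optimizer, which is free to pick a different plan once the index exists.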
Matt Yonkovit: Yeah. And I mean, sometimes I wonder, if you were going to start SQL again, would you make the same decisions? Like, what decisions would change?
Bruce Momjian: So, when I originally started with relational databases, I used QUEL, which is the original Ingres language from Berkeley in the 70s. So I didn’t learn SQL until I went to Informix. And that was in the mid 90s, maybe 1993-94. So I actually used the pre-SQL language. And in a lot of ways, SQL was a better language. The QUEL query language had some really weird scoping, particularly for aggregates; it was just bizarre to understand. So yeah, I mean, SQL is not perfect. But it’s evolved over time, and they’ve made some changes. There’s always going to be an amorphous argument about where we’re going there. Um, yeah, if we were to do it again, I don’t know. I don’t know that they would do anything super significantly different. Yeah, I’m not sure what I would change.
Matt Yonkovit: It’s interesting, because most of these vendors do try to reinvent how to access data. But eventually they come back and bolt on an SQL-like mechanism at the end, right? So you start off with, oh, this is all going to be accessed through this fancy API that’s going to match what developers want. And then all of a sudden, two years after that, they’re like, oh, and here’s this new SQL interface for us. So they keep on coming back, don’t they?
Bruce Momjian: I’ve got a funny story for that. So I’m a big original Star Trek fan. I used to go to conferences in the 70s. And, yeah, there was one Star Trek episode where there’s this alternate universe, where there’s a good Enterprise and a bad Enterprise in the mirror universe. And the good people go to the bad Enterprise, and they kind of function, and they figure out how to get back to the good Enterprise. And when they get back to the Enterprise, they’re like, what happened to the bad guys, the bad Enterprise guys? And he said they went back at the same time or whatever. And they were kind of confused. They said, we were over there, and we’re wondering, what were they doing here? And I think it was Spock who said, You, as good people, could act bad, but the bad people have no idea how to act good, right? And in a lot of ways, SQL and relational systems can be modified pretty easily to give a very simple API, to give a very clean interface to people who need a very streamlined way of getting at their data. On the other hand, it’s very hard for a simple system to add an SQL relational capability to it. So I would argue that for Postgres, bringing in JSON, GIS, these other new data sources happens pretty easily. We don’t really have to strain very much to do it, particularly because we’re an object relational system that was designed to let you add indexing, add data types, add operators and languages. So it’s actually really easy for us to bring in the simple stuff. It’s a lot harder for a simple thing to try and bring in transactions and global consistency and snapshots and a language like SQL. That’s really hard. It happens; sometimes they’re successful, sometimes they’re not. But it’s a lot harder to go the other way.
Matt Yonkovit: Yeah, it is. Now, I would be remiss not to ask, because right now, PG 14 is going to be coming out soon. And I’m curious as someone who is deep in the codebase, and deep into the community, are there certain things that you’re really excited about?
Bruce Momjian: Yeah, I’m the guy who gets to write the major release notes. So it’s usually a really, I don’t want to say painful, but painful process: you read what must be thousands of commit messages and then digest them down to something that everyone’s going to be happy with, happy with the way I phrased everything, the order I have everything in, and that it’s attributed to the right people. And so you spend 10 or 14 days, every day, just reading and writing this thing. And then you finally put it out and everyone’s like, oh, this is wrong, that’s wrong. It’s just the most depressing, or sort of humbling, experience to spend all this time working on this thing and then get that. But once in a while somebody says thanks for doing it. They’re like, we know it’s really hard, and we kind of beat you up over it, but we really do appreciate what you’re doing. So I think it’s a good release. Surprisingly, to me, and I did blog about this as well, we’ve been averaging in the past five years about 170 to 180 features in every release. This one is going to top out in the 220 to 240 range. Wow. It’s 226, I think; the number is changing as things are modified slightly. But that’s impressive. So it basically means that, either because of COVID or because we have more companies now funding what we’re doing, we’ve actually propelled ourselves a little faster in this release, which I think is kind of cool. So I’m kind of happy about that, you know.
Matt Yonkovit: So if you were to pick one, one feature that you think is cool, or that you’re like, that’s my favourite?
Bruce Momjian: Right, so the one that I think is my favourite is something where I think people are just going to be like, ah. [crosstalk] Exactly, exactly. So probably the most interesting feature is the ability to clean up the indexes. Postgres always had a problem with cleaning up after ourselves, right, as you’re deleting stuff. And you know, you have to kind of fix it. And the nice thing about this release is it has the capability of cleaning up some of the old index stuff.
Matt Yonkovit: So is this optimizing the vacuum process?
Bruce Momjian: Well, that’s the thing, it isn’t vacuum. What happens is you’ve got, say, a B-tree, and you’ve got a page that’s almost full, right? Normally, you split the page, and once you split the page, you have two half-full pages. And very rarely can you bring those back together; once you split a page, it’s probably going to stay split. So you want to reduce the number of times that you’re splitting. What this effectively does is say: if you are about to split a page, check to see if you can remove some old rows in that index page before you split it.
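Editor's note: the idea Bruce describes, trying to clear out old index entries before splitting a full page, can be sketched as a toy model in Python. This is only an illustration under simplifying assumptions, not PostgreSQL's actual implementation: the names `Entry`, `LeafPage`, and `insert` are hypothetical, a page is just a sorted list with a fixed capacity of at least 2, and an entry is considered "old" when its `dead` flag is set. The real feature has to work out which entries are safe to remove, which is considerably more involved.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    key: int
    dead: bool = False  # hypothetical flag: the row version it points to is obsolete


@dataclass
class LeafPage:
    capacity: int  # assumed to be at least 2
    entries: list = field(default_factory=list)

    def is_full(self) -> bool:
        return len(self.entries) >= self.capacity


def insert(pages: list, page_index: int, key: int) -> str:
    """Insert key into pages[page_index].

    Returns "fit" if there was room, "cleaned" if removing dead entries
    made room, or "split" if the page had to be split anyway.
    """
    page = pages[page_index]
    outcome = "fit"

    if page.is_full():
        # About to split: first try to remove old (dead) entries.
        page.entries = [e for e in page.entries if not e.dead]
        outcome = "cleaned"

    if page.is_full():
        # Cleanup freed nothing, so split into two half-full pages.
        mid = len(page.entries) // 2
        right = LeafPage(page.capacity, page.entries[mid:])
        page.entries = page.entries[:mid]
        pages.insert(page_index + 1, right)
        if key >= right.entries[0].key:
            page = right
        outcome = "split"

    page.entries.append(Entry(key))
    page.entries.sort(key=lambda e: e.key)
    return outcome


# A full page with two dead entries: cleanup avoids the split.
pages = [LeafPage(4, [Entry(1), Entry(2, dead=True), Entry(3), Entry(4, dead=True)])]
result = insert(pages, 0, 5)
print(result, len(pages))
```

In this sketch the split is only a fallback: a full page whose entries are all still live is split exactly as it would have been before, which matches the point that the change needs no new keyword or setting from the user.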
Matt Yonkovit: Oh, okay. So it’s that we’re using space more effectively.
Bruce Momjian: Yeah, but the bizarre thing is it’s kind of invisible, right? Nobody’s going to see this happening. It’s just happening. So you’re not really going to see people saying, hey, this thing is different, but it is, right?
Matt Yonkovit: Right, right, but I mean, those are, those are some of the most important optimizations because you just don’t understand them behind the scenes, but they make everyone’s life easier, and everyone will benefit from them.
Bruce Momjian: Yeah. And the thing is that you don’t even have to do anything. It just happens. It’s not like a new keyword or a new whatever; this optimization is just going to be there. So almost all users are going to benefit from it, but they’re never really going to know what’s happening. They’re just going to know that the system is better than it was before. And you know, there are improvements in the indexing code, and improvements in how the optimizer handles long lists of constants. There are improvements in how we do partitioning. There’s a good swath of stuff. There’s a whole lot of new monitoring stuff that I was pretty impressed by, a lot of new system views. So yeah, it’s a pretty cool release. It’s got a lot of heft to it. You know, sometimes in a couple of releases there really isn’t a lot. This one is a big one. I mean, it’s going to have a lot in it, I think.
Matt Yonkovit: Great. Well, we look forward to seeing that. And, Bruce, I wanted to thank you for stopping by and chatting with me. I do appreciate it. I enjoyed our time today. And if you want to check out Bruce’s website, you can check out his personal website. You can also see him all over YouTube on tonnes of videos, including a couple Percona Live talks he just gave. You know, Bruce has been out there for a while. So he’s got a lot of great content. Bruce, final word.
Bruce Momjian: Oh, thank you. I always enjoy the Percona events. And it’s wonderful to catch up with you again. All right, thanks a bunch.
Matt Yonkovit: Wow, what a great episode that was. We really appreciate you coming and checking it out. We hope that you love open source as much as we do. If you like this video, go ahead and subscribe to us on the YouTube channel. Follow us on Facebook, Twitter, Instagram and LinkedIn. And of course tune in to next week’s episode. We really appreciate you coming and talking open source with us.