PostgreSQL Designing a Schema - Percona Community PostgreSQL Live Stream & Chat - Sept 8th
Percona Community Live Stream for PostgreSQL, focused on designing a schema. This is part of our bi-weekly meetup to develop open-source database skills.
Dave Stokes: Good morning, good evening, good everything to everybody. I'm Dave Stokes, a Technology Evangelist at Percona, and with me is Charly Batista, one of our gurus on Postgres. Today we're talking about designing a schema. For those of you who don't know, a schema is basically the template of how the database works. With that, I'll hand it over to Charly. If you have any questions, please put them in the chat on YouTube or on Twitch; I'm watching both, and I'll pass along what I can to Charly. I didn't get enough sleep last night, so bear with me. Hopefully you're all having a great time. Please take it away, Charly.
Charly Batista: Hey, it's great to be here with you guys again. Last time we talked I was at PGConf Brazil. That was a great experience and a great conference, but we didn't get into much technical material, so we'll try to do more today. Let me adjust my camera a little bit; I think that's good enough. Today we're talking about data modeling and the impact it can have on database performance. When we talk to customers who are building a new database, or who are having performance issues, we always reach for database optimizations: tuning the configuration, adding more memory and CPU, scaling horizontally, doing fancy stuff. We almost never think about, or touch, the data model. One thing we'll see today is that the data model can make a huge difference. It can have a huge performance impact, and it can actually save a lot of money, because we can improve performance without adding more hardware and without making things fancy. Data modeling has a lot to do with all of this. Before we really get to data modeling, I want to talk about how memory is organized in a computer, and how Postgres organizes its files internally, because understanding that helps a lot when we talk about performance. Can you see my screen, Dave? Are you able to see it? Okay. What we see here is how the memory architecture works inside a computer. A computer basically does two things: computation (adding numbers, subtracting, multiplying; that's the CPU) and I/O.
All the fancy things a computer does are basically those two kinds of operations: I/O operations and computation. When we talk about I/O, there are many types of devices, as you can see in the diagram. At the bottom we have the old magnetic tapes; probably most people have seen one, and a lot of companies still use them, especially for backups. We have the old-school magnetic disks, the spinning hard drives. Nowadays most companies are replacing them with solid-state disks, SSDs. Then we have main memory. This is what we really call "the memory": when you run free and see an amount of memory, this is the one we're talking about. Then we have cache memory, a small piece of memory that lives inside the CPU, and finally the registers, tiny pieces of memory that are even closer to the CPU core. One thing to keep in mind: the higher up this triangle, the faster the memory, but also the more expensive it is. As we go down toward the base of the triangle, memory gets really cheap but also quite slow. A megabyte of hard drive, compared with a megabyte of cache memory, is very inexpensive, but it's also very slow. That's one reason most of our data is stored down here; not so much on magnetic disks anymore, mostly on SSDs now, but still on devices far cheaper than cache memory. Another thing: the data stored on those disks is persistent. We can restart the computer and the data will still be there. That's not true for main memory, cache memory, or the registers.
If they lose power, we lose that data; whatever is in memory is gone. Let's keep that in mind. Another thing we need to understand is that the CPU cannot directly access the magnetic disk or the SSD. When I say it's not able to, I mean there is no physical connection, no direct path from the CPU to those devices. We need a helper, the I/O controller, which is usually reached through the memory controller. That's one of the reasons we say the CPU can only work with data in main memory: because the CPU has no direct access to the hard drive, the data needs to be copied from the drive into main memory first. Every time we open a file, read a file, save a file, whatever operation we do with a file on disk, the data has to be copied from the disk into main memory, and then the CPU can work with it there. So all the data the CPU works with effectively needs to be in main memory. And it's not only that memory is faster. The CPU uses main memory as the store for the data it actually works on, but the data is pulled into the cache and then into the registers, and the difference in access time between registers, cache, and main memory is huge. Everything is a lot faster if we can keep the data close to the CPU, especially in cache. That's why cache is so important: it makes everything a lot faster. And because cache is so important, we've been designing data structures to keep more and more data close to the CPU and to use it more efficiently. Those strategies we call locality.
The closer we can keep the data to the CPU, the better, because when the CPU needs that data, it's already in cache memory or maybe in a register. That's why, in computer science, we work so hard on building data structures and algorithms that keep data close to the CPU. All of those things influence how we build our files and how we access them. Another point: the way we access files has changed a lot, especially now that most companies use SSDs, so the access method is not as problematic as it used to be. We can access the information inside a file in a sequential fashion, just like in this diagram: one byte or one block after another. If we lay out a big chunk of data in sequence, the CPU can just read through it sequentially. Or we can access it in a random fashion. Why would we ever access it randomly? Remember the spinning disks: they literally had platters and a head. The platter was spinning, and the head would move from one point to another. It was a lot faster to keep writing data wherever the head happened to be than to wait for the disk to spin around to a specific place. But then getting that data back was super slow, because the pieces were scattered around the platter; that's why it was called random access. If we could avoid random access and have purely sequential access, reads were a lot faster. But then we run into difficulties. For example, say we insert a thousand rows into a table, so we have that batch of rows on disk.
Then we need to store more rows in another table, and then in a lot of tables, and we end up interleaving space on disk with data from different tables, because we cannot tell in advance how much space a table will really need. As the database grows and shrinks (tables grow, and we insert, update, and delete), it's really hard to predict how that space will be used. That's why random placement is so convenient: we put one portion of a table's data in, say, five blocks; the next five or ten blocks hold data from another table; and so on. When we do a SELECT, the system has to collect all those dispersed blocks, put them together, and show them to us as if they were a single contiguous file. In reality, most files are not saved contiguously; their pieces are distributed across the disk. For spinning disks that was a huge problem. The problem was not even the spinning itself but the head: it was mechanical, it had to physically move, and doing that precisely was slow. That's why a randomly accessed file had horrible performance. With that in mind, we started to design files and data structures accordingly. When it comes to databases, there are mainly two ways of storing data inside a file. One is the heap file, which is what Postgres uses. A heap file is just a sequential file without much internal organization: you simply put the data in. Of course, there are headers that tell us what is inside each block, because the file is divided into blocks.
This point is important, and we'll come back to it. Remember that everything in the computer comes down to I/O and computation. When it comes to I/O, the CPU doesn't fetch one byte of a file at a time. Say we have a sequential file on disk and we want to read one byte, the first byte of the file. What happens is that the file system fetches the whole block and hands it back. Most file systems use a block size of four kilobytes, so even if we want to read a single byte, four kilobytes get copied from disk into memory, and then we are shown just the one byte we asked for. Now, if the small pieces of data we need are sparsely scattered across the file, that's horrible for performance, because we'll need many I/O operations just to collect, say, fifty bytes. If, instead, we can keep related information close together, we may only need to read one block: a single four-kilobyte I/O operation at the operating system level. Because of this, the heap file is very efficient for storing data and for reading it sequentially. Since there is no ordering, we just append: we fill the first block, then another block, and another one. Really simple and easy. The other approach is to organize the file as a more elaborate data structure, like a B-tree.
For example, MySQL organizes its data files as a B-tree. The advantage is that a search using the primary key, the key that organizes the B-tree, is going to be really fast, because it traverses the tree using that key. But a full table scan is a horrible operation, because of how the tree is laid out: one block here, another block far away, another even further. We've changed the access pattern from sequential to random purely because of how the file is organized internally. That's also why Postgres is so efficient at full table scans: a table scan is mostly sequential reading through the table files. Understanding this explains why Postgres, even when an index exists, often prefers a table scan: it's cheaper. If it goes to the index, it trades the nice, mostly sequential access pattern for a random one, because the index is physically organized in a very scattered way on disk, and from the index it still has to go back to the table to gather the row data. In Postgres it's considerably more expensive to fetch data through an index than straight from the table. So why can an index still save a lot of time? Because the index file is much smaller. If we can do an index-only scan, we save a great deal, because we traverse, using the key, a file that is far smaller than the table.
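One way to watch this trade-off yourself is to ask the planner how many pages it touches. This is a hedged sketch against an assumed table named `orders` with an integer primary key `id`; the names are placeholders, not from the demo:

```sql
-- "Buffers: shared hit/read" counts are 8 kB Postgres pages touched.

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders;                -- unselective: likely a sequential scan

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE id = 42;  -- selective: likely an index scan on the PK
```

With a selective predicate the planner accepts the random I/O of the index; without one, the mostly sequential heap scan is usually cheaper even though the index exists.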
Sometimes we have a one-terabyte data file and a 100-gigabyte index file, so scanning the index is a lot faster. That's also why the selectivity of an index matters so much in making the database choose the index over a full table scan. As I said, Postgres files are split into pages; by default Postgres uses blocks of eight kilobytes. I'll use "blocks" and "pages" with the same meaning here: in OS terms it's more common to say blocks, and in databases it's more common to say pages, but it's effectively the same idea, a block of bytes inside a file. In Postgres the page size is eight kilobytes by default. That means a row has to fit within a page; in fact, Postgres doesn't split a row's data across pages in the main table file at all. If we have a huge row, what Postgres does instead is use what's called TOAST, which is external storage. It creates another file, keeps a link inside the original page, and saves the oversized data in that external file. The TOAST file is also divided into pages, but there a value is allowed to span multiple pages. If I'm not mistaken, a row larger than roughly a quarter of the page size gets moved into TOAST. That's how Postgres handles large values. Also, table files cannot be larger than one gigabyte. That doesn't mean a table cannot be larger than one gigabyte: if a table grows beyond that, it will simply have multiple files, or segments, on the file system. If I have a 100-gigabyte table, I'll probably have a hundred or more files on the file system for that single table.
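These storage details can be inspected from SQL. A small sketch, assuming a table named `queue_item` (the name is a guess based on the demo that follows):

```sql
SHOW block_size;                            -- page size in bytes, 8192 by default

-- Path of the table's base file relative to the data directory.
-- Once the table passes 1 GB, extra segments appear as <file>.1, <file>.2, ...
SELECT pg_relation_filepath('queue_item');

-- The associated TOAST table, where oversized row values are stored.
SELECT reltoastrelid::regclass
FROM pg_class
WHERE relname = 'queue_item';
```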
The segmenting also helps when we access data. Say we use an index and need to fetch 20 rows: we don't have to open one enormous file and seek all over it; we open the single one-gigabyte segment that contains what we need. So it helps make access faster. Okay, but how does this page thing work inside the database? The page that holds the data has a header. The header carries all the information about that page: how many rows it has, how much free space, and so on. But what really matters for us here is how the tuples are organized inside the page. Remember I said that data is not read in single bytes but in blocks. There's a related fact: the CPU tries to work with data aligned to what's called the word size, the pointer size of the CPU. On a 64-bit CPU the word size is 64 bits; on a 32-bit CPU it's 32 bits. So eight bytes or four bytes, depending on the architecture. What does that mean? If we can keep our data in chunks that are multiples of the word size, the CPU will work a lot faster, because it doesn't need to split the data. In a very simplistic model: if a piece of data is 12 bytes, the CPU needs two cycles to read it, because each cycle handles one word. If it were 16 bytes, double the word size, it would also take two cycles. So at 12 bytes we're wasting capacity the CPU could be using, capacity we'd recover if we could make our data eight bytes wide instead.
We could, in this example, process that information in one CPU cycle, or process a full 16 bytes in the same two cycles. With that in mind, most applications try to keep their data structures aligned with the word size, and Postgres does the same: it tries to store data internally aligned with the word size. But how does that work in practice? Let me stop sharing this slide and share my entire screen; I assume you can see my cursor. So, I created this data model here, a very simple one with three tables, modeling a queue. There's a table for the queue itself, which has the ID of the queue, a description, and when it was created. There's a table for the item types, and a table for the items of the queue; each queue can have zero, one, or many items, and each item has a type. The queue item has its own ID; the ID of its type; the ID of its queue; a boolean saying whether it's active; the ID of its parent, because one item can be split into multiple sub-items to be processed later, so we need to know an item's parent; a value, which is a varchar, and for us the size of that varchar matters; and the date when it was created. Very simple. First I create the tables: the queue, the item type, and then the item itself. Then let's insert a few rows: 10,000 queues, 20 types (more than enough), and a thousand items without a parent.
Just because I'm lazy, this is the easiest way to populate them. Then I insert around three million items with parents, where each parent is just a random existing item. This will probably take a few seconds, maybe a minute. While it's inserting, let me call your attention to how we modeled this table. An integer in Postgres is four bytes, so the ID stores four bytes of information. Then I used a smallint for the type, which is two bytes; then another four-byte integer; a boolean, which is stored in one byte; and another four-byte integer. What does this have to do with what we've been talking about? The database will try to align this data, saving it in multiples of four and eight bytes; on my machine the word size is eight bytes. In the first eight-byte word I can store the first two columns, the four-byte ID and the two-byte type, and then I waste two bytes, because the next four-byte integer doesn't fit in the remaining two. Further along, the boolean plus the integer before it leave five bytes used; the next four-byte column would make nine, so it has to start a new word, wasting three more bytes. So, purely because of the order I gave the columns, I'm wasting space in my table. We'll see the file size once this finishes. While we wait, here's what I'm going to do: reorganize this table in a different fashion. I'll put everything that is four bytes together, then the two-byte column and the boolean, and leave the varchar for the end, because its size can change and we can't really predict the final size.
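As a sketch, the item table described above might look like this. The column names are my guesses from the description, and the padding comments assume 8-byte words on a 64-bit machine:

```sql
CREATE TABLE queue_item (
    id         integer,       -- 4 bytes
    type_id    smallint,      -- 2 bytes (6 of 8 used in the first word)
    queue_id   integer,       -- needs 4-byte alignment: 2 padding bytes wasted
    active     boolean,       -- 1 byte
    parent_id  integer,       -- needs 4-byte alignment: 3 padding bytes wasted
    value      varchar(100),  -- variable size
    created    timestamp      -- 8 bytes
);
```

Five padding bytes per row, multiplied by millions of rows, is the waste the demo goes on to measure.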
Everything that is not fixed-size goes at the end of the table. But I'm not going to retype the CREATE TABLE by hand. I'll do a CREATE TABLE AS SELECT with the columns in the new order, so the new table has exactly the same data; to compare, we need to compare apples to apples. Okay, the inserts have finished and the new table is created. Now let me open a terminal and go to the data directory of my database. Let's check the size of the original table: it's 290 megabytes. Not a huge table, so a SELECT on it will run pretty fast. Now let's see what happens with this tiny change. Notice that we didn't change the data types and we didn't change the data; we only changed how the columns are ordered. The new table is queue_item two; let's get its size. Just by reordering a few columns, we're saving 23 megabytes, which is roughly 8 to 10% of the table size. And in this example my table only has seven columns. The data is exactly the same; the only thing I changed was the column order, so the rows make better use of the alignment, and that saved around 10% of disk space. Well, disk is cheap, right? So we save 10%.
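The reordering step can be sketched like this; table and column names follow the hypothetical `queue_item` layout above, so treat them as assumptions:

```sql
-- Same data, new physical layout: 8-byte column first, then the 4-byte
-- integers, then the narrow columns, with the variable-size column last.
CREATE TABLE queue_item2 AS
SELECT created, id, queue_id, parent_id, type_id, active, value
FROM queue_item;

-- Compare the on-disk size of the two layouts.
SELECT pg_size_pretty(pg_relation_size('queue_item'))  AS original,
       pg_size_pretty(pg_relation_size('queue_item2')) AS reordered;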
Let's say we save 10% of the cost of the disk space. But remember: everything on disk, when you run a SELECT, goes to main memory; from main memory it goes to cache; and from cache it's processed by the CPU. So the cost and performance gains carry all the way up and can be really significant. And we're talking about a small table with seven columns. A lot of systems out there have thousands of tables, and tables with hundreds of columns. What would the impact be in a huge system if we changed the column order in those tables? This is why I say that understanding how data works inside the database is essential to building a good data model. And this example covers only the physical model. What about the logical model? What about designing the schema itself so the system is more efficient? Let me give you another example. From time to time I post challenges on my LinkedIn, and this is one I posted a few months ago. I gave two tables: one named meta and another named meta_object. The meta table basically has its ID, its parent, and its name, and the challenge was to write a SELECT that produces this report. Every meta object can have children, those children can have children of their own, and so on; if we pay attention, there's recursion here. It's similar to what we have with our queue: the queue has items, an item can have a child, the child can have another child, and so on. It's exactly the same idea under a different name. So suppose we're building an application that has to show this report.
I want to show, for each queue, what the departments are, the descriptions of those items, their values, and how many times each item has been executed. Keep in mind that this column is a COUNT, counting how many times they've been executed. From the application perspective, here's what we'd have to do: a SELECT to get the parents; for each parent, a SELECT to get its children; for each child, another SELECT to get the grandchildren; and for each grandchild, yet more SELECTs. With the 20 items we have here, that's at least 20 round trips to the database, opening, or at best reusing, 20 connections. That's going to be super slow, because we go to the database 20 times and run one SELECT each time. Solving this from the application side is very inefficient, so it's better to solve it on the database side. But how can we do recursion in the database? We could change the data model to make something simpler, but we can also solve it with this one statement: a CTE, a recursive CTE. Using the CTE, instead of going to the database 20 times with 20 SELECTs, we do one single SELECT. Query optimization itself is beyond the scope of today's talk, so we'll dig into the details in another session.
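The recursive shape of such a query can be sketched as follows, against an assumed self-referencing table `meta(id, parent_id, name)`; this is an illustration of the technique, not the exact challenge answer:

```sql
-- One round trip replaces N application-side SELECTs, whatever the depth.
WITH RECURSIVE tree AS (
    SELECT id, parent_id, name, 1 AS depth
    FROM meta
    WHERE parent_id IS NULL            -- anchor: the root objects
  UNION ALL
    SELECT m.id, m.parent_id, m.name, t.depth + 1
    FROM meta m
    JOIN tree t ON m.parent_id = t.id  -- recurse: children of rows found so far
)
SELECT * FROM tree
ORDER BY depth, id;
```

The anchor part seeds the working set with the roots, and the recursive part repeatedly joins back until no new children are found.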
As I mentioned, when I created this exercise my answer ran in less than a second, and I got some replies that also ran in less than a second. This one here, actually, is the one I created, and it runs in about 760 milliseconds; one of the replies ran in about 180 milliseconds. Just for context, I wrote an application in PHP that builds this same report from the application side, and it took around 2.7 seconds. So moving the work from the application side to the database side took us from 2.7 seconds down to well under a second. With the 180-millisecond version, 2.7 divided by 0.18 is 15, so it's 15 times faster than doing it from the application side. How much money would you have to spend on memory and CPU to make your database work 15 times faster? Probably a lot. And this is what we mean when we say a good data model pays off: we can save 10, 15, 20 percent of disk space, and everything on disk eventually flows through memory; and when we use the right access methods to gather information from the database, we improve performance and save a lot of money. We can watch the database grow and stay healthy without losing performance, handling the same kind of workload with the same performance. That's what we prepared for you today, guys. I hope you enjoyed it. So, Dave, do we have questions?
Dave Stokes: We had one that I think you answered: how many blocks can we read in one I/O operation? Was it one 8K block?
Charly Batista: I cannot hear you. I think my internet has some kind of issue here.
Dave Stokes: Can you hear me now? It's showing up here. Let me make myself louder. Does that do it?
Charly Batista: Yeah, I really cannot hear you. I hope you can hear me.
Dave Stokes: I can hear you.
Charly Batista: I can see you moving your mouth, but I think it's my internet here. Okay, so: how many blocks can be read in one I/O operation? That's a really good question. It's architecture-dependent, and the block size is also architecture-dependent; I've seen a lot of vendors increasing the block size on their SSDs. Historically, we could read one block per I/O operation, with a fixed four-kilobyte block size. But that's been changing, and I can't really tell you whether it's still one block per I/O operation; I'm trying to remember, but I really don't. That said, in all the performance calculations I do, I always assume one block per I/O operation, and I try to fit as much as I can inside a block to make things faster.
I also see another one: is it possible to increase query speed using parallelism in Postgres? Yeah, it is. I think it started in version 10 or 11, I don't remember exactly, but we can have multiple processes working on the same table. There are parameters we can enable and disable; I believe parallelism comes enabled by default, though I'm not sure, and we can set the maximum number of parallel workers. There are a few configuration settings that make the planner more or less inclined to parallelize a query. So yes, it's possible to do a full table scan in a parallel fashion: if you have a huge table and, for example, ten workers, that's going to be a lot faster than doing the full table scan with a single process. And Postgres already does this, especially in the newer versions; if you're on version 13 or 14, it's definitely there. Off the top of my head I don't remember which version introduced it, but I think it was 10 or 11.
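The knobs mentioned above can be sketched like this; the table name `queue_item` is assumed from the earlier demo, and the value shown is just an example:

```sql
-- Workers the planner may attach to one Gather node (default is 2).
SHOW max_parallel_workers_per_gather;

SET max_parallel_workers_per_gather = 4;

-- On a sufficiently large table, the plan should show a "Gather" node
-- with a parallel sequential scan underneath it.
EXPLAIN ANALYZE
SELECT count(*) FROM queue_item;
```

Related settings such as `max_parallel_workers` and `min_parallel_table_scan_size` also influence whether the planner chooses a parallel plan.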
Dave Stokes: It's ancient history. Well, I think that's all the questions we have. I don't know if Charly can hear me.
Charly Batista: Any other questions, guys? No more questions. Thanks a lot. Okay, well, it was great to be here with you again. That was a joy.
Well, folks, have a good one, and we’ll see you next time. Have a great day!
Charly Batista, PostgreSQL Tech Lead at Percona
Charly Batista is currently PostgreSQL Tech Lead at Percona. He has over twelve years of experience in various areas of IT, including database administration, data analysis, systems analysis, and software development. He combines strong analytical skills with experience in object-oriented programming techniques, and served as technical leader of a developer team for more than four years. Born in Brazil, he now lives in Shanghai, China.
Dave Stokes, Technology Evangelist at Percona
Dave joined Percona in February 2022. He was previously a MySQL Community Manager and is the author of MySQL & JSON - A Practical Programming Guide.