Percona Community Live Stream for PostgreSQL focused on setting up a PostgreSQL connection pool. This is a part of our bi-weekly meetup to develop open-source database skills. Come up with your questions and get an answer straight forward to all your questions.
Mario Garcia: Hello, everyone. Welcome to our stream. I am Mario Garcia. I’m a Technical Evangelist at Percona. And today we have Charly Batista, who will talk about setting up Postgres connection pool. And we also have Edith, and Dave, who are also Technical Evangelists at Percona. Thank you for joining us, Charly.
Charly Batista: Hey, guys. Hey guys out there. I would like to apologize. We’re having some technical issues here. And also Dave here, he is having some voice issues, you know. Yeah, today we will we are going to talk about connection pooling with Postgres. Right. So it’s a good discussion, we will try to talk about understanding the reasoning why we usually need connection pool work, what are this solutions we have in the market. And how we can use them? Right. So I think that the first thing we need to understand is, are the reasons why we need the connection puller, right. So when we talk, we think about application and database, the developers and the applications they will want the more connections they could have on the database. Right? So let’s say we have a website. We are selling T-shirts on the website. Well, I would like to have a million connections per second. Right? So if we, if we could have like a million customers selling all on our website, that would be pretty good. But we need to think about those numbers. Is the application server able to provide a million connections at the same time? What do you guys think? What do you think it is? Is it easy to get a connect a application out there that can have like a million users at the same time to think that be easy? But you guys stay?
Edith Pucla: It you know, depends on the infrastructure also where our database is, right? How we are going to scale.
Charly Batista: Let’s think just about the application, let’s forget the database a little bit. And let’s think about just application, we have an application. And let’s say the database is able to scale. So from the application perspective, do you think it’s easy to get like a million users at the same time on application?
Mario Garcia: Yeah, recently, I was trying Python multiprocessing. And I can be a good example to what you’re mentioning. Because the idea with multiprocessing is that you can use all the resources available in your laptop or any other equipment you’re using for hosting your application. So I think that could be an example of that. And it really would help us to support millions of users doing operations at the same time on our pager application.
Charly Batista: That’s a good, that’s a really good example. And what do you think we want to have a million users all at the same time
Edith Pucla: Why not? Right?
Charly Batista: Because this is not what happens, right? Usually, we have, let’s say, when you go for the statistics, you have millions views per hour, million views per minute million views per second, it doesn’t mean all that million users are dead at the same time, right? So because they access what they can, they can come concurrently, they might be there. But they might not usually, from the application point of view. We don’t scale the application for millions of users simultaneously, right? We want to attend a million users a million connections per second, for example. In a second, we have a lot of, we have 1000 microseconds, right? So and this is one thing that we need to think about, usually, we don’t have, we don’t use all the resources, at the same time exactly. Say moment is at the same time, for one reason is the laws of physics, right? There was this smart guy that he said, two bodies cannot occupy the same space. So we couldn’t break this law yet. You know, it doesn’t matter how many lawyers we have, and still cannot break that mold. Right. And, this is also valid for computation. Right? When computers they started, when they started with faster CPUs, on this way, we have the illusion that we were doing a lot of processes at the same time, right? But it’s just the perception. The thing is, we humans are not able to see things that are too fast, we cannot perceive things that happen too fast. So once if you can only process one thing at a time, right? There’s a very few do 80 by 20. See in Java, or whatever language we do, once if your core can only do one thing at a time, right? So what happens is the smart people from computing computer science, what they did is okay, if once if your call, they only do one thing at a time, why don’t we slice the period of time, let’s say we get this 100 milliseconds. So we create 100. This is why it’s every slice is one millisecond, for example. So and then we give for the applications one slice, right? So let’s say you have we have this application, my website, we have one order, then it takes five milliseconds to be processed. So it will ask for the CPU for Look, I need time to finish my operation, the CPU will give one slice, you’re gonna get that slice one millisecond, here, we’ll work that and then when you finish the CPU, get that process and put to sleep, literally the processor. And then Okay, I gotta sit down here is the pin until I get like my moment, back to the CPU. And so in this example, my this process we use five is a license, right? So it goes it uses its time, it does something and then it comes back to sleep, goes back to work a bit more comes back to sleep. So after five is license it to finish its operation.
But the thing is, what happens if we have all the processes working at the same time. So my process goes and uses one millisecond, right, then it’s wait for after nine process to do another millisecond. And then more nine process to do a lot of milliseconds. So every step, it waits around 10 seconds between the steps. So in this case, it will take around 50 milliseconds to finish 10 times more, right? But for us, it’s still super fast. It looks like that all the 10 process did everything at the same time. Right? Because, well, they all want to go to finish in let’s say 100 milliseconds, all of them are finishing in 100 milliseconds, maybe a bit more. But we humans cannot perceive things in that time. Actually, is that we’ve all those 10 process, they finish at the same time. Yeah, right. But it’s just perception for the computer for the CPU that didn’t work at the same time with the CPU juicy slice at the time, and gave to every process, some amount of the time to work, those process work. And then to, we see Whoa, yeah, you know, this, if you will let ya have much processing. That’s, that’s my was, it has moved process, right? And this is the magic, that the, the operational system that works together with the CPU, how they make these two, look like, one single core of CPU is doing a lot of things simultaneously. So we have this impression that we were doing all those processes simultaneously. But actually, it’s not. Right. So this is one thing that we need to understand when we go for any mood process application, the database or the website application, whatever application, when we work with with mood process, we need to start understanding those sorts of low levels, how the CPU work, because they want the developer or the sysadmin, come to DBA and tell them, hey, look, we need 1000 connections to the database. And the database server only has, I don’t know, 100 CPUs, let’s put it really beefy database servers, 100 CPUs and a lot, right? We need a lot of money to to pay for 100 CPU, several glands. Right? So it’s really good sir. Look, you have 100 CPUs, why can I cannot I have like 1000 connections. Remember coming back, the box, the server, it cannot give you 1000 simultaneous connections. So the CPU needs to do these lines for this those connections as well. Right? So in this example, 1000 connections may not be so bad because well, if you do a simple math, we have 100 CPU cores, 1000 connections, every core going to work with 10 connections simultaneously, right? It’s not that so bad. But imagine that we have most of our select are close to the database, they’re fast. So they’re working in milliseconds, right? So and we have 1000s of them every second, that’s not a big problem, because they’re milliseconds. And because we have like this ratio, one to 10. Every call works with with 10 connections, that fast, maybe we add a few milliseconds for each connection. But what happens if we get one select that takes like, one second or more, one really slow connection. This connection is really is low, it can slow down the whole order or connections. Because this connection will be working on the server and the server and the CPU, the kernel needs to give more than one slice of time for that only single connection.
And all the other connections that were faster, now become slower. Well, if you only have one soul connection, that’s okay. You still have my nine cost work with DoD connections, right? So but eventually, you’re gonna get a small one, and your whole application start to slow down, because we have too many connections to our database. So because your database is lower, and your application is slower, what happens is you need more connections from the application point of view. Because the application cannot get the data from the database fast enough. You need more connection. And this is a dream come to you to follow. The application is really slow, you know, can we get more connection like more 1000 connections. We give them more 1000 connection now have 2000 connections. We’re going to have these that extra 1000 connections gonna make the database even slower. So the queries that before were taking few milliseconds now they are going to take double the time they will triple the time. And the application will ask for more connections because now the application is even more slower. So and then we get into this bad cycle, that application always need more connection. And the database cannot provide the information fast enough. And this is the time that we realize what happens if we put something in front of the database that will protect the database from happening. So many connections, because duplicate for the application, it doesn’t matter if we have one connection, I’ve have 1000 connections, if it can get the data faster, more, using only one connection, or not, right, so we are getting the data as fast as we can with only one connection. If we need more connection, let’s say if we need 100 connections to get the data sources, that’s fine. So this is a actually a change project, instead of looking for the database, as number of connections we look to the database is how fast I can pull the data from my database. And the number of connections really impact how fast you can pull the data from the database. So usually there is this magic number around the number of connections around twice, three times the number of CPUs that you have usually works well. When you put more connections than three, four times the number of CPUs you have, then your database start working really small, it’s more stuff. And the more connections you put the slowing the database goals, and the slower the applications. And the perception that we have is we need more connection. But actually, we need to connect, we need to remove a lot of connections. Because another thing that we need to understand is, when the CPU puts one process to sleep, let’s go back to my process that needs five is loads of time. So after my process finish the day, it’s firstly slice of time, it was granted to work, what makes it work there. When it’s finished and the CPU puts to sleep. The CPU needs to do one thing that called context switch, because remember, my application was working right? So we have it has all the data in cache memory. Because we are always worried about how is low the disk is right because we always believe the memory is a lot faster than the disk, right? So we want to put more data as we can in memory.
But the thing is, the memory is still is low. If we think about CPU, the memories, lots like really, really slow way down on the CPU. So the CPU uses memory as a big storage area, but it doesn’t really work with the data that is in memory. So the CPU goes to the memory gets all the data it needs to work put in its own cache, we have an L one and Q three caches. So the closer the caches are, the CPU faster they are, and then the CPU works with that data. So but the cache, there is more, what happens is when the CPU needs to get to more data from the memory, so it needs to release that data in the cache, it saves that data back to the memory and then get more data to the memory works. So put that data back in this is what happens when the process is stopped working there. The CPU needs to archive or safe that state. Everything that is in cache, everything that is in on CPU needs to be saved. So it saves everything there. In that allows they’re not the other processors to start working on applications start working. So these recall context switching. So it was working with my my application. So my application ran out of time, which will allow this application to work. So it needs to save everything that my application was working safe back in memory. And then we have a deeds application come to the CPU edits of the CPU needs to get all the data that it needs application eat and start working. So when it gets application runs out of time. And then my use Application come to work this if you need to do the same thing, it gets its application, save all the data, everything, all the reference. In memory models, the application stops working. So when my application finishes, or runs out of time, my application comes back. So applications need to go, the CPU needs to go to the memory. Remember everything it said before it gets the memory put here and start working again. So this switching from the process, we call the recall contacts, which in this region, it takes time, as well. Because he needs to say the stages need to get new data. So there are a lot of things if you need to do it cost CPU cycles. So and this is lost down the operations. If we have just a few processes for each CPU core, that’s okay, we will, we usually don’t, don’t see, we don’t feel that the time waste. If we have 1000s of process that needs to be switched all the time, we may spend more time in process switching the molecule the CPU may spend more time by just changing from one process to another process than working in these is when you feel the application is stalling. The queries is now look the credit supers Whoa, what happened, which is now because there are two main contacts, which you know, the database. And this is one of the first thing that we need to understand why we need a connection pooling in front of the database. So the connection point, let me share my screen here.
Where are you? So you can see my screen? Right? Yeah. So this is a basic idea that like, like, we can use a connection pool, see, we have a database here. This is actually not a base, this is a bit more complex. But let’s say here, those are the applications. And then we have the proxy in front of the database, right? The application here, the application, they all go to the proxy, and then the proxy, send few connections to the database. So we start protecting our database, right. So the first problem that we’re solving with the connection pooling or the proxy, whatever name we give to them, is to preventing the database to have too many connections at the same time. So depending on the implementation of the connection pooler, it’s a lot easier for the connection pooler to control 1000s of connections than the database. We still have the context switching problem from the connection pooling side. But the thing is, as the only thing that the connection pooling needs to do, is to receive connection sent to the database, gather data and send back to the application. So the context switching here won’t impact that much. Because this operation takes really few cycles on CPU. It’s not like if we’re going to do this doing the whole query the whole operation on the database. This is the first benefit, even though we have the context switching here on the connection pooling, it’s not that we remove them, we have them but now they are only connection pooling. And because the word that the connection pooling does is it’s a lot simpler, and it’s smaller than the work that the database, that’s the fact is not that huge. So we can work with this penalty of the contact switching here on the connection pooling. So that’s the first thing. The second thing is it’s expensive to the database to open a new connection. So when some applications, they some language is not application, some language and some drivers, they implement what they call persistent connection to the database. So they open a connection to the database here is like they open a connection to the database. And they keep that connection open. They never close. Because closing an opening connection to the database every time that you need something for the database is very expensive. So in the case of Postgres that was used process every connection to purchases one process is a bit more expensive than one We’re implementing of database, because well, we need to fork a new process. When a new connection is created, it’s an expensive operation inside of the kernel of the operational system. It’s not cheap. So this new process needs to allocate memory. And nice allocates all the results. So creating a new connection from the database point of view is expensive operations, sometimes will depends on the select on the query that you have created a new connection takes more time than the Select itself. So if we can prevent this to create new connections all the time, we can say save resources. So that’s another interesting thing. They are connection pooling. So a connection pooling here will open the main connections of doubt to it, let’s say I want 100 connection to my database. So my connection pooling here in this case, a proxy or PG bouncer here is a PG bouncer will open 100 connections in my database and keep those 100 connections open. So they’re always keep those connections open. So whenever it needs to close one connection, sometimes the memory management of some connection pooling, they’re not as viewed. And sometimes it’s just simpler to close a connection and open anyone to release the garbage in memory. And to get everything clean. But the application doesn’t see this it does in the background.
And connection management. It’s a motor benefit that we get for connection pooling. Right, so let’s say we have 1000 connections here. 1000 connections, as we can see here, I have one primary, right MQ applicant in this diagram. So here, I’m using Patroni, to coordinate the cluster to have high availability. If these primary crashes, this primary goes down here. My connections that are to the connection pool in here, they can keep the connection from application sites. And then well, all the connections that they’ve been working here. That all the transaction that is the word that we open here, they obviously would be rolling back. So we took the connection, who will tell the application for Lord, we have a problem, we’re rolling back this, this transaction, we need to apply this transaction again. But it doesn’t need to draw connections from the application. And it can arise route, the applicate the new connections to the new primary that’s been elected. So we can work with the proxy and the pooler. To gather, we have otter coordination, in our case, here, we’re using Patroni. To hide this complexity from the application, the application doesn’t need to know all this complexity here behind the connection pool or the proxy. Right? This is another benefit that we get from using a connection pool or proxy in front of the database, right? We can hide a lot of complexity that we have here, when we’re implementing all this architecture, all this kind of stuff. Application doesn’t need to know what application okay, I’m just connecting to the database, I only need a connection to the database. And yet, I don’t care if it’s high available, if for our actual application care that it’s Harvard, but if I don’t care how the high availability is implemented, it’s using Patroni. If it’s using a lot of technology, I don’t care, I just want to connect to the database and get my data. Alright, so having a connection point here in database helps. Yep.
Edith Puclla: Yeah, I have one question. Sorry. I know a little bit about the databases, but not too much. So this connection pooling is something that we can use also in MySQL and MongoDB.
Yeah, we do. Exactly. We there are some applications and some connection pooling for the audit database as well. For example, on the MySQL approach, we have ProxySQL, but the idea is the same. We can use the pooling in front of any database like or even in proprietary data. Things like Oracle SQL Server, there are implementations of connection pooling for them as well. Right. And the benefits that we get are the same. Why? Even for database like MySQL that uses trads is type instead of process to handle connections, they still expensive. I tried in the catalog to create a trash is not so cheap as people believe and is compared to create a process especially on Linux, it’s a bit cheaper. But it’s still an expensive operation for the scanner to to create a new thread. Right? So and the benefits that you have are the same. The context switching that we have are exactly the same. Doesn’t matter if it’s a process, or if it’s a trench, right? So inside of the CPU and the lifespan, they are all threads, they all is slicing the same, but they have priorities and everything. So they work the same, the benefits that you get it, it’s not only for post wins, but they are all the same. Like we want to have similar benefits are using a connection pool, and different implementations of the database, they have different implementations of connection pools. And I don’t know much about MongoDB. But if not mistaken, Mongo , it works very similar to a connection pooling in front of MongoDB, though it does the coordination and all this kind of stuff. But I have no property, no knowledge. Thank you.
Mario Garcia: Yeah, I have another question. What if we, for example, we have a process that every process is running an amount of SQL queries? How do we guarantee that the connection to the database is maintained? Even if the process takes more time than the unexpected one? How do we guarantee that the connection won’t close?
Well, there are some problems that we need to address, right? So when we work with TCP, TCP, we have this, this feeling that TCP IP is a cable that goes from one end to another and right especially when we see diagrams of people explaining if a loop with TCP IP is a connection stream, it’s open sea and in kept open during all time, right? So actually, that’s that’s not true. What have the how the communication works between the client and the server. So the protocol will coordinate by we the protocol is the language that they use. So when we open a connection to the database, the protocol of that database will tell that database for look, I want you to keep this port open for X amount of time, right? So it will open when we send the timestamp and then it will get open for X amount of time. So if no communication happens in that amount of time, one of the ends might take Okay, the other guy is just that. But before I close, let me send a packet a ping to the guy to see if he’s still alive. So the ends we’ll think cancer can be ICMP packet can be one packet that’s sent by the protocol to the other end to ask, Hey, I still alive still waiting for me to finish. And the guy replied Well, I’m good. I’m still here. I’m still waiting for you to finish. So even when the server is busy processing things, the communication channel is still open. So that connection is becoming the channel is still open. So and they still keep talking to each other. To save a look. I’m still eating can you renew my my time ticket because you can think about that. The time expiration as a ticket, right, you get your car, you go to park, and then you get a ticket for one hour. So if your car stays there in the parking lot longer than one hour, when this hour expires, you need to go there and get another ticket for another hour, right? And so when it’s a fault, if you stay there one month, that’s all fine. As long as you go there and get another ticket in another ticket ever amount of time that you agreed to, to the owner of the parking lot that you go there and get a ticket. If you don’t know that the part of the owner can just stole your car or something, right. So this is usually how it happens on the application not only database on the application, what when you have the connection, so the connection is open. And then ever amount of time. That is there is an agreement between the two ends, I look at the amount of time I got to come here and get a new ticket to tell you to look at your life. I’m still alive, I’m still alive. Right? It’s a simplistic explanation. But this is usually how that works.
We have another thing that we have are the pipe out configurations, right? Usually, you don’t want that one, select one operation and your database take 10 hours. You want your database to give up. If it’s running for like, let’s say three hours, man, it’s, it’s running for too long. I gotta give up. So it times out. And then it replies to the application. And it takes like, it’s read three hours and not even close to finish. I’m gonna read right. So can you take your car out? I think there’s much to do something else. Right. So this is the time out. You we have configurations for now. So we have configurations for the timeout connection, let’s say you will start opening a connection to the database. Your application goes to the database locks that offer look, I want the new connection. So you send a packet to the database where playback okay, if you want a new connection, this is what I need for you. I need username and password. I need to know which database one to connect. And I need an atom more information. The application goes to type your username, password and everything replies back. So this is a three way communication, right? The application goes to the database asking for a connection, the database replies to the application saying what it needs to be able to connect. And also what is the protocol? So the database tells everything about the new connection, what is needed to open a new connection. And then the application reply back with that information. It’s everything’s fine. Connection is open the database use reply back, okay, connection is open. If the password is wrong, and the username is one, record his reply back. Nope, it didn’t work something wrong here. It didn’t work. For this process to finish, there is a timeout, the database cannot display is Wait, whatever. Because for example, one attack, especially on web application is you send a packet to the web server, the web server reply back. And when the web server reply back, it’s wait for you to acknowledge okay, I’m good here I received a packet. So if you do not acknowledge, and after some time, the server will believe okay, this the packet was lost in between and what happens? You sent another one and another one and another one they attack. One of the network attack on this way is just open the connection and never been my back. Matter reply back, right. So the application has attacked the database. Most of the database is not only Postgres, but most of the databases, they have a time out for you to finish to the connection. So when you start if you don’t finish until that many seconds, so the connection is immediately drop. The everything that you sent on that channel will be ignored will be dropped automatically by the caramel database, the other kind of a loop. Remember that guy that tried to connect him to me here on this spot yet? The guy didn’t reply. I don’t want to hear from him. Everything We dropped from the database. And also that there are other timeouts, there is the query timeout, where is idle connection timeout, like, for example, on this example here that we have a persistent connection. So the proxy opens a connection to the database. Proxy will never close this connection. But if the proxy, if this connection stays for too long, without new communication, knocking the database with the load, it’s a connection is idle, it’s open, but nobody’s doing anything after some time, that either can actually be caused by a database, because there’s this time out.
So what happens is, before that time, the proxy, send a ping send oil to the database, hi, you get, I’m still here. Hi. So it goes out and get a new ticket from the top right. So this is what keeps those collections open, in definitly, right, because before the time out, happens, the connection pool will get anything in the database and think the database. There are many ways some connection points, they do a select one or a connection, please do something a bit smarter. But yeah, they always get a new ticket to keep the connection open. For the select some database, they implement, also a timeout for the select, as I mentioned, you don’t want your one operation in your database to take forever, you want that operation to finish. So there are timeouts during operations on the database. But one thing that we need to keep in mind this there are some events that they prevent a time out. For example, if we have a data corruption, the disk is corrupted something wrong. So if we have a corruption on the file system on the disk, the CPU can send a request to get the data and the hardware will never come back. That CPU that process want to get stuck there, then you will never advanced in the code. Right The database will stay will wait forever to get data to arrive and will never do anything. It doesn’t matter if you send a cue to the application depending on the problem. It doesn’t matter if you send a Q dash nine, it will not be able to give. If it’s a hardware issue, even a Q dash nine on Linux is not able to queue it stay on uninterruptible state whether the process is there. And the only way to the key all the way to finish this is only the fiscal problem. Usually it’s not possible or rebooting the whole box. Right. So but this is an extreme case. That’s an account. But sometimes it happens. But it’s a very extreme case. But yeah, most of those things, they have the timeouts and both the timeouts they say how long it will take to to forcefully finish down operation. Does that answer your question? Yeah, thank you. Okay, all right. So back to the data connection pooling. We have many connection points in proposals. So we can use the most basic one H A proxy. A proxy is a well-known proxy. And it was firstly designed to work with web applications, not really very good applications. The thing is, it’s a TCP proxy. So for HA proxy, it doesn’t care. If the packets are going to observe or if the packets are going to database, it just get the backends you tell him to reach a server and which port on that server it needs to send the packet and need to get the job done. Right. So everything that comes here with have a look, everything that comes to the port 5432 Here, you send to this server on the port, I don’t know 5432 To get the work done. And it’s really fast, easy. For a configuration, the configuration of H A proxy is stupidly easy. It’s very reliable, as it’s really old projects been working for decades. It’s really stable and reliable and fast ag practice is stable and reliable. The problem is it doesn’t, it does not understand the database protocol. It works on TCP layer, it doesn’t understand the database protocol. So we cannot send, we cannot split Read and Write prayers, for example, we cannot put any rules or anything fancy or special on the box, or you cannot even implement a authentication on at the age of proxy. We can write plugins and extend the capabilities of a new proxy.
But it’s a bit more complex, right? So by default, your products is just it just redirects, the packets and the connection. So but it does well, it does it really well. We can work and we can have an implementation of connection pooling, we can tell if a look, you’re gonna we can receive 1000 connections from the client, if an application, the application can open 1000 connections, and those 1000 connections, you’re going to send here, only 100 connections. So how it does is you open the connection that arrived first we’ll get to the database, when you reach all the 100 connections, the other 900 connection gonna stay here waiting for one slot to be available. So when one connection finishes, it will be returned to the AU proxy. And then the next connection, we will be available. And we’ll be the Ag proxy will give it to the next one in the queue. Right. So how about a long queue, when it was finished another one for the queue, just get to the pool and get access to the database. Basically, this is how it works. It’s very simplistic. There is no much of complexity here. And depending on your application, if you really if the only thing that you need is connection pooling, if you do not need complex logic behind those things, if you do not, it’s it’s really easy. It’s possible to implement read and write the split. But here you want to have one connection for reads. And another one connection for writes. So everything that comes for the connection for read here, you can sign them. But now it’s the job of the application to decide which connection it will open. So if the connection if the application needs to write, for example, it can open a connection here like this, this red one on port 543 Q is for reads only select, you can open another one on the port 6677. For example, everything that comes from the correct 6677 The aged proxy can send to the replicas here. And everything that comes on the right one does the port 5432 It only be sent to the primary. Right? This is the way that we can do this implementation approximate. Another one is the PG bouncer, PG bouncer has is a lot more capabilities to work with both with connection that HA proxy a good thing for your proxies, because it’s, it’s not a database connection. Pooling proxy is just a TCP layer proxy. It can work with any database, it can work with MySQL, Postgres, MongoDB, any database. So this is a good thing with AJ proxy. Right. And because a lot of people read work and know how it works, because they’ve been used for webserver this implementation worked with ag proxies is a lot easier. So the pitch Bowser is a red nose that understand the protocol from PostgreSQL. And on the PG bouncer level, we now can have implementation of some features that is not able to only practice. For example, we can have a implementation or improve the security of the database here. We can have great users on the database if you use it only by the PG bouncer and you can restrict those data those users to go through a specific database so we can start creating rules inside of the PG bouncer right so we can create one pulling connection here from the PgBouncer that if comes with this user, it will send that connection to the replica or it send the connection to the primary. So we can start creating rules inside of the PG bouncer. The other one that’s really well-known in the market and but it’s not here in this diagram is the PG pool.
The PG pool has a few advantages over PG bouncer when it’s well implemented. Because even though PgBouncer understands the Post Office Protocol, it’s not able to do the read and write private split. So if you have some salons that your application sends, that selects all the time for for the database, those will be cached on pg Pool. But you can use memcache and can use Redis for caching for example. Well, for right is Mr. Chu, I think it’s it’s already using the last time it was using Memcache. But it also has its own implementation of cache. So all that data will be cached on pg Pool. If you send that same select to the data application many, many times a second, those data can be served by the PG pool, it never goes to the database. So you start alleviating the database. And because it’s in memory is in cache memory, it’s a lot faster, it will be served a lot faster to go to the database. Ah, a common implementation that we see is something like this, but instead of having a PgBouncer here is we’re having PG. Well, problem for pg Pool is there. I personally don’t lie, and is how it handles connection, just like Postgres pg Pool opens one process per connection. So it’s not scalable. Digipro cannot scale for 1000s of connections at the same time. It has a similar problem that most users grow heads. So because of that, usually we use either a proxy or PG bouncer in front of BGP, so those they can receive 1000s of connections PgBouncer can receive 1000s of connections. And from PG bouncer, we send the connections to pitchipoo like few connections, 100-200 connections so that the GPU and the pitchipoo can do the split the reads and writes split and pitchipoo can also have the cache here on pgPool. That works, fine works well. So as curiosity, PgBouncer, it implements the connection, not as using the process, actually PgBouncer uses only one process. And all the connections that come to PG bouncer, it uses a simple a synchronous i o library to put that connection. So an example is you receive a connection for the application here, the connection comes here, and the PgBouncer sends that connection to the database. So when PG bouncer sends that connection to the database, it put in one cue and now the character looks can see this network connection. Well, I will cue to do not find me when something comes back from from this connection. So it doesn’t need to the CPU doesn’t need to keep spinning to share. If something comes all the time on that connection. It really stays that do not so use zero CPU resources. When something comes on the network. The metric will not fire the candle and the candle you’ll not find the big balls for load You need to wake up because we have something from this connection.
And that job also will return this thing to the application. So this implementation is really powerful because we can handle 1000s of connections. And just because we don’t need to, to keep looking, if the database replied back, so we can leave all those 1000s connections like on the queue, and when we have something from the, from the database, we just get what to have get an excellent action, put in the queue again, and come back. Even the context switching here works in a way that’s a lot better control than if we are having connections working like either process or even tracks. So this way of implementing the connection works our better and way more scalable and that’s why PgBouncer is scales a lot better when we have hundreds of 1000s of connections Stan pgPool Even the database this is why it’s so fast so powerful to have solutions like PgBouncer another proxy that it’s new in the market and it’s cool this is another new project. They work in fashion we like big pumps of implementing things. But for those ones that are starting in one to learn how those things, things work in one to hack into the code. The code of the OG Ra is a lot simpler to understand the code of VG and PG bouncer. So it’s a nice implementation to have a nice implementation. I’ve seen some companies that are starting using it’s not so popular like pgPool and PgBouncer because they for vacates in the market. Digibox was firstly developed by Skype. It was a product that they built to solve their own problems they have pgPool was developed a long ago even before posts which had the built in replication. So one problem that digital was intended to solve pgPool one was intended to solve as well was to have replication from composers. That’s why it still hurts still, we’re still able to send data for multiple servers using PG. Not sure if it’s useful anymore. So I’ve seen some people trying to make nice stuff that we can we can have problems. So be careful when using those fancy features for your implementation. And thing is, as per call, we support H A proxy PgBouncer and pgPool So we support those three especially pgPool. We have developers in house so one of the main developers of pgPool work with us for chronic and but we also support PgBouncer, and we really usually recommend depends on your architecture PG bouncer. And sometimes as I mentioned PgBouncertogether with PG boo If it can give really good results. Well, we don’t have time to go for some demos here. But the configuration of that it’s simplistic. We don’t have much, very, so much to do in the configuration is very simple. That’s it. I hope we learned something today. And it was enjoyable.
Edith Puclla: Yeah, it was great. Thank you so much for your time.
Charly Batista: Yeah, talking too much as, as usual.
Mario Garcia: Thank you. Thank you. Thank you so much for, for sharing. Well, I was interested in learning more about connection pool because I’m working on a project that implement well, that I recently implemented multiprocessing. But I have this problem, how do I implement connection pooling for all the queries that I need to execute? So what you share today would really help me to have a better idea on how to solve that.
Edith Puclla: Do we have another session virtually in another time? No, this is
Charly Batista: We have the session every two weeks, right? So ever all the week? We have one? So two weeks where we have another one? I don’t know what’s going to be the topic. But yeah, we do. We have a county called scatter for sessions up to the end of the year, or the week we were here to talk that much and bother people a little bit about database, especially poses.
Edith Puclla: Yeah, this is nice. I’m learning positives now. So thank you so much for this. Yeah.
Mario Garcia: Well, we don’t have any questions in the chat. So I think we can finish. And really, I really thank everyone who joined us today. And I hope that you learn something from this session, Charly shared some interesting tools that I think we need to start checking. So thank you so much for that. And thank you, everyone, who join us. See you next week.
Follow us on Twitter and get informed of all upcoming meetups. ∎