we are talking about the Galera Replication Basically, the flow control in Galera Replication First, let鈥檚 define Galera Replication Galera Replication is replication scheme used to make the data flow from writer to reader, so primary to secondary node. In this diagram, we have a simplistic data flow or representation of the application in Galera. Let鈥檚 say you have your app, your web server sending requests to the primary. This is data is getting replicated. The data is going to be encapsulated into some write set, and the write set will be broadcast into the secondaries. That鈥檚 not the topic in Galera Replication But just to keep in mind that the time it takes on the primary is much shorter for any given transactions. It鈥檚 much shorter than the time it basically takes to execute the same transactions on the secondary or secondaries. Because some internal design, but we are in the future presentation, we might dig into it, delve into it to explain why. This is the pinpoint, is the flow control. Flow control, the definition is flow control means pause. The application gets paused or stopped. It鈥檚 as simple as that. Here鈥檚 the use case. You see a show process list, you see a slow queries in commit state.
If you go on top, you鈥檙e going to see this one. It鈥檚 on commit state, it鈥檚 deleting data. Here鈥檚 the thread number. It鈥檚 true that there are other threads on the bottom, but they are in sleep mode, and the other ones, they are system. We are not interested in those. Those are internal threads. We are talking about custom threads, threads created to carry out queries, execute queries for Magento application. Here鈥檚 the thread number 0084. This query has been running for 740 seconds. It鈥檚 true that subsequent query is taking longer, but there鈥檚 a reason for that, and I鈥檓 going to explain you why. Here鈥檚 the thread. Let鈥檚 copy it and go to this slow query log. Let鈥檚 go to slow query. Let鈥檚 first look for it. You see the first query, and it鈥檚 basically at the same time. This time coincides with the time of the capture of the show process list. You see the delete was running. It was deleting more than 67 million records at once. The shortcoming of Galera is that this, you see on the primary, it鈥檚 going to be executed at once, like in bulk. The server is going to execute one commit, is going to basically apply the data to the primary. Whereas once the transaction, the query is broadcasted or the output is broadcasted to the secondaries, the secondaries are not going to execute those 67 million records or data changes at once. It鈥檚 going to execute them one by one. That鈥檚 one of the biggest shortcomings when it comes to Galera. That鈥檚 basically what鈥檚 happening. That鈥檚 why it鈥檚 much faster on the primary, and that鈥檚 why the secondary nodes cannot keep up with the primary. If you keep looking, you see there鈥檚 another delete. This one is nearly 20,000 records. It鈥檚 not as many as 67, but then if you move to the third one, you see it was deleting 82 million records at once. That means 82 million individual queries on the secondary nodes. That鈥檚 by design. This is again, this is shortcoming as far as Galera is concerned. That鈥檚 again, you remember we talked about why subsequent queries were taking longer. This is the explanation that was being executed in a loop. There are many multiple queries, but it鈥檚 the same thread. The remedy is to have some good practices in place. One of them is to, that鈥檚 the only one as far as I know, because this is shortcoming by design. There鈥檚 no other way to overcome it, to solve it. The only way you can at least alleviate this issue is through good practices. The only one I know by experience is to basically, instead of trying to delete 82 million records at once on the primary, you鈥檙e doing in small batches like 2,000 between 2,000 and 5,000 records at once. Then you can do 2,000, you delete 2,000, you take a break. In the code, you can do sleep or whatever command you have at your disposal. You can wait for five seconds so that the secondary can execute those 2,000 records and take a break until you send the second batch, subsequent batches. In a nutshell, that鈥檚 how we should basically approach this issue and how we can at least alleviate this issue. That鈥檚 the recommendation. Thank you for listening.