Percona Toolkit pt-query-digest
Learn why you use the pt-query-digest and some real-world examples to help deepen the reasoning.
Who is this video for?
- Architects
- Developers
- DevOps
Video content
- Learn about pt-query-digest usage
- Learn about the benefits and shortcomings of this Percona Toolkit feature
- Understand the results and learn what possible performance steps should be considered
Transcript
Today鈥檚 topic is PTQuery Digest. What鈥檚 PTQuery Digest? PTQuery Digest is one of the periclonal tools. It鈥檚 basically a periscript that allows you to analyze MySQL queries from logs, processes, and TC dump. Basically, the issue, the pain point we are trying to solve today is, for example, oftentimes your website or an application or experience, it is spike, and as a result of that, a database bottleneck might happen. We open an investigation to dig into the issue in an attempt to basically shed light on what might have happened. PTQuery Digest might come in handy. The benefits of PTQuery Digest are the fact that it summarizes all queries from the slow query log, processes, and TC dump in a very intuitive format. Obviously, it also has shortcomings. One of the shortcomings is the fact that it doesn鈥檛 basically tell the full story. That means it doesn鈥檛 provide as much details as PT stock. Here鈥檚 how we execute PT Digest. We are assuming that you already have Prokono toolkit installed on your server. PTQuery Digest is one of many dozens of Prokono tools that allow you to tackle very complicated, combo-loaded database tasks. We are assuming PTQuery Digest is installed. What you need to do, you just need to invoke it with the slow query log. There are two parameters you can pass to basically narrow the window, because here it鈥檚 going to analyze the entire slow query log. It could be 24 hours worth of query. If you want to narrow it, let鈥檚 say the bottleneck happened between 8 AM to 10 AM UTC. If you want to narrow it down to this specific window, you can pass two parameters since and until. It鈥檚 going to provide this output. Here you have the first query. What does this output mean? It鈥檚 going to give you the percentage during this specific window you specify, if you specify in a window. It鈥檚 going to tell you the percentage of the percentage of the specific query. In this example, this query basically accounts for 20 percent of all queries during this specific window time range. This is the time range from December 22nd, 6 AM to December 23rd, 1 AM UTC. This is a pretty large window. This query accounts for 26 percent of all queries. There are a total of 209 single queries that executed during this specific window. The most interesting part is this one. You need to know how long the query takes. Basically, on average, it takes three seconds to execute. However, 95 percent of the query, 95 percent of time, it took three seconds. Basically, this could be seven or four seconds. It could be different, but in this specific case, the average and 95 percent of time, the query executed is the same. You have this query here. It took three seconds and there鈥檚 a way to optimize it. PD query digest is telling you if a full scan was performed, yes, 100 percent and zero, no. Basically, 100 percent of the time, the query did a full table scan. Basically, we can add an index, a covering index. It鈥檚 not the topic, but we can add it to speed up the query. So the PD query digest is going to basically give you some insight as to how to eventually optimize the query. So it鈥檚 telling you, it tells you about how many rows that were sent, and if there鈥檚 the query size. But again, the most important line is basically the execution time line. The second query is basically the same thing. You focus on this line, but let鈥檚 jump to the last one. The last one, as I said before, you see the average is three seconds, whereas the 95 percent metric is different, because 95 percent of this time, the query took seven seconds, which means it鈥檚 pretty slow query. There鈥檚 a temp table was created 100 percent of the time. So it doesn鈥檛 say anything about full scan, but it tells you that 100 percent of the time, the temp table was created. So what you can do is to optimize the query accordingly. What we did is we created covering index on all of the fields that are in the query. Quantity ordered, updated, all of them. We added them to a single index, that鈥檚 hence the covering index. In a nutshell, it鈥檚 one specific situation like use case of PTQuery digests. The recommendation would be to run it, executed alongside other comments, because again, it doesn鈥檛 tell you the full story. It doesn鈥檛 tell you there are so many other details that could shed light and could explain what was happening on the database. Thank you for listening.
Code references
Be sure to change to match your logs and time frame
$ pt-query-digest mysql-slow.log.7 > mysql-slow.log.7.DIGEST
Useful resources
recommendation-more-help
3a5f7e19-f383-4af8-8983-d01154c1402f