Contact
Site: US UK AU |
Nexcess Blog

Diagnosing MySQL Performance Bottlenecks With Maatkit

March 5, 2011 0 Comments RSS Feed

MySQL
Hello again! Last time we looked at optimizing performance of web apps, we were checking out how to diagnose PHP performance bottlenecks using XDebug to generate valgrind-style logs of what PHP was doing. While that can be immensely useful in some situations, just looking at how the PHP code executes doesn’t give us the whole story of most modern PHP / MySQL apps.

Database queries can be as important or more important than the PHP code itself, and we’ve seen lots of apps that run well enough until it’s time to access the database…where performance can easily go down the drain. Luckily, with most well-written apps like Magento, fixing database query issues is pretty straightforward. Either it’s a quick fix and we just add indices to a table, or it might be a known issue that we have a tried and tested solution for, or it goes back to the developers to figure out.

However, not all apps are well written; developers come and go, extension authors are sometimes hard to reach…so we end up doing a fair amount of MySQL performance troubleshooting in-house, and we’re very grateful to the MySQL community (especially #mysql on irc.freenode.org and Maatkit) for providing us with the tools and knowledge necessary to resolve these issues for our clients.

Today, we’re going to look at a very simple PHP / MySQL app that performs poorly, how to recognize the signs of a MySQL issue, and how to correct the issue. Finally, we’ll test our fix and make sure that it’s actually helping. Since I learn most effectively when I get to actually get in and play around with this stuff, the exercise today is going to be a bit involved and requires some basic PHP / MySQL knowledge to get working. If you are stuck on something or would like to submit a correction, please leave a comment and I’ll reply / check it out.

MySQL offers different storage engines; the two that you’ll likely run into the most are MyISAM and InnoDB. They both have their place, but generally you’ll see MyISAM being used where faster storage is required and the data isn’t critical (such as in visitor log and session tables) and InnoDB used where absolute data integrity is required (such as situations that require ACID compliance or payment / invoice handling). Magento uses InnoDB for most tables, but creating a good example of a Magento MySQL performance issue would be a bit complex / difficult, so for today we’ll be working with MyISAM. The issues between the two are similar and the same basic troubleshooting techniques can generally be applied to either.

First, we’re going to need a simple application that uses a MySQL back-end, a MySQL database (and user that has permissions on said database), and some data to populate the database. You’ll need to replace the database name, database user, database user’s password with whatever are relevant to your setup.

PREREQUISITES

  • A linux / mac machine running mysqld (5.x preferred but everything should work on 4.x as well) and PHP 5.x (again, 4.x might work fine too)
  • shell access to aforementioned machine and basic competency with shell commands
  • ‘wget’ or ‘curl’ or some other way to download the tools
  • perl and the modules described here
  • phpMyAdmin or basic competency with the MySQL interactive CLI client

NOTES

  • All the files for this project are available from our GitHub repository (except the Maatkit tools)
  • SQL uses backticks (the key usually to the left of “1” on the top row; the un-shifted key for tilde [‘~’]) for containing column names. Things inside single quotes (the same key as apostrophe on most keyboards) are evaluated as strings and probably won’t do what you want. Here is an example that demonstrates this. Since there is no column named `test` (we didn’t even select a database yet), it just throws an error, whereas the single-quoted ‘test’ is interpreted as a string and rightfully selected:

<br />
mysql&gt; SELECT 'test';<br />
+------+<br />
| test |<br />
+------+<br />
| test |<br />
+------+<br />
1 row in set (0.00 sec)</p>
<p>mysql&gt; SELECT `test`;<br />
ERROR 1054 (42S22): Unknown column 'test' in 'field list'<br />

Here is a very quick and simple app I wrote that connects to a MySQL database and just selects all the records from one table and prints the results:

<br />
&lt;?php<br />
// DB CONNECTION<br />
$dbhost = 'localhost';<br />
$dbuser = 'some_dbuser';<br />
$dbpass = 'some_dbpass';<br />
$dbname = 'some_database';<br />
$conn = mysql_connect($dbhost, $dbuser, $dbpass) or die ('Error connecting to mysql');<br />
mysql_select_db($dbname);</p>
<p>// Start the timer<br />
$time = microtime();<br />
$time = explode(&quot; &quot;, $time);<br />
$time = $time[1] + $time[0];<br />
$start = $time;</p>
<p>// Query the DB<br />
$myArr = array();<br />
$someNumbers =  implode(',', array(rand(1, 100000), rand(1, 100000), rand(1, 100000), rand(1, 100000), rand(1, 100000)));<br />
$sql = &quot;SELECT * FROM `some_table` WHERE `number` IN ($someNumbers)&quot;;<br />
$resource = mysql_query($sql, $conn);<br />
while($records = mysql_fetch_assoc($resource)){<br />
  $myArr[] = $records;<br />
}</p>
<p>var_dump($myArr);</p>
<p>// Stop the timer and show how long it took<br />
$time = microtime();<br />
$time = explode(&quot; &quot;, $time);<br />
$time = $time[1] + $time[0];<br />
$finish = $time;<br />
$totaltime = ($finish - $start);<br />
printf (&quot;Page generated in %f seconds.&quot;, $totaltime);<br />
?&gt;<br />

You’ll need to run this in either phpMyAdmin or the MySQL CLI before we start:

<br />
CREATE TABLE `some_databaseName`.`some_table` (<br />
`id` INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY ,<br />
`number` INT UNSIGNED NOT NULL<br />
) ENGINE = MYISAM;<br />

Here is the script to populate our database:

<br />
&lt;!--?php&lt;br /--&gt;// DB CONNECTION<br />
$dbhost = 'localhost';<br />
$dbuser = 'some_dbuser';<br />
$dbpass = 'some_dbpass';<br />
$dbname = 'some_database';<br />
$conn = mysql_connect($dbhost, $dbuser, $dbpass) or die ('Error connecting to mysql');<br />
mysql_select_db($dbname, $conn);</p>
<p>for( $i=1; $i&lt;=100000; $i++ ) { mysql_query(&quot;INSERT INTO some_table (number) VALUES ('$i')&quot;); printf(&quot;$i&quot;); } mysql_close($conn); ?&gt;<br />

You’ll now need to run the above PHP to populate the database with numbers. You can run it multiple times if you’d like; each time will insert 100K records into the DB. This will probably take a few seconds to do, so just let it run until it finishes.

Whew! So what does all this do? Try executing the main project file (I called mine “index.php” and see what happens. On *nix systems, you can usually execute PHP by doing this:

php -f some-file.php

and you should get output similar to this:

<br />
[ip2k /www/blog]# php -f index.php<br />
array(5) {<br />
  [0]=&gt;<br />
  array(2) {<br />
    [&quot;id&quot;]=&gt;<br />
    string(5) &quot;17628&quot;<br />
    [&quot;number&quot;]=&gt;<br />
    string(5) &quot;17628&quot;<br />
  }<br />
  [1]=&gt;<br />
  array(2) {<br />
    [&quot;id&quot;]=&gt;<br />
    string(5) &quot;39512&quot;<br />
    [&quot;number&quot;]=&gt;<br />
    string(5) &quot;39512&quot;<br />
  }<br />
  [2]=&gt;<br />
  array(2) {<br />
    [&quot;id&quot;]=&gt;<br />
    string(5) &quot;53494&quot;<br />
    [&quot;number&quot;]=&gt;<br />
    string(5) &quot;53494&quot;<br />
  }<br />
  [3]=&gt;<br />
  array(2) {<br />
    [&quot;id&quot;]=&gt;<br />
    string(5) &quot;68806&quot;<br />
    [&quot;number&quot;]=&gt;<br />
    string(5) &quot;68806&quot;<br />
  }<br />
  [4]=&gt;<br />
  array(2) {<br />
    [&quot;id&quot;]=&gt;<br />
    string(5) &quot;88348&quot;<br />
    [&quot;number&quot;]=&gt;<br />
    string(5) &quot;88348&quot;<br />
  }<br />
}<br />
Page generated in 0.023291 seconds.<br />

The timer output on the last line is what we’re really after. PHP took 23.291ms to run this SQL query, which doesn’t sound like a lot, but keep in mind that we’re only selecting 5 items here. If this was a real application, we’d probably be selecting much much more and have much more complex SQL that pulls in data from other columns / tables as well. Why did it take so long to select so little data? The answer is that it wasn’t indexed, and our ‘number’ column could seriously benefit from an index.

At this point, there are a few methods you can use to analyze a query. MySQL has a built-in function called ‘EXPLAIN’ that lets you see what MySQL is doing with the query. Maatkit can analyze a slow query log, Since our script generates random numbers, we can’t really see the exact query being ran, so this method requires a bit of knowledge about how your application works to really get much use out of, if you can’t actually see real queries being ran.

<br />
mysql&gt; EXPLAIN SELECT * FROM `some_table` WHERE `number` IN (1,2,3,4,5);<br />
+----+-------------+------------+------+---------------+------+---------+------+--------+-------------+<br />
| id | select_type | table      | type | possible_keys | key  | key_len | ref  | rows   | Extra       |<br />
+----+-------------+------------+------+---------------+------+---------+------+--------+-------------+<br />
|  1 | SIMPLE      | some_table | ALL  | NULL          | NULL | NULL    | NULL | 100000 | Using where |<br />
+----+-------------+------------+------+---------------+------+---------+------+--------+-------------+<br />
1 row in set (0.01 sec)<br />

We can see here that MySQL is looking through 100K rows to find the answer. Keep that in mind moving on.

The other options are using Maatkit’s mk-query-digest to look through a MySQL log (slow query log or full query log) or to look through a tcpdump log, which you can see how to do here or here. Since we’re really just looking to analyse a single query, most of those would be overkill, so let’s just use Maatkit’s mk-query-profiler on our single query and see what happens. You’ll need perl installed, and download instructions / links are here. You’ll also probably want ‘chmod +x mk-query-profiler’ so you can just run it with our query like so:

echo &quot;SELECT * FROM some_table WHERE number IN (1,2,3,4,5);&quot; \<br />
| ./mk-query-profiler --host=0 --user=some_dbuser --pass=some_dbpass --database=some_database<br />

Which should give you results approximately like this:

<br />
+----------------------------------------------------------+<br />
|                      1 (0.0214 sec)                      |<br />
+----------------------------------------------------------+</p>
<p>__ Overall stats _______________________ Value _____________<br />
   Total elapsed time                        0.021<br />
   Questions                                 1<br />
     COMMIT                                  0<br />
     DELETE                                  0<br />
     DELETE MULTI                            0<br />
     INSERT                                  0<br />
     INSERT SELECT                           0<br />
     REPLACE                                 0<br />
     REPLACE SELECT                          0<br />
     SELECT                                  1<br />
     UPDATE                                  0<br />
     UPDATE MULTI                            0<br />
   Data into server                         57<br />
   Data out of server                      199<br />
   Optimizer cost                        20221.726</p>
<p>__ Table and index accesses ____________ Value _____________<br />
   Table locks acquired                      1<br />
   Table scans                               1<br />
     Join                                    0<br />
   Index range scans                         0<br />
     Join without check                      0<br />
     Join with check                         0<br />
   Rows sorted                               0<br />
     Range sorts                             0<br />
     Merge passes                            0<br />
     Table scans                             0<br />
     Potential filesorts                     0<br />

Wow! Look at that Optimizer Cost. You can read about what Optimizer Cost is here, but basically, it’s how hard mysqld had to work trying to figure out the most efficient way to execute our query.

Now, let’s try adding an index on the ‘number’ column in our table. Execute the following SQL in your test database:

<br />
ALTER TABLE `some_table` ADD INDEX ( `number` );<br />

And now, re-run the query profiler:

echo &quot;SELECT * FROM some_table WHERE number IN (1,2,3,4,5);&quot; \<br />
| ./mk-query-profiler --host=0 --user=some_dbuser --pass=some_dbpass --database=some_database<br />

My results look like this:

<br />
+----------------------------------------------------------+<br />
|                      1 (0.0003 sec)                      |<br />
+----------------------------------------------------------+</p>
<p>__ Overall stats _______________________ Value _____________<br />
   Total elapsed time                        0.000<br />
   Questions                                 1<br />
     COMMIT                                  0<br />
     DELETE                                  0<br />
     DELETE MULTI                            0<br />
     INSERT                                  0<br />
     INSERT SELECT                           0<br />
     REPLACE                                 0<br />
     REPLACE SELECT                          0<br />
     SELECT                                  1<br />
     UPDATE                                  0<br />
     UPDATE MULTI                            0<br />
   Data into server                         57<br />
   Data out of server                      199<br />
   Optimizer cost                           12.009</p>
<p>__ Table and index accesses ____________ Value _____________<br />
   Table locks acquired                      1<br />
   Table scans                               0<br />
     Join                                    0<br />
   Index range scans                         1<br />
     Join without check                      0<br />
     Join with check                         0<br />
   Rows sorted                               0<br />
     Range sorts                             0<br />
     Merge passes                            0<br />
     Table scans                             0<br />
     Potential filesorts                     0<br />

We went from 21.4ms to 0.3ms for our query here, or made it about 71 times faster. Let’s take a look at the ‘EXPLAIN’ query again now that we have the index in place:

<br />
mysql&gt; EXPLAIN SELECT * FROM `some_table` WHERE `number` IN (1,2,3,4,5);<br />
+----+-------------+------------+-------+---------------+--------+---------+------+------+-------------+<br />
| id | select_type | table      | type  | possible_keys | key    | key_len | ref  | rows | Extra       |<br />
+----+-------------+------------+-------+---------------+--------+---------+------+------+-------------+<br />
|  1 | SIMPLE      | some_table | range | number        | number | 4       | NULL |    5 | Using where |<br />
+----+-------------+------------+-------+---------------+--------+---------+------+------+-------------+<br />
1 row in set (0.00 sec)<br />

MySQL is only searching through 5 rows now to return 5 results because it knows exactly where to look. While this might seem like common sense and something that mysqld should be able to figure out itself…well, the short answer is that it doesn’t. Also note that under ‘key’ it has the name of the key that it used; ‘number’. We can name keys anything we want, and they should show up under ‘possible_keys’ which show what the query optimizer had to pick from. You can have many keys on a table, primary keys, foreign keys, and composite keys consisting of multiple rows. They’re all useful in different scenarios, and can help as massively as we saw our index help.

But, as always, the proof is in the pudding, so let’s see how much faster this change made our test app:

<br />
array(5) {<br />
  [0]=&gt;<br />
  array(2) {<br />
    [&quot;id&quot;]=&gt;<br />
    string(5) &quot;57302&quot;<br />
    [&quot;number&quot;]=&gt;<br />
    string(5) &quot;57302&quot;<br />
  }<br />
  [1]=&gt;<br />
  array(2) {<br />
    [&quot;id&quot;]=&gt;<br />
    string(5) &quot;57648&quot;<br />
    [&quot;number&quot;]=&gt;<br />
    string(5) &quot;57648&quot;<br />
  }<br />
  [2]=&gt;<br />
  array(2) {<br />
    [&quot;id&quot;]=&gt;<br />
    string(5) &quot;67701&quot;<br />
    [&quot;number&quot;]=&gt;<br />
    string(5) &quot;67701&quot;<br />
  }<br />
  [3]=&gt;<br />
  array(2) {<br />
    [&quot;id&quot;]=&gt;<br />
    string(5) &quot;78444&quot;<br />
    [&quot;number&quot;]=&gt;<br />
    string(5) &quot;78444&quot;<br />
  }<br />
  [4]=&gt;<br />
  array(2) {<br />
    [&quot;id&quot;]=&gt;<br />
    string(5) &quot;85407&quot;<br />
    [&quot;number&quot;]=&gt;<br />
    string(5) &quot;85407&quot;<br />
  }<br />
}<br />
Page generated in 0.000684 seconds.<br />

A similarly monumental change. I really can’t stress enough how much good DB schema and proper indices matter in performance. They make the difference between 5 second page loads and 500ms page loads in real-world applications. Sometimes people just forget to add indices in the proper places or they aren’t being used for whatever reason (some MySQL functions cause the query optimizer to not use indices even if they’re otherwise set up correctly…one example is the ‘CONCAT’ function…but any function that modifies a column in the context of the query will void the indices for that column; CONCAT does exactly that).

Posted in: Linux