MySQL: Are you actually utilizing your indexes?

This might seem elementary to those of you that are DBAs or something similar, but it was fascinating to find out (not to mention it greatly helped what I had to do), so I decided to post it, in case it helps someone else too. A few moments ago I found out that whereas a query along the lines of…

SELECT title, COUNT(1) as replies
FROM post INNER JOIN thread USING(threadid)
WHERE UNIX_TIMESTAMP(NOW()) - post.dateline < 86400
GROUP BY threadid
ORDER BY replies DESC
LIMIT 5

took a whopping ~10 seconds on a post table of around 2,000,000 rows and a thread table of around 40,000 rows, the following:

SELECT title, COUNT(1) as replies
FROM post INNER JOIN thread USING(threadid)
WHERE post.dateline > UNIX_TIMESTAMP(NOW()) - 86400
GROUP BY threadid
ORDER BY replies DESC
LIMIT 5

took a mere 0.03 seconds!

Probably, MySQL wasn’t able to utilize the B+ tree index of the dateline column in the first query, whereas in the second, things were a bit more obvious to it. This can also be observed by examining the information about the execution plan that EXPLAIN provides:

mysql> explain select t.threadid, t.title, count(1) as replies from vb3_post as p inner join vb3_thread as t using(threadid) where unix_timestamp(now()) - p.dateline < 86400 group by p.threadid order by replies desc limit 5;
+----+-------------+-------+------+---------------+----------+---------+------------+-------+---------------------------------+
| id | select_type | table | type | possible_keys | key      | key_len | re         | rows  | Extra                           |
+----+-------------+-------+------+---------------+----------+---------+------------+-------+---------------------------------+
|  1 | SIMPLE      | t     | ALL  | PRIMARY       | NULL     | NULL    | NULL       | 39859 | Using temporary; Using filesort |
|  1 | SIMPLE      | p     | ref  | threadid      | threadid | 4       | t.threadid |    49 | Using where                     |
+----+-------------+-------+------+---------------+----------+---------+------------+-------+---------------------------------+
2 rows in set (0.01 sec)

mysql> explain select t.threadid, t.title, count(1) as replies from vb3_post as p inner join vb3_thread as t using(threadid) where p.dateline > UNIX_TIMESTAMP(NOW()) - 86400 group by p.threadid order by replies desc limit 5;
+----+-------------+-------+--------+-------------------+----------+---------+------------+------+----------------------------------------------+
| id | select_type | table | type   | possible_keys     | key      | key_len | ref        | rows | Extra                                        |
+----+-------------+-------+--------+-------------------+----------+---------+------------+------+----------------------------------------------+
|  1 | SIMPLE      | p     | range  | threadid,dateline | dateline | 4       | NULL       | 1171 | Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | t     | eq_ref | PRIMARY           | PRIMARY  | 4       | p.threadid |    1 |                                              |
+----+-------------+-------+--------+-------------------+----------+---------+------------+------+----------------------------------------------+
2 rows in set (0.00 sec)

So, don’t rest assured that MySQL will use your indexes every time it should. It seems that sometimes you have to explicitly point it out.

  • http://www.addyosmani.com Addy Osmani

    That you knew how to trim down the execution time by that much truly amazes me. I’ll try to remember the tip. Thanks for sharing!

  • Piet

    It’s because in the first you use a function to modify the the value from your index (mysql has to calculate the new value for every row).

    In the second you subtract a constant from the current timestamp, which only has to be calculated once, then the index can be used..

    • http://leaverou.me Lea Verou

      Obviously, but I was surprised MySQL couldn’t do that optimization on it’s own. DBMSs do a ton other more complex optimizations to run a query faster.