During the last couple of months I have been involved in an unusually high amount of performance audits for e-commerce applications running with Magento. And although the systems were quite different, they also had one thing in common: the MySQL query cache was very useful. That was counter-intuitive for me as I’ve always expected the query cache to be such a bottleneck that response time is better when the query cache is turned off no matter what. That lead me to run a few experiments to better understand when the query cache can be helpful.

Some context

The query cache is well known for its contentions: a global mutex has to be acquired for any read or write operation, which means that any access is serialized. This was not an issue 15 years ago, but with today’s multi-core servers, such serialization is the best way to kill performance.

However from a performance point of view, any query cache hit is served in a few tens of microseconds while the fastest access with InnoDB (primary lookup) still requires several hundreds of microseconds. Yes, the query cache is at least an order of magnitude faster than any query that goes to InnoDB.

A simple test

To better understand how good or bad the query cache can be, I set up a very simple benchmark:

  • 1M records were inserted in 16 tables.
  • A moderate write load (65 updates/s) was run with a modified version of the update_index.lua sysbench script (see the end of the post for the code).
  • The select.lua sysbench script was run, with several values for the --num-threads option.

Note that the test is designed to be unfavorable to the query cache as the whole dataset fits in the buffer pool and the SELECT statements are very simple. Also note that I configured the query cache to be large enough so that no entry was evicted from the cache due to low memory.

Results – MySQL query cache ON

First here are the results when the query cache is enabled:

qcache_on

This configuration scales well up to 4 concurrent threads, but then the throughput degrades very quickly. With 10 concurrent threads, SHOW PROCESSLIST is enough to show you that all threads spend all their time waiting for the query cache mutex. Okay, this is not a surprise.

Results – MySQL query cache OFF

When the query cache is disabled, this is another story:

qcache_off

Throughput scales well up to somewhere between 10 and 20 threads (for the record the server I was using had 16 cores). But more importantly, even at the higher concurrencies, the overall throughput continued to increase: at 20 concurrent threads, MySQL was able to serve nearly 3x more queries without the query cache.

Conclusion

With Magento, you can expect to have a light write workload, very low concurrency and also quite complex SELECT statements. Given the results of our simple benchmarks, it is finally not that surprising that the MySQL query cache is a good fit in this case.

It is also worth noting that many applications run a database workload where writes are light and concurrency is low: the query cache should then not be discarded immediately. And maybe it is time for Oracle to make plans to improve the query cache as suggested by Peter a few years ago?

Annex: sysbench commands

6 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
Mohamed

So, to sum things up, you mean that if I’ve high concurrencies, then it’s better to turn off query_cache and vice versa?

Henning

@Mohamed: Yes… And no.

The most important sentence is in my humble opinion a bit hidden:

“Note that the test is designed to be unfavorable to the query cache as the whole dataset fits in the buffer pool and the SELECT statements are very simple. Also note that I configured the query cache to be large enough so that no entry was evicted from the cache due to low memory.”

The query cache can be great. However it requires quite a special setup…

The best example that comes to my mind ( haven’t tested, but I think it fit’s the query cache well) is the following:

You’ve got quite large data that you should analyze and aggregate, for example from a warehouse system. As you know that crunching these numbers in real time would give a severe speed penalty on the live system, you’ll instead aggregate the data on another database… You still have very complex queries as you just aggregated the data (prepared? not a native speaker, sorry), let’s assume something like a union of several dozen tables with conditions (maybe you splitted the database into “year tables” and now just need to union the different tables).

Like… All items sold from Year 2000-2010 including different aspects of the items ( “product lifetime”, price atm etc., basically a large graveyard of properties that might or might not be relevant )

That should fit the query cache I guess… You are running complex queries with conditions which are kept the same (like in this case, the “date” parts for the union of the tables) on a read-only to light-write database, so that the speed penalty from checking if the query is present in the query cache is less than the time needed to prepare the execution of the query.^^

Baron Schwartz

Pleeeaaaaase do not use those fancy 3d charts. Please. (headdesk) Please. I’m begging you. Update this post with charts that can be read and understood visually.

OK I may be a little melodramatic about this 😉

(Please?)

Morgan Tocker

I’m curious which version of MySQL you are running?

The results are pretty consistent with my understanding as well. The only comment I would have is that query cache is a throughput-based optimization at low concurrency. It may improve response time of some queries, but it has high variance from cache misses, so the p99 is unlikely to shift.

無名套裝

Thanks for your share

I think the article is a bit misleading concerning Magento. As there is a high frequency of updates during sales season to tables that are involved in a lot of selects. Also majority of Magento stores don’t have huge amount of data (1b+ records) to really benefit from it and combination of input parameters for query is too complex to really get a good enough hit ratio without cache being evicted and run into constant cache locking. I usually always disable it and focus more on optimizing the underlying query. As well Magento already uses a lot of application based materialization of complex queries for read operations.