Increasing website performance, part #3: caching expensive data

Increasing website performance, part #3
Caching expensive data

Mar 13, 2011
Not too long ago I was tasked with increasing the performance of a high-end CMS. This was not your average your-mom-wants-to-post-holiday-pictures kind of CMS but a complex system designed for large companies, developed over several years by a team of highly trained professionals. It was also horribly slow by today's standards: a single pageview could easily take over ten seconds to generate, even with only a handful of users logged in.

The first steps were building a profiler, much like I described earlier, and finding bottlenecks. To my surprise, I found none. As was to be expected from a well-designed application, there were no obvious performance problems: no intricate nested queries lasting seconds, no expensive file I/O operations, no slow calls to external web services. What I did find, however, was that an average pageview required over 500 SQL queries. Let this sink in for a moment: something as simple as viewing a list of the latest publications required several hundred queries.

The reason for this soon became apparent, as did the solution: the CMS used an internally developed CRM system which did not store previous results. Thus you would see, for example, 40 identical queries retrieving the currently logged-in user's ID. Soon enough I had rewritten the CRM system to store the most common values and only perform an SQL query for them when they were unknown or altered. Luckily, updates were also handled by that same CRM system, which made it possible to keep the data consistent; most of my time was in fact spent ensuring this was so. The number of SQL queries needed for a single pageview dropped from over 500 to about 50 to 60, depending on the page, and page generation times went from over a dozen seconds to little more than a second.
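To make the idea concrete, here is a minimal sketch of "store previous results, and refresh them on update". It is not the actual code from that CMS: the class name UserRepository, the fetchFromDatabase() helper and the in-memory $table that stands in for the database are all made up for illustration.

```php
<?php
// Hypothetical sketch; none of these names come from the CMS described above.
class UserRepository
{
    // Rows already fetched during this pageview, keyed by user ID.
    private $known = array();

    // Number of (simulated) SQL queries performed so far.
    public $queryCount = 0;

    // Stand-in for a database table.
    private $table = array(
        123 => array('id' => 123, 'accesslevel' => 2),
    );

    public function find($id)
    {
        // Only go to the database when the value is not known yet.
        if (!array_key_exists($id, $this->known)) {
            $this->known[$id] = $this->fetchFromDatabase($id);
        }
        return $this->known[$id];
    }

    public function update($id, array $changes)
    {
        $current = $this->find($id);
        if ($current === null) {
            $current = array();
        }
        // Writes pass through the same class, so the stored copy is
        // refreshed right away and never goes stale.
        $this->table[$id] = array_merge($current, $changes);
        $this->known[$id] = $this->table[$id];
    }

    private function fetchFromDatabase($id)
    {
        $this->queryCount++; // stands in for a real SELECT
        return isset($this->table[$id]) ? $this->table[$id] : null;
    }
}

$users = new UserRepository();
for ($i = 0; $i < 40; $i++) {
    $users->find(123); // 40 lookups of the same user
}
echo $users->queryCount, "\n"; // only one simulated query was needed
```

The important design choice is that reads and writes go through the same object. Without that, the stored copies have no way of knowing when they are out of date.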

Expensive simple data
What this story teaches is one very simple fact: even if your data is very fast and easy to retrieve, it can still slow your application down if you need too much of it. In fact, it is quite often easier to speed up a single complex slow instruction than to speed up multiple simple fast instructions. Consider for example one slow instruction which takes 100ms and is required once every ten pageviews, and a very fast instruction which takes 10ms and is required for every pageview. Over ten pageviews both instructions cost the same 100ms in total. If you store the result of the slow instruction, retrieving it becomes a very simple instruction, so you go from 100ms to 10ms quite easily. But how do you speed up a 10ms instruction to take only 1ms?

Sometimes choosing an alternative data provider can help here. I already talked about Memcached as a fast alternative for simple data, and in some scenarios file storage performs better than a normal database as well. However, there are limits to how far this can get you. The trick then is to lower the number of requests by retrieving more data per simple instruction.

Caching simple variables
Using PHP's ability to serialize and unserialize a value, it is possible to store variables and arrays of variables as strings. This allows us to store and load a large number of variables directly from a single storage location, with a single instruction:
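Code (php):
/**
 *  File-based provider: keeps the whole cache as one serialized array on disk.
**/

class CacheFile extends Cache {

  const LOCATION = 'Cache/files/controller/data.cache';


  // Load every cached value with a single file read.
  public final function __construct () {
    $file = ROOT . CONTROLLER . self::LOCATION;

    if (file_exists($file)) {
      $data = unserialize(file_get_contents($file));

      // unserialize() returns false for a corrupt file; skip it in that case.
      if (is_array($data))
        Cache::setCache($data);
    }
  }


  // Write every cached value back with a single file write.
  // LOCK_EX keeps two simultaneous pageviews from interleaving their writes.
  public final function destroy () {
    file_put_contents(ROOT . CONTROLLER . self::LOCATION, serialize(Cache::getCache()), LOCK_EX);
  }
}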
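The serialize round trip that CacheFile relies on can be tried in isolation. The snippet below is only a sketch: the values are invented and it writes to a temporary file instead of the real cache location.

```php
<?php
// Illustrative values only; the keys and contents are made up.
$values = array(
    'menu_items'   => array('Home', 'Blog', 'About'),
    'userdata_123' => array('id' => 123, 'accesslevel' => 2),
);

// A throwaway file in the system temp directory stands in for data.cache.
$file = tempnam(sys_get_temp_dir(), 'cache');

// One write stores every value.
file_put_contents($file, serialize($values), LOCK_EX);

// One read restores them all, with types and nesting intact.
$restored = unserialize(file_get_contents($file));

var_dump($restored === $values);

unlink($file); // clean up the temporary file
```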

A generic Cache class is required to use this cache data provider across our entire application:
Code (php):
/**
 *  Cache - providing generic caching
 *
 *  Cache works in mysterious ways. Since caching a little
 *  bit of data can save expensive queries / file searches,
 *  it is often smart to do so. However, making many requests
 *  on the caching mechanism is often quite expensive, performance
 *  wise. Thus, Cache loads in ALL cached data each pageview
 *  from a single file. Up to a dozen kilobytes or so this is
 *  still much more efficient than loading several cache-files separately.
**/

class Cache {

  private static $provider = NULL,
                 $cache    = array(),
                 $modified = false;


  /**
   *  Create a Cache object with a specific provider. By default
   *  this is a file provider, which is usually the best way to
   *  go. Currently a Memcache interface is available as well and
   *  other interfaces can in theory be added relatively easily
   *  by adding them as classes in the Cache folder.
   *
   *  @param  provider    String    Name of the caching data-provider, or
   *                                storage facilitator.
  **/

  public function __construct ($provider = 'File') {
    // The provider is shared, so it only needs to be loaded once per pageview.
    if (self::$provider)
      return;

    require_once(ROOT . CONTROLLER . "Cache/" . $provider . ".class.php");
    $provider = 'Cache' . $provider;
    self::$provider = new $provider();
  }


  public static function has ($variable) {
    return isset(self::$cache[$variable]);
  }


  public static function set ($variable, $value) {
    self::$modified         = true;
    self::$cache[$variable] = $value;
  }


  public static function get ($variable) {
    // Return NULL instead of raising a notice for an unknown variable.
    return isset(self::$cache[$variable]) ? self::$cache[$variable] : NULL;
  }


  public static function setCache ($cache) {
    self::$cache = $cache;
  }


  public static function getCache () {
    return self::$cache;
  }


  public function __destruct () {
    // Only write back when something changed and debugging is explicitly off.
    if (self::$modified && defined('DEBUG') && DEBUG === false) {
      self::$provider -> destroy();
      self::$modified = false;   // avoid writing the same data twice
    }
  }
}

What happens here is quite simple: a caching data provider is set (in this case the file provider, CacheFile, but it is just as easy to use Memcache or MySQL for example) and used to load an array of variables and their values. When a new variable is stored, the Cache class records that it has been modified (self::$modified is set to true), and when it is destroyed it calls the data provider's destroy() method, which stores the entire data array as a string to be loaded again at the next pageview.

For a more complex solution you might want to add data expiration so your cache stays small. Memcache already provides this: you can specify a TTL (time to live) when storing your cached data string.
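For the file provider you would have to build expiration yourself. One way to do it, sketched below, is to store an expiry timestamp next to each value and drop the entry when it is read too late. ExpiringCache is a hypothetical class, not part of the Cache class above, and the optional $now argument exists only so the example does not have to wait for real time to pass.

```php
<?php
// Hypothetical helper: each entry carries its own expiry timestamp.
class ExpiringCache
{
    private $entries = array();

    public function set($key, $value, $ttl, $now = null)
    {
        $now = ($now === null) ? time() : $now;
        $this->entries[$key] = array(
            'value'   => $value,
            'expires' => $now + $ttl, // absolute time after which the entry is stale
        );
    }

    public function get($key, $now = null)
    {
        $now = ($now === null) ? time() : $now;

        if (!isset($this->entries[$key])) {
            return null;
        }
        if ($this->entries[$key]['expires'] <= $now) {
            // Expired: remove the entry so the cache stays small.
            unset($this->entries[$key]);
            return null;
        }
        return $this->entries[$key]['value'];
    }

    public function count()
    {
        return count($this->entries);
    }
}

$cache = new ExpiringCache();
$cache->set('userdata_123', array('accesslevel' => 2), 60, 1000); // stale from t = 1060

$fresh   = $cache->get('userdata_123', 1030); // within the TTL
$expired = $cache->get('userdata_123', 1060); // TTL has passed

var_dump($fresh !== null, $expired === null);
```

Because the timestamps are plain integers inside the array, they survive serialize() and unserialize() along with the values, so the same approach would work with the single cache file.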

Using this cache provider now is relatively easy. Say for example you want to know the access level of a certain user. You can perform a query to do this, but you will need to do this at every pageview, so it is an ideal candidate to cache:
Code (php):
if (Cache::has('userdata_123'))
  $accesslevel = Cache::get('userdata_123')['accesslevel'];
else {
  $userdata    = performSQLquery(123);
  $accesslevel = $userdata['accesslevel'];
  Cache::set('userdata_123', $userdata);
}

Now as soon as a new user is encountered their details will be stored by the caching mechanism and be available again the next pageview. Of course, doing this for a single value makes no sense: as we already established, a single simple instruction might take 10ms, and loading the whole cache, including data which might not even be used this pageview, could easily take longer, say 15ms. The advantage only becomes apparent when you need a dozen of these values: loading an array of data this way scales extremely well for a large number of simple values. Thus if you need ten values, it will still only take about 15ms to load, but doing it the normal way takes 10 * 10ms = 100ms. We have just gone from 100ms to 15ms!
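The same arithmetic as a few lines of PHP, in case you want to plug in your own numbers. The 10ms and 15ms figures are the illustrative ones used above, not measurements, and the flat cost for the cache file is an assumption that only holds while the file stays small.

```php
<?php
// Cost of fetching each value with its own simple instruction.
function separateLookupsMs($values, $perLookupMs = 10)
{
    return $values * $perLookupMs;
}

// Cost of loading the whole cache file once, assumed flat for small files.
function bulkLoadMs($values, $loadMs = 15)
{
    return $loadMs;
}

foreach (array(1, 2, 10) as $n) {
    printf(
        "%2d values: %3d ms separately, %2d ms from the cache file\n",
        $n,
        separateLookupsMs($n),
        bulkLoadMs($n)
    );
}
```

With these numbers the cache file already wins from the second value onwards.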

Everything is of course not as simple as just that: performance of various storage methods differs between platforms and even between different types of data. In general it is best to cache small simple variables up to a total of about 64k (which really is a LOT of data if stored this way). Performance usually decays as the cached data grows and as the cache hit rate drops (the better your hit rate, the better your average performance). Caching too much is thus not good, but neither is caching too little, since then you will still be loading much of your data from your database and the relative gain of this caching method is negated. Make sure that before you start you know what data to cache, and test at each stage what the effect is using a profiler. It cannot hurt to test different storage providers either; some might just surprise you. And as always, keep security in mind: your cache files might just contain sensitive (unencrypted?!) information. Are they stored securely?
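If your profiler does not report a cache hit rate, counting it yourself takes very little code. The CacheStats class below is a hypothetical helper, not something the Cache class above provides; you would call record() next to each Cache::has() check.

```php
<?php
// Hypothetical helper for measuring how often the cache actually helps.
class CacheStats
{
    private $hits   = 0;
    private $misses = 0;

    // Call with true when the value was in the cache, false when it was not.
    public function record($hit)
    {
        if ($hit) {
            $this->hits++;
        } else {
            $this->misses++;
        }
    }

    // Fraction of lookups served from the cache, between 0.0 and 1.0.
    public function hitRate()
    {
        $total = $this->hits + $this->misses;
        return ($total === 0) ? 0.0 : (float) ($this->hits / $total);
    }
}

$stats = new CacheStats();
foreach (array(true, true, false, true) as $wasCached) {
    $stats->record($wasCached);
}
printf("hit rate: %.0f%%\n", $stats->hitRate() * 100);
```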

The best kind of data to cache is the stuff needed at each pageview, for all users: the location of your class files for example, access levels, menu items, translations for common labels, etc. In other words, dynamic data which is small and unlikely to change often. These might not actually be all that abundant in your application, so ask yourself whether you actually need simple-variable caching. Again, profiling can give you at least a clue about what the answer to that question might be.

FragFrog out!
