Increasing website performance, part #2: compiling PHP files


Sep 14, 2010
As many PHP developers soon learn, it is good practice to split your code up into classes. By not dumping every function you write into the global namespace you prevent a lot of trouble and increase the readability of your code. A good second step is to stick to one class per file. If you then adhere to more or less strict naming conventions, you can eliminate the need for includes and requires: by defining a custom classloader, classes can be loaded exactly when they are required!

Usability- and maintainability-wise there are strong arguments for such a setup. By eliminating the need for manual includes you ensure only the classes you need are loaded, and a somewhat smart classloader also prevents duplicate class definitions. There is a slight downside, however: including a file is a relatively expensive procedure, performance-wise. In a nutshell then, what follows is a short introduction to PHP's classloader as well as a simple caching mechanism to speed it up greatly.

Defining a custom PHP classloader
Using spl_autoload_register (or the older __autoload function, which has since been deprecated and was removed in PHP 8.0) we can specify a method to call when an unknown class needs to be loaded. A very simple example of how this can work:
Code (php):
class ClassLoader {
  public function __construct () {
    // Register load() so PHP calls it whenever an unknown class is used.
    spl_autoload_register(array($this, "load"));
  }

  public function load ($name) {
    // Assumes each class lives in a file named after it, on the include path.
    require_once($name . '.php');
  }
}
new ClassLoader();

If you name your files the same as the classes defined in them, this will work just fine, but in most scenarios you will want to include a way to search through directories, handle case, postfixes, etcetera. A more complete (and complex) classloader can be found in Turok, a high-speed open source PHP framework I helped develop.
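To give an idea of what "searching through directories" could look like, here is a minimal sketch. It is not Turok's implementation: the class name DirectoryClassLoader, the list of candidate file names and the recursive search are all assumptions made for illustration, and it ignores namespaces entirely.

```php
<?php
// Hypothetical sketch: DirectoryClassLoader and its search rules are
// illustrative assumptions, not Turok's actual classloader.
class DirectoryClassLoader {
  private $paths;

  public function __construct (array $paths) {
    $this->paths = $paths;
    spl_autoload_register(array($this, 'load'));
  }

  // Return the first file that matches $name, or null when nothing matches.
  public function find ($name) {
    // Accept a few naming variations: exact, lowercase and a ".class" postfix.
    $candidates = array(
      $name . '.php',
      strtolower($name) . '.php',
      $name . '.class.php',
    );
    foreach ($this->paths as $root) {
      $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
      );
      foreach ($iterator as $file) {
        if (in_array($file->getFilename(), $candidates, true)) {
          return $file->getPathname();
        }
      }
    }
    return null;
  }

  public function load ($name) {
    $filename = $this->find($name);
    if ($filename !== null) {
      require_once($filename);
    }
  }
}
```

Walking the whole directory tree for every unknown class is slow, which is exactly why caching the locations found this way pays off.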

Compiling PHP files
The term compiling is probably a bit confusing here: I do not propose to compile PHP code into byte-code, as some applications can. That is a fairly complex matter altogether, generally requiring specialised (and expensive) software. What I propose here is a lot simpler yet fairly effective: instead of loading a lot of separate PHP files on each pageview, create one big file, stripped of all excess whitespace and comments, containing just the PHP code of all the files you need (and nothing more!). We could do this manually, but why do that if you can do it automatically?

This is where our classloader comes in: each time a new PHP file needs to be loaded, its 'load' method is called. By augmenting that load method with a way to store the file contents we can add them to a 'compiled' file containing all the PHP code we need for a specific pageview:
Code (php):
  /**
   *  Attempt to load the class given by the name parameter.
   *
   *  Note that the file location of a class is cached, with automatic cache
   *  invalidation: if a class' location is known, we attempt to load in that
   *  location. If the location does /not/ contain that class (anymore) or if we
   *  do not know the location of the requested class, attempt to find it by
   *  searching through the registered paths.
   *
   *  @param  name    Classname to load.
   *  @return Void.
  **/

  public function load ($name) {
    Profiler::start('ClassLoader', 'load');
    global $context;
    if (isset($this -> locations[$name])) {
      if (file_exists($this -> locations[$name])) {
        Cache::addFile($name, $this -> locations[$name]);
        Profiler::stop('ClassLoader', 'load');
        return require_once($this -> locations[$name]);
      }
    }

    // Search for the file of class $name
    if ($filename = $this -> find($name)) {
      // Found the class in $filename, load it
      $this -> locations[$name] = $filename;
      require_once($filename);
      Cache::addFile($name, $this -> locations[$name]);
      Cache::set('locations', $this -> locations);
    }

    Profiler::stop('ClassLoader', 'load');
  }

Note that this is the (much more advanced) load method in Turok's classloader. Going over it line by line, we first see a Profiler call: this lets us determine how long each load call took, as well as how many there were. You can see the resulting data in my previous article on profiling.

Next we check a (cached) list of file locations. As said, Turok's ClassLoader is fairly complex and can easily handle classes stored in (nested) subdirectories. There is no limit to how deep these may go either. Searching through directories is a slow process, however, so to speed it up the exact location of a class' file is cached and we know directly where to find a class again later. Next comes a fairly simple check to see if the file still exists where we thought it should be; if it does, we add the file to our file cache and require_once it.

When either of those conditions is not met we try to find the file. If we find it, we store its location, require_once it and update the location cache with the new array of file locations. If we do not find it, the class is not loaded and PHP will throw an error, but that should never happen.
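The lookup order described above (trust the cached location only while the file still exists, otherwise search again) can be isolated in a few lines. This is a sketch under assumptions: CachedLocator, its $finder callback and the $searches counter are hypothetical names used only to make the behaviour visible, and the real method also involves the Profiler and Cache classes.

```php
<?php
// Hypothetical sketch of the lookup order described above. CachedLocator,
// its $finder callback and $searches counter are illustrative names only.
class CachedLocator {
  public $locations = array();  // class name => file path
  public $searches  = 0;        // how often the slow search had to run
  private $finder;

  public function __construct ($finder) {
    $this->finder = $finder;    // callable: class name => path, or false
  }

  public function locate ($name) {
    // Fast path: a known location that still exists on disk.
    if (isset($this->locations[$name]) && file_exists($this->locations[$name])) {
      return $this->locations[$name];
    }

    // Slow path: unknown or stale entry, so search again and remember the result.
    $this->searches++;
    $filename = call_user_func($this->finder, $name);
    if ($filename) {
      $this->locations[$name] = $filename;
      return $filename;
    }

    // Nothing found: forget any stale entry so it is not retried blindly.
    unset($this->locations[$name]);
    return false;
  }
}
```

The point of the file_exists check is that a moved or deleted file invalidates its own cache entry, so the location cache never needs to be cleared by hand.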

PHP-file caching
Our cache now needs to do three things:

  1. Load the compiled PHP classes at the start of each pageview.

  2. Add newly loaded classes if so required.

  3. Update the compiled classfile to include new classes if so required.


To go over each of these steps in turn:

Load a code-cache at the start of each pageview
Code (php):
    if (defined('DEBUG') && DEBUG === false) {
      self::$codeCache = ROOT . CONTROLLER . 'Cache/files/' . md5(substr($_SERVER['REQUEST_URI'], 0, 15)) . '.php';
      if (file_exists(self::$codeCache))
        require_once(self::$codeCache);
    }

This is fairly simple: by looking at the location in the URL, we make a guess at which classes will need to be included. The same location usually means we will be using roughly the same classes. It does not matter if we miss a few or include a couple too many: loading a few extra classes from a single file is still faster than loading only the required ones separately later, and if we missed a few classes our classloader will take care of them and make sure they are included on the next pageload. Our only concern then is lambda-classes, and those you can only make with a custom classloader, so we have got that covered as well.
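To see how the URL prefix groups requests, the file name calculation can be pulled out into a small function. The function name codeCachePath and the $cacheDir parameter are assumptions for illustration; the md5(substr(..., 0, 15)) expression is the one used above.

```php
<?php
// Hypothetical helper mirroring the md5(substr(...)) expression above; the
// function name and the $cacheDir parameter are assumptions for illustration.
function codeCachePath ($requestUri, $cacheDir) {
  // Only the first 15 characters of the URI decide which code cache is used.
  return $cacheDir . md5(substr($requestUri, 0, 15)) . '.php';
}

$a = codeCachePath('/blog/view/item/12', '/tmp/cache/');
$b = codeCachePath('/blog/view/item/97', '/tmp/cache/');
$c = codeCachePath('/shop/cart', '/tmp/cache/');

var_dump($a === $b);  // bool(true): both start with "/blog/view/item"
var_dump($a === $c);  // bool(false): different prefix, different code cache
```

Fifteen characters is a coarse key: long URLs that only differ further on share one file, which is fine as long as those pages use roughly the same classes.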

Add new classes when required
This is also fairly simple: just load the contents of each class file after the classloader encounters it and store it in the already registered code-cache location:

Code (php):
public static function addFile ($class, $path) {
  // Load the code cached so far, unless we already did so this request.
  if (!self::$pageCode && file_exists(self::$codeCache))
    self::$pageCode = file_get_contents(self::$codeCache);

  // Nothing to add if the class is already defined.
  if (class_exists($class))
    return;

  // Strip comments and whitespace, then prepend the new code.
  $code = php_strip_whitespace(ROOT . $path);
  self::$modified = true;
  self::$pageCode = $code . self::$pageCode;
}


This loads a string of the currently cached code (if it was not loaded already), sets the modified flag so we know to store a new version of our code cache later, and prepends the contents of the new class to our existing string of class code. Looking at the php_strip_whitespace function, we see it does exactly what we need: it strips excess whitespace (linebreaks, tabs, etc.) as well as comments. In other words, all the stuff we do not need is removed while everything else is left intact! Calling this function has a price of course, but it can significantly reduce the file size of the cached code and it is only called when classes are modified or added, which should be a rare occasion in production environments.
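One thing to watch when gluing stripped files together: php_strip_whitespace keeps the PHP open tag, so two files that both start with an open tag and have no closing tag cannot simply be concatenated. A minimal sketch of one way around that follows. The helpers stripForBundle and buildBundle are hypothetical, not part of the Cache class above, and they assume pure class files without inline HTML.

```php
<?php
// Hypothetical sketch: stripForBundle() and buildBundle() are illustrative
// helpers, not part of the Cache class shown in the article.

// Return the code of one file without comments, excess whitespace or PHP tags.
function stripForBundle ($path) {
  $code = trim(php_strip_whitespace($path));
  $code = preg_replace('/^<\?php\s*/', '', $code);  // leading open tag
  $code = preg_replace('/\s*\?>$/', '', $code);     // trailing close tag, if any
  return $code;
}

// Join several stripped files behind a single open tag. Files must be listed
// with dependencies first (a parent class before the classes extending it).
function buildBundle (array $paths) {
  return "<?php\n" . implode("\n", array_map('stripForBundle', $paths)) . "\n";
}
```

The ordering requirement is presumably also why addFile prepends new code: a parent class is autoloaded while its child is being loaded, so prepending places the parent ahead of the child in the combined file.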

Update the compiled classfile
After we are done, we need to update the cached code with these new or updated classes, but only if we actually updated them (so, if self::$modified is true). A logical place to do this is in the Cache destructor:
Code (php):
  public function __destruct () {
    if (self::$modified && defined('DEBUG') && DEBUG === false) {
      file_put_contents(self::$codeCache, self::$pageCode);
      self::$provider -> destroy();
    }
  }

If the Cache was modified and, more importantly, we are not debugging, store the contents of our self::$pageCode string in the code-cache location (self::$codeCache) defined all the way at the start of our request. This ensures that soon enough all classes needed for a specific request are stored in a single file with their whitespace and comments stripped. For a simple page, this can shave about 25% off the parse time, as our profiler demonstrated. By reducing the number of file access requests our website will scale a lot better, especially when more complex pages require dozens of included files.
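On a busy site another request might require_once the code cache while it is being rewritten. The destructor above does not guard against that; one possible precaution, sketched here as an assumption rather than as something Turok does, is to write to a temporary file first and then rename it into place. The function name writeAtomically is hypothetical, and the target directory is assumed to exist.

```php
<?php
// Hypothetical sketch: writeAtomically() is an assumed helper, not Turok code.
function writeAtomically ($target, $contents) {
  // Write to a temporary file in the same directory first...
  $tmp = tempnam(dirname($target), 'cc_');
  if ($tmp === false) {
    return false;
  }

  // ...then swap it into place. On POSIX systems a rename() within one
  // filesystem replaces the target in a single step, so readers see either
  // the old file or the new one, never a half-written mix.
  if (file_put_contents($tmp, $contents) === false || !rename($tmp, $target)) {
    @unlink($tmp);  // clean up the leftover temporary file
    return false;
  }
  return true;
}
```

In the destructor this would stand in for the plain file_put_contents(self::$codeCache, self::$pageCode) call.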

Of course, this was not the only measure taken to increase page speed: caching common data helped at least as much. More on that next time, when I will also discuss the differences between using a file and memcache as caching provider.

FragFrog out!

Mar 29, 2011 FragFrog

As someone recently pointed out to me: much the same effect can be achieved through the APC extension. The DLL can be found here.

A simple dump of apc_cache_info() will tell you whether it is working - or of course checking your profiler will.
