Parallelize downloads across hostnames (deprecated) - Implementation Guide

Overview

The general steps for implementing domain sharding on a site would be:

  • Create the domains to be used for sharding
  • Configure the web server to use these domains and to serve the static content
  • If you are using a CDN, it may already support domain sharding, in which case the two steps above can be skipped
  • Update the pages to point the resource URLs to the shards

For dynamic sites, your CMS or application framework may have some sort of CDN or sharding plugin that will change resource URLs to be distributed among several hosts. If such a solution isn’t already available, see the sections below for suggestions on sharding implementations.
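As a rough illustration of what such a plugin or template filter does, the sketch below rewrites root-relative static URLs in a chunk of HTML so that they point at shard hostnames. It is only a sketch under stated assumptions: the static0/static1.example.com hostnames, the file-extension list, and the function names are placeholders, and a real site would more likely rewrite URLs in its templates than with a regular expression.

```python
import hashlib
import re

SHARD_COUNT = 2  # assumed number of shards
HOST_TEMPLATE = "http://static{n}.example.com"  # placeholder hostnames


def shard_for(path: str, shards: int = SHARD_COUNT) -> int:
    """Map a resource path to a stable shard index in [0, shards)."""
    digest = hashlib.md5(path.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % shards


# Matches src="/..." or href="/..." for a few static file types (illustrative only).
STATIC_ATTR = re.compile(
    r'(?P<attr>\b(?:src|href))="(?P<path>/[^"]+\.(?:png|jpg|gif|css|js))"'
)


def rewrite_static_urls(html: str) -> str:
    """Point root-relative static resource URLs at their shard hostname."""
    def replace(match):
        path = match.group("path")
        host = HOST_TEMPLATE.format(n=shard_for(path))
        return f'{match.group("attr")}="{host}{path}"'

    return STATIC_ATTR.sub(replace, html)


page = '<img src="/img/logo.png"><a href="/about.html">About</a>'
print(rewrite_static_urls(page))
```

The image URL gains a shard hostname while the link to the HTML page is left alone, since only static resources benefit from being moved to the shards.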

For a static site, you don't have much of a choice other than to manually distribute the resources between the shards.

How many domains should I shard?

As a general guideline, sharding across 2 domains is a good default. Both Google and Yahoo! recommend using 2 hosts, and research from Yahoo! suggests that you shouldn’t use more than 4 domains. Because modern browsers support 6 or more concurrent connections per hostname, a high number of shards isn't required.

In the end, the number of shards depends on your site and your users. Choose a number of shards that suits the number of resources that need to be downloaded and that fits the concurrent connection limit of the browsers used by the majority of your users.
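The arithmetic behind that choice can be sketched as follows. The helper names and the heuristic itself are assumptions made for illustration, not a published formula; the cap of 4 comes from the Yahoo! guidance mentioned above, and the per-hostname and maximum connection figures are the kind listed in the browser table in this guide.

```python
def parallel_downloads(shards: int, per_host: int, browser_max: int) -> int:
    """Upper bound on simultaneous connections one browser opens to the shards."""
    return min(shards * per_host, browser_max)


def useful_shards(resources: int, per_host: int, browser_max: int,
                  max_shards: int = 4) -> int:
    """Smallest shard count that lets the page or the browser run at full width.

    Hypothetical heuristic: never more than max_shards, never fewer than 1.
    """
    target = min(resources, browser_max)
    needed = -(-target // per_host)  # ceiling division
    return max(1, min(needed, max_shards))


# A browser allowing 6 connections per hostname and 30 in total:
print(parallel_downloads(2, 6, 30))  # 12
print(parallel_downloads(8, 6, 30))  # 30 (the browser-wide limit takes over)
print(useful_shards(10, 6, 30))      # 2
```

With 6 connections per hostname, 2 shards already allow 12 parallel downloads, which is why a large shard count rarely pays off for modern browsers.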

How to split resources

Resources should be split evenly among the shards so that all the concurrent connections to the shards are fully utilized. At the same time, each resource must be served from the same shard on every request so that the browser can cache it.

One way of achieving both is to use a hash function. Hash functions map a large amount of data to a fixed number of values; in this case, you map resource URLs to domain shards. There are many hash functions to choose from, but an easy implementation can use a cryptographic hash such as MD5 or SHA1 and then a modulo operation to reduce the result to the number of shards.
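To see why a hash is preferred over simply alternating hosts, the Python sketch below compares the two approaches on two pages that reference the same files in a different order. The hostnames and paths are made up for the example.

```python
from itertools import cycle
import hashlib

HOSTS = ["static0.example.com", "static1.example.com"]  # placeholder hostnames


def round_robin(paths):
    """Assign hosts by position on the page (unstable across pages)."""
    return dict(zip(paths, cycle(HOSTS)))


def by_hash(paths):
    """Assign hosts from the path itself (stable across pages)."""
    def host(path):
        digest = hashlib.sha1(path.encode("utf-8")).digest()
        return HOSTS[int.from_bytes(digest[:4], "big") % len(HOSTS)]

    return {path: host(path) for path in paths}


home = ["/css/site.css", "/img/logo.png", "/js/app.js"]
article = ["/img/logo.png", "/css/site.css", "/js/app.js"]

# Round-robin puts the logo on a different host on each page: a cache miss.
print(round_robin(home)["/img/logo.png"], round_robin(article)["/img/logo.png"])
# prints: static1.example.com static0.example.com

# Hashing the path gives the same host no matter where the file appears.
print(by_hash(home)["/img/logo.png"] == by_hash(article)["/img/logo.png"])
# prints: True
```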

Sample code:

Perl using SHA1:
sub shard {
   my ($url, $shards) = @_;

   require Digest::SHA1;
   my $sha1 = Digest::SHA1::sha1_hex($url);

   # SHA1 is a 160-bit result, split it up into 5 32-bit chunks and xor
   my $sha1_32 = 0;   # start at 0 so the first xor does not use an undefined value
   for (0 .. 4) {
       $sha1_32 ^= hex substr($sha1, $_ * 8, 8);
   }

   # Reduce to a shard index in 0 .. $shards - 1
   my $shard = $sha1_32 % $shards;
   return "http://static$shard.gtmetrix.com/$url";
}

PHP using MD5:
function shard($url, $shards) {
   $md5 = md5($url);

   // MD5 is a 128-bit result, split it up into 4 32-bit chunks and xor
   $md5_32 = 0;
   for ($i = 0; $i <= 3; $i++) {
       $md5_32 ^= hexdec(substr($md5, $i * 8, 8));
   }

   // Reduce to a shard index in 0 .. $shards - 1
   $shard = $md5_32 % $shards;
   return "http://static$shard.gtmetrix.com/$url";
}

Using a cryptographic hash is a little more CPU intensive than using a general purpose hash function (such as Jenkins’ Hash), but the cost should be negligible on a modern system given the small number of strings to hash.
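For comparison, here is a minimal Python sketch that uses a cheaper, non-cryptographic function in the same role. It swaps in CRC32 from the standard zlib module rather than Jenkins' hash, and the example.com hostnames and sample URLs are placeholders. Note that Python's built-in hash() is not a suitable substitute for strings, because it is salted per process by default and would move resources between shards after a restart.

```python
import zlib
from collections import Counter


def shard_crc32(url: str, shards: int) -> str:
    """Pick a shard with CRC32; same URL always yields the same shard."""
    index = zlib.crc32(url.encode("utf-8")) % shards
    return f"http://static{index}.example.com/{url}"


# Rough check of how evenly a batch of made-up URLs spreads over 2 shards.
urls = [f"img/photo{i}.jpg" for i in range(1000)]
counts = Counter(shard_crc32(u, 2).split("/")[2] for u in urls)
print(counts)
```

Running a check like this against your own resource list is a quick way to confirm that the split is close to even before deploying the change.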

Browser Concurrent Connections

Browser          Concurrent connections per hostname (HTTP/1.1)   Maximum connections
IE 6, 7          2                                                35
IE 8, 9          6                                                35
Firefox 2        2                                                24
Firefox 3, 4     6                                                30
Safari 3         4                                                60
Safari 4, 5      6                                                30
Chrome 1, 2      6                                                60
Chrome 3         4                                                60
Chrome 4         6                                                60
Chrome 5, 6, 7   6                                                30
Chrome 8, 9, 10  6                                                35
Opera 9          4                                                20
Opera 10         8                                                30
Opera 11         16                                               64

Source
