Parallelize downloads across hostnames (deprecated) - Implementation Guide
Overview
The general steps for implementing domain sharding on a site would be:
- Create the domains to be used for sharding
- Configure the web server to use these domains and to serve the static content
- If you are using a CDN, it may already support domain sharding, in which case the two steps above can be skipped
- Update the pages to point the resource URLs to the shards
For dynamic sites, your CMS or application framework may have some sort of CDN or sharding plugin that will change resource URLs to be distributed among several hosts. If such a solution isn't already available, see the sections below for suggestions on sharding implementations.
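Where no plugin is available, the URL rewriting step might look roughly like the PHP sketch below. It is a minimal illustration under stated assumptions, not a complete solution: the helper names `shard_host` and `shard_html`, the `static0.example.com` style hostnames, and the use of `crc32` as the hash are all made up for the example. The regular expression only handles root-relative `src` attributes with a few common static file extensions.

```php
<?php
// Hypothetical helper: pick a shard host for a resource path.
// crc32 is used only to keep the sketch short; any stable hash would do.
// The mask keeps the value non-negative even on 32-bit PHP builds.
function shard_host(string $path, int $shards): string
{
    $index = (crc32($path) & 0x7fffffff) % $shards;
    return "static{$index}.example.com"; // assumed hostname pattern
}

// Rewrite root-relative src="/..." attributes so they point at a shard.
// Protocol-relative URLs (src="//host/...") are left untouched.
function shard_html(string $html, int $shards): string
{
    $pattern = '~\bsrc="(/(?!/)[^"]+\.(?:png|jpe?g|gif|js))"~i';
    return preg_replace_callback(
        $pattern,
        function (array $m) use ($shards): string {
            return 'src="http://' . shard_host($m[1], $shards) . $m[1] . '"';
        },
        $html
    );
}

echo shard_html('<img src="/images/logo.png">', 2), "\n";
```

Because the shard is derived from the path itself, the same resource is rewritten to the same host on every page, which keeps it cacheable.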
For a static site, you don't have much of a choice other than to manually distribute the resources between the shards.
How many domains should I use for sharding?
As a general guideline, 2 domains is a good number to shard across. Both Google and Yahoo! recommend using 2 hosts, and research from Yahoo! suggests that you shouldn't use more than 4 domains. Because modern browsers support 6 or more concurrent connections per host, a high number of shards isn't required.
In the end, the number of shards depends on your site and your users. Choose a number of shards that suits the number of resources that need to be downloaded. Your choice should also fit well with the concurrent connection limit of the browser used by the majority of your users.
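As a rough back-of-the-envelope aid, the sketch below estimates how many sequential "rounds" of downloads a page needs for a given shard count. It assumes every resource takes about the same time to download and that the browser opens a fixed number of connections per host. Both are simplifications, and `download_rounds` is a hypothetical helper written for this example.

```php
<?php
// Estimate sequential download rounds, assuming equal-sized resources
// and a fixed per-host connection limit (both are simplifications).
function download_rounds(int $resources, int $shards, int $connectionsPerHost): int
{
    $parallel = $shards * $connectionsPerHost; // total simultaneous downloads
    return (int) ceil($resources / $parallel);
}

// 40 resources, 6 connections per host, with 1, 2 and 4 shards.
foreach ([1, 2, 4] as $shards) {
    printf("%d shard(s): %d rounds\n", $shards, download_rounds(40, $shards, 6));
}
// Prints:
// 1 shard(s): 7 rounds
// 2 shard(s): 4 rounds
// 4 shard(s): 2 rounds
```

The estimate ignores the cost of the extra DNS lookup and connection setup that each additional hostname introduces, which is one reason more shards are not automatically better.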
How to split resources
Resources should be split evenly amongst the shards to ensure that all the concurrent connections to the shards are fully utilized. However, each resource must be served from the same shard on every request so that the browser can cache it.
One way of achieving the above is to use a hash function. Hash functions map a large amount of data onto a fixed number of values - in this case, you will be mapping resource URLs to a domain shard. There are many hash functions to choose from, but an easy implementation can use a cryptographic hash such as MD5 or SHA1, followed by a modulo operation to reduce the result to the number of shards.
Sample code:
Perl using SHA1:

    sub shard {
        my ($url, $shards) = @_;
        require Digest::SHA1;
        my $sha1 = Digest::SHA1::sha1_hex($url);
        # SHA1 is a 160-bit result, split it up into 5 32-bit chunks and xor
        my $sha1_32 = 0;
        for (0 .. 4) {
            $sha1_32 ^= hex substr($sha1, $_ * 8, 8);
        }
        my $shard = $sha1_32 % $shards;
        return "http://static$shard.gtmetrix.com/$url";
    }

PHP using MD5:

    function shard($url, $shards) {
        $md5 = md5($url);
        // MD5 is a 128-bit result, split it up into 4 32-bit chunks and xor
        $md5_32 = 0;
        for ($i = 0; $i <= 3; $i++) {
            $md5_32 ^= hexdec(substr($md5, $i * 8, 8));
        }
        $shard = $md5_32 % $shards;
        return "http://static$shard.gtmetrix.com/$url";
    }

Using a cryptographic hash as the hash function is a little more CPU intensive than using a general purpose hash function (such as Jenkins' Hash), but the difference should be negligible on a modern system given the small number of strings to hash.
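For comparison, a general purpose hash can be tried with PHP's built-in `hash()` function, which lists `joaat` (Jenkins's one-at-a-time hash) among its algorithms in standard builds. The sketch below is an illustration only: `shard_index` and `shard_joaat` are hypothetical names, the test URLs are made up, the hostname pattern simply follows the samples above, and a 64-bit PHP build is assumed so that the 32-bit hash value fits in an integer.

```php
<?php
// Map a URL to a shard index using Jenkins's one-at-a-time hash.
// joaat yields a 32-bit value as 8 hex digits, so no xor folding is needed.
// Assumes 64-bit PHP, where hexdec() returns an int for 8 hex digits.
function shard_index(string $url, int $shards): int
{
    return hexdec(hash('joaat', $url)) % $shards;
}

// Build the sharded URL, following the hostname pattern of the samples above.
function shard_joaat(string $url, int $shards): string
{
    $shard = shard_index($url, $shards);
    return "http://static{$shard}.gtmetrix.com/{$url}";
}

// Count how many of 1000 made-up URLs land on each of 2 shards
// to get a feel for how even the split is.
$counts = array_fill(0, 2, 0);
for ($i = 0; $i < 1000; $i++) {
    $counts[shard_index("images/photo{$i}.jpg", 2)]++;
}
print_r($counts);
```

Running the count against your own list of resource URLs is a quick way to check that the chosen hash spreads them reasonably evenly across the shards.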