Composer, Capistrano and Github at scale: beware of the stampede

Composer is great, Github is great, Capistrano is great, Hubot powering Capistrano deploys is even better. At LinkORB we've embraced all of these technologies to help us deploy into production 20 to 30 times a day.

The problem: Beware of the stampede

When you combine these excellent tools and use them on a large number of production servers, you have to realise what happens:

  • Capistrano performs a git clone from Github.com on many production servers in parallel.
  • When you deploy often, this happens multiple times a day, often multiple times an hour.
  • If you add Composer to the mix, your build process runs a 'composer install', running a new git clone for each of the external libraries you've added to your composer.json dependencies.

So, to calculate your git clone requests to Github per day (the + 1 is Capistrano's clone of the project itself):

Deploys per day × production servers × (dependencies in composer.json + 1)

In our case, this number started to surpass 3000 clone requests per day. This involved repositories of various sizes, some containing 10+ years of history and code bases of ridiculous sizes.
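
To make the formula concrete, here is a quick sketch in shell arithmetic. The three input figures are made up for illustration and are not our actual numbers; substitute your own.

```shell
#!/bin/sh
# Hypothetical figures for illustration only.
deploys_per_day=25
production_servers=10
composer_dependencies=12

# Each deploy on each server clones the project itself (+1)
# plus one repository per Composer dependency.
clones_per_day=$(( deploys_per_day * production_servers * (composer_dependencies + 1) ))

echo "Estimated git clone requests per day: ${clones_per_day}"
# prints: Estimated git clone requests per day: 3250
```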

Bam! We've hit a wall

Turns out, the smart Octo-cats at Github are doing an excellent job protecting themselves and their many users, by throwing up some limits. And they are right!

The exact numbers seem to differ over time, but the number of API requests (made through composer) and git clone operations per IP address is clearly limited.

Solutions

We are using the following solutions to solve this problem:

Use a Github oauth key in composer

The Github API request limits of anonymous users vs authenticated users are significantly different!

Since October 14th 2012, Github has limited unauthenticated users to 60 requests per hour. Using the formula above, you'll see you hit that limit quite quickly in production.
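
As a rough illustration of how quickly that happens, the sketch below estimates hourly API demand. The figures are hypothetical, and it assumes one API request per dependency and that all servers share a single outbound IP address; your setup may differ on both counts.

```shell
#!/bin/sh
# Hypothetical figures; one API request per dependency is a simplifying assumption.
deploys_per_hour=3
production_servers=10
composer_dependencies=12
unauthenticated_limit=60   # requests per hour, as quoted above

requests_per_hour=$(( deploys_per_hour * production_servers * composer_dependencies ))

if [ "$requests_per_hour" -gt "$unauthenticated_limit" ]; then
  over=$(( requests_per_hour - unauthenticated_limit ))
  echo "${requests_per_hour} requests/hour: ${over} over the unauthenticated limit"
else
  echo "${requests_per_hour} requests/hour: within the unauthenticated limit"
fi
# prints: 360 requests/hour: 300 over the unauthenticated limit
```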

However, they offer an option to authenticate using oauth, raising this limit to 5000 requests per hour!

Composer supports this. To use this solution, do the following:

Request an oauth key:

Run the following command to retrieve an oauth key:

curl -u 'yourusername' -d '{"note":"Your app name"}' https://api.github.com/authorizations

This command will ask you to enter your Github password. On successful authentication, you will be presented with output similar to the following:

{
  "created_at": "2013-04-18T14:43:54Z",
  "app": {
    "url": "http://developer.github.com/v3/oauth/#oauth-authorizations-api",
    "name": "Your app name"
  },
  "url": "https://api.github.com/authorizations/1047183",
  "token": "fea34f253d3cbd543d3d2367325dec362dec9262",
  "updated_at": "2013-04-18T14:43:54Z",
  "scopes": [
  ],
  "note": "Your app name",
  "id": 9871232,
  "note_url": null
}

You are primarily interested in the value of the "token" attribute.

Next, create a file called ~/.composer/config.json and add the following content:

{
   "config":{
      "github-oauth": {
         "github.com":"my_awesome_token"
      }
   }
}

Make sure you create this file on all of your production servers, perhaps using something like puppet, chef, mcollective or dsh.
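
If you'd rather script this than copy the file by hand, something along these lines could be run on each server by whichever tool you use. It is only a sketch: the function name write_composer_auth is made up for this example, and it overwrites any existing config.json in the target directory.

```shell
#!/bin/sh
# write_composer_auth DIR TOKEN
# Hypothetical helper: writes DIR/config.json containing a github-oauth
# entry for github.com. Overwrites an existing config.json in DIR.
write_composer_auth() {
  dir=$1
  token=$2
  mkdir -p "$dir"
  cat > "$dir/config.json" <<EOF
{
   "config": {
      "github-oauth": {
         "github.com": "$token"
      }
   }
}
EOF
  # The token is a credential, so make the file readable by its owner only.
  chmod 600 "$dir/config.json"
}
```

You would call it as write_composer_auth "$HOME/.composer" "my_awesome_token", ideally passing the token in from a secret store rather than hard-coding it in the script.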

The next time your servers run a composer update or install, Composer will authenticate to Github using these credentials, raising your request limits by a lot!

You can test your oauth token, and the remaining requests, using the following command:

curl -I -H "Authorization: token my_oauth_token" https://api.github.com/repos/my_github_username/my_repo_name/zipball/master

This request will return the following information:

HTTP/1.1 302 Found
X-RateLimit-Limit: 5000
X-RateLimit-Remaining: 3839
Location: https://nodeload.github.com/my_github_username/my_repo_name/legacy.zip/master
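
If you want to keep an eye on this from a script, the remaining quota can be pulled out of those headers. A minimal sketch follows; the function name rate_limit_remaining is our own invention, and the sample input is the response shown above.

```shell
#!/bin/sh
# rate_limit_remaining (hypothetical helper): reads HTTP response headers
# on stdin and prints the value of X-RateLimit-Remaining.
# Header names are matched case-insensitively; carriage returns are stripped.
rate_limit_remaining() {
  tr -d '\r' | awk -F': *' 'tolower($1) == "x-ratelimit-remaining" { print $2 }'
}

# Sample headers taken from the response shown above.
printf 'HTTP/1.1 302 Found\r\nX-RateLimit-Limit: 5000\r\nX-RateLimit-Remaining: 3839\r\n' \
  | rate_limit_remaining
# prints: 3839
```

Against the live API, you would pipe the curl -I command from above (with -s added to silence the progress meter) into rate_limit_remaining instead of the printf.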

Download distributions vs running a full git clone

Composer usually runs a git clone operation for each package your project depends on in composer.json.

Cloning a repo is a pretty heavy process, as it not only downloads the current version of the project files, but also the entire history, and any version that ever existed since the git repo got initialized. On large projects this may be an enormous amount of files/blobs/trees. Note that you don't need any of this historical information, just the most current versions.

Alternatively, you can instruct composer not to run a git clone, but instead download a pre-packaged archive that Github prepares for you. The archive includes just the current version, with no history, and therefore downloads much quicker and is much lighter on Github's servers.

To use this method, simply append --prefer-dist to your composer install command, like this:

php composer.phar install --prefer-dist

Any dependencies will now be pulled in through minimal tarballs, bypassing the cloning process.

Conclusion

Using both techniques described above, we've managed to keep using our fancy Capistrano + Composer + Github power-tools, and run them many times a day, deploying new features onto our production servers.

We hope this will help you scale your services like it did for us!

Until next time,
Joost Faassen // Team LinkORB

