True stories from the devs and ops at LinkORB

If you and your team are in charge of keeping a large service running, then you may recognise yourself in the following situations. Enjoy!

Monday morning for devops

First customer demo

Sales guy tells devs what he agreed with the new client

Switching to a new datacenter while in production

Handling support tickets after launching a new product

Talking to the sysadmin of your new client

QA giving ‘feedback’ on your new feature

Emergency patch required

QA’s first look at your new plugin

Merging that gazillion-line pull-request

Customers after applying a performance boost

Demo’ing your new feature without testing

Sysadmin of new client insists on using IE6

Waiting for your cool new feature to get deployed

After a night in the datacenter

Reading O’Reilly “Mastering Regular Expressions”

Getting back to that code you wrote last year

Sales guy to devs right before big demo

Multi Master MySQL in practice

Fixing CSS on IE

Ok, devs, we have visitors today.. pretend to be normal

When somebody misplaces a bracket

Flipping the switch on that feature-flag

What the team sees when you introduce Git

Reviewing that mega pull request

Reading the new requirements document

Somebody commits a ‘syntax error’

This fix is going to be easy, let me show you

Messing up a server, but having puppet

Finding the best value for MaxClients

Watching somebody set up MMM

After fixing that cluster wide outage

Deleting a branch — the wrong one

Beta users after the first day

Someone suggests switching to NoSQL-all-the-way

Real users interact with your new feature

Sticking to coding conventions — at 3am

Almost dropping the wrong production database

Our servers on Monday morning, 9am

Server maintenance during primetime

After clearing your inbox

Your feature got canned

Hand tweaking a production database during primetime

Here, let me fix that in production…

Bug reports after a new release…

Trying something new in CSS

Checking network traffic on the load-balancer

Visiting the guy that wrote the build scripts

Crazy shell script works exactly as planned

Users during outage

Somebody decided to switch to cygwin

Importing the production database

Hey, I know that caller-id

Enjoying dinner while on pager-duty

You sold them what?

Checking Facebook wall of competitor after deploying your new release

Ops working on the firewall

Waiting for devs to fix the intranet

Releasing feature before your competitor does

User agreed to join ‘the beta-group’

User-error: Replace user

Junior dev talking to senior dev

What happened to the guy that wrote THIS?

Client insists on using windows mobile

Restart that server — oh no, the other one

Only 148 more clients to upgrade

Hi customer, we call this ‘the manual’

Getting your pull-request declined

Trying to focus on that urgent bug

When a Sales Droid wants to “work from the NOC”.

Reviewing code from “the new guy”

Can’t we just increase max execution time?

Product launch — almost there

When a user likes your new feature

Joining product demo of competitor

When ops team comes in for emergency

Scaling User-Data with Object Storage in PHP

Does your app allow your users to upload files, and are you running out of storage? We were there… read on to see how we solved it, and learn about the open source libraries we’ve created that you can use to solve this problem in your apps.

Introduction

At LinkORB we host hundreds of installations of various web-based products. These products often allow our end-users to upload arbitrary files and attach them to their contacts, products, projects, etc.

In early 2015 we were quickly running out of storage space, and our storage architecture at the time could not be scaled up any further. This led us to investigate new storage solutions for large amounts of user data, and to initiate a migration project.

Previous architecture – the problem

Our PHP apps allow users to upload files in various parts of the application. Newly uploaded files were stored on a shared NFS server. This NFS server was mounted on all of our application servers, so they could write new files to the NFS share, and read them back whenever a user wanted to download a file.

We were using a simple directory and file naming-convention to easily find these files:

/nfs-share/$account/$tablename/$recordkey/$filename

Each of our installations has a unique account name, so this is part of the path. Then we'd create a directory for each table in our database, and a subdirectory for each primary key. In that directory we'd drop any files that the user uploaded to the corresponding record.

The benefits:

This simple solution allowed us to add upload boxes to various parts of the application, which is nice. Retrieving a list of available files was as simple as listing the contents of the right directory, based on account, table name and record key.
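
To give an idea of how simple this was, here is a minimal sketch of the old approach (the helper functions and the exact base path are made up for illustration, they are not our actual code):

<?php
// Hypothetical illustration of the old NFS-based convention.
function buildRecordDir(string $account, string $tableName, string $recordKey): string
{
    return '/nfs-share/' . $account . '/' . $tableName . '/' . $recordKey;
}

// A file listing was nothing more than a directory scan on the NFS mount.
function listRecordFiles(string $account, string $tableName, string $recordKey): array
{
    $dir = buildRecordDir($account, $tableName, $recordKey);
    if (!is_dir($dir)) {
        return []; // nothing uploaded yet
    }

    return array_values(array_diff(scandir($dir), ['.', '..']));
}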



The problems:

This simple architecture came with a set of drawbacks that eventually caught up with us.

  1. Mounting an NFS share on multiple servers works, but isn’t a great solution. Running file listings from all of your application servers on nearly every request causes performance problems.

  2. A central NFS server adds a massive single point of failure (SPOF). If the NFS server crashes/burns/etc. (it happens), all of the application servers get stuck: they can no longer list the files, and are forced to wait for a timeout.

  3. But most importantly: at some point you’ll run out of storage. You are limited to the maximum storage your single NFS server can provide, and disk space doesn’t come cheap at most VPS providers. Users keep uploading files, and expect their old files to remain accessible ‘forever’.

So disk usage keeps growing and growing. If you’d like to scale this, you’d have to add another NFS server, find a way to partition the data, and add even more mounts to your application servers, adding multiple points of failure (MPOF?).

New architecture – the solution

To get out of this situation, we came up with a completely new storage architecture.

Requirements:

  • Scale ‘indefinitely’
  • Fast file listings
  • High availability

As we are expecting the user data to keep growing, the new solution needed to be able to scale nicely and horizontally.

As file listings are shown on nearly every request, they needed to be super fast.

The new solution needed to be designed for failure, not locking up all of our application servers when the storage backend becomes unavailable. It would need to gracefully explain to the user that the storage is currently unavailable, while keeping all other functionality working as before.

Object Storage

To fulfill most of these requirements, we investigated Object Storage. This storage architecture moves away from files and directories. Instead it simply uses keys and data.

When storing file data, you write it to a ‘key’. To read the file data back, you read that key. Simple as that.

Some examples of services you could use:

  • AWS S3: A great cloud storage solution by Amazon. Storage is cheap.
  • GridFS: A storage solution built on top of MongoDB. You can host this yourself.

We’ve written a PHP library to interface with these different backends, which is available on GitHub: https://github.com/linkorb/objectstorage
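
Conceptually, an object storage adapter boils down to a tiny key/value interface. The sketch below illustrates the idea only; it is not the actual interface of the objectstorage library (check the repository for that):

<?php
// Hypothetical, simplified object storage interface: keys in, data out.
interface SimpleObjectStorage
{
    public function upload(string $key, string $data): void;  // write data to a key
    public function download(string $key): string;            // read the data back by key
    public function delete(string $key): void;                // remove a key
}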



Naming convention for keys

To make this work, you need a naming convention for your keys. Based on our previous architecture, we could have chosen the following:

$account.$tablename.$recordkey.$filename

This way you can make sure to always save to a unique key for your user data.

It does come with a set of problems:

  • If you want to retrieve a list of files for a given account + tablename + recordkey, you’d have to scan all keys and keep only the ones with the matching prefix. This is OK for a small number of files, but doesn’t work once you have thousands of files to manage. Especially when using remote services like S3, listing files becomes unacceptably slow.
  • Dots and other special characters in the filename become cumbersome to handle.
  • If multiple users upload the same file in multiple places, the same data is stored multiple times.

Splitting storage and metadata: FileSpace

In order to solve this, we decided to split the file metadata from the actual storage backend.

We now store the data in object storage (S3, GridFS, etc.) and keep the file listings close to our application servers, in a MySQL database.

We’ve written a PHP library for this called FileSpace. It integrates with the objectstorage library described above. You can find it on GitHub: https://github.com/linkorb/filespace

FileSpace allows you to create a ‘space’ where users can upload files. In our case, a space would be a contact in the CRM, or a product in the product catalog. Each space has a key. For example:

$accountuuid.$tablename.$recordkey

These are stored in the filespace table, which holds the space_key and some metadata about when the space was created or deleted, etc.

Inside that space, you can upload files. These files are stored in the filespace_file table. A filespace_file holds the following properties:

  • space_key: the space in which this file was uploaded
  • file_key: a unique identifier in that space for this file. Usually this is simply the filename.
  • created_at: timestamp of when this file was created
  • deleted_at: timestamp of when this file was deleted (if applicable)
  • data_hash: a hash of the file contents

This allows you to quickly retrieve a list of files for a given “space”.
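
A minimal sketch of such a listing query, using PDO (the database connection and the exact indexes are assumptions, not part of the FileSpace library itself):

<?php
// List the non-deleted files in a given space.
// Assumes a PDO connection and an index on filespace_file.space_key.
function listFilesInSpace(PDO $pdo, string $spaceKey): array
{
    $stmt = $pdo->prepare(
        'SELECT file_key, created_at, data_hash
           FROM filespace_file
          WHERE space_key = :space_key
            AND deleted_at IS NULL
       ORDER BY created_at DESC'
    );
    $stmt->execute(['space_key' => $spaceKey]);

    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}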



The data_hash column is an interesting one. When a user uploads a file, we calculate a hash based on the contents of the uploaded file. This hash is stored in the filespace_file record. But more importantly: this hash is also used as the key to upload the file into object storage.

This means that all keys in our object storage backend are hashes. No filenames, directories, etc.

We always retrieve files from object storage based on the hash value in the file record. This comes with the added benefit of storage savings when multiple users upload the same file in multiple locations: the uploads all hash to the same value, so the data is only stored once. When saving your files to external services like AWS S3, these space savings can result in significant cost savings.
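
Roughly, the upload flow looks like this. This sketch reuses the hypothetical SimpleObjectStorage interface from above, and assumes SHA-1 as the hash function and a plain SQL insert; the real libraries differ in the details:

<?php
// Store an uploaded file: content-addressed data in object storage, metadata in MySQL.
function storeUploadedFile(PDO $pdo, SimpleObjectStorage $storage, string $spaceKey, string $fileKey, string $localPath): void
{
    $data = file_get_contents($localPath);
    $hash = sha1($data); // the content hash doubles as the object storage key

    // Identical content maps to the same key, so it is only stored once.
    $storage->upload($hash, $data);

    $stmt = $pdo->prepare(
        'INSERT INTO filespace_file (space_key, file_key, created_at, data_hash)
         VALUES (:space_key, :file_key, NOW(), :data_hash)'
    );
    $stmt->execute([
        'space_key' => $spaceKey,
        'file_key'  => $fileKey,   // usually just the original filename
        'data_hash' => $hash,
    ]);
}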

Also note that it is perfectly possible for the filespace_file table to contain two records with the same space_key and file_key. This is intentional. It offers you the possibility of versioning files.

When a user uploads a new file with the same filename, the old record remains in our database, and its data remains in object storage. So to retrieve previous versions, we simply query the filespace_file table for all records with a given space_key and file_key, which even gives us the timestamps at which these versions were created.
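
Fetching the version history of a single file is then just a matter of scoping that query to one file_key (again a sketch, assuming the PDO connection from the listing example):

<?php
// All versions of one file, newest first; the first row is the current version.
$stmt = $pdo->prepare(
    'SELECT data_hash, created_at
       FROM filespace_file
      WHERE space_key = :space_key AND file_key = :file_key
   ORDER BY created_at DESC'
);
$stmt->execute(['space_key' => $spaceKey, 'file_key' => $fileKey]);
$versions = $stmt->fetchAll(PDO::FETCH_ASSOC);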

Compression

To further maximize cost savings, all files are transparently compressed using Bzip2 (compression level 9) before they are uploaded to the object storage backend. This is done by wrapping our storage adapter in a compression adapter. This way client code doesn’t need to know about the compression layer at all; it just works transparently.
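
The wrapping works like a classic decorator. Below is a simplified sketch against the hypothetical SimpleObjectStorage interface from above; the real compression adapter lives in the objectstorage library:

<?php
// Decorator that bzip2-compresses data on the way in, and decompresses it on the way out.
class CompressionAdapter implements SimpleObjectStorage
{
    private $inner;

    public function __construct(SimpleObjectStorage $inner)
    {
        $this->inner = $inner;
    }

    public function upload(string $key, string $data): void
    {
        // bzcompress() comes with PHP's bz2 extension; level 9 = maximum compression
        $this->inner->upload($key, bzcompress($data, 9));
    }

    public function download(string $key): string
    {
        return bzdecompress($this->inner->download($key));
    }

    public function delete(string $key): void
    {
        $this->inner->delete($key);
    }
}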

Encryption

If you plan on storing user data ‘in the cloud’, it’s good practice to encrypt the data before uploading it to the third party. Don’t rely on their server-side encryption. To do this we added an encryption adapter to our object storage library. It works in the same way as the compression adapter, by wrapping the ‘original’ adapter, keeping everything working transparently as before for your client code.

The encryption is performed with OpenSSL, using AES in CBC mode. Using this standard allows you to decrypt the files with tools other than our object storage library, for example with the standard openssl command-line tools.
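
Again the decorator pattern. The sketch below uses PHP’s openssl extension with AES-256-CBC; the key handling, IV storage and exact cipher parameters here are assumptions for illustration, the real encryption adapter in the objectstorage library is the reference:

<?php
// Decorator that encrypts objects before they reach the storage backend.
class EncryptionAdapter implements SimpleObjectStorage
{
    private $inner;
    private $secret; // 32-byte secret key

    public function __construct(SimpleObjectStorage $inner, string $secret)
    {
        $this->inner = $inner;
        $this->secret = $secret;
    }

    public function upload(string $key, string $data): void
    {
        $iv = random_bytes(openssl_cipher_iv_length('aes-256-cbc'));
        $cipherText = openssl_encrypt($data, 'aes-256-cbc', $this->secret, OPENSSL_RAW_DATA, $iv);
        // Prepend the IV so the stored object is self-contained
        $this->inner->upload($key, $iv . $cipherText);
    }

    public function download(string $key): string
    {
        $blob = $this->inner->download($key);
        $ivLength = openssl_cipher_iv_length('aes-256-cbc');
        $iv = substr($blob, 0, $ivLength);

        return openssl_decrypt(substr($blob, $ivLength), 'aes-256-cbc', $this->secret, OPENSSL_RAW_DATA, $iv);
    }

    public function delete(string $key): void
    {
        $this->inner->delete($key);
    }
}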

Conclusion

By splitting the problem into two distinct parts (storage and metadata), we managed to keep the solution simple while fulfilling all of the requirements. The new solution is super fast and scales ‘indefinitely’. High availability is provided by standard MySQL replication.

Both our objectstorage and filespace projects are available on GitHub and Packagist as open source libraries that you can use immediately in your own applications. We hope they will benefit you as much as they benefit us today.

As always, we’d be happy to work with you on pull-requests. It’s easy to add support for other backends by implementing the simple adapter interface.

If you’re interested in working on projects like this, be sure to check out our engineering website… we’re hiring!

Cheers,
Joost Faassen // LinkORB Engineering

Releasing the LinkORB\Buckaroo PHP library

Working on integrating online payments into your new web-application? Check out our new library for connecting your services to Buckaroo!

The library is fully PSR-0 compliant, simplifying integration into your own application or framework.

This library is now part of the standard SDK as distributed by Buckaroo!

Check out the sources and the quick manual on GitHub.

It’s available on Packagist too.

Enjoy, and feel free to send us your pull-requests!
Team LinkORB

Composer, Capistrano and Github at scale: beware of the stampede

Composer is great, Github is great, Capistrano is great, Hubot powering Capistrano deploys is even better. At LinkORB we’ve embraced all of these technologies to help us deploy into production 20 to 30 times a day.

The problem: Beware of the stampede

When combining these excellent tools and using them on a large number of production servers, you have to realise what happens:

  • Capistrano performs a git clone from Github.com on many production servers in parallel.
  • When you deploy often, this happens multiple times a day, often multiple times an hour.
  • If you add Composer to the mix, your build process runs a ‘composer install’, running a new git clone for each of the external libraries you’ve added to your composer.json dependencies.

So, to calculate your git clone requests to Github per day:

Deploys per day × production servers × (dependencies in composer.json + 1)

The +1 is the clone of your application repository itself. In our case, this number started to surpass 3000 clone requests per day. This involved repositories of various sizes, some containing 10+ years of history and code bases of ridiculous size.

Bam! We’ve hit a wall

Turns out, the smart Octocats at GitHub are doing an excellent job protecting themselves and their many users by putting up some limits. And they are right to do so!

The exact numbers seem to change over time, but the number of API requests (made through Composer) and git clone operations per IP address is clearly limited.

Solutions

We are using the following solutions to solve this problem:

Use a GitHub OAuth key in Composer

The GitHub API request limits for anonymous vs authenticated users are significantly different!

On October 14th, 2012, GitHub started limiting requests from unauthenticated users to 60 per hour. Using the formula above, you’ll see that you hit that limit quite quickly in production.

However, they offer an option to authenticate using OAuth, raising this limit to 5000 requests per hour!

Composer supports this. To use this solution, do the following:

Request an OAuth token:

Run the following command to retrieve one:

curl -u 'yourusername' -d '{"note":"Your app name"}' https://api.github.com/authorizations

This command will ask you to enter your GitHub password. On successful authentication, you will be presented with output similar to the following:

{
  "created_at": "2013-04-18T14:43:54Z",
  "app": {
    "url": "http://developer.github.com/v3/oauth/#oauth-authorizations-api",
    "name": "Your app name"
  },
  "url": "https://api.github.com/authorizations/1047183",
  "token": "fea34f253d3cbd543d3d2367325dec362dec9262",
  "updated_at": "2013-04-18T14:43:54Z",
  "scopes": [
  ],
  "note": "Your app name",
  "id": 9871232,
  "note_url": null
}

You are primarily interested in the value of the “token” attribute.

Next, create a file called ~/.composer/config.json and add the following content:

{
   "config":{
      "github-oauth": {
         "github.com":"my_awesome_token"
      }
   }
}

Make sure you create this file on all of your production servers, perhaps using something like Puppet, Chef, MCollective or dsh.

Next time your servers ask Composer to perform an update or install, it’ll authenticate to GitHub using these credentials, increasing your hourly request limits by a lot!

You can test your OAuth token, and check the number of remaining requests, using the following command:

curl -I -H "Authorization: token my_oauth_token" https://api.github.com/repos/my_github_username/my_repo_name/zipball/master

This request will return the following information:

HTTP/1.1 302 Found
X-RateLimit-Limit: 5000
X-RateLimit-Remaining: 3839
Location: https://nodeload.github.com/my_github_username/my_repo_name/legacy.zip/master

Download distributions vs running a full git clone

Composer usually runs a git clone operation for each package your project depends on in composer.json.

Cloning a repo is a pretty heavy process, as it downloads not only the current version of the project files, but also the entire history: every version that has existed since the git repo was initialized. On large projects this can be an enormous number of files/blobs/trees. For a deployment you don’t need any of this historical information, just the current versions.

Alternatively, you can instruct Composer not to run a git clone, but instead download a pre-packaged archive that GitHub prepares for you. The archive includes just the current version, no history, and therefore downloads much quicker and is much lighter on GitHub’s servers.

To use this method, simply append --prefer-dist to your composer install command, like this:

php composer.phar install --prefer-dist

Any dependencies will now be pulled in through minimal tarballs, bypassing the cloning process.

Conclusion

Using both of the techniques described above, we’ve managed to keep using our fancy Capistrano + Composer + GitHub power-tools, running them many times a day to deploy new features onto our production servers.

We hope this will help you scale your services like it did for us!

Until next time,
Joost Faassen // Team LinkORB