At LinkORB we host hundreds of installations of various web-based products. These products often allow our end-users to upload arbitrary files to attach to their contacts, products, projects, etc.
In early 2015 we were rapidly running out of storage space, and our storage architecture at the time did not let us scale it up any further. This led us to investigate new storage solutions for large amounts of user data, and to start a migration project.
Previous architecture - the problem
Our PHP apps allow our users to upload files in various parts of the application. Newly uploaded files were stored on a shared NFS server. This NFS server was mounted on all of our application servers, so they could write new files to the NFS share and read them back when a user wanted to download them.
We were using a simple directory and file naming-convention to easily find these files:
All of our installations have a unique account name, so this is part of the path.
Then we’d create a directory for each table in our database, and a subdirectory for each primary key. In this directory we’d drop any files that the user uploaded to those records.
This simple solution allowed us to add upload boxes to various parts in the application, which is nice. Retrieving a list of available files was as simple as listing all files in that directory, based on account, tablename and record key.
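As an illustration, the convention boils down to something like the sketch below. It is written in Python for brevity rather than in the PHP our apps use, and the function names, the root directory and the example values are hypothetical.

```python
from __future__ import annotations

from pathlib import Path


def record_dir(root: Path, account: str, table: str, record_key: str) -> Path:
    """Directory that holds every upload attached to one database record."""
    return root / account / table / str(record_key)


def store_file(root: Path, account: str, table: str, record_key: str,
               filename: str, data: bytes) -> Path:
    """Drop an uploaded file into the directory of its record."""
    directory = record_dir(root, account, table, record_key)
    directory.mkdir(parents=True, exist_ok=True)
    target = directory / filename
    target.write_bytes(data)
    return target


def list_files(root: Path, account: str, table: str, record_key: str) -> list[str]:
    """Listing the files of a record is nothing more than a directory listing."""
    directory = record_dir(root, account, table, record_key)
    if not directory.is_dir():
        return []
    return sorted(p.name for p in directory.iterdir() if p.is_file())
```

With the root pointing at the NFS mount, every call to `list_files` turns into a directory read on the shared server.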
This simple architecture did come with a set of problems, which eventually caught up with us.
Mounting an NFS server from multiple servers works, but isn’t a great solution. Frequently running file listings from all of your servers on most of the requests will cause performance problems.
A central NFS server adds a massive single point of failure (SPOF). If the NFS server crashes/burns/etc (it happens), all of the application servers will get stuck: they can no longer list the files, and are forced to wait for a time-out.
But most importantly: At some point you’ll run out of storage. You are limited to the maximum storage your single NFS server can provide. Disk-space doesn’t come cheap at most VPS providers. Users keep uploading files, and expect their old files to remain accessible ‘forever’.
So the disk usage keeps growing and growing. If you’d like to scale this, you’d have to add another NFS server, find a way to partition the data, and add even more mounts to your application servers, adding multiple points of failure (MPOF?).
New architecture - the solution
To get out of this situation, we came up with a completely new storage architecture. It had to meet three requirements:
- Scale ‘indefinitely’
- Fast file listings
- High availability
As we are expecting the user data to keep growing, the new solution needed to be able to scale nicely and horizontally.
As file listings are needed on nearly every request our users make, the file listings needed to be super fast.
The new solution would need to be designed for failure, not locking up all of our application servers when the storage backend becomes unavailable. It would need to gracefully explain to the user that the storage is currently unavailable, but keep other functionality working fine as before.
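A minimal sketch of what “designed for failure” means for the calling code is shown below. It assumes the storage client raises an error after a short time-out instead of hanging. The exception name, the function names and the message text are hypothetical, and Python is used for brevity.

```python
class StorageUnavailable(Exception):
    """Raised when the storage backend cannot be reached (hypothetical name)."""


def attachment_section(load_files, record_id):
    """Build the 'files' part of a page without letting a storage outage break the page."""
    try:
        files = load_files(record_id)
    except StorageUnavailable:
        # Degrade gracefully: the rest of the page keeps rendering as usual.
        return {
            "available": False,
            "files": [],
            "message": "File storage is currently unavailable, please try again later.",
        }
    return {"available": True, "files": files, "message": ""}
```

Only the attachment part of the page is affected by an outage; everything else keeps working.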
To fulfill most of these requirements, we investigated Object Storage. This storage architecture moves away from files and directories. Instead it simply uses keys and data.
When storing file-data, you write to a ‘key’. To read the filedata back, you read the key. Simple as that.
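The whole contract fits in a few lines. The toy in-memory store below only illustrates the key/data idea; the class and method names are assumptions for this sketch and are not necessarily those of any real backend or of our library.

```python
class InMemoryStorage:
    """Toy object store: a flat mapping from key to bytes, with no directories."""

    def __init__(self):
        self._objects = {}

    def set(self, key: str, data: bytes) -> None:
        """Write the data under the given key, replacing any previous value."""
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        """Read the data back; raises KeyError if the key does not exist."""
        return self._objects[key]

    def delete(self, key: str) -> None:
        """Remove the key if it is present."""
        self._objects.pop(key, None)
```

A real backend replaces the dictionary with network calls, but the calling code only ever deals with a key and a blob of data.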
Some examples of services you could use:
- AWS S3: A great cloud storage solution by Amazon. Storage is cheap.
- GridFS: A storage solution built on top of MongoDB. You can host this yourself.
We’ve written a PHP library to interface with these different backends, which is available on GitHub: https://github.com/linkorb/objectstorage
Naming convention for keys
To make this work, you need a naming convention for your keys. Based on our previous architecture, we could have chosen the following:
This way you can make sure to always save to a unique key for your user data.
It does come with a set of problems:
- If you want to retrieve a list of files for a given account + tablename + recordkey, you’d have to scan all keys and keep only the ones with the matching prefix. This is ok for a small number of files, but doesn’t work if you have thousands of files to manage. Especially when using remote services like S3, listing files becomes unacceptably slow.
- Adding dots and other characters in the filename becomes cumbersome.
- If multiple users upload the same file in multiple places, the same data is stored, and paid for, once per upload.
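The first two problems can be made concrete with a small sketch. It assumes keys are built by joining account, table name, record key and filename with a separator. The separator, the function names and the example values are hypothetical and only serve the illustration.

```python
SEPARATOR = "."  # assumption: the exact key format is only illustrative here


def naive_key(account, table, record_key, filename):
    """Build one flat key out of the parts that used to form the directory path."""
    return SEPARATOR.join([account, table, str(record_key), filename])


def list_by_prefix(all_keys, account, table, record_key):
    """Find the files of one record by scanning every key in the store.

    The cost grows with the total number of objects, not with the number
    of files attached to the record.
    """
    prefix = naive_key(account, table, record_key, "")
    return [key[len(prefix):] for key in all_keys if key.startswith(prefix)]
```

Note that `naive_key("acme", "contact", 42, "a.pdf")` contains four separators, so splitting a key back into its parts is ambiguous as soon as a filename contains the separator itself.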
Splitting storage and metadata: FileSpace
In order to solve this, we decided to split the file metadata from the actual storage backend.
We’re now storing the data in object storage (S3, GridFS, etc.) and keeping the file listings close to our application servers in a MySQL database.
We’ve written a PHP library for this called FileSpace. It integrates with our objectstorage library described before. You can find it here on GitHub: https://github.com/linkorb/filespace
FileSpace allows you to create a ‘space’ where users can upload files. In our case, a space would be a contact in the CRM, or a product in the product catalog. Each space has a key. For example:
These are stored in the filespace table. The filespace table holds the space_key, and some metadata about when it was created or deleted, etc.
Inside that space, you can upload files. These files are stored in the filespace_file table. Each filespace_file record holds the following properties:
- space_key: the space in which this file was uploaded
- file_key: a unique identifier in that space for this file. Usually this is simply the filename.
- created_at: timestamp of when this file was created
- deleted_at: timestamp of when this file was deleted (if applicable)
- data_hash: a hash of the file contents
This allows you to quickly retrieve a list of files for a given “space”.
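A rough sketch of the two tables and the listing query is shown below. It uses SQLite from the Python standard library as a stand-in for MySQL. The column types, the id column, the index and the function names are assumptions, not the actual FileSpace schema.

```python
import sqlite3


def create_schema(db):
    """Create the two metadata tables (types, id column and index are assumptions)."""
    db.executescript("""
        CREATE TABLE filespace (
            space_key  TEXT PRIMARY KEY,
            created_at TEXT NOT NULL,
            deleted_at TEXT
        );
        CREATE TABLE filespace_file (
            id         INTEGER PRIMARY KEY AUTOINCREMENT,
            space_key  TEXT NOT NULL,
            file_key   TEXT NOT NULL,
            created_at TEXT NOT NULL,
            deleted_at TEXT,
            data_hash  TEXT NOT NULL
        );
        CREATE INDEX idx_file_space ON filespace_file (space_key, file_key);
    """)


def list_files(db, space_key):
    """File listing for one space: an indexed lookup that never touches object storage."""
    rows = db.execute(
        "SELECT DISTINCT file_key FROM filespace_file "
        "WHERE space_key = ? AND deleted_at IS NULL ORDER BY file_key",
        (space_key,),
    )
    return [row[0] for row in rows]
```

Because the listing is a plain indexed query on a local database, its cost depends on the number of files in the space rather than on the total number of stored objects.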
The data_hash column is an interesting one. When a user uploads a file, we calculate a hash based on the contents of the uploaded file. This hash is stored in the filespace_file record. But more importantly: this hash is also used as the key to upload the file into object storage.
This means that all keys in our object storage backend are hashes. No filenames, directories, etc.
We always retrieve files from object storage based on the hash value in the file record.
This comes with the added benefit of storage savings when multiple users upload the same file in multiple locations. They will all calculate to the same hash, and no duplication will happen. When saving your files to external services like AWS S3, these space savings can result in significant cost savings.
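The idea can be sketched in a few lines of Python. The hash algorithm (SHA-256 here), the function names and the example space keys are assumptions for the illustration. A plain dictionary stands in for the object storage backend and a list for the filespace_file table.

```python
import hashlib


def upload(storage, file_records, space_key, file_key, data):
    """Store the bytes under their content hash and record the metadata separately."""
    data_hash = hashlib.sha256(data).hexdigest()  # assumption: SHA-256
    if data_hash not in storage:  # identical content is stored only once
        storage[data_hash] = data
    file_records.append(
        {"space_key": space_key, "file_key": file_key, "data_hash": data_hash}
    )
    return data_hash


def download(storage, file_records, space_key, file_key):
    """Look up the newest matching record, then fetch the bytes by their hash."""
    for record in reversed(file_records):
        if record["space_key"] == space_key and record["file_key"] == file_key:
            return storage[record["data_hash"]]
    raise KeyError((space_key, file_key))
```

Uploading the same bytes as `logo.png` in one space and as `image.png` in another yields two metadata records but a single stored object.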
Also note that it is perfectly possible for the filespace_file table to contain two records with the same file_key. This is intentional. It offers you the possibility of versioning files.
When our users upload a new file with the same filename, the old record remains in our database, and the data remains in object storage. So in order for us to retrieve previous versions, we simply query the filespace_file table for all records of a given space_key and file_key. The result even includes timestamps of when these versions were created.
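Such a version query could look like the sketch below. It again uses SQLite as a stand-in for MySQL; the rows, the space key format and the function names are made up for the example.

```python
import sqlite3


def demo_db():
    """In-memory table with two uploads of the same file_key (made-up rows)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE filespace_file ("
        "space_key TEXT, file_key TEXT, created_at TEXT, deleted_at TEXT, data_hash TEXT)"
    )
    db.executemany(
        "INSERT INTO filespace_file VALUES (?, ?, ?, NULL, ?)",
        [
            ("contact.1", "contract.pdf", "2015-03-01 10:00:00", "hash-of-v1"),
            ("contact.1", "contract.pdf", "2015-04-12 16:30:00", "hash-of-v2"),
            ("contact.1", "photo.jpg", "2015-04-13 09:00:00", "hash-of-photo"),
        ],
    )
    return db


def file_versions(db, space_key, file_key):
    """All stored versions of one file, newest first."""
    rows = db.execute(
        "SELECT created_at, data_hash FROM filespace_file "
        "WHERE space_key = ? AND file_key = ? ORDER BY created_at DESC",
        (space_key, file_key),
    )
    return rows.fetchall()
```

Each returned row carries the data_hash of that version, which is exactly the key needed to fetch the old contents from object storage.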
To further maximize cost savings, all files are transparently compressed using Bzip2 (compression level 9) before they are uploaded to the object storage backend. This is done by wrapping our storage adapter in a compression adapter. This way any client code doesn’t need to know about the compression layer, but it will work transparently.
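The wrapping idea is easy to sketch. The version below is Python rather than PHP, and the class and method names are hypothetical, not the API of our library. It relies only on the standard bz2 module.

```python
import bz2


class DictAdapter:
    """Stand-in for a real storage backend: keeps objects in a dictionary."""

    def __init__(self):
        self.objects = {}

    def set(self, key, data):
        self.objects[key] = data

    def get(self, key):
        return self.objects[key]


class Bzip2Adapter:
    """Wraps another adapter and compresses everything that passes through it."""

    def __init__(self, inner, level=9):
        self.inner = inner
        self.level = level

    def set(self, key, data):
        # Compress on the way in; the wrapped adapter only ever sees compressed bytes.
        self.inner.set(key, bz2.compress(data, self.level))

    def get(self, key):
        # Decompress on the way out; the caller only ever sees the original bytes.
        return bz2.decompress(self.inner.get(key))
```

Client code talks to `Bzip2Adapter(DictAdapter())` exactly as it would to the bare adapter, because both expose the same `set` and `get` methods.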
If you plan on storing user data ‘in the cloud’, it’s good practice to encrypt the data before uploading it to the third-party. Don’t rely on their server-side encryption. To do this we added an encryption adapter to our object storage library. It works in the same way as the compression adapter, by wrapping the ‘original’ adapter, keeping everything working transparently as before for your client code.
The encryption is performed by OpenSSL, and uses AES in CBC mode. Using this standard allows you to decrypt the files with tools other than our object storage library, for example with the standard openssl command-line tool.
By splitting the problem into two distinct parts (storage and metadata), we managed to keep the solution simple, while fulfilling all of the requirements. The new solution is super fast and will scale ‘indefinitely’. High availability of the metadata is provided by standard MySQL replication.
Both our objectstorage and filespace projects are available on GitHub and Packagist as open source libraries that you can use immediately in your own applications. We hope they will benefit you as much as they benefit us today.
As always, we’d be happy to work with you on pull-requests. It’s easy to add support for other backends by implementing the simple adapter interface.
If you’re interested in working on projects like this, be sure to check out our engineering website... we’re hiring!
Joost Faassen // LinkORB Engineering