Amazon's Dynamo - Highly Available Key Store

This post is part of a series of posts relating to distributed system design that I'm completing as part of my Computer Science MS program at BYU.

Amazon's Dynamo is a key-value storage system used internally at Amazon. It provides high-speed simple storage at extremely high levels of availability. The paper is available online.

In addition to describing the Dynamo system, this paper is an excellent resource for issues relevant to system design. I've selected a few of the topics mentioned in the paper, with some comments on each.

Symmetric Nodes

Dynamo clusters have two basic duties. Each read or write has a coordinating server, and each read and write uses several storage servers. Rather then form Dynamo into a 2 layer system with a layer of coordinators and a layer of storage servers, they formed the servers to perform both the coordination services and storage services. They cite simplified system provisioning and maintenance as their reasons for this choice.

This design does make scaling a little easier, as deploying a node adds resources to both services, but I think it applies so well here because of the predictable levels of work and the constant ratio between each of the services. An increase in requests will increase work for both the coordination services and the storage services, and adding additional servers will support both services to approximately the same degree.

In a system where the processing at each layer is highly variable, this strategy could result in an unbalanced situation, where launching more servers will only support one of the services. As always, considering the characteristics of the service will guide the proper solution.

Dynamically Scaling Under Load

Throughout the paper, they discuss the requirement that the system be capable of adding and removing nodes under load without impacting the external performance of the system. They noted that their first design required significant background processing, and they had to develop systems to carefully monitor and control the processing power consumed by the background services.

This issue is a sneaky one, because serving requests must continue during the process of re-distributing data and processing assignments. They mentioned some changes they made to reduce the background processing required, which allowed them to devote more processing time to serving requests.

Nearly every service will have background processing of some kind, and setting these services to be 'kind' to the performance of requests will help improve performance when the system comes under load.

Eventual Consistency

Dynamo (and several of the public Amazon Web Services) follow an eventually consistent model. By relaxing consistency, they can offer greater levels of availability and speed. Near the end of the paper, they reveal that 99.94% of the read requests had no artifacts of this relaxed requirement.

Their design leaves the process of conflict resolution to the client, which allows a resolution process specific to the nature of the data. Allowing the most recent write to win will work in many situations, and is easy to resolve. Under more specific requirements, the client may want to combine the conflicting data. Fortunately, passing the resolution buck to the client allows each application to resolve this issue in whatever way is appropriate. (CouchDB follows this same principle.)

Eventual consistency forces the consideration of failure; when it fails, how hard is it to recover? Understanding the consequences of failure can allow proper tuning of the system, as well as guide resolution processes.

Smart Clients

The paper describes two methods for distributing requests to the nodes in the cluster: load balancers and smart clients.

Load balancers are a common solution, and are considered standard fare in most systems. They forward inbound requests evenly among nodes, and then return the node's response to the client. Most large websites use load balancers to serve traffic, enabling many servers to act like one.

Smart clients are a powerful tool that can provide the same levels of fail over at increased speeds. If the client is programmable, then it can choose the server to ask itself, and eliminate the need for a load balancer. It chooses it's node according to embedded logic, and will also fail over to a different node if the first one does not respond. The clients must be 'smart' enough to discover cluster nodes and handle failure gracefully. In a load balanced configuration, these tasks are performed by the load balancer.

Smart Dynamo clients poll a random cluster node every 10 seconds to retrieve a list of nodes. The clients will also retrieve a new list if they detect a failure situation.

In many applications of smart clients, an update frequency of 10 seconds is far too fast and can be reduced to match the speed of adding or removing nodes. Smart clients can also choose cluster nodes based on network speed, allowing easy geographic load balancing.

The speed gains measured through the user of smart clients is remarkable. response times were more then 50% shorter as a result of removing the load balancer from the picture.

Thoughts

Each of these concepts, and others in the paper, can be applied to a wide variety of systems. I'm grateful to the Amazon team that published this paper, and I'm grateful for the insight it provides.

smart clients - improved performance by half by removing the load balancer. they polled frequently 10s for server lists, could be much longer. smart polling on failure.