Progressive Spatial Networks

I’ve been pretty silent the last few months here on my blog. I’ve been pretty busy with things like settling into our new house and starting full-time at Kynetx. A major sap on my free time both recently and for the last several years has been my Master’s Thesis. I’ve graduated now, and finally carved out some time to update my blog.

In my efforts in resuming regular blogging, I find it appropriate to post my Thesis for all the world. First, a little backstory.

For my thesis work, I worked on an algorithm to combine GPS tracklogs into what I call a spatial network. I chose this work because of my experience building ActiveTrails.com. As any excited graduate student (pre-thesis student, that is) I had grand ideas about what I was going to accomplish with my thesis work. Luckily for me, my graduate advisor guided me properly through the process, and I finally completed my work.

I do find it strange that only a written Thesis is required for an MS in Computer Science. I’ve decided that it only makes sense to post my code, that others might be able to experiment with my work without having to rewrite it from scratch. Now, I’m sure I’ve made plenty of mistakes in my code, and I hope that others can produce much better results then I, and not fall into the same lines of thinking that perhaps restricted my results.

I originally had plans to organize my code, clean it up, flush it full of comments, and organize my result files. And then I realized it might never happen. I’ve packaged my code, source files, and results into a zip file, and though it isn’t perfectly clean, I hope it’s useful for those who want to use it.

Progressive Spatial Networks: Learning from GPS Tracklogs (pdf link)

Source data, python source code, and result files (zip file).

Cloud Computing - The 5th Utility

Cloud Computing

Image by stan via Flickr

This post is part of a series of posts relating to distributed system design that I'm completing as part of my Computer Science MS program at BYU.

My 6th paper was written by some good folks at The University of Melbourne, Australia. They discuss the emerging cloud computing paradigm as the 5th Utility, and compare it with both clusters and grids. The paper (PDF) argues that clusters and grids cannot be considered a utility by themselves, but cloud computing fits the necessary requirements.

Clusters are groups of machines that work together to accomplish a single task, such as serve web content. Each machine is the same, and they all perform the same task. Clusters can scale in size to handle varying loads. Grids support multiple jobs of different characteristics, typically within a required framework. Clouds can scale (like clusters), and support a wide variety of jobs simultaneously. Clouds take scaling to such an extreme that they can scale to nothing, which neither clusters or grids support. This minimal commitment, without minimum usage levels, is what makes clouds so useful. Just as water, electricity, gas, and telephony (the first 4 utilities) can scale from nothing to very high usage, clouds can scale to any reasonable load.

As I’ve mentioned in several of my other paper reviews, it is very clear that cloud computing exists in a layer underneath clusters and grids. Clusters and grids can be built on top of cloud computing systems, as cloud systems utilize virtual machines as a hardware abstraction. The unique piece that makes it possible is the dynamic provisioning made available through the API. While some hosting providers can provision servers in only a few hours, cloud providers provision their resources within minutes, and sometimes within seconds.

At this point, there are few cloud providers, and each has their own API, terms of use, and types of services. As more providers enter the market, a consistent interface will be needed in order to tame the API chaos. While some services may end up sharing an API, consistency can also be provided via a meta-interface that can translate the users commands into whatever syntax is required by the particular provider and service being utilized. This layer can either be constructed in the cloud itself, in client based toolkits, or as a combination of the two. The concept of a metalayer is demonstrated in the paper through the creation of a meta-storage service, capable of storing data in several cloud services through a single API.

Just as interfaces are not likely to completely converge, the properties of each service are also not likely to be identical between providers. Speed, price, reliability, and other factors will vary, allowing users to select the proper service to fit their particular need. Some services will provide an SLA, providing service guarantees.

It is still very early in the development of cloud services, and I’m sure that we will be seeing new entries for years to come. Amazon has hinted at some of the services that will be made available in this next calendar year, including load balancing and monitoring and automation management. As we see more entries in the space, it will become easier to understand the strengths and weaknesses of cloud computing, as well as define it’s limits.

I’m excited for the expansion of cloud computing, and I look forward to more studies that can help us understand better.

Planning Ahead: Resilient Load Variations in Distributed Stream Processing

This post is part of a series of posts relating to distributed system design that I'm completing as part of my Computer Science MS program at BYU.

Background

Stream processing systems are used to process data and provide low latency data derived from source streams. Good examples of this processing include stock market updates, highway traffic data, and network traffic analysis. Stream processing systems take a stream of data and run it through a series of operations. Most of these operations perform data conversions or calculations/reductions of the source data. The final result is either a new data stream or an updated data set.

Small stream processing operations usually operate on a single machine. When the load grows beyond the power of the host machine, the stream processing operations must be distributed between several machines. Adding processing power enables the system to maintain low-latency processing. This latency is the time that data takes to travel from system input to system output. The task of dividing the processing operations onto multiple machines is difficult, particularly in situations with variable load.

Early distributed systems were managed manually, and changes to the operator spread were difficult and often required downtime. Research into dynamic operator distribution allowed systems to move operators between machines to evenly distribute the processing load in response to changing load.

Research

In a 2006 paper titled "Providing Resiliency to Load Variations in Distributed Stream Processing" (pdf), Ying Xing et. al. describe a new approach to handling variable load.

They observe that the cost of moving operators in a dynamic distribution system is very high. They outline a process of choosing an initial operator distribution such that the range of manageable load scenarios is maximized. This reduces the need to migrate operations between machines in a system. By maximizing the range of load that a system is capable of handling, they can reduce and possibly eliminate the need to support dynamic operator migration.

First, they begin by modeling the operators present in the distributed processing system, and then evaluate different methods of choosing the optimal distribution. In addition to reporting the optimal distribution, their method reports on the range of load that they system can handle without operator migration.

While their work is designed to reduce the need to migrate operators, they explain that their algorithms cooperate well with dynamic systems. By choosing an initial distribution resilient to load variation, they reduce the need to perform expensive operator migrations. When a migration is needed, these algorithms can be used to recommend the distribution most resilient to the new expected loads.

Response

This paper, along with the previous work described, operates under the assumption that adding hardware to the cluster is a time consuming operation. This assumption leads to the static thinking that the only operation available is to move operations from one existing node to another.

Cloud computing allows spinning up hardware in just a few minutes to help in handling variation in system load, and then the ability to shut down the systems when load has decreased to reduce cluster costs. With the concepts in this paper, the system can decide when additional hardware is needed, and choose the best set of operators to migrate to the new hardware to maintain desired latency levels.

As cloud computing is a fairly recent development, I expect a few years to pass before solid work is published describing cloud friendly methods for load distribution.

The Power of Epidemics

This post is part of a series of posts relating to distributed system design that I'm completing as part of my Computer Science MS program at BYU.

As systems grow, the task of communication between nodes in the system can become a herculean task. Frequently, the limitations of the communication protocol serve as the upper bound of system size despite plentiful system resources. Many methods exist for allowing communication to scale with system size, but most succeed only in raising the threshold for system size.

Epidemic communication protocols, also known as gossip protocols, approach the communication problem from a completely different angle. I read about these concepts in a 2002 paper entitled The Power of Epidemics: Robust Communication for Large-Scale Distributed Systems [pdf] by Werner Vogels, Tobbert van Renesse, and Ken Birman. Whereas many other protocols follow rigid and organized communication patterns, Epidemic protocols rely on random communication between nodes to achieve rapid and robust information spread.

Naming these protocols after patterns of disease spread and the gossip patterns of human social groups is not an accident. These protocols have the advantage of being very simple because of our familiarity with these situations in society. Just as disease spreads through a population through random contact between two society members, information spreads between two nodes as each node chooses another at random with which to compare state. Gossip protocols rely on frequent communication between nodes to spread new information. Putting these two concepts produces the system described by Vogels et. al.

Each node selects another node from it's stored membership list once per second. The two nodes transfer a representation of state quickly, and upon discovery of a mismatch send updates to the other party. When a single node becomes aware of new information, it will share that information within one second to another node. The two copies become 4 copies, then 8, 16, 32, and so forth for each second passed.  There are three important properties of these systems: speed and reliability, and constant load

Speed

In my previous example of spread, I assumed a doubling of information for each time period. This is in fact low, as on average each node will be in contact with two other nodes per time period, causing the number of informed nodes to grow by a factor of 4. One copy becomes 4 copies becomes 16, 64, 256, and so on. This growth can spread information even throughout very large systems in a very short time. Assuming our average of two gossip connections per second, the theoretic message spread is 1 million nodes after 10 seconds. For a system of 250 thousand nodes, the chance of a node NOT receiving the information update within 10 seconds becomes vanishingly small

Reliability

The paper introduces the concept of probabilistic guarantees. While it sounds counter-intuitive, considering the probability that a node will NOT have received an update within a period of time isn't insane. The probability is so low, given even only a few seconds, that it exceeds the probability of system and network failures.

Constant Load

The really cool thing about these types of protocols is the constant load required of each machine at any scale. Used in a cluster of 3 nodes, each node talks to (on average) 2 other nodes per time period. Expand that cluster to 250,000 nodes, and each node talks to ...... 2 nodes per time period. Few communication protocols operate under such constant load. Because of the randomness required for node selection, each node may talk to only one node or quite a few during a single period. Measured over a few time periods, the load will appear to be quite constant.

So there you have it. Next time you need to design a scalable communication system, consider giving your system the flu.

Amazon's Dynamo - Highly Available Key Store

This post is part of a series of posts relating to distributed system design that I'm completing as part of my Computer Science MS program at BYU.

Amazon's Dynamo is a key-value storage system used internally at Amazon. It provides high-speed simple storage at extremely high levels of availability. The paper is available online.

In addition to describing the Dynamo system, this paper is an excellent resource for issues relevant to system design. I've selected a few of the topics mentioned in the paper, with some comments on each.

Symmetric Nodes

Dynamo clusters have two basic duties. Each read or write has a coordinating server, and each read and write uses several storage servers. Rather then form Dynamo into a 2 layer system with a layer of coordinators and a layer of storage servers, they formed the servers to perform both the coordination services and storage services. They cite simplified system provisioning and maintenance as their reasons for this choice.

This design does make scaling a little easier, as deploying a node adds resources to both services, but I think it applies so well here because of the predictable levels of work and the constant ratio between each of the services. An increase in requests will increase work for both the coordination services and the storage services, and adding additional servers will support both services to approximately the same degree.

In a system where the processing at each layer is highly variable, this strategy could result in an unbalanced situation, where launching more servers will only support one of the services. As always, considering the characteristics of the service will guide the proper solution.

Dynamically Scaling Under Load

Throughout the paper, they discuss the requirement that the system be capable of adding and removing nodes under load without impacting the external performance of the system. They noted that their first design required significant background processing, and they had to develop systems to carefully monitor and control the processing power consumed by the background services.

This issue is a sneaky one, because serving requests must continue during the process of re-distributing data and processing assignments. They mentioned some changes they made to reduce the background processing required, which allowed them to devote more processing time to serving requests.

Nearly every service will have background processing of some kind, and setting these services to be 'kind' to the performance of requests will help improve performance when the system comes under load.

Eventual Consistency

Dynamo (and several of the public Amazon Web Services) follow an eventually consistent model. By relaxing consistency, they can offer greater levels of availability and speed. Near the end of the paper, they reveal that 99.94% of the read requests had no artifacts of this relaxed requirement.

Their design leaves the process of conflict resolution to the client, which allows a resolution process specific to the nature of the data. Allowing the most recent write to win will work in many situations, and is easy to resolve. Under more specific requirements, the client may want to combine the conflicting data. Fortunately, passing the resolution buck to the client allows each application to resolve this issue in whatever way is appropriate. (CouchDB follows this same principle.)

Eventual consistency forces the consideration of failure; when it fails, how hard is it to recover? Understanding the consequences of failure can allow proper tuning of the system, as well as guide resolution processes.

Smart Clients

The paper describes two methods for distributing requests to the nodes in the cluster: load balancers and smart clients.

Load balancers are a common solution, and are considered standard fare in most systems. They forward inbound requests evenly among nodes, and then return the node's response to the client. Most large websites use load balancers to serve traffic, enabling many servers to act like one.

Smart clients are a powerful tool that can provide the same levels of fail over at increased speeds. If the client is programmable, then it can choose the server to ask itself, and eliminate the need for a load balancer. It chooses it's node according to embedded logic, and will also fail over to a different node if the first one does not respond. The clients must be 'smart' enough to discover cluster nodes and handle failure gracefully. In a load balanced configuration, these tasks are performed by the load balancer.

Smart Dynamo clients poll a random cluster node every 10 seconds to retrieve a list of nodes. The clients will also retrieve a new list if they detect a failure situation.

In many applications of smart clients, an update frequency of 10 seconds is far too fast and can be reduced to match the speed of adding or removing nodes. Smart clients can also choose cluster nodes based on network speed, allowing easy geographic load balancing.

The speed gains measured through the user of smart clients is remarkable. response times were more then 50% shorter as a result of removing the load balancer from the picture.

Thoughts

Each of these concepts, and others in the paper, can be applied to a wide variety of systems. I'm grateful to the Amazon team that published this paper, and I'm grateful for the insight it provides.

smart clients - improved performance by half by removing the load balancer. they polled frequently 10s for server lists, could be much longer. smart polling on failure.

Building a Scalable, Distributed system with Undergraduate Students

This semester is my second as a TA for CS462 - Large Scale Distributed System Design. My graduate advisory, Phil Windley, asked me to be a TA again a month ago or so, and we started talking about things we'd like to do different. We came up with a pretty cool plan.

[More]

Cellular Change

I spent last Friday and Saturday at the FuturTech, a yearly technology conference hosted by the University of Michigan. I was a panelist on two panels, and only had to attend one other panel at the conference. The panel I attended was titled "Fresh Communications, Ubiquitous Connections: The Next Generation of Mobile Services" and was staffed by representatives (read: Senior Managers and Vice Presidents) from Accenture, Motorola, Infosys, and Cisco.

With a panel like that, I had my ears open. I've been frustrated with the lack of growth in cellular services in the States for some time, and I was ready for answers. The answers that I received were very strong from all members of the panel: Change and growth in cellular services will not begin with the incumbents. Change will start with disruptive technologies introduced by new faces in the large carrier space.

They further restated something I've heard many places before: The current stagnation of mobile services is caused by the current practice of carriers subsidizing mobile devices. We have become accustomed to getting our phones for free or at a heavily discounted price. The phones we receive are often disabled before we get them at the request of the carriers, who must protect their device subsidy with extra charges for extra services. Asian countries do not suffer the same problem, and that is precisely why we find amazing services there, but not in the US of A. Also mentioned in the panel, and elsewhere at the conference, was an interest in the current deal between Apple and Cingular over the iPhone. Recent news highlights some of these same politics and corporate muscle at work, and it will be interesting to see what develops as a result of this agreement.

Based on what I've learned, I feel that some new faces in the mobile space will be required to shake up the current mobile model to allow better cheaper services from US mobile carriers. If I have to buy my own handset, then I'm more then willing to do it.

Collaborative Editing

Earlier today I was working on a lab report with my lab partner for an Artificial Intelligence lab, and we needed to work on the document at the same time. We had already started some notes in Google Docs, and we continued to work on the report, each of us signed into our Google accounts on separate computers.

I've used Google Docs since before it was Google Docs, and I love the ability to edit my stuff from anywhere. Working on the same document at the same time with someone else gave me new appreciation for the system, and for collaborative editing in general. Editing our parts separately and then merging them together would have taken much longer, and I don't think the resulting document would have been as good. There were a few glitches, mostly in the form of random scrolling when saving, but they were not difficult to deal with. From now on, I'll be doing all editing of group papers online.

We even used the 'Save as PDF' feature to get a print ready document ready to turn in. If I needed images for more precise formatting, I would have pulled it into OpenOffice for final editing.