Amazing Things happening at Kynetx - Come to Impact

Over sushi a few nights ago, I had the opportunity to bring Drummond Reid up to speed with what Kynetx has been doing. After several months in the trenches, it was a revealing experience to climb a tower and see just how far we've come. As I unfolded item after item, I was surprised to keep finding another thing to describe.

At our last Impact Conference, we unfolded the Kynetx Engine, and demonstrated some ways to use Kynetx in your quest to do amazing things.

At this next Impact Conference, we will deliver an update on the progress with the engine, the improvement of our tools, and all of the things that have kept us busy over the last few months. If you want to hear about some of the stuff I downloaded to Drummond, you'll want to attend the conference. If you came to the last Impact, this one will be better. If you didn't, then it's time for you to understand what we are doing and what it means for the world.

Oh, and the food will be worth the price of admission alone. Seriously. Sign up for Impact.

Kynetx Impact Spring 2010
April 27-28 2010
Miller Free Enterprise Center (MFEC)
at Salt Lake Community College
9750 South 300 West
Sandy, UT 84070

Use Code FOK2010 for a 33% discount on the conference price.

Changing the World at Kynetx

I've been pretty quiet on my blog lately, and I have a really good excuse. Now that I've graduated and have time to get involved in some serious endeavors, I've joined the folks at Kynetx. I've been contributing to their efforts to change the world. Indeed, this does change everything.

At Kynetx, we believe that experiences can be made better through better use and understanding of context. (Gartner agrees...) Context is data in time and space: who we are, what we are doing, and what our purpose is. Kynetx has what we call a Context Automation Engine, which does the heavy lifting required to produce intelligent applications. Using our engine, you can create complex applications easily, and deploy them fast.

I'll take a break now and again from my work to post examples and more information about what we are doing. If you want to learn more, attend the Kynetx Impact developer conference Nov 18-19. We will cover our technology and our vision.

If you are interested in using our platform to add contextual intelligence to your applications, go sign up. Use code Windley50 for a 50% discount, and I'll see you there!

Sam Rides 1000: Augmenting the Web

In my previous two posts, I introduced my project and described data collection using my G1 and Google Spreadsheets. Today, I'm going to show you how I used Kynetx Network Services to add my ride stats to my personal blog and to the Google homepage.

Dataset Conversion

Google Spreadsheets, where my stats are calculated, can publish data in a variety of formats. It cannot publish JSON data, so I use Yahoo's YQL to convert the data from CSV to JSON, with the following statement:

select * from csv where url='http://spreadsheets.google.com/pub?key=rxzHBMZyj1S-HVLy9lFEU7A&single=true&gid=1&range=A12%3AC16&output=csv' and columns='period,miles,hours' and period != ""

(See the raw JSON results)

Building the App

I then build my Kynetx App in AppBuilder, defining the following datasource in the Global block:

dataset ridestats <- "http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20csv%20where%20url%3D'http%3A%2F%2Fspreadsheets.google.com%2Fpub%3Fkey%3DrxzHBMZyj1S-HVLy9lFEU7A%26single%3Dtrue%26gid%3D1%26range%3DA12%253AC16%26output%3Dcsv'%20and%20columns%3D'period%2Cmiles%2Chours'%20and%20period%20!%3D%20%22%22&format=json&callback=" cachable for 2 hours

Since I ride in the morning and the evening, I cache the dataset for 2 hours. This keeps the data fairly current, but still keeps the service fast.
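As a rough illustration of where that long dataset URL comes from, the sketch below percent-encodes the YQL statement in Python. Only the statement and the endpoint come from this post; the helper name `build_dataset_url` and the choice of `safe` characters are assumptions made for the sketch, picked so the output resembles the URL above. It only builds the string and makes no network request.

```python
from urllib.parse import quote

# Endpoint and statement are copied from the post.
YQL_ENDPOINT = "http://query.yahooapis.com/v1/public/yql"

YQL_STATEMENT = (
    "select * from csv where "
    "url='http://spreadsheets.google.com/pub?key=rxzHBMZyj1S-HVLy9lFEU7A"
    "&single=true&gid=1&range=A12%3AC16&output=csv' "
    "and columns='period,miles,hours' and period != \"\""
)


def build_dataset_url(statement):
    """Percent-encode a YQL statement and wrap it in a JSON query URL.

    Hypothetical helper: the safe characters (* ' !) are an assumption,
    chosen to mirror how the URL in the post leaves them unescaped.
    """
    encoded = quote(statement, safe="*'!")
    return f"{YQL_ENDPOINT}?q={encoded}&format=json&callback="


print(build_dataset_url(YQL_STATEMENT))
```

Note that the `%3A` already present in the spreadsheet range is encoded a second time (to `%253A`), because the spreadsheet URL is itself a value inside the YQL statement.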

Annotating My Blog

I have two rules, the first of which adds stats to my personal blog:

select using "http://sam.curren.ws/" setting ()

pre {
daymiles = ridestats.pick("$..results.row[0].miles");
weekmiles = ridestats.pick("$..results.row[1].miles");
monthmiles = ridestats.pick("$..results.row[2].miles");
totalmiles = ridestats.pick("$..results.row[3].miles");
milesmessage = <<
<h2>Sam is riding 1,000 miles. Progress:
#{(daymiles > 0 ? daymiles + " Today, " : "")}
#{(weekmiles > 0 && weekmiles != daymiles ? weekmiles + " This Week, " : "")}
#{(monthmiles > 0 ? monthmiles + " This Month, " : "")}
#{totalmiles} Total.</h2>
>>

}

replace_html("#logo h2", milesmessage);

I set the rule to fire on my blog's domain, and then use the pick() method to extract different totals from the JSON dataset declared in the Global block. I construct a message string that varies depending on the different stat values. Finally, I replace the text at the top of my blog page with the message.
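For readers who don't know KRL, the logic of the rule's pre block can be approximated in Python. This is only a sketch and not how the Kynetx engine works internally: the JSON shape is inferred from the pick() paths, and the sample numbers and the `build_message` name are made up for illustration.

```python
# Hypothetical sample shaped like the YQL response implied by the
# "$..results.row[N].miles" paths; the numbers are invented.
SAMPLE_DATASET = {
    "query": {
        "results": {
            "row": [
                {"period": "day", "miles": "12.4", "hours": "1.0"},
                {"period": "week", "miles": "12.4", "hours": "1.0"},
                {"period": "month", "miles": "50", "hours": "4.2"},
                {"period": "total", "miles": "212.5", "hours": "18.0"},
            ]
        }
    }
}


def build_message(dataset):
    """Build the progress text, skipping empty or redundant periods."""
    rows = dataset["query"]["results"]["row"]
    day, week, month, total = (float(row["miles"]) for row in rows[:4])

    parts = []
    if day > 0:
        parts.append(f"{day:g} Today, ")
    # Skip the weekly figure when it only repeats today's ride.
    if week > 0 and week != day:
        parts.append(f"{week:g} This Week, ")
    if month > 0:
        parts.append(f"{month:g} This Month, ")

    return "Sam is riding 1,000 miles. Progress: " + "".join(parts) + f"{total:g} Total."


print(build_message(SAMPLE_DATASET))
```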

To run the Kynetx application on my blog, I plant Kynetx tags on my blog. This enables everyone to see the Kynetx Application with no installs or Action Cards. The html tags are available within AppBuilder, and I simply copied them into the template for my blog.

Annotating Google's Homepage

My second rule is activated by an Action Card installed on the user's computer (instructions for installing this are in my first post). It is very similar to the first rule, with some minor differences in the inserted HTML, and it appends the message to the existing page instead of replacing anything.

select using "http://www.google.com/" setting ()

pre {
daymiles = ridestats.pick("$..results.row[0].miles");
weekmiles = ridestats.pick("$..results.row[1].miles");
monthmiles = ridestats.pick("$..results.row[2].miles");
totalmiles = ridestats.pick("$..results.row[3].miles");
milesmessage = <<
<h2>Sam is riding 1,000 miles.</h2><p> Progress:
#{(daymiles > 0 ? daymiles + " Today, " : "")}
#{(weekmiles > 0 && weekmiles != daymiles ? weekmiles + " This Week, " : "")}
#{(monthmiles > 0 ? monthmiles + " This Month, " : "")}
#{totalmiles} Total.</p>
>>

}

append("#body>center", milesmessage);

Activating Kynetx Rules with an Action Card also requires an update to the Dispatch block of the rule, adding this line:

domain "www.google.com"

I also generate the card inside AppBuilder, providing a custom image that I created using Pixlr.

And there you have it. Sam rides 1000 miles, with automated stats provided by Android MyTracks, Google Spreadsheets, YQL, and Kynetx Network Services.

Shameless Plug

Kynetx is a cloud-based automation engine, capable of doing the things I've demonstrated and much, much more. If you'd like to use Kynetx Network Services, sign up for an account, and start using AppBuilder.

Sam Rides 1000: Collecting Ride Data using the Android Powered G1

On my Android powered T-Mobile G1, I'm using the free My Tracks application to record my rides. I start recording just before I start, then throw it in my pocket or bag. I stop recording at the end of my ride, then use the Upload to Google option in the map menu. You can upload the track to My Maps within Google Maps, but my rides are very similar, so I usually only upload to Google Spreadsheets.

Uploading to Google Spreadsheets creates a new spreadsheet in Google Docs, with one page for ride data, and another for stats. The program creates a new spreadsheet for each activity type, so I make sure to select Cycling when I stop recording at the end of my ride.

Additional Stats

It's important not to manually change too much on the Log sheet, as the program will get confused, but everything else in the spreadsheet is open to tinkering. In addition to the total miles and total time stats, I wanted to calculate daily, weekly, and monthly totals.

Before I computed those stats, I had to handle the date field uploaded by the My Tracks application. Google Spreadsheets cannot parse the provided date as a date value, so I had to help it along. I created a Dates sheet to accomplish that task. Rather than try to explain what I did, I'll point you to this spreadsheet, which contains my modifications and formulas:

Sample Spreadsheet with modifications.

I then added some additional stats, first by calculating the start of the date range I wanted to sum, then using SumIf() to only add the mileage and time from the period I wanted.
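The same idea (find the start of each period, then sum only the rides on or after it) can be sketched outside the spreadsheet in Python. The ride log, the function names, and the choice of Monday as the start of the week are all assumptions for illustration; the actual formulas live in the sample spreadsheet.

```python
from datetime import date, timedelta


def period_starts(today):
    """Return the first day of the current day, week, and month."""
    return {
        "day": today,
        # Assumption: weeks start on Monday (weekday() == 0).
        "week": today - timedelta(days=today.weekday()),
        "month": today.replace(day=1),
    }


def sum_since(rides, start):
    """Rough equivalent of SumIf(): add miles for rides on or after start."""
    return sum(miles for ride_date, miles in rides if ride_date >= start)


# Hypothetical ride log of (date, miles) pairs.
RIDES = [
    (date(2009, 6, 1), 6.0),
    (date(2009, 6, 8), 12.0),
    (date(2009, 6, 10), 6.5),
]

starts = period_starts(date(2009, 6, 10))
totals = {name: sum_since(RIDES, start) for name, start in starts.items()}
print(totals)
```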

Finally, I published the Stats sheet of my spreadsheet, so I could use it as a dataset for my Kynetx Application, which I'll explain in a future post.

Sam Rides 1000: An exercise in collecting data and web augmentation

May 15, 2009 was National Ride your Bike to Work day, and I pulled out my bike and rode the 6 or so miles between my house and the Kynetx offices at Thanksgiving Point. I enjoyed it, and rode a few more times in the next week. I was musing how many miles I could ride this summer, and mentioned it to my wife. She promptly challenged me to ride 1,000 miles this summer.

I'd better say here that I'm not a cyclist. I haven't ridden 1,000 miles in the past 5 years combined. While 1,000 miles might not be much for a cyclist, it is quite the challenge for me.

Tracking My Progress

In addition to making a few bike repairs and buying some commuting tires for my mountain bike, I immediately cooked up some geeky ways to track my progress and share my results with my family and friends.

I'm a (BIG) Android fan, and so I'm using my T-Mobile G1 as a cyclometer. I'm using the My Tracks application, which records both the route of my ride and my ride stats. After my ride, the app uploads my stats to a Google Docs Spreadsheet, where stats are calculated. I'll share more about that in a future post.

To share my progress, I'm using Kynetx Network Services (KNS) to augment my personal blog with my stats. If you are reading this post on my blog, look just under the title for my updated stats. KNS pulls my ride stats from the Google Spreadsheet and annotates my website. All I had to do was plant some JavaScript tags in my blog's template to activate the Kynetx Application that makes the change.

My friends and family don't visit my blog EVERY day, so I've also produced an Action Card that displays my ride stats on Google's home page. KNS allows me to augment websites for anyone that has my Action Card installed. I'll explain more about how I wrote my Kynetx Application in a future blog post, but for now, you can install my "Sam Rides 1000" card to track my progress on Google's home page.

Installing The Action Card

An Action Card is a type of Information Card that allows your web experience to be augmented with a Kynetx Application. You activate the application by installing an Action Card Selector if you don't already have one, and then installing the card. You can disable or remove the card if and when you don't want to use the application.

  1. Verify you have a supported browser: IE/FF on Win, FF on OSX
  2. Install Azigo
  3. Install the Sam Rides 1000 Action Card

Then, browse to Google's main page to see my stats: http://www.google.com

When I finish a ride and update my stats, you'll see the new numbers. If I'm falling behind, be sure to give me a nudge!

Progressive Spatial Networks

I’ve been pretty silent the last few months here on my blog. I’ve been pretty busy with things like settling into our new house and starting full-time at Kynetx. A major drain on my free time both recently and for the last several years has been my Master’s Thesis. I’ve graduated now, and finally carved out some time to update my blog.

As I resume regular blogging, I find it appropriate to post my Thesis for all the world to see. First, a little backstory.

For my thesis work, I worked on an algorithm to combine GPS tracklogs into what I call a spatial network. I chose this work because of my experience building ActiveTrails.com. Like any excited graduate student (pre-thesis student, that is) I had grand ideas about what I was going to accomplish with my thesis work. Luckily for me, my graduate advisor guided me properly through the process, and I finally completed my work.

I do find it strange that only a written Thesis is required for an MS in Computer Science. I’ve decided that it only makes sense to post my code, so that others can experiment with my work without having to rewrite it from scratch. Now, I’m sure I’ve made plenty of mistakes in my code, and I hope that others can produce much better results than I did, and not fall into the same lines of thinking that perhaps restricted my results.

I originally had plans to organize my code, clean it up, flush it full of comments, and organize my result files. And then I realized it might never happen. I’ve packaged my code, source files, and results into a zip file, and though it isn’t perfectly clean, I hope it’s useful for those who want to use it.

Progressive Spatial Networks: Learning from GPS Tracklogs (pdf link)

Source data, python source code, and result files (zip file).

Trouble with Random Long-Running Requests in ColdFusion 8

I've been experiencing some run-time weirdness with ColdFusion for the past year or so, and I've finally decided to post my observations and see if anyone else has been having similar troubles.

The trouble shows up on any number of scripts, but is most likely to appear with scripts that are called frequently. The screenshot I've included shows a Slow Request report from the ColdFusion Server Monitor. You can see that this request took 110 seconds to complete. The VERY strange part is that the runtime of the Application.cfc onRequest method (the outermost piece of code to run on any request) took only 468ms to complete. So what happened to the 109 seconds that were not spent executing my request? How can I prevent this from happening?

Also observe the Min/Max/Avg response times for the script. At the time of this screenshot, this script had been called 250 times. Multiplying the average response time (0.505 seconds) by the request count (250) tells us that this script has occupied 126.25 seconds of server time. Removing this one long-running request from the statistics (250-1 requests, 126.25-110.234 seconds), we find the script has an average running time of 0.064 seconds, which is a more reasonable run-time, considering the reported min response time of 0.015 seconds.
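The arithmetic is easy to check. This small Python snippet just replays the numbers quoted from the Server Monitor report; the variable names are mine.

```python
# Figures quoted from the Slow Request report.
request_count = 250
average_seconds = 0.505
slow_request_seconds = 110.234

# Total server time spent on this script across all requests.
total_seconds = request_count * average_seconds

# Drop the single slow request and recompute the average.
remaining_seconds = total_seconds - slow_request_seconds
adjusted_average = remaining_seconds / (request_count - 1)

print(f"total: {total_seconds:.2f}s, adjusted average: {adjusted_average:.3f}s")
```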

Eliminating these random long-running requests will do great things for the usability of the website, as well as free up server resources.

Misc Details

  • ColdFusion Enterprise 8,0,1,195765
  • Monitoring and Profiling Enabled
  • Windows 2003
  • Java 1.6.0_04

Summary

Scripts will randomly take MUCH longer than they usually do.

There seems to be a massive discrepancy between the Response Time and the Time Taken by the onRequest method.

Why is this happening, and what can be done about it?

Sorting SimpleDB queries on Multiple Attributes


I was recently working with a dataset in Amazon’s SimpleDB, and I needed to be able to sort query results on multiple attributes. SimpleDB currently only allows sorting by a single attribute, so I was stuck. The solution is rather simple: combine the columns that I need to sort on into a single new attribute, and use this new composite attribute to perform my sorting. I kept the existing attributes in their current form, but added a new attribute that is the concatenation of the two attributes I wished to sort.

SimpleDB sorts lexicographically, so concatenating two fields produces exactly the expected results provided you handle a few situations properly. There are a few things to keep in mind as you produce the multi-sort attribute.

Properly pad your fields

Sorting numbers in SimpleDB requires some encoding to provide the expected results. The SimpleDB documentation provides examples of this. If you have properly encoded your numeric fields, then the length of each attribute will be the same as all the rest. Each value of an attribute must be the same length to provide consistent results.

Align the sort order of each field

You can only sort a single field in one direction: ascending or descending. If you wish to sort column 1 ascending and column 2 descending, then you must reverse the encoding of field 2 prior to concatenation, then sort the combined column ascending. You could also do the opposite: reverse the encoding of field 1, then sort the combined attribute descending.

The SimpleDB documentation describes encoding numeric values, but this concept applies to both numeric and alphanumeric fields, as long as the length of the attribute value is either the same, or can be padded to be the same.

Reversed encoding is nothing more than changing the data to be the reverse of what it really is. A single digit number attribute value could be subtracted from a base value, such that 0 becomes 9, 1 becomes 8, and so on. Characters can be reversed as well, turning a into z, b into y, etc. The exact encoding used will depend on the properties of the data being stored.
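Here is a minimal Python sketch of reversed encoding, under some stated assumptions: numbers are non-negative integers that fit the chosen width, text is lowercase a-z only, and the function names and the "~" padding character are my own choices rather than anything SimpleDB prescribes. Python's plain string sort stands in for SimpleDB's lexicographic ordering.

```python
import string

# Maps a->z, b->y, ... z->a.
_REVERSED_LOWERCASE = str.maketrans(
    string.ascii_lowercase, string.ascii_lowercase[::-1]
)


def reverse_digits(value, width):
    """Zero-pad a non-negative integer, then swap each digit d for 9 - d."""
    padded = str(value).zfill(width)
    return "".join(str(9 - int(digit)) for digit in padded)


def reverse_letters(value, width):
    """Reverse lowercase text, then pad to a fixed width.

    "~" sorts after every lowercase letter, so a short value lands after
    a longer value that shares its prefix, as a descending sort expects.
    """
    return value.translate(_REVERSED_LOWERCASE).ljust(width, "~")


scores = [7, 42, 100]
words = ["red", "blue", "reddish", "green"]

# Sorting the encoded values ascending yields the originals descending.
print(sorted(scores, key=lambda score: reverse_digits(score, 3)))
print(sorted(words, key=lambda word: reverse_letters(word, 8)))
```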

Keep in mind that the value of the multi-sort attribute never needs to be parsed or read: it is only used for sorting results. Even if the translation is an ugly one, it only needs to be done on an update.

Concatenate attributes in order of sorting preference

If you wish to sort on attribute A, then attribute B, followed by C, your combined attribute must be combined in that order, left to right.
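A short Python sketch of building such a composite attribute follows. The field names, widths, and sample records are hypothetical, and Python's string sort again stands in for SimpleDB's lexicographic ordering; values longer than the chosen width would break the alignment.

```python
def pad_text(value, width):
    """Right-pad text with spaces so every value has the same length."""
    return value.ljust(width)


def pad_number(value, width):
    """Left-pad a non-negative integer with zeros."""
    return str(value).zfill(width)


def composite_sort_key(item):
    """Concatenate padded fields in sort priority order: A, then B, then C."""
    return (
        pad_text(item["last"], 10)
        + pad_text(item["first"], 10)
        + pad_number(item["age"], 3)
    )


# Hypothetical records.
people = [
    {"last": "smithson", "first": "al", "age": 30},
    {"last": "smith", "first": "bo", "age": 9},
    {"last": "smith", "first": "al", "age": 41},
    {"last": "smith", "first": "al", "age": 8},
]

by_composite = sorted(people, key=composite_sort_key)
for person in by_composite:
    print(repr(composite_sort_key(person)))
```

Because the composite value is only ever sorted on, it can be recomputed from the original attributes whenever one of them changes.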

Flexibility

This approach is fairly flexible, including creating a sorting attribute from any number of attributes and creating multiple sorting attributes, as long as the attribute length and number of attributes fall within the restrictions of SimpleDB.

While this technique is not new, I wanted to explain it here to provide hints to others with no idea how to solve the problem. Credit for revealing this concept to me goes to Phil Windley, who told me that the only reason I don’t know it already is that I’m “too young” to remember a time when most database platforms carried the same restriction. I’m just glad to have experienced old guys to learn from. :)

Cloud Computing - The 5th Utility


This post is part of a series of posts relating to distributed system design that I'm completing as part of my Computer Science MS program at BYU.

My 6th paper was written by some good folks at The University of Melbourne, Australia. They discuss the emerging cloud computing paradigm as the 5th Utility, and compare it with both clusters and grids. The paper (PDF) argues that clusters and grids cannot be considered a utility by themselves, but cloud computing fits the necessary requirements.

Clusters are groups of machines that work together to accomplish a single task, such as serving web content. Each machine is the same, and they all perform the same task. Clusters can scale in size to handle varying loads. Grids support multiple jobs of different characteristics, typically within a required framework. Clouds can scale (like clusters), and support a wide variety of jobs simultaneously. Clouds take scaling to such an extreme that they can scale to nothing, which neither clusters nor grids support. This minimal commitment, without minimum usage levels, is what makes clouds so useful. Just as water, electricity, gas, and telephony (the first 4 utilities) can scale from nothing to very high usage, clouds can scale to any reasonable load.

As I’ve mentioned in several of my other paper reviews, it is very clear that cloud computing exists in a layer underneath clusters and grids. Clusters and grids can be built on top of cloud computing systems, as cloud systems utilize virtual machines as a hardware abstraction. The unique piece that makes it possible is the dynamic provisioning made available through the API. While some hosting providers can provision servers in only a few hours, cloud providers provision their resources within minutes, and sometimes within seconds.

At this point, there are few cloud providers, and each has its own API, terms of use, and types of services. As more providers enter the market, a consistent interface will be needed in order to tame the API chaos. While some services may end up sharing an API, consistency can also be provided via a meta-interface that can translate the user's commands into whatever syntax is required by the particular provider and service being utilized. This layer can be constructed in the cloud itself, in client-based toolkits, or as a combination of the two. The concept of a metalayer is demonstrated in the paper through the creation of a meta-storage service, capable of storing data in several cloud services through a single API.
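To make the metalayer idea concrete, here is a toy Python sketch. It is not the paper's design: the class names are invented, and the "providers" are in-memory stand-ins rather than real cloud APIs. The point is only that callers talk to one interface while each adapter handles its own provider.

```python
from abc import ABC, abstractmethod


class StorageProvider(ABC):
    """Common interface that each provider adapter would implement."""

    @abstractmethod
    def put(self, key, data):
        ...

    @abstractmethod
    def get(self, key):
        ...


class InMemoryProvider(StorageProvider):
    """Stand-in for a real cloud storage service (hypothetical)."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = data

    def get(self, key):
        return self._objects[key]


class MetaStorage:
    """Single API that routes each call to the named provider."""

    def __init__(self, providers):
        self._providers = dict(providers)

    def put(self, provider_name, key, data):
        self._providers[provider_name].put(key, data)

    def get(self, provider_name, key):
        return self._providers[provider_name].get(key)


storage = MetaStorage({"cheap": InMemoryProvider(), "fast": InMemoryProvider()})
storage.put("cheap", "report.txt", b"archived data")
storage.put("fast", "session", b"hot data")
print(storage.get("cheap", "report.txt"))
```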

Just as interfaces are not likely to completely converge, the properties of each service are also not likely to be identical between providers. Speed, price, reliability, and other factors will vary, allowing users to select the proper service to fit their particular need. Some services will provide an SLA, providing service guarantees.

It is still very early in the development of cloud services, and I’m sure that we will be seeing new entries for years to come. Amazon has hinted at some of the services that will be made available in this next calendar year, including load balancing, monitoring, and automation management. As we see more entries in the space, it will become easier to understand the strengths and weaknesses of cloud computing, as well as define its limits.

I’m excited for the expansion of cloud computing, and I look forward to more studies that can help us understand it better.

Really Bad reasons not to auto-scale cloud based systems

O'Reilly writer George Reese posted today what I consider to be a poor evaluation of the perils of auto-scaling in the cloud.

He does mention the concept of using a governor to limit the power of the auto-scale agent to spin up servers (and spend money), but his insight ends there. Anyone following cloudy issues will have read Don MacAskill's excellent post this past June, where he explains their auto-scale operation, and the need to set limits.

George also makes a few arguments against auto-scaling, which I'll address briefly:

1. Amazon and other clouds cannot respond fast enough to increased capacity needs.

George claims that a 10 minute instance spin up time cannot respond fast enough to help. This is only true if you start to spin up your service when the existing capacity is already (or nearly) toast. Common strategies involve already having some extra capacity running, so as to not immediately fold under an increase. Solving this problem is just tuning the thresholds.

2. Got any disgruntled employees, unhappy customers, or malicious competitors?

George claims that auto-scaling will waste your money in the event of a denial-of-service attack. What he doesn't mention is that a DoS on a non-auto-scaled system will likely take it down. At the very least, it will artificially inflate your usage anyway, and you will still have to spin up more resources to handle the load. I'd rather spend a few extra bucks and STAY UP.

3. So you think you'll stick some governors in place...

George's main claim here is that your governor is likely to be set at the wrong value. Although he doesn't explicitly say so, he seems to be implying that a governor can only be used to limit the total number of machines. SmugMug (in the aforelinked post) indicates that their governor limits the rate at which new machines can be started. Using this strategy, the governor constrains only the rate of growth, not the total capacity.
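A rate-limiting governor of this kind could look something like the Python sketch below. To be clear, this is my own guess at the shape of such a thing, not SmugMug's implementation; the class name, the limits, and the use of plain numbers of seconds for time are all assumptions.

```python
from collections import deque


class LaunchRateGovernor:
    """Allow at most max_launches new machines per sliding time window."""

    def __init__(self, max_launches, window_seconds):
        self.max_launches = max_launches
        self.window_seconds = window_seconds
        self._launch_times = deque()

    def allow_launch(self, now):
        """Return True and record the launch if the rate limit permits it."""
        # Forget launches that have aged out of the window.
        while self._launch_times and now - self._launch_times[0] >= self.window_seconds:
            self._launch_times.popleft()

        if len(self._launch_times) < self.max_launches:
            self._launch_times.append(now)
            return True
        return False


# Hypothetical policy: no more than 2 new machines every 10 minutes.
governor = LaunchRateGovernor(max_launches=2, window_seconds=600)
decisions = [governor.allow_launch(t) for t in (0, 10, 20, 601)]
print(decisions)
```

There is no cap on the total number of machines here; sustained demand can still grow the fleet, just not all at once.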

4. So what about getting slashdotted?

The main complaint here is that an auto-scale agent cannot tell the difference between true traffic growth and a random spike. Clearly, George has never worked with noise filters, which smooth data to reveal real trends. Evaluating load data from the past few minutes will allow agents to ignore spikes easily. Again, this is reduced to tuning thresholds.
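As a simple illustration of the smoothing idea, here is a moving-average check in Python. The window size, the 0.75 threshold, and the sample load figures are arbitrary assumptions; a real agent would tune them and might well use a different filter.

```python
def moving_average(samples, window):
    """Average the most recent `window` load samples."""
    recent = samples[-window:]
    return sum(recent) / len(recent)


def should_scale_up(samples, threshold=0.75, window=5):
    """Scale up only when the smoothed load exceeds the threshold."""
    return moving_average(samples, window) > threshold


# Hypothetical CPU load samples (fraction of capacity), one per minute.
brief_spike = [0.30, 0.30, 0.95, 0.30, 0.30]
real_growth = [0.70, 0.80, 0.85, 0.90, 0.95]

print(should_scale_up(brief_spike))
print(should_scale_up(real_growth))
```

A single high reading barely moves the average, while sustained growth pushes it over the threshold.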

5. Don't you lose a key value of the cloud without auto-scaling?

Despite George's claims that no value is lost, there are clear cases where auto-scaling can save your bacon. He claims that 'capacity planning' is the clear answer. I agree with him on the importance of capacity planning, but disagree that proper capacity planning eliminates the need to auto-scale. A good auto-scaling system can save quite a lot of money in cloud processing expenses, which will do wonders for the bottom line.

Summary

I'm not bashing capacity planning here. I believe that capacity planning concepts work very well with auto-scaling, and that proper use of governors and properly set thresholds are the right way to go.

I rarely respond to lousily written posts and dumb opinions, but this one irked me for some reason. At this point, I have nothing but logic and the experiences of others to rely upon. Over the next few years, I plan on gaining some extensive experience in auto-scaling cloud-based systems, and perhaps then I'll be in a better position to dish a proper smack-down.
