Saturday, March 31, 2012

Server Side Clustering: Why you need it

It’s been almost 4 years since I posted about using server side clustering to make point maps wicked fast, and we’ve used this same technique on 4 or 5 projects since then. Today, as I was catching up on actually reading the tweets I’d favorite-ed over the last few weeks, I came across a site that really shows a scenario where this technique would massively help.
Before I get into this – I’m not trying to point fingers and call anyone out on this, I’m simply trying to show a technique that can really help make your maps fast.
Ok – the site is “Climate Change and African Political Stability” and it’s at http://ccaps.aiddata.org/ . I got to it from a blog post titled “Crisis Mapping Climate Change, Conflict and Aid in Africa”, and when I first saw the screen cap in the blog post, I thought it looked like something by DevelopmentSeed or Vizzuality. So, yeah, the site looks good. Here’s the screen cap on first load…
Initial Load
Cool! So it’s using the Esri JavaScript API, and some fancy client-side clustering. What I could not get in the screen cap is that the little clusters have fly-out “octopus” arms depicting how many items are in the cluster. Very cool.
The intent of the app is to have the users manipulate the slider at the bottom to look at changes over time. So, when the app first loads, we are looking at data for 2010.
Whenever I see clustered data like this, I’m curious to see what’s on the wire… so a quick peek into Firebug tells another part of the development story…
Fb 1
The line I highlighted is a little hard to see, but the key things are POST, 343k and 3.6 seconds. First off, the POST is kinda hokey, to say nothing of turning the stomach of every RESTafarian around. Why? Because POST is meant to CREATE or UPDATE data… queries should be done with GET, my friend. And no one else can POST to that URL to get the data. Maybe that’s by design – I’m not sure. The next two numbers are really performance indicators, and really not too bad, but let’s look at what’s actually coming down.
Fb 2
The actual object does not seem to have much extraneous info, which is good, and 343k is not far off the size of a map tile, so it’s all good, right?
So, here’s where I’m gonna start making suggestions… using long property names is wasteful. Being a nerd, I grabbed the response data, stuffed it into an editor, and started hacking. Changing the properties to be much more concise (lat,lng,did,et,ed,a1,a2,c,l corresponding to the items in the order shown above) drops almost 1/3 of the file size – down to 260K.
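If you want to try the same hack yourself, here is a minimal Python sketch of the idea. The short names are the ones listed above, but the long names and the sample record are purely hypothetical stand-ins, since the real field names only appear in the screen cap.

```python
import json

# Hypothetical long names (assumptions); the short names are the ones from the post.
KEY_MAP = {
    "latitude": "lat", "longitude": "lng", "data_id": "did",
    "event_type": "et", "event_date": "ed", "admin1": "a1",
    "admin2": "a2", "country": "c", "location": "l",
}

def shorten_keys(records, key_map=KEY_MAP):
    """Return copies of the records with long keys swapped for short ones."""
    return [{key_map.get(k, k): v for k, v in rec.items()} for rec in records]

def payload_size(records):
    """Size in bytes of the compact JSON encoding of the records."""
    return len(json.dumps(records, separators=(",", ":")).encode("utf-8"))

# Made-up sample data, only here to compare the two encodings.
sample = [
    {"latitude": 9.08, "longitude": 8.67, "data_id": i, "event_type": "Protest",
     "event_date": "2010-05-01", "admin1": "A", "admin2": "B",
     "country": "Nigeria", "location": "Abuja"}
    for i in range(1000)
]
short = shorten_keys(sample)
print(payload_size(sample), payload_size(short))
```

How much you save depends on how long the values are compared to the keys, so treat any percentage from a toy sample like this as illustrative only.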
But really, there is another way to shrink that down that’s much, much more effective. You see, there are 1418 data points coming down in that package. To show them effectively on the map, they have to be clustered, or the map would be a total smear of points and impossible to make sense of. Looking at the map, we are seeing *maybe* 30 points total? So that means that we have an extra 1388 data points on the wire. We could make exactly the same map with roughly 2% of the data on the wire (30 of 1418 points). And you know what that means? The app will be monstrously faster.
“But Dave, a few seconds when the page loads – who cares?” Ah, yes, but let’s get back to the point of the app – to look at this data over time. And not just at different points in time, but aggregated over time. What say we slide that date back to 2003 shall we? Go ahead, try this… I’ll wait…
Impatient? Ok, here’s what you’ll see if it finishes loading…
Ccaid 2
Kinda similar eh? I moved the map a bit for this screen cap, but really we see pretty much the same thing – approximately 30 points. Except for that heinous lag while it loaded, the map is pretty much the same. And what does Firebug tell us? Yeah – that huge lag was what happens when you jam 5MB of JSON down the wire. That little hack using short property names to save 30% doesn’t look so stupid now does it ;-)
Fb 3
And the load time on that? 1 minute, 18 seconds. Firebug did not even JSONify the payload, but SWAGing it based on file size, this is somewhere around 25,000 points. And in the end the map shows 30 points? Right.


A Better Way…


Server-side clustering is the solution here. Based on the URLs we are seeing in Firebug, I’m guessing that this data is stored in a simple tabular database, which makes things even easier. But first a demo…
Conveniently, we launched a site last week which shows server-side clustering in action. Check it out at http://maps.saferoutesinfo.org. Basically it’s a map of the projects which have been funded by Safe Routes to School – if there are no projects in your area, contact your school board and congressperson. Anyhow, the data covers 6 years, and has roughly 13,000 data points.
Srts 1
What’s great about this clustering solution is that the payload on the wire is small, and stays small. We load the map with the “worst case scenario” – all types of awards, for all years, for the entire country. The payload is close to the other app’s initial load, at 211KB and 1.4 seconds, and we are showing 450 points. If you go poking into the JSON, you’ll see that we were under a time-crunch to get this live, so it’s not optimized – we could likely save another 20% by dropping extra / unused attributes. Speaking of URLs, this is a simple REST service so you can run this same query using this link.
The idea here is to store the data in a normal SQL table, send in the bounding box and use that in a simple SQL query to get all the points currently in the map. We then cluster the points, JSON that up and shoot it back. The only downside here is that we need to re-fetch the points every time we pan or zoom. But given the fact that the points are often rendered before the map tiles come down, this does not seem to be a performance issue.
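To make the bounding box step concrete, here is a rough Python sketch using an in-memory SQLite table. The table name, column names and sample rows are all assumptions for illustration; the real service is written in C# and its schema is not shown in this post.

```python
import sqlite3

def points_in_bbox(conn, west, south, east, north):
    """Fetch every point whose coordinates fall inside the map's bounding box."""
    sql = (
        "SELECT id, lat, lng FROM points "
        "WHERE lng BETWEEN ? AND ? AND lat BETWEEN ? AND ?"
    )
    # Parameterized query: the bbox values come straight from the client.
    return conn.execute(sql, (west, east, south, north)).fetchall()

# Hypothetical table and rows, just enough to run the query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points (id INTEGER PRIMARY KEY, lat REAL, lng REAL)")
conn.executemany(
    "INSERT INTO points (id, lat, lng) VALUES (?, ?, ?)",
    [(1, 35.9, -79.0), (2, 40.7, -74.0), (3, 47.6, -122.3)],
)
rows = points_in_bbox(conn, west=-80.0, south=35.0, east=-73.0, north=41.0)
print(rows)
```

This simple version assumes the box does not cross the 180° meridian; a map that wraps around the date line would need two longitude ranges.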
The clustering is based on a 22 pixel square at the current zoom resolution, so we never get back more than 1 point for each of those squares (roughly 20×20 pixels of map space).
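A grid-based cluster along those lines could look something like the Python sketch below. This is not the code from the gist: the Web Mercator projection, the 256 pixel tile size, and using the average position as the cluster location are all assumptions on my part.

```python
import math
from collections import defaultdict

TILE_SIZE = 256  # assumed Web Mercator tile size in pixels

def to_pixels(lat, lng, zoom):
    """Project lat/lng to global Web Mercator pixel coordinates at a zoom level."""
    scale = TILE_SIZE * (2 ** zoom)
    x = (lng + 180.0) / 360.0 * scale
    sin_lat = math.sin(math.radians(lat))
    y = (0.5 - math.log((1 + sin_lat) / (1 - sin_lat)) / (4 * math.pi)) * scale
    return x, y

def grid_cluster(points, zoom, cell_px=22):
    """Group (lat, lng) points into square pixel cells; return one cluster per cell."""
    cells = defaultdict(list)
    for lat, lng in points:
        x, y = to_pixels(lat, lng, zoom)
        cells[(int(x // cell_px), int(y // cell_px))].append((lat, lng))
    clusters = []
    for members in cells.values():
        n = len(members)
        clusters.append({
            "lat": sum(p[0] for p in members) / n,  # average position (assumption)
            "lng": sum(p[1] for p in members) / n,
            "count": n,
        })
    return clusters

# Two nearly identical points plus one far away, at a low zoom level.
clusters = grid_cluster([(35.9, -79.0), (35.9001, -79.0001), (47.6, -122.3)], zoom=4)
print(clusters)
```

Because the cell size is fixed in pixels, the same points split into more clusters as you zoom in, while the number of clusters sent for any one view stays bounded by the size of the map on screen.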
What’s more, we can do some interesting stuff – in this app, we have 3 types of funding “awards” – School Awards, District Awards and State-wide Awards. If a cluster contains only schools, we use the green cluster icon. If it has 1 or more District Awards, it gets the blue cluster icon, and if there is a State-wide award in there, we show the gold icon. I’m mentioning this because the gist I posted has this logic in there – if you just want to show points, you’ll need to rip out that extra logic.
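In Python, that icon rule might be sketched like this. The type labels are made-up strings, and the precedence (state-wide beats district, district beats school) is my reading of the rule described above rather than a copy of the gist.

```python
def cluster_icon(award_types):
    """Pick the cluster icon color from the award types present in one cluster."""
    types = set(award_types)
    if "state" in types:      # any State-wide award wins
        return "gold"
    if "district" in types:   # otherwise, one or more District Awards
        return "blue"
    return "green"            # only School Awards

print(cluster_icon(["school", "district", "school"]))
```

If you just want plain point clusters, this is the piece you would drop.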
Since I’m running on the el-cheapo WordPress.com hosting, I can’t embed the gist of the code, but you can get it here: https://gist.github.com/2188210


Summary


I hope this shows why server side clustering is a killer way to build really fast web apps. In the end, you can have great design, great data, a great story, but if the app is slow, all that work is lost.
Also – if you are the developer/project manager/benevolent overlord/in-any-way-associated with the Climate Change and African Political Stability App, I encourage you to get in touch – although it looks like you are not using ASP.NET, I am more than willing to help you work through converting the C# over to whatever you are using. The site tells an important story, and it deserves to be told well.


