From Convex hull to the Voronoi algorithm

During the run-up to the 2008 election I had the opportunity to work on a challenging mapping problem, as part of my work for We Also Walk Dogs. Our client MoveOn.org planned to run a multi-state get-out-the-vote field campaign. This kind of campaign involves teams of volunteers going door-to-door having conversations with voters and recording the results. Later the same volunteers will help turn out voters to their polling places.

The early phase of this project focused on picking the areas where the campaign would operate – a process known as turf assignment. The traditional way to do this is to assign work by precinct. We needed a way to display precinct maps visually to aid in picking precincts and assigning them to offices.

You might think that there would be some publicly available resource for precinct maps in a consistent format covering every state. Sadly, this is not the case. In fact, for reasons that escape me to this day precinct maps are carefully guarded secrets controlled at the state level. It is sometimes possible to buy the precinct map for a given state, but not always and certainly not in a single standardized format.

What we did have was the voter file. The voter file is a publicly available list of every voter in every state (almost, nothing is perfect of course). Critically for our purposes it includes both the voter’s address and their precinct. We theorized that we could essentially reverse engineer the precinct maps from the voter file data – essentially defining a precinct as a list of voters in that
precinct and then drawing a shape which enclosed them.

To give you an idea of how I thought it would work, consider this map:

map-1

The dots on the map are voters assigned to three different precincts. My idea was to come up with some way to draw shapes around them which would approximate the actual precinct shape. For example:

map-2

My first attempt at solving this problem was to write a geometric algorithm which attempt to find what’s call the “convex hull” containing all the voters in a precinct. The simplest way to think about a convex hull is as the shape a rubber-band would make if you pulled it around all the points. In this case it might look something like:

map-3

Not bad, right? Sadly things in the real world aren’t so simple! It’s unfortunately all too common to have precincts more like this:

map-4

And when you try to draw a convex hull around these points you get:

map-5

And if you think that’s bad, imagine what that would look like if there were a few more precincts shown and they all had overlapping segments. It’s not unheard of for one precinct to entirely enclose another!

Even worse, the voter file data contains errors – some people are assigned precincts that are actually many miles away from where they should be. When you try to fit a convex hull around a precinct with just one bad address you get a shape with a very long spike. Ugly and ultimately unusable.

My first solution to this problem was to divide the map into a grid and color each grid box according to its composition, subdividing as necessary. The maps produced this way were actually surprisingly usable, given the brute-force nature of the algorithm. They looked something like:

map-6

This worked but it was still pretty far from ideal. In particular it looked nothing like the pretty paper precinct maps that people are used to looking at.

The final solution I arrived at after boning up on my math skills was something called a Voronoi diagram. In simple terms a Voronoi diagram forms shapes by including all the area which is closer to a given point than any other. It’s ideal for building up maps based on a set of points.

Here’s what the final results looked like:

map-7

As you can see the Voronoi algorithm is able to construct shapes to fit very complex constraints, and it even provided enough shape data to draw outlines around the shapes. Compared side-by-side with the real precinct maps (often just scans of paper maps, sadly) the generated maps were often very close.

If you’re interested in using the Voronoi algorithm in your own code I was able to release the code of it as a Perl module on CPAN. You can download it here:

http://search.cpan.org/~samtregar/Math-Geometry-Voronoi/

I’m hoping the 2010 elections will give me a good excuse to dive back into this problem – there’s still so much that could be done to make precinct maps easier to use.

Advertisements

Nice Animations with Time Manager’s Offset Feature

You probably know this video from my previous post “Tweets to QGIS”. Today, I want to show you how it is done.

After importing the Twitter JSON file, I saved it as a Shapefile.
Every point in the Shapefile contains the timestamp of the tweet. Additionally, I added a second field called “forever” which will allow me to configure Time Manager to show features permanently.

To create the flash effect you see in the video, we load the tweet Shapefile three times. Every layer gets a different role and style in the final animation:

  • Layer “start_flash” is a medium sized dot that marks the appearance of a new tweet.
  • Layer “big_flash” is a bigger dot of the same color which will appear after “start_flash”.
  • Layer “permanent” is a small dot that will be visible even after the flash vanishes.
  • Three layers with different styles will make the animation more interesting.
  • We’ll plan the final animation with a time step size of 10 seconds. That means that every animation frame will cover a real-world timespan of 10 seconds.

    We configure Time Manager by adding all three tweet layers:
    Layer “start_flash” starts at the orginal time t. Layer “big_flash” gets an offset of -10 seconds, which means that it will display ten seconds after time t. Layer “permanent” gets an offset of -20 seconds and ends at time forever.

  • Layers can be timed using the “offset” feature
  • Use a time step size of 10 seconds so it fits to the offset values we specified earlier.

    Besides watching the animation inside QGIS, Time Manager also enables you to export the animation to an image series using “Export Video” button. Actual video export is not implemented yet, but you can use mencoder (Windows users can download it from Gianluigi Tiesi’s site) on the resulting image series to create a video file:

    1
    mencoder "mf://*.PNG" -mf fps=10 -o output.avi -ovc lavc -lavcopts vcodec=mpeg4

    Time offsets are a new feature in version 0.4 of Time Manager. You can get it directly from the project SVN and soon from the official QGIS repo.

HOW TO IMPORT DATA TO RAPID MINER

You  just copy and paste this code into your rapid miner XML view(remove the previous xml code), click the green check symbol and switch back to the normal Process (diagram) view.

Code:
<?xml version=”1.0″ encoding=”UTF-8″ standalone=”no”?>
<process version=”5.2.008″>
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated=”true” compatibility=”5.2.008″ expanded=”true” name=”Process”>
<process expanded=”true” height=”145″ width=”279″>
<operator activated=”true” compatibility=”5.2.008″ expanded=”true” height=”60″ name=”Read CSV” width=”90″ x=”179″ y=”75″>
<parameter key=”csv_file” value=”C:\iris.data”/>
<parameter key=”column_separators” value=”,”/>
<parameter key=”first_row_as_names” value=”false”/>
<list key=”annotations”/>
<parameter key=”encoding” value=”windows-1252″/>
<list key=”data_set_meta_data_information”>
<parameter key=”0″ value=”att1.true.real.attribute”/>
<parameter key=”1″ value=”att2.true.real.attribute”/>
<parameter key=”2″ value=”att3.true.real.attribute”/>
<parameter key=”3″ value=”att4.true.real.attribute”/>
<parameter key=”4″ value=”att5.true.polynominal.attribute”/>
</list>
</operator>
<connect from_op=”Read CSV” from_port=”output” to_port=”result 1″/>
<portSpacing port=”source_input 1″ spacing=”0″/>
<portSpacing port=”sink_result 1″ spacing=”0″/>
<portSpacing port=”sink_result 2″ spacing=”0″/>
</process>
</operator>
</process>
The second way is that belowhttp://www.youtube.com/watch?v=cVjyJ9Ag0_0

http://rapid-i.com/videos/tutorials/02_rm5_data_import/rm5_data_import.html

The Livehoods Project: Utilizing Social Media to Understand the Dynamics of a City

Studying the social dynamics of a city on a large scale has tra- ditionally been a challenging endeavor, requiring long hours of observation and interviews, usually resulting in only a par- tial depiction of reality. At the same time, the boundaries of municipal organizational units, such as neighborhoods and districts, are largely statically defined by the city government and do not always reflect the character of life in these ar- eas. To address both difficulties, we introduce a clustering model and research methodology for studying the structure and composition of a city based on the social media its res- idents generate. We use data from approximately 18 million check-ins collected from users of a location-based online so- cial network. The resulting clusters, which we call Livehoods, are representations of the dynamic urban areas that comprise the city. We take an interdisciplinary approach to validating these clusters, interviewing 27 residents of Pittsburgh, PA, to see how their perceptions of the city project onto our findings there. Our results provide strong support for the discovered clusters, showing how Livehoods reveal the distinctly charac- terized areas of the city and the forces that shape them.

http://videolectures.net/icwsm2012_cranshaw_city/

Neighborhood boundaries based on social media activity

Researchers at the School of Computer Science at Carnegie Mellon University investigate the structure of cities in Livehoods, using foursquare check-ins.

The hypothesis underlying our work is that the character of an urban area is defined not just by the the types of places found there, but also by the people who make the area part of their daily routine. To explore this hypothesis, given data from over 18 million foursquare check-ins, we introduce a model that groups nearby venues into areas based on patterns in the set of people who check-in to them. By examining patterns in these check-ins, we can learn about the different areas that comprise the city, allowing us to study the social dynamics, structure, and character of cities on a large scale.

It’s most interesting when you click on location dots. A Livehood is highlighted and a panel on the top right tells you what the neighborhood is like, related neighborhoods, and provides stats like hourly and daily pulse and a breakdown location categories (for example, food and nightlife). Does foursquare have anything like this tied into their system? They should if they don’t.

There’s only maps for San Francisco, New York City, and Pittsburgh right now, but I’m sure there are more to come.

Want more on the clustering behind the maps? Here’s the paper [pdf].livehoods_icwsm12livehoods

craiggers

If you’ve ever used craigslist before then you know, it’s just not very good. That’s not to say you can’t find what you need on there. The site is full of amazing deals and goods and services of all kinds, but navigating it involves opening new browser tab after browser tab, going back and forth and generally losing your way.

 

For those of you who are tired of the craigslist user experience from circa 1996, head on over to craiggers, the site that lets you interact with Craigslist the way you ought to.

As the craiggers’ tagline says, the site is simply “craigslist data, better than craigslist!” It allows users a number of simple functions you’ve likely unconsciously wished for for years but didn’t even realize you were desperately missing. For example, the site separates navigation into a number of columns, so you don’t need to open listings in new tabs or hit the back and forward buttons all the time. Click on a result and it loads in the same page. Hit the down arrow or click on a different entry and it loads in the right most column without ever leaving the page.

Beyond navigation – which is quite an improvement already – craiggers adds on a new layer of functionality when it comes to searching. No longer do you have to search simply within a single geographic area. As the site points out, “there are cases when searching outside your immediate community benefits both seekers and providers,” giving the example of searching for a stolen bike or adopting a dog. When you search on craiggers, you can specify that you want to see results from neighboring locations and it will show you those as well.

Furthermore, if you wanted to search craigslist repeatedly, say for a job or an apartment, craiggers will not only let you save the search to repeat later, but it will also send you an email notification twice a day of results.

craiggers: An Example for Developers

For those of you out there interested in more than simply craiglist searches, there’s another interesting aspect to craiggers – it was built using the 3taps API. We firstwrote about 3taps last month when the company launched at the Data 2.0 conference, explaining how the company wanted to “democratize the exchange of data.”

Through the 3taps API, data from craigslist, eBay, Indeed, Etsy, Amazon and a host of other services is available in real-time, making mash-ups like this possible. Craiggers was built by the 3taps team as an example of the potential of its offering and we think it makes quite an argument.

Craigslist rentals and boundary layers API on rentrent.org (Alpha)

This API originated from my ‘Craigslist Rentals on Map’ websitewww.rentrent.org

As you can see, I haven’t put a lot of efforts to make this site pretty. I just wanted to make it usable. Making this API public is an effort to encourage others to create better websites than mine.

This API takes away the pain of crowling, mining, geocoding and indexing Craigslist data and provides very simple web service calls to fetch the data. This way you can focus on creating a great rentals classifieds application without worrying about GIS bit of it.
You can use this API with Google Maps, Microsoft Bing maps, Yahoo maps etc.


The API supports 2 calls:

Ads.aspx

Service URL: http://www.rentrent.org/RENT/Ads.aspx

Parameter Description
xmin Longitude (min)
ymin Latitude (min)
xmax Longitude (max)
ymax Latitude (max)
bd Number of Bedrooms
ba Number of Bathrooms
type 1: For room rentals
2: For apartment and houses
maxrecords If not passed, maxrecords is set to 250.
If you pass maxRecords=1500,
you can retrieve bulk data using one request.
throwErrorIfOverLimit If not passed, this is ‘true’
You can set throwErrorIfOverLimit=false to get the top ‘maxrecords’ instead of error.
callback Name of a javascript function you want to be called back.

Example URL:
http://www.rentrent.org/RENT/Ads.aspx?xmin=-118.01925659179687&ymin=33.71948521132481&xmax=-117.68142700195314&ymax=33.85644642218431&bd=&ba=&pets=-1&type=2&throwErrorIfOverLimit=false&callback=xxx

The output will be in JSON format. (If you need specific API, send an email on rentrentorg@gmail.com and I will try to speed up the documentation process.)


Map.aspx

Service URL:http://www.rentrent.org/BUY/Map.aspx

Parameter Description
TID Tile ID or Quad Key. (%4 in VE map)
GridX X value of a tile (For google map)
GridY y value of a tile (For google map)
GridZ Zoom level (For google map)
Layer Name of a layer

1. Neighborhoods
2. ElementarySchoolDistricts
3. SecondarySchoolDistricts
4. UnifiedSchoolDistricts

 

Example URL:

http://www.rentrent.org/BUY/Map.aspx?TID=0230121301213&Layers=Neighborhoods

hhttp://www.rentrent.org/BUY/Map.aspx?TID=0230121333&Layers=UnifiedSchoolDistricts

License/Disclaimer/Terms of Use:http://www.rentrent.org/BUY/Disclaimer.html