Chicago Neighborhood Analysis Indicators

Class classification

1. Working class

2.Service Class

3. Creative Class (creative class diagram)

Compared Neighborhood

  • Wicker Park (Creative-hipster subgroup)
  • Pilsen (Working class neighborhood)
  • Englewood (Service Class Neighborhood)
  • Lakeview/Lincoln Park (Established Creative Class Neighborhood)
  • Logan Square (Potential Creative Class Neighborhood)

Chicago Neighborhood Indicators (chicago neighborhood analysis)

Household:

  • size
  • ethnic diversity
  • education level (>bachelor)
  • income
  • job

Existing building fabric:

  • typology
  • rent cost
  • house value
  • ownership
  • density

Transport and infrastructure:

  • distance to public transportation
  • traffic flow
  • parking availability
  • bicycle lane

local context:

  • restaurant
  • bar
  • coffee shop
  • grocery shop
  • farmers market

zoning:

  • residential
  • office
  • commercial

public facility/service:

  • public school
  • safety
  • health

green space:

  • recreational park
  • playground
  • dog park

Comparison Diagram

 chicago neighborhood comparison diagram

calculating bicycle lane percentage over neighborhood

with

rawdata as
(
–join the table, #of bike route in a neigh, total length of bike route in a neigh
select n.pri_neigh,
count(s.f_street) as totalroutes,
sum(st_length(s.geom)) as length,
sum(n.shape_area) as neigh_area
from jur_neighborhoods as n
left join cty_bikeroutes as s
on st_within(st_transform(s.geom,3435),n.geom)
group by pri_neigh
limit 100
)
–divide total lenght of bike route by neigh area
select pri_neigh,
sum(totalroutes),
sum(length),
(r.length/r.neigh_area)*100 as percentage
from rawdata as r
group by percentage, pri_neigh
order by percentage
;

Result

bicycle percentage table

Naming Conventions

For the database it is important to set up conventions for naming our files so everyone knows what files are and it will be more organized. A few of us met today to define major categories and this is what we came up with. We can discuss this tomorrow further, so it is more suitable for everyones projects, but until then use this naming for the new database. Just a reminder… do not use spaces or upper case letters in your table names.

City Geom:
– ex. Sidewalks, City Limits, Streets, Building Footprints
– should ideally just be an “id” and “geom” column as this will be used mainly for visualizing in QGIS
– start layers in this category with cty_

Transportation:
– ex. Bus Routes, CTA Stops
– start layers in this category with trn_

Demographics:
– ex. Census by Block 2010, Income by Block 2000
– start layers in this category with dmo_

Landscape:
– ex. Parks, Trees
– start layers in this category with lnd_

Jurisdiction:
– ex. Police Districts, TIF Districts, Neighborhoods, Wards
– start layers in this category with jur_

Report:
– ex. Crime 2013, 311 Requests, Permits, Business Licenses
– start layers in this category with rpt_

Social:
– ex. Facebook API, Yelp API
– start layers in this category with soc_

Agent:
– ex. Customer, Alderman, Dogs
– start layers in this category with agt_

conventions

Another place to get data: https://opendata.socrata.com/
(worldwide, but has all Chicago Data Portal data and then some)

SQL and PgRouting Continued

In order to get pg routing to work, first a transportation network must be formed with a source and target column created to specify which start and end points are used. The SQL below creates a network  in a buffer of a 1 mile radius and examines the Damen CTA Stop. To alter the name of the table created, change the buffer distance, or to change the CTA Stop, change the red values in the SQL below.

CREATE TABLE damennetwork AS
WITH ctastop AS
(SELECT * FROM cta_railstations
WHERE gid = 32),
transpo AS
(SELECT
row_number() OVER (ORDER BY transportation.gid)::integer AS gid,
transportation.gid AS id, 
transportation.street_nam AS name,
transportation.length AS cost,
transportation.geom,
st_startpoint(ST_LineMerge(transportation.geom)) as source,
st_endpoint(ST_LineMerge(transportation.geom)) as target
FROM ctastop
LEFT JOIN transportation
ON st_within(transportation.geom, 
st_setsrid(st_buffer(ctastop.geom, 5280), 3435))),
nodes AS
(SELECT DISTINCT transpo.source AS gid 
FROM transpo
UNION
SELECT DISTINCT transpo.target AS gid 
FROM transpo)
SELECT transpo.gid, transpo.id, transpo.name, transpo.cost, transpo.geom, source.id as source, target.id as target
FROM transpo
JOIN 
(SELECT row_number() OVER (ORDER BY nodes.gid)::integer AS id, 
nodes.gid AS geom
FROM nodes
GROUP BY nodes.gid)
AS source ON transpo.source = source.geom
JOIN 
(SELECT row_number() OVER (ORDER BY nodes.gid)::integer AS id, 
nodes.gid AS geom
FROM nodes
GROUP BY nodes.gid) 
AS target ON transpo.target = target.geom

To be able to find a route with this SQL use the below SQL. Input the name of your table used in the previous SQL where there are the “damennetwork” in red. To change to start and end points, change to “502” and “341” respectfully.

SELECT seq,id1 as node, id2 as edge, route.cost, damennetwork.name as streetname, damennetwork.geom 
FROM pgr_dijkstra('
 SELECT gid AS id,
 source::integer,
 target::integer,
 cost::double precision AS cost
 FROM damennetwork', 502, 341, false, false ) 
AS route
LEFT JOIN damennetwork
ON route.id2 = damennetwork.gid;

What we have been working on is using the CTA stop as a start point and all the venues as the end points. The first thing I did was take all the venues in the same buffer as the network and corresponded them to their closest edge (the target). Different buffers and train stops can be chosen by the the red numbers.

WITH ctastop AS
(SELECT * FROM cta_railstations
WHERE gid = 32),
p AS
(SELECT
venues.gid AS gid, 
venues."legal name" AS company,
venues."doing busi" AS name,
venues.address AS address,
venues.latitude AS lat,
venues.longitude AS lon,
venues.geom AS geom
FROM 
ctastop
LEFT JOIN venues
ON st_within(venues.geom, 
st_buffer(ctastop.geom, 5280)))
SELECT rcost.*, p_to_l.targetID, p_to_l.geom FROM
(SELECT p.gid as restaurantID,
min(st_distance(p.geom,l.geom)) as distance
FROM p, damennetwork AS l
GROUP BY restaurantid) 
AS rcost
LEFT JOIN 
(SELECT p.gid as restaurantID, l.target as targetID, l.geom,
st_distance(p.geom,l.geom) as distance
FROM p, damennetwork AS l) 
AS p_to_l
ON rcost.distance = p_to_l.distance
AND rcost.restaurantID = p_to_l.restaurantID

The same must be done to find the start point (source) which is the CTA stop. Change to red value to select a different trainstop.

SELECT * FROM (WITH ctastop AS
(SELECT * FROM cta_railstations
WHERE gid = 32)
SELECT row_number() OVER (ORDER BY sourceid.distance)::integer AS gid,
sourceid.* FROM (
SELECT p.gid as restaurantID, l.source as sourceID, l.geom,
st_distance(p.geom,l.geom) as distance
FROM ctastop AS p, damennetwork AS l) as sourceid) as source
WHERE source.gid = 1

This gave me a result of a 292 start point, which I will be plugging into the next SQL (in red). The other numbers are the results from the venue target table formed two SQLs ago.

CREATE TABLE routetestwm AS
SELECT seq,id1 as path, id2 as node, id3 as edge, route.cost, damennetwork.name, damennetwork.geom
FROM pgr_kdijkstraPath('
 SELECT gid AS id,
 source::integer,
 target::integer,
 cost::double precision AS cost
 FROM damennetwork', 292, array[336,457,227,179,179,291,351,515,424,244,427,250,349,457,138,291,291,263,230,117,115,208,325,594,284,301,368,116,91,471,543,398,251,251,251,230,230,365,138,463,404,301,356,212,342,203,203,342,478,286,292,519,41,242,242,242,242,242,242,356,57,57,289,310,286,452,277,251,212,427,251,182,242,471,445,251,251,93,291,286,305,262,46,262,46,262,288,288,371,117,398,105,105,286], 
 false, false ) 
AS route
LEFT JOIN damennetwork
ON route.id3 = damennetwork.gid;

This table can be brought into qgis as the network of paths from a CTA stop to any venue.

All Routes

SQL and PG routing

In order to begin pgrouting you must first set up a a table containing all of your nodes and all of your edges.

My process started with taking all of the roads within a mile buffer of the train stop, finding the nodes at the intersections, then combining the data to create a usable network for routing. A simple tutorial on this can be found under the Create a Routable Road Network section on this blog post.

My interpretation is combined into one step below. Where it says damennetwork is where I name my table. This table will be referenced in the pgRouting SQL later. Anywhere that the SQL says 32 is where I am referencing the gid of the damen train stop. This can be changed to any CTA train stop in Chicago.

CREATE TABLE damennetwork AS
(SELECT transpo.gid, transpo.id, transpo.name, transpo.cost, transpo.geom, source.id as source, target.id as target
FROM
(SELECT
row_number() OVER (ORDER BY transportation.gid)::integer AS gid,
transportation.gid AS id,
transportation.street_nam AS name,
transportation.length AS cost,
transportation.geom,
transportation.geom AS source,
transportation.geom AS target
FROM
(SELECT * FROM cta_railstations
WHERE gid = 32) AS damen
LEFT JOIN transportation
ON st_within(transportation.geom,
st_setsrid(st_buffer(damen.geom, 5280), 3435)))
AS transpo
JOIN
(SELECT row_number() OVER (ORDER BY nodes.gid)::integer AS id,
nodes.gid AS geom
FROM (
SELECT DISTINCT transpo.source AS gid FROM
(SELECT
row_number() OVER (ORDER BY transportation.gid)::integer AS gid,
transportation.gid AS id,
transportation.street_nam AS name,
transportation.length AS cost,
transportation.geom,
transportation.geom AS source,
transportation.geom AS target
FROM
(SELECT * FROM cta_railstations
WHERE gid = 32) AS damen
LEFT JOIN transportation
ON st_within(transportation.geom,
st_setsrid(st_buffer(damen.geom, 5280), 3435))) AS transpo
UNION
SELECT DISTINCT transpo.target AS gid FROM
(SELECT
row_number() OVER (ORDER BY transportation.gid)::integer AS gid,
transportation.gid AS id,
transportation.street_nam AS name,
transportation.length AS cost,
transportation.geom,
transportation.geom AS source,
transportation.geom AS target
FROM
(SELECT * FROM cta_railstations
WHERE gid = 32) AS damen
LEFT JOIN transportation
ON st_within(transportation.geom,
st_setsrid(st_buffer(damen.geom, 5280), 3435))) AS transpo
) AS nodes
GROUP BY nodes.gid)
AS source ON transpo.source = source.geom
JOIN
(SELECT row_number() OVER (ORDER BY nodes.gid)::integer AS id,
nodes.gid AS geom
FROM (
SELECT DISTINCT transpo.source AS gid FROM
(SELECT
row_number() OVER (ORDER BY transportation.gid)::integer AS gid,
transportation.gid AS id,
transportation.street_nam AS name,
transportation.length AS cost,
transportation.geom,
transportation.geom AS source,
transportation.geom AS target
FROM
(SELECT * FROM cta_railstations
WHERE gid = 32) AS damen
LEFT JOIN transportation
ON st_within(transportation.geom,
st_setsrid(st_buffer(damen.geom, 5280), 3435))) AS transpo
UNION
SELECT DISTINCT transpo.target AS gid FROM
(SELECT
row_number() OVER (ORDER BY transportation.gid)::integer AS gid,
transportation.gid AS id,
transportation.street_nam AS name,
transportation.length AS cost,
transportation.geom,
transportation.geom AS source,
transportation.geom AS target
FROM
(SELECT * FROM cta_railstations
WHERE gid = 32) AS damen
LEFT JOIN transportation
ON st_within(transportation.geom,
st_setsrid(st_buffer(damen.geom, 5280), 3435))) AS transpo
) AS nodes
GROUP BY nodes.gid)
AS target ON transpo.target = target.geom);

When plugged into QGIS, this is the result:

damennetwork

 

 

…. TO BE CONTINUED

HOW TO IMPORT DATA TO RAPID MINER

You  just copy and paste this code into your rapid miner XML view(remove the previous xml code), click the green check symbol and switch back to the normal Process (diagram) view.

Code:
<?xml version=”1.0″ encoding=”UTF-8″ standalone=”no”?>
<process version=”5.2.008″>
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated=”true” compatibility=”5.2.008″ expanded=”true” name=”Process”>
<process expanded=”true” height=”145″ width=”279″>
<operator activated=”true” compatibility=”5.2.008″ expanded=”true” height=”60″ name=”Read CSV” width=”90″ x=”179″ y=”75″>
<parameter key=”csv_file” value=”C:\iris.data”/>
<parameter key=”column_separators” value=”,”/>
<parameter key=”first_row_as_names” value=”false”/>
<list key=”annotations”/>
<parameter key=”encoding” value=”windows-1252″/>
<list key=”data_set_meta_data_information”>
<parameter key=”0″ value=”att1.true.real.attribute”/>
<parameter key=”1″ value=”att2.true.real.attribute”/>
<parameter key=”2″ value=”att3.true.real.attribute”/>
<parameter key=”3″ value=”att4.true.real.attribute”/>
<parameter key=”4″ value=”att5.true.polynominal.attribute”/>
</list>
</operator>
<connect from_op=”Read CSV” from_port=”output” to_port=”result 1″/>
<portSpacing port=”source_input 1″ spacing=”0″/>
<portSpacing port=”sink_result 1″ spacing=”0″/>
<portSpacing port=”sink_result 2″ spacing=”0″/>
</process>
</operator>
</process>
The second way is that belowhttp://www.youtube.com/watch?v=cVjyJ9Ag0_0

http://rapid-i.com/videos/tutorials/02_rm5_data_import/rm5_data_import.html