Tutorials and Resources for Spatial Analytics in PostGIS SQL and QGIS

This post is a repository of useful tutorials, workshops, demonstrations, and reference for spatial analytics and modeling in the Postgres, PostGIS, and QGIS. Please add links to this post as you discover useful sites (include a brief description of the site).

  1. w3schools – a very useful site on the fundamentals of (non-spatial) SQL queries, including tutorials on joins and SQL functions such as AVG()
  2. boundlessgeo – an excellent primer on many of the PostGIS spatial functions, great place to get your feet wet with PostGIS spatial analytics/modeling
  3. spatialthoughts – this site provides tutorials in a wide range of techniques in QGIS
  4. linfinity – this site provides a number of tutorials clearly illustrating spatial analytic approaches using QGIS
  5. Geospatial Analysis – 4th Edition” by de Smith, Goodchild, Longley – this textbook provides the conceptual and geometric principles of spatial analytics and modeling – an excellent resource.

Linking Point Data to Polygon Data Using QGIS and PostGIS SQL

This post explains two ways to use spatial joins between tables that have point and polygon geom fields. The first approach uses built in tools in QGIS to quickly make basic joins for initial analysis. The second approach uses PostGIS SQL queries with the potential to do far more sophisticated joins and data analysis.

QGIS Vector/Geometry Tools:

Screen Shot 2014-01-26 at 9.47.30 PM

QGIS provides a number of tools for analyzing and rebuilding vector layers in the “Vector” tab. This includes the “Geometry Tools/Multipart to Singleparts” command, which breaks a multipolygon into individual polygon elements. The “Analysis Tools/Points In Polygons” command is shown above. When a polygon layer and a points layer are selected, it will produce a new shapefile with the polygons + a new field that is a count of the number of points in each polygon. Here is a detailed walkthrough for this tool.

PostgreSQL Queries:

These queries use an INNER JOIN on rows where the result of the “st_within()” postgis function == TRUE. This function returns TRUE if the first geometry (“pts.geom”) is completely within the second geometry (“blocks.geom”).

This code will create a new table with all the fields from the point table for all of the points that fall within one of the polygons, and add an additional field that shows an ID for the polygon in which that point falls:

SELECT pts.*, blocks.gid as blockID
FROM “Detroit_BP1995_point” AS pts
INNER JOIN “Detroit_CityMap3_region_individualPoly” AS blocks
ON st_within(pts.geom, blocks.geom) AS result

This code joins the point table created by the SQL above to the original polygon table, generating a new polygon table with a count of points within each polygon. This allows for more sophisticated calculations, for instance, normalizing counts against the size of each polygon, etc. (This SQL needs some more work):

SELECT blockID, number_of_permits, total_cost, average_cost, geom FROM(
SELECT pts.blockID, COUNT(blockID) AS number_of_permits, SUM(“COSTBP”) AS total_cost, AVG(“COSTBP”) AS average_cost
FROM detroitTest04b as pts
LEFT JOIN “Detroit_CityMap3_region_individualPoly” as blocks
ON blocks.gid = pts.blockID
GROUP BY blockID) AS results
LEFT JOIN “Detroit_CityMap3_region_individualPoly” as blocks
ON results.blockID = blocks.gid

Creating Convex Hulls with SQL to QGIS: Part 2

We created clusters (convex hulls) of venues making distinctions between the venues by categories. We started off by creating a mathematical algorithm that combines multiple equations:

  • We needed to determine a method of comparison for all the venues. Our method used the average distance between all the venues. We used the average distance, which we calculated by comparing the distance from one venue to all other venues and doing this for all the venues and then dividing the number of venues. In essence, it is a permutation with repetition calculation.
  • After we calculated the average distance we set a threshold divider of the average distance to determine the size of the clusters that we wanted to use.
  • Then we had to match the venue points of comparison to cluster only by category whereby venue points would only cluster when one venue category was equal to another venue category.
  • The next step was to create the shape of the cluster creating a geometry polygon to import as a PostGIS shapefile.

Step 1:

CREATE TABLE avgdistances2 AS

SELECT * FROM  (SELECT f1.gid as v1id, f1.fsid, f2.gid as v2id, st_distance(f1.geom,f2.geom) AS dist, f1.upperlevelcat as f1ul, f2.upperlevelcat as f2ul

FROM fsvenuewithupperlevelcat2 as f1

CROSS JOIN fsvenuewithupperlevelcat2 as f2  WHERE f1.checkinct > 200 and f1.gid<>f2.gid) AS q1

WHERE q1.dist < ( SELECT SUM(avgdist)/count(hack) as avgdisttotal    

FROM     (SELECT f1.gid, 1 as hack, SUM(st_distance(f1.geom,f2.geom))/COUNT(f1.gid) AS avgdist

FROM fsvenue as f1

CROSS JOIN fsvenue as f2

WHERE  f1.gid <> f2.gid     GROUP BY f1.gid) as subq    GROUP BY subq.hack)/4 AND f1ul=f2ul  ORDER BY q1.v1id;

Step 2:

CREATE TABLE fsvenue_cathulls(   gid serial NOT NULL,   fsid character varying(256),   CONSTRAINT fsvenue_hulls_pkey PRIMARY KEY (gid) ) WITH (   OIDS=FALSE );

ALTER TABLE fsvenue_hulls  OWNER TO poweruser; GRANT ALL ON TABLE hullmp3 TO poweruser;

SELECT AddGeometryColumn(‘fsvenue_hulls’,’geom’,3435,’MULTIPOINT‘,2);

INSERT INTO fsvenue_hulls (fsid,geom) SELECT ad.fsid, st_collect(v.geom)  FROM avgdistances2 AS ad Left JOIN fsvenue AS v ON ad.v2id = v.gid WHERE v1id <> 42 GROUP BY ad.fsid ORDER BY ad.fsid;

CREATE VIEW fsvenue_cathulls AS SELECT v.gid, v.fsid, v.upperlevelcat, st_convexhull(h.geom)::geometry(‘Polygon’, 3435) as geom FROM fsvenue_hulls as h LEFT JOIN fsvenuewithupperlevelcat2 as v ON h.fsid = v.fsid WHERE st_npoints(h.geom) > 3;

SELECT * FROM fsvenue_cathulls;

Step 3:

Import PostGIS file to QGIS

Result from category 1: Arts & Entertainment

Bucktown & Wicker Park

arts

 

SQL and PG routing

In order to begin pgrouting you must first set up a a table containing all of your nodes and all of your edges.

My process started with taking all of the roads within a mile buffer of the train stop, finding the nodes at the intersections, then combining the data to create a usable network for routing. A simple tutorial on this can be found under the Create a Routable Road Network section on this blog post.

My interpretation is combined into one step below. Where it says damennetwork is where I name my table. This table will be referenced in the pgRouting SQL later. Anywhere that the SQL says 32 is where I am referencing the gid of the damen train stop. This can be changed to any CTA train stop in Chicago.

CREATE TABLE damennetwork AS
(SELECT transpo.gid, transpo.id, transpo.name, transpo.cost, transpo.geom, source.id as source, target.id as target
FROM
(SELECT
row_number() OVER (ORDER BY transportation.gid)::integer AS gid,
transportation.gid AS id,
transportation.street_nam AS name,
transportation.length AS cost,
transportation.geom,
transportation.geom AS source,
transportation.geom AS target
FROM
(SELECT * FROM cta_railstations
WHERE gid = 32) AS damen
LEFT JOIN transportation
ON st_within(transportation.geom,
st_setsrid(st_buffer(damen.geom, 5280), 3435)))
AS transpo
JOIN
(SELECT row_number() OVER (ORDER BY nodes.gid)::integer AS id,
nodes.gid AS geom
FROM (
SELECT DISTINCT transpo.source AS gid FROM
(SELECT
row_number() OVER (ORDER BY transportation.gid)::integer AS gid,
transportation.gid AS id,
transportation.street_nam AS name,
transportation.length AS cost,
transportation.geom,
transportation.geom AS source,
transportation.geom AS target
FROM
(SELECT * FROM cta_railstations
WHERE gid = 32) AS damen
LEFT JOIN transportation
ON st_within(transportation.geom,
st_setsrid(st_buffer(damen.geom, 5280), 3435))) AS transpo
UNION
SELECT DISTINCT transpo.target AS gid FROM
(SELECT
row_number() OVER (ORDER BY transportation.gid)::integer AS gid,
transportation.gid AS id,
transportation.street_nam AS name,
transportation.length AS cost,
transportation.geom,
transportation.geom AS source,
transportation.geom AS target
FROM
(SELECT * FROM cta_railstations
WHERE gid = 32) AS damen
LEFT JOIN transportation
ON st_within(transportation.geom,
st_setsrid(st_buffer(damen.geom, 5280), 3435))) AS transpo
) AS nodes
GROUP BY nodes.gid)
AS source ON transpo.source = source.geom
JOIN
(SELECT row_number() OVER (ORDER BY nodes.gid)::integer AS id,
nodes.gid AS geom
FROM (
SELECT DISTINCT transpo.source AS gid FROM
(SELECT
row_number() OVER (ORDER BY transportation.gid)::integer AS gid,
transportation.gid AS id,
transportation.street_nam AS name,
transportation.length AS cost,
transportation.geom,
transportation.geom AS source,
transportation.geom AS target
FROM
(SELECT * FROM cta_railstations
WHERE gid = 32) AS damen
LEFT JOIN transportation
ON st_within(transportation.geom,
st_setsrid(st_buffer(damen.geom, 5280), 3435))) AS transpo
UNION
SELECT DISTINCT transpo.target AS gid FROM
(SELECT
row_number() OVER (ORDER BY transportation.gid)::integer AS gid,
transportation.gid AS id,
transportation.street_nam AS name,
transportation.length AS cost,
transportation.geom,
transportation.geom AS source,
transportation.geom AS target
FROM
(SELECT * FROM cta_railstations
WHERE gid = 32) AS damen
LEFT JOIN transportation
ON st_within(transportation.geom,
st_setsrid(st_buffer(damen.geom, 5280), 3435))) AS transpo
) AS nodes
GROUP BY nodes.gid)
AS target ON transpo.target = target.geom);

When plugged into QGIS, this is the result:

damennetwork

 

 

…. TO BE CONTINUED

Group 1: Where are we?

Ideas

The launch point of our conversation regarding the prototype which is focused on tracking people and things, hovered around the idea of “dynamic transportation networks”.  An interesting and ambitious idea to say the least, but perhaps the idea is more solution than design tool.  To open up the proposition and get at some of the fundamentals that might drive the components of such a solution toward design tools, lets abstract back from bus’s + people.

At the very least one would have to know about the following things:

  • where groups of people were at and where they wanted to go
  • the state of the network (roads) on which the buses operated
  • where buses are within the network and what the state of those buses are

To generate dynamic routes for the buses would then involve an algorithm which could find and evaluate paths within the network.  This is not unlike systems currently in use by emergency management services to dispatch people and resources for things such as building fires.

To continue the abstraction we could say that the three things we need to be able to do are

  • identify properties of agents which occupy the network in continuous and discontinuous ways.  What or who is at or near a particular node or edge within the network and what properties are exposed to them.
  • manage the nodes and edges of the network, what are their properties? how do we logically traverse the edges of the network?
  • formulate singular paths or sub networks from within the network.  Consider the way in which a single path through a collection of network nodes, is itself a network, if only a simple one

In this way, the underlying mechanisms that enable us to find say the fastest, or shortest route for a bus might be the same mechanisms that we would use to form ad-hoc networks through nearly any system which can be considered as a network.  Perhaps the prototyped design tool might allow it’s user to generate or identify particular sub networks which a person or thing is connected to based on where they are at in space?  Of course one of the more interesting conditions of this type of network thinking, still has to do with our “position” or “location” in a network, but perhaps those networks might be logical networks but not physical networks.  For instance, where are you located with respect to the social or professional network that is the Architecture community of Chicago, or of the world.  Through forming, collecting and analyzing these sub networks over time perhaps we can begin to understand not only where we are or what we are connected to, but how the super networks might be better organized.  What kinds of feedback loops can be  established here?

Ultimately this question of “Where Is Something”, physically, logically, semantically, is at the heart of what will see in the emerging paradigm of contextual computing.

Technology

Graphs

The graph, a term for network from the area of mathematics referred to as graph theory, describes a collection of nodes and the edges which connect them.  Graph theory is one of the most critical and fundamental aspects to computation and data.  Much of the underlying data structures for modern software rely on graph representations of system components.   Learning how to logically or computationally move or traverse through graphs is essential to searching, sorting, analyzing, and generating: algorithm’s such as Depth-first searchBreadth-first searchDijkstra’s algorithmNearest neighbour algorithm.  More specifically algorithm’s such as A* pathfinding algorithms, which underpin much of the artificial intelligence world from video games to data mining, enable goal or objective based navigation of complex networks.

Networks in Databases
To get started operating on data networks it is imperative that we properly store our networked data.  While we will likely not be utilizing explicitly graph oriented databases, it is worth mentioning that databases organized around the premise of graph’s exist.  In the studio and seminar we will be focusing on the utilization of the postgres/postgis relational database.  Fortunately the prevalence of networks in geospatial thinking has lead to the development of some critical tools which aid in our analysis of networks using postgis data structures.

pgRouting extends the PostGIS / PostgreSQL geospatial database to provide geospatial routing functionality.

Advantages of the database routing approach are:

  • Data and attributes can be modified by many clients, like Quantum GIS and uDigthrough JDBC, ODBC, or directly using Pl/pgSQL. The clients can either be PCs or mobile devices.
  • Data changes can be reflected instantaneously through the routing engine. There is no need for precalculation.
  • The “cost” parameter can be dynamically calculated through SQL and its value can come from multiple fields or tables.

In addition a series of tools for importing data such as street networks have been developed for pgRouting enabling aquisition from sources such as Open Street MapOSM2PGSQL, and OSM2PGROUTING
Tutorials: beginners guide, workshop (extensive)

Startup technologies (first prototype)
Without presuming too much about the prototype, it’s safe to say that the system would take some kind of input (perhaps from the physical world i.e. sensor or maybe through an interactive interface), analysis or computation would be performed and some type of network oriented output produced.

Because of the immediacy to topics of transportation and logistics which spear headed the project, I would suggest that you simply begin with the transportation network data which we have access to through the data portal and OSM.  A first pass tech prototype would therefore include:

  • the process of importing this data into pgRouting friendly structures
  • demonstrable querying and route formation using pgRouting SQL queries
  • the representation of generated routes within qgis or google earth.
  • Additional considerations might include the integration of CTA api data into our database.  This would involve a simple app which could parse the XML data from the CTA’s system into table structures in our PostGres db for inclusion with base network data.

From this point, we can validate functionality and begin to think more laterally about the criteria used to create and/or interact with the data.  We start with shape based networks (road center lines), developing an understanding of how pgRouting is working, then begin to explore other networks and forming networks on the fly rather than as shape file imports.

Research Terms

  • Graph Theory
  • Topology
  • Koenigsberg Bridge Problem
  • Dijkstra’s Algorithm
  • A* Pathfiinding algorithms
  • CTA Transportation API
  • XML to PGSQL
  • pgRouting
  • OSM2PGROUTING, OSM2PGSQL
  • Internet of Things
  • iBeacon, NFC, BlueToothLE
  • Mesh networking