This post is a repository of useful tutorials, workshops, demonstrations, and reference for spatial analytics and modeling in the Postgres, PostGIS, and QGIS. Please add links to this post as you discover useful sites (include a brief description of the site).
- w3schools – a very useful site on the fundamentals of (non-spatial) SQL queries, including tutorials on joins and SQL functions such as AVG()
- boundlessgeo – an excellent primer on many of the PostGIS spatial functions, great place to get your feet wet with PostGIS spatial analytics/modeling
- spatialthoughts – this site provides tutorials in a wide range of techniques in QGIS
- linfinity – this site provides a number of tutorials clearly illustrating spatial analytic approaches using QGIS
- “Geospatial Analysis – 4th Edition” by de Smith, Goodchild, Longley – this textbook provides the conceptual and geometric principles of spatial analytics and modeling – an excellent resource.
Laster semester we utilize two kinds of clustering algorithms to do our analyze. The first one is distance based clustering, the second one is grid based clustering. Although logically they are very similar, both of them are forming clusters based on distances, they are different in doing this, and results can be different. Below is the logic of these 2 algorithms.
A. distance based clustering:
1. Buffering every single points with a distance which can be set by analyzers.
2. Merging circles which have larger overlaps than the setting number into clusters.
B. Grid based clustering
1. Set the distance of grid lines. Divide the target area by grid.
2. Locate points into cells, then look at neighbor cells of target cell. If there is point in theses neighbor cells, merge these points as core of a cluster.
3. Making convex hulls based on these cores of cluster. There is a parameter through which you can control the size of clusters.
Blow is the SQL for Grid based clustering
WITH clstrtags AS ( SELECT *, tag.geom as tgeom FROM gridcluster(30,’urbantag’,’geom’) as grid
JOIN urbantag as tag
ORDER BY rid,cid
counts AS (SELECT count(tagid) as count, clusterid, activity FROM clstrtags GROUP BY clusterid, activity),
countss AS (SELECT count(tagid) as count, clusterid FROM clstrtags GROUP BY clusterid)
select counts.clusterid, counts.activity as act, counts.count as actct,countss.count as tagid, counts.count/(countss.count + 0.00) as percentage
from counts join countss
order by clusterid
This post explains two ways to use spatial joins between tables that have point and polygon geom fields. The first approach uses built in tools in QGIS to quickly make basic joins for initial analysis. The second approach uses PostGIS SQL queries with the potential to do far more sophisticated joins and data analysis.
QGIS Vector/Geometry Tools:
QGIS provides a number of tools for analyzing and rebuilding vector layers in the “Vector” tab. This includes the “Geometry Tools/Multipart to Singleparts” command, which breaks a multipolygon into individual polygon elements. The “Analysis Tools/Points In Polygons” command is shown above. When a polygon layer and a points layer are selected, it will produce a new shapefile with the polygons + a new field that is a count of the number of points in each polygon. Here is a detailed walkthrough for this tool.
These queries use an INNER JOIN on rows where the result of the “st_within()” postgis function == TRUE. This function returns TRUE if the first geometry (“pts.geom”) is completely within the second geometry (“blocks.geom”).
This code will create a new table with all the fields from the point table for all of the points that fall within one of the polygons, and add an additional field that shows an ID for the polygon in which that point falls:
SELECT pts.*, blocks.gid as blockID
FROM “Detroit_BP1995_point” AS pts
INNER JOIN “Detroit_CityMap3_region_individualPoly” AS blocks
ON st_within(pts.geom, blocks.geom) AS result
This code joins the point table created by the SQL above to the original polygon table, generating a new polygon table with a count of points within each polygon. This allows for more sophisticated calculations, for instance, normalizing counts against the size of each polygon, etc. (This SQL needs some more work):
SELECT blockID, number_of_permits, total_cost, average_cost, geom FROM(
SELECT pts.blockID, COUNT(blockID) AS number_of_permits, SUM(“COSTBP”) AS total_cost, AVG(“COSTBP”) AS average_cost
FROM detroitTest04b as pts
LEFT JOIN “Detroit_CityMap3_region_individualPoly” as blocks
ON blocks.gid = pts.blockID
GROUP BY blockID) AS results
LEFT JOIN “Detroit_CityMap3_region_individualPoly” as blocks
ON results.blockID = blocks.gid
PgRouting analyzes graphs to find the shortest path between two nodes through edges. In a street network, you can think of nodes as intersections and edges as the streets themselves. I will be using the Chicago street network, which can be found here. I have also uploaded it to the server, named chicagostreets.
1. In order to develop data capable of using PgRouting a source and target geometry must be set up at the start and end of each road. These will serve as the start and end destination points. To perform this follow the “Creating a routable road network” section of this tutorial (blog also has many helpful QGIS tutorials). Another thing to get familiar with in SQL is the WITH Clause. This allows you to reference other SQLs within one query.
2. Once your network is made you can start routing. Follow this tutorial to get multiple options for routing. pgr_dijkstra and pgr_kdijkstraPath are two that work well for visualizing one path or one startpoint to many paths.
3. By joining the geometries in your network to the results of your routing you can import this into QGIS and see your paths.
To reference my SQL query from the tutorial in class, click link.
We created clusters (convex hulls) of venues making distinctions between the venues by categories. We started off by creating a mathematical algorithm that combines multiple equations:
- We needed to determine a method of comparison for all the venues. Our method used the average distance between all the venues. We used the average distance, which we calculated by comparing the distance from one venue to all other venues and doing this for all the venues and then dividing the number of venues. In essence, it is a permutation with repetition calculation.
- After we calculated the average distance we set a threshold divider of the average distance to determine the size of the clusters that we wanted to use.
- Then we had to match the venue points of comparison to cluster only by category whereby venue points would only cluster when one venue category was equal to another venue category.
- The next step was to create the shape of the cluster creating a geometry polygon to import as a PostGIS shapefile.
CREATE TABLE avgdistances2 AS
SELECT * FROM (SELECT f1.gid as v1id, f1.fsid, f2.gid as v2id, st_distance(f1.geom,f2.geom) AS dist, f1.upperlevelcat as f1ul, f2.upperlevelcat as f2ul
FROM fsvenuewithupperlevelcat2 as f1
CROSS JOIN fsvenuewithupperlevelcat2 as f2 WHERE f1.checkinct > 200 and f1.gid<>f2.gid) AS q1
WHERE q1.dist < ( SELECT SUM(avgdist)/count(hack) as avgdisttotal
FROM (SELECT f1.gid, 1 as hack, SUM(st_distance(f1.geom,f2.geom))/COUNT(f1.gid) AS avgdist
FROM fsvenue as f1
CROSS JOIN fsvenue as f2
WHERE f1.gid <> f2.gid GROUP BY f1.gid) as subq GROUP BY subq.hack)/4 AND f1ul=f2ul ORDER BY q1.v1id;
CREATE TABLE fsvenue_cathulls( gid serial NOT NULL, fsid character varying(256), CONSTRAINT fsvenue_hulls_pkey PRIMARY KEY (gid) ) WITH ( OIDS=FALSE );
ALTER TABLE fsvenue_hulls OWNER TO poweruser; GRANT ALL ON TABLE hullmp3 TO poweruser;
INSERT INTO fsvenue_hulls (fsid,geom) SELECT ad.fsid, st_collect(v.geom) FROM avgdistances2 AS ad Left JOIN fsvenue AS v ON ad.v2id = v.gid WHERE v1id <> 42 GROUP BY ad.fsid ORDER BY ad.fsid;
CREATE VIEW fsvenue_cathulls AS SELECT v.gid, v.fsid, v.upperlevelcat, st_convexhull(h.geom)::geometry(‘Polygon’, 3435) as geom FROM fsvenue_hulls as h LEFT JOIN fsvenuewithupperlevelcat2 as v ON h.fsid = v.fsid WHERE st_npoints(h.geom) > 3;
SELECT * FROM fsvenue_cathulls;
Import PostGIS file to QGIS
Result from category 1: Arts & Entertainment
Bucktown & Wicker Park
In order to get pg routing to work, first a transportation network must be formed with a source and target column created to specify which start and end points are used. The SQL below creates a network in a buffer of a 1 mile radius and examines the Damen CTA Stop. To alter the name of the table created, change the buffer distance, or to change the CTA Stop, change the red values in the SQL below.
CREATE TABLE damennetwork AS WITH ctastop AS (SELECT * FROM cta_railstations WHERE gid = 32), transpo AS (SELECT row_number() OVER (ORDER BY transportation.gid)::integer AS gid, transportation.gid AS id, transportation.street_nam AS name, transportation.length AS cost, transportation.geom, st_startpoint(ST_LineMerge(transportation.geom)) as source, st_endpoint(ST_LineMerge(transportation.geom)) as target FROM ctastop LEFT JOIN transportation ON st_within(transportation.geom, st_setsrid(st_buffer(ctastop.geom, 5280), 3435))), nodes AS (SELECT DISTINCT transpo.source AS gid FROM transpo UNION SELECT DISTINCT transpo.target AS gid FROM transpo) SELECT transpo.gid, transpo.id, transpo.name, transpo.cost, transpo.geom, source.id as source, target.id as target FROM transpo JOIN (SELECT row_number() OVER (ORDER BY nodes.gid)::integer AS id, nodes.gid AS geom FROM nodes GROUP BY nodes.gid) AS source ON transpo.source = source.geom JOIN (SELECT row_number() OVER (ORDER BY nodes.gid)::integer AS id, nodes.gid AS geom FROM nodes GROUP BY nodes.gid) AS target ON transpo.target = target.geom
To be able to find a route with this SQL use the below SQL. Input the name of your table used in the previous SQL where there are the “damennetwork” in red. To change to start and end points, change to “502” and “341” respectfully.
SELECT seq,id1 as node, id2 as edge, route.cost, damennetwork.name as streetname, damennetwork.geom FROM pgr_dijkstra(' SELECT gid AS id, source::integer, target::integer, cost::double precision AS cost FROM damennetwork', 502, 341, false, false ) AS route LEFT JOIN damennetwork ON route.id2 = damennetwork.gid;
What we have been working on is using the CTA stop as a start point and all the venues as the end points. The first thing I did was take all the venues in the same buffer as the network and corresponded them to their closest edge (the target). Different buffers and train stops can be chosen by the the red numbers.
WITH ctastop AS (SELECT * FROM cta_railstations WHERE gid = 32), p AS (SELECT venues.gid AS gid, venues."legal name" AS company, venues."doing busi" AS name, venues.address AS address, venues.latitude AS lat, venues.longitude AS lon, venues.geom AS geom FROM ctastop LEFT JOIN venues ON st_within(venues.geom, st_buffer(ctastop.geom, 5280))) SELECT rcost.*, p_to_l.targetID, p_to_l.geom FROM (SELECT p.gid as restaurantID, min(st_distance(p.geom,l.geom)) as distance FROM p, damennetwork AS l GROUP BY restaurantid) AS rcost LEFT JOIN (SELECT p.gid as restaurantID, l.target as targetID, l.geom, st_distance(p.geom,l.geom) as distance FROM p, damennetwork AS l) AS p_to_l ON rcost.distance = p_to_l.distance AND rcost.restaurantID = p_to_l.restaurantID
The same must be done to find the start point (source) which is the CTA stop. Change to red value to select a different trainstop.
SELECT * FROM (WITH ctastop AS (SELECT * FROM cta_railstations WHERE gid = 32) SELECT row_number() OVER (ORDER BY sourceid.distance)::integer AS gid, sourceid.* FROM ( SELECT p.gid as restaurantID, l.source as sourceID, l.geom, st_distance(p.geom,l.geom) as distance FROM ctastop AS p, damennetwork AS l) as sourceid) as source WHERE source.gid = 1
This gave me a result of a 292 start point, which I will be plugging into the next SQL (in red). The other numbers are the results from the venue target table formed two SQLs ago.
CREATE TABLE routetestwm AS SELECT seq,id1 as path, id2 as node, id3 as edge, route.cost, damennetwork.name, damennetwork.geom FROM pgr_kdijkstraPath(' SELECT gid AS id, source::integer, target::integer, cost::double precision AS cost FROM damennetwork', 292, array[336,457,227,179,179,291,351,515,424,244,427,250,349,457,138,291,291,263,230,117,115,208,325,594,284,301,368,116,91,471,543,398,251,251,251,230,230,365,138,463,404,301,356,212,342,203,203,342,478,286,292,519,41,242,242,242,242,242,242,356,57,57,289,310,286,452,277,251,212,427,251,182,242,471,445,251,251,93,291,286,305,262,46,262,46,262,288,288,371,117,398,105,105,286], false, false ) AS route LEFT JOIN damennetwork ON route.id3 = damennetwork.gid;
This table can be brought into qgis as the network of paths from a CTA stop to any venue.