Tutorials and Resources for Spatial Analytics in PostGIS SQL and QGIS

This post is a repository of useful tutorials, workshops, demonstrations, and reference for spatial analytics and modeling in the Postgres, PostGIS, and QGIS. Please add links to this post as you discover useful sites (include a brief description of the site).

  1. w3schools – a very useful site on the fundamentals of (non-spatial) SQL queries, including tutorials on joins and SQL functions such as AVG()
  2. boundlessgeo – an excellent primer on many of the PostGIS spatial functions, great place to get your feet wet with PostGIS spatial analytics/modeling
  3. spatialthoughts – this site provides tutorials in a wide range of techniques in QGIS
  4. linfinity – this site provides a number of tutorials clearly illustrating spatial analytic approaches using QGIS
  5. Geospatial Analysis – 4th Edition” by de Smith, Goodchild, Longley – this textbook provides the conceptual and geometric principles of spatial analytics and modeling – an excellent resource.
Advertisements

UML Modeling + Class Diagrams

UML (Unified Modeling Language) is a standardized notational system for diagramming the relationships between objects. It is designed to help organize programming code, particularly in object-oriented programming languages. It is a useful way to quickly diagram the various components of a code and their relationships to each other. It provides a bridge between the real world elements to be modeled and the logic of the computer code.

We will be using UML Class Diagrams as a way to organize our research and specify the relevant objects in the system we intend to model. Class diagrams describe the structure of objects to be modeled and their relationships to each other. Understood in its simplest form, it describes a taxonomy of objects. For instance, a taxonomy of animal “objects”:

2-classification

The most important question when making a class diagram is: WHAT IS ESSENTIAL?

If we were interested in modeling cats, we might ask what is essential to the family felidae? What characteristics and behaviors are shared by all members of this family? When starting a Class Diagram, begin with the objects themselves, before considering any relationship between objects. An object has three types of components: NameAttributes, and Operations. Attributes are the variables that define the characteristics of a class (size, color, id, etc.), operations are behaviors the class is capable of preforming (move(), eat(), addValue(), etc.).

Screen Shot 2014-01-30 at 1.02.52 AM

The Tiger Class inherits all of the attributes and operations from the Felidae Class, adding additional attributes and operations that are essential and particular to the tiger. The class diagrams moves from the general to the specific by way of inheritance. Inheritance describes the parent-child relationship, and the passing of shared traits. The inheritance relationship is of the form ‘IS-A.’ a Tiger IS A Felidae.

When we define an attribute, we give it a name and a data type. For instance, the attribute “weight” is of data type “double” (a floating point number), and “numberStripes” is an “int” (an integer). So, weight could be 356.45, while number of stripes will always be a whole number, i.e 37. Common data types are:

  1. int: integer (1,2,3,4,5…)
  2. float: floating point number (4.5353)
  3. double: a more accurate kind of floating point number (5.23423545)
  4. character: one character of any kind (“a”, “6”, “%”)
  5. String: a list of characters (“hello world”, “5555”, “test123”)
  6. Boolean: true/false
  7. Other classes!!!

Data type #7 is a special case. Once you start defining attributes in one class as an object from another class, you introduce the second kind of relationship in Class Diagrams. This is called association. If inheritance is a ‘IS-A‘ relationship, association is a ‘HAS-A‘ relationship. In the example below, a person HAS A cup and a cup HAS A owner – The attribute “item” has data type “Cup”:

Screen Shot 2014-01-30 at 12.08.22 AM

In the above diagram, you can start to see the power of inheritance. By defining the data type of “item” as the class “Cup”, we have established a relatively simple structure to allow the “person” to hold many different kinds of cups. Any child of Cup could be used in the attribute “item.” A person could hold a coffee cup or a plastic cup, or both, with all their particular characteristics and behaviors.

As you can see, a Class Diagram with only 4 classes already begins to describe fairly complex relationships. With this in mind, I would encourage you to start your diagrams small and  work through the implications of their attributes, operations and data types. Make a separate class diagram for each of the objects with which you are working . Only start connecting them when an explicit inheritance or association relation will allow you to describe the system you are modeling with more clarity. It is much better to have a number of smaller diagrams that are fully worked out and legible then one big spaghetti diagram that nobody can understand.

Always, the question in your head should be WHAT IS ESSENTIAL? What is essential to your model, and what is essential to each class? Remove anything that does not have a specific, real, verifiable relationship to your model. The goal of UML diagramming is to find the simplest way to model complex situations.

There are many resources on the web for UML modeling. You can find more detailed information on the notation of class diagrams here. Some video tutorials for making class diagrams in the Astah UML modeler can be found here and here.

Linking Point Data to Polygon Data Using QGIS and PostGIS SQL

This post explains two ways to use spatial joins between tables that have point and polygon geom fields. The first approach uses built in tools in QGIS to quickly make basic joins for initial analysis. The second approach uses PostGIS SQL queries with the potential to do far more sophisticated joins and data analysis.

QGIS Vector/Geometry Tools:

Screen Shot 2014-01-26 at 9.47.30 PM

QGIS provides a number of tools for analyzing and rebuilding vector layers in the “Vector” tab. This includes the “Geometry Tools/Multipart to Singleparts” command, which breaks a multipolygon into individual polygon elements. The “Analysis Tools/Points In Polygons” command is shown above. When a polygon layer and a points layer are selected, it will produce a new shapefile with the polygons + a new field that is a count of the number of points in each polygon. Here is a detailed walkthrough for this tool.

PostgreSQL Queries:

These queries use an INNER JOIN on rows where the result of the “st_within()” postgis function == TRUE. This function returns TRUE if the first geometry (“pts.geom”) is completely within the second geometry (“blocks.geom”).

This code will create a new table with all the fields from the point table for all of the points that fall within one of the polygons, and add an additional field that shows an ID for the polygon in which that point falls:

SELECT pts.*, blocks.gid as blockID
FROM “Detroit_BP1995_point” AS pts
INNER JOIN “Detroit_CityMap3_region_individualPoly” AS blocks
ON st_within(pts.geom, blocks.geom) AS result

This code joins the point table created by the SQL above to the original polygon table, generating a new polygon table with a count of points within each polygon. This allows for more sophisticated calculations, for instance, normalizing counts against the size of each polygon, etc. (This SQL needs some more work):

SELECT blockID, number_of_permits, total_cost, average_cost, geom FROM(
SELECT pts.blockID, COUNT(blockID) AS number_of_permits, SUM(“COSTBP”) AS total_cost, AVG(“COSTBP”) AS average_cost
FROM detroitTest04b as pts
LEFT JOIN “Detroit_CityMap3_region_individualPoly” as blocks
ON blocks.gid = pts.blockID
GROUP BY blockID) AS results
LEFT JOIN “Detroit_CityMap3_region_individualPoly” as blocks
ON results.blockID = blocks.gid

Uploading and Visualizing Census Data in QGIS

Census data is accessible in two forms on the US Census site: Tiger/Line Geographic Shapefiles and as Tabulated Data (CSV) from the census proper, including a range of demographic, housing, and economic data. Follow these steps to join these two data types to generate geolocated maps of this data in QGIS (for a slightly outdated, but very detailed step-by-step tutorial, go here):

  1. find and download the census regions Shapefiles you are interested in from the Tiger/Line site. Choose appropriate region and granularity – census blocks are the smallest geographical unit used by the US census. Make sure to download the tiger regions from the year that corresponds to the tabular data you are getting (i.e. 2010, 2000, etc.)
  2. find and download the Tabulated datasets you are interested in from the US census factfinder2 site. (Click on the “Advanced Search” tab; Click the “Geography” tab; Click the “Name” tab; Enter the Name of the Region you are interested in – “Wayne County, Michigan” for instance; In the “Geography Filters” Column pick the appropriate geographical unit – e.g. “block”. After all of this, it should give you a list of regions that meet the search criteria. Select the region(s) you want and press “Add”. Close the Geographies pop-up, and you should see all of the tabulated datasets for this region and geographical unit. Select the dataset you want to download and press “View.” Verify the data is what you want and Press “Download.” In the download dialogue box, check the “Data and Annotations in Separate Files” box and press “OK” to start the download. This will bring up one more dialogue window with a progress bar. Once the progress is complete, press “download” to finally download the Zip file.
  3. Screen Shot 2014-01-21 at 8.05.31 PM
  4. Follow this tutorial to prep and import the two files to QGIS and join them based on the appropriate geometryID column (e.g. the block ID #) – You may want to do some cleaning of the csv file to remove any unwanted columns and to ensure the geoID column data matches with the shapefile. Use the find/replace command in Excel (CTRL-F) to facilitate this work.

Worldwide GIS database

A worldwide map containing downloadable geolocated, data-rich shapefiles is now available on the OpenStreetMap site. The information is exported as an .osm file, a type of XML file that can be read and dynamically imported by CityEngine. While the information contained in the geo-database consists of base layers from mapquest (street centerlines, street widths, building footprints and heights, religious centers, hospitals, bike paths, public transportation lines, parks, etc.), it is a useful resource for generating data-rich base models from actual conditions. A tutorial describing how to import this file type (among other file types) and generate city-wide 3D geometry in CityEngine can be found here:

Small Technologies Make For Big Politics

“Hacktivist” Ricardo Dominguez, head of the Calit2’s b.a.n.g. (bits, atoms, neurons, genes) lab at UCSD, is exploring the ways accessible technologies can create new constituencies around otherwise intractable problem, dislodging institutional and political paralysis, and creating a great deal of controversy along the way. His Transborder Immigration Tool is a simple cell phone app that helps guide illegal immigrants to water stations placed in the desert between Mexico and Southern California, augmented reality as a tool for basic human survival and a lightning rod for political controversy:

7f658183b19cab7f58d1bfc72ff71432

More info on other projects by Dominguez, and a discussion on the controversies they have created here. Domiguez’ website has a number of his articles. CALIT2 supports a number of trans-disciplinary research initiatives that straddle technology, biology, nanotechnology, political agency.

Procedural City Modeling

Esri’s CityEngine has brought procedural (rule-based) modeling to the urban scale. Building out of software originally developed by the film industry (yet again: see Maya) to create and simulate vast cityscapes, CityEngine allows the designer to deploy city form and organization through the generation of rule-sets and a variety of base geometries. This opens up the very-real possibility of speculative city-design at a completely new scale and scope.

CityEngine also allows for the integration of 2D GIS data (shapefiles, geodatabases, etc.) into a 3D modeling environment. On the fly updating of GIS data through the CityEngine platform is also possible, as well as output to an interactive web format. This provides a rich platform for running scenarios in the city, working from real-world, real-time data, comprehensive rule sets, intuitive design, interactive commenting and reporting, etc. These rule sets have the potential encompass a wide range of existing and speculative urban planning paradigms. Esri has already developed a number of schema developed including the use of “Urban Transects” a model developed by New Urbanist Andrés Duany:

youtube tutorials channel:

CityEngineTV

 

links (free signin required):

Overview Video Workshop

Tutorial Gallery

Essential Skills Tutorial

Basic Tutorial on Integrating GIS data into CityEngine