Skip to content

arianagiorgi/postgis-intro

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

24 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Using SQL for GIS analysis with PostGIS

Contents

  1. Importing shapefiles into PostGIS
  2. Viewing tables in QGIS
  3. Spatial queries
  4. Exporting query results

Requirements

  • PostgreSQL
  • PostGIS
  • Postico
  • QGIS 3
  • ogr2ogr (for exporting, if applicable)

Data included

Importing shapefiles into PostGIS

  1. Create database and enable PostGIS in your database. From the terminal, run:
  $ createdb postgis_intro
  $ psql --dbname postgis_intro --command "CREATE EXTENSION postgis;"
  1. Determine your project's SRID (see our glossary).

The most popular SRID is usually EPSG:4326 (WGS 84). You can also look up your shapefile's SRID here by uploading your .prj file (which is where the projection information lives).

Our data files are all EPSG:4326.

  1. Use shp2pgsql command to write a SQL file that will contain table and data information for PostgreSQL to upload:
$ shp2pgsql -s <SRID> -I <path to shp file>.shp \
 <new postgres table name> > <path to output sql file>.sql

So our commands to upload all tables will look like:

$ shp2pgsql -s 4326 -I data/schools/schools.shp \
 schools > schools.sql
$ shp2pgsql -s 4326 -I data/school-districts/school-districts.shp \
 districts > school-districts.sql
$ shp2pgsql -s 4326 -I data/census-tracts/census-tracts.shp \
tracts > census-tracts.sql

Then we'll upload the sql file to the database we created:

$ psql --dbname <database> --file <path to output sql file>.sql

Our commands will look like:

$ psql --dbname postgis_intro --file schools.sql
$ psql --dbname postgis_intro --file school-districts.sql
$ psql --dbname postgis_intro --file census-tracts.sql

Viewing tables in QGIS

  1. In the Browser Panel (along left side), right click "PostGIS" and select "New Connection...".
  2. Enter connection information and then click "OK".
  3. You can now access the database's tables in the database manager, which is accessible by clicking "Database" in the menu and choosing "DB Manager...".
  4. In the DB manager you now have two choices:
    • You can add the entire table, as-is, by double-clicking it.
    • You can query the database using SQL and add the query result by opening the "SQL window". To write a query, click the second icon in the DB manager toolbar (it looks like a piece of paper with a wrench on it). Enter your query in that window and use the options at the bottom of the pane to add it to your project.

Spatial queries

PostGIS adds almost 300 new functions and operators to PostgreSQL that can do everything from basic spatial calculations to complex aggregations. It's easy to recognize these new functions because they all use the ST_ prefix (for spatial type), as you'll see below:

Basic calculations

Many functions perform simple calculations on a single geometry or pair of geometries. For example, we can see the area of each of our census tracts using the ST_Area function:

SELECT name, ST_Area(geom)
FROM tracts
ORDER BY id;

ST_Area and many functions like it that output measurements report them in the coordinate system of the geometries you're using in your query. In our case, that's WGS84, which tells us that the first census tract in our results has an area of 0.046862144283. That's because our coordinate system is in degrees.

If we had been using, say, NAD83 / Texas South Central, which is in feet, we would've gotten our answer in feet. We can see that in action if we transform our geometries to that coordinate system with ST_Transform:

SELECT name, ST_Area(ST_Transform(geom, 2278))
FROM tracts
ORDER BY id;

We now get an answer of 5296730440.84863 (square feet).

We can also solve this problem using Geographys. Geographies, unlike geometries, operate on spherical coordinates like the latitude and longitude values we're using here. Most common functions, when passed geographies instead of geometries, will return values in meters:

SELECT name, ST_Area(geom::GEOGRAPHY)
FROM tracts
ORDER BY id;

We now get an answer of 491172828.83503 (square meters).

Deriving new geometries

Some functions can calculate new geometries based on your existing ones. For example, we can get the centroid of all of our census tracts using:

SELECT name, ST_Centroid(geom)
FROM tracts
ORDER BY id;

While the original column contains POLYGON data, the value returned from ST_Centroid will be a POINT.

Similarly, we can create buffers around our school locations using ST_Buffer, which will derive polygons from the points:

SELECT campname, ST_Buffer(geom, 1)
FROM schools
ORDER BY campname;

The size of the buffer (the second function argument) in the example above is in the units of the coordinate system for the geometry. So the above would create a one degree buffer. To buffer by a kilometer, we can cast to a geography just like we did with ST_Area:

SELECT campname, ST_Buffer(geom::GEOGRAPHY, 1000)
FROM schools
ORDER BY campname;

Spatial relationships

We really start to see the power of spatial queries when we start to use PostGIS to join and filter out tables based on spatial relationship. For example, we can perform a point-in-polygon query to determine the district that each of our schools is in using the ST_Contains function:

SELECT schools.campname, districts.name as district_name
FROM schools
LEFT JOIN districts ON ST_Contains(districts.geom, schools.geom);

Combined with a GROUP BY, we can now easily count the number of schools in each district:

SELECT districts.name, COUNT(*) AS num_schools
FROM schools
LEFT JOIN districts ON ST_Contains(districts.geom, schools.geom)
GROUP BY districts.name
ORDER BY num_schools DESC;

ST_Contains is just one of several functions that can be used to characterize the spatial relationship between two geometries. To determine the correct function to use, you should consult the function's documentation, which contains helpful images that show how the function evaluates.

For example, the ST_Contains documentation says the below geometries would return TRUE:

ST_Contains TRUE

And the below would return FALSE:

ST_Contains FALSE

We can really see the difference between the spatial functions when we try to use them with two polygon layers - our census tract table and our school districts table. For example, this ST_Contains query tells us that there are 2,731 census tracts that are fully contained by our districts:

SELECT COUNT(*)
FROM tracts
INNER JOIN districts ON ST_Contains(districts.geom, tracts.geom);

But using ST_Overlaps we can see that there are more than 8,000 times where a tract is partially, but not wholly contained by, a district:

SELECT COUNT(*)
FROM tracts
INNER JOIN districts ON ST_Overlaps(districts.geom, tracts.geom);

And there are 11,000 times, according to ST_Intersects that a tract "shares any portion of space". This basically combines the results of ST_Contains and ST_Overlaps:

SELECT COUNT(*)
FROM tracts
INNER JOIN districts ON ST_Intersects(districts.geom, tracts.geom);

Note that there are only about 5,000 census tracts in the state. But the above queries are capturing each pair of overlaps between our districts and census tracts.

See here for a complete list of the relationship functions.

Aggregations

We can take what we've learned above about spatial relationships and use it to relate data about our census tracts to our school districts. For example, we can take tract-level data and use it to calculate district-level data.

The below adds a column to our ST_Intersects query from above that includes the percentage overlap between the district and the tract:

SELECT
  districts.name,
  tracts.id,
  ST_Area(ST_Intersection(districts.geom, tracts.geom)) / ST_Area(tracts.geom) AS overlap_pct
FROM districts
INNER JOIN tracts ON ST_Intersects(districts.geom, tracts.geom);

We can then allocate that percentage of the under-18-year-olds in the census table to the school district:

SELECT
  districts.name,
  tracts.id,
  ST_Area(ST_Intersection(districts.geom, tracts.geom)) / ST_Area(tracts.geom) * tracts.age_undr18 AS under_18_pop
FROM districts
INNER JOIN tracts ON ST_Intersects(districts.geom, tracts.geom);

And by adding a GROUP BY we can get a district-wide estimate of the under-18 population:

SELECT
  districts.name,
  SUM(ST_Area(ST_Intersection(districts.geom, tracts.geom)) / ST_Area(tracts.geom) * tracts.age_undr18) AS under_18_pop
FROM districts
INNER JOIN tracts ON ST_Intersects(districts.geom, tracts.geom)
GROUP BY districts.name;

Exporting query results

Exporting in QGIS

  1. Right click layer in Layer Panel. Select "Export" > "Save Features As..."
  2. Under "Format" you can chose from shapefile, geojson, CSV, etc. Make sure the CRS is populated with what you want, or you can change it if you'd like.

Exporting with pgsql2shp

The pgsql2shp command to call from the terminal will look like:

$ pgsql2shp -f <path to new shapefile> -g <geometry column in table> <database> "<query>"

Here is an example of what ours might look like, if we wanted only Dallas and Forth Worth school districts from the districts table:

$ pgsql2shp -f filter-query.shp -g geom postgis_intro \
 "select * from districts where name in ('Dallas ISD', 'Fort Worth ISD')"

Exporting with ogr2ogr

This is a good option if you want to go straight from PostGIS to a file other than a shapefile (think GeoJSON for a D3 viz or to load into Mapbox).

Here is a typical command:

$ ogr2ogr -f "<filetype>" <path to output file> \
 PG:"host=<host> dbname='<database>'" \
 -sql "<query>";"

If we wanted to export the entire districts table:

$ ogr2ogr -f "GeoJSON" school-districts.json \
 PG:"host=localhost dbname='postgis_intro'" \
 -sql "select * from districts;"

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •