Using SQL for GIS analysis with PostGIS

Requirements

PostgreSQL
PostGIS
Postico
QGIS 3
ogr2ogr (for exporting, if applicable)

Data included

Texas schools from Texas Education Agency
Texas school districts from Texas Education Agency
Census Tracts: Shapefiles and ACS data from Census Bureau

Importing shapefiles into PostGIS

Create database and enable PostGIS in your database. From the terminal, run:

  $ createdb postgis_intro
  $ psql --dbname postgis_intro --command "CREATE EXTENSION postgis;"

Determine your project's SRID (see our glossary).

The most popular SRID is usually EPSG:4326 (WGS 84). You can also look up your shapefile's SRID here by uploading your .prj file (which is where the projection information lives).

Our data files are all EPSG:4326.

Use shp2pgsql command to write a SQL file that will contain table and data information for PostgreSQL to upload:

$ shp2pgsql -s <SRID> -I <path to shp file>.shp \
 <new postgres table name> > <path to output sql file>.sql

So our commands to upload all tables will look like:

$ shp2pgsql -s 4326 -I data/schools/schools.shp \
 schools > schools.sql

$ shp2pgsql -s 4326 -I data/school-districts/school-districts.shp \
 districts > school-districts.sql

$ shp2pgsql -s 4326 -I data/census-tracts/census-tracts.shp \
tracts > census-tracts.sql

Then we'll upload the sql file to the database we created:

$ psql --dbname <database> --file <path to output sql file>.sql

Our commands will look like:

$ psql --dbname postgis_intro --file schools.sql

$ psql --dbname postgis_intro --file school-districts.sql

$ psql --dbname postgis_intro --file census-tracts.sql

Viewing tables in QGIS

In the Browser Panel (along left side), right click "PostGIS" and select "New Connection...".
Enter connection information and then click "OK".
You can now access the database's tables in the database manager, which is accessible by clicking "Database" in the menu and choosing "DB Manager...".
In the DB manager you now have two choices:
- You can add the entire table, as-is, by double-clicking it.
- You can query the database using SQL and add the query result by opening the "SQL window". To write a query, click the second icon in the DB manager toolbar (it looks like a piece of paper with a wrench on it). Enter your query in that window and use the options at the bottom of the pane to add it to your project.

Spatial queries

PostGIS adds almost 300 new functions and operators to PostgreSQL that can do everything from basic spatial calculations to complex aggregations. It's easy to recognize these new functions because they all use the ST_ prefix (for spatial type), as you'll see below:

Basic calculations

Many functions perform simple calculations on a single geometry or pair of geometries. For example, we can see the area of each of our census tracts using the ST_Area function:

SELECT name, ST_Area(geom)
FROM tracts
ORDER BY id;

ST_Area and many functions like it that output measurements report them in the coordinate system of the geometries you're using in your query. In our case, that's WGS84, which tells us that the first census tract in our results has an area of 0.046862144283. That's because our coordinate system is in degrees.

If we had been using, say, NAD83 / Texas South Central, which is in feet, we would've gotten our answer in feet. We can see that in action if we transform our geometries to that coordinate system with ST_Transform:

SELECT name, ST_Area(ST_Transform(geom, 2278))
FROM tracts
ORDER BY id;

We now get an answer of 5296730440.84863 (square feet).

We can also solve this problem using Geographys. Geographies, unlike geometries, operate on spherical coordinates like the latitude and longitude values we're using here. Most common functions, when passed geographies instead of geometries, will return values in meters:

SELECT name, ST_Area(geom::GEOGRAPHY)
FROM tracts
ORDER BY id;

We now get an answer of 491172828.83503 (square meters).

Deriving new geometries

Some functions can calculate new geometries based on your existing ones. For example, we can get the centroid of all of our census tracts using:

SELECT name, ST_Centroid(geom)
FROM tracts
ORDER BY id;

While the original column contains POLYGON data, the value returned from ST_Centroid will be a POINT.

Similarly, we can create buffers around our school locations using ST_Buffer, which will derive polygons from the points:

SELECT campname, ST_Buffer(geom, 1)
FROM schools
ORDER BY campname;

The size of the buffer (the second function argument) in the example above is in the units of the coordinate system for the geometry. So the above would create a one degree buffer. To buffer by a kilometer, we can cast to a geography just like we did with ST_Area:

SELECT campname, ST_Buffer(geom::GEOGRAPHY, 1000)
FROM schools
ORDER BY campname;

Spatial relationships

We really start to see the power of spatial queries when we start to use PostGIS to join and filter out tables based on spatial relationship. For example, we can perform a point-in-polygon query to determine the district that each of our schools is in using the ST_Contains function:

SELECT schools.campname, districts.name as district_name
FROM schools
LEFT JOIN districts ON ST_Contains(districts.geom, schools.geom);

Combined with a GROUP BY, we can now easily count the number of schools in each district:

SELECT districts.name, COUNT(*) AS num_schools
FROM schools
LEFT JOIN districts ON ST_Contains(districts.geom, schools.geom)
GROUP BY districts.name
ORDER BY num_schools DESC;

ST_Contains is just one of several functions that can be used to characterize the spatial relationship between two geometries. To determine the correct function to use, you should consult the function's documentation, which contains helpful images that show how the function evaluates.

For example, the ST_Contains documentation says the below geometries would return TRUE:

And the below would return FALSE:

We can really see the difference between the spatial functions when we try to use them with two polygon layers - our census tract table and our school districts table. For example, this ST_Contains query tells us that there are 2,731 census tracts that are fully contained by our districts:

SELECT COUNT(*)
FROM tracts
INNER JOIN districts ON ST_Contains(districts.geom, tracts.geom);

But using ST_Overlaps we can see that there are more than 8,000 times where a tract is partially, but not wholly contained by, a district:

SELECT COUNT(*)
FROM tracts
INNER JOIN districts ON ST_Overlaps(districts.geom, tracts.geom);

And there are 11,000 times, according to ST_Intersects that a tract "shares any portion of space". This basically combines the results of ST_Contains and ST_Overlaps:

SELECT COUNT(*)
FROM tracts
INNER JOIN districts ON ST_Intersects(districts.geom, tracts.geom);

Note that there are only about 5,000 census tracts in the state. But the above queries are capturing each pair of overlaps between our districts and census tracts.

See here for a complete list of the relationship functions.

Aggregations

We can take what we've learned above about spatial relationships and use it to relate data about our census tracts to our school districts. For example, we can take tract-level data and use it to calculate district-level data.

The below adds a column to our ST_Intersects query from above that includes the percentage overlap between the district and the tract:

SELECT
  districts.name,
  tracts.id,
  ST_Area(ST_Intersection(districts.geom, tracts.geom)) / ST_Area(tracts.geom) AS overlap_pct
FROM districts
INNER JOIN tracts ON ST_Intersects(districts.geom, tracts.geom);

We can then allocate that percentage of the under-18-year-olds in the census table to the school district:

SELECT
  districts.name,
  tracts.id,
  ST_Area(ST_Intersection(districts.geom, tracts.geom)) / ST_Area(tracts.geom) * tracts.age_undr18 AS under_18_pop
FROM districts
INNER JOIN tracts ON ST_Intersects(districts.geom, tracts.geom);

And by adding a GROUP BY we can get a district-wide estimate of the under-18 population:

SELECT
  districts.name,
  SUM(ST_Area(ST_Intersection(districts.geom, tracts.geom)) / ST_Area(tracts.geom) * tracts.age_undr18) AS under_18_pop
FROM districts
INNER JOIN tracts ON ST_Intersects(districts.geom, tracts.geom)
GROUP BY districts.name;

Exporting query results

Exporting in QGIS

Right click layer in Layer Panel. Select "Export" > "Save Features As..."
Under "Format" you can chose from shapefile, geojson, CSV, etc. Make sure the CRS is populated with what you want, or you can change it if you'd like.

Exporting with pgsql2shp

The pgsql2shp command to call from the terminal will look like:

$ pgsql2shp -f <path to new shapefile> -g <geometry column in table> <database> "<query>"

Here is an example of what ours might look like, if we wanted only Dallas and Forth Worth school districts from the districts table:

$ pgsql2shp -f filter-query.shp -g geom postgis_intro \
 "select * from districts where name in ('Dallas ISD', 'Fort Worth ISD')"

Exporting with ogr2ogr

This is a good option if you want to go straight from PostGIS to a file other than a shapefile (think GeoJSON for a D3 viz or to load into Mapbox).

Here is a typical command:

$ ogr2ogr -f "<filetype>" <path to output file> \
 PG:"host=<host> dbname='<database>'" \
 -sql "<query>";"

If we wanted to export the entire districts table:

$ ogr2ogr -f "GeoJSON" school-districts.json \
 PG:"host=localhost dbname='postgis_intro'" \
 -sql "select * from districts;"

Name		Name	Last commit message	Last commit date
Latest commit History 24 Commits
data		data
images		images
.gitignore		.gitignore
Glossary.md		Glossary.md
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Using SQL for GIS analysis with PostGIS

Contents

Requirements

Data included

Importing shapefiles into PostGIS

Viewing tables in QGIS

Spatial queries

Basic calculations

Deriving new geometries

Spatial relationships

Aggregations

Exporting query results

Exporting in QGIS

Exporting with pgsql2shp

Exporting with ogr2ogr

About

Uh oh!

Releases

Packages

Uh oh!

Contributors 2

Uh oh!

arianagiorgi/postgis-intro

Folders and files

Latest commit

History

Repository files navigation

Using SQL for GIS analysis with PostGIS

Contents

Requirements

Data included

Importing shapefiles into PostGIS

Viewing tables in QGIS

Spatial queries

Basic calculations

Deriving new geometries

Spatial relationships

Aggregations

Exporting query results

Exporting in QGIS

Exporting with pgsql2shp

Exporting with ogr2ogr

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 2

Uh oh!

Packages