- PostgreSQL
- PostGIS
- Postico
- QGIS 3
- ogr2ogr (for exporting, if applicable)
- Texas schools from Texas Education Agency
- Texas school districts from Texas Education Agency
- Census Tracts: Shapefiles and ACS data from Census Bureau
- Create database and enable PostGIS in your database. From the terminal, run:
$ createdb postgis_intro
$ psql --dbname postgis_intro --command "CREATE EXTENSION postgis;"
- Determine your project's SRID (see our glossary).
The most popular SRID is usually EPSG:4326 (WGS 84). You can also look up your shapefile's SRID here by uploading your .prj
file (which is where the projection information lives).
Our data files are all EPSG:4326.
- Use
shp2pgsql
command to write a SQL file that will contain table and data information for PostgreSQL to upload:
$ shp2pgsql -s <SRID> -I <path to shp file>.shp \
<new postgres table name> > <path to output sql file>.sql
So our commands to upload all tables will look like:
$ shp2pgsql -s 4326 -I data/schools/schools.shp \
schools > schools.sql
$ shp2pgsql -s 4326 -I data/school-districts/school-districts.shp \
districts > school-districts.sql
$ shp2pgsql -s 4326 -I data/census-tracts/census-tracts.shp \
tracts > census-tracts.sql
Then we'll upload the sql file to the database we created:
$ psql --dbname <database> --file <path to output sql file>.sql
Our commands will look like:
$ psql --dbname postgis_intro --file schools.sql
$ psql --dbname postgis_intro --file school-districts.sql
$ psql --dbname postgis_intro --file census-tracts.sql
- In the Browser Panel (along left side), right click "PostGIS" and select "New Connection...".
- Enter connection information and then click "OK".
- You can now access the database's tables in the database manager, which is accessible by clicking "Database" in the menu and choosing "DB Manager...".
- In the DB manager you now have two choices:
- You can add the entire table, as-is, by double-clicking it.
- You can query the database using SQL and add the query result by opening the "SQL window". To write a query, click the second icon in the DB manager toolbar (it looks like a piece of paper with a wrench on it). Enter your query in that window and use the options at the bottom of the pane to add it to your project.
PostGIS adds almost 300 new functions and operators to PostgreSQL that can do everything from basic spatial calculations to complex aggregations. It's easy to recognize these new functions because they all use the ST_
prefix (for spatial type), as you'll see below:
Many functions perform simple calculations on a single geometry or pair of geometries. For example, we can see the area of each of our census tracts using the ST_Area
function:
SELECT name, ST_Area(geom)
FROM tracts
ORDER BY id;
ST_Area
and many functions like it that output measurements report them in the coordinate system of the geometries you're using in your query. In our case, that's WGS84, which tells us that the first census tract in our results has an area of 0.046862144283. That's because our coordinate system is in degrees.
If we had been using, say, NAD83 / Texas South Central, which is in feet, we would've gotten our answer in feet. We can see that in action if we transform our geometries to that coordinate system with ST_Transform
:
SELECT name, ST_Area(ST_Transform(geom, 2278))
FROM tracts
ORDER BY id;
We now get an answer of 5296730440.84863 (square feet).
We can also solve this problem using Geography
s. Geographies, unlike geometries, operate on spherical coordinates like the latitude and longitude values we're using here. Most common functions, when passed geographies instead of geometries, will return values in meters:
SELECT name, ST_Area(geom::GEOGRAPHY)
FROM tracts
ORDER BY id;
We now get an answer of 491172828.83503 (square meters).
Some functions can calculate new geometries based on your existing ones. For example, we can get the centroid of all of our census tracts using:
SELECT name, ST_Centroid(geom)
FROM tracts
ORDER BY id;
While the original column contains POLYGON
data, the value returned from ST_Centroid
will be a POINT
.
Similarly, we can create buffers around our school locations using ST_Buffer
, which will derive polygons from the points:
SELECT campname, ST_Buffer(geom, 1)
FROM schools
ORDER BY campname;
The size of the buffer (the second function argument) in the example above is in the units of the coordinate system for the geometry. So the above would create a one degree buffer. To buffer by a kilometer, we can cast to a geography just like we did with ST_Area
:
SELECT campname, ST_Buffer(geom::GEOGRAPHY, 1000)
FROM schools
ORDER BY campname;
We really start to see the power of spatial queries when we start to use PostGIS to join and filter out tables based on spatial relationship. For example, we can perform a point-in-polygon query to determine the district that each of our schools is in using the ST_Contains
function:
SELECT schools.campname, districts.name as district_name
FROM schools
LEFT JOIN districts ON ST_Contains(districts.geom, schools.geom);
Combined with a GROUP BY
, we can now easily count the number of schools in each district:
SELECT districts.name, COUNT(*) AS num_schools
FROM schools
LEFT JOIN districts ON ST_Contains(districts.geom, schools.geom)
GROUP BY districts.name
ORDER BY num_schools DESC;
ST_Contains
is just one of several functions that can be used to characterize the spatial relationship between two geometries. To determine the correct function to use, you should consult the function's documentation, which contains helpful images that show how the function evaluates.
For example, the ST_Contains
documentation says the below geometries would return TRUE
:
And the below would return FALSE
:
We can really see the difference between the spatial functions when we try to use them with two polygon layers - our census tract table and our school districts table. For example, this ST_Contains
query tells us that there are 2,731 census tracts that are fully contained by our districts:
SELECT COUNT(*)
FROM tracts
INNER JOIN districts ON ST_Contains(districts.geom, tracts.geom);
But using ST_Overlaps
we can see that there are more than 8,000 times where a tract is partially, but not wholly contained by, a district:
SELECT COUNT(*)
FROM tracts
INNER JOIN districts ON ST_Overlaps(districts.geom, tracts.geom);
And there are 11,000 times, according to ST_Intersects
that a tract "shares any portion of space". This basically combines the results of ST_Contains
and ST_Overlaps
:
SELECT COUNT(*)
FROM tracts
INNER JOIN districts ON ST_Intersects(districts.geom, tracts.geom);
Note that there are only about 5,000 census tracts in the state. But the above queries are capturing each pair of overlaps between our districts and census tracts.
See here for a complete list of the relationship functions.
We can take what we've learned above about spatial relationships and use it to relate data about our census tracts to our school districts. For example, we can take tract-level data and use it to calculate district-level data.
The below adds a column to our ST_Intersects
query from above that includes the percentage overlap between the district and the tract:
SELECT
districts.name,
tracts.id,
ST_Area(ST_Intersection(districts.geom, tracts.geom)) / ST_Area(tracts.geom) AS overlap_pct
FROM districts
INNER JOIN tracts ON ST_Intersects(districts.geom, tracts.geom);
We can then allocate that percentage of the under-18-year-olds in the census table to the school district:
SELECT
districts.name,
tracts.id,
ST_Area(ST_Intersection(districts.geom, tracts.geom)) / ST_Area(tracts.geom) * tracts.age_undr18 AS under_18_pop
FROM districts
INNER JOIN tracts ON ST_Intersects(districts.geom, tracts.geom);
And by adding a GROUP BY
we can get a district-wide estimate of the under-18 population:
SELECT
districts.name,
SUM(ST_Area(ST_Intersection(districts.geom, tracts.geom)) / ST_Area(tracts.geom) * tracts.age_undr18) AS under_18_pop
FROM districts
INNER JOIN tracts ON ST_Intersects(districts.geom, tracts.geom)
GROUP BY districts.name;
- Right click layer in Layer Panel. Select "Export" > "Save Features As..."
- Under "Format" you can chose from shapefile, geojson, CSV, etc. Make sure the CRS is populated with what you want, or you can change it if you'd like.
The pgsql2shp command to call from the terminal will look like:
$ pgsql2shp -f <path to new shapefile> -g <geometry column in table> <database> "<query>"
Here is an example of what ours might look like, if we wanted only Dallas and Forth Worth school districts from the districts
table:
$ pgsql2shp -f filter-query.shp -g geom postgis_intro \
"select * from districts where name in ('Dallas ISD', 'Fort Worth ISD')"
This is a good option if you want to go straight from PostGIS to a file other than a shapefile (think GeoJSON for a D3 viz or to load into Mapbox).
Here is a typical command:
$ ogr2ogr -f "<filetype>" <path to output file> \
PG:"host=<host> dbname='<database>'" \
-sql "<query>";"
If we wanted to export the entire districts
table:
$ ogr2ogr -f "GeoJSON" school-districts.json \
PG:"host=localhost dbname='postgis_intro'" \
-sql "select * from districts;"