Geospatial Analytics using Python and SQL

Basic PostGIS

What is PostGIS and Why Do We Use it?

According to the documentation directly from PostGIS, it is defined as the following: "PostGIS is a spatial database extender for PostgreSQL object-relational database. It adds support for geographic objects allowing location queries to be run in SQL."

In layman’s terms, PostGIS is SQL functionality that supports geospatial transformation, analysis, and manipulation. With PostGIS, you can execute commands such as:

PostGIS Command Definition

ST_Area

Returns the area of a polygonal geometry

ST_Distance

Measures the distance (cartesian for geometry; geodetic for geography) of a given shape

ST_Contains

Boolean. Geometry A contains Geometry B if and only if no points of B lie in the exterior of A, and at least one point of the interior of B lies in the interior of A.

ST_Intersects

Boolean. If a geometry or geography shares any portion of space then they intersect.

These commands—​as well as the hundreds not listed here—​can help us answer questions, such as:

  • How far away is Paris from Los Angeles?

    • ST_Distance

  • Where are overpasses located in our highway network?

    • ST_Intersect

  • What is the area of the city of Chicago, IL?

    • ST_Area

How can I use PostGIS?

Whether in a native SQL GUI or in a Python session, you can run PostGIS using a variety of methods. To do this, the following steps are required:

  • Install and set up PostgreSQL

  • Install PostGIS

  • Configure PostgreSQL to run PostGIS

  • Run PostgreSQL (either via CLI or GUI on your local machine)

We will walk through the steps to do these below.

Install and set up PostgreSQL

The following is the download link for PostgreSQL: www.enterprisedb.com/downloads/postgres-postgresql-downloads

<td class="icon"> <i class="fa icon-tip" title="Tip"></i> </td> <td class="content"> Mac users can simply run the following command in any terminal session: brew install --cask postgres </td>

If you are a Windows user, you will need to do the following:

pg1
Figure 1. Start the PostgreSQL wizard
pg2
Figure 2. Specify installation directory
pg3
Figure 3. Select which components to install
pg4
Figure 4. Specify a data directory
pg5
Figure 5. Specify a username and password

<td class="icon"> <i class="fa icon-tip" title="Tip"></i> </td> <td class="content"> The default port is 5432 </td>

pg6
Figure 6. Specify the port
pg7
Figure 7. Specify any advanced options, if desired
pg8
Figure 8. Ensure the pre-installation instructions are correct
pg9
Figure 9. Begin installation

Once the installation finishes, restart your computer. Upon restarting your computer, you can use PostgreSQL either via GUI (pgAdmin) or CLI (psql):

To use CLI (access psql) – this is the preferred method:

  • On Windows:

    • On Windows, you can find psql in the Program Files, and you should be able to launch it in a command prompt simply by clicking on it.

*On Mac: **You should be able to run psql directly from a terminal of your choice. BE SURE TO SPECIFCY -U flag for postgres user

Using PostgreSQL:

To use default database (called postgres):

psql –U USERNAME

Once you are “inside” the postgres database called “postgres”:

CREATE database NAME;

Now, to use psql and connect to a specific database:

psql –U USERNAME –d DB_NAME

<td class="icon"> <i class="fa icon-tip" title="Tip"></i> </td> <td class="content"> If you have a password, upon running, you will be asked to enter it </td>

Install PostGIS

Installing PostGIS is easy. For Mac users, simply run the following:

brew install postgis

For Windows users, complete the following:

pgis1
Figure 10. Select the proper installation for your PostgreSQL build
pgis5
Figure 11. Select the applications you would like to install
pgis9
Figure 12. Select the PostGIS components to install
pgis11
Figure 13. You will need to authenticate using your PostgreSQL DB’s credentials
pgis14
Figure 14. Select yes to register the PROJ_LIB environment variable
pgis15
Figure 15. Select yes to register the GDAL_DATA environment variable
pgis16
Figure 16. Select yes to enable raster drivers
pgis17
Figure 17. Select yes to enable OUTDB rasters

Configure PostgreSQL to run PostGIS

Once PostGIS is installed, you need to activate it on your PostgreSQL DB. You can easily do this by using the following SQL command:

-- Enable PostGIS (as of 3.0 contains just geometry/geography)
CREATE EXTENSION postgis;
-- enable raster support (for 3+)
CREATE EXTENSION postgis_raster;
-- Enable Topology
CREATE EXTENSION postgis_topology;
-- Enable PostGIS Advanced 3D
-- and other geoprocessing algorithms
-- sfcgal not available with all distributions
CREATE EXTENSION postgis_sfcgal;
-- fuzzy matching needed for Tiger
CREATE EXTENSION fuzzystrmatch;
-- rule based standardizer
CREATE EXTENSION address_standardizer;
-- example rule data set
CREATE EXTENSION address_standardizer_data_us;
-- Enable US Tiger Geocoder
CREATE EXTENSION postgis_tiger_geocoder;

<td class="icon"> <i class="fa icon-tip" title="Tip"></i> </td> <td class="content"> You only need to run this command once upon creation of your PostgreSQL DB. </td>

Accessing PostgreSQL on Jupyter Notebook

Accessing your Postgres instance within a Jupyter Notebook is very simple by leveraging sqlalchemy. If you’re running a local instance on your machine, ensure Postgres is running and exposed.

You will need the following packages installed and imported:

# Required Packages
import geopandas as gpd
from sqlalchemy import create_engine, text
from sqlalchemy_utils import create_database, database_exists, drop_database
import pandas as pd
from pandasql import *
pysqldf = lambda q: sqldf(q, globals())

From there, establish the connection to the database:

#Variables
db_type = "postgres"
username = "YOUR_USERNAME"
password = "YOUR_PASSWORD"
host = "YOUR_HOST" #localhost if on your local machine
port = "YOUR_PORT"
db_name = "YOUR_DB_NAME"

#Put it together
engine = create_engine(f"{db_type}://{username}:{password}@{host}:{port}/{db_name}")

Once you’ve created the connection engine, you can easily run scripts such as:

ql = """
SELECT
    city,
    COUNT(city) AS num_records,
    SUM(ST_NumGeometries(geom)) AS numpoly_before,
    ST_NumGeometries(ST_Multi(ST_Union(geom))) AS num_poly_after,
    ST_PointFromText('POINT(0 0)') AS dummy
FROM
    ch11.cities
GROUP BY
    city, dummy
HAVING
    COUNT(city) > 1;
"""

ex_11_1_1_2 = gpd.GeoDataFrame.from_postgis(sql=sql, con=engine, geom_col="dummy")

This table (ex_11_1_1_2) lists cities that straddle multiple records, how many polygons each city straddles, and how many polygons you’ll be left with after dissolving boundaries within each city.

If you’re a student at Purdue, you can access a Postgres DB on our Geddes Cluster. Contact Justin Gould ([email protected]) for more information on gaining access.