# Geospatial Analytics using Python and SQL

## A Crash Course in PostGIS

This adapted from Purdue’s HONR 39900-054: Foundations in Geospatial Analytics. For more information and examples, please head to the course’s GitHub site: github.com/gouldju1/honr39900-foundations-of-geospatial-analytics.

##### Required Packages

*In[1]:*

```
import geopandas as gpd
from sqlalchemy import create_engine, text
from sqlalchemy_utils import create_database, database_exists, drop_database
import pandas as pd
import sqlalchemy
```

##### Load Data #1

Today, we will be working with the `boroughs`

data from Chapter 11 of
our textbook.

*In[2]:*

`nyc = gpd.read_file("./nyc_boroughs/geo_export_b38c64b9-7e50-43c7-8880-994b22440dbf.shp")`

*In[3]:*

`nyc.head()`

*Out[3]:*

[cols=",,,,,",options="header",] |=== | |boro_code |boro_name |shape_area |shape_leng |geometry |0 |1.0 |Manhattan |9.442947e+08 |203803.216852 |MULTIPOLYGON (((-74.04388 40.69019, -74.04351 ... |1 |2.0 |Bronx |1.598380e+09 |188054.198841 |POLYGON ((-73.86477 40.90201, -73.86305 40.901... |2 |3.0 |Brooklyn |2.684411e+09 |234928.658563 |POLYGON ((-73.92722 40.72533, -73.92654 40.724... |3 |4.0 |Queens |3.858050e+09 |429586.630985 |POLYGON ((-73.77896 40.81171, -73.76371 40.793... |4 |5.0 |Staten Island |2.539686e+09 |212213.139971 |POLYGON ((-74.05581 40.64971, -74.05619 40.639... |===

*In[4]:*

`nyc.plot()`

## Connect to Postgres

*In[5]:*

```
#Variables
db_type = "postgres"
username = "postgres"
password = ""
host = "localhost"
port = "5432"
db_name = "demo"
#Put it together
engine = create_engine(f"{db_type}://{username}:{password}@{host}:{port}/{db_name}")
```

*In[21]:*

```
#Write nyc to PostgreSQL
nyc.to_postgis(name="boroughs_chapter_11", con=engine)
```

*In[6]:*

`engine.table_names() #We see that our table was added to our database`

*Out[6]:*

```
['spatial_ref_sys',
'london',
'airbnb_homework_4',
'streets_chapter_11',
'boroughs_chapter_11']
```

*In[7]:*

```
#List Schemas
insp = sqlalchemy.inspect(engine)
insp.get_schema_names()
```

*Out[7]:*

##### Review: Accessing our Postgres Data

Whenever we use Jupyter Notebooks to access and use our data from postgres, you can use the beginning of this notebook as a standard boilerplate. Be sure to take the notebook with you from class! :)

*In[8]:*

```
#SQL query
sql = "SELECT * FROM public.boroughs_chapter_11"
```

#Specify name of column which stores our geometry! In table `streets_chapter_11`

, the geometry is stored in a col called `geometry`

geom_col = "geometry"

#Execute query to create GeoDataFrame nyc_from_db = gpd.GeoDataFrame.from_postgis(sql=sql, con=engine, geom_col=geom_col)

*In[9]:*

`nyc_from_db.head() #Yay!`

*Out[9]:*

boro_code | boro_name | shape_area | shape_leng | geometry | |
---|---|---|---|---|---|

0 |
1.0 |
Manhattan |
9.442947e+08 |
203803.216852 |
MULTIPOLYGON (((-74.04388 40.69019, -74.04351 … |

1 |
2.0 |
Bronx |
1.598380e+09 |
188054.198841 |
POLYGON ((-73.86477 40.90201, -73.86305 40.901… |

2 |
3.0 |
Brooklyn |
2.684411e+09 |
234928.658563 |
POLYGON ((-73.92722 40.72533, -73.92654 40.724… |

3 |
4.0 |
Queens |
3.858050e+09 |
429586.630985 |
POLYGON ((-73.77896 40.81171, -73.76371 40.793… |

4 |
5.0 |
Staten Island |
2.539686e+09 |
212213.139971 |
POLYGON ((-74.05581 40.64971, -74.05619 40.639… |

## Agenda

We have a **LOT** to cover today…here’s a snapshot:

Topic Name | Textbook Chapter/Section |
---|---|

Creating a single |
11.1.1 |

Creating a |
11.1.2 |

Clipping |
11.2.1 |

Splitting |
11.2.2 |

Tessellating |
11.2.3 |

Sharding |
11.2.3 |

Segmentizing `LINESTRING`s |
11.3.1 |

Scaling |
11.4.2 |

Rotating |
11.4.3 |

## Creating a single `MULTIPOLYGON`

from many

Textbook Chapter/Section: 11.1.1

Textbook Start Page: 310

In many cases, you may have a city where records are broken out by
districts, neighborhoods, boroughs, or precincts because you often need
to view or report on each neighborhood separately. Sometimes, however,
for reporting purposes you need to view the city as a single unit. In
this case you can use the `ST_UNION`

aggregate function to amass one
single multipolygon from constituent multipolygons.

For example, the largest city in the United States is New York, made up of the five storied boroughs of Manhattan, Bronx, Queens, Brooklyn, and Staten Island. To aggregate New York, you first need to create a boroughs table with five records—one multipolygon for each of the boroughs with littorals:

Then you can use the `ST_Union`

spatial aggregate function to group all
the boroughs into a single city, as follows:

*In[10]:*

```
sql = """
SELECT
ST_Union(geometry) AS city
FROM
public.boroughs_chapter_11;
"""
```

example_11_1_1 = gpd.GeoDataFrame.from_postgis(sql=sql, con=engine, geom_col="city") #Note the change in geom_col!

*In[11]:*

`example_11_1_1.head()`

*Out[11]:*

city | |
---|---|

0 |
MULTIPOLYGON (((-74.03844 40.55674, -74.04955 … |

*In[12]:*

`example_11_1_1.plot()`

Let’s work through an example in the San Francisco area using our table of cities. This example lists cities that straddle multiple records, how many polygons each city straddles, and how many polygons you’ll be left with after dissolving boundaries within each city.

*In[13]:*

```
sql = """
SELECT
city,
COUNT(city) AS num_records,
SUM(ST_NumGeometries(geom)) AS numpoly_before,
ST_NumGeometries(ST_Multi(ST_Union(geom))) AS num_poly_after,
ST_PointFromText('POINT(0 0)') AS dummy
FROM
ch11.cities
GROUP BY
city, dummy
HAVING
COUNT(city) > 1;
"""
```

ex_11_1_1_2 = gpd.GeoDataFrame.from_postgis(sql=sql, con=engine, geom_col="dummy") #Note the change in geom_col!

*In[14]:*

`ex_11_1_1_2`

*Out[14]:*

city | num_records | numpoly_before | num_poly_after | dummy | |
---|---|---|---|---|---|

0 |
ALAMEDA |
4 |
4 |
4 |
POINT (0.00000 0.00000) |

1 |
BELVEDERE TIBURON |
2 |
2 |
2 |
POINT (0.00000 0.00000) |

2 |
BRISBANE |
2 |
2 |
1 |
POINT (0.00000 0.00000) |

3 |
GREENBRAE |
2 |
2 |
2 |
POINT (0.00000 0.00000) |

4 |
LARKSPUR |
2 |
2 |
2 |
POINT (0.00000 0.00000) |

5 |
REDWOOD CITY |
2 |
2 |
2 |
POINT (0.00000 0.00000) |

6 |
SAN FRANCISCO |
7 |
7 |
6 |
POINT (0.00000 0.00000) |

7 |
SAN MATEO |
2 |
2 |
2 |
POINT (0.00000 0.00000) |

8 |
SOUTH SAN FRANCISCO |
2 |
2 |
2 |
POINT (0.00000 0.00000) |

9 |
SUISUN CITY |
2 |
2 |
2 |
POINT (0.00000 0.00000) |

From the code above, you know that ten cities have multiple records, but you’ll only be able to dissolve the boundaries of Brisbane and San Francisco, because only these two have fewer polygons per geometry than what you started out with.

Below, you will aggregate and insert the aggregated records into a table
called `ch11.distinct_cities`

. You then add a primary key to each city
to ensure that you have exactly one record per city…

*In[15]:*

```
#New Table
sql = """SELECT city, ST_Multi(
ST_Union(geom))::geometry(multipolygon,2227) AS geom
FROM ch11.cities
GROUP BY city, ST_SRID(geom);
"""
new_ex_1 = gpd.GeoDataFrame.from_postgis(sql=sql, con=engine, geom_col="geom") #Note the change in geom_col!
```

*In[16]:*

`new_ex_1.plot()`

*In[18]:*

```
#Add to db
new_ex_1.to_postgis(name="distinct_cities", schema="ch11", con=engine)
```

*In[19]:*

```
#New Table
sql = """
SELECT
*
FROM
ch11.distinct_cities
"""
new_ex_1_1 = gpd.GeoDataFrame.from_postgis(sql=sql, con=engine, geom_col="geom") #Note the change in geom_col!
```

*In[20]:*

`new_ex_1_1.head()`

*Out[20]:*

city | geom | |
---|---|---|

0 |
ALAMEDA |
MULTIPOLYGON (((6062440.660 2099431.510, 60623… |

1 |
ALAMO |
MULTIPOLYGON (((6142769.437 2132420.993, 61418… |

2 |
ALBANY |
MULTIPOLYGON (((6045760.680 2154419.700, 60461… |

3 |
ALVISO |
MULTIPOLYGON (((6133654.370 1993195.000, 61330… |

4 |
AMERICAN CANYON |
MULTIPOLYGON (((6072029.630 2267404.510, 60811… |

*In[21]:*

`new_ex_1_1.plot()`

In the above code, you create and populate a new table called
`ch11.distinct_cities`

. You use the `ST_Multi`

function to ensure that
all the resulting geometries will be multipolygons and not polygons. If
a geometry has a single polygon, `ST_Multi`

will upgrade it to be a
multipolygon with a single polygon. Then you cast the geometry using
typmod to ensure that the geometry type and spatial reference system are
correctly registered in the `geometry_columns`

view. For good measure we
also put in a primary key and a spatial index.

## Creating a `LINESTRING`

from `POINT`s

Textbook Chapter/Section: 11.1.2

Textbook Start Page: 312

In the past two decades, the use of GPS devices has gone mainstream. GPS Samaritans spend their leisure time visiting points of interest (POI), taking GPS readings, and sharing their adventures via the web. Common venues have included local taverns, eateries, fishing holes, and filling stations with the lowest prices. A common follow-up task after gathering the raw positions of the POIs is to connect them to form an unbroken course.

In this exercise, you’ll use Australian track points to create linestrings. These track points consist of GPS readings taken during a span of about ten hours from afternoon to early morning on a wintry July day. We have no idea of what the readings represent. Let’s say a zoologist fastened a GPS around the neck of a roo and tracked her for an evening. The readings came in every ten seconds or so, but instead of creating one line-string with more than two thousand points, you’ll divide the readings into 15-minute intervals and create separate linestrings for each of the intervals.

`ST_MakeLine`

is the spatial aggregate function that takes a set of
points and forms a linestring out of them. You can add an `ORDER BY`

clause to aggregate functions; this is particularly useful when you need
to control the order in which aggregation occurs. In this example,
you’ll order by the input time of the readings.

*In[22]:*

```
sql = """
SELECT
DATE_TRUNC('minute',time) -
CAST( mod(
CAST(DATE_PART('minute',time) AS integer),15
) ||' minutes' AS interval
) AS track_period,
MIN(time) AS t_start,
MAX(time) AS t_end,
ST_MakeLine(geom ORDER BY time) AS geom
INTO ch11.aussie_run
FROM ch11.aussie_track_points
GROUP BY track_period
HAVING COUNT(time) > 1;
SELECT
CAST(track_period AS timestamp),
CAST(t_start AS timestamp) AS t_start,
CAST(t_end AS timestamp) AS t_end,
ST_NPoints(geom) AS number_of_points,
CAST(ST_Length(geom::geography) AS integer) AS dist_m,
(t_end - t_start) AS dur,
geom
FROM ch11.aussie_run;
"""
```

example_11_1_2 = gpd.GeoDataFrame.from_postgis(sql=sql, con=engine, geom_col="geom")

*In[23]:*

`example_11_1_2.head()`

*Out[23]:*

track_period | t_start | t_end | number_of_points | dist_m | dur | geom | |
---|---|---|---|---|---|---|---|

0 |
2009-07-18 04:30:00 |
2009-07-18 04:30:00 |
2009-07-18 04:44:59 |
133 |
2705 |
0 days 00:14:59 |
LINESTRING (152.95709 -27.16209, 152.95704 -27… |

1 |
2009-07-18 04:45:00 |
2009-07-18 04:45:05 |
2009-07-18 04:55:20 |
87 |
1720 |
0 days 00:10:15 |
LINESTRING (152.94813 -27.17278, 152.94828 -27… |

2 |
2009-07-18 05:00:00 |
2009-07-18 05:02:00 |
2009-07-18 05:14:59 |
100 |
1530 |
0 days 00:12:59 |
LINESTRING (152.94259 -27.17509, 152.94265 -27… |

3 |
2009-07-18 05:15:00 |
2009-07-18 05:15:05 |
2009-07-18 05:29:59 |
149 |
3453 |
0 days 00:14:54 |
LINESTRING (152.94276 -27.17723, 152.94263 -27… |

4 |
2009-07-18 05:30:00 |
2009-07-18 05:30:05 |
2009-07-18 05:43:58 |
132 |
3059 |
0 days 00:13:53 |
LINESTRING (152.94757 -27.18276, 152.94754 -27… |

*In[24]:*

`example_11_1_2.plot()`

First you create a column called `track_period`

specifying quarter-hour
slots starting on the hour, 15 minutes past, 30 minutes past, and 45
minutes past. You allocate each GPS point into the slots and create
separate linestrings from each time slot via the `GROUP BY`

clause. Not
all time slots need to have points, and some slots may have a single
point. If a slot is devoid of points, it won’t be part of the output. If
a slot only has one point, it’s removed. For the allocation, you use the
data_part function and the modulo operator.

Within each slot, you create a `linestring`

using `ST_MakeLine`

. You
want the line to follow the timing of the measurements, so you add an
`ORDER BY`

clause to `ST_MakeLine`

.

The `SELECT`

inserts directly into a new table called `aussie_run`

. (If
you’re not running this code for the first time, you’ll need to drop the
`aussie_run`

table first.) Finally, you query `aussie_run`

to find the
number of points in each `linestring`

using `ST_NPoints`

, subtracting
the time of the last point from the time of the first point to get a
duration, and using `ST_Length`

to compute the distance covered between
the first and last points within the 15-minute slot. Note that you cast
the geometry in longitude and latitude to geography to ensure you have a
measurable unit—meters.

## Clipping

Textbook Chapter/Section: 11.2.1

Textbook Start Page: 314

Clipping uses one geometry to cut another during intersection.Clipping uses one geometry to cut another during intersection. In this section, we’ll explore other functions available to you for clipping and splitting.In this section, we’ll explore other functions available to you for clipping and splitting.

As the name implies, *clipping* is the act of removing unwanted sections
of a geometry, leaving behind only what’s of interest. Think of clipping
coupons from a newspaper, clipping hair from someone’s head, or the moon
clipping the sun in a solar eclipse.

*Difference* and *symmetric difference* are operations closely related
to intersection. They both serve to return the remainder of an
intersection. `ST_Difference`

is a non—commutative function, whereas
`ST_SymDifference`

is, as the name implies, commutative.As the name
implies, clipping is the act of removing unwanted sections of a
geometry, leaving behind only what’s of interest. Think of clipping
coupons from a newspaper, clipping hair from someone’s head, or the moon
clipping the sun in a solar eclipse.

Difference and symmetric difference are operations closely related to
intersection. They both serve to return the remainder of an
intersection. `ST_Difference`

is a non—commutative function, whereas
ST_SymDifference is, as the name implies, commutative.

Difference functions return the geometry of what’s left out when two
geometries intersect. When given geometries `A`

and `B`

,
`ST_Difference(A,B)`

returns the portion of `A`

that’s not shared with
`B`

, whereas `ST_SymDifference(A,B)`

returns the portion of `A`

and `B`

that’s not shared. Difference functions return the geometry of what’s
left out when two geometries intersect. When given geometries `A`

and
`B`

, `ST_Difference(A,B)`

returns the portion of `A`

that’s not shared
with `B`

, whereas `ST_SymDifference(A,B)`

returns the portion of `A`

and
`B`

that’s not shared.

ST_SymDifference(A,B) = Union(A,B) - Intersection(A,B) ST_Difference(A,B) = A - Intersection(A,B)

##### Example: What’s left of the polygon and line after clipping

Here, you’re getting the difference between a linestring and polygon

*In[25]:*

```
#This is what our polygon looks like
sql = """
SELECT ST_GeomFromText(
'POLYGON((
2 4.5,3 2.6,3 1.8,2 0,-1.5 2.2,
0.056 3.222,-1.5 4.2,2 6.5,2 4.5
))'
) AS geom1
"""
```

poly = gpd.GeoDataFrame.from_postgis(sql=sql, con=engine, geom_col="geom1") #Note change of geom_col! poly.plot()

*In[26]:*

```
#This is what our polygon looks like
sql = """
SELECT ST_GeomFromText('LINESTRING(-0.62 5.84,-0.8 0.59)') AS geom2
"""
```

LS = gpd.GeoDataFrame.from_postgis(sql=sql, con=engine, geom_col="geom2") #Note change of geom_col! LS.plot()

*In[27]:*

```
#The difference between the polygon and linestring is a polygon
sql = """
SELECT
ST_Intersects(g1.geom1,g1.geom2) AS they_intersect,
GeometryType(
ST_Difference(g1.geom1,g1.geom2) ) AS intersect_geom_type,
ST_Difference(g1.geom1,g1.geom2) AS intersect
FROM (
SELECT ST_GeomFromText(
'POLYGON((
2 4.5,3 2.6,3 1.8,2 0,-1.5 2.2,
0.056 3.222,-1.5 4.2,2 6.5,2 4.5
))'
) AS geom1,
ST_GeomFromText('LINESTRING(-0.62 5.84,-0.8 0.59)') AS geom2
) AS g1;
"""
```

example_11_2_1_1 = gpd.GeoDataFrame.from_postgis(sql=sql, con=engine, geom_col="intersect") #Note change of geom_col! example_11_2_1_1.head()

*Out[27]:*

they_intersect | intersect_geom_type | intersect | |
---|---|---|---|

0 |
True |
POLYGON |
POLYGON ((2.00000 4.50000, 3.00000 2.60000, 3…. |

*In[28]:*

`example_11_2_1_1.plot()`

*In[29]:*

```
#The difference between the linestring and polygon is a multilinestring
sql = """
SELECT
ST_Intersects(g1.geom1,g1.geom2) AS they_intersect,
GeometryType(
ST_Difference(g1.geom2,g1.geom1) ) AS intersect_geom_type,
ST_Difference(g1.geom2,g1.geom1) AS intersect
FROM (
SELECT ST_GeomFromText(
'POLYGON((
2 4.5,3 2.6,3 1.8,2 0,-1.5 2.2,
0.056 3.222,-1.5 4.2,2 6.5,2 4.5
))'
) AS geom1,
ST_GeomFromText('LINESTRING(-0.62 5.84,-0.8 0.59)') AS geom2) AS g1;
"""
```

example_11_2_1_2 = gpd.GeoDataFrame.from_postgis(sql=sql, con=engine, geom_col="intersect") #Note change of geom_col! example_11_2_1_2.head()

*Out[29]:*

they_intersect | intersect_geom_type | intersect | |
---|---|---|---|

0 |
True |
MULTILINESTRING |
MULTILINESTRING ((-0.62000 5.84000, -0.65724 4… |

*In[30]:*

`example_11_2_1_2.plot()`

*In[31]:*

```
#The symmetric difference is a geometry collection
sql = """
SELECT
ST_Intersects(g1.geom1,g1.geom2) AS they_intersect,
GeometryType(
ST_SymDifference(g1.geom1,g1.geom2)
) AS intersect_geom_type,
ST_SymDifference(g1.geom2,g1.geom1) AS intersect
FROM (
SELECT ST_GeomFromText(
'POLYGON((
2 4.5,3 2.6,3 1.8,2 0,-1.5 2.2,
0.056 3.222,-1.5 4.2,2 6.5,2 4.5
))'
) AS geom1,
ST_GeomFromText('LINESTRING(-0.62 5.84,-0.8 0.59)') AS geom2) AS g1;
"""
example_11_2_1_3 = gpd.GeoDataFrame.from_postgis(sql=sql, con=engine, geom_col="intersect") #Note change of geom_col!
example_11_2_1_3.head()
```

*Out[31]:*

they_intersect | intersect_geom_type | intersect | |
---|---|---|---|

0 |
True |
GEOMETRYCOLLECTION |
GEOMETRYCOLLECTION (LINESTRING (-0.62000 5.840… |

*In[32]:*

`example_11_2_1_3.plot()`

In the preceding listing, the first `SELECT`

returns a polygon #1, which
is pretty much the same polygon you started out with. The second
`SELECT`

returns a `multilinestring`

composed of three ```
linestring`s
where the `polygon
```

cuts through #2. Finally, the third `SELECT`

returns
a `geometrycollection`

as expected, composed of a `multilinestring`

and
a `polygon`

#3.

## Splitting

Textbook Chapter/Section: 11.2.2

Textbook Start Page: 316

You just learned that using a `linestring`

to slice a `polygon`

with
`ST_Difference`

doesn’t work. For that, PostGIS offers another function
called `ST_Split`

. The `ST_Split`

function can only be used with single
geometries, not collections, and the blade you use to cut has to be one
dimension lower than what you’re cutting up.

The following listing demonstrates the use of `ST_Split`

.

*In[35]:*

```
sql = """SELECT gd.path[1] AS index, gd.geom AS geom
FROM (
SELECT
ST_GeomFromText(
'POLYGON((
2 4.5,3 2.6,3 1.8,2 0,-1.5 2.2,0.056
3.222,-1.5 4.2,2 6.5,2 4.5
))'
) AS geom1,
ST_GeomFromText('LINESTRING(-0.62 5.84,-0.8 0.59)') AS geom2
) AS g1,
ST_Dump(ST_Split(g1.geom1, g1.geom2)) AS gd"""
example_11_2_2 = gpd.GeoDataFrame.from_postgis(sql=sql, con=engine, geom_col="geom") #Note change of geom_col!
example_11_2_2.head()
```

*Out[35]:*

index | geom | |
---|---|---|

0 |
1 |
POLYGON ((2.00000 4.50000, 3.00000 2.60000, 3…. |

1 |
2 |
POLYGON ((-0.76073 1.73532, -1.50000 2.20000, … |

2 |
3 |
POLYGON ((-0.69361 3.69315, -1.50000 4.20000, … |

*In[36]:*

`example_11_2_2.plot()`

The `ST_Split(A,B)`

function always returns a geometry collection
consisting of all parts of geometry `A`

that result from splitting it
with geometry `B`

, even when the result is a single geometry.

Because of the inconvenience of geometry collections, you’ll often see
`ST_Split`

combined with `ST_Dump`

, as in the above listing, or with
`ST_CollectionExtract`

to simplify down to a single geometry where
possible.

## Tessellating

Textbook Chapter/Section: 11.2.3

Textbook Start Page: 318

Dividing your polygon into regions using shapes such as rectangles, hexagons, and triangles is called tessellating. It’s often desirable to divide your regions into areas that have equal area or population for easier statistical analysis. In this section, we’ll demonstrate techniques for achieving equal-area regions.

Tessellating looks like this:

##### CREATING A GRID AND SLICING TABLE GEOMETRIES WITH THE GRID

In this example, you’ll slice the New York city boroughs into small rectangular blocks. Your map will look like this:

We will first start off with `ST_SquareGrid`

. `ST_SquareGrid`

is a set
returning function that returns a table consisting of 3 columns
`(i,j,geom)`

. The `i`

is the row number along the grid and `j`

is the
column number along the grid.

*In[40]:*

```
sql = """
WITH bounds AS (
SELECT ST_SetSRID(ST_Extent(geom), ST_SRID(geom)) AS geom -- Creates a bounding box geometry covering the extent of the boroughs
FROM ch11.boroughs
GROUP BY ST_SRID(geom)
),
grid AS (SELECT g.i, g.j, g.geom
FROM bounds, ST_SquareGrid(10000,bounds.geom) AS g
) -- Creates a square grid each square grid is of 10000 length/width units (feet)
SELECT b.boroname, grid.i, grid.j,
CASE WHEN ST_Covers(b.geom,grid.geom) THEN grid.geom
ELSE ST_Intersection(b.geom, grid.geom) END AS geom -- Clips the boroughs by the squares
FROM ch11.boroughs AS b
INNER JOIN grid ON ST_Intersects(b.geom, grid.geom);
"""
```

example_11_2_3_1 = gpd.GeoDataFrame.from_postgis(sql=sql, con=engine, geom_col="geom") #Note change of geom_col! example_11_2_3_1.head()

*Out[40]:*

boroname | i | j | geom | |
---|---|---|---|---|

0 |
Staten Island |
91 |
11 |
POLYGON ((920000.000 117907.700, 918916.902 11… |

1 |
Staten Island |
91 |
12 |
POLYGON ((912498.141 120000.000, 912287.069 12… |

2 |
Staten Island |
91 |
13 |
POLYGON ((915779.150 130000.000, 915731.347 13… |

3 |
Staten Island |
91 |
14 |
POLYGON ((916233.310 140000.000, 917776.001 14… |

4 |
Staten Island |
92 |
11 |
POLYGON ((930000.000 116834.215, 928239.005 11… |

*In[38]:*

`example_11_2_3_1.plot()`

*In[39]:*

```
#Add to db
example_11_2_3_1.to_postgis(name="boroughs_square_grid", schema="ch11", con=engine)
```

The above code uses a grid where each square is 10,000 units in length/width and spans the full extent of NYC boroughs. Since the boroughs are in NY State plane feet (SRID=2263), the units of measure are square feet. When you clip the bouroughs by the grid, the resulting tiles have various shapes and sizes when they are on bourough boundaries. This may not be ideal if you want all your tiles to be the same size. In those cases you may want to forgo the clipping and just return the squares as is. Note that <3> uses a CASE statement. This is equivalent in result to just doing ST_Intersection, but because ST_Intersection is an intensive operation, you save a lot of processing cycles by just returning the square if it is completely covered by the borough.

The above code defined a bounds that covered the whole area of interest and split that into squares and then clipped the geometries using those squares. One feature of the ST_SquareGrid function that is not obvious from the above code is that for any given SRID and size there is a unique dicing of grids that can be formed across all space. Which means for any given SRID and size a particular point will have exactly the same i,j, geom tile it intersects in.

*In[41]:*

```
#Dividing the NYC boroughs bounds into rectangular blocks
#This code creates a square grid each square grid is of 10000 sq units (feet)
sql = """
SELECT b.boroname, grid.i, grid.j,
CASE WHEN ST_Covers(b.geom,grid.geom) THEN grid.geom
ELSE ST_Intersection(b.geom, grid.geom) END AS geom -- Creates a bounding box geometry covering the extent of the boroughs
FROM ch11.boroughs AS b
INNER JOIN ST_SquareGrid(10000,b.geom) AS grid -- Clips the boroughs by the squares
ON ST_Intersects(b.geom, grid.geom);
"""
```

example_11_2_3_2 = gpd.GeoDataFrame.from_postgis(sql=sql, con=engine, geom_col="geom") #Note change of geom_col! example_11_2_3_2.head()

*Out[41]:*

boroname | i | j | geom | |
---|---|---|---|---|

0 |
Staten Island |
91 |
11 |
POLYGON ((920000.000 117907.700, 918916.902 11… |

1 |
Staten Island |
91 |
12 |
POLYGON ((912498.141 120000.000, 912287.069 12… |

2 |
Staten Island |
91 |
13 |
POLYGON ((915779.150 130000.000, 915731.347 13… |

3 |
Staten Island |
91 |
14 |
POLYGON ((916233.310 140000.000, 917776.001 14… |

4 |
Staten Island |
92 |
11 |
POLYGON ((930000.000 116834.215, 928239.005 11… |

*In[42]:*

`example_11_2_3_2.plot() #Note it is the same result as before`

*In[43]:*

```
#Add to db
example_11_2_3_2.to_postgis(name="boroughs_square_grid2", schema="ch11", con=engine)
```

If this is a grid you’ll be using often, it’s best to make a physical table (materialize it) out of the bounds and use it to chop up your area as needed. However if this is one off chunking for specific areas or your area is huge and you want to do it in bits, then it is better to follow the listing 11.5 model.

Often times a hexagonal grid works better for splitting space evenly as each hexagon has 6 neighbors it is equidistant to. This is not the case with squares. A popular gridding system for the world is the H3 scheme created by Uber for dividing driving regions. It uses hexagons eng.uber.com/h3/ to divide the earth and a dymaxion projection en.wikipedia.org/wiki/Dymaxion_map. This has the feature of allowing large hexagons to more or less contain smaller ones.

## Sharding

Textbook Chapter/Section: 11.2.3

Textbook Start Page: 327

One common reason for breaking a geometry into smaller bits is for
performance. The reason is that operations such as intersections and
intersects work much faster on geometries with fewer points or smaller
area. If you have such a need, the fastest sharding function is the
`ST_Subdivide`

(postgis.net/docs/ST_Subdivide.html) function.
`ST_Subdivide`

is a set returning function that returns a set of
geometries where no geometry has more than max_vertices designated. It
can work with areal, linear, and point geometries.

This next example divides Queens such that no polygon has more than 1/4th total vertices of original.

*In[44]:*

```
sql = """
SELECT row_number() OVER() AS bucket,x.geom,
ST_Area(x.geom) AS area,
ST_NPoints(x.geom) AS npoints,
(ref.npoints/4)::integer As max_vertices
FROM (SELECT geom, -- Geometry to slice
ST_NPoints(geom) AS npoints
FROM ch11.boroughs WHERE boroname = 'Queens') AS ref
, LATERAL ST_Subdivide(ref.geom,
(ref.npoints/4)::integer -- Max number shards
) AS x(geom) ;
"""
```

example_11_2_3_3 = gpd.GeoDataFrame.from_postgis(sql=sql, con=engine, geom_col="geom") #Note change of geom_col! example_11_2_3_3.head()

*Out[44]:*

bucket | geom | area | npoints | max_vertices | |
---|---|---|---|---|---|

0 |
1 |
POLYGON ((1016187.550 141290.140, 1011914.750 … |
3.266832e+08 |
26 |
239 |

1 |
2 |
POLYGON ((1039907.726 150741.984, 1034839.909 … |
5.422707e+08 |
73 |
239 |

2 |
3 |
POLYGON ((1048784.526 167596.255, 1048727.701 … |
2.061526e+08 |
166 |
239 |

3 |
4 |
POLYGON ((1059790.715 184428.326, 1059759.660 … |
2.899369e+08 |
88 |
239 |

4 |
5 |
POLYGON ((1031087.953 184428.326, 1022870.118 … |
5.098495e+08 |
161 |
239 |

*In[45]:*

`example_11_2_3_3.plot()`

Here is a color-coded example of the above plot:

You will often find `ST_Subdivide`

paired with `ST_Segmentize`

.
`ST_Segmentize`

is used to even out the segments so that when
`ST_Subdivide`

does the work of segmentizing, the areas have a better
chance of being equal.

In the above queries with `ST_Subdivide`

use `LATERAL`

join constructs.
In the case of set-returning functions, the `LATERAL`

keyword is
optional so you will find people often ommitting `LATERAL`

even though
they are doing a `LATERAL`

join.

## Segmentizing `LINESTRING`s

Textbook Chapter/Section: 11.3.1

Textbook Start Page: 330

There are several reasons why you might want to break up a linestring into segments: - To improve the use of spatial indexes—a smaller linestring will have a smaller bounding box. - To prevent linestrings from stretching beyond one unit measure. - As a step toward topology and routing to determine shared edges.

If you have long linestrings where the vertices are fairly far apart,
you can inject intermediary points using the `ST_Segmentize`

function.
`ST_Segmentize`

adds points to your linestring to make sure that no
individual segments in the linestring exceed a given length.
`ST_Segmentize`

exists for both geometry and geography types.

For the geometry version of `ST_Segmentize`

, the measurement specified
with the max length is in the units of the spatial reference system. For
geography, the units are always meters.

In the following listing, you’re segmentizing a 4 vertex linestring into 10,000-meter segments.

*In[46]:*

```
#This is our linestring
sql = """
SELECT
geog::geometry AS geom,
ST_NPoints(geog::geometry) AS np_before,
ST_NPoints(ST_Segmentize(geog,10000)::geometry) AS np_after
FROM ST_GeogFromText(
'LINESTRING(-117.16 32.72,-71.06 42.35,3.3974 6.449,120.96 23.70)'
) AS geog;
"""
```

example_11_3_1_1 = gpd.GeoDataFrame.from_postgis(sql=sql, con=engine, geom_col="geom") #Note change of geom_col! example_11_3_1_1.head()

*Out[46]:*

geom | np_before | np_after | |
---|---|---|---|

0 |
LINESTRING (-117.16000 32.72000, -71.06000 42…. |
4 |
3585 |

*In[47]:*

`example_11_3_1_1.plot()`

In this listing, you start with a 4-point linestring. After
segmentizing, you end up with a 3,585-point linestring where the
distance between any two adjacent points is no more than 10,000 meters.
You cast the geography object to a geometry to use the `ST_NPoints`

function. The `ST_NPoints`

function does not exist for geography.

As mentioned earlier, `ST_Subdivide`

is often used with `ST_Segmentize`

.
This is because `ST_Subdivide`

cuts at vertices, so if you wanted to
then break apart your linestring into separate linestrings, you can use
`ST_Subdivide`

, but need to first cast to geometry and then back to
geography as follows:

*In[49]:*

```
#This is our linestring
sql = """
SELECT
sd.geom::geography AS geo,
ST_NPoints(sd.geom) AS np_after
FROM ST_GeogFromText(
'LINESTRING(-117.16 32.72,-71.06 42.35,3.3974 6.449,120.96 23.70)'
) AS geog,
LATERAL ST_Subdivide( ST_Segmentize(geog,10000)::geometry,
3585/8) AS sd(geom);
"""
```

example_11_3_1_2 = gpd.GeoDataFrame.from_postgis(sql=sql, con=engine, geom_col="geo") #Note change of geom_col! example_11_3_1_2.head()

*Out[49]:*

geo | np_after | |
---|---|---|

0 |
LINESTRING (-117.16000 32.72000, -117.08387 32… |
270 |

1 |
LINESTRING (-94.55382 40.08841, -94.49497 40.1… |
236 |

2 |
LINESTRING (-71.94764 42.35359, -71.84901 42.3… |
152 |

3 |
LINESTRING (-57.63000 40.52939, -57.54303 40.5… |
367 |

4 |
LINESTRING (-27.86500 29.73958, -27.81249 29.7… |
230 |

*In[50]:*

`example_11_3_1_2.plot()`

## Scaling

Textbook Chapter/Section: 11.4.2

Textbook Start Page: 337

The scaling family of functions comes in four overloads, one for 2D
`ST_Scale(geometry, xfactor, yfactor)`

, one for 3D
`ST_Scale(geometry, xfactor, yfactor, zfactor)`

, one for any geometry
dimension but will scales the coordinates based on a factor specified as
a point `ST_Scale(geometry, factor)`

, and the newest addition introduced
in PostGIS 2.5 is a version that will scale same amount but about a
specfied point `ST_Scale(geom, factor, geometry origin)`

.
(postgis.net/docs/ST_Scale.html).

Scaling takes a coordinate and multiplies it by the factor parameters. If you pass in a factor between 1 and -1, you shrink the geometry. If you pass in negative factors, the geometry will flip in addition to scaling. Below shows an example of scaling a hexagon.

*In[68]:*

```
sql = """
SELECT
xfactor, yfactor, zfactor, geom, -- Scale in x and y direction
ST_Scale(hex.geom, xfactor, yfactor) AS scaled_geometry, -- Scaling values
ST_Scale(hex.geom, ST_MakePoint(xfactor,yfactor, zfactor) ) AS scaled_using_pfactor -- Scale using point factor
FROM (
SELECT ST_GeomFromText(
'POLYGON((0 0,64 64,64 128,0 192, -64 128,-64 64,0 0))'
) AS geom -- Original hexagon
) AS hex
CROSS JOIN
(SELECT x*0.5 AS xfactor FROM generate_series(1,4) AS x) AS xf
CROSS JOIN
(SELECT y*0.5 AS yfactor FROM generate_series(1,4) AS y) AS yf
CROSS JOIN
(SELECT z*0.5 AS zfactor FROM generate_series(0,1) AS z) AS zf;
"""
hexagon_orig = gpd.GeoDataFrame.from_postgis(sql=sql, con=engine, geom_col="geom") #Note change of geom_col!
hexagon_orig.head()
```

*Out[68]:*

xfactor | yfactor | zfactor | geom | scaled_geometry | scaled_using_pfactor | |
---|---|---|---|---|---|---|

0 |
0.5 |
0.5 |
0.0 |
POLYGON ((0.000 0.000, 64.000 64.000, 64.000 1… |
0103000000010000000700000000000000000000000000… |
0103000000010000000700000000000000000000000000… |

1 |
1.0 |
0.5 |
0.0 |
POLYGON ((0.000 0.000, 64.000 64.000, 64.000 1… |
0103000000010000000700000000000000000000000000… |
0103000000010000000700000000000000000000000000… |

2 |
1.5 |
0.5 |
0.0 |
POLYGON ((0.000 0.000, 64.000 64.000, 64.000 1… |
0103000000010000000700000000000000000000000000… |
0103000000010000000700000000000000000000000000… |

3 |
2.0 |
0.5 |
0.0 |
POLYGON ((0.000 0.000, 64.000 64.000, 64.000 1… |
0103000000010000000700000000000000000000000000… |
0103000000010000000700000000000000000000000000… |

4 |
0.5 |
0.5 |
0.5 |
POLYGON ((0.000 0.000, 64.000 64.000, 64.000 1… |
0103000000010000000700000000000000000000000000… |
0103000000010000000700000000000000000000000000… |

*In[69]:*

```
#Original Hexagon
hexagon_orig.plot()
```

*In[70]:*

```
#Scaled Hexagon
hexagon_scaled = gpd.GeoDataFrame.from_postgis(sql=sql, con=engine, geom_col="scaled_geometry") #Note change of geom_col!
hexagon_scaled.plot()
```

*In[71]:*

```
#Scaled Hexagon Point Factor
hexagon_point = gpd.GeoDataFrame.from_postgis(sql=sql, con=engine, geom_col="scaled_using_pfactor") #Note change of geom_col!
hexagon_point.plot()
```

You start with a hexagonal polygon #1 and shrink and expand the geometry in the X and Y directions from 50% of its size to twice its size by using a cross join that generates numbers from 0 to 2 in X and 0 to 2 in Y, incrementing .5 for each step #2. The results are shown below:

The scaling multiplies the coordinates. Because the hexagon starts at the origin, all scaled geometries still have their bases at the origin. Normally when you scale, you want to keep the centroid constant. See below for an exampe:

*In[72]:*

```
sql = """
SELECT xfactor, yfactor,
ST_Scale(hex.geom, ST_MakePoint(xfactor, yfactor), ST_Centroid(hex.geom) )
AS scaled_geometry
FROM (
SELECT ST_GeomFromText(
'POLYGON((0 0,64 64,64 128,0 192,-64 128, -64 64,0 0))'
) AS geom
) AS hex
CROSS JOIN
(SELECT x*0.5 AS xfactor FROM generate_series(1,4) AS x) AS xf
CROSS JOIN
(SELECT y*0.5 AS yfactor FROM generate_series(1,4) AS y) AS yf;
"""
centroid = gpd.GeoDataFrame.from_postgis(sql=sql, con=engine, geom_col="scaled_geometry") #Note change of geom_col!
centroid.plot()
```

Here you scale a hexagon from half to twice its size in the X and Y directions about the centroid so the centroid remains unaltered. See below:

## Rotating

Textbook Chapter/Section: 11.4.3

Textbook Start Page: 340

`ST_RotateX`

, `ST_RotateY`

, `ST_RotateZ`

, and `ST_Rotate`

rotate a
geometry about the `X`

, `Y`

, or `Z`

axis in radian units. `ST_Rotate`

and `ST_RotateZ`

are the same because the default axis of rotation is
`Z`

. `ST_RotateX`

, `ST_RotateY`

, `ST_RotateZ`

, and `ST_Rotate`

rotate a
geometry about the `X`

, `Y`

, or `Z`

axis in radian units. `ST_Rotate`

and `ST_RotateZ`

are the same because the default axis of rotation is
`Z`

.

These functions are rarely used in isolation because their default
behavior is to rotate the geometry about the `(0,0)`

origin rather than
about the centroid. You can pass in an optional point argument called
`pointOrigin`

. When this argument is specified, rotation is about that
point; otherwise, rotation is about the origin.

The following listing rotates a hexagon about its centroid in increments of 45 degrees.

*In[78]:*

```
#This is our original hexagon
sql = """
SELECT
hex.geom as orig_geom,
rotrad/pi()*180 AS deg,
ST_Rotate(hex.geom,rotrad,
ST_Centroid(hex.geom)) AS rotated_geometry
FROM (
SELECT ST_GeomFromText(
'POLYGON((0 0,64 64,64 128,0 192,-64 128,-64 64,0 0))'
) AS geom
) AS hex
CROSS JOIN (
SELECT 2*pi()*x*45.0/360 AS rotrad
FROM generate_series(0,6) AS x
) AS xf;
"""
orig_item = gpd.GeoDataFrame.from_postgis(sql=sql, con=engine, geom_col="orig_geom") #Note change of geom_col!
orig_item.plot()
```

*In[80]:*

```
rotate = gpd.GeoDataFrame.from_postgis(sql=sql, con=engine, geom_col="rotated_geometry") #Note change of geom_col!
rotate.plot()
```

The above plot looks like a big blob…here is a better representation: