Commit e1c61f7

Merge pull request #222 from aws-samples/feature/cartopy
Add GEOS bootstrap action
2 parents 30f7bc7 + f352a10 commit e1c61f7

2 files changed

+117
-0
lines changed

geos/README.md

Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
# GEOS Installation on EMR

[GEOS](https://trac.osgeo.org/geos/) is a popular library for geospatial analysis and is used by Python libraries like Shapely and GeoPandas for manipulating geographic data.

This example shows how to install [Cartopy](https://scitools.org.uk/cartopy/docs/latest/index.html), another Python package for geospatial data processing that depends on GEOS.

## Bootstrap Action

Our bootstrap action needs to perform two main functions:

1. Install the GEOS library itself
2. Install our Python packages

Because Cartopy requires GEOS 3.7.2 or greater, we unfortunately need to build GEOS from source.
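
As a rough illustration of that version requirement, the sketch below compares a GEOS version string against 3.7.2. The `version_ge` helper is hypothetical and not part of `install-geos.sh`; it assumes GNU `sort` with the `-V` (version sort) option, which is available on typical Linux distributions.

```bash
#!/bin/bash

# Hypothetical helper (not part of install-geos.sh):
# succeeds when version $1 is greater than or equal to version $2.
# Assumes GNU sort with -V (version sort).
version_ge() {
    local have="$1" need="$2"
    [ "$(printf '%s\n%s\n' "$have" "$need" | sort -V | head -n 1)" = "$need" ]
}

# Example: report whether a given GEOS version satisfies Cartopy's minimum.
for candidate in 3.6.2 3.7.2 3.7.5; do
    if version_ge "$candidate" 3.7.2; then
        echo "GEOS $candidate is new enough"
    else
        echo "GEOS $candidate is too old"
    fi
done
```

On a node where GEOS is already installed, the same check could be run against the live version with `version_ge "$(geos-config --version)" 3.7.2`.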

This also requires building proj from source, as well as sqlite3, because proj needs sqlite 3.11 or newer (sqlite3 build instructions are here: https://airflow.apache.org/docs/apache-airflow/stable/howto/set-up-database.html).

Once those are installed, we can `pip3 install cartopy`, with one caveat: the sudo shell needs `/usr/local/bin` in its `PATH` because that is where the `geos-config` binary is installed.
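
The caveat arises because sudo commonly resets `PATH` (for example through a `secure_path` setting in sudoers), which can leave out `/usr/local/bin`. A minimal sketch of checking for this is shown below; the `path_contains_dir` helper is hypothetical and not part of this repository.

```bash
#!/bin/bash

# Hypothetical helper (not part of install-geos.sh):
# succeeds when directory $1 is an entry in the colon-separated list $2
# (defaults to the current PATH). A trailing slash on $1 is dropped first;
# entries are otherwise compared literally.
path_contains_dir() {
    local dir="${1%/}" path_value="${2:-$PATH}"
    case ":${path_value}:" in
        *":${dir}:"*) return 0 ;;
        *) return 1 ;;
    esac
}

if path_contains_dir /usr/local/bin; then
    echo "/usr/local/bin is on PATH"
else
    echo "/usr/local/bin is missing from PATH"
fi
```

To see the `PATH` that sudo itself would use, run `sudo sh -c 'echo "$PATH"'`. The install script sidesteps the problem by running pip through `sudo env "PATH=$PATH"`.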

## Running

Upload `install-geos.sh` from this repository to an S3 bucket, then pass it as a bootstrap action to `aws emr create-cluster`.

```bash
S3_BUCKET=dcortesi-demo-code-us-west-2
AWS_REGION=us-west-2

aws s3 cp geos/install-geos.sh s3://${S3_BUCKET}/code/bootstrap/geos/
aws emr create-cluster \
    --name "emr-cartopy" \
    --region ${AWS_REGION} \
    --bootstrap-actions Path="s3://${S3_BUCKET}/code/bootstrap/geos/install-geos.sh" \
    --log-uri "s3n://${S3_BUCKET}/logs/emr/" \
    --release-label "emr-6.10.0" \
    --use-default-roles \
    --applications Name=Spark Name=Livy Name=JupyterEnterpriseGateway \
    --instance-fleets '[{"Name":"Primary","InstanceFleetType":"MASTER","TargetOnDemandCapacity":1,"TargetSpotCapacity":0,"InstanceTypeConfigs":[{"InstanceType":"c5a.2xlarge"},{"InstanceType":"m5a.2xlarge"},{"InstanceType":"r5a.2xlarge"}]},{"Name":"Core","InstanceFleetType":"CORE","TargetOnDemandCapacity":0,"TargetSpotCapacity":1,"InstanceTypeConfigs":[{"InstanceType":"c5a.2xlarge"},{"InstanceType":"m5a.2xlarge"},{"InstanceType":"r5a.2xlarge"}],"LaunchSpecifications":{"OnDemandSpecification":{"AllocationStrategy":"lowest-price"},"SpotSpecification":{"TimeoutDurationMinutes":10,"TimeoutAction":"SWITCH_TO_ON_DEMAND","AllocationStrategy":"capacity-optimized"}}}]' \
    --scale-down-behavior "TERMINATE_AT_TASK_COMPLETION" \
    --auto-termination-policy '{"IdleTimeout":14400}'
```

The cluster takes about 20 minutes to boot because the bootstrap action compiles GEOS, sqlite3, and proj from source.
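
Given the long boot time, it can help to block until the cluster is up rather than checking by hand. The sketch below wraps the AWS CLI waiter `aws emr wait cluster-running` in a hypothetical `wait_for_cluster` function, which is not part of this repository. The cluster ID in the usage comment is a placeholder, and the assumption that the waiter returns only after bootstrap actions have finished should be checked against the EMR documentation.

```bash
#!/bin/bash

# Hypothetical wrapper (not part of this repository).
# $1: EMR cluster ID, $2: AWS region.
# Blocks until the AWS CLI waiter reports the cluster as running;
# returns non-zero if the waiter gives up or reports a failure.
wait_for_cluster() {
    local cluster_id="$1" region="$2"
    aws emr wait cluster-running --cluster-id "$cluster_id" --region "$region"
}

# Usage (placeholder cluster ID):
#   wait_for_cluster j-XXXXXXXXXXXXX us-west-2 && echo "cluster is up"
```

The cluster ID can be captured at creation time by adding the global options `--query ClusterId --output text` to the `aws emr create-cluster` call.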

## Docker

Another option is to build a Docker container that contains the necessary dependencies.

geos/install-geos.sh

Lines changed: 71 additions & 0 deletions
@@ -0,0 +1,71 @@
#!/bin/bash

set -e

# Install build dependencies
sudo yum install -y automake \
    bzip2 \
    cmake3 \
    curl-devel \
    expect \
    gcc \
    gcc-c++ \
    gzip \
    libtiff-devel \
    make \
    python3-devel \
    tar
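
# Install GEOS
curl -O https://download.osgeo.org/geos/geos-3.7.5.tar.bz2
tar xjvf geos-3.7.5.tar.bz2
cd geos-3.7.5
# Run each build step as its own command so that `set -e` stops the script on failure;
# a failure inside an `&&` chain (other than in its last command) does not trigger `set -e`.
./configure
make
sudo make install
cd ..
rm -rf geos-3.7.5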

# Install sqlite>=3.11 (needed for proj)
curl -O https://www.sqlite.org/src/tarball/sqlite.tar.gz
tar xzf sqlite.tar.gz
cd sqlite/
export CFLAGS="-DSQLITE_ENABLE_FTS3 \
    -DSQLITE_ENABLE_FTS3_PARENTHESIS \
    -DSQLITE_ENABLE_FTS4 \
    -DSQLITE_ENABLE_FTS5 \
    -DSQLITE_ENABLE_JSON1 \
    -DSQLITE_ENABLE_LOAD_EXTENSION \
    -DSQLITE_ENABLE_RTREE \
    -DSQLITE_ENABLE_STAT4 \
    -DSQLITE_ENABLE_UPDATE_DELETE_LIMIT \
    -DSQLITE_SOUNDEX \
    -DSQLITE_TEMP_STORE=3 \
    -DSQLITE_USE_URI \
    -O2 \
    -fPIC"
export PREFIX="/usr/local"
LIBS="-lm" ./configure --disable-tcl --enable-shared --enable-tempstore=always --prefix="$PREFIX"
make
sudo make install
cd ..
rm -rf sqlite

# Install proj, pointing CMake at the sqlite3 built above
curl -O https://download.osgeo.org/proj/proj-9.2.1.tar.gz
tar xzvf proj-9.2.1.tar.gz
cd proj-9.2.1
mkdir build
cd build
cmake3 -DSQLITE3_INCLUDE_DIR=/usr/local/include/ -DSQLITE3_LIBRARY=/usr/local/lib/libsqlite3.so ..
cmake3 --build .
sudo cmake3 --build . --target install
cd ../../
rm -rf proj-9.2.1

# Register the libproj and libgeos install directories with the dynamic linker so Python can load them
echo -e '/usr/local/lib/\n/usr/local/lib64/' | sudo tee -a /etc/ld.so.conf.d/proj-x86_64.conf > /dev/null
sudo ldconfig

# Now install cartopy
# `env "PATH=$PATH"` passes the current PATH (including /usr/local/bin) into the sudo environment so that cartopy can find geos-config
# shapely==1.8.5 is needed due to https://github.com/SciTools/cartopy/issues/2076
sudo env "PATH=$PATH" pip3 install shapely==1.8.5 cartopy
