
Geo Plotting

Kandy-Geo adds extensions for plotting a GeoDataFrame, a DataFrame-based structure for geospatial datasets. All plotting principles and features are the same as in Kandy; the only difference is that you don't have to define positional mappings. Instead, geometries are mapped to Kandy layers automatically.

Kandy-Geo and DataFrame-Geo Usage

To integrate Kandy-Geo and DataFrame-Geo into an interactive notebook, use the following commands:


// Fetches the latest versions
%useLatestDescriptors
// Adds both the kandy-geo and the dataframe-geo libraries with the latest versions
%use kandy-geo

// Adds the kandy-geo and the dataframe-geo libraries with specific versions
%use kandy-geo(kandyVersion=0.8.0, dataframeVersion=0.15.0)

To include Kandy-Geo and DataFrame-Geo in your Gradle project, add the following to your dependencies:

repositories {
    // Maven repository with GeoTools releases
    maven("https://repo.osgeo.org/repository/release")
}

dependencies {
    implementation("org.jetbrains.kotlinx:kandy-geo:0.8.0")
    implementation("org.jetbrains.kotlinx:dataframe-geo:0.15.0")
}

Geometries

Geo plotting is essentially the visualization of geographic geometries on a map or coordinate system. GeoJSON is the most widely used standard for representing geospatial data. It defines a set of geometry types that are simple yet powerful for modeling geographic features:

  • Point: Represents a specific location as a single coordinate.

  • MultiPoint: A collection of multiple Point geometries.

  • LineString: A sequence of connected points, forming a path or linear feature.

  • MultiLineString: A collection of multiple LineString geometries.

  • Polygon: A closed shape with an outer boundary and optional inner holes.

  • MultiPolygon: A collection of multiple Polygon geometries.

  • GeometryCollection: A container for any combination of the above geometries.

JTS (Java Topology Suite) is a library that works seamlessly with these geometries, adding a variety of operations. It allows you to perform tasks like combining geometries, finding intersections, creating buffers, or simplifying shapes.

All classes for the aforementioned geometries are provided in JTS and inherit from the base class Geometry. GeoDataFrame is a wrapper around a standard DataFrame with a geometry column of type Geometry, enabling convenient handling of geospatial datasets.

Reading GeoDataFrame

Currently, GeoDataFrame supports two of the most popular formats: Shapefile and GeoJSON. These formats can be read with the corresponding functions, GeoDataFrame.readGeoJson() and GeoDataFrame.readShapefile(). Each of these functions returns a GeoDataFrame.

GeoJSON

GeoJSON is a widely used format for encoding geographic data structures. It represents spatial features such as points, lines, and polygons, along with their properties, using JSON. Here's an example of GeoJSON:

{
  "type": "Feature",
  "geometry": {
    "type": "Point",
    "coordinates": [125.6, 10.1]
  },
  "properties": {
    "name": "Dinagat Islands"
  }
}

Let's load a GeoJSON file that contains polygons representing the boundaries of US states:

val usaStates = GeoDataFrame.readGeoJson("https://raw.githubusercontent.com/AndreiKingsley/datasets/refs/heads/main/USA.json")

We can directly access the underlying DataFrame to take a closer look at its contents:

usaStates.df
usaStatesDf.png

This DataFrame is required to have a geometry column of type org.locationtech.jts.geom.Geometry:

usaStates.df.geometry.type()

Output:

org.locationtech.jts.geom.Geometry

We can also check the exact types of these geometries:

usaStates.df.geometry.map { it::class }.distinct().toList()

Output:

[class org.locationtech.jts.geom.Polygon, class org.locationtech.jts.geom.MultiPolygon]

As expected, these are Polygon and MultiPolygon.

The GeoDataFrame also contains a .crs field for the coordinate reference system (CRS). In GeoJSON, this field is not explicitly defined* and is read as null. If this field is not explicitly set in the GeoDataFrame, it is assumed by default to use WGS84 — the standard CRS for working with geospatial data.

* According to the GeoJSON specification, all coordinates are defined in WGS84. In the future, we may remove the nullability of the crs field, and WGS84 will be explicitly set as the CRS when reading GeoJSON files.

usaStates.crs

Output:

null

Shapefile

Shapefile is a popular geospatial vector data format developed by ESRI. It stores geometric features such as points, lines, and polygons, along with their attributes, across multiple files. A Shapefile requires at least three parts: .shp (geometry), .shx (shape index), and .dbf (attributes), and it typically uses a defined coordinate reference system.

To load a Shapefile, you need to specify the path to the file with the .shp extension. The other required files must be in the same directory and share the same base name.
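
Since the companion files must sit next to the .shp file, it can be useful to know which paths are expected before loading. The following is a minimal sketch, assuming only the naming rule described above; shapefileCompanions is a hypothetical helper and is not part of Kandy-Geo or DataFrame-Geo.

```kotlin
// Hypothetical helper (not a Kandy-Geo or DataFrame-Geo function).
// Given the path to a .shp file, returns the paths of the required
// companion files (.shx and .dbf) that share the same base name.
fun shapefileCompanions(shpPath: String): List<String> {
    require(shpPath.endsWith(".shp", ignoreCase = true)) {
        "Expected a path ending in .shp: $shpPath"
    }
    // Drop the ".shp" suffix to get the shared base name
    val base = shpPath.dropLast(4)
    return listOf("$base.shx", "$base.dbf")
}
```

For a local file, each returned path could then be checked with java.io.File(path).exists() before calling GeoDataFrame.readShapefile().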

Let's load a Shapefile with the most populated cities in the world:

val worldCities = GeoDataFrame.readShapefile("https://github.com/AndreiKingsley/datasets/raw/refs/heads/main/ne_10m_populated_places_simple/ne_10m_populated_places_simple.shp")

Take a look inside the DataFrame:

worldCities.df
worldCitiesDf.png

This GeoDataFrame contains only Point geometries:

worldCities.df.geometry.type()

Output:

org.locationtech.jts.geom.Point

It also has an explicitly specified CRS:

worldCities.crs

Output:

GEOGCS["GCS_WGS_1984", DATUM["D_WGS_1984", SPHEROID["WGS_1984", 6378137.0, 298.257223563]], PRIMEM["Greenwich", 0.0], UNIT["degree", 0.017453292519943295], AXIS["Longitude", EAST], AXIS["Latitude", NORTH]]

Plot

Geo plotting in Kandy is not significantly different from usual plotting. The main distinction is that you need to provide the aforementioned geometries instead of specifying positional mappings.

To facilitate this, Kandy-Geo introduces geo layers, which, unlike regular layers, accept geometries. These can be provided as DataFrame columns, Iterable, or single instances. If a layer is built in the context of a GeoDataFrame dataset, it is not necessary to explicitly specify the geometry, as the geometry column will be used by default.

geoPolygon

geoPolygon() adds a layer of polygons constructed from Polygon and MultiPolygon geometries.

Let's plot US states from usaStates:

usaStates.plot {
    // `geoPolygon` uses polygons and multipolygons
    // from the `geometry` column of `usaStates` inner DataFrame
    geoPolygon()
}
usaStatesPlotGeoPolygon

The customization process for such a layer is no different from a regular one. The function optionally opens a block where you can configure all polygon aesthetic attributes as usual using mappings and settings. For example, you can color each state by mapping the name column to fillColor and customize the borderLine as shown below:

usaStates.plot {
    geoPolygon {
        fillColor(name) {
            // Hide legend
            legend.type = LegendType.None
        }
        borderLine {
            width = 0.1
            color = Color.BLACK
        }
    }
}
usaStatesGeoPolygonPlotCustomized

Mercator coordinates transformation

The Mercator projection is widely used for map visualizations because it preserves angles and shapes locally, making it ideal for navigation and geographical applications. It is particularly useful for rendering maps on flat surfaces, such as screens or paper. The Mercator projection is compatible with coordinates in the WGS84 coordinate system, as it uses latitude and longitude values to project the curved surface of the Earth onto a 2D plane. In this projection, only the axes of the plot are transformed, while the actual values of the points remain unchanged.

usaStates.plot {
    geoPolygon()
    coordinatesTransformation = CoordinatesTransformation.mercator()
}
usaStatesPlotWithMercator
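
To get a feel for what the Mercator transformation does to the axes, here is a minimal sketch of the standard spherical Mercator formulas on a unit sphere. It is an illustration only: the function names mercatorX and mercatorY are hypothetical, and how Kandy computes the projection internally is not described in this document.

```kotlin
import kotlin.math.PI
import kotlin.math.ln
import kotlin.math.tan

// Illustrative sketch of the spherical Mercator formulas on a unit sphere.
// These helper names are hypothetical and are not part of the Kandy API.

// Longitude maps linearly to x (degrees converted to radians).
fun mercatorX(longitudeDegrees: Double): Double = longitudeDegrees * PI / 180.0

// Latitude is stretched increasingly towards the poles:
// y = ln(tan(pi/4 + phi/2)), where phi is the latitude in radians.
fun mercatorY(latitudeDegrees: Double): Double {
    require(latitudeDegrees > -90.0 && latitudeDegrees < 90.0) {
        "The Mercator projection is undefined at the poles"
    }
    val phi = latitudeDegrees * PI / 180.0
    return ln(tan(PI / 4 + phi / 2))
}
```

With these formulas, the 30 degrees of latitude from 0 to 30 span about 0.549 units on the y axis, while the 30 degrees from 30 to 60 span about 0.768 units, which is why regions far from the equator appear enlarged.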

geoMap

geoMap() is essentially geoPolygon(), but it also applies a coordinate transformation based on the CoordinateReferenceSystem of the dataset (GeoDataFrame.crs). Currently, only WGS84 is supported, and for it the Mercator projection is applied by default.

// This plot is identical
// to the previous one.
usaStates.plot {
    geoMap()
}
usaStatesPlotGeoMap

When the Mercator projection is applied, we can still set axis limits as usual. However, the limits are inherently bounded by the range of geographic coordinates: ±180 degrees of longitude and ±90 degrees of latitude.

usaStates.plot {
    geoMap()
    x.axis.limits = -127..-65
    y.axis.limits = 23..50
}
usaStatesPlotWithAxisLimits

geoPoints

geoPoints() adds a layer of points constructed from Point and MultiPoint geometries.

Let's add worldCities points over usaStates polygons:

usaStates.plot {
    // `geoMap` takes polygons from the `geometry`
    // column of `usaStates` inner DataFrame
    geoMap()
    // Add a new dataset using the `worldCities` GeoDataFrame.
    // Layers created within this scope will use it as their base dataset
    // instead of the initial one
    withData(worldCities) {
        // `geoPoints` takes points from the `geometry`
        // column of `worldCities` inner DataFrame
        geoPoints {
            size = 1.5
        }
    }
}
usaStatesPlotWithWorldCities

GeoDataFrame modifying

Before plotting, it is often necessary to modify the GeoDataFrame. For example, you might filter points within a specific area, translate or scale certain geometries, and so on. GeoDataFrame allows direct updates to its inner DataFrame using the familiar DataFrame Operations API.

DataFrame operations

The function GeoDataFrame<T>.modify(block: DataFrame<T>.() -> DataFrame<T>): GeoDataFrame<T> opens a new scope where the receiver is the inner DataFrame of this GeoDataFrame. This allows you to perform operations such as filter, take, sort, update, and others directly on it. The function returns a GeoDataFrame with the modified DataFrame resulting from the block, while keeping the CRS unchanged.

Let's filter the points in worldCities, keeping only those located within the US. To do this, we will first combine all polygons from usaStates into a single polygon for convenience:

// import mergePolygons utility
import org.jetbrains.kotlinx.kandy.letsplot.geo.util.mergePolygons

// Experimental function that merges a collection of polygons and
// multipolygons into a single multipolygon
val usaPolygon: MultiPolygon = usaStates.df.geometry.mergePolygons()

plot {
    // `geoPolygon` and `geoMap` can accept a single `Polygon` or `MultiPolygon`
    geoMap(usaPolygon)
}
usaStatesPlotMergedPolygon

Now, let's create a GeoDataFrame usaCities containing only the cities located within the United States. To avoid overplotting, we will select the 30 most populous cities. For this, we will modify worldCities:

val usaCities = worldCities.modify {
    // Filter the DataFrame to include only points inside the `usaPolygon`
    filter {
        // `usaPolygon.contains(geometry)` checks if the `geometry` (a Point)
        // from the current row of `worldCities` is within the `usaPolygon`
        usaPolygon.contains(geometry)
    }
        // Take 30 most populous cities.
        // Sort the remaining rows by population size in descending order
        .sortByDesc { pop_min }
        // Select the top 30 rows.
        .take(30)
}
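
The usaPolygon.contains(geometry) check is done by JTS. To illustrate the idea behind a point-in-polygon test, here is a simplified ray-casting sketch for a single ring without holes. It is not the JTS implementation, which also deals with holes, multipolygons, and points on the boundary; the Vertex type and the ringContains function are hypothetical names introduced for this example.

```kotlin
// Hypothetical stand-in for a polygon vertex; JTS uses its own Coordinate class.
data class Vertex(val x: Double, val y: Double)

// Ray casting: cast a horizontal ray from (px, py) to the right and count
// how many ring edges it crosses. An odd count means the point is inside.
// Simplification: one ring, no holes, boundary points are not treated specially.
fun ringContains(ring: List<Vertex>, px: Double, py: Double): Boolean {
    var inside = false
    var j = ring.lastIndex
    for (i in ring.indices) {
        val a = ring[i]
        val b = ring[j]
        // The edge (a, b) straddles the horizontal line y = py
        val straddles = (a.y > py) != (b.y > py)
        if (straddles) {
            // x coordinate where the edge meets the line y = py
            val xAtPy = a.x + (py - a.y) * (b.x - a.x) / (b.y - a.y)
            if (px < xAtPy) inside = !inside
        }
        j = i
    }
    return inside
}
```

For a square ring with corners (0, 0), (4, 0), (4, 4), and (0, 4), the call ringContains(ring, 2.0, 2.0) returns true and ringContains(ring, 5.0, 2.0) returns false.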

Now we can visualize the result by overlaying the points representing these cities on the polygons of the states (as above):

usaStates.plot {
    geoMap()
    withData(usaCities) {
        geoPoints {
            tooltips(title = value(name)) {
                line("population", value(pop_min))
            }
        }
    }
}
usaStatesPlotWithTopCities

As you can see, the map of the US is significantly stretched by distant territories such as Puerto Rico, Hawaii, and Alaska. We can remove these regions, keeping only the contiguous part (48 states):

val usa48 = usaStates.modify {
    filter { name !in listOf("Alaska", "Hawaii", "Puerto Rico") }
}

usa48.plot {
    geoMap()
}
usaStatesFilterContiguous

Geometry operations

Another, more elegant way to improve the appearance of the US map is to scale and reposition these polygons, making the plot more compact.

The DataFrame-Geo library provides Kotlin-style extensions for JTS geometries. For instance, Geometry.translate(x, y) shifts a geometry by a specified vector, while Geometry.scaleAroundCenter(factor) scales a geometry relative to its centroid.

val usaAdjusted = usaStates.modify {
    // Custom extensions for `Geometry` based on JTS API.
    // Scale and move Alaska:
    update { geometry }.where { name == "Alaska" }.with {
        it.scaleAroundCenter(0.5).translate(40.0, -40.0)
    }
        // Move Hawaii and Puerto Rico:
        .update { geometry }.where { name == "Hawaii" }.with { it.translate(65.0, 0.0) }
        .update { geometry }.where { name == "Puerto Rico" }.with { it.translate(-10.0, 5.0) }
}

usaAdjusted.plot {
    geoMap()
}
usaStatesAdjusted
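
The arithmetic behind translating and scaling can be sketched on plain coordinate lists. This is only an illustration under stated assumptions: XY, translateAll, and scaleAroundMean are hypothetical names, and the sketch uses the mean of the vertices as the center, which is a simplification and does not always coincide with the centroid that Geometry.scaleAroundCenter uses.

```kotlin
// Hypothetical stand-in for a coordinate; JTS has its own Coordinate class.
data class XY(val x: Double, val y: Double)

// Shift every vertex by the vector (dx, dy).
fun translateAll(points: List<XY>, dx: Double, dy: Double): List<XY> =
    points.map { XY(it.x + dx, it.y + dy) }

// Scale every vertex towards (factor < 1) or away from (factor > 1) a center.
// Simplification: the center is the mean of the vertices, which is not
// always equal to the area-weighted centroid of a polygon.
fun scaleAroundMean(points: List<XY>, factor: Double): List<XY> {
    require(points.isNotEmpty()) { "At least one point is required" }
    val cx = points.sumOf { it.x } / points.size
    val cy = points.sumOf { it.y } / points.size
    return points.map { XY(cx + (it.x - cx) * factor, cy + (it.y - cy) * factor) }
}
```

Scaling the square (0, 0), (2, 0), (2, 2), (0, 2) by a factor of 0.5 gives (0.5, 0.5), (1.5, 0.5), (1.5, 1.5), (0.5, 1.5): the shape shrinks but stays centered on (1, 1), which is why Alaska can be reduced in size first and then moved with a separate translation.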

Here is an example of a states map with the centroid of each state:

usa48.plot {
    geoMap()
    withData(usa48.modify { update { geometry }.with { it.centroid } }) {
        geoPoints()
    }
}
usaStatesPlotWithCentroids

Datasets Join

In geo plotting, separate datasets are often used: one containing the geometries and others with specific data. To combine them, you can join them using modify. Let's load a DataFrame with the results of the 2024 US presidential election:

val usa2024electionResults = DataFrame.readCSV("https://gist.githubusercontent.com/AndreiKingsley/348687222aecc4f0eb39e3d81acd515b/raw/a9914352dbdfb426f9146dda633ee382d936b000/usa_2024_election_states.csv")
usa2024electionResults
electionResults.png

And join it to the US states GeoDataFrame:

val usaStatesWithElectionResults = usaAdjusted.modify {
    innerJoin(usa2024electionResults) { name }
}
usaStatesWithElectionResults.df
usStatesWithElectionResults.png
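
An inner join keeps only the rows whose key appears in both datasets. The sketch below illustrates that behavior on plain Kotlin lists. It is not the DataFrame implementation: the row types and the innerJoinByName function are hypothetical, the sample values used in the note are placeholders, and the sketch assumes each name occurs at most once in the results.

```kotlin
// Hypothetical row types standing in for DataFrame rows.
data class ShapeRow(val name: String, val geometry: String)
data class ResultRow(val name: String, val winner: String)
data class JoinedRow(val name: String, val geometry: String, val winner: String)

// Inner join on `name`: a shape row is kept only if a result row
// with the same name exists. Assumes names are unique in `results`.
fun innerJoinByName(shapes: List<ShapeRow>, results: List<ResultRow>): List<JoinedRow> {
    val resultsByName = results.associateBy { it.name }
    return shapes.mapNotNull { shape ->
        resultsByName[shape.name]?.let { result ->
            JoinedRow(shape.name, shape.geometry, result.winner)
        }
    }
}
```

With placeholder shapes named "A" and "B" and placeholder results for "B" and "C", only the row for "B" survives the join. In the same way, a state whose name is spelled differently in the two datasets would silently drop out of the plot, so it is worth comparing row counts before and after joining.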

Now we can create a geo plot with a color scale based on state election results:

usaStatesWithElectionResults.plot {
    geoMap {
        fillColor(winner) {
            scale = categorical(
                "Republican" to Color.hex("#CC3333"),
                "Democrat" to Color.hex("#3366CC")
            )
        }
        tooltips(name, winner)
    }
    layout {
        title = "USA 2024 President Election Results"
        size = 700 to 500
        style(Style.Void) {
            legend.position = LegendPosition.Top
        }
    }
}
electionResultsPlotByParty

Applying new CRS

A new coordinate system can be applied to a GeoDataFrame by projecting all geometries into it (note that this is not always possible, so proceed with caution).

The CONUS (Conterminous United States) Albers projection is a widely used coordinate reference system tailored for the contiguous United States (48 states). It is an equal-area projection, meaning it preserves area proportions while slightly distorting shapes and distances. This projection is ideal for visualizing geographic features across large regions of the continental US.

Let's apply the CONUS Albers projection to the state polygons:

import org.geotools.referencing.CRS

val conusAlbersCrs = CRS.decode("EPSG:5070", true)
val usaAlbers = usa48.applyCrs(conusAlbersCrs)
// Check that the applied CRS matches EPSG:5070 (quotes inside the string literal are escaped)
println("CRS.equalsIgnoreMetadata(usaAlbers.crs, CRS.decode(\"EPSG:5070\", true)) is ${CRS.equalsIgnoreMetadata(usaAlbers.crs, CRS.decode("EPSG:5070", true))}") // true
usaAlbers.crs

Output:

PROJCS["NAD83 / Conus Albers", GEOGCS["NAD83", DATUM["North American Datum 1983", SPHEROID["GRS 1980", 6378137.0, 298.257222101, AUTHORITY["EPSG","7019"]], ...
usaAlbers.plot {
    // Polygons will work exactly the same -
    // no special coordinates transformation is applied
    // for GeoDF with unsupported CRS
    geoMap()
}
usaStatesPlotWithAlbersCrs

geoPath

geoPath() adds a path layer constructed from LineString and MultiLineString geometries.

The following function constructs the shortest path on the Earth's surface, known as a great-circle line. A great-circle line represents the shortest distance between two points on a sphere, following the curvature of the Earth. The path is approximated by a LineString with n segments (n + 1 points); a larger n gives a smoother curve.

import org.locationtech.jts.geom.*
import kotlin.math.*

fun greatCircleLineString(start: Point, end: Point, n: Int = 100): LineString {
    val factory = GeometryFactory()

    // Convert degrees to radians (x is longitude, y is latitude)
    val startLat = Math.toRadians(start.y)
    val startLon = Math.toRadians(start.x)
    val endLat = Math.toRadians(end.y)
    val endLon = Math.toRadians(end.x)
    val deltaLon = endLon - startLon

    val cosStartLat = cos(startLat)
    val cosEndLat = cos(endLat)
    val sinStartLat = sin(startLat)
    val sinEndLat = sin(endLat)

    // Cosine of the central angle (spherical law of cosines), clamped to [-1, 1]
    // so that rounding errors cannot push it outside the domain of acos
    val a = (cosStartLat * cosEndLat * cos(deltaLon) + sinStartLat * sinEndLat).coerceIn(-1.0, 1.0)
    val angularDistance = acos(a)

    // Coincident points: return a degenerate two-point line
    if (angularDistance == 0.0) {
        return factory.createLineString(arrayOf(start.coordinate, end.coordinate))
    }

    val sinAngularDistance = sin(angularDistance)
    val coordinates = mutableListOf<Coordinate>()
    for (i in 0..n) {
        val fraction = i.toDouble() / n
        // Interpolation weights of the start and end points
        val A = sin((1 - fraction) * angularDistance) / sinAngularDistance
        val B = sin(fraction * angularDistance) / sinAngularDistance
        // Interpolated point in 3D Cartesian coordinates on the unit sphere
        val x = A * cosStartLat * cos(startLon) + B * cosEndLat * cos(endLon)
        val y = A * cosStartLat * sin(startLon) + B * cosEndLat * sin(endLon)
        val z = A * sinStartLat + B * sinEndLat
        // Convert back to latitude and longitude in degrees
        val lat = atan2(z, sqrt(x * x + y * y))
        val lon = atan2(y, x)
        coordinates.add(Coordinate(Math.toDegrees(lon), Math.toDegrees(lat)))
    }
    return factory.createLineString(coordinates.toTypedArray())
}

This convenience function finds a city in usaCities by name and returns its geometry (a point):

fun takeCity(name: String) = usaCities.df.filter { it.name == name }.single().geometry

Use it to get the points for New York and Los Angeles:

val newYork = takeCity("New York")
val losAngeles = takeCity("Los Angeles")

Compute the shortest path between them:

val curveNY_LA = greatCircleLineString(newYork, losAngeles)

Now, let's plot this curve using geoPath, overlaying it on top of the state polygons and highlighting the points corresponding to the cities:

usa48.plot {
    geoMap {
        alpha = 0.5
    }
    geoPath(curveNY_LA) {
        width = 1.5
    }
    geoPoints(listOf(newYork, losAngeles)) {
        size = 8.0
        color = Color.RED
    }
}
usaStatesPlotWithGreatCircle
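
The central angle that greatCircleLineString computes can also be turned into a distance. The following is a minimal sketch using the same spherical law of cosines; the function names and the earthRadiusKm constant are introduced here for illustration and are not part of Kandy-Geo or DataFrame-Geo, and treating the Earth as a sphere is itself an approximation.

```kotlin
import kotlin.math.acos
import kotlin.math.cos
import kotlin.math.sin

// Mean Earth radius in kilometres (spherical approximation, an assumption of this sketch).
val earthRadiusKm = 6371.0088

// Central angle in radians between two points given as
// longitude and latitude in degrees (spherical law of cosines).
fun centralAngle(lon1: Double, lat1: Double, lon2: Double, lat2: Double): Double {
    val phi1 = Math.toRadians(lat1)
    val phi2 = Math.toRadians(lat2)
    val deltaLon = Math.toRadians(lon2 - lon1)
    val cosAngle = cos(phi1) * cos(phi2) * cos(deltaLon) + sin(phi1) * sin(phi2)
    // Clamp to [-1, 1] to guard against rounding errors before calling acos
    return acos(cosAngle.coerceIn(-1.0, 1.0))
}

// Great-circle distance in kilometres: radius times central angle.
fun greatCircleDistanceKm(lon1: Double, lat1: Double, lon2: Double, lat2: Double): Double =
    earthRadiusKm * centralAngle(lon1, lat1, lon2, lat2)
```

With the points from the example, the call would be greatCircleDistanceKm(newYork.x, newYork.y, losAngeles.x, losAngeles.y), assuming both values are JTS Point instances whose x is longitude and y is latitude.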

geoRectangles

geoRectangles() adds a layer of rectangles constructed from Envelope instances. The Envelope class represents a rectangular region in the coordinate space, defined by its minimum and maximum coordinates. It is commonly used for bounding boxes, spatial indexing, and efficient geometric calculations.

Let's get the common bounding box of usa48:

// `.bounds()` function calculates the minimum bounding box
// of all geometries in the `geometry` column of a `GeoDataFrame`,
// returning it as an `Envelope`
val usa48Bounds: Envelope = usa48.bounds().also {
    // Use JTS API for in-place envelope expansion
    it.expandBy(1.0)
}

And plot it with the polygon plot:

usa48.plot {
    geoMap()
    geoRectangles(usa48Bounds) {
        alpha = 0.0
        borderLine {
            width = 2.0
            color = Color.GREY
        }
    }
}
usaStatesPlotWithBounds
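
Conceptually, a minimum bounding box is just the minimum and maximum of the x and y coordinates, and expanding it moves each side outward by a fixed distance. The sketch below shows this on plain coordinates. Pos, Box, boundsOf, and expandedBy are hypothetical names for this illustration; unlike the in-place expandBy used above, expandedBy returns a new box.

```kotlin
// Hypothetical stand-ins; JTS provides its own Coordinate and Envelope classes.
data class Pos(val x: Double, val y: Double)
data class Box(val minX: Double, val minY: Double, val maxX: Double, val maxY: Double)

// Minimum bounding box of a non-empty list of points.
fun boundsOf(points: List<Pos>): Box {
    require(points.isNotEmpty()) { "At least one point is required" }
    return Box(
        minX = points.minOf { it.x },
        minY = points.minOf { it.y },
        maxX = points.maxOf { it.x },
        maxY = points.maxOf { it.y }
    )
}

// Returns a copy of the box with every side moved outward by `distance`.
fun Box.expandedBy(distance: Double): Box =
    Box(minX - distance, minY - distance, maxX + distance, maxY + distance)
```

For the points (1, 5), (3, 2), and (-2, 4), boundsOf returns the box from (-2, 2) to (3, 5), and expandedBy(1.0) widens it to the box from (-3, 1) to (4, 6).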

In addition, geoRectangles also works with polygons and multipolygons. In such cases, the bounding box of each geometry is calculated and used individually:

usa48.plot {
    geoMap()
    geoRectangles()
}
usaStatesPlotWithDefaultBounds

Write GeoDataFrame

A GeoDataFrame can be saved to a file in either GeoJSON or Shapefile format using the writeGeoJson(filename) and writeShapefile(filename) functions.

GeoJSON

Let's save the modified GeoDataFrame containing US cities, which was initially in Shapefile format, to a GeoJSON file.

usaCities.writeGeoJson("usa_cities.geojson")
GeoDataFrame.readGeoJson("usa_cities.geojson").plot { geoPoints() }
writeGeoJson2usaCitiesPlotFromGeoJson

Shapefile

Unlike GeoJSON, Shapefile supports only one type of geometry.

Let's save the GeoDataFrame containing the boundaries of US states, which was initially in GeoJSON format and included both polygons and multipolygons, to a Shapefile. To do this, we will first cast all geometries to MultiPolygon.

// All geometries should be the same type (Shapefile restriction),
// but we have both `Polygon` and `MultiPolygon`.
// Cast them all into MultiPolygons
usa48.modify {
    convert { geometry }.with {
        when (it) {
            // Cast `Polygon` to `MultiPolygon` with a single entity
            is Polygon -> it.toMultiPolygon()
            is MultiPolygon -> it
            else -> error("not a polygonal geometry")
        }
    }
}
    // All files comprising the Shapefile will be saved to
    // a directory named "usa_48" and will have the same base name
    .writeShapefile("usa_48")
GeoDataFrame.readShapefile("usa_48/usa_48.shp").plot { geoMap() }
writeShapefile2usaStatesPlotFromShapefile
Last modified: 17 February 2025