Switching OSM Carto to use the osm2pgsql flex output #4977

Closed
@joto

Osm2pgsql has been moving away from the old "pgsql" output for years now. The new "flex" output can do everything the old code can do and much, much more. All new development happens there; the old code will not get any new features. The OSM Carto project is the last major user of the "pgsql" output.

We want to get rid of the "pgsql" output in osm2pgsql at some point, which allows us to simplify osm2pgsql internally. This will not happen tomorrow, we'll leave plenty of time for OSM Carto and other users to switch. But we have to get started on moving installations over to the flex output.

Advantages of the switch include:

  • Potential for more flexible OSM Carto setup. Of course the OSM Carto project can decide whether they want to make use of those features.
  • Potential for OSM Carto derived styles to use new features even if OSM Carto itself doesn't use them.
  • Allows bringing OSM Carto, Nominatim and other data layouts (for instance for vector tiles) into the same database.

Instead of the openstreetmap-carto.style and openstreetmap-carto.lua config files there is now a single config file openstreetmap-carto-flex.lua. The command line for osm2pgsql will change to use the flex output and the new config file. Everything else should be pretty much the same. The database layout is 100% compatible. No changes to the styles or SQL queries are needed.

Updates are totally seamless. You can keep an existing database created with the pgsql output and keep updating it with the new flex-based configuration.

The two versions of the config files can be used side-by-side for a while if that's what OSM Carto maintainers want. The documentation can explain both options. Or we can switch over at some point.

Osm2pgsql version needed

You need at least version 1.8.0 of osm2pgsql, which is available in Debian Stable; Ubuntu 24.04 has version 1.11.0.
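A config file could guard against older versions itself. The following is a minimal sketch, not part of the proposed config: it assumes the `osm2pgsql.version` string available to flex configs starts with `major.minor`, and the helper name `version_at_least` is made up for illustration.

```lua
-- Hypothetical helper: true if a version string such as '1.11.0'
-- is at least major.minor.
local function version_at_least(version, major, minor)
    local maj, min = string.match(version, '^(%d+)%.(%d+)')
    if not maj then
        return false -- unexpected format, treat as too old
    end
    maj, min = tonumber(maj), tonumber(min)
    return maj > major or (maj == major and min >= minor)
end

-- Inside osm2pgsql the global 'osm2pgsql' table exists; the first check
-- only keeps the snippet loadable in a plain Lua interpreter.
if osm2pgsql and not version_at_least(osm2pgsql.version, 1, 8) then
    error('This config needs osm2pgsql 1.8.0 or newer, found ' .. osm2pgsql.version)
end
```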

Command line

The command line used will change. Only the output type (-O flex) and the config file have to be set.

Old command line (from INSTALL.md):

osm2pgsql -G --hstore --style openstreetmap-carto.style --tag-transform-script openstreetmap-carto.lua -d gis ~/path/to/data.osm.pbf

New command line:

osm2pgsql -O flex --style openstreetmap-carto-flex.lua -d gis ~/path/to/data.osm.pbf

Changes in database layout

The database layout changes very little. The id columns (osm_id) and geometry columns (way) on all tables will get the NOT NULL flag when using the flex output. These have always been NOT NULL in practice anyway, so this isn't a problem.

Indexes

Currently several custom indexes have to be generated after import, see the indexes.yml and indexes.sql files.

The flex output can be configured to create those indexes. This means we can get rid of some more of the config files and the scripts/indexes.py script. If osm2pgsql is configured to create those indexes it will do so after the import is finished, running several CREATE INDEX commands concurrently (how many depends on command line options).
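As a rough sketch of what this could look like, here is a table definition with an `indexes` list. It is not taken from openstreetmap-carto-flex.lua: only two columns are shown, and the `way_area` threshold in the partial index is illustrative rather than copied from indexes.yml.

```lua
-- Sketch only: a cut-down polygon table with indexes built by osm2pgsql.
local polygon_table = osm2pgsql.define_table({
    name = 'planet_osm_polygon',
    ids = { type = 'area', id_column = 'osm_id' },
    columns = {
        { column = 'way_area', type = 'real' },
        { column = 'way', type = 'geometry', projection = 3857, not_null = true },
    },
    -- When 'indexes' is set, osm2pgsql creates the indexes listed here
    -- instead of its default geometry index, so the main geometry index
    -- is listed explicitly.
    indexes = {
        { column = 'way', method = 'gist' },
        -- Partial index for low zoom levels; the threshold is illustrative.
        { column = 'way', method = 'gist', where = 'way_area > 23300' },
    },
})
```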

Open issues:

  • Indexes currently cannot be named in the flex config, so PostgreSQL will give them generic names (something like planet_osm_polygon_way_idx3 instead of planet_osm_polygon_way_area_z10). A change for osm2pgsql to allow setting the name is being worked on.
  • Difference from the pgsql output if manual indexes are set: the fillfactor on the "main" geometry index is not set any more. For some background see "Making indexes more flexible" (osm2pgsql-dev/osm2pgsql#1780).

Question: Do we want to keep the old way of generating indexes or let osm2pgsql handle them? We can also make this optional in some way, with a flag in the config file that triggers creation of the indexes.
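One way such a flag could work, purely as a sketch: the environment variable name `OSM_CARTO_EXTRA_INDEXES` and the helper `polygon_indexes` are hypothetical, and the partial-index condition is illustrative.

```lua
-- Hypothetical switch: set OSM_CARTO_EXTRA_INDEXES=1 in the environment
-- to have osm2pgsql create the extra indexes after import.
local create_extra_indexes = (os.getenv('OSM_CARTO_EXTRA_INDEXES') == '1')

-- Build the index list for the polygon table depending on the switch.
local function polygon_indexes()
    -- The main geometry index is always wanted.
    local indexes = { { column = 'way', method = 'gist' } }
    if create_extra_indexes then
        indexes[#indexes + 1] =
            { column = 'way', method = 'gist', where = 'way_area > 23300' }
    end
    return indexes
end
```

The result would then be passed as `indexes = polygon_indexes()` in the table definition. Users who leave the switch off could keep running scripts/indexes.py as they do today.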

Changes in database content

The content of the resulting tables looks the same as before. The only exception is that in some cases rounding for the way_area column is different, so you'll get slightly different values. This should not affect the use in any major way.

Tags named z_order are handled slightly differently, but those tags are bogus anyway and this should not have any effect. (I removed all z_order tags from the planet a few days ago anyway...)

The old setup would allow objects with a layer tag and either no other tags or only tags that are ignored (such as fixme) to show up as database entries with all columns NULL or empty. This is no longer the case.

I have verified that the resulting database is the same by running both old and new configurations side by side on all of the planet data, and I have not seen any differences beyond those described above.

Setting layer column

Most tags are used "as is" in their respective database columns. An exception is the layer column, which is an integer column. It gets some special treatment in the Lua code. The current code does the same as before, but it doesn't have to.

It would be a small change to use layer 0 instead of NULL when the layer is not set. This would allow the SQL queries to be simplified a little bit: We don't need COALESCE(layer,0) any more which is used in several places.

We'd probably want to keep the SQL code as it is for now, so users are not forced to re-import.
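To make the two options concrete, here is a minimal sketch of the layer handling. It is modelled on the check in the existing openstreetmap-carto.lua (integer syntax and 32-bit range), but the function name `parse_layer` and the `default` parameter are made up for illustration, and mapping invalid values to the default as well is an assumption.

```lua
-- Sketch: convert a raw 'layer' tag value into an integer.
-- Returns 'default' if the tag is missing, not an integer,
-- or outside the range of a 32-bit integer column.
local function parse_layer(value, default)
    if value and string.find(value, '^-?%d+$') then
        local n = tonumber(value)
        if n >= -2^31 and n < 2^31 then
            return n
        end
    end
    return default
end

-- Current behaviour: a missing layer ends up as NULL in the database.
local layer_now = parse_layer(nil, nil)

-- Possible alternative: a missing layer ends up as 0.
local layer_alt = parse_layer(nil, 0)
```

With the second variant the column would never be NULL for newly imported data, which is what would make COALESCE(layer,0) in the SQL queries redundant.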

Themepark support

Themepark is a framework for writing osm2pgsql Lua configs. It allows mixing several configurations so that one database can support several different table layouts and use cases at the same time.

The OSM Carto configuration is written in a way that it can be used with or without the Themepark framework. Using it without the framework is just as easy as with the pgsql output before: you just specify the Lua config file on the command line as described above.

If you want to use it with the framework the setup is slightly more involved, but you have the advantage that you can then have tables of different layout in the same database.

Performance

From my measurements performance is about 20% to 25% better than before. I have measured this by importing various planet extracts without the --slim option and without creating all the extra indexes. Because index creation takes a lot of time, numbers will not be as good with --slim and the indexes.

Open Question: Derived styles

Some styles are derived from OSM Carto, such as OSM Carto Germany. How are these affected? What can we do to make life easier for these kinds of styles?

@giggls @hholzgra

History

The changes proposed here are based on the efforts started by @pnorman in #4112 (see also the PR #4431). Those efforts have stalled since. One reason, I believe, was that those efforts switched not only from the "pgsql" to the "flex" output, but contained also other changes. That's why this change goes to quite some lengths to keep everything as compatible as possible.

Thank you, Paul, for starting this effort so many years ago. I used your code as a starting point, but there are a lot of changes due to my more limited goal, changes in osm2pgsql since then, and some performance improvements.
