Skip to content

Add a utility to sync address formats from OpenCage. Update address formats. #37

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 13 commits into
base: main
Choose a base branch
from

Conversation

b-gran
Copy link

@b-gran b-gran commented May 2, 2025

Per discussion in #25 (comment), this PR adds a utility to keep the address formats in sync with OpenCage.

I couldn't find a straightforward way to convert the templates from mustache to jinja with regex or similar. Parsing the mustache seemed like the only way to convert.

The rigour address templates are missing the hamlet and place address fields. I don't know if that is by design. The sync script supports dropping those fields.

It seems like there have been quite a few changes to the OpenCage templates since they were last synced.

@pudo
Copy link
Member

pudo commented May 4, 2025

Did you just PR what is essentially a full AST parser for mustache, just for fun? Damn, I love open source development sometimes...

If I may, I'll leave some review comments now but I'm happy to work these in, too - definitely going to merge this.

Copy link
Member

@pudo pudo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok the parser is fun to read ... I'd vaguely gotten to "this is probably a more complex problem than a regex" in my head, but didn't anticipate that parsing it fully was a possibility. Nice. We should make sure the OpenCage people know about this.

Comment on lines 367 to 371

OPENCAGE_FORMAT_YAML_TEMPLATE = 'https://raw.githubusercontent.com/OpenCageData/address-formatting/{hash}/conf/countries/worldwide.yaml'
OPENCAGE_REF = 'refs/heads/master'
FORMATS_DEST_PATH = CODE_PATH / "addresses" / "formats.yml"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move this to the top?

Comment on lines 383 to 388
commit_hash = subprocess.check_output(
["git", "ls-remote", "--heads", "https://github.com/OpenCageData/address-formatting.git", OPENCAGE_REF]
).decode("utf-8").split()[0]

github_raw_url = OPENCAGE_FORMAT_YAML_TEMPLATE.format(hash=commit_hash)
opencage_yaml_text = requests.get(github_raw_url).text
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This almost feels a bit overengineered (seeing subprocess imported at the top made me tingly): do we need to seek the hash, or can we just fetch https://raw.githubusercontent.com/OpenCageData/address-formatting/refs/heads/master/conf/countries/worldwide.yaml (even in the Makefile)?

Comment on lines 12 to 13
# these don't exist in the rigour yaml -- by design?
DROP_VARS = {"hamlet", "place"}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't recall nixing them, but that may be my brain. In general there's a light discrepancy between the vocab in the YAML and what we do in our code, and the idea has been to expose more parameters as we see them in the wild.

@b-gran
Copy link
Author

b-gran commented May 4, 2025

I got rid of the subprocess stuff and moved the fetching to the Makefile, following the pattern of the rest of genscripts.

@pudo
Copy link
Member

pudo commented May 4, 2025

Amazing, Bill! I'll play with a bit more tomorrow and then get it into the next release :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants