-
Notifications
You must be signed in to change notification settings - Fork 8
Add a utility to sync address formats from OpenCage. Update address formats. #37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…args is non-empty
Did you just PR what is essentially a full AST parser for mustache, just for fun? Damn, I love open source development sometimes... If I may, I'll leave some review comments now but I'm happy to work these in, too - definitely going to merge this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok the parser is fun to read ... I'd vaguely gotten to "this is probably a more complex problem than a regex" in my head, but didn't anticipate that parsing it fully was a possibility. Nice. We should make sure the OpenCage people know about this.
|
||
OPENCAGE_FORMAT_YAML_TEMPLATE = 'https://raw.githubusercontent.com/OpenCageData/address-formatting/{hash}/conf/countries/worldwide.yaml' | ||
OPENCAGE_REF = 'refs/heads/master' | ||
FORMATS_DEST_PATH = CODE_PATH / "addresses" / "formats.yml" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's move this to the top?
commit_hash = subprocess.check_output( | ||
["git", "ls-remote", "--heads", "https://github.com/OpenCageData/address-formatting.git", OPENCAGE_REF] | ||
).decode("utf-8").split()[0] | ||
|
||
github_raw_url = OPENCAGE_FORMAT_YAML_TEMPLATE.format(hash=commit_hash) | ||
opencage_yaml_text = requests.get(github_raw_url).text |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This almost feels a bit overengineered (seeing subprocess
imported at the top made me tingly): do we need to seek the hash, or can we just fetch https://raw.githubusercontent.com/OpenCageData/address-formatting/refs/heads/master/conf/countries/worldwide.yaml
(even in the Makefile)?
# these don't exist in the rigour yaml -- by design? | ||
DROP_VARS = {"hamlet", "place"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't recall nixing them, but that may be my brain. In general there's a light discrepancy between the vocab in the YAML and what we do in our code, and the idea has been to expose more parameters as we see them in the wild.
I got rid of the |
Amazing, Bill! I'll play with a bit more tomorrow and then get it into the next release :) |
Per discussion in #25 (comment), this PR adds a utility to keep the address formats in sync with OpenCage.
I couldn't find a straightforward way to convert the templates from mustache to jinja with regex or similar. Parsing the mustache seemed like the only way to convert.
The
rigour
address templates are missing thehamlet
andplace
address fields. I don't know if that is by design. The sync script supports dropping those fields.It seems like there have been quite a few changes to the OpenCage templates since they were last synced.