Skip to content

Add check if required functions and tables for layer are existing #370

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Sep 17, 2021

Conversation

Falke-Design
Copy link
Contributor

@Falke-Design Falke-Design commented Sep 11, 2021

Add support for layers to require external SQL functions and tables.
Fixes openmaptiles/openmaptiles#1220

This PR changes the meaning of the layer / requires field in the <layer>.yaml file:

layer:
  requires:
     # each value can be a string or a list of strings
     layers: 'building'
     tables: ['table1','table2']
     functions: 'fnc1(TEXT, TEXT)'

For backward compatibility, requires can still be a string or a list of strings, in which case that value will be treated as a list of required layers.

Sample SQL that gets generated before all schema sql. PostgreSQL will raise an error if the required table/function is missing.

-- Assert table1 exists
SELECT 'table1'::regclass;

-- Assert fnc1 exists
SELECT 'fnc1(TEXT, TEXT)'::regprocedure;

Copy link
Member

@nyurik nyurik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this! There are a few things I think need changing, but this is a great start.

  • I don't think we should create global functions just to validate something. All SQL here is dynamically generated, so we can write out any validation logic without using functions. For example, SELECT 'table'::regclass; would validate that the table exists and raise an error if it does not. Also, validating functions should include function signature, e.g. myfunc(TEXT) See comment with proposed code change.
  • we need to validate yaml during the initial parsing
  • naming - i'm not sure what we should name these vars. One idea is to call them external_tables -- implying that these tables come from another layer. We already use requires to indicate external layers, so having required_tables might be confusing? Or not.

@Falke-Design
Copy link
Contributor Author

How should validating while initial parsing work? At this moment no tables / functions are generated, so it can't checked if they are existing. The checks can only be made while calling the sql of the layer and all steps before are done (like create table).

What if we make a breaking change and make the structure like that:

required:
  layers: []
  tables: []
  functions: []

@nyurik
Copy link
Member

nyurik commented Sep 12, 2021

By "validate" i meant check that yaml is correct. Layer.__init__ does all yaml validation. I just realized that all properties are also tested this way using validate_properties(). So lets keep the properties, but maybe just name them a little bit different - there should be no get in a property name (properties are like nouns, and functions are like verbs) - requires_tables, requires_functions.

I'm ok to redefine requires to be a dictionary. We can easily make it backwards compatible -- if requires is a string, treat it as a list with a single element. If requires is a list, treat it as a dictionary with a list of layers. See how __init__ parses requires.

@Falke-Design Falke-Design requested a review from nyurik September 13, 2021 15:02
Copy link
Member

@nyurik nyurik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I added a few more comments. Overall, I think it will be simpler to create a helper function that tests and normalizes just one property, and call it from the __init__:

requires = self.definition['layer'].get('requires', {})
if not isinstance(requires, dict):
    requires['layers'] = requires
else:
    requires = requires.copy()  # dict will be modified to detect unrecognized properties

err = "If set, 'requires' parameter must be a map with optional 'layers', 'tables', and 'functions' sub-elements. Each sub-element must be a string or a list of strings. If 'requires' is a list or a string itself, it is treated as a list of layers. "

self.requires_layers = get_requires_prop(requires, 'layers', err + "'requires.layers' must be an ID of another layer, or a list of layer IDs.")
self.requires_tables = get_requires_prop(requires, 'tables', err + "'requires.tables' must be the name of a PostgreSQL table or a view, or a list of tables/views")
self.requires_functions = get_requires_prop(requires, 'functions', err + "'requires.functions' must be a PostgreSQL function name with parameters or a list of functions. Example: 'myfunc(TEXT, TEXT)' ")

if requires:
  # get_requires_prop will delete the key it handled. Remaining keys are errors.
  raise ValueError("Unrecognized sub-elements in the 'requires' parameter:" + str(list(requires.keys()))

The get_requires_prop would do these steps:

  • normalize string -> list
  • verify all values are non-empty
  • raise an error with the given error message if needed
  • delete property from the dictionary
  • return normalized value

In the future it would be possible to add a regex as a parameter to validate the actual values

@Falke-Design
Copy link
Contributor Author

I will change it.
Do you now want that the layer.yaml looks like that:

requires:
  layers: []
  tables: []
  functions: []

or that?:

requires: deprecated warning and internal mapping to requires_layers
requires_layers: []
requires_tables: []
requires_functions: []

What I not really understand is why there are @propertys if we don't define all possible properties? Like you says requires_* should now checked in __init__ but then they are not defined as property anymore?

@nyurik
Copy link
Member

nyurik commented Sep 14, 2021

sorry for confusion - I think it is cleaner to have YAML as a subtree -- makes it more readable and easier to change:

requires:
  layers: []
  tables: []
  functions: []

but in python, it is better to have dedicated class fields for each one -- fields are easier to work with and more predictable than having a generic dictionary - hence requires_layers, ...

There is little difference between a @property and a class field from the usage perspective -- both are used with a dot -- layer.some_property. I think in this specific case a field is better than a property because you want to validate that requires has only known sub-elements, so might as well do it all at once, but you are welcome to do them as properties too.

@Falke-Design
Copy link
Contributor Author

I updated now the code and the wanted chages. Thanks for the review, it is now much cleaner!

In my eyes I would change one thing. You suggested to move the requires parsing into the __init__ I think it would be better to do this in a own function too. Because now we have this code twice, one time for tileset.py and the other one in the test_sql.py.
With a own function it would be possible to call it from test_sql.py.

Also I tried to add tests if it throws Exception with wrong inputs but it was not possible for me.

@nyurik
Copy link
Member

nyurik commented Sep 14, 2021

why do you need to duplicate code in a test?

@Falke-Design
Copy link
Contributor Author

In the test_sql.py the function query needs the parsed requires fields to be added to the Case. Currently the only way to parse requires is to copy the __init__ logic, because there are tests with the old, new, combined requires field.

It would be cleaner to get the parsed data from the tileset-layer and add it to the Case but i have not found a way. (Maybe the id can be used to find the correct layer in the tileset. But now I go to bed)

@nyurik
Copy link
Member

nyurik commented Sep 15, 2021

ok, thx, i see the issue. I think the simplest solution would be to refactor the test file a bit to cover this usecase - I can do it myself tomorrow if its ok with you. The rest of the code looks good except for one thing -- please add a property requires:

    @property
    @deprecated(version='5.4.0', reason='use requires_layers property instead')
    def requires(self) -> List[str]:
        return self.requires_layers

@Falke-Design Falke-Design requested a review from nyurik September 15, 2021 16:26
@nyurik
Copy link
Member

nyurik commented Sep 15, 2021

I reworked your tests a bit, please let me know if you are ok with the changes, and I can merge things once all tests pass (optionally we might want another person to review)

@nyurik nyurik requested a review from TomPohys September 15, 2021 21:35
@nyurik
Copy link
Member

nyurik commented Sep 16, 2021

nope, found another bug (fixed) -- to_sql gets called for each schema (SQL file) in a layer. We don't want to assert for each schema, we just want to do it once per layer.

self._test("a17", [c11], dict(c11=[c11]))
self._test("a18", [c12], dict(c12=[c12]))

def _parse_reqs(self, reqs, expected_layers, expected_tables, expected_funcs):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be _parse_props not better instead _parse_reqs because you test here all properties not only requires

@Falke-Design
Copy link
Contributor Author

@nyurik Nice work! Thank you for your help

@nyurik
Copy link
Member

nyurik commented Sep 16, 2021

@Falke-Design thx, updated. Could you update the first comment to describe the current design? I will copy that comment into the squash commit message when ready.

@Falke-Design
Copy link
Contributor Author

@nyurik Done. Is this ok so? Else can you please change it?

@TomPohys
Copy link
Member

Thank you both for working on this! It looks great!

@TomPohys TomPohys merged commit c0353a7 into openmaptiles:master Sep 17, 2021
TomPohys pushed a commit to openmaptiles/openmaptiles that referenced this pull request Sep 21, 2021
Because of the parallel processing of `import-sql` the error message is printed in the middle of the output. With this PR the error is displayed again at the end:
![grafik](https://user-images.githubusercontent.com/19800037/133897373-656ed3d0-e580-4f67-9290-0e83949d63d7.png)

Issue: openmaptiles/openmaptiles-tools#370

```
awk '1{print; fflush()} $$0~".*ERROR" {txt=$$0} END{ if(txt){print "\n*** ERROR detected, aborting:"; print txt; exit(1)} }'
```
Explanation:
- `1{print; fflush()}` means if true then print output -> `1{}` same as `if(true){}`
- `$$0~".*ERROR" {txt=$$0}` get first argument `$$0` (line of output) and check if it contains (regex string) `ERROR`. If true save line to var `txt`: `{txt=$$0}`
- `END{ ... }` if last line of output is reached
- `if(txt){print "\n*** ERROR detected, aborting:"; print txt; exit(1)}` if error was found in a line / var `txt` is existing print it out
TomPohys pushed a commit to openmaptiles/openmaptiles that referenced this pull request Nov 5, 2021
**NOTE** this can only be merged after the next tools version is released.

Added required Postgres tables to the `<layer>.yaml` definition.
Close: #1220

PR of tools: openmaptiles/openmaptiles-tools#370
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Raise a helpful error if Natural Earth data is not imported
3 participants