Filter invalid Singer Messages in the worker #236

Merged
merged 3 commits into from
Sep 12, 2020

Conversation

cgardens
Contributor

What

  • As a compromise in PR #229 (Add postgres tap bugfixes & integration tests), @sherifnada turned on additionalProperties in the Singer protocol models. Due to some idiosyncrasies in Json2Pojo, this means effectively nothing gets validated by its parser anymore. We want to retain validation that the fields we care about still look right.

How

  • This PR reuses the validator that we use in the API to run these validations.
  • It wraps the validator in a predicate, for testing purposes, and that predicate is used in the tap stream.
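
A minimal sketch of the "validator wrapped in a predicate" idea, for readers who want to see the shape of it. This is not the code from this PR: `MessageValidator`, `TYPE_REQUIRED`, `asPredicate`, and `keepValid` are hypothetical names, and the toy validator only checks a `type` field instead of running a real JSON schema validation.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class MessageFilterSketch {

  // Hypothetical stand-in for the schema validator reused from the API.
  public interface MessageValidator {
    // Returns human-readable errors; an empty list means the message is valid.
    List<String> validate(Map<String, Object> message);
  }

  // Toy validator (assumption): a message is valid if "type" is a non-empty string.
  public static final MessageValidator TYPE_REQUIRED = message -> {
    Object type = message.get("type");
    if (type instanceof String && !((String) type).isEmpty()) {
      return List.of();
    }
    return List.of("missing or non-string field: type");
  };

  // Wraps a validator in a Predicate so a stream can filter on it,
  // and so tests can substitute a trivial predicate for the real validator.
  public static Predicate<Map<String, Object>> asPredicate(MessageValidator validator) {
    return message -> validator.validate(message).isEmpty();
  }

  // Keeps only the messages the predicate accepts, preserving order.
  public static List<Map<String, Object>> keepValid(
      List<Map<String, Object>> messages, Predicate<Map<String, Object>> isValid) {
    return messages.stream().filter(isValid).collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<Map<String, Object>> messages = List.of(
        Map.of("type", "RECORD", "stream", "users"),
        Map.of("stream", "users")); // no "type", so it is filtered out
    System.out.println(keepValid(messages, asPredicate(TYPE_REQUIRED)).size());
  }
}
```

Because the stream only depends on a `Predicate`, a test can pass `message -> true` or `message -> false` without constructing a validator at all.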

Gotcha

  • There's a really awful hack having to do with Java resources; see the inline comment for more details.

@@ -7,6 +7,7 @@ dependencies {
}

jsonSchema2Pojo {
source = files("${sourceSets.main.output.resourcesDir}/singer_json")
Contributor Author

@cgardens cgardens Sep 11, 2020

Hack: The problem here is that if this path is just json, then when we execute code in dataline-workers, the invocation of prepareSchemas that runs in SingerConfigSchema only finds the resources from dataline-config:models, NOT dataline-singer. It seems that because they store their configs in the same relative path in their resources, one clobbers the other.

This seems to be an idiosyncrasy of Java resources that I don't fully understand. This unblocks us for now. I am going to read more carefully about Java resources and hopefully come up with a less spooky solution. In the meantime, it seems like our config-style packages must not store their JSON configs in paths with the same names.
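
One plausible reading of the clobbering, not confirmed in this thread: when two classpath roots contain a resource at the same relative path, `ClassLoader.getResource` returns only the first match, while `getResources` enumerates all of them. The sketch below reproduces that with two temporary directories standing in for the two modules; the directory names, the `json/Schema.json` path, and the file contents are made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;

public class ResourceCollisionSketch {

  // Creates a temp directory acting as one module's resources root,
  // containing json/Schema.json with the given content.
  public static URL moduleRoot(String name, String content) throws IOException {
    Path root = Files.createTempDirectory(name);
    Path dir = Files.createDirectories(root.resolve("json"));
    Files.write(dir.resolve("Schema.json"), content.getBytes(StandardCharsets.UTF_8));
    return root.toUri().toURL(); // ends with "/" because the directory exists
  }

  // Reads the single resource that getResource resolves for the path.
  public static String firstMatch(ClassLoader loader, String path) throws IOException {
    try (InputStream in = loader.getResource(path).openStream()) {
      return new String(in.readAllBytes(), StandardCharsets.UTF_8);
    }
  }

  // Counts every classpath entry that provides the path.
  public static int matchCount(ClassLoader loader, String path) throws IOException {
    return Collections.list(loader.getResources(path)).size();
  }

  public static void main(String[] args) throws IOException {
    URL config = moduleRoot("config-models", "{\"owner\":\"config\"}");
    URL singer = moduleRoot("singer", "{\"owner\":\"singer\"}");
    // Parent is null so only the two roots above (plus the bootstrap loader) are searched.
    try (URLClassLoader loader = new URLClassLoader(new URL[] {config, singer}, null)) {
      System.out.println(firstMatch(loader, "json/Schema.json"));
      System.out.println(matchCount(loader, "json/Schema.json"));
    }
  }
}
```

With the roots in this order, the single lookup sees only the first root's copy even though both roots provide the file, which is consistent with the symptom described above. Giving each module a distinct directory name, as the diff does with singer_json, avoids the collision.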

@cgardens cgardens merged commit c516ef2 into master Sep 12, 2020
@swyxio swyxio deleted the cgardens/validate_messages2 branch October 11, 2022 15:53
yasir1brahim pushed a commit to yasir1brahim/airbyte that referenced this pull request Jul 27, 2023