-
Notifications
You must be signed in to change notification settings - Fork 3.2k
fix(ingest/rest): out-of-date structured report being sent #13866
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov ReportAttention: Patch coverage is ❌ Unsupported file formatUpload processing failed due to unsupported file format. Please review the parser error message:
For more help, visit our troubleshooting guide.
📢 Thoughts on this report? Let us know! |
f603f8f
to
4fe28bd
Compare
@@ -805,6 +805,7 @@ | |||
"datahub-gc = datahub.ingestion.source.gc.datahub_gc:DataHubGcSource", | |||
"datahub-debug = datahub.ingestion.source.debug.datahub_debug:DataHubDebugSource", | |||
"datahub-apply = datahub.ingestion.source.apply.datahub_apply:DataHubApplySource", | |||
"datahub-mock-data = datahub.ingestion.source.mock_data.datahub_mock_data:DataHubMockDataSource", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having mechanisms to generate mock data is definitely valuable.
What concerns me is coupling that logic directly with an ingestion source.
Would it make more sense to have a general-purpose DataHub source that ingests arbitrary MCE file(s)? That feels more flexible and reusable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Ingesting arbitrary MCE can already be done using
datahub ingest mcps FILE_NAME
- There is already a datahub source which is for migration.
Using the ingestion framework for this allows us to test for various bugs in here and iterate on ingestion framework items which isn't really possible otherwise. This is a separate isolated source which is clearly marked as being there for testing. Unless there are very strong opinions would request this be allowed in.
Co-authored-by: Sergio Gómez Villamor <sgomezvillamor@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, thats what I meant, that we should have something as generic as
datahub ingest mcps FILE_NAME` in the ingestion framework.
While I think that would be a more generic approach, I'm approving to unblock.
Co-authored-by: Sergio Gómez Villamor <sgomezvillamor@gmail.com>
…roject#13866) Co-authored-by: Sergio Gómez Villamor <sgomezvillamor@gmail.com>
Fix a bug which was causing structured report to have outdated information. Fix is explained here https://www.loom.com/share/122e6a79cd574f7baae3003a982bd378 As this is a multi-threading bug in the underlying structures being used not sure how to add tests for this.
metadata-ingestion/src/datahub/ingestion/api/sink.py
andmetadata-ingestion/src/datahub/ingestion/sink/datahub_rest.py
which is quite small.Attempt to add an initial
CLAUDE.MD
so AI agents can work with the repository effectivelyAdd
datahub-mock-data
source to be able to produce mock data which helped in reproducing this bug reliably. This will be used later on for more testing of ingestion sources. Heavily unit tested