feat: Add DoclingDocument API in new docling-core module #99
HTML test reports are available as workflow artifacts (zipped HTML). • Download: Artifacts for this run |
Fixes gh-98 Signed-off-by: Thomas Vitale <[email protected]>
|
Thanks @ThomasVitale!
I agree. I think we can still make breaking changes for a bit. I'm also going to be OOO the rest of this week, so don't let me be a bottleneck if you are planning to work on this. I trust your judgement - feel free to merge things. |
Signed-off-by: Thomas Vitale <[email protected]>
|
Thanks for the review, @edeandrea! |
Fixes gh-98

I tried to mirror the structure of the Docling project, which uses a dedicated Docling Core package for its data type models: https://github.com/docling-project/docling-core. So I introduced a new `docling-core` module, including some basic documentation. Since we are using the `ai.docling.api.serve` and `ai.docling.client.serve` packages for the other two modules, I went with `ai.docling.api.core` for consistency. But I'm open to other solutions (e.g. `ai.docling.core`, `ai.docling.serve.api` and `ai.docling.serve.client`).

I have added an explicit `DoclingDocument` class following the same approach used for the Serve APIs: Lombok + Jackson 2 and 3 compatibility. This is probably going to be one of the hardest classes to maintain manually, but until we come up with a reliable automated process, I guess we'll have to. I initially thought about making it a bit more generic (e.g. using simple Strings rather than enums), but that would make it much less useful. To make it useful for downstream scenarios (e.g. integrations with LangChain4j or Spring AI for RAG and agent workflows), we need a fully type-safe representation.

Finally, I considered how to switch from `Map` to `DoclingDocument` in a backward-compatible way in the context of `DocumentResponse`, but in the end I thought it wasn't worth the effort, considering we're at the very beginning and the returned Map is of little use without a proper data type. So, I switched `DocumentResponse.getJsonContent()` to return a `DoclingDocument` instead of `Map<String, Object>`.

Thoughts? @edeandrea @lordofthejars
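To illustrate the trade-off described above, here is a minimal sketch contrasting untyped `Map<String, Object>` access with a typed model. It is not the actual `DoclingDocument` class from this PR: `SketchDocument`, `TextItem`, `Label`, and both helper methods are hypothetical, heavily simplified stand-ins, written as plain records (no Lombok or Jackson) so the example compiles on its own. The `texts`, `label`, and `text` keys are assumed for illustration.

```java
import java.util.List;
import java.util.Map;

public class DoclingDocumentSketch {

    // Hypothetical subset of labels; the real schema defines its own set.
    enum Label { TITLE, PARAGRAPH, TABLE }

    // Hypothetical, simplified stand-ins for the typed document model.
    record TextItem(Label label, String text) {}
    record SketchDocument(String name, List<TextItem> texts) {}

    // Untyped access: every step needs a cast and a magic string,
    // and a typo in a key only fails (or silently returns null) at runtime.
    @SuppressWarnings("unchecked")
    static String firstTitleFromMap(Map<String, Object> json) {
        List<Map<String, Object>> texts = (List<Map<String, Object>>) json.get("texts");
        for (Map<String, Object> item : texts) {
            if ("title".equals(item.get("label"))) {
                return (String) item.get("text");
            }
        }
        return null;
    }

    // Typed access: the compiler checks field names and enum constants.
    static String firstTitle(SketchDocument doc) {
        return doc.texts().stream()
                .filter(item -> item.label() == Label.TITLE)
                .map(TextItem::text)
                .findFirst()
                .orElse(null);
    }

    public static void main(String[] args) {
        Map<String, Object> json = Map.of(
                "texts", List.of(Map.of("label", "title", "text", "Hello")));
        SketchDocument doc = new SketchDocument(
                "sample", List.of(new TextItem(Label.TITLE, "Hello")));

        System.out.println(firstTitleFromMap(json));
        System.out.println(firstTitle(doc));
    }
}
```

Both helpers return the same value for equivalent input, but only the typed version lets downstream integrations rely on compile-time checks instead of string keys and casts.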