@ThomasVitale ThomasVitale commented Nov 24, 2025

Fixes gh-98


I tried to mirror the structure of the Docling project, which uses a dedicated Docling Core package for its data type models: https://github.com/docling-project/docling-core. So I introduced a new docling-core module, including some basic documentation. Since we are using the ai.docling.api.serve and ai.docling.client.serve packages for the other two modules, I went with ai.docling.api.core for consistency. But I'm open to other solutions (e.g. ai.docling.core, ai.docling.serve.api and ai.docling.serve.client).

I have added an explicit DoclingDocument class following the same approach used for the Serve APIs: Lombok + Jackson 2 and 3 compatibility. This is probably going to be one of the hardest classes to maintain manually, but until we come up with a reliable automated process, I guess we'll have to. I initially thought about making it a bit more generic (e.g. using plain Strings rather than enums), but that would make it much less useful. To make it useful for downstream scenarios (e.g. integrations with LangChain4j or Spring AI for RAG and agent workflows), we need a fully type-safe representation.
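To illustrate the enum-versus-String trade-off, here is a minimal sketch in plain Java. The `ItemLabel` enum and `TextItem` record are hypothetical stand-ins, not the types added in this PR, and the sketch leaves out Lombok and Jackson entirely. It only shows why a closed set of labels helps downstream code: an exhaustive `switch` stops compiling when a label is added but not handled, whereas a String comparison would silently fall through.

```java
import java.util.List;

public class LabelSketch {

    // Hypothetical label set; the real DoclingDocument defines its own values.
    enum ItemLabel { TITLE, SECTION_HEADER, PARAGRAPH }

    // Hypothetical text item, used only for this illustration.
    record TextItem(ItemLabel label, String text) {}

    // No default branch: the compiler requires every ItemLabel to be covered,
    // so adding a new constant forces this method to be updated.
    static String toMarkdown(TextItem item) {
        return switch (item.label()) {
            case TITLE -> "# " + item.text();
            case SECTION_HEADER -> "## " + item.text();
            case PARAGRAPH -> item.text();
        };
    }

    public static void main(String[] args) {
        List<TextItem> items = List.of(
                new TextItem(ItemLabel.TITLE, "Intro"),
                new TextItem(ItemLabel.PARAGRAPH, "Some body text."));
        // Render each item according to its label.
        items.forEach(item -> System.out.println(toMarkdown(item)));
    }
}
```

With String labels, the same method would need a `default` branch and a typo such as "titel" would only show up at runtime.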

Finally, I considered how to switch from Map to DoclingDocument in a backward-compatible way in the context of DocumentResponse, but in the end I thought it wasn't worth the effort, considering we're at the very beginning and using the returned Map is pointless without a proper data type. So, I switched DocumentResponse.getJsonContent() to return a DoclingDocument instead of Map<String, Object>.
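For a rough idea of what this change means for callers, the sketch below contrasts reading the same value from a `Map<String, Object>` and from a typed object. The `Document` and `TextItem` records and the `name`/`texts`/`text` fields are assumptions made for illustration only. They are not the actual DoclingDocument or DocumentResponse API from this PR.

```java
import java.util.List;
import java.util.Map;

public class JsonContentSketch {

    // Hypothetical typed model; field names are assumptions for this example.
    record TextItem(String text) {}
    record Document(String name, List<TextItem> texts) {}

    // Sample payload shaped like a generic JSON map.
    static Map<String, Object> sampleJson() {
        return Map.of(
                "name", "report",
                "texts", List.of(Map.of("text", "Hello")));
    }

    // The same data expressed with the typed model.
    static Document sampleDocument() {
        return new Document("report", List.of(new TextItem("Hello")));
    }

    // Map-based access: keys are magic strings and every step needs a cast,
    // so mistakes only surface at runtime.
    @SuppressWarnings("unchecked")
    static String firstTextFromMap(Map<String, Object> json) {
        List<Map<String, Object>> texts = (List<Map<String, Object>>) json.get("texts");
        return (String) texts.get(0).get("text");
    }

    // Typed access: the compiler checks field names and types.
    static String firstTextFromTyped(Document doc) {
        return doc.texts().get(0).text();
    }

    public static void main(String[] args) {
        System.out.println(firstTextFromMap(sampleJson()));
        System.out.println(firstTextFromTyped(sampleDocument()));
    }
}
```

Existing callers that cast values out of the returned Map would need to move to accessor calls like the second method, which is the breaking part of the change.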

Thoughts? @edeandrea @lordofthejars

@ThomasVitale ThomasVitale marked this pull request as draft November 24, 2025 20:59
github-actions bot commented Nov 24, 2025

Gradle Test Results (all modules & JDKs): 336 ran, 336 passed ✅, 0 skipped, 0 failed.
No test annotations available.

@github-actions

HTML test reports are available as workflow artifacts (zipped HTML).

• Download: Artifacts for this run

@edeandrea
Contributor

Thanks @ThomasVitale!

> Finally, I considered how to switch from Map to DoclingDocument in a backward compatible way in the context of DocumentResponse, but in the end I thought it wasn't worth the effort considering we're at the very beginning and using the returned Map is pointless without a proper data type. So, I switched DocumentResponse.getJsonContent() to return a DoclingDocument instead of Map<String, Object>.

I agree. I think we can still make breaking changes for a bit.

I'm also going to be OOO the rest of this week, so don't let me be a bottleneck if you are planning to work on this. I trust your judgement - feel free to merge things.

Signed-off-by: Thomas Vitale <[email protected]>
@ThomasVitale
Contributor Author

Thanks for the review, @edeandrea!

@ThomasVitale ThomasVitale marked this pull request as ready for review November 24, 2025 22:09
@edeandrea edeandrea merged commit f874deb into main Nov 25, 2025
21 checks passed
@edeandrea edeandrea deleted the gh-98 branch November 25, 2025 01:38

Linked issue: Add DoclingDocument core API
