Fix getTextBetweenParagraphs to check for invalid page numbers in outline items
Fix ParagraphPdfDocumentReader to reliably extract text from PDFs with imperfect outlines and coordinate edge cases
Test: add a test verifying that ParagraphPdfDocumentReader skips invalid outline items
Auto-cherry-pick to 1.0.x
Fixes #3421
Signed-off-by: WOONBE <[email protected]>
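The commit message describes guarding against outline items whose page numbers are invalid. The actual change is not visible in this excerpt, so the following is only a minimal sketch of that kind of guard, under stated assumptions: `OutlineGuard`, `OutlineItem`, `hasValidPage`, and `keepValid` are hypothetical names that do not come from ParagraphPdfDocumentReader, and page numbers are assumed to be 1-based with values below 1 meaning "unresolved".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these names come from ParagraphPdfDocumentReader.
final class OutlineGuard {

    // Hypothetical stand-in for a PDF outline entry.
    // Assumption: pageNumber is 1-based; a value below 1 means the
    // entry's target page could not be resolved.
    static final class OutlineItem {
        final String title;
        final int pageNumber;

        OutlineItem(String title, int pageNumber) {
            this.title = title;
            this.pageNumber = pageNumber;
        }
    }

    // An item is usable only if its page lies inside the document.
    static boolean hasValidPage(OutlineItem item, int pageCount) {
        return item.pageNumber >= 1 && item.pageNumber <= pageCount;
    }

    // Returns the items that point at a real page, preserving their order.
    static List<OutlineItem> keepValid(List<OutlineItem> items, int pageCount) {
        List<OutlineItem> valid = new ArrayList<>();
        for (OutlineItem item : items) {
            if (hasValidPage(item, pageCount)) {
                valid.add(item);
            }
        }
        return valid;
    }

    public static void main(String[] args) {
        List<OutlineItem> outline = List.of(
                new OutlineItem("Intro", 1),
                new OutlineItem("Broken link", -1),
                new OutlineItem("Past the end", 99),
                new OutlineItem("Summary", 3));

        // Assume a three-page document for this demonstration.
        for (OutlineItem item : keepValid(outline, 3)) {
            System.out.println(item.title + " -> page " + item.pageNumber);
        }
    }
}
```

Skipping unusable entries, rather than throwing, matches the behavior the test description names ("skips invalid outline items"). Whether the real reader filters up front or checks at each lookup is not shown here.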
document-readers/pdf-reader/src/main/java/org/springframework/ai/reader/pdf/ParagraphPdfDocumentReader.java
Lines changed: 46 additions & 48 deletions
@@ -1,5 +1,5 @@
 /*
- * Copyright 2023-2024 the original author or authors.
+ * Copyright 2023-2025 the original author or authors.
  *
  * Licensed under the Apache License, Version 2.0 (the "License");
  * you may not use this file except in compliance with the License.
@@ -46,6 +46,7 @@
  * The paragraphs are grouped into {@link Document} objects.
document-readers/pdf-reader/src/test/java/org/springframework/ai/reader/pdf/ParagraphPdfDocumentReaderTests.java
Lines changed: 55 additions & 1 deletion
@@ -1,5 +1,5 @@
 /*
- * Copyright 2023-2024 the original author or authors.
+ * Copyright 2023-2025 the original author or authors.
  *
  * Licensed under the Apache License, Version 2.0 (the "License");
  * you may not use this file except in compliance with the License.