Fix page_range stopping at page 32 when start >= 30 #2658
+144
−2
The `page_range` parameter stops prematurely at page 32 when the range starts from page 30 or higher. For example, `page_range=(30, 35)` extracts only pages 30-32 instead of 30-35.

Root Cause
The drain loop uses a hardcoded `batch_size = 32` to pull processed pages from the output queue. This creates an effective ceiling at page_no 31 (page 32 in 1-indexed terms) when combined with the page range filtering logic.

Changes
Core fix: changed `batch_size: int = 32` to `batch_size: int = total_pages` in the drain loops:
- `docling/pipeline/standard_pdf_pipeline.py:548`
- `docling/experimental/pipeline/threaded_layout_vlm_pipeline.py:255`

Tests: `tests/test_page_range_bug.py` with coverage for:
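
For illustration only, here is a toy model of how a fixed drain size could produce the reported symptom. It is not the actual docling pipeline code: the function `extract_pages`, its parameters, and the single-drain-then-filter structure are all assumptions made for this sketch. It only shows that draining at most 32 pages and then filtering by range would yield pages 30-32 for `page_range=(30, 35)`, while draining `total_pages` would yield the full range.

```python
from collections import deque


def extract_pages(total_pages, page_range, batch_size):
    """Hypothetical toy model: drain up to `batch_size` pages, then filter.

    `page_range` is 1-indexed and inclusive; the queue holds 0-indexed
    page_no values. This does not mirror docling's real drain loop.
    """
    start, end = page_range
    queue = deque(range(total_pages))  # page_no 0 .. total_pages - 1

    # Pull at most `batch_size` items from the output queue.
    drained = [queue.popleft() for _ in range(min(batch_size, len(queue)))]

    # Keep only pages inside the requested 1-indexed range.
    return [page_no + 1 for page_no in drained if start <= page_no + 1 <= end]


# Hardcoded batch size: nothing beyond page_no 31 (page 32) is ever drained.
buggy = extract_pages(total_pages=40, page_range=(30, 35), batch_size=32)

# Batch size equal to the page count: the whole range is reachable.
fixed = extract_pages(total_pages=40, page_range=(30, 35), batch_size=40)

print(buggy)
print(fixed)
```

In this simplified model the ceiling comes purely from the drain size, which is consistent with the description's statement that pages past page_no 31 were never returned.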
Original prompt
`page_range` parameter stops prematurely at page 32 when starting from page 30+ #2655