
PdfReader unable to read PDF files larger than 2GB #1291


Open
mvgian opened this issue Apr 18, 2025 · 1 comment

@mvgian

mvgian commented Apr 18, 2025

Describe the bug

When attempting to read a PDF file larger than 2GB, the exception below is thrown. Would it be possible for OpenPDF to load large PDF files?

com.lowagie.text.pdf.PdfException: The PDF file is too large. Max 2GB. Size: 2264230616
    at com.lowagie.text.pdf.MappedRandomAccessFile.init(MappedRandomAccessFile.java:137)
    at com.lowagie.text.pdf.MappedRandomAccessFile.<init>(MappedRandomAccessFile.java:88)
    at com.lowagie.text.pdf.RandomAccessFileOrArray.<init>(RandomAccessFileOrArray.java:139)
    at com.lowagie.text.pdf.RandomAccessFileOrArray.<init>(RandomAccessFileOrArray.java:90)
    at com.lowagie.text.pdf.PRTokeniser.<init>(PRTokeniser.java:111)
    at com.lowagie.text.pdf.PdfReader.<init>(PdfReader.java:179)
    at com.lowagie.text.pdf.PdfReader.<init>(PdfReader.java:167)

To Reproduce

Invoke new PdfReader() on a PDF file larger than 2GB.

System

  • OS: macOS 15.4
  • OpenPDF version: 2.0.3
mvgian added the bug label Apr 18, 2025
@StevenStreasick
Contributor

Looking at the source code, it appears that the 2GB limit is hard-coded as the maximum PDF size, regardless of the device. It appears that this limit comes from how the MappedRandomAccessFile class works: it maps the FileChannel for the PDF into a single MappedByteBuffer. A single mapping cannot exceed Integer.MAX_VALUE bytes (just under 2GB), because FileChannel.map rejects larger sizes and ByteBuffer positions are ints.
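To illustrate the JDK limit itself, here is a minimal sketch that asks FileChannel.map for more than Integer.MAX_VALUE bytes, using the size from the exception above. MapLimitDemo and mapRejects are hypothetical names made up for this example; this is not OpenPDF code, and it assumes the JDK validates the requested size before it looks at the actual file.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class, not part of OpenPDF.
class MapLimitDemo {

    // Returns true if FileChannel.map refuses to map `size` bytes in one buffer.
    static boolean mapRejects(long size) throws IOException {
        Path tmp = Files.createTempFile("map-limit", ".bin");
        tmp.toFile().deleteOnExit();
        Files.write(tmp, new byte[16]); // small file; only the size argument matters here
        try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
            ch.map(FileChannel.MapMode.READ_ONLY, 0, size);
            return false;
        } catch (IllegalArgumentException e) {
            // Thrown when size is greater than Integer.MAX_VALUE.
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(mapRejects(2264230616L)); // size reported in the issue
        System.out.println(mapRejects(16L));         // fits in a single mapping
    }
}
```

The first call should be rejected and the second accepted, which matches the behaviour described in the comment: the cap applies per mapping, not per file.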

One potential solution would be to chunk the PDF into several MappedByteBuffers, each with a size less than 2GB, within the MappedRandomAccessFile. This is probably the 'simplest' fix.
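A rough sketch of the chunking idea, assuming a read-only file and a long-based read position. ChunkedMappedFile is a hypothetical class written for illustration; it is not how MappedRandomAccessFile is implemented today, and a real change would also need to cover the rest of that class's API and any callers that use int offsets.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical class, not part of OpenPDF.
class ChunkedMappedFile {
    private final MappedByteBuffer[] chunks;
    private final long chunkSize;
    private final long length;

    ChunkedMappedFile(Path path, long chunkSize) throws IOException {
        if (chunkSize <= 0 || chunkSize > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("chunkSize must be in 1..Integer.MAX_VALUE");
        }
        this.chunkSize = chunkSize;
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            this.length = ch.size();
            int count = (int) ((length + chunkSize - 1) / chunkSize); // ceiling division
            this.chunks = new MappedByteBuffer[count];
            for (int i = 0; i < count; i++) {
                long start = i * chunkSize;
                long size = Math.min(chunkSize, length - start); // last chunk may be shorter
                chunks[i] = ch.map(FileChannel.MapMode.READ_ONLY, start, size);
            }
        } // mappings stay valid after the channel is closed
    }

    long length() {
        return length;
    }

    // Returns the unsigned byte at absolute offset pos, or -1 outside the file.
    int read(long pos) {
        if (pos < 0 || pos >= length) {
            return -1;
        }
        MappedByteBuffer chunk = chunks[(int) (pos / chunkSize)];
        return chunk.get((int) (pos % chunkSize)) & 0xFF;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("chunked", ".bin");
        tmp.toFile().deleteOnExit();
        Files.write(tmp, new byte[] {10, 20, 30, 40, 50, 60, 70});
        ChunkedMappedFile f = new ChunkedMappedFile(tmp, 3); // tiny chunks for the demo
        // prints "10 40 70 -1"
        System.out.println(f.read(0) + " " + f.read(3) + " " + f.read(6) + " " + f.read(7));
    }
}
```

In practice the chunk size would be something large, such as 1GB; the 3-byte chunks in main only exist to exercise the chunk boundaries with a small file.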

Another potential solution could be to stream the PDF, but this would lose random access to the file. However, I am not familiar enough with this code base to determine what effects this option would have.
