Summary
I'm encountering an issue where pymupdf4llm.to_markdown() returns an empty string for a specific PDF in version 0.0.25 (and also 0.0.24). However, the same file works correctly in version 0.0.17.
Environment
- pymupdf4llm: 0.0.25
- Python: 3.11.2
PDF Characteristics
Unfortunately, I cannot share the exact file here due to confidentiality (I can provide it privately). However, here are some relevant details:
- The PDF has images in the header.
- The body contains text (selectable in a PDF viewer).
- The output of pdffonts is:
name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
Courier                              Type 1            WinAnsi          no  no  no      22  0
Courier-Bold                         Type 1            WinAnsi          no  no  no      23  0
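To make the pdffonts table easier to act on, a small parser can flag fonts that are neither embedded (emb) nor carry a ToUnicode map (uni). This is a sketch that assumes the fixed-width layout pdffonts normally prints, with the dash row marking column boundaries; the function names are hypothetical, and a font name longer than its column would break the alignment it relies on.

```python
from __future__ import annotations


def column_spans(rule: str) -> list[tuple[int, int | None]]:
    """Return (start, end) offsets of each run of dashes in the rule line."""
    spans: list[tuple[int, int | None]] = []
    start = None
    for i, ch in enumerate(rule):
        if ch == "-" and start is None:
            start = i
        elif ch != "-" and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(rule)))
    # Leave the last column open-ended so wider values are not truncated.
    if spans:
        spans[-1] = (spans[-1][0], None)
    return spans


def parse_pdffonts(output: str) -> list[dict[str, str]]:
    """Parse the fixed-width table printed by pdffonts into one dict per font."""
    lines = [line for line in output.splitlines() if line.strip()]
    if len(lines) < 2:
        return []
    header, rule, *rows = lines
    spans = column_spans(rule)
    names = [header[a:b].strip() for a, b in spans]
    return [
        {name: row[a:b].strip() for name, (a, b) in zip(names, spans)}
        for row in rows
    ]


def unembedded_without_tounicode(fonts: list[dict[str, str]]) -> list[str]:
    """Names of fonts that are neither embedded nor carry a ToUnicode map."""
    return [f["name"] for f in fonts if f.get("emb") == "no" and f.get("uni") == "no"]
```

Fed the pdffonts output for this file, it would flag both Courier and Courier-Bold. Treat that as a detail for the report rather than a diagnosis: Courier is one of the standard 14 fonts and WinAnsi is a standard encoding, so text in such fonts is usually extractable even without a ToUnicode map.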