pymupdf4llm.to_markdown() returns empty output in 0.0.25 (worked in 0.0.17) #289

Closed
@azhurb

Description

Summary

I'm encountering an issue where pymupdf4llm.to_markdown() returns an empty string for a specific PDF in version 0.0.25 (and also 0.0.24). However, the same file works correctly in version 0.0.17.

Environment

  • pymupdf4llm: 0.0.25
  • Python: 3.11.2

PDF Characteristics

Unfortunately, I cannot share the exact file here due to confidentiality (I can provide it privately). However, here are some relevant details:

  • The PDF has images in the header.
  • The body contains text (visibly selectable in a PDF viewer).
  • The output of pdffonts is:
name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
Courier                              Type 1            WinAnsi          no  no  no      22  0
Courier-Bold                         Type 1            WinAnsi          no  no  no      23  0
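
To narrow this down without the confidential file being shared, the check below is a minimal sketch of how the symptom could be confirmed: it compares the Markdown output against the raw text PyMuPDF extracts from each page. The helper names `classify` and `check_pdf` and the labels they return are illustrative assumptions, not part of pymupdf4llm; the sketch also assumes a PyMuPDF release recent enough to be imported as `pymupdf`.

```python
from importlib import metadata


def classify(markdown: str, raw_pages: list[str]) -> str:
    """Compare Markdown output with raw per-page text (hypothetical helper)."""
    has_raw_text = any(text.strip() for text in raw_pages)
    if markdown.strip():
        return "ok"
    # Empty Markdown although plain extraction finds text points at the
    # Markdown conversion rather than at a missing text layer.
    return "regression-suspect" if has_raw_text else "no-text-layer"


def check_pdf(path: str) -> str:
    """Run both extractions on one file and classify the result."""
    # Imported lazily so classify() can be used without the libraries installed.
    import pymupdf  # assumption: recent PyMuPDF; older releases use `import fitz`
    import pymupdf4llm

    print("pymupdf4llm", metadata.version("pymupdf4llm"))
    markdown = pymupdf4llm.to_markdown(path)
    with pymupdf.open(path) as doc:
        raw_pages = [page.get_text() for page in doc]
    return classify(markdown, raw_pages)
```

Running `check_pdf("sample.pdf")` (a placeholder path) under both 0.0.17 and 0.0.25 would show whether the result changes between versions for the same file.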
