Closed as not planned
Description
Bug
When extracting tables from the attached PDF (table_inside_cell.pdf) using pymupdf4llm, I observed that information is duplicated in the output (pymupdf4llm-table_inside_cell.md). Specifically, when there is a table nested within a cell of another table, the content appears multiple times in the extracted result.
Steps to reproduce
- Run pymupdf4llm on the attached table_inside_cell.pdf.
- Review the output.
- Observe that the content from the nested table is duplicated in the output.
import pymupdf4llm


def extract_with_pymupdf4llm(file_name):
    # Convert the whole PDF to Markdown text.
    text = pymupdf4llm.to_markdown(file_name)
    return text


if __name__ == "__main__":
    file = "PDF_PATH"  # replace with the path to table_inside_cell.pdf
    text = extract_with_pymupdf4llm(file)
    print(text)
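
To make the duplication easier to spot than by reading the output by eye, the following rough sketch counts repeated lines in the generated Markdown. It is not part of pymupdf4llm: `find_repeated_lines` and its `min_length` parameter are hypothetical names used only for illustration, and the check is a line-level heuristic, so it will miss content that is duplicated inside a single table cell.

```python
from collections import Counter


def find_repeated_lines(markdown_text, min_length=4):
    """Return {line: count} for lines that occur more than once.

    Hypothetical helper for illustration; not a pymupdf4llm function.
    """
    lines = [line.strip() for line in markdown_text.splitlines()]
    # Skip blank and very short lines, which tend to repeat naturally.
    candidates = [line for line in lines if len(line) >= min_length]
    counts = Counter(candidates)
    return {line: n for line, n in counts.items() if n > 1}


if __name__ == "__main__":
    # Made-up sample text, standing in for the real to_markdown() output.
    sample = "|A|B|\n|---|---|\n|inner 1|inner 2|\n\ninner 1\n|inner 1|inner 2|\n"
    print(find_repeated_lines(sample))
```

On real output, pass the string returned by `extract_with_pymupdf4llm` instead of the sample. Table separator rows such as `|---|---|` may show up as repeats whenever the document has several tables with the same column count, so those hits can be ignored.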