# PDF to Markdown API Documentation

Convert PDF documents and images to high-quality markdown format using vision-language models.

## Table of Contents
- [Features](#features)
- [Getting Started](#getting-started)
  - [Quickstart](#quickstart)
  - [Installation](#installation)
  - [Web Interface](#web-interface)
  - [API Access](#api-access)
- [Requirements](#requirements)
- [Supported Models & Platforms](#supported-models--platforms)
  - [Models with vLLM (Linux)](#models-with-vllm-linux)

## Features

- **LaTeX Equation Recognition**: Convert both inline and block LaTeX equations in images to markdown.
- **Intelligent Image Description**: Generate a detailed description for every image in the document within `<img></img>` tags.
- **Signature Detection**: Detect and mark signatures in the document. Signature text is extracted within `<signature></signature>` tags.
- **Watermark Detection**: Detect and mark watermarks in the document. Watermark text is extracted within `<watermark></watermark>` tags.
- **Page Number Detection**: Detect and mark page numbers in the document. Page numbers are extracted within `<page_number></page_number>` tags.
- **Checkboxes and Radio Buttons**: Convert form checkboxes and radio buttons into standardized Unicode symbols (☐, ☑, ☒).
- **Table Detection**: Convert complex tables into HTML tables.
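
To illustrate these conventions, here is a hypothetical fragment of converted output. The document content is invented for illustration only; the tags and symbols are the ones listed above:

```markdown
The total is computed as $T = \sum_{i} p_i q_i$.

<img>Scanned photo of the signed contract's first page.</img>

<table>
  <tr><th>Item</th><th>Amount</th></tr>
  <tr><td>Deposit</td><td>500.00</td></tr>
</table>

☑ I agree to the terms above.

<signature>John Doe</signature>
<watermark>CONFIDENTIAL</watermark>
<page_number>1</page_number>
```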

## Getting Started
### Quickstart
- [Colab notebook for on-prem deployment](https://colab.research.google.com/drive/1uKO70sctH8G59yYH_rLW6CPK4Vj2YmI6?usp=sharing)

### Installation
```bash
# Create a virtual environment
## Install uv if not installed
curl -LsSf https://astral.sh/uv/install.sh | sh
## Create a virtual environment with Python 3.11
uv venv --python=3.11
source .venv/bin/activate

# Install from PyPI
uv pip install docext

# Or install from source
git clone https://github.com/nanonets/docext.git
cd docext
uv pip install -e .
```
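
After installation, a quick sanity check is to print the CLI help for the app entry point used throughout this guide (the same `--help` referenced in the Web Interface section below):

```bash
# Lists the available flags (model name, image size, concurrency limit, UI port, ...)
python -m docext.app.app --help
```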

### Web Interface

docext includes a Gradio-based web interface for easy document processing:

```bash
# Start the web interface with the default configs
python -m docext.app.app --model_name hosted_vllm/nanonets/Nanonets-OCR-s

# Start the web interface with custom configs (run with `--help` for more options)
python -m docext.app.app --model_name hosted_vllm/nanonets/Nanonets-OCR-s --max_img_size 1024 --concurrency_limit 16
```

The interface will be available at `http://localhost:7860` (change the port with the `--ui_port` flag) with the default credentials:

- Username: `admin`
- Password: `admin`

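For example, assuming `--ui_port` takes the port number as its value, the UI can be served on a different port (8080 here is an arbitrary choice):

```bash
python -m docext.app.app --model_name hosted_vllm/nanonets/Nanonets-OCR-s --ui_port 8080
```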

Check the [Supported Models & Platforms](#supported-models--platforms) section for more model options.

### API Access

```python
from gradio_client import Client, handle_file

def convert_pdf_to_markdown(
    client_url: str,
    username: str,
    password: str,
    file_paths: list[str],
    model_name: str = "hosted_vllm/nanonets/Nanonets-OCR-s",
) -> str:
    """
    Convert PDF/images to markdown using the API.

    Args:
        client_url: URL of the docext server
        username: Authentication username
        password: Authentication password
        file_paths: List of file paths to convert
        model_name: Model to use for conversion

    Returns:
        str: Converted markdown content
    """
    client = Client(client_url, auth=(username, password))

    # Prepare file inputs
    file_inputs = [{"image": handle_file(file_path)} for file_path in file_paths]

    # Convert to markdown (predict() waits for the streaming endpoint to finish
    # and returns the final result)
    result = client.predict(
        images=file_inputs,
        api_name="/process_markdown_streaming"
    )

    return result

# Example usage
# The client URL can be localhost or a public URL like `https://6986bdd23daef6f7eb.gradio.live/`
CLIENT_URL = "http://localhost:7860"

# Single file conversion
markdown_content = convert_pdf_to_markdown(
    CLIENT_URL,
    "admin",
    "admin",
    ["assets/invoice_test.pdf"]
)
print(markdown_content)

# Multiple files conversion
markdown_content = convert_pdf_to_markdown(
    CLIENT_URL,
    "admin",
    "admin",
    ["assets/invoice_test.jpeg", "assets/invoice_test.pdf"]
)
print(markdown_content)
```
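
The endpoint name suggests streaming, and `gradio_client` also lets you consume partial results as they arrive via `submit()`, which returns an iterable `Job`. Below is a minimal sketch, assuming the same server, credentials, and endpoint as above; whether each streamed value is incremental or cumulative depends on the server:

```python
from gradio_client import Client, handle_file

client = Client("http://localhost:7860", auth=("admin", "admin"))

# submit() returns a Job immediately instead of blocking like predict()
job = client.submit(
    images=[{"image": handle_file("assets/invoice_test.pdf")}],
    api_name="/process_markdown_streaming",
)

# Iterate over outputs as the server streams them; keep the most recent one
markdown_content = None
for partial in job:
    markdown_content = partial

print(markdown_content)
```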

## Requirements

- Python 3.11+
- CUDA-compatible GPU (for optimal performance). Use Google Colab for free GPU access.
- Dependencies listed in `requirements.txt`

## Supported Models & Platforms
### Models with vLLM (Linux)

We recommend the `hosted_vllm/nanonets/Nanonets-OCR-s` model for the best performance. It is trained for OCR with semantic tagging and, at 3B parameters, it runs on GPUs with limited VRAM. You can also use any other VLM supported by vLLM.

Examples:

| Model | `--model_name` |
|-------|----------------|
| Nanonets-OCR-s | `hosted_vllm/nanonets/Nanonets-OCR-s` |
| Qwen/Qwen2.5-VL-7B-Instruct-AWQ | `hosted_vllm/Qwen/Qwen2.5-VL-7B-Instruct-AWQ` |
| Qwen/Qwen2.5-VL-7B-Instruct | `hosted_vllm/Qwen/Qwen2.5-VL-7B-Instruct` |
| Qwen/Qwen2.5-VL-32B-Instruct | `hosted_vllm/Qwen/Qwen2.5-VL-32B-Instruct` |
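
Any of these `--model_name` values can be passed to the launch command shown in the Web Interface section; for example, to serve the AWQ-quantized Qwen model instead of the default:

```bash
python -m docext.app.app --model_name hosted_vllm/Qwen/Qwen2.5-VL-7B-Instruct-AWQ
```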