    Repositories list

    • sdk

      Public
      Python
      7534 Updated Dec 19, 2025
    • chandra

      Public
      OCR model that handles complex tables, forms, handwriting with full layout.
      Python
      4484k301 Updated Dec 19, 2025
    • datalab-on-prem

      Public
      Scripts to run Datalab's self-service on-prem container
      Shell
      1100 Updated Dec 13, 2025
    • marker

      Public
      Convert PDF to markdown + JSON quickly with high accuracy
      Python
      2.1k31k30846 Updated Nov 19, 2025
    • surya

      Public
      OCR, layout analysis, reading order, table recognition in 90+ languages
      Python
      1.3k19k12811 Updated Oct 21, 2025
    • Python
      1000 Updated Oct 2, 2025
    • Python
      1301 Updated Aug 13, 2025
    • docext

      Public
      An on-premises, OCR-free unstructured data extraction, markdown conversion and benchmarking toolkit. (https://idp-leaderboard.org/)
      Python
      4600 Updated Jun 18, 2025
    • pdftext

      Public
      Extract structured text from PDFs quickly
      Python
      60639126 Updated Jun 11, 2025