245 changes: 245 additions & 0 deletions docs/docs/integrations/document_loaders/polaris_ai_datainsight.ipynb
@@ -0,0 +1,245 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "433f5422ad8e1efa",
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"source": [
"# Polaris AI DataInsight\n",
"\n",
"> [Polaris AI DataInsight](https://datainsight.polarisoffice.com/playground) is a document parser that extracts document elements (text, images, complex tables, charts, etc.) from various file formats into structured JSON, making them easy to integrate into RAG systems.\n",
"\n",
"This notebook covers how to get started with `PolarisAIDataInsightLoader`.\n",
"\n",
"\n",
"## Installation\n",
"\n",
"Install the `langchain-polaris-ai-datainsight` package."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "0ae97af4",
"metadata": {},
"outputs": [
],
"source": [
"%pip install -qU --pre langchain-polaris-ai-datainsight"
]
},
{
"cell_type": "markdown",
"id": "e6e5941c",
"metadata": {},
"source": [
"## Environment Setup\n",
"\n",
"Set the following environment variable:\n",
"\n",
"- `POLARIS_AI_DATA_INSIGHT_API_KEY`: Your Polaris AI DataInsight API key. See the [Polaris AI DataInsight documentation](https://datainsight.polarisoffice.com/api/keys) to obtain one."
]
},
{
"cell_type": "markdown",
"id": "21e72f3d",
"metadata": {},
"source": [
"## Usage"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "a05efd34",
"metadata": {},
"outputs": [],
"source": [
"import getpass\n",
"import os\n",
"\n",
"os.environ[\"POLARIS_AI_DATA_INSIGHT_API_KEY\"] = getpass.getpass(\n",
"    \"Enter your Polaris AI DataInsight API key: \"\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "2b914a7b",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" --------- < Page Content > --------- \n",
"2025 Seed Program Application \n",
"\n",
"I. Funding Information by Track\n",
"\n",
"1. Beginning and Advanced Track Comparison Overview\n",
"\n",
"<table><tbody><tr><td>Category</td><td>Beginning Track*</td><td>Advanced Track*</td></tr><tr><td>Funding target</td><td>A university located outside Korea that has a Central Grant Management Department, an existing Korean Studies infrastructure, and plans to establish an education foundation.</td><td>A non-Korean university with a Central Grant Management Department, at least one full-time Korean Studies faculty member, an undergraduate Korean Studies major or department, and commitment to supporting Korean Studies.</td></tr><tr><td>Funding period</td><td>3 years</td><td>5 years<3+2years></td></tr><tr><td>Funding size</td><td>Maximum possible funding depends on the applicant university’s country<br><table><tbody><tr><td>Country Group*</td><td>Maximum Funding**</td></tr><tr><td>A</td><td>Up to KRW 200 million</td></tr><tr><td>B</td><td>Up to KRW 50 million</td></tr></tbody></table></td><td>Maximum possible funding depends on the applicant university’s country<br><table><tbody><tr><td>Country Group*</td><td>Maximum Funding**</td></tr><tr><td>A</td><td>Up to KRW 150 million</td></tr><tr><td>B</td><td>Up to KRW 90 million</td></tr></tbody></table></td></tr><tr><td>Required project content</td><td>·\tFund 2 or more scholarship students<br>·\tOffer 1 or more regular Korean Studies lecture courses (Excluding Korean language courses)<br>·\tHold 1 or more workshops per year in which that students may participate</td><td>·\tHire 1 or more Korean Studies full-time faculty<br>·\tFund 1 or more scholarship student for Korean Studies<br>·\tOffer 2 or more regular graduate-level Korean Studies lecture courses (Excluding Korean language courses)<br>·\tHold 1 or more international Korean Studies conference<br>·\tEstablish and manage a website, blog, or social media relating to the program </td></tr><tr><td>Recommended content</td><td>·\tFoster talent (education)<br>·\tEstablish a Korean Studies research institute/center<br>·\tEstablish Korean Studies undergraduate department/major 
& program<br>·\tDevelop Korean Studies textbooks<br>·\tHold academic activities</td><td>·\tFoster talent (education)<br>·\tEstablish a Korean Studies research institute/center<br>·\tEstablish Korean Studies M.A/Ph.D. department/major & program<br>·\tDevelop Korean Studies textbooks<br>·\tHold academic activities</td></tr></tbody></table>\n",
"\n",
"<img id=\"di.image.im12\" data-category=\"image\"/>\n",
"\n",
" 2 / 3\n",
"\n",
"\n",
" --------- < Metadata > --------- \n",
"{'di.text.he2te0': {'id': 'di.text.he2te0', 'type': 'text'}, 'di.text.te0': {'id': 'di.text.te0', 'type': 'text'}, 'di.text.te2': {'id': 'di.text.te2', 'type': 'text'}, 'di.table.ta9': {'id': 'di.table.ta9', 'type': 'table'}, 'di.image.im12': {'id': 'di.image.im12', 'type': 'image', 'src': '/home/jenkins_agent/Project/langchain/docs/docs/integrations/document_loaders/example_data/tmp/tmpaynkptxx/polaris_ai_example.docx_image12.png'}, 'di.text.fo3te0': {'id': 'di.text.fo3te0', 'type': 'text'}}\n",
"\n",
"\n",
" --------- < Page Content > --------- \n",
"2025 Seed Program Application \n",
"\n",
"II. Review and Selection \n",
"\n",
"1. Review Process\n",
"\n",
"<img id=\"di.image.im13\" data-category=\"image\"/>\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"Review of whether the basic requirements for application have been met\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"Review of the Project Proposal\n",
"\n",
"Admistered by the Expert Review Team\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"Final review and decision\n",
"\n",
"Admistered by the Comprehensive Review Committee\n",
"\n",
"\n",
"\n",
"1. Preliminary Review\n",
"\n",
"\n",
"\n",
"2. Content Review (80 pts)\n",
"\n",
"\n",
"\n",
"3. Comprehensive Review (20 pts)\n",
"\n",
"2. Review Stages and Content\n",
"\n",
"Stage 1: Preliminary Review\n",
"\n",
"Conducted by Main Department\n",
"\n",
"●\tVerifies document submission, eligibility, and overlapping support.\n",
"\n",
"●\tApplications missing required documents, signatures, or failing to meet eligibility do not proceed.\n",
"\n",
"●\tApplications with Indirect Expenses over 10% of Direct Expenses (including Labor Expenses) are rejected.\n",
"\n",
"Stage 2: Content Review\n",
"\n",
"Conducted by Expert Review Team\n",
"\n",
"●\tOnline review: Points given individually\n",
"\n",
"●\tPanel review: Points determined by consensus\n",
"\n",
"●\tAssesses leadership potential, capacity, and project plans.\n",
"\n",
"●\tItems and scores assigned for evaluation.\n",
"\n",
"<table><tbody><tr><td>Areas</td><td>Items (Points)</td><td>Content</td></tr></tbody></table>\n",
"\n",
" 2 / 3\n",
"\n",
"\n",
" --------- < Metadata > --------- \n",
"{'di.text.he2te0': {'id': 'di.text.he2te0', 'type': 'text'}, 'di.text.te10': {'id': 'di.text.te10', 'type': 'text'}, 'di.text.te12': {'id': 'di.text.te12', 'type': 'text'}, 'di.image.im13': {'id': 'di.image.im13', 'type': 'image', 'src': '/home/jenkins_agent/Project/langchain/docs/docs/integrations/document_loaders/example_data/tmp/tmpaynkptxx/polaris_ai_example.docx_image13.png'}, 'di.text.sh15': {'id': 'di.text.sh15', 'type': 'text'}, 'di.text.sh16': {'id': 'di.text.sh16', 'type': 'text'}, 'di.text.sh16te0': {'id': 'di.text.sh16te0', 'type': 'text'}, 'di.text.sh17': {'id': 'di.text.sh17', 'type': 'text'}, 'di.text.sh18': {'id': 'di.text.sh18', 'type': 'text'}, 'di.text.sh19': {'id': 'di.text.sh19', 'type': 'text'}, 'di.text.sh19te0': {'id': 'di.text.sh19te0', 'type': 'text'}, 'di.text.sh19te1': {'id': 'di.text.sh19te1', 'type': 'text'}, 'di.text.sh20': {'id': 'di.text.sh20', 'type': 'text'}, 'di.text.sh21': {'id': 'di.text.sh21', 'type': 'text'}, 'di.text.sh22': {'id': 'di.text.sh22', 'type': 'text'}, 'di.text.sh22te0': {'id': 'di.text.sh22te0', 'type': 'text'}, 'di.text.sh22te1': {'id': 'di.text.sh22te1', 'type': 'text'}, 'di.text.sh23': {'id': 'di.text.sh23', 'type': 'text'}, 'di.text.sh23te0': {'id': 'di.text.sh23te0', 'type': 'text'}, 'di.text.sh24': {'id': 'di.text.sh24', 'type': 'text'}, 'di.text.sh24te0': {'id': 'di.text.sh24te0', 'type': 'text'}, 'di.text.sh25': {'id': 'di.text.sh25', 'type': 'text'}, 'di.text.sh25te0': {'id': 'di.text.sh25te0', 'type': 'text'}, 'di.text.te15': {'id': 'di.text.te15', 'type': 'text'}, 'di.text.te16': {'id': 'di.text.te16', 'type': 'text'}, 'di.text.te17': {'id': 'di.text.te17', 'type': 'text'}, 'di.text.te18': {'id': 'di.text.te18', 'type': 'text'}, 'di.text.te19': {'id': 'di.text.te19', 'type': 'text'}, 'di.text.te20': {'id': 'di.text.te20', 'type': 'text'}, 'di.text.te21': {'id': 'di.text.te21', 'type': 'text'}, 'di.text.te22': {'id': 'di.text.te22', 'type': 'text'}, 'di.text.te23': {'id': 'di.text.te23', 'type': 
'text'}, 'di.text.te24': {'id': 'di.text.te24', 'type': 'text'}, 'di.text.te25': {'id': 'di.text.te25', 'type': 'text'}, 'di.text.te26': {'id': 'di.text.te26', 'type': 'text'}, 'di.table.ta26': {'id': 'di.table.ta26', 'type': 'table'}, 'di.text.fo3te0': {'id': 'di.text.fo3te0', 'type': 'text'}}\n",
"\n",
"\n",
" --------- < Page Content > --------- \n",
"2025 Seed Program Application \n",
"\n",
"<table><tbody><tr><td rowspan=\"3\">Evaluation of the Basis for the Project (40)</td><td>Potential to lead Korean Studies (20)</td><td>- Assess whether the university has a distinguished reputation in terms of history and academic disciplines.<br>- Evaluate the strength of the network between the Project Director and local researchers.</td></tr><tr><td>Performance capacity (20)<br>Eligibility criteria (10)</td><td>- Determine if the project director possesses the skills and commitment to execute the project (e.g., Korean language proficiency, influence within the institution, management skills).<br>- Review the achievements of collaborative researchers in Korean Studies.<br>- Confirm whether personnel (Beginning/Advanced) or coursework (Advanced) meet eligibility criteria.</td></tr><tr><td>University support (10)</td><td>- Measure the institution's willingness to support Korean Studies (financial, spatial, and human resources, appropriate indirect expense ratio).<br>- Assess the competency of the Central Grant Management Department.</td></tr><tr><td rowspan=\"2\">Evaluation of the Project Content (40)</td><td>Project plans (30)</td><td>- Ensure that the project objectives are realistic and well-defined.<br>- Verify that the plan aligns with local conditions.<br>- Review the suitability of the Project Team’s structure.<br>- Assess whether the budget plan reflects local price levels.</td></tr></tbody></table>\n",
"\n",
" 2 / 3\n",
"\n",
"\n",
" --------- < Metadata > --------- \n",
"{'di.text.he2te0': {'id': 'di.text.he2te0', 'type': 'text'}, 'di.table.ta29': {'id': 'di.table.ta29', 'type': 'table'}, 'di.text.fo3te0': {'id': 'di.text.fo3te0', 'type': 'text'}}\n",
"\n",
"\n"
]
}
],
"source": [
"from langchain_polaris_ai_datainsight import PolarisAIDataInsightLoader\n",
"\n",
"loader = PolarisAIDataInsightLoader(\n",
"    file_path=\"example_data/polaris_ai_example.docx\",\n",
"    resources_dir=\"example_data/tmp\",\n",
"    mode=\"page\",  # \"element\", \"page\", or \"single\" (default: \"single\")\n",
")\n",
"\n",
"docs = loader.load()  # or iterate lazily with loader.lazy_load()\n",
"\n",
"for doc in docs[:3]:\n",
"    print(\" --------- < Page Content > --------- \")\n",
"    print(doc.page_content)\n",
"    print(\" --------- < Metadata > --------- \")\n",
"    print(doc.metadata)\n",
"    print(\"\\n\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "langchain-monorepo-py3.11",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.11"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
19 changes: 19 additions & 0 deletions docs/docs/integrations/providers/polaris_ai_datainsight.mdx
@@ -0,0 +1,19 @@
# Polaris AI DataInsight

> [Polaris AI DataInsight](https://datainsight.polarisoffice.com/playground) is a document parser
> that extracts document elements (text, images, complex tables, charts, etc.) from various file formats
> into structured JSON, making them easy to integrate into RAG systems.

## Installation and Setup

```bash
pip install langchain-polaris-ai-datainsight
```
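
The loader's usage notebook reads the API key from the `POLARIS_AI_DATA_INSIGHT_API_KEY` environment variable. A minimal sketch of setting it interactively only when it is missing; the helper name `ensure_api_key` is hypothetical and not part of the package:

```python
import getpass
import os

ENV_VAR = "POLARIS_AI_DATA_INSIGHT_API_KEY"


def ensure_api_key(prompt=getpass.getpass) -> str:
    """Return the API key, prompting for it only if the variable is unset.

    `ensure_api_key` is an illustrative helper, not part of the package.
    """
    key = os.environ.get(ENV_VAR)
    if not key:
        # Ask without echoing the key, then export it for later calls.
        key = prompt("Enter your Polaris AI DataInsight API key: ")
        os.environ[ENV_VAR] = key
    return key
```

Calling `ensure_api_key()` once before constructing the loader keeps the key out of source files and notebooks.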

## Document Loader

See a [usage example](/docs/integrations/document_loaders/polaris_ai_datainsight).

```python
from langchain_polaris_ai_datainsight import PolarisAIDataInsightLoader
```
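
In the usage notebook's output, each document's `metadata` maps element IDs to dicts with `id` and `type`, plus a `src` path for images saved under `resources_dir`. A rough sketch of collecting image paths from such metadata, assuming that shape holds; `image_sources` and the sample values are illustrative and not part of the package:

```python
from __future__ import annotations

from typing import Any


def image_sources(metadata: dict[str, Any]) -> list[str]:
    """Collect the `src` paths of elements whose type is "image"."""
    return [
        element["src"]
        for element in metadata.values()
        if isinstance(element, dict)
        and element.get("type") == "image"
        and "src" in element
    ]


# Sample metadata modeled on the notebook output; the path is made up.
sample = {
    "di.text.te0": {"id": "di.text.te0", "type": "text"},
    "di.image.im12": {
        "id": "di.image.im12",
        "type": "image",
        "src": "example_data/tmp/image12.png",
    },
}
print(image_sources(sample))
```

With real output, the same helper could be applied as `image_sources(doc.metadata)` for each `doc` returned by `loader.load()`.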
6 changes: 6 additions & 0 deletions docs/src/theme/FeatureTables.js
@@ -1052,6 +1052,12 @@ const FEATURE_TABLES = {
source: "Various file types (see https://ds4sd.github.io/docling/)",
apiLink: "https://python.langchain.com/docs/integrations/document_loaders/docling/"
},
{
name: "PolarisAIDataInsightLoader",
link: "../../integrations/document_loaders/polaris_ai_datainsight",
source: "Various file types (see https://datainsight.polarisoffice.com/documentation?docType=doc_extract)",
apiLink: "https://python.langchain.com/docs/integrations/document_loaders/polaris_ai_datainsight/"
},
]
},
vectorstores: {
6 changes: 5 additions & 1 deletion libs/packages.yml
@@ -751,4 +751,8 @@ packages:
path: libs/zeusdb
- name: langchain-scraperapi
path: .
repo: scraperapi/langchain-scraperapi
- name: langchain-polaris-ai-datainsight
path: .
repo: PolarisOffice/langchain-polaris-ai-datainsight
name_title: Polaris AI DataInsight