

OpenSearchServer Text Extractor

An open source RESTful web service for text extraction and analysis.
oss-text-extractor supports the following binary formats:

  • Word processor (doc, docx, odt, rtf)
  • Spreadsheet (xls, xlsx, ods)
  • Presentation (ppt, pptx, odp)
  • Publishing (pdf, pub)
  • Web (rss, html/xhtml)
  • Media (audio, images)
  • Others (vsd, text)

Quickstart

Requires Java

Check that a Java Runtime Environment (JRE) 7 or newer is installed.
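If you want to script this check, the sketch below parses the version string printed by `java -version`. It assumes the usual formats, `"1.7.0_80"` for Java 8 and older and `"11.0.2"` for Java 9 and newer; the helper names are hypothetical and not part of oss-text-extractor.

```shell
# Sketch: verify that the Java runtime on PATH is version 7 or newer.
# java_major_version and java_is_supported are hypothetical helper names.

# Print the major Java version found in a version string.
java_major_version() {
  local version="$1"
  local major="${version%%.*}"      # text before the first dot
  if [ "$major" = "1" ]; then
    # Legacy "1.x" scheme: "1.7.0_80" -> 7
    local rest="${version#1.}"
    major="${rest%%.*}"
  fi
  # Drop suffixes such as "-ea" or "_80" that can follow the number.
  major="${major%%[!0-9]*}"
  printf '%s\n' "$major"
}

# Succeed if the installed java is version 7 or newer.
java_is_supported() {
  local line version
  # `java -version` writes to stderr; keep only the first line.
  line="$(java -version 2>&1 | head -n 1)"
  # Extract the text between the first pair of double quotes.
  version="$(printf '%s\n' "$line" | sed -n 's/[^"]*"\([^"]*\)".*/\1/p')"
  [ -n "$version" ] && [ "$(java_major_version "$version")" -ge 7 ]
}
```

Usage: `java_is_supported || echo "Java 7 or newer is required" >&2`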

Download or compile the JAR:

Download:

The binary archives are available at SourceForge.

To follow this quickstart, download oss-text-extractor-1.0.0-exec.jar.

Or clone and compile:

Compiling and packaging require Maven 3.0 or newer.

Clone the source code:

git clone https://github.com/opensearchserver/oss-text-extractor.git

Compile and package (the binary will be located in the target directory):

mvn clean package

Usage

Start the server

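java -jar target/oss-text-extractor-xxx-exec.jar

Replace xxx with the version you built. If you downloaded the JAR instead, pass the path of the downloaded file.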
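To avoid typing the version by hand, a small helper can locate the executable JAR produced by `mvn clean package`. This is a sketch that assumes exactly one `oss-text-extractor-*-exec.jar` exists in the directory; `find_exec_jar` is a hypothetical name.

```shell
# Sketch: find the single executable JAR in a build directory (default: target).
# find_exec_jar is a hypothetical helper, not part of the project.
find_exec_jar() {
  local dir="${1:-target}"
  set -- "$dir"/oss-text-extractor-*-exec.jar
  # With no match the shell leaves the pattern unexpanded, so check the file exists.
  if [ ! -f "$1" ]; then
    echo "no oss-text-extractor-*-exec.jar found in $dir" >&2
    return 1
  fi
  # Refuse to guess when several versions are present.
  if [ "$#" -gt 1 ]; then
    echo "several executable JARs found in $dir; choose one explicitly" >&2
    return 1
  fi
  printf '%s\n' "$1"
}
```

Usage: `java -jar "$(find_exec_jar target)"`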

Obtain the parser list

curl -XGET http://localhost:9091

Get information about a parser

curl -XGET http://localhost:9091/pdfbox
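When the server does not run on localhost:9091, the two discovery calls can be wrapped so the base URL is set once. This is a sketch; `OSS_EXTRACTOR_URL` is a hypothetical environment variable introduced for this example, not a setting read by the server.

```shell
# Sketch: wrappers around the parser list and parser information endpoints.
# OSS_EXTRACTOR_URL is a hypothetical variable; it defaults to this README's address.
extractor_url() {
  printf '%s\n' "${OSS_EXTRACTOR_URL:-http://localhost:9091}"
}

# List the available parsers (GET on the root URL).
list_parsers() {
  curl -XGET "$(extractor_url)"
}

# Describe one parser, e.g. `parser_info pdfbox`.
parser_info() {
  curl -XGET "$(extractor_url)/$1"
}
```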

Submit a document to a parser

By uploading a document:

curl -XPUT --data-binary @tutorial.pdf http://localhost:9091/pdfbox
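The same PUT call can be repeated over a directory of PDFs. The sketch below sends one request per file and saves each response next to its input as `<file>.out`; the function name and output layout are choices made for this example, and the server is assumed to listen on localhost:9091.

```shell
# Sketch: submit every PDF in a directory to the pdfbox parser.
# extract_pdfs and the ".out" naming are hypothetical choices for this example.
extract_pdfs() {
  local dir="$1" file
  for file in "$dir"/*.pdf; do
    # Skip the literal pattern when the directory contains no PDF.
    [ -f "$file" ] || continue
    # --fail makes curl exit non-zero on HTTP errors (4xx/5xx).
    curl --fail -XPUT --data-binary "@$file" \
      -o "$file.out" http://localhost:9091/pdfbox || echo "failed: $file" >&2
  done
}
```

Usage: `extract_pdfs ~/documents`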

If the file is already available on the server, pass its path with the following API instead:

curl -XGET http://localhost:9091/pdfbox?path=/home/user/myfile.pdf
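Paths containing spaces or characters such as `&` must be URL-encoded in the `path` parameter. One way, sketched below, is to let curl do it: `-G` appends the `--data-urlencode` value to the URL as a query string and sends a GET request. `extract_server_file` is a hypothetical helper name.

```shell
# Sketch: parse a file the server can already read, letting curl encode the path.
# extract_server_file is a hypothetical helper, not part of the project.
extract_server_file() {
  local parser="$1" file_path="$2"
  curl -G --data-urlencode "path=$file_path" "http://localhost:9091/$parser"
}
```

Usage: `extract_server_file pdfbox "/home/user/My Report.pdf"`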

License

Copyright 2014 OpenSearchServer Inc.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Source: README.md, updated 2014-11-27