File         Date        Author           Commit
src          2015-04-05  Emmanuel Keller  [2cb69c] Switch language detection to oss-utils
.gitignore   2014-11-02  Emmanuel Keller  [6bcd8d] Initial skeleton.
CHANGES.txt  2014-11-23  Emmanuel Keller  [343a68] Implements #34
LICENSE.txt  2014-11-23  Emmanuel Keller  [343a68] Implements #34
NOTICE.txt   2014-12-26  Emmanuel Keller  [df39fa] Implements #37, #36, #34
README.md    2015-03-29  Emmanuel Keller  [acc80d] Integration with oss-cluster
pom.xml      2015-04-05  Emmanuel Keller  [2cb69c] Switch language detection to oss-utils

Read Me

OpenSearchServer Extractor

An open source RESTful web service for text extraction and analysis.
oss-extractor supports various binary formats.

  • Word processor (doc, docx, odt, rtf)
  • Spreadsheet (xls, xlsx, ods)
  • Presentation (ppt, pptx, odp)
  • Publishing (pdf, pub)
  • Web (rss, html/xhtml)
  • Media (audio, images)
  • Others (vsd, text, markdown)

Quickstart

Requires Java

Check that you have installed a Java Runtime Environment 7 or newer.
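
One way to check the requirement above from a terminal is sketched below. The helper names `java_major` and `installed_java_version` are hypothetical and not part of oss-extractor. The sketch assumes the usual version string formats: `1.7.0_80` for Java 8 and older, `11.0.2` for newer releases.

```shell
# Map a Java version string to its major version number.
java_major() {
  case "$1" in
    1.*) rest=${1#1.}; printf '%s\n' "${rest%%.*}" ;;  # legacy scheme: 1.7.0_80 -> 7
    *)   printf '%s\n' "${1%%[.-]*}" ;;                 # current scheme: 11.0.2 -> 11
  esac
}

# `java -version` prints to stderr; the version is the quoted token after "version".
installed_java_version() {
  java -version 2>&1 | sed -n 's/.* version "\([^"]*\)".*/\1/p' | head -n 1
}

# Usage (run manually once Java is on the PATH):
#   [ "$(java_major "$(installed_java_version)")" -ge 7 ] && echo "Java is recent enough"
```

If the usage line prints nothing, either Java is missing from the PATH or the installed runtime is older than 7.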

Download or compile the JAR:

Download:

The binary archives are available at SourceForge

To follow this quickstart, please download oss-extractor-1.1-exec.jar.

Or clone and compile:

Compiling and packaging require Maven 3.0 or newer.

Clone the source code:

git clone https://github.com/opensearchserver/oss-extractor.git

Compile and package (the binary will be located in the target directory):

mvn clean package

Usage

Start the server

java -jar target/oss-extractor-xxx-exec.jar
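
The `xxx` in the command above stands for the version that was built or downloaded. As a small convenience, the hypothetical helper `find_exec_jar` below (not part of the project) looks up the executable JAR in a build directory. It assumes the file follows the `oss-extractor-<version>-exec.jar` naming seen in this README.

```shell
# Print the path of the first executable JAR found in the given directory
# (defaults to "target"). Returns non-zero when none exists, for example
# before `mvn clean package` has been run.
find_exec_jar() {
  dir=${1:-target}
  for jar in "$dir"/oss-extractor-*-exec.jar; do
    if [ -f "$jar" ]; then
      printf '%s\n' "$jar"
      return 0
    fi
  done
  return 1
}

# Usage: start the server with whichever version is present.
#   jar=$(find_exec_jar target) && java -jar "$jar"
```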

Obtain the parser list

curl -XGET http://localhost:9091

Get information about a parser

curl -XGET http://localhost:9091/pdfbox

Submit a document to a parser

By uploading a document:

curl -XPUT --data-binary @tutorial.pdf http://localhost:9091/pdfbox

If the file is already available on the server, the following API can be used:

curl -XGET "http://localhost:9091/pdfbox?path=/home/user/myfile.pdf"
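
A path containing spaces or characters such as `&` cannot be placed in the query string as-is. The sketch below percent-encodes the path before building the URL. The `urlencode` helper is hypothetical, handles ASCII input only, and the example assumes the server decodes percent-encoded query parameters, which this README does not state.

```shell
# Percent-encode an ASCII string for use in a URL query value.
# Letters, digits and . _ ~ / - are kept; every other character becomes %XX.
# The function body runs in a subshell so the locale change stays local.
urlencode() (
  LC_ALL=C
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}        # everything after the first character
    c=${s%"$rest"}     # the first character itself
    case "$c" in
      [A-Za-z0-9._~/-]) out="$out$c" ;;
      *) out="$out$(printf '%%%02X' "'$c")" ;;
    esac
    s=$rest
  done
  printf '%s\n' "$out"
)

# Usage:
#   curl -XGET "http://localhost:9091/pdfbox?path=$(urlencode '/home/user/my file.pdf')"
```

curl can also do the encoding itself: `curl -G --data-urlencode "path=/home/user/my file.pdf" http://localhost:9091/pdfbox` sends the same kind of GET request with the parameter encoded.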

Issues and change log

Issues and milestones are tracked on GitHub:

License

Copyright 2014-2015 OpenSearchServer Inc.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
