Name | Modified | Size | Downloads / Week |
---|---|---|---|
README.md | 2014-11-27 | 3.0 kB | |
oss-text-extractor-1.0.0.deb | 2014-11-13 | 39.5 MB | |
oss-text-extractor-1.0.0-1.noarch.rpm | 2014-11-13 | 39.5 MB | |
oss-text-extractor-1.0.0-exec.jar | 2014-11-13 | 44.2 MB | |
Totals: 4 Items | | 123.2 MB | 0 |
OpenSearchServer Text Extractor
An open source RESTful web service for text extraction and analysis.
oss-text-extractor supports various binary formats:
- Word processor (doc, docx, odt, rtf)
- Spreadsheet (xls, xlsx, ods)
- Presentation (ppt, pptx, odp)
- Publishing (pdf, pub)
- Web (rss, html/xhtml)
- Media (audio, images)
- Others (vsd, text)
Links
- Home page
- Installation
- Usage
- Extractor list in alphabetical order
- Source code
- Compile and build
- How to contribute
Quickstart
Requires Java
Check that a Java Runtime Environment (JRE) 7 or newer is installed.
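The version check can also be scripted. The following is a minimal sketch, not part of the project: the helper names `parse_java_major` and `installed_java_major` are hypothetical. It relies on two facts about the JVM: `java -version` writes its banner to stderr, and Java 8 and older report themselves as `1.x` (Java 7 appears as `1.7.0_...`).

```python
import re
import subprocess


def parse_java_major(version_output: str) -> int:
    """Extract the major Java version from the text printed by `java -version`."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("no version string found in the given output")
    major = int(match.group(1))
    # Java 8 and older use the 1.x scheme, so 1.7.0_80 means Java 7.
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major


def installed_java_major() -> int:
    """Run `java -version` and return the major version it reports."""
    # The JVM prints its version banner to stderr, not stdout.
    result = subprocess.run(
        ["java", "-version"], capture_output=True, text=True, check=True
    )
    return parse_java_major(result.stderr)
```

Calling `installed_java_major() >= 7` then tells you whether the requirement above is met. If no `java` binary is on the `PATH`, `subprocess.run` raises `FileNotFoundError`.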
Download or compile the JAR:
Download:
The binary archives are available at SourceForge.
To follow this quickstart, please download oss-text-extractor-1.0.0-exec.jar
Or clone and compile:
Compilation and packaging require Maven 3.0 or newer.
Clone the source code:
git clone https://github.com/opensearchserver/oss-text-extractor.git
Compile and package (the binary will be located in the target directory):
mvn clean package
Usage
Start the server
java -jar target/oss-text-extractor-xxx-exec.jar
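When the server is started from a script, it can help to wait until it accepts connections before sending requests. The sketch below is an illustration only: `wait_for_port` is a hypothetical helper, and port 9091 is taken from the curl examples in this README. A successful connection shows only that something is listening on the port, not that the service has finished initializing.

```python
import socket
import time


def wait_for_port(host: str = "localhost", port: int = 9091,
                  timeout: float = 30.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # Open and immediately close a connection to probe the port.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            # Nothing is listening yet (or the attempt timed out); retry shortly.
            time.sleep(0.5)
    return False
```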
Obtain the parser list
curl -XGET http://localhost:9091
Get information about a parser
curl -XGET http://localhost:9091/pdfbox
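The same two GET requests can be issued from Python's standard library. This is a minimal sketch under assumptions: `parser_url` and `http_get` are hypothetical helper names, and the base URL is the one used in the curl examples. The README does not state the response format, so the body is returned as plain text without further parsing.

```python
import urllib.request
from urllib.parse import quote

BASE_URL = "http://localhost:9091"  # host and port from the curl examples


def parser_url(parser_name: str = "", base_url: str = BASE_URL) -> str:
    """Return the URL of the parser list (empty name) or of a single parser."""
    if not parser_name:
        return base_url
    # Percent-encode the name so it is safe to use as one path segment.
    return f"{base_url}/{quote(parser_name, safe='')}"


def http_get(url: str, timeout: float = 10.0) -> str:
    """Send a GET request and return the response body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

With the server running, `http_get(parser_url())` corresponds to the parser list request and `http_get(parser_url("pdfbox"))` to the request for a single parser.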
Submit a document to a parser
By uploading a document:
curl -XPUT --data-binary @tutorial.pdf http://localhost:9091/pdfbox
If the file is already available on the server, the following API can be used:
curl -XGET "http://localhost:9091/pdfbox?path=/home/user/myfile.pdf"
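Both submission modes can be sketched in Python with `urllib`. The helper names `upload_request`, `path_request` and `send` are hypothetical and not part of oss-text-extractor. The URLs and the `path` parameter come from the curl commands above. Percent-encoding the path is an added precaution for file names containing spaces or other reserved characters, and it assumes the server decodes query parameters in the usual way.

```python
import urllib.request
from urllib.parse import quote, urlencode

BASE_URL = "http://localhost:9091"  # host and port from the curl examples


def upload_request(parser_name: str, data: bytes,
                   base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a PUT request carrying the raw document bytes (like curl --data-binary)."""
    return urllib.request.Request(f"{base_url}/{parser_name}", data=data, method="PUT")


def path_request(parser_name: str, server_path: str,
                 base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a GET request asking the server to parse a file on its own file system."""
    # Keep "/" readable but percent-encode spaces and other reserved characters.
    query = urlencode({"path": server_path}, safe="/", quote_via=quote)
    return urllib.request.Request(f"{base_url}/{parser_name}?{query}", method="GET")


def send(request: urllib.request.Request, timeout: float = 60.0) -> str:
    """Send a prepared request and return the response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

For example, `send(upload_request("pdfbox", open("tutorial.pdf", "rb").read()))` mirrors the upload command, and `send(path_request("pdfbox", "/home/user/myfile.pdf"))` mirrors the path-based one. Building the request separately from sending it makes the URL and method easy to inspect before any network traffic happens.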
License
Copyright 2014 OpenSearchServer Inc.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.