We aim to extract structured information from a web resource:
url --> meaningfulweb engine --> structured information
- meaningfulweb-opengraph.jar <-- Open Graph parser
- meaningfulweb-core.jar <-- core engine
- meaningfulweb-app.war <-- web application
Build and release are managed via Maven: http://maven.apache.org/
- run the script bin/mvn-install.sh to install the .jar files in jars/ into the local Maven repository
- build opengraph: under meaningfulweb-opengraph/, do: mvn install
- build core: under meaningfulweb-core/, do: mvn install
- start webapp: under meaningfulweb-app/, do: mvn jetty:run
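The build steps above have to run in the order listed. A minimal sketch of scripting them from Java with ProcessBuilder, assuming it is run from the repository root, that `mvn` and `sh` are on the PATH, and Java 16+ (for records). The class name BuildAll is hypothetical and not part of meaningfulweb.

```java
import java.io.File;
import java.io.IOException;
import java.util.List;

public class BuildAll {

    // One build step: the directory to run in and the command to run there.
    record Step(String dir, List<String> command) {}

    // Steps in the order given in this README (paths relative to the repo root).
    static List<Step> steps() {
        return List.of(
                new Step(".", List.of("sh", "bin/mvn-install.sh")),
                new Step("meaningfulweb-opengraph", List.of("mvn", "install")),
                new Step("meaningfulweb-core", List.of("mvn", "install")),
                // jetty:run keeps running until the webapp is stopped.
                new Step("meaningfulweb-app", List.of("mvn", "jetty:run")));
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        for (Step step : steps()) {
            Process p = new ProcessBuilder(step.command())
                    .directory(new File(step.dir()))
                    .inheritIO() // show Maven output in this console
                    .start();
            int exit = p.waitFor();
            // Stop at the first failure: later modules presumably need the
            // artifacts installed by earlier steps.
            if (exit != 0) {
                throw new IllegalStateException(
                        step.command() + " failed in " + step.dir() + " (exit " + exit + ")");
            }
        }
    }
}
```

Failing fast keeps a broken opengraph or core install from being masked by a later step that happens to pick up a stale artifact from the local repository.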
Once started, the application should be running at: http://localhost:8080/
The REST service should be available at: http://localhost:8080/get-meaning?url=xxx
Example:
http://localhost:8080/get-meaning?url=http://www.google.com
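A minimal sketch of calling the REST service from Java 11+ with java.net.http.HttpClient, assuming the webapp is running locally as described above. The target URL is percent-encoded so that characters such as `&` or `?` inside it are not read as extra query parameters of get-meaning; this assumes the service decodes the `url` parameter the usual way servlet containers do. The response format is not documented here, so the sketch just prints the body. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class GetMeaningClient {

    // Builds <base>/get-meaning?url=<percent-encoded target URL>.
    static URI getMeaningUri(String base, String targetUrl) {
        String encoded = URLEncoder.encode(targetUrl, StandardCharsets.UTF_8);
        return URI.create(base + "/get-meaning?url=" + encoded);
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        String target = args.length > 0 ? args[0] : "http://www.google.com";
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest
                .newBuilder(getMeaningUri("http://localhost:8080", target))
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        // Print status and raw body; the payload format is not specified here.
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```

For the example above, getMeaningUri("http://localhost:8080", "http://www.google.com") yields http://localhost:8080/get-meaning?url=http%3A%2F%2Fwww.google.com.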
// Java API: extract the best image (and other metadata) representing a URL
String url = "http://www.google.com";
MetaContentExtractor extractor = new MetaContentExtractor();
MeaningfulWebObject obj = extractor.extractFromUrl(url);
String bestImageURL = obj.getImage();
String title = obj.getTitle();
String description = obj.getDescription();
String domain = obj.getDomain();
...
File bugs in our JIRA system at: http://snaprojects.jira.com/browse/MWEB
Wiki Home: http://snaprojects.jira.com/wiki/display/MWEB/Home