GitHub - encodedcipher/Sentiment-Analysis_v1: Sentiment Analysis for The tl;dr Project

# The tl;dr Project
www.tldrproject.com

## Sentiment Analysis for PHP

PHP sentiment analysis using Bayesian opinion mining. This is my first time writing one of these (README files), so sorry if it comes off as n00b-ish.
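The README does not spell out the model. The following is a minimal sketch of one common form of Bayesian opinion mining: a multinomial naive Bayes classifier with add-one (Laplace) smoothing. The class `NaiveBayesSentiment`, its methods, the `'pos'`/`'neg'` labels and the regex tokenizer are all hypothetical; this is not the interface of the repository's own classes. It sticks to PHP 5.2-compatible syntax to match the INSTALL requirement.

```php
<?php
// Hypothetical multinomial naive Bayes sketch; NOT the repository's own API.
class NaiveBayesSentiment
{
    private $wordCounts = array(); // class => (word => count)
    private $totalWords = array(); // class => total tokens seen in that class
    private $docCounts = array();  // class => number of training snippets
    private $vocab = array();      // word => true, across all classes

    // Lower-case the text and split it into word tokens (assumed tokenizer).
    public static function tokenize($text)
    {
        return preg_split("/[^a-z0-9']+/", strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    }

    // Add one labelled snippet (e.g. 'pos' or 'neg') to the model.
    public function train($text, $class)
    {
        if (!isset($this->docCounts[$class])) {
            $this->docCounts[$class] = 0;
            $this->totalWords[$class] = 0;
            $this->wordCounts[$class] = array();
        }
        $this->docCounts[$class]++;
        foreach (self::tokenize($text) as $word) {
            if (!isset($this->wordCounts[$class][$word])) {
                $this->wordCounts[$class][$word] = 0;
            }
            $this->wordCounts[$class][$word]++;
            $this->totalWords[$class]++;
            $this->vocab[$word] = true;
        }
    }

    // Return the class with the highest log posterior. Add-one smoothing
    // keeps a single unseen word from zeroing out a class.
    public function classify($text)
    {
        $totalDocs = array_sum($this->docCounts);
        $vocabSize = count($this->vocab);
        $tokens = self::tokenize($text);
        $best = null;
        $bestScore = -INF;
        foreach ($this->docCounts as $class => $docs) {
            $score = log($docs / $totalDocs); // log prior
            foreach ($tokens as $word) {
                $count = isset($this->wordCounts[$class][$word]) ? $this->wordCounts[$class][$word] : 0;
                $score += log(($count + 1) / ($this->totalWords[$class] + $vocabSize));
            }
            if ($score > $bestScore) {
                $bestScore = $score;
                $best = $class;
            }
        }
        return $best;
    }
}

$nb = new NaiveBayesSentiment();
$nb->train('a gripping, beautifully acted film', 'pos');
$nb->train('a dull and lifeless mess', 'neg');
echo $nb->classify('beautifully acted') . "\n"; // prints "pos"
```

Working in log space avoids floating-point underflow when many small word probabilities are multiplied together.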

AUTHORS

Colin Poindexter (cpopensource [at] gmail.com) for The tldr Project (www.tldrproject.com)

Ian Barber (who did the real heavy lifting for this). Check out his blog here: http://www.phpir.com/

Chuck Testa http://www.youtube.com/watch?v=mbUVtfUWwF8

DATA CITATION

This data was first used in Bo Pang and Lillian Lee, "Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales," Proceedings of the ACL, 2005.

@InProceedings{Pang+Lee:05a, author = {Bo Pang and Lillian Lee}, title = {Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales}, booktitle = {Proceedings of the ACL}, year = 2005 }

DATA INFO

positivedata.txt contains 5331 positive snippets

negativedata.txt contains 5331 negative snippets

Each line in these two files corresponds to a single snippet (usually about one sentence); all snippets are down-cased.
The snippets were labeled automatically, as described below (see section "Label Decision").
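Because each file holds one snippet per line, loading a training set comes down to reading lines. The helper `loadSnippets()` below is a hypothetical sketch, not code from the repository; trimming whitespace and skipping blank lines are assumptions about how stray lines should be handled.

```php
<?php
// Hypothetical helper: read a training file with one down-cased snippet
// per line (the layout described for positivedata.txt / negativedata.txt).
function loadSnippets($path)
{
    $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        throw new RuntimeException("Cannot read $path");
    }
    $snippets = array();
    foreach ($lines as $line) {
        $line = trim($line);
        if ($line !== '') { // drop lines that were only whitespace
            $snippets[] = $line;
        }
    }
    return $snippets;
}
```

With the repository's data files, `count(loadSnippets('positivedata.txt'))` should match the 5331 snippets listed above, assuming the file contains no blank lines.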

LABEL DECISION

We assumed snippets (from Rotten Tomatoes webpages) for reviews marked with "fresh" are positive, and those for reviews marked with "rotten" are negative.

To make sure that the polarity of the data is real and not happenstance, we count the raw number of sentences for each polarity. Whichever polarity has more sentences becomes the initial bias (either positive or negative).

Next we check how balanced the two polarities are. For every sentence that disagrees with the initial bias, there must be at least 2.1 sentences that agree with it. If this threshold (.47, i.e. roughly 1/2.1) is not met, the entire body of text is reclassified as inconclusive (e.g. a story with 87 positive sentences and 88 negative sentences is not necessarily negative; it just has a very slight slant). We arrived at this threshold through our own human analysis of the data generated by tldrBot (our spider), plus some tweaking.
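The two-step rule can be sketched as a function over the sentence counts. `classifyBody()`, its return labels and the tie handling (equal counts, including zero, return inconclusive) are hypothetical. Note that .47 and 2.1 are not exact reciprocals (1/2.1 is about 0.476); this sketch tests the minority-to-majority ratio against .47, so counts right at the 2.1 boundary, such as 21 agreeing to 10 disagreeing, come out inconclusive here.

```php
<?php
// Hypothetical sketch of the initial-bias + threshold rule.
// Assumption: the body is conclusive only if minority / majority <= threshold.
function classifyBody($positive, $negative, $threshold = 0.47)
{
    if ($positive === $negative) {
        return 'inconclusive'; // no initial bias at all (assumed handling)
    }
    // Initial bias: whichever polarity has more sentences.
    $bias = $positive > $negative ? 'positive' : 'negative';
    $majority = max($positive, $negative);
    $minority = min($positive, $negative);
    // e.g. 87 vs 88 gives 87 / 88, about 0.99, far above 0.47.
    if ($minority / $majority > $threshold) {
        return 'inconclusive';
    }
    return $bias;
}

echo classifyBody(87, 88) . "\n"; // prints "inconclusive"
```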

CHANGELOG
v1 - Inception

INSTALL
You must have PHP 5.2.0 or newer running on your server.
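A quick way to confirm the requirement is a version guard like the following. It is a sketch; placing it at the top of your own entry script is just one option.

```php
<?php
// Stop early on interpreters older than the required PHP 5.2.0.
if (version_compare(PHP_VERSION, '5.2.0', '<')) {
    exit('This library needs PHP 5.2.0 or newer; found ' . PHP_VERSION . "\n");
}
echo 'PHP ' . PHP_VERSION . " is new enough.\n";
```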

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

BUGS
It has some trouble with nuance; otherwise, there are none that I know of. If you find any, don't hesitate to tell me: cpopensource [at] gmail [dot] com
