Skip to content

encodedcipher/Sentiment-Analysis_v1

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

33 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

#The tl;dr Project
www.tldrproject.com

##Sentiment Analysis for PHP

PHP sentiment analysis using bayesian opinion mining. This is my first time creating one of these (README files), sorry if it comes off as n00b-ish....


AUTHORS

Colin Poindexter (cpopensource [at] gmail.com) for The tldr Project (www.tldrproject.com)

Ian Barber (where the real heavy lifting for this came). Check out his blog here: http://www.phpir.com/

Chuck Testa http://www.youtube.com/watch?v=mbUVtfUWwF8


DATA CITATION

This data was first used in Bo Pang and Lillian Lee, "Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales.", Proceedings of the ACL, 2005.

@InProceedings{Pang+Lee:05a, author = {Bo Pang and Lillian Lee}, title = {Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales}, booktitle = {Proceedings of the ACL}, year = 2005 }


DATA INFO

positivedata.txt contains 5331 positive snippets

negativedata.txt contains 5331 negative snippets

Each line in these two files corresponds to a single snippet (usually containing roughly one single sentence); all snippets are down-cased.
The snippets were labeled automatically, as described below (see section "Label Decision").


LABEL DECISION

We assumed snippets (from Rotten Tomatoes webpages) for reviews marked with "fresh" are positive, and those for reviews marked with "rotten" are negative.

To make sure that the polarity of the data is real and not happenstance, we calculate the raw number of sentances for each bias, this creates the inital bias (either positive or negative, whichever has more).

Then we find out how equal these biases are. For every one incorrect bias sentance, there must be at least 2.1 bias sentances as stated by the aforementioned intial bias. If this threshold (.47) is not met, then the entire body is reclassified as being inconclusive (i.e. a story with 87 positive sentances and 88 negative sentances is not nessesarily negative, it just has a very slight slant). This threshold comes from our own human analysis from the data generated by tldrBot (our spider) and tweaking.


CHANGELOG
v1 - Inception


INSTALL
You must have PHP 5.2.0 or newer running on your server.


LICENSE
Copyright 2011 The tldr Project c/o ipsumedia Limited.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.


BUGS
Has some problems with nuances, otherwise, none that I know of now. If you find any, don't hesitate to tell me: cpopensource [at] gmail [dot] com

About

Sentiment Analysis for The tl;dr Project

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • PHP 100.0%