
Ex. no: 05

CNN FOR SENTIMENT ANALYSIS

Aim:

To design and implement a Convolutional Neural Network (CNN) for sentiment analysis on textual data
and evaluate its performance in predicting the sentiment (positive or negative) of text.

Procedure:

1. Install Required Libraries:


o Ensure that required libraries such as Keras, TensorFlow, or PyTorch for deep learning, and
libraries for natural language processing (NLP) like nltk or spacy, are installed.
o Use the following commands to install them:
pip install keras
pip install tensorflow
pip install nltk

2. Load and Prepare the Dataset:


o Use a well-known sentiment analysis dataset such as IMDb movie reviews or any custom dataset
containing text labeled as positive or negative.
o Example (using IMDb dataset in Keras):

from keras.datasets import imdb


(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=10000)
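The loaded reviews are already integer-encoded, one integer per word. To inspect what a review actually says, the word index can be reversed. A small sketch, assuming X_train and y_train from the line above (the 3-position offset reflects how Keras reserves ids for padding, start, and out-of-vocabulary tokens):

word_index = imdb.get_word_index()                         # word -> integer id
reverse_index = {i + 3: w for w, i in word_index.items()}  # shift for reserved ids 0-2
decoded = ' '.join(reverse_index.get(i, '?') for i in X_train[0])
print(decoded[:200], y_train[0])                           # first review text and its label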

3. Preprocess the Text Data:


o Pad the sequences to ensure that all input text data has the same length (for CNNs, the input
should be a fixed size).
o Tokenize the words and convert them into integer sequences based on word frequency or
embedding (see the Tokenizer sketch after the padding example below).

from keras.preprocessing.sequence import pad_sequences

max_length = 100 # maximum length of a review


X_train = pad_sequences(X_train, maxlen=max_length)
X_test = pad_sequences(X_test, maxlen=max_length)
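The IMDb arrays above are already integer sequences; when starting from raw text instead, the Keras Tokenizer builds the word-to-integer mapping before padding. A minimal sketch (the texts list and the num_words value are illustrative, not part of the exercise):

from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences

texts = ["the movie was great", "the plot was dull"]   # example raw reviews

tokenizer = Tokenizer(num_words=10000)        # keep the 10,000 most frequent words
tokenizer.fit_on_texts(texts)                 # build the word -> integer index
sequences = tokenizer.texts_to_sequences(texts)
padded = pad_sequences(sequences, maxlen=100) # same fixed length as above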

4. Embed the Text:


o Use word embeddings to convert the input text into dense vectors of fixed size. This is important
for CNNs to capture semantic meaning from the text.
o You can use pre-trained embeddings like GloVe (see the sketch below) or train an embedding layer directly on the
dataset.

from keras.layers import Embedding

embedding_layer = Embedding(input_dim=10000, output_dim=128, input_length=max_length)
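As an alternative to training the embedding layer from scratch, pre-trained GloVe vectors can be loaded into a weight matrix and passed to the layer as fixed weights. A rough sketch, assuming a locally downloaded glove.6B.100d.txt file; the small texts list is only there to produce a word index and is not from the exercise:

import numpy as np
from keras.preprocessing.text import Tokenizer
from keras.layers import Embedding

texts = ["the movie was great", "the plot was dull"]   # illustrative raw reviews
tokenizer = Tokenizer(num_words=10000)
tokenizer.fit_on_texts(texts)
word_index = tokenizer.word_index

embedding_dim = 100
embedding_index = {}
with open('glove.6B.100d.txt', encoding='utf-8') as f:  # assumed local GloVe file
    for line in f:
        values = line.split()
        embedding_index[values[0]] = np.asarray(values[1:], dtype='float32')

# Rows are indexed by the tokenizer's integer ids; words missing from GloVe stay zero
embedding_matrix = np.zeros((10000, embedding_dim))
for word, i in word_index.items():
    if i < 10000 and word in embedding_index:
        embedding_matrix[i] = embedding_index[word]

max_length = 100
embedding_layer = Embedding(input_dim=10000, output_dim=embedding_dim,
                            weights=[embedding_matrix],
                            input_length=max_length, trainable=False)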

5. Build the CNN Model:


o Define a CNN architecture for text classification, which includes an embedding layer,
convolutional layers for feature extraction, and dense layers for classification.
o Convolutional layers help capture local dependencies in the text, and max pooling layers reduce
dimensionality.

from keras.models import Sequential
from keras.layers import Conv1D, MaxPooling1D, Flatten, Dense

model = Sequential()
model.add(embedding_layer)
model.add(Conv1D(128, 5, activation='relu'))
model.add(MaxPooling1D(pool_size=2))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

6. Compile the Model:


o Define the optimizer, loss function, and metrics for the model. For binary sentiment analysis
(positive vs. negative), use binary crossentropy as the loss function.

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])


7. Train the Model:
o Train the CNN on the training data using the fit method. Set the number of epochs, batch size,
and validation data.

model.fit(X_train, y_train, epochs=5, batch_size=64, validation_data=(X_test, y_test))

8. Evaluate the Model:


o After training, evaluate the model on the test dataset to measure its performance in classifying
sentiments.
o Use accuracy as the primary metric for binary classification.

test_loss, test_acc = model.evaluate(X_test, y_test)

print('Test accuracy:', test_acc)

9. Analyze the Results:


o Optionally, visualize the training and validation accuracy and loss over epochs to detect any
overfitting or underfitting (see the sketch below).
o You can also review misclassified examples to understand where the model struggles.
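A quick way to inspect over- or underfitting is to plot the history object returned by model.fit. A minimal matplotlib sketch, assuming the fit call from step 7 is assigned to a variable; note that older Keras versions store the metric keys as 'acc'/'val_acc' instead of 'accuracy'/'val_accuracy':

import matplotlib.pyplot as plt

history = model.fit(X_train, y_train, epochs=5, batch_size=64,
                    validation_data=(X_test, y_test))

# Accuracy curves; use 'acc'/'val_acc' on older Keras versions
plt.plot(history.history['accuracy'], label='train accuracy')
plt.plot(history.history['val_accuracy'], label='validation accuracy')
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.legend()
plt.show()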

Program:

import re
import string
import numpy as np
import pandas as pd
from nltk import word_tokenize
from nltk.corpus import stopwords
from sklearn.model_selection import train_test_split
from gensim import models
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.layers import Embedding, Input, Conv1D, GlobalMaxPooling1D, Dense, Dropout, concatenate
from keras.models import Model

# Load the labelled IMDb sentences (tab-separated: sentence, label)
data = pd.read_csv('imdb_labelled.tsv', header=None, delimiter='\t')
data.columns = ['Text', 'Label']
data.head()
data.shape
data.Label.value_counts()

# Remove punctuation from each review
def remove_punct(text):
    return re.sub('[' + string.punctuation + ']', '', text)

data['Text_Clean'] = data['Text'].apply(lambda x: remove_punct(x))

# Tokenize, lowercase, and remove stopwords
tokens = [word_tokenize(sen) for sen in data.Text_Clean]

def lower_token(tokens):
    return [w.lower() for w in tokens]

lower_tokens = [lower_token(token) for token in tokens]

stoplist = stopwords.words('english')

def removeStopWords(tokens):
    return [word for word in tokens if word not in stoplist]

filtered_words = [removeStopWords(sen) for sen in lower_tokens]
data['Text_Final'] = [' '.join(sen) for sen in filtered_words]
data['tokens'] = filtered_words

# One-hot style label columns: Pos and Neg
pos = []
neg = []
for l in data.Label:
    if l == 0:
        pos.append(0)
        neg.append(1)
    elif l == 1:
        pos.append(1)
        neg.append(0)
data['Pos'] = pos
data['Neg'] = neg
data = data[['Text_Final', 'tokens', 'Label', 'Pos', 'Neg']]
data.head()

# Train/test split
data_train, data_test = train_test_split(data, test_size=0.10, random_state=42)

all_training_words = [word for tokens in data_train["tokens"] for word in tokens]
training_sentence_lengths = [len(tokens) for tokens in data_train["tokens"]]
TRAINING_VOCAB = sorted(list(set(all_training_words)))
print("%s words total, with a vocabulary size of %s" % (len(all_training_words), len(TRAINING_VOCAB)))
print("Max sentence length is %s" % max(training_sentence_lengths))

all_test_words = [word for tokens in data_test["tokens"] for word in tokens]
test_sentence_lengths = [len(tokens) for tokens in data_test["tokens"]]
TEST_VOCAB = sorted(list(set(all_test_words)))
print("%s words total, with a vocabulary size of %s" % (len(all_test_words), len(TEST_VOCAB)))
print("Max sentence length is %s" % max(test_sentence_lengths))

# Load pre-trained word2vec embeddings (GoogleNews, 300 dimensions)
word2vec_path = 'GoogleNews-vectors-negative300.bin.gz'
word2vec = models.KeyedVectors.load_word2vec_format(word2vec_path, binary=True)

# Constants implied by the later calls and by the model summary shown under OUTPUT
MAX_SEQUENCE_LENGTH = 50
EMBEDDING_DIM = 300
label_names = ['Pos', 'Neg']

# Convert the cleaned text to padded integer sequences
tokenizer = Tokenizer(num_words=len(TRAINING_VOCAB), lower=True, char_level=False)
tokenizer.fit_on_texts(data_train["Text_Final"].tolist())
training_sequences = tokenizer.texts_to_sequences(data_train["Text_Final"].tolist())
train_word_index = tokenizer.word_index
print('Found %s unique tokens.' % len(train_word_index))
train_cnn_data = pad_sequences(training_sequences, maxlen=MAX_SEQUENCE_LENGTH)

# Build the embedding matrix: word2vec vector if available, random otherwise
train_embedding_weights = np.zeros((len(train_word_index) + 1, EMBEDDING_DIM))
for word, index in train_word_index.items():
    train_embedding_weights[index, :] = word2vec[word] if word in word2vec else np.random.rand(EMBEDDING_DIM)
print(train_embedding_weights.shape)

def ConvNet(embeddings, max_sequence_length, num_words, embedding_dim, labels_index):
    embedding_layer = Embedding(num_words,
                                embedding_dim,
                                weights=[embeddings],
                                input_length=max_sequence_length,
                                trainable=False)
    sequence_input = Input(shape=(max_sequence_length,), dtype='int32')
    embedded_sequences = embedding_layer(sequence_input)
    # Parallel convolutions with different filter sizes, each followed by global max pooling
    convs = []
    filter_sizes = [2, 3, 4, 5, 6]
    for filter_size in filter_sizes:
        l_conv = Conv1D(filters=200, kernel_size=filter_size, activation='relu')(embedded_sequences)
        l_pool = GlobalMaxPooling1D()(l_conv)
        convs.append(l_pool)
    l_merge = concatenate(convs, axis=1)
    x = Dropout(0.1)(l_merge)
    x = Dense(128, activation='relu')(x)
    x = Dropout(0.2)(x)
    preds = Dense(labels_index, activation='sigmoid')(x)
    model = Model(sequence_input, preds)
    model.compile(loss='binary_crossentropy',
                  optimizer='adam',
                  metrics=['acc'])
    model.summary()
    return model

model = ConvNet(train_embedding_weights,
                MAX_SEQUENCE_LENGTH,
                len(train_word_index) + 1,
                EMBEDDING_DIM,
                len(list(label_names)))

# Training inputs/targets and test inputs (definitions filled in to match the fit/predict calls below)
x_train = train_cnn_data
y_tr = data_train[label_names].values
test_sequences = tokenizer.texts_to_sequences(data_test["Text_Final"].tolist())
test_cnn_data = pad_sequences(test_sequences, maxlen=MAX_SEQUENCE_LENGTH)

num_epochs = 3
batch_size = 32
hist = model.fit(x_train, y_tr,
                 epochs=num_epochs,
                 validation_split=0.1,
                 shuffle=True,
                 batch_size=batch_size)

# Predict on the test set and compare against the true labels
predictions = model.predict(test_cnn_data, batch_size=1024, verbose=1)
labels = [1, 0]
prediction_labels = []
for p in predictions:
    prediction_labels.append(labels[np.argmax(p)])
print(sum(data_test.Label == prediction_labels) / len(prediction_labels))
OUTPUT:

Layer (type)                     Output Shape       Param #    Connected to
================================================================================
input_2 (InputLayer)             (None, 50)         0
embedding_2 (Embedding)          (None, 50, 300)    564600     input_2[0][0]
conv1d_6 (Conv1D)                (None, 49, 200)    120200     embedding_2[0][0]
conv1d_7 (Conv1D)                (None, 48, 200)    180200     embedding_2[0][0]
conv1d_8 (Conv1D)                (None, 47, 200)    240200     embedding_2[0][0]
conv1d_9 (Conv1D)                (None, 46, 200)    300200     embedding_2[0][0]
conv1d_10 (Conv1D)               (None, 45, 200)    360200     embedding_2[0][0]
global_max_pooling1d_6 (GlobalM  (None, 200)        0          conv1d_6[0][0]
global_max_pooling1d_7 (GlobalM  (None, 200)        0          conv1d_7[0][0]
global_max_pooling1d_8 (GlobalM  (None, 200)        0          conv1d_8[0][0]
global_max_pooling1d_9 (GlobalM  (None, 200)        0          conv1d_9[0][0]
global_max_pooling1d_10 (Global  (None, 200)        0          conv1d_10[0][0]
concatenate_2 (Concatenate)      (None, 1000)       0          global_max_pooling1d_6[0][0]
                                                               global_max_pooling1d_7[0][0]
                                                               global_max_pooling1d_8[0][0]
                                                               global_max_pooling1d_9[0][0]
                                                               global_max_pooling1d_10[0][0]
dropout_3 (Dropout)              (None, 1000)       0          concatenate_2[0][0]
dense_3 (Dense)                  (None, 128)        128128     dropout_3[0][0]
dropout_4 (Dropout)              (None, 128)        0          dense_3[0][0]
dense_4 (Dense)                  (None, 2)          258        dropout_4[0][0]
================================================================================
Total params: 1,893,986
Trainable params: 1,329,386
Non-trainable params: 564,600

Result:

The CNN model was successfully implemented for sentiment analysis on textual data. Using convolutional
layers, the model was able to capture important patterns in the text and classify the sentiment as positive or
negative. The experiment demonstrated that CNNs, which are often used in image processing, can also
effectively handle NLP tasks like sentiment analysis by learning local dependencies in text. The model
performed well, achieving good accuracy in predicting the sentiment of the reviews.
