Sentiment classification using Doc2Vec and LSTM models

I am building a sentiment classification model for text. The data contains the text and a sentiment label [Positive, Neutral, Negative].
As a first step, I clean and normalize the data, then create Doc2Vec embeddings:

from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# Convert the data to TaggedDocument format for Doc2Vec
documents = [TaggedDocument(words=text.split(), tags=[label]) for text, label in zip(data["text"], data["sentiment"])]
print(documents)
model = Doc2Vec(vector_size=10, window=2, min_count=1, workers=4, epochs=100)
model.build_vocab(documents)
model.train(documents, total_examples=model.corpus_count, epochs=model.epochs)

Then I infer a vector for each text and one-hot encode the labels:

X_train = [model.infer_vector(text.split()) for text in data["text"]]
print(X_train)
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
label_encoder = LabelEncoder()
y_trainEmbedding = label_encoder.fit_transform(data['sentiment'])
onehot_encoder = OneHotEncoder(sparse=False)
y_trainEmbedding = onehot_encoder.fit_transform(y_trainEmbedding.reshape(-1, 1))

Then I build the LSTM model:

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

num_classes = len(np.unique(data["sentiment"]))
model_lstm = Sequential()
model_lstm.add(LSTM(64, input_shape=(10, 1)))
model_lstm.add(Dense(32, activation="relu"))
model_lstm.add(Dense(num_classes, activation="softmax"))
model_lstm.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
X_train_lstm = np.array(X_train).reshape(-1, 10, 1)
y_train_lstm = np.array(y_trainEmbedding)
model_lstm.fit(X_train_lstm, y_train_lstm, epochs=100, batch_size=32)

The result looks good: the accuracy is 0.99.
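
One caveat on that figure: it is the accuracy reported by fit on the same data the model was trained on, so it does not show how the model behaves on unseen text. A minimal sketch of holding out a test set, assuming scikit-learn's train_test_split and synthetic stand-in arrays in place of the real X_train and y_trainEmbedding (the 80/20 ratio and the seed are arbitrary choices):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in data (assumption): 100 ten-dimensional vectors and 3 classes,
# playing the role of the Doc2Vec vectors and the one-hot labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
labels = np.arange(100) % 3          # integer class ids 0, 1, 2
y = np.eye(3)[labels]                # one-hot encoding of the class ids

# Keep 20% of the rows aside; stratify so each class appears in both parts.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=labels
)

# Reshape to (samples, timesteps, features) as the LSTM input expects.
X_tr_lstm = X_tr.reshape(-1, 10, 1)
X_te_lstm = X_te.reshape(-1, 10, 1)
print(X_tr_lstm.shape, X_te_lstm.shape)  # (80, 10, 1) (20, 10, 1)
```

The model would then be fitted on the training part only and scored on the held-out part, for example with model_lstm.evaluate.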

But when I try to predict the label of a new text, as below:

# Use the trained model to predict the sentiment of new texts
text = "هذا البيت جميل "
text=remove_punctuations(text)
text=remove_repeating_char(text)
text=remove_english_char(text)
text=remove_diacritics(text)
text=remove_noise_char(text)
text=tokenizer(text)
text=remove_stop_word(text)
text=stemming(text) 
new_embedding = model.infer_vector(text.split())
print(new_embedding)
new_embedding_lstm = np.array(new_embedding).reshape(-1, 10, 1)
print(new_embedding)

y_pred = model_lstm.predict(new_embedding_lstm)
print(y_pred)

predicted_label = label_encoder.inverse_transform(np.argmax(y_pred))
print(predicted_label)

This error occurred:

 18 
---> 19 predicted_label = label_encoder.inverse_transform(np.argmax(y_pred))
     20 print(predicted_label)

1 frames
/usr/local/lib/python3.10/dist-packages/sklearn/utils/validation.py in column_or_1d(y, dtype, warn)
   1200         return _asarray_with_order(xp.reshape(y, -1), order="C", xp=xp)
   1201 
-> 1202     raise ValueError(
   1203         "y should be a 1d array, got an array of shape {} instead.".format(shape)
   1204     )

ValueError: y should be a 1d array, got an array of shape () instead.

Is my process correct? Can anyone help me solve this?
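
On the error itself: the traceback points at np.argmax(y_pred), which without an axis argument returns a single scalar (shape ()), while inverse_transform expects a 1-D array. A minimal sketch of one possible fix, assuming a hypothetical softmax output in place of model_lstm.predict and stand-in label names:

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Stand-in encoder (assumption): fitted on three example label names.
label_encoder = LabelEncoder()
label_encoder.fit(["Positive", "Neutral", "Negative"])

# Hypothetical prediction for one text: shape (1, num_classes).
y_pred = np.array([[0.1, 0.2, 0.7]])

# axis=1 takes the argmax per row and keeps a 1-D array of shape (1,),
# which is what inverse_transform accepts.
class_ids = np.argmax(y_pred, axis=1)
predicted_label = label_encoder.inverse_transform(class_ids)
print(predicted_label[0])  # Positive
```

Wrapping the scalar in a list, as in inverse_transform([np.argmax(y_pred)]), would also satisfy the 1-D requirement when only one text is predicted.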

source https://stackoverflow.com/questions/76401941/sentiment-classification-using-doc2vec-and-lstm-models
