WhatsApp sentiment analysis

Hi there!

Hope you had fun with your Twitter sentiment analysis last week. Today we will discuss sentiment analysis on WhatsApp data.

First of all, we need to get the WhatsApp chat archive.

Export your chat history using the 'Email chat history' option offered by WhatsApp (follow this link if you have trouble getting the chat history).

#load required libraries
library(ggplot2) #used for the sentiment bar chart at the end

#Read from chat history file
texts <- readLines("w.txt")
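Each exported line usually carries a timestamp and a sender name in front of the message, and these end up in the word cloud unless they are removed. Here is a minimal sketch of stripping that prefix, assuming a layout like "date, time - sender: message". The layout differs between phones and locales, so the sample lines, the pattern, and the helper name strip_prefix are all illustrative assumptions, not part of the WhatsApp export specification.

```r
# Hypothetical sample lines; your export may use a different date/time layout.
sample_lines <- c(
  "12/31/16, 10:15 PM - Alice: Happy new year!",
  "12/31/16, 10:16 PM - Bob: Same to you :)",
  "12/31/16, 10:17 PM - Messages to this chat are now secured."
)

# Assumed prefix: "<date>, <time> - <sender>: "
prefix <- "^[^,]+, [^-]+ - [^:]+: "

# Remove the prefix where it is present. Lines without a sender
# (system notices, continuation lines of long messages) are left unchanged.
strip_prefix <- function(lines) {
  sub(prefix, "", lines)
}

messages <- strip_prefix(sample_lines)
print(messages)
```

If the pattern fits your file, you could run strip_prefix(texts) before building the corpus. Check a few lines by hand first, because a pattern that does not match simply leaves the line as it is.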

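#load libraries to create wordcloud
library(tm) #text mining: Corpus, tm_map, TermDocumentMatrix
library(SnowballC) #stemming support used by stemDocument
library(wordcloud) #draws the word cloud
library(RColorBrewer) #colour palettes (brewer.pal)
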
docs <- Corpus(VectorSource(texts))
toSpace <- content_transformer(function (x , pattern ) gsub(pattern, " ", x))
docs <- tm_map(docs, toSpace, "/")
docs <- tm_map(docs, toSpace, "@")
docs <- tm_map(docs, toSpace, "\\|")
docs <- tm_map(docs, content_transformer(tolower))
docs <- tm_map(docs, removeNumbers)
docs <- tm_map(docs, removeWords, stopwords("english"))
docs <- tm_map(docs, removeWords, c("sharath","gunaje"))
docs <- tm_map(docs, removePunctuation)
docs <- tm_map(docs, stripWhitespace)
docs <- tm_map(docs, stemDocument)
dtm <- TermDocumentMatrix(docs)
m <- as.matrix(dtm)
v <- sort(rowSums(m),decreasing=TRUE)
d <- data.frame(word = names(v),freq=v)
head(d, 10)
wordcloud(words = d$word, freq = d$freq, min.freq = 1,
          max.words = 200, random.order = FALSE, rot.per = 0.35,
          colors = brewer.pal(8, "Dark2"))

#a word cloud of the words used in the chat has been created; I cannot share my word cloud for obvious reasons :P

#sentiment analysis
#we use the same packages as in the Twitter sentiment analysis
library(syuzhet) #this library contains the sentiment dictionary
library(lubridate) #provides tools that make it easier to parse and manipulate dates
library(dplyr) #dplyr provides a flexible grammar of data manipulation

#fetch sentiment words from the chat messages
mySentiment <- get_nrc_sentiment(texts)
text <- cbind(texts, mySentiment)

#count the sentiment words by category
sentimentTotals <- data.frame(colSums(text[,c(2:11)]))
names(sentimentTotals) <- "count"
sentimentTotals <- cbind("sentiment" = rownames(sentimentTotals), sentimentTotals)
rownames(sentimentTotals) <- NULL
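To see what the counting step does, here is a small sketch on made-up numbers. The data frame toy stands in for the combined table built above; it has only three sentiment columns instead of the ten that get_nrc_sentiment returns, and all names and values in it are invented for illustration.

```r
# Made-up scores standing in for the real sentiment table (assumption:
# first column holds the message text, the rest hold per-message counts).
toy <- data.frame(
  texts = c("msg one", "msg two", "msg three"),
  anger = c(0, 1, 0),
  joy   = c(2, 0, 1),
  trust = c(1, 1, 0)
)

# Sum every sentiment column, skipping the text column.
toy_sums <- colSums(toy[, -1])

# Reshape into one row per sentiment, ready for plotting.
toy_totals <- data.frame(sentiment = names(toy_sums),
                         count = unname(toy_sums))
print(toy_totals)
```

Selecting columns with -1 drops only the text column, so the same lines keep working if the number of sentiment columns changes.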

#total sentiment score of all texts
ggplot(data = sentimentTotals, aes(x = sentiment, y = count)) +
  geom_bar(aes(fill = sentiment), stat = "identity") +
  theme(legend.position = "none") +
  xlab("Sentiment") + ylab("Total Count") +
  ggtitle("Total Sentiment Score for All Texts with XYZ")

#here is my output


You can use this code if you are clueless about where your chat is heading! Just kidding 😛

We will discuss Amazon review analysis in the next post.

Thanks for visiting my blog. I always love to hear constructive feedback. Please give your feedback in the comment section below or write to me personally here.

