Good evening, and welcome back to my blog.
In the previous post, WAYS COLLECTING DATA FROM TWITTER USING R, we covered how to collect the data.
api_key= "your api key"
api_secret= "your api_secret password"
access_token= "your access token"
access_token_secret= "your access token password"
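Once these four values are set, the twitteR package can authenticate before any search runs. A minimal sketch using twitteR's setup_twitter_oauth() with the variable names above (this requires valid credentials and network access):

```r
# Authenticate against the Twitter API with the credentials defined above
library(twitteR)
setup_twitter_oauth(api_key, api_secret, access_token, access_token_secret)
```

This must succeed before searchTwitteR() will return results.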
The fundamental question after we collect data from Twitter is: what will we do with it?
tweets= searchTwitteR("ahok", n=1000, lang = "en", since = "2017-02-05", until = "2017-02-08")
nDocs <- length(tweets)  ## number of tweets returned
In the early stage after gathering, we transform the tweets into a data frame:
# transform the tweets into a data frame
df <- do.call("rbind", lapply(tweets, as.data.frame))
dim(df)  ## 1000 16
How will we get insight from this data?
Here I would like to try a word cloud analysis to see how the words are spread, and then a sentiment analysis. The algorithm used here is based on the NRC Word-Emotion Association Lexicon of Saif Mohammad and Peter Turney. These researchers built a dictionary/lexicon containing many words, each with associated scores for eight emotions and two sentiments (positive/negative). Each word in the lexicon gets a "yes" (one) or "no" (zero) for every emotion and sentiment, and we can calculate the total sentiment of a sentence by adding up the scores of its individual words.
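To make this sum-of-word-scores idea concrete, here is a toy sketch with a made-up two-word lexicon; this is not the real NRC lexicon (which get_nrc_sentiment loads for you), just an illustration of the scoring mechanics:

```r
# Toy illustration of lexicon-based scoring: each word contributes 0 or 1
# per category, and a sentence's score is the sum over its words.
toy_lexicon <- data.frame(
  word     = c("great", "awful"),
  positive = c(1, 0),
  negative = c(0, 1)
)

score_sentence <- function(sentence) {
  # split into lowercase words, look each up, and sum the category columns
  words <- strsplit(tolower(sentence), "\\s+")[[1]]
  hits  <- toy_lexicon[toy_lexicon$word %in% words, ]
  colSums(hits[, c("positive", "negative")])
}

score_sentence("What a great day")  # positive 1, negative 0
```

The real lexicon works the same way, only with thousands of words and ten categories instead of two.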
At this stage we will clean the data with the tm package in R.
Before cleaning with tm, we first remove odd characters (stray Unicode, etc.):
# clean text to remove odd characters
df$text <- sapply(df$text, function(row) iconv(row, "latin1", "ASCII", sub=""))
myCorpus <- Corpus(VectorSource(df$text))
myCorpus <- tm_map(myCorpus, removePunctuation)
myCorpus <- tm_map(myCorpus,content_transformer(tolower), mc.cores=1)
myCorpus <- tm_map(myCorpus, removeWords, stopwords("english")) #removes common prepositions and conjunctions
myCorpus <- tm_map(myCorpus, removeWords, c("example"))
removeURL <- function(x) gsub("http[^[:space:]]*", "", x)  # strip whole URLs, not just the "http" prefix
myCorpus <- tm_map(myCorpus, content_transformer(removeURL))
myCorpus <- tm_map(myCorpus, stripWhitespace)
corpus_clean <- tm_map(myCorpus, PlainTextDocument) ##this ensures the corpus transformations final output is a PTD
##optional, remove word stems (cleaning, cleaned, cleaner all would become clean):
##wordCorpus <- tm_map(wordCorpus, stemDocument)
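As an aside on the URL pattern: a pattern that matches only alphanumeric characters after "http" (such as http[[:alnum:]]*) would strip just the "http"/"https" prefix and leave "://t.co/…" behind, so it is safer to match everything up to the next space. A standalone sketch of that behavior:

```r
# Strip whole URLs: match "http" followed by any run of non-space characters
remove_url <- function(x) gsub("http[^[:space:]]*", "", x)

remove_url("RT check this https://t.co/abc123 now")  # the t.co link is removed entirely
```

The leftover double space is harmless here, since stripWhitespace collapses runs of spaces in a later step.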
To get insight from the data, we need to turn the text into numbers in the form of a matrix. The steps above vectorised the text data so it can be placed into such a matrix. Next we create the term-document matrix; once it is formed, we can analyze the word cloud with the following steps:
#create term document matrix for analysis
myTdm <- TermDocumentMatrix(corpus_clean, control=list(wordLengths=c(1,Inf)))
##or create a word cloud from the corpus_clean data
pal <- brewer.pal(9,"YlGnBu")
pal <- pal[-(1:4)]
wordcloud(words = corpus_clean, scale=c(4,0.5), max.words=50, random.order=FALSE,
rot.per=0.35, use.r.layout=TRUE, colors=pal)
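Before drawing the cloud, it can also be useful to inspect the most frequent terms directly. A short sketch assuming the myTdm object created above (note that as.matrix can be memory-hungry for very large corpora):

```r
# Sum each term's frequency across all documents and show the top 10
freq <- sort(rowSums(as.matrix(myTdm)), decreasing = TRUE)
head(freq, 10)
```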
# sentiment analysis and visualization with ggplot2
mySentiment <- get_nrc_sentiment(df$text)  # get_nrc_sentiment() is from the syuzhet package
df <- cbind(df, mySentiment)
sentimentTotals <- data.frame(colSums(df[, 17:26]))  ## the ten NRC columns (8 emotions + 2 sentiments) appended above
names(sentimentTotals) <- "count"
sentimentTotals <- cbind("sentiment" = rownames(sentimentTotals), sentimentTotals)
rownames(sentimentTotals) <- NULL ##graph would be messy if these were left when plotting
ggplot(data = sentimentTotals, aes(x = sentiment, y = count)) +
geom_bar(aes(fill = sentiment), stat = "identity") +
theme(legend.position = "none") +
xlab("Sentiment") + ylab("Total Count") + ggtitle("Total Sentiment Score")
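Beyond the totals, the positive and negative columns can also be combined into a net score per tweet. A toy sketch with made-up scores standing in for the real get_nrc_sentiment output:

```r
# Net sentiment per tweet: positive hits minus negative hits
toy <- data.frame(positive = c(2, 0, 1), negative = c(0, 3, 1))
toy$net <- toy$positive - toy$negative
toy$net  # 2 -3 0
```

Applied to df, this gives a per-tweet score that could be plotted over time.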
The results above show Twitter users' sentiment from 5 to 8 February 2017 about the word "ahok". Since I do not want to intervene in the DKI election, I will leave the interpretation of the results to you.
One point worth discussing is the fix applied earlier: when the raw text contains unique codes (location codes, Latin-1 letters, or other special characters), the tm package cannot form the corpus into a clean vector, which is why we stripped those characters with iconv before building it.
I hope this is useful and can serve as shared learning material.