ASHES - Desire for Domination
"England have only three major problems. They can't bat, they can't bowl and they can't field." - Martin Jonson (England's tour of Australia 1986-7)
With the ongoing Ashes series gathering steam, I decided to marry analytics with cricket and dig out some interesting insights. For this, I picked up data in the form of tweets from Twitter carrying the hashtag #ashes. The data processing has been done in R, a statistical programming language.
A wordcloud (or tag cloud) highlights the frequency of occurrence of words in a text document using a very intuitive visualization: the larger the text, the greater the frequency. Words of the same color and size occur at roughly the same rate.
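To see the size-to-frequency mapping in isolation, here is a minimal sketch using the wordcloud package on a hand-made set of words and counts (the words and frequencies below are invented purely for illustration and are not drawn from the Ashes data):

library(wordcloud)

# Toy data: each word's size in the cloud is proportional to its count
words <- c("ashes", "england", "australia", "cricket", "wicket")
freqs <- c(50, 30, 30, 15, 5)

# "ashes" is drawn largest; "england" and "australia" get the same size since they share a count
wordcloud(words = words, freq = freqs, min.freq = 1, random.order = FALSE)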
Technical Details & Code
I have broken down the overall process into a few steps for ease of reading.
- The text mining program makes use of five important R packages, namely ROAuth, twitteR, tm, wordcloud and RJSONIO. Install the requisite packages and get authorized to access content from Twitter.
- The authorization process is complete once the program asks you to enter the PIN (token) shown by Twitter.
- Next, pull the tweets with the specified hashtag by setting the number of tweets that you want.
- The data then needs to be cleaned, after which the minimum frequency and maximum word limits can be set to plot the wordcloud.
Please note: though the code requests 1,500 tweets, only 799 were returned by the Twitter API.
#Installing Packages
install.packages("ROAuth")
install.packages("twitteR")
install.packages("RJSONIO")
install.packages("tm")
install.packages("wordcloud")
install.packages("knitr")

#Loading Packages
library("knitr")
library("ROAuth")
library("twitteR")
library("RJSONIO")
library("tm")
library("wordcloud")
load("twitter_auth.Rdata") #Registering on twitter API reqURL <- "https://api.twitter.com/oauth/request_token" accessURL <- "http://api.twitter.com/oauth/access_token" authURL <- "http://api.twitter.com/oauth/authorize" #Important step for Windows users download.file(url="http://curl.haxx.se/ca/cacert.pem", destfile="cacert.pem") #Follow the link:https://twitter.com/apps/new to get your consumer key and secret. consumerKey <- "Enter your Consumer Key" consumerSecret <- "Enter your consumer secret key" Cred <- OAuthFactory$new(consumerKey = consumerKey,consumerSecret
= consumerSecret, requestURL = reqURL,accessURL = accessURL, authURL = authURL) Cred$handshake(cainfo = "cacert.pem")
#When com
ple
te, record the PIN given to you and provide it on the console
'
save(Cred, file = "twitter_auth.Rdata") registerTwitterOAuth(Cred)
#Extracting tweets
Ashes <- searchTwitter('#ashes', n = 1500, lang = 'en', cainfo = "cacert.pem")
Ashes <- sapply(Ashes, function(x) x$getText())

#Create a corpus
Ashes_corpus <- Corpus(VectorSource(Ashes))
#Cleaning of data
Ashes_corpus <- tm_map(Ashes_corpus, tolower)
Ashes_corpus <- tm_map(Ashes_corpus, removePunctuation)
Ashes_corpus <- tm_map(Ashes_corpus, function(x) removeWords(x, stopwords()))

#Selecting color palettes for the wordcloud
library(RColorBrewer)
pal2 <- brewer.pal(8, "Pastel2")
wordcloud(Ashes_corpus, scale = c(4, 1), min.freq = 5, random.order = T, random.color = T, colors = pal2)
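If you want to check how many tweets actually came back from the API and which terms will dominate the cloud before plotting it, a quick inspection along these lines can help. This is a minimal sketch assuming the corpus built above; the lowfreq cut-off of 10 is an arbitrary choice and not part of the original code.

# Number of tweets actually returned (799 in this run, despite requesting 1500)
length(Ashes)

# Most frequent terms in the cleaned corpus
tdm <- TermDocumentMatrix(Ashes_corpus)
findFreqTerms(tdm, lowfreq = 10)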
Acknowledgements
The following resources have been used for this post.
1. Tweetsent
2. Mining Twitter with R
3. One R tip a day