# Older Posts

### Prime Number Patterns

I found a thought-provoking and beautiful visualization of prime numbers on the D3 website. It shows that if you draw a periodic curve starting at the origin for each positive integer, each prime number is intersected by only two curves: the prime's own curve and the curve for one. When I saw this, my mind was blown. How interesting... and also how obvious. By definition, a prime is an integer greater than one that can only be divided by itself and one (duh). This is a visualization of that fact, and the patterns that emerge are stunning. I wanted to build the data and visualization for myself in R. While not as spectacular as the original, it was still a nice adventure. I used Plotly to visualize the data. The code can be found on GitHub. Here is the visualization:
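For readers who would rather check the idea numerically than visually, here is a minimal sketch in R. It is not the code behind the plot. It assumes, as my reading of the description, that the curve for an integer `k` crosses the axis at every multiple of `k`, so the number of curves through `n` is simply the number of divisors of `n`. The names `maxN`, `curveCount`, and `primes` are made up for this illustration.

```r
# Assumption: the curve for k passes through every multiple of k,
# so counting curves through n is the same as counting divisors of n.
maxN <- 30

# For each n from 1 to maxN, count the k in 1..n that divide n evenly.
curveCount <- sapply(1:maxN, function(n) sum(n %% 1:n == 0))

# Primes are the integers crossed by exactly two curves (1 and n itself).
primes <- which(curveCount == 2)
primes
```

Note that 1 is crossed by only one curve, so it is correctly left out of `primes`.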

### Doing a Sentiment Analysis on Tweets (Part 1)

#### Intro

So… This post is my first foray into the R twitteR package. It assumes that you already have that package installed in R. I show here how to get tweets from Twitter in preparation for doing some sentiment analysis. My next post will be the actual sentiment analysis.

For this example, I am grabbing tweets related to “Comcast email.” My goal for this exercise is to see how people are feeling about the product I support.

#### Step 1: Getting Authenticated to Twitter

First, you’ll need to create an application at Twitter. I used this blog post to get rolling; it does a good job of walking you through the steps. Once you have your app created, this is the code I used to create and save my authentication credentials. After you’ve done this once, you need only load your credentials in the future to authenticate with Twitter.

```r
library(twitteR) ## R package that does some of the Twitter API heavy lifting
library(ROAuth)  ## provides OAuthFactory

consumerKey <- "INSERT YOUR KEY HERE"
consumerSecret <- "INSERT YOUR SECRET HERE"
reqURL <- "https://api.twitter.com/oauth/request_token"
accessURL <- "https://api.twitter.com/oauth/access_token"
authURL <- "https://api.twitter.com/oauth/authorize"

twitCred <- OAuthFactory$new(consumerKey = consumerKey,
                             consumerSecret = consumerSecret,
                             requestURL = reqURL,
                             accessURL = accessURL,
                             authURL = authURL)
twitCred$handshake()

## save the credential object itself so it can be loaded later
save(twitCred, file = "credentials.RData")
```

#### Step 2: Getting the Tweets

Once you have your authentication credentials set, you can use them to grab tweets from Twitter. The next snippets of code come from my scraping_twitter.R script, which you are welcome to see in its entirety on GitHub.

```r
## Authentication
load("credentials.RData") ## has my secret keys and shiz
registerTwitterOAuth(twitCred) ## logs me in

## Get the tweets about "comcast email" to work with
tweetList <- searchTwitter("comcast email", n = 1000)
tweetList <- twListToDF(tweetList) ## converts that data we got into a data frame
```

As you can see, I used the twitteR R package to authenticate and search Twitter. After getting the tweets, I converted the results to a data frame to make them easier to analyze.

#### Step 3: Getting Rid of the Junk

Many of the tweets returned by my initial search are totally unrelated to Comcast Email. An example of this would be:

“I am selling something random… please email me at myemailaddress@comcast.net”

The tweet above includes the words email and comcast, but has nothing to do with Comcast Email or the way the user feels about it, other than that they use it for their business.

So… based on some initial, manual analysis of the tweets, I’ve decided to pull those tweets that match any of the following:

- “fix” AND “email” in them (in that order)
- “Comcast” AND “email” in them (in that order)
- “no email” in them
- Any tweet that comes from a source with “comcast” in the handle
- “Customer Service” AND “email” OR the reverse (“email” AND “Customer Service”) in them

This is done with this code:

```r
## finds the rows that have the phrase "fix ... email" in them
fixemail <- grep("(fix.*email)", tweetList$text)

## finds the rows that have the phrase "comcast ... email" in them
comcastemail <- grep("[Cc]omcast.*email", tweetList$text)

## finds the rows that have the phrase "no email" in them
noemail <- grep("no email", tweetList$text)

## finds the rows that originated from a Comcast twitter handle
comcasttweet <- grep("[Cc]omcast", tweetList$screenName)

## finds the rows related to email and customer service
custserv <- grep("[Cc]ustomer [Ss]ervice.*email|email.*[Cc]ustomer [Ss]ervice", tweetList$text)
```

After pulling out the duplicates (some tweets may fall into more than one of the scenarios above) and ensuring they are in order (as returned initially), I assign the relevant tweets to a new variable with only some of the returned columns. The returned columns are, in order: `text`, `favorited`, `favoriteCount`, `replyToSN`, `created`, `truncated`, `replyToSID`, `id`, `replyToUID`, `statusSource`, `screenName`, `retweetCount`, `isRetweet`, `retweeted`, `longitude`, `latitude`.

All I care about are `text`, `created`, `statusSource`, and `screenName` (columns 1, 5, 10, and 11). This is handled through this tidbit of code:

```r
## combine all of the "good" tweets row numbers that we grepped out above and
## then sorts them and makes sure they are unique
combined <- c(fixemail, comcastemail, noemail, comcasttweet, custserv)
uvals <- unique(combined)
sorted <- sort(uvals)

## pull the row numbers that we want, and with the columns that are important to
## us (tweet text, time of tweet, source, and username)
paredTweetList <- tweetList[sorted, c(1, 5, 10, 11)]
```

#### Step 4: Clean Up the Data and Return the Results

Lastly, for this first script, I make the sources look nice, add titles, and return the final list (only a sample set of tweets shown):

```r
## make the device source look nicer
paredTweetList$statusSource <- sub("<.*\">", "", paredTweetList$statusSource)
paredTweetList$statusSource <- sub("</a>", "", paredTweetList$statusSource)

## name the columns
names(paredTweetList) <- c("Tweet", "Created", "Source", "ScreenName")

paredTweetList
```

| Tweet | created | statusSource | screenName |
| --- | --- | --- | --- |
| Dear Mark I am having problems login into my acct REDACTED@comcast.net I get no email w codes to reset my password for eddygil HELP HELP | 2014-12-23 15:44:27 | Twitter Web Client | riocauto |
| @msnbc @nbc @comcast pay @thereval who incites the murder of police officers. Time to send them a message of BOYCOTT! Tweet/email them NOW | 2014-12-23 14:52:50 | Twitter Web Client | Monty_H_Mathis |
| Comcast, I have no email. This is bad for my small business. Their response “Oh, I’m sorry for that”. Problem not resolved. #comcast | 2014-12-23 09:20:14 | Twitter Web Client | mathercesul |

#### Challenges Observed

As you can see from the output, some “junk” still gets in. Something I’d like to continue working on is a more reliable algorithm for identifying appropriate tweets. I am also worried that my choice of filter phrases is biasing the sentiment.
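
To experiment with the Step 3 filters without calling the Twitter API, the same patterns can be run against a small hand-made data frame. The sketch below is an illustration rather than part of the original script. `sampleTweets`, `textPatterns`, and `findRelevantRows` are hypothetical names, and the sample rows are invented or shortened from the examples in this post.

```r
# Hypothetical stand-in for the data frame returned by twListToDF().
sampleTweets <- data.frame(
  text = c(
    "Comcast, I have no email. This is bad for my small business.",
    "I am selling something random... please email me at myemailaddress@comcast.net",
    "Please fix my email, it has been down all day",
    "@nbc @comcast Time to send them a message! Tweet/email them NOW",
    "Nice weather today"
  ),
  screenName = c("userA", "userB", "userC", "userD", "userE"),
  stringsAsFactors = FALSE
)

# The text patterns from Step 3, collected in one vector.
textPatterns <- c(
  "(fix.*email)",
  "[Cc]omcast.*email",
  "no email",
  "[Cc]ustomer [Ss]ervice.*email|email.*[Cc]ustomer [Ss]ervice"
)

# Returns the sorted, unique row numbers that match any text pattern
# or whose screen name contains "comcast".
findRelevantRows <- function(tweets) {
  textHits <- unlist(lapply(textPatterns, grep, x = tweets$text))
  handleHits <- grep("[Cc]omcast", tweets$screenName)
  sort(unique(c(textHits, handleHits)))
}

relevantRows <- findRelevantRows(sampleTweets)
sampleTweets[relevantRows, ]
```

On this sample the filter drops the "selling something random" tweet, because "email" appears before "comcast" there, but it keeps the fourth row, which mentions @comcast and email without being about the email product. That is the kind of junk described above, and a small test set like this makes it easier to try stricter patterns.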