Introduction

In my last post, I went over some of the highlights of the open data set of all Philadelphia Parking Violations. In this post, I'll go through the steps to build a model that predicts the number of violations the city issues each day. I'll walk you through cleaning and building the data set, selecting and creating the important features, and building predictive models with Random Forests and Linear Regression.
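The first step, turning individual tickets into one row per day, can be sketched roughly like this. It is a minimal illustration, not the post's actual code. It assumes each ticket carries an ISO-8601 issue timestamp such as `"2017-01-02T08:00:00"` (the real column name and format in the OpenDataPhilly export may differ), and the function names and calendar features are hypothetical examples of the kind of features a daily model might use.

```python
from collections import Counter
from datetime import datetime


def daily_counts(issue_timestamps):
    """Count tickets per calendar day from ISO-8601 timestamp strings (assumed format)."""
    counts = Counter()
    for ts in issue_timestamps:
        day = datetime.fromisoformat(ts).date()  # drop the time of day
        counts[day] += 1
    # Return the days in chronological order
    return dict(sorted(counts.items()))


def add_calendar_features(counts):
    """Turn {date: ticket_count} into rows with a few simple calendar features."""
    rows = []
    for day, n in counts.items():
        rows.append({
            "date": day,
            "weekday": day.weekday(),        # 0 = Monday ... 6 = Sunday
            "month": day.month,
            "is_weekend": day.weekday() >= 5,
            "tickets": n,                    # the target a model would predict
        })
    return rows


if __name__ == "__main__":
    # Made-up timestamps, purely for illustration
    sample = ["2017-01-02T08:00:00", "2017-01-02T13:30:00", "2017-01-07T09:10:00"]
    for row in add_calendar_features(daily_counts(sample)):
        print(row)
```

Rows shaped like these could then be handed to a Random Forest or Linear Regression implementation, with `tickets` as the target and the calendar columns as inputs.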
Introduction

A few weeks ago, I stumbled across Dylan Purcell's article on Philadelphia Parking Violations. It offers a nice glimpse of the data, but I wanted to get a taste of it myself. I downloaded the entire data set of Parking Violations in Philadelphia from the OpenDataPhilly website and came up with a few questions after checking out the data:
How many tickets are in the data set?
For those of you who aren't stirred from bed in the small hours to learn data science, you might have missed that March 5th was International Open Data Day. There were hundreds of local events around the world, and I was lucky enough to attend DC's Open Data Day Hackathon. I met a bunch of great people doing noble things with data who taught me a crap-ton (scientific term). The day also validated my love for data science and showed me how much I've learned since beginning my journey almost two years ago.
INTRO

This post is a continuation of my last post, where I pulled tweets from Twitter related to "Comcast email," got rid of the junk, and removed the unnecessary/unwanted data.
Now that I have the tweets, I will further clean the text and subject it to two different analyses: emotion and polarity.
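The post itself works in R. As a language-neutral illustration of what "cleaning the text" and "polarity" mean, here is a tiny lexicon-based sketch in Python. This is a swapped-in technique, not necessarily the classifier the post uses. The `POSITIVE`/`NEGATIVE` word sets are hypothetical and far smaller than any real sentiment lexicon, and a simple word count like this ignores negation and sarcasm.

```python
import re

# Hypothetical, deliberately tiny word lists for illustration only
POSITIVE = {"good", "great", "love", "fast", "works", "fixed"}
NEGATIVE = {"bad", "down", "slow", "hate", "broken", "outage"}


def clean(text):
    """Lowercase a tweet, strip links, @mentions and #hashtags, and split it into words."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # drop links
    text = re.sub(r"[@#]\w+", " ", text)       # drop @mentions and #hashtags
    return re.findall(r"[a-z']+", text)


def polarity(text):
    """Label a tweet positive, negative, or neutral by counting lexicon hits."""
    words = clean(text)
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    print(polarity("Comcast email is down again and so slow http://t.co/x"))
    print(polarity("@comcast thanks, email works great now"))
```

Emotion classification follows the same idea with one word list per emotion (anger, joy, and so on) instead of a single positive/negative pair.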
WHY DOES THIS MATTER

Before I get started, I thought it would be a good idea to talk about WHY I am doing this (besides the fact that I learned a new skill and want to show it off and get feedback).
INTRO

So… this post is my first foray into the R twitteR package, and it assumes you already have that package installed in R. Here I show how to get tweets from Twitter in preparation for sentiment analysis. My next post will cover the actual sentiment analysis.
For this example, I am grabbing tweets related to "Comcast email." My goal in this exercise is to see how people feel about the product I support.