getting started

Getting Started with Data Science

updated: November 2020 Everyone in the world has a “how to” guide to data science… well, maybe not everyone - but there are a lot of “guides” out there. I get this question infrequently, so I thought I would do my best to put together what have been my best resources for learning. MY STORY Personally, I learned statistics by getting my Masters in Applied Statistics at Villanova University - it took 2.5 years. I got my introduction to R by working through the Johns Hopkins University Data Science Specialization on Coursera. Similarly for python, I got an online introduction via DataCamp. This was all bolstered by working with these tools at work and in side projects. The repetition of working with these tools every day has made it more fluent. Here are some resources that I’ve used or know of - I’ve tried to outline them and group them to the best of my ability. There’s many more out there, and you may find some better or worse depending on your style. LEARNING DATA Johns Hopkins University Data Science Specialization on Coursera: As mentioned above this course gave me my start with R, RStudio, and git. Kaggle: If you are as competitive as I am, this site should get you going - the interactive kernals and social aspects of this site make it a great place to see other data science in action. Plagiarism is greatest form of flattery (and easiest way to learn - thanks, Stack Overflow). EdX - R Programming: I haven’t used EdX much, but there is a wealth of MOOCs here. LEARNING STATISTICS & OTHER IMPORTANT MATH Khahn Academy - Statistics: I have used Khahn Academy on multiple occasions for refreshers in Statistics and Linear Algebra. The classes are interactive, manageable, and self-paced. Khahn Academy - Linear Algebra Coursera - Statistics with R EdX - Data Analytics & Statistics courses Of course - higher education, as well. DATA BOOKS Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy - Cathy O’Neil: Cathy O’Neil does a great job of outlining how data algorithms can have unintended negative consequences. Anyone who builds an machine learning algorithm should read. The Wall Street Journal Guide to Information Graphics: The Dos and Don’ts of Presenting Data, Facts, and Figures - Dona M. Wong: I have this book on my desk as a reference. Quick read filled with easy to understand rules and objectives for creating data visualizations. Analyzing data is hard - this book teaches tips to build clear and informative visualizations that don’t take away from the message. The Signal and the Noise: Why So Many Predictions Fail-but Some Don’t - Nate Silver: Nate Silver is [in]famous for predicting elections. This book gets into the details of how he does that. Super interesting for a guy increasingly interested in politics. How Not to Be Wrong: The Power of Mathematical Thinking - Jordan Ellenberg: Critical thinking is crucial in data science and analytics. This book gives some great tips on how to approach “facts” with the right mindset. Thinking, Fast and Slow - Daniel Kahneman: Currently on my list to read. PODCASTS Hidden Brain: NPR podcast covering many topics. I find it super interesting. While not distinctly data related, it frequently covers topics that have tangential importance to being a good data scientist. Exponential View: Not primarily focused on data, but is very frequently covering artificial intelligence and machine learning topics. I recommend the newsletter that goes along with this podcast (link below). Not So Standard Deviations: Richard Peng and Hilary Parker host a podcast on all things data science. The Data Lab Podcast: Local [to Philly] data podcast interviewing local data scientists. I find it reassuring to hear that my habits are often in line with these peoples, plus I’ve picked up many really great tidbits (like the Exponential View newsletter). O’Reilly Data Show: I have attended the Strata data conference by O’Reilly. Much like the conference, this podcast covers many relevant data themes. Data Skeptic: Another data podcast that covers many good data topics. BLOGS & NEWSLETTERS Exponential View: Billed as a weekly “wondermissive”, the author Azeem Azhar covers many topics relevant to data and the greater technology economy. I truly look forward to getting this newsletter every Sunday morning. Farnam Street: A weekly newsletter (and blog) about decision making. I frequently find golden tips on how to think and frame thinking. Must read. Twitter: I follow many great data people on twitter and get a great deal of my data news there.