Getting Started with Data Science

updated: February 2021

Everyone in the world has a “how to” guide to data science… well, maybe not everyone - but there are a lot of “guides” out there. I get this question frequently, so I thought I would do my best to put together the resources that have helped me most in learning.

MY STORY

Personally, I learned statistics by getting my master’s in Applied Statistics at Villanova University - it took 2.5 years. I got my introduction to R by working through the Johns Hopkins University Data Science Specialization on Coursera. Similarly, I got my introduction to Python through an online course. I’d no longer recommend the course I took, but there are many others out there (like this one from EdX).

This was all bolstered by using these tools at work and in side projects. The repetition of working with them every day has made me more fluent.

Here are some resources that I’ve used or know of - I’ve tried to outline them and group them to the best of my ability. There are many more out there, and you may find some better or worse depending on your learning style.

LEARNING DATA PROGRAMMING

  • Johns Hopkins University Data Science Specialization on Coursera: As mentioned above, this course gave me my start with R, RStudio, and git.
  • Kaggle: If you are as competitive as I am, this site should get you going - its interactive kernels and social features make it a great place to see other people’s data science in action. Plagiarism is the greatest form of flattery (and the easiest way to learn - thanks, Stack Overflow).
  • EdX - R Programming: I haven’t used EdX much, but there is a wealth of MOOCs here.
  • EdX - Python Programming: This course is from UCSD and will teach you about Python, Jupyter notebooks, & data viz.

LEARNING STATISTICS & OTHER IMPORTANT MATH

BOOKS ON ALL FACETS OF DATA SCIENCE

ETHICS & ALGORITHM BOOKS

DATA VIZ BOOKS

THINKING BOOKS

BOOKS ABOUT MATH AND STATISTICS

Here is a list of fun books about math and statistics that I’ve enjoyed:

PODCASTS

  • Hidden Brain: NPR podcast covering many topics. I find it super interesting. While not distinctly data related, it frequently covers topics that have tangential importance to being a good data scientist.
  • Exponential View: Not primarily focused on data, but it very frequently covers artificial intelligence and machine learning topics. I recommend the newsletter that goes along with this podcast (link below).
  • Not So Standard Deviations: Roger Peng and Hilary Parker host a podcast on all things data science.
  • The Data Lab Podcast: Local [to Philly] data podcast interviewing local data scientists. I find it reassuring to hear that my habits are often in line with these people’s, plus I’ve picked up many really great tidbits (like the Exponential View newsletter).
  • O’Reilly Data Show: I have attended the Strata data conference by O’Reilly. Much like the conference, this podcast covers many relevant data themes.
  • Data Skeptic: Another podcast that covers a wide range of data topics.

BLOGS & NEWSLETTERS

  • Exponential View: Billed as a weekly “wondermissive”, the author Azeem Azhar covers many topics relevant to data and the greater technology economy. I truly look forward to getting this newsletter every Sunday morning.
  • Farnam Street: A weekly newsletter (and blog) about decision making. I frequently find golden tips on how to think and frame thinking. Must read.
  • Twitter: I follow many great data people on Twitter and get a great deal of my data news there.