The pandas library is a powerful tool for multiple phases of the data science workflow, including data cleaning, visualization, and exploratory data analysis. However, proper data science requires careful coding, and pandas will not stop you from creating misleading plots, drawing incorrect conclusions, ignoring relevant data, including misleading data, or executing incorrect calculations.

In this tutorial session from PyCon Cleveland 2018, you’ll perform a variety of data science tasks on a handful of real-world datasets using pandas.

Recently, I delivered a presentation on “Data Science for the Curious” at the WeWork K Street location in Washington, DC. The goal was to help the largely non-technical audience of public policy professionals understand some of the core tenets of data science: its promises and its perils. In light of the recent Facebook revelations, this is more critical now than ever before.

Listen to the episode on the show page at DataDriven.tv.

Frank and Andy talked about doing a Deep Dive show in which they take a close look at a particular data science technology, term, or methodology. And now, they deliver!

In this very first Deep Dive, Frank and Andy discuss Data Science and Data Engineering: where they overlap, where they differ, and why so many C-level execs can’t seem to figure out the deltas.