The pandas library is a powerful tool for multiple phases of the data science workflow, including data cleaning, visualization, and exploratory data analysis. However, proper data science requires careful coding, and pandas will not stop you from creating misleading plots, drawing incorrect conclusions, ignoring relevant data, including misleading data, or executing incorrect calculations.

In this tutorial session from PyCon Cleveland 2018, you’ll perform a variety of data science tasks on a handful of real-world datasets using pandas.
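As a small illustration of how pandas can quietly let an incomplete calculation through, here is a minimal sketch. The column names and values are hypothetical and are not taken from the tutorial's datasets. It relies on `Series.mean()` skipping missing values by default, so an average can look trustworthy even when half the rows are empty.

```python
import pandas as pd

# Hypothetical survey data: two of four respondents did not report an income.
df = pd.DataFrame({
    "city": ["A", "A", "B", "B"],
    "income": [40000.0, float("nan"), 55000.0, float("nan")],
})

# mean() skips NaN by default (skipna=True), so only 2 of 4 rows are averaged.
naive_mean = df["income"].mean()

# Share of missing values: worth checking before trusting the average.
missing_share = df["income"].isna().mean()

# With skipna=False the missing values propagate and the result is NaN.
strict_mean = df["income"].mean(skipna=False)

print(naive_mean, missing_share, strict_mean)
```

Checking the share of missing values alongside the summary statistic is one simple way to notice that relevant data is being ignored.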

Recently, I delivered a presentation on “Data Science for the Curious” at the WeWork K Street location in Washington, DC. The goal was to help the largely non-technical audience of public policy professionals understand some of the core tenets of data science: its promises and its perils. In light of the recent Facebook revelations, this is more critical now than ever before.

Frank and Andy talked about doing a Deep Dive show where they take a deep look into a particular data science technology, term, or methodology. And now, they deliver!

In this very first Deep Dive, Frank and Andy discuss the differences between Data Science and Data Engineering, where they overlap, where they differ, and why so many C-level execs can’t seem to figure out the deltas.

This video is an introduction to R programming in which I provide a tutorial on some statistical analysis (specifically using the t-test and linear regression).

It also demonstrates how to use dplyr and ggplot for data manipulation and data visualization. It’s really R programming for beginners, and it is filled with graphics, quantitative analysis, and some explanations of how statistics work.

If you’re a statistician, someone who is into data science, or perhaps someone learning bio-stats and thinking about using R for quantitative analysis, then you’ll find this video useful. Importantly, R is free.

With the rapid emergence of digital devices, an unstoppable, invisible force is changing human lives in incredible ways. That force is data. We generated more data in 2017 than in all the previous 5,000 years of human history.

The massive gathering and analysis of data in real time is allowing us to address some of humanity’s biggest challenges, but as Edward Snowden and the release of NSA documents have shown, the accessibility of all this data comes at a steep price.

This documentary captures the promise and peril of this extraordinary knowledge revolution.