Here’s an interesting talk from PyCon Germany by Joshua Görner, a Data Scientist at BMW.

From the video description:

Interactive notebooks like Jupyter have become more and more popular in the recent past and build the core of many data scientists’ workplaces. Being accessed via web browser, they allow scientists to easily structure their work by combining code and documentation. Yet notebooks often lead to isolated and disposable analysis artifacts. Keeping the computation inside those notebooks does not allow for convenient concurrent model training, model exposure or scheduled model retraining. Those issues can be addressed by taking advantage of recent developments in the discipline of software engineering. Over the past years, containerization has become the technology of choice for crafting and deploying applications. Building a data science platform that allows for easy access (via notebooks), flexibility and reproducibility (via containerization) combines the best of both worlds and addresses data scientists’ hidden needs.

Data preparation is generally regarded as taking 80% of the resources for an advanced analytics project, but is that so? What’s the difference between prepping and modelling data?

What is featurization and how does it fit in? All these questions and more will be answered in the show, and we will even shock and wow Seth with some demos.

Python has quickly risen to be the top language for AI and Data Science. A few years ago, the question was “Should I learn Python?” Now it’s “How can I learn Python?”

Thankfully, here’s a great tutorial for beginners that’s thorough and free on YouTube.

The pandas library is a powerful tool for multiple phases of the data science workflow, including data cleaning, visualization, and exploratory data analysis. However, proper data science requires careful coding, and pandas will not stop you from creating misleading plots, drawing incorrect conclusions, ignoring relevant data, including misleading data, or executing incorrect calculations.

In this tutorial session from PyCon Cleveland 2018, you’ll perform a variety of data science tasks on a handful of real-world datasets using pandas.
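
To make the point about incorrect calculations concrete, here is a minimal sketch of one way pandas can quietly give a misleading number. It is not taken from the tutorial; the `store` and `sales` columns and their values are made up for illustration. By default, `Series.mean()` skips missing values, so an average can end up covering fewer rows than you expect.

```python
import pandas as pd

# Hypothetical data: the sales figure for store "c" is missing.
df = pd.DataFrame({
    "store": ["a", "b", "c", "d"],
    "sales": [10.0, 20.0, None, 30.0],
})

# mean() skips missing values by default (skipna=True), so this
# averages three stores, not four.
mean_default = df["sales"].mean()

# Treating the missing value as zero gives a different answer.
mean_zero_filled = df["sales"].fillna(0).mean()

# Counting the gaps first makes the choice explicit.
n_missing = int(df["sales"].isna().sum())

print(mean_default, mean_zero_filled, n_missing)
```

Neither average is the right one in general. Which to use depends on why the value is missing, and that is exactly the kind of decision pandas leaves to you.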