How to Scale Up Your pandas Workflows with Modin

pandas is one of the most commonly used data science libraries in Python, with a convenient set of APIs to help data scientists prepare, analyze, and explore their data. However, despite its widespread adoption, pandas suffers from severe memory and performance issues on moderately large datasets.

This presentation focuses on Modin, a fast, scalable drop-in replacement for pandas. With a change to just a single line of code, Modin speeds up pandas workflows on a laptop or in a cluster. Modin has over 6.6k GitHub stars and 1.7 million downloads, and is deployed at many data-centric organizations to accelerate dataframe workflows.
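As a rough illustration of the single-line change, the sketch below swaps the usual pandas import for Modin's `modin.pandas` module and leaves the rest of the code untouched. The toy data and column names are made up for this example, and the fallback to plain pandas is an addition so the snippet still runs where Modin is not installed; it is not something the talk prescribes.

```python
# The "single line" change: import Modin's pandas-compatible module
# in place of pandas itself.
# Assumption: the fallback below is only here so this sketch runs
# without Modin installed.
try:
    import modin.pandas as pd  # instead of: import pandas as pd
except ImportError:
    import pandas as pd

# Hypothetical toy data; real workloads would usually load a large file.
df = pd.DataFrame(
    {
        "city": ["A", "B", "A", "B"],
        "sales": [10, 20, 30, 40],
    }
)

# The remaining code is ordinary pandas-style code.
totals = df.groupby("city")["sales"].sum()
result = {city: int(value) for city, value in totals.items()}
print(result)
```

On a dataset this small, Modin is unlikely to be faster than pandas; any benefit would be expected on larger data, where the work can be spread across cores or a cluster.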

For more details, see: https://github.com/modin-project/modin

ABOUT THE SPEAKER
Devin Petersohn is the lead developer of Modin and the co-founder and CTO of Ponder. Devin recently completed his Ph.D. at UC Berkeley's RISE Lab, where he did research on distributed systems for data science. As part of this work, he created Modin, a system for enabling scalable interactive data science.

Frank

#DataScientist, #DataEngineer, Blogger, Vlogger, Podcaster at http://DataDriven.tv. Back @Microsoft to help customers leverage #AI. #武當派 fan. I blog to help you become a better data scientist/ML engineer. Opinions are mine. All mine.