Data Engineering

Data Python

Data Engineering 101: How to Build a Real-World Dataset End-to-End: Part II

This video is from Deep Data Science. Data engineering is a fundamental skill that many data scientists neglect. In this series I’m going to show step by step how I construct a real-world dataset for machine learning (e.g. FlowDB 2.0) from a variety of sources. I will also describe the differences between constructing this dataset and constructing data lakes […]

AI Big Data Data

The Importance of Data Pipelines in AI and Data Science: An Overview

Data is the lifeblood of Artificial Intelligence (AI) and Data Science. It drives insights, powers decisions, and propels innovations. To unlock its full potential, data must be correctly handled, and this is where data pipelines come into play. What are Data Pipelines? Data pipelines are a series of data processing steps where data is ingested […]

AI

Data Science Mastery with ChatGPT Prompt Engineering: Learn Faster!

Unlock the power of ChatGPT to master data science & engineering. In this video from DecisionForest, learn how to turbocharge your skills with clear prompts, epic data cleaning, visualisations, and machine learning algorithms without any prior data science knowledge.

Databricks

Data Ingestion using Auto Loader

In this video from Databricks, you will learn how to ingest your data using Auto Loader. Ingestion with Auto Loader lets you incrementally process new files as they land in cloud object storage while remaining cost-effective. It can ingest JSON, CSV, PARQUET, and other file formats. Access more […]

Big Data Data

How to Use Behavioral Data To Gain Deep Insights Into Your Customer Base

The opportunities for organizations to use behavioral data to gain deep insights into their customer base have increased exponentially in the last few years. In this “Data Cloud Now” interview, host Ryan Green chats with Yali Sassoon, Co-founder and Chief Strategy Officer of Snowplow Analytics, about what those new opportunities are and the role that […]

Databricks

How to Schedule a Job and Automate a Workload in Databricks

In this Databricks tutorial, learn how to create, run, and schedule Jobs.

Big Data Data Python

How to Scale up your Pandas workflows with Modin

pandas is one of the most commonly used data science libraries in Python, with a convenient set of APIs to help data scientists prepare, analyze, and explore their data. However, despite its widespread adoption, pandas suffers from severe memory and performance issues on moderately large datasets. This presentation focuses on Modin, a fast, scalable drop-in […]
