ETL

Databricks

How Databricks Leverages Auto Loader to Ingest Millions of Files an Hour

Continuously and incrementally ingesting data as it arrives in cloud storage has become a common workflow in our customers’ ETL pipelines. However, managing this workflow is rife with challenges, such as scalable and efficient file discovery, schema inference and evolution, and fault tolerance with exactly-once guarantees. Auto Loader is a new Structured Streaming source in […]

Read More
Big Data Data Databricks

Empowering Zillow’s Developers with Self-Service ETL

Databricks shows how their tech empowers Zillow’s developers via self-service ETL. These tools abstract away the orchestration, deployment, and Apache Spark processing implementation from their respective users. In this talk, Zillow engineers discuss two internal platforms they created to address the specific needs of two distinct user groups: data analysts and data producers. Each platform […]

Read More
Azure Data

The Modern Data Warehouse in Azure – Data Processing

In this video, Chris Seferlis continues discussing the Modern Data Platform in Azure with Part 3: Data Processing. Tools discussed:
Azure Data Factory Data Flows – https://docs.microsoft.com/en-us/azure/data-factory/concepts-data-flow-overview
Azure Databricks – https://azure.microsoft.com/en-us/services/databricks/
Azure HDInsight – https://azure.microsoft.com/en-us/services/hdinsight/
Azure Marketplace – https://azuremarketplace.microsoft.com/en-us/marketplace/

Read More
Azure Big Data Databricks

How to Build a Cloud Data Platform with Databricks Part 2 – ETL Processing

Learn how to use Apache Spark and Delta Lake on Databricks to perform ETL processing, manage late-arriving data, and repair corrupted data. Companies look to support both business analytics and machine learning initiatives within their organizations, but often face challenges with complex operations, proprietary technologies, and unreliable data.

Read More
Azure Big Data

Introduction to the Modern Data Warehouse in Azure

Chris Seferlis will be publishing on the modern data warehouse in Azure. Here he starts with an overview of the stages of a data warehouse and the implications of ELT vs ETL as we move from sources, ingestion, storage, transformation, staging and presentation. Learn more: https://azure.microsoft.com/en-in/solutions/architecture/modern-data-warehouse/

Read More
Azure SQL Server

Azure Synapse Analytics – Next-gen Azure SQL Data Warehouse

Azure Synapse is a limitless analytics service that brings together enterprise data warehousing and Big Data analytics. It gives you the freedom to query data on your terms, using either serverless on-demand or provisioned resources—at scale. Azure Synapse brings these two worlds together with a unified experience to ingest, prepare, manage, and serve data for […]

Read More
Azure Data Data Science

Kappa vs Lambda Architecture

Chris Seferlis describes some key differences between the Kappa and Lambda Architectures, advantages and disadvantages of each, and why you might choose one over the other on the Azure platform.

Read More
Big Data Data Data Driven

Trey Johnson on ETL, Data, and Fixing Up Old Cars

In this episode of Data Driven, Andy and I talk to Trey Johnson about ETL, Data, and fixing up old cars. Also, I apologize for the delay in production for this episode. Press the play button below to listen here or visit the show page at DataDriven.tv.

Read More
Big Data Spark

Azure Databricks for Data Engineers and Data Developers

Data engineering is about 70% of any data pipeline today, and without the experience to implement a data engineering pipeline well, there is no value to be gained from your data. In this session from Microsoft Ignite we discuss the best practices and demonstrate how a data engineer can develop and orchestrate the big […]

Read More