Spark SQL Beyond the Official Documentation

This video with David Vrba focuses on some internal features of Spark SQL that are not well described in the official documentation, with a strong emphasis...
December 2020 Databricks Customer Newsletter

Need a quick video to keep up on all the recent happenings with Databricks? Look no further than this December 2020 newsletter the team put...
Michael Armbrust Demystifies Delta Lakes

On the latest episode of Data Brew, Denny Lee talks to Michael Armbrust about Delta Lake. Delta Lake is an open source storage layer that...
How to Implement a GAN on Databricks

Here’s a brilliant Lightning talk from Data + AI Summit 2020 by Dr. Evan Eames. We implemented a pix2pix Generative Adversarial Network (GAN) on Databricks...
Data Quality Testing in the Medallion Architecture with PyTest and PySpark

Here’s a great Lightning talk from Data + AI Summit 2020 by Carter Kilgour on “Why data quality is especially important in the medallion architecture,...
Delta Lakehouse Data Profiler and SQL Analytics Demo

Coming from a data warehousing and BI background, Franco Patano wanted to have a catalogue of the Lakehouse, including schema and profiling statistics. He created...
Comparing Azure Synapse, Snowflake and Databricks for Common Data Workloads

In this video, Chris Seferlis describes some of the most common data workloads that are being deployed on Azure and which of the 3 major...
BI to Lakehouse Round 3: Community Questions Answered

Considering shifting gears into Spark Data Engineering? We have another fun session with Simon Whiteley and Denny Lee as they answer your questions from their...
How Do Apache Spark 3.0 and Delta Lake Enhance Reliability

Apache Spark has become the de facto open-source standard for big data processing for its ease of use and performance. The open-source Delta Lake project improves...
Databricks Launches New Web Series on Lakehouses

Databricks just launched a new web series, Data Brew, and this is the first episode. For this first season, we will be focusing on lakehouses...