Spark

Databricks

Architecting for Data Quality in the Lakehouse with Delta Lake and PySpark

From null values and duplicate rows to modeling errors and schema changes, data can break for millions of reasons. To combat this, teams are increasingly adopting best practices from DevOps and software engineering to identify, resolve, and even prevent this “data downtime” from happening in the first place. Join Prateek Chawla and Ryan Kearns as […]

Big Data

Advancing Spark – Engineering behind Featurestore

In this video, Simon takes the same example notebook and looks at applying some engineering best practices, as well as looking at the delta table that sits underneath the featurestore. That way, we can understand the impact of these commands and properly get to grips with using Featurestore in a production environment. In a recent […]

Azure Synapse

Data Science and Predictive Analytics with Azure Synapse

Discover new Azure Synapse features to integrate predictive analytics capabilities into your organization—using both code-free and code-first options for AI/ML.

Natural Language Processing Spark

Jeeves Grows Up: An AI Chatbot for Performance and Quality

Jeeves is a chatbot created to simplify data operations management for enterprise Spark clusters. Powered by advanced AI algorithms and an intuitive conversational interface, it provides answers that get users in and out of problems quickly. Instead of being stuck at screens displaying logs and metrics, users can now have a more refreshing experience via a two-way […]

Databricks

Funnel Analysis with Apache Spark and Druid

Every day, millions of advertising campaigns are happening around the world. For campaign owners, measuring ongoing campaign effectiveness (e.g., “how many distinct users saw my online ad vs. how many distinct users saw my online ad, clicked it, and purchased my product?”) is super important. However, this task (often referred to as “funnel analysis”) […]

Databricks Spark

Advancing Spark – Runtime 8.2 and Advanced Schema Evolution

Another week, another new Databricks Runtime. Runtime 8.2 brings some nice functionality around operational metrics, but the big star of the week is the new Schema Inference & Evolution functionality available through Autoloader. In this video, Simon takes a look through simple schema inference, applying schema hints and watching the schema metadata evolve through the […]

Databricks

Configuring Azure Databricks Spot VM Clusters

Azure Spot VMs are incredibly cheap CPUs that come with the risk of being evicted if enough demand for full-price CPUs occurs in the region. Luckily, Spark is a resilient distributed system that can easily handle replacing nodes, and so we’re left with a very cost-effective approach to provisioning lower-priority workloads! In this video, […]

Azure Databricks

Azure Databricks News – March 2021

With all the things coming out in Azure Databricks recently, Advancing Analytics is starting a monthly roundup of the platform updates and any runtime additions. March 2021 has seen a whole load of new features, from the GA of Runtime 8.0 AND 8.1, Spot VM bidding, workspace limit lifting and more. Check out this month’s […]

Databricks

Exploring Databricks Runtime 7.6 and 8.0

Advancing Analytics takes a closer look at the two new runtimes available for Databricks. We have not just one but two new Databricks Runtimes currently in preview – 7.6 brings several new features focussing on making Autoloader more flexible, improving performance of Optimize and Structured Streaming. Runtime 8.0 is a much wider change, seeing the […]
