Simon Whiteley

Databricks

Databricks Delta Change Feed

Advancing Analytics’ Simon Whiteley explores how the Databricks Change Feed enables CDC, or Change Data Capture, in the Spark environment. Keeping track of changed records can be a hugely inefficient exercise, comparing masses of records to determine which ones have been changed by upstream events. With the Delta table change feed, we can keep an efficient […]

Databricks Spark

Advancing Spark – Runtime 8.2 and Advanced Schema Evolution

Another week, another new Databricks Runtime. Runtime 8.2 brings some nice functionality around operational metrics, but the big star of the week is the new Schema Inference & Evolution functionality available through Autoloader. In this video, Simon takes a look through simple schema inference, applying schema hints and watching the schema metadata evolve through the […]

Databricks

Advancing Spark – Bloom Filter Indexes in Databricks Delta

Data Lakes are notoriously bad at single record lookups, the kind of query where you are looking for a specific ID in amongst millions of records. Wouldn’t it be great if we could just pop an index over the top to speed this type of operation up? Turns out we can! In this video Simon […]

Databricks

Configuring Azure Databricks Spot VM Clusters

Azure Spot VMs are incredibly cheap VMs that come with the risk of being evicted if enough demand for full-price capacity occurs in the region. Luckily, Spark is a resilient distributed system that can easily handle replacing nodes, and so we’re left with a very cost-effective approach to provisioning lower-priority workloads! In this video, […]

Azure Databricks

Azure Databricks News – March 2021

With all the things coming out in Azure Databricks recently, Advancing Analytics is starting a monthly roundup of the platform updates and any runtime additions. March 2021 has seen a whole load of new features, from the GA of Runtime 8.0 AND 8.1, Spot VM bidding, workspace limit lifting and more. Check out this month’s […]

Azure PowerBI

Azure Purview – Preview Power BI Lineage

One of the most compelling visuals we’ve seen from the marketing buzz around Azure Purview is that end-to-end picture of source data moving through data processing, all the way into a Power BI dashboard. In this video, Simon from Advancing Analytics takes a dive into associating Power BI with your Azure Purview account, looking at […]

AI Azure Data Microsoft

Data & AI Updates at Microsoft Ignite 2021

Advancing Analytics highlights the Data and AI updates from the first day of Ignite 2021.

Azure Governance

How Much Does Azure Purview Cost?

Advancing Analytics answers the question that has come up a few times in my day-to-day work: it looks great, but how much is this going to cost? There has been a whole lot of excitement around Azure Purview, the new governance and classification tool in Azure, but also a few worried noises about […]

Databricks

Exploring Databricks Runtime 7.6 and 8.0

Advancing Analytics takes a closer look at the two new runtimes available for Databricks. We have not just one but two new Databricks Runtimes currently in preview – 7.6 brings several new features focussing on making Autoloader more flexible, improving performance of Optimize and Structured Streaming. Runtime 8.0 is a much wider change, seeing the […]

Big Data

Manual Lineage with the Purview PyApacheAtlas API

Simon from Advancing Analytics explores the Atlas API that’s exposed under the covers of the new Azure Purview data governance offering. There are a couple of different libraries available currently, so don’t be surprised if we see a lot of shifts & changes as the preview matures! In this video, Simon takes a look at […]
