Simon Whiteley

Databricks

Advancing Spark – Data + AI Summit 2022 Day 1 Recap

In this video, Simon steps through the Data + AI Summit 2022 Day 1 Keynote, available to watch on demand via the DAIS 2022 website. He discusses Spark Connect, Project Lightspeed, Delta 2.0, Databricks SQL Serverless, Databricks Marketplace and Databricks Cleanrooms! Don’t know what any of that means? Watch the video!

Read More
Big Data

Advancing Spark – Engineering behind Featurestore

In this video, Simon takes the same example notebook and looks at applying some engineering best practices, as well as looking at the Delta table that sits underneath the Featurestore. That way, we can understand the impact of these commands and properly get to grips with using Featurestore in a production environment. In a recent […]

Read More
Databricks

Databricks Delta Change Feed

Advancing Analytics’ Simon Whiteley explores how the Databricks Delta Change Feed enables CDC, or Change Data Capture, in the Spark environment. Keeping track of changed records can be a hugely inefficient exercise, comparing masses of records to determine which ones have been changed by upstream events. With the Delta table change feed, we can keep an efficient […]

Read More
Databricks Spark

Advancing Spark – Runtime 8.2 and Advanced Schema Evolution

Another week, another new Databricks Runtime. Runtime 8.2 brings some nice functionality around operational metrics, but the big star of the week is the new Schema Inference & Evolution functionality available through Autoloader. In this video, Simon takes a look through simple schema inference, applying schema hints and watching the schema metadata evolve through the […]

Read More
Databricks

Advancing Spark – Bloom Filter Indexes in Databricks Delta

Data Lakes are notoriously bad at single record lookups, the kind of query where you are looking for a specific ID among millions of records. Wouldn’t it be great if we could just pop an index over the top to speed this type of operation up? Turns out we can! In this video Simon […]

Read More
Databricks

Configuring Azure Databricks Spot VM Clusters

Azure Spot VMs are incredibly cheap CPUs that come with the risk of being evicted if demand for full-price CPUs in the region gets high enough. Luckily, Spark is a resilient distributed system that can easily handle replacing nodes, and so we’re left with a very cost-effective approach to provisioning lower-priority workloads! In this video, […]

Read More
Azure Databricks

Azure Databricks News – March 2021

With all the things coming out in Azure Databricks recently, Advancing Analytics is starting a monthly roundup of the platform updates and any runtime additions. March 2021 has seen a whole load of new features, from the GA of Runtime 8.0 AND 8.1, Spot VM bidding, workspace limit lifting and more. Check out this month’s […]

Read More
Azure PowerBI

Azure Purview – Preview Power BI Lineage

One of the most compelling visuals we’ve seen from the marketing buzz around Azure Purview is that end-to-end picture of source data moving through data processing, all the way into a Power BI dashboard. In this video, Simon from Advancing Analytics takes a dive into associating Power BI with your Azure Purview account, looking at […]

Read More
AI Azure Data Microsoft

Data & AI Updates at Microsoft Ignite 2021

Advancing Analytics highlights the Data and AI updates from the first day of Ignite 2021.

Read More
Azure Governance

How Much Does Azure Purview Cost?

Advancing Analytics answers the question that has come up a few times in my day-to-day work: it looks great, but how much is this going to cost? There has been a whole lot of excitement around Azure Purview, the new governance and classification tool in Azure, but also a few worried noises about […]

Read More