Spark

Azure Synapse

Data Science and Predictive Analytics with Azure Synapse

Discover new Azure Synapse features to integrate predictive analytics capabilities into your organization—using both code-free and code-first options for AI/ML.

Natural Language Processing Spark

Jeeves Grows Up: An AI Chatbot for Performance and Quality

Jeeves is a chatbot created to simplify data operations management for enterprise Spark clusters. Powered by advanced AI algorithms and an intuitive conversational interface, it provides answers that get users in and out of problems quickly. Instead of being glued to screens displaying logs and metrics, users can now have a more refreshing experience via a two-way […]

Databricks

Funnel Analysis with Apache Spark and Druid

Every day, millions of advertising campaigns run around the world. For campaign owners, measuring ongoing campaign effectiveness (e.g., “how many distinct users saw my online ad vs. how many distinct users saw my online ad, clicked it, and purchased my product?”) is essential. However, this task (often referred to as “funnel analysis”) […]

Databricks Spark

Advancing Spark – Runtime 8.2 and Advanced Schema Evolution

Another week, another new Databricks Runtime. Runtime 8.2 brings some nice functionality around operational metrics, but the big star of the week is the new Schema Inference & Evolution functionality available through Autoloader. In this video, Simon takes a look through simple schema inference, applying schema hints and watching the schema metadata evolve through the […]

Databricks

Configuring Azure Databricks Spot VM Clusters

Azure Spot VMs are incredibly cheap CPUs that come with the risk of being evicted if enough demand for full-price CPUs occurs in the region. Luckily, Spark is a resilient distributed system that can easily handle replacing nodes, so we’re left with a very cost-effective approach to provisioning lower-priority workloads! In this video, […]

Azure Databricks

Azure Databricks News – March 2021

With all the things coming out in Azure Databricks recently, Advancing Analytics is starting a monthly roundup of the platform updates and any runtime additions. March 2021 has seen a whole load of new features, from the GA of Runtime 8.0 and 8.1 to Spot VM bidding, workspace limit lifting and more. Check out this month’s […]

Databricks

Exploring Databricks Runtime 7.6 and 8.0

Advancing Analytics takes a closer look at the two new runtimes available for Databricks. We have not just one but two new Databricks Runtimes currently in preview – 7.6 brings several new features focussing on making Autoloader more flexible and improving the performance of Optimize and Structured Streaming. Runtime 8.0 is a much wider change, seeing the […]

Big Data

Manual Lineage with the Purview PyApacheAtlas API

Simon from Advancing Analytics explores the Atlas API that’s exposed under the covers of the new Azure Purview data governance offering. There are a couple of different libraries available currently, so don’t be surprised if we see a lot of shifts & changes as the preview matures! In this video, Simon takes a look at […]

Big Data

Kafka + Spark Streaming + Hive Example

Davis Busteed walks us through building a proof of concept for Spark Streaming from a Kafka source to Hive. Check out the README and resource files at https://github.com/dbusteed/kafka-spark-streaming-example
