Spark

Azure Synapse

Moving Away from Data Flows and Simplifying Data Pipelines in Azure Synapse Analytics

In this session, you will learn how to use Spark SQL and Python to build notebooks that are called from integration pipelines, giving you an efficient, scalable, and maintainable solution for data migration and transfer tasks.

AI Data Science Red Hat

Open Data Hub – the origin story (part 2)

In part 2 of the Open Data Hub origin story, fellow Red Hatters Steven Huels and Sherard Griffin describe some of the technical challenges and growth of the Open Data Hub AI meta-project as it evolved from Elasticsearch to multiple data discovery technologies. The evolution to a commercial service offering, Red Hat OpenShift Data Science, is also […]

AI Data Red Hat

Open Data Hub – the origin story (part 1)

Fellow Red Hatters Steven Huels and Sherard Griffin describe how the Open Data Hub meta-project grew from solving practical CI/CD build challenges to where it is today – providing an integrated blueprint stitching together over 20 open source AI tools for running large and distributed AI workloads on OpenShift. Part 1 of a 2 part […]

Databricks

Architecting for Data Quality in the Lakehouse with Delta Lake and PySpark

From null values and duplicate rows to modeling errors and schema changes, data can break for millions of reasons. To combat this, teams are increasingly adopting best practices from DevOps and software engineering to identify, resolve, and even prevent this “data downtime” from happening in the first place. Join Prateek Chawla and Ryan Kearns as […]

Big Data

Advancing Spark – Engineering behind Featurestore

In this video, Simon takes the same example notebook and looks at applying some engineering best practices, as well as looking at the delta table that sits underneath the featurestore. That way, we can understand the impact of these commands and properly get to grips with using Featurestore in a production environment. In a recent […]

Azure Synapse

Data Science and Predictive Analytics with Azure Synapse

Discover new Azure Synapse features to integrate predictive analytics capabilities into your organization, using both code-free and code-first options for AI/ML.

Natural Language Processing Spark

Jeeves Grows Up: An AI Chatbot for Performance and Quality

Jeeves is a chatbot created to simplify data operations management for enterprise Spark clusters. Powered by advanced AI algorithms and an intuitive conversational interface, it provides answers that get users in and out of problems quickly. Instead of being stuck at screens displaying logs and metrics, users can now have a more refreshing experience via a two-way […]

Databricks

Funnel Analysis with Apache Spark and Druid

Every day, millions of advertising campaigns are happening around the world. For campaign owners, measuring ongoing campaign effectiveness (e.g., “how many distinct users saw my online ad vs. how many distinct users saw my online ad, clicked it, and purchased my product?”) is very important. However, this task (often referred to as “funnel analysis”) […]

Databricks Spark

Advancing Spark – Runtime 8.2 and Advanced Schema Evolution

Another week, another new Databricks Runtime. Runtime 8.2 brings some nice functionality around operational metrics, but the big star of the week is the new Schema Inference & Evolution functionality available through Autoloader. In this video, Simon takes a look through simple schema inference, applying schema hints and watching the schema metadata evolve through the […]
