Spark

Comprehensive View on Intervals in Apache Spark 3.2

Here’s an overview of intervals in Apache Spark before version 3.2, and of the changes coming in future releases.

Natural Language Processing Spark

Jeeves Grows Up: An AI Chatbot for Performance and Quality

Jeeves is a chatbot created to simplify data operations management for enterprise Spark clusters. It is powered by advanced AI algorithms and an intuitive conversational interface that provides answers to get users in and out of problems quickly. Instead of being stuck at screens displaying logs and metrics, users can now have a more refreshing experience via a two-way […]

Privacy Spark

Scaling Privacy in a Spark Ecosystem

Privacy has become one of the most important topics in data today. It is about more than how we ingest and consume data; it is also about how you protect your customers’ rights while balancing business needs. In this video, Privacera CTO Don Bosco Durai joins Northwestern Mutual to detail an important […]

Big Data Microsoft Spark

What’s New in .NET for Apache Spark v1.1.1?

.NET for Apache Spark empowers .NET developers to participate in the world of big data analytics. In this episode, Jeremy chats with Michael Rys about some of the new features and capabilities available in this release. Related links: .NET for Apache Spark™, the .NET for Apache Spark™ tutorial, and the .NET for Apache Spark™ documentation.

Databricks Spark

Advancing Spark – Runtime 8.2 and Advanced Schema Evolution

Another week, another new Databricks Runtime. Runtime 8.2 brings some nice functionality around operational metrics, but the big star of the week is the new Schema Inference & Evolution functionality available through Autoloader. In this video, Simon takes a look through simple schema inference, applying schema hints and watching the schema metadata evolve through the […]

Spark

Unboxing Spark Standalone Architecture

Big Data Engineering closely examines the Spark Standalone architecture. Apache Spark has a well-defined layered architecture in which all the Spark components and layers are loosely coupled. This architecture is further integrated with various extensions and libraries. The Apache Spark architecture is based on two main abstractions: the Resilient Distributed Dataset (RDD) and the Directed Acyclic Graph (DAG).

Containers Spark

Apache Spark Streaming in K8s with ArgoCD & Spark Operator

Here’s an interesting talk by Albert Franziu Cros on a CI/CD setup composed of a Spark Streaming job in K8s consuming from Kafka. Over the last year, we have been moving from a batch processing setup with Airflow on EC2 instances to a powerful and scalable setup using Airflow and Spark in K8s. The increasing need […]

Containers Spark

Real-Time Health Score Application using Apache Spark on Kubernetes

This video on the Databricks YouTube channel presents a web application that calculates real-time health scores rapidly using Spark on Kubernetes. A health score represents a machine’s lifetime, and it is commonly used as a benchmark for deciding whether to replace the machine with a new one for high productivity […]
