Spark Streaming


Unboxing Spark Standalone Architecture

Big Data Engineering closely examines the Spark Standalone architecture. Apache Spark has a well-defined layered architecture in which all Spark components and layers are loosely coupled, and this architecture is further integrated with various extensions and libraries. The Apache Spark architecture is based on two main abstractions: the Resilient Distributed Dataset (RDD) and the Directed Acyclic Graph (DAG).
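To make the two abstractions a little more concrete, here is a minimal pure-Python sketch of the idea behind them. It is not Spark's API: `ToyRDD` and its methods are hypothetical names invented for illustration. Each transformation only records a new node pointing at its parent, which builds up a lineage graph (here a simple chain, the simplest case of a DAG), and nothing is computed until an action such as `collect` walks that graph.

```python
class ToyRDD:
    """Hypothetical toy stand-in for an RDD (not the real Spark class).

    Each instance stores a link to its parent and a deferred function,
    so a chain of instances forms a lineage graph.
    """

    def __init__(self, source=None, parent=None, op="source", fn=None):
        self.source = source  # input data, set only on the root node
        self.parent = parent  # upstream node in the lineage graph
        self.op = op          # label of the operation that produced this node
        self.fn = fn          # deferred work to apply to the parent's rows

    def map(self, f):
        # Transformation: record the step, compute nothing yet.
        return ToyRDD(parent=self, op="map",
                      fn=lambda rows: [f(r) for r in rows])

    def filter(self, pred):
        # Transformation: also lazy.
        return ToyRDD(parent=self, op="filter",
                      fn=lambda rows: [r for r in rows if pred(r)])

    def lineage(self):
        # Walk parent links back to the root and list operations in order.
        node, ops = self, []
        while node is not None:
            ops.append(node.op)
            node = node.parent
        return list(reversed(ops))

    def collect(self):
        # Action: only now is the recorded chain actually evaluated.
        if self.parent is None:
            return list(self.source)
        return self.fn(self.parent.collect())


numbers = ToyRDD(source=range(1, 6))
evens_squared = numbers.map(lambda x: x * x).filter(lambda x: x % 2 == 0)

print(evens_squared.lineage())  # ['source', 'map', 'filter']
print(evens_squared.collect())  # [4, 16]
```

Because every node remembers how it was derived, a lost result could in principle be recomputed from its parents, which is the intuition behind the "resilient" part of the RDD name. Real Spark additionally partitions the data across executors and splits the graph into stages, neither of which this sketch attempts.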

Big Data

Kafka + Spark Streaming + Hive Example

Davis Busteed walks us through building a proof of concept for Spark Streaming from a Kafka Source to Hive. Check out the README and resource files at 

