Hadoop

Big Data Data IoT

Kafka Streams 101: Getting Started

To understand Kafka Streams, you have to begin with Apache Kafka®, a distributed, scalable, elastic, and fault-tolerant event streaming platform. The storage nodes in Kafka, brokers, are just instances of the Kafka storage layer process running on your laptop or server. At the heart of each broker is a log, an append-only file that holds […]

Big Data Databricks

Clean Your Data Swamp by Migrating Off of Hadoop

In this session, learn how to quickly supplement your on-premises Hadoop environment with a simple, open, and collaborative cloud architecture that enables you to generate greater value with scaled application of analytics and AI on all your data. You will also learn five critical steps for a successful migration to the Databricks Lakehouse Platform along […]

Spark

Unboxing Spark Standalone Architecture

Big Data Engineering closely examines Spark Standalone Architecture. Apache Spark has a well-defined layered architecture in which all the Spark components and layers are loosely coupled. This architecture is further integrated with various extensions and libraries. Apache Spark Architecture is based on two main abstractions: the Resilient Distributed Dataset (RDD) and the Directed Acyclic Graph (DAG).

Big Data

Kafka + Spark Streaming + Hive Example

Davis Busteed walks us through building a proof of concept for Spark Streaming from a Kafka Source to Hive. Check out the README and resource files at https://github.com/dbusteed/kafka-spark-streaming-example 

Azure Containers FinTech

Banking in Latin America: Planning for Hybrid Cloud | Part 2

In this second part of the episode, Fernando Mejia walks through everything you need to plan for in a Hybrid Cloud architecture for Azure Kubernetes Service. This includes IP address concerns from on-premises to Azure, hub-and-spoke topology, and the different options you have in Azure Kubernetes Service. Watch Part 1. Learn more: https://azure.microsoft.com/en-us/overview/kubernetes-on-azure

Azure Databricks

Introduction to Azure Databricks

Ayman El-Ghazali recently presented this Introduction to Databricks from the perspective of a SQL DBA at the NoVA SQL Users Group. Code is available at https://github.com/thesqlpro/blog. Come learn about the following topics: basics of how Spark works; basics of how Databricks works (cluster setup, basic […]

Azure Data

The Modern Data Warehouse in Azure – Data Processing

In this video, Chris Seferlis continues discussing the Modern Data Platform in Azure with Part 3: Data Processing. Tools discussed:
Azure Data Factory Data Flows – https://docs.microsoft.com/en-us/azure/data-factory/concepts-data-flow-overview
Azure Databricks – https://azure.microsoft.com/en-us/services/databricks/
Azure HDInsight – https://azure.microsoft.com/en-us/services/hdinsight/
Azure Marketplace – https://azuremarketplace.microsoft.com/en-us/marketplace/

Azure Data Data Warehouse

Modern Data Warehousing Part 1: Azure Data Ingestion Options

In this video, Chris Seferlis discusses stage one of the Modern Data Warehouse Process: Data Ingestion. Related links:
Azure Data Factory – https://azure.microsoft.com/en-us/services/data-factory/
Azure Databricks – https://azure.microsoft.com/en-us/services/databricks/
Azure HDInsight – https://azure.microsoft.com/en-us/services/hdinsight/
Azure Synapse – https://azure.microsoft.com/en-us/services/synapse-analytics/
Azure Data Box – https://azure.microsoft.com/en-us/services/databox/
Event Hubs – https://azure.microsoft.com/en-us/services/event-hubs/
Kafka on HDInsight – https://docs.microsoft.com/en-us/azure/hdinsight/kafka/apache-kafka-introduction
IoT Hub – https://azure.microsoft.com/en-us/services/iot-hub/

Azure Big Data

Big Data Cluster High Availability

In this video, learn about the high availability options you have for the mission-critical services running within SQL Server Big Data Clusters. Find out more here:
https://docs.microsoft.com/en-us/sql/big-data-cluster/deployment-high-availability?view=sql-server-ver15&WT.mc_id=dataexposed-c9-niner
https://docs.microsoft.com/en-us/sql/big-data-cluster/deployment-high-availability-hdfs-spark?view=sql-server-ver15&WT.mc_id=dataexposed-c9-niner

Azure Big Data

Introduction to Azure Data Lake Storage Gen 2

Data Lake Storage Gen 2 is the best storage solution for big data analytics in Azure. With its Hadoop-compatible access, it is a perfect fit for existing platforms like Databricks, Cloudera, Hortonworks, Hadoop, HDInsight, and many more. Take advantage of both blob storage and data lake in one service! In this video, Azure 4 […]
