In this episode of Azure Friday, Thomas Alex discusses how Microsoft uses Apache Kafka for HDInsight to power Siphon, a data ingestion service for internal use.

Apache Kafka for HDInsight is an enterprise-grade, open-source, streaming ingestion service. Microsoft created Siphon as a highly available and reliable service to ingest massive amounts of data for processing in near real time. Siphon handles ingestion of over a trillion events per day across multiple business-critical scenarios at Microsoft. In this episode, learn how Siphon uses Apache Kafka for HDInsight as its scalable pub/sub message queue.

 

In this video, Murali Krishnaprasad discusses Interactive Query (also called Hive LLAP, which stands for Low Latency Analytical Processing or Live Long and Process), an Azure HDInsight cluster type. Interactive Query supports in-memory caching, which makes Hive queries fast and interactive. See how to use HDInsight Interactive Query to analyze extremely large datasets (~100 TB) in common file formats such as ORC and CSV, using common BI/SQL tools including Zeppelin notebooks and VS Code.

In this video, Katherine Kampf, a PM on the Azure Big Data team, talks about the newly introduced ML Services in Azure HDInsight.

ML Services bridges Microsoft innovations and contributions from the open-source community (R, Python, and AI toolkits), all on top of a single enterprise-grade platform. Any R or Python open-source machine learning package can work side by side with any proprietary innovation from Microsoft.

ML Services includes a highly scalable, distributed set of algorithms, such as those in RevoScaleR, revoscalepy, and microsoftML, that can work on data sizes larger than physical memory and run on a wide variety of platforms in a distributed manner.

Spark is gaining momentum in the big data space. Watch this video for a demonstration of how you can use your favorite developer tools to debug Spark applications.

Product info: azure.microsoft.com/en-us/services/hdinsight/apache-spark/
Learn more: docs.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-load-data-run-query
Documentation: docs.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-intellij-tool-debug-remotely-through-ssh

Raghav Mohan joins Scott Hanselman to talk about Apache Kafka on HDInsight. HDInsight added the open-source distributed streaming platform last year to complete a scalable, big data streaming scenario on Azure.

Kafka is capable of processing millions of events per second and petabytes of data per day to power scenarios such as Toyota’s connected car, Office 365’s clickstream analytics, and fraud detection for large banks.

Find out how to deploy managed, cost-effective Kafka clusters on Azure HDInsight, backed by a 99.9% SLA, in just four clicks or with pre-created ARM templates.
