Christian Wade joins Scott Hanselman to show you how to unlock petabyte-scale datasets in Azure in a way that was not previously possible. Learn how to use the aggregations feature in Power BI to enable interactive analysis over big data.

In this episode of Azure Friday, Thomas Alex discusses how Microsoft uses Apache Kafka for HDInsight to power Siphon, a data ingestion service for internal use.

Apache Kafka for HDInsight is an enterprise-grade, open-source, streaming ingestion service. Microsoft created Siphon as a highly available and reliable service to ingest massive amounts of data for processing in near real time. Siphon handles ingestion of over a trillion events per day across multiple business-critical scenarios at Microsoft. In this episode, learn how Siphon uses Apache Kafka for HDInsight as its scalable pub/sub message queue.
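The episode doesn't show Siphon's internals, but the pub/sub log pattern it relies on is easy to illustrate. Below is a minimal in-memory sketch in Python; the `MiniLog` class and its method names are illustrative only (not the Kafka API), and real Kafka adds partitions, replication, and durable storage on top of this idea:

```python
from collections import defaultdict

class MiniLog:
    """Toy append-only pub/sub log, illustrating the pattern Kafka
    implements durably and at massive scale."""

    def __init__(self):
        self._topics = defaultdict(list)  # topic name -> ordered event log

    def publish(self, topic, event):
        """Append an event to a topic's log and return its offset."""
        self._topics[topic].append(event)
        return len(self._topics[topic]) - 1

    def consume(self, topic, offset=0):
        """Read all events at or after `offset`; each consumer tracks
        its own offset, so many consumers can read the same log."""
        return self._topics[topic][offset:]

log = MiniLog()
log.publish("telemetry", {"device": "a", "temp": 21})
log.publish("telemetry", {"device": "b", "temp": 23})
# A consumer that has already processed offset 0 resumes at offset 1:
print(log.consume("telemetry", offset=1))  # [{'device': 'b', 'temp': 23}]
```

Because producers only append and consumers only track an offset, ingestion and processing are fully decoupled, which is what lets a service like Siphon absorb over a trillion events per day.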

In this video, Murali Krishnaprasad discusses Interactive Query (also called Hive LLAP, for Live Long and Process), an Azure HDInsight cluster type for low-latency analytical processing. Interactive Query supports in-memory caching, which makes Hive queries fast and interactive. See how to use HDInsight Interactive Query to analyze extremely large datasets (~100 TB) in common file formats such as ORC and CSV, using common BI/SQL tools including Zeppelin notebooks and VS Code.

In this video, Katherine Kampf, a PM on the Azure Big Data team, talks about the newly introduced ML Services in Azure HDInsight.

ML Services brings together Microsoft innovations and contributions from the open-source community (R, Python, and AI toolkits), all on top of a single enterprise-grade platform. Any open-source R or Python machine learning package can work side by side with any proprietary innovation from Microsoft.

ML Services includes a highly scalable, distributed set of algorithms, such as RevoScaleR, revoscalepy, and MicrosoftML, that can work on data larger than physical memory and run on a wide variety of platforms in a distributed manner.
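The RevoScaleR and revoscalepy implementations themselves are proprietary, but the core larger-than-memory idea is straightforward: stream the data in chunks and keep only running aggregates, so memory use stays constant regardless of dataset size. A minimal Python sketch of that idea (this is not the actual revoscalepy API):

```python
def chunked_mean(chunks):
    """Mean over data that never fits in memory at once: each chunk is
    visited exactly once and only two running totals are kept."""
    total, count = 0.0, 0
    for chunk in chunks:          # e.g. blocks read from disk or a stream
        total += sum(chunk)
        count += len(chunk)
    return total / count

# Simulate 10 chunks of 1,000 values each (0..9999 overall):
chunks = (list(range(k, k + 1000)) for k in range(0, 10_000, 1_000))
print(chunked_mean(chunks))  # 4999.5
```

Because each chunk contributes only to running totals, the same pattern parallelizes naturally: workers can aggregate their own chunks and the partial sums and counts can be combined at the end, which is essentially how such algorithms distribute across a cluster.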

Terabytes, Petabytes, and Yottabytes. We toss these terms around quite casually, but do we really know how huge these are?

Using grains of rice and retro computer graphics reminiscent of the Computer Chronicles, the YouTube channel “It’s OK to Be Smart” explores how big big data is and what the future of data storage may be.

The term Data Estate cropped up a little over a year ago, but what exactly does it mean?

Is it marketing-speak or something more profound?

In this DataPoint, I elaborate on the term “Data Estate” in front of an actual estate in Potomac, MD.

Press the play button below to listen here or visit the show page at DataDriven.tv

With the rapid emergence of digital devices, an unstoppable, invisible force is changing human lives in incredible ways. That force is data. We generated more data in 2017 than in all the previous 5,000 years of human history.

The massive gathering and analyzing of data in real time is allowing us to address some of humanity's biggest challenges, but as Edward Snowden and the release of NSA documents have shown, the accessibility of all this data comes at a steep price.

This documentary captures the promise and peril of this extraordinary knowledge revolution.