Spark is gaining momentum in the big data space. Watch this video for a demonstration of how you can use your favorite developer tools to debug Spark applications.


Raghav Mohan joins Scott Hanselman to talk about Apache Kafka on HDInsight. HDInsight added the open-source distributed streaming platform last year, completing a scalable big data streaming story on Azure.

Kafka can process millions of events per second and petabytes of data per day, powering scenarios such as Toyota's connected-car platform, clickstream analytics for Office 365, and fraud detection for large banks.
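A quick back-of-envelope calculation shows that those two throughput figures are consistent with each other. The 1 KB average event size below is an assumption for illustration, not a number from the source:

```python
# Rough sanity check: how many events per second does 1 PB/day imply?
BYTES_PER_PETABYTE = 10**15
SECONDS_PER_DAY = 86_400
AVG_EVENT_BYTES = 1_024  # assumed average event size, not from the source

events_per_sec = BYTES_PER_PETABYTE / SECONDS_PER_DAY / AVG_EVENT_BYTES
print(f"{events_per_sec:,.0f} events/sec")  # on the order of 11 million
```

So a petabyte per day at kilobyte-sized events lands squarely in the "millions of events per second" range the blurb describes.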

Find out how to deploy managed, cost-effective Kafka clusters on Azure HDInsight, backed by a 99.9% SLA, in just four clicks or from pre-created ARM templates.
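For a sense of what those ARM templates contain, here is a heavily trimmed sketch of the Kafka-specific parts of an HDInsight cluster resource. The API version, component version, and instance counts are illustrative assumptions, not values from the source, and required sections such as the storage and hardware profiles are omitted:

```json
{
  "type": "Microsoft.HDInsight/clusters",
  "apiVersion": "2018-06-01-preview",
  "name": "[parameters('clusterName')]",
  "location": "[resourceGroup().location]",
  "properties": {
    "clusterVersion": "3.6",
    "osType": "Linux",
    "clusterDefinition": {
      "kind": "kafka",
      "componentVersion": { "Kafka": "1.1" }
    },
    "computeProfile": {
      "roles": [
        { "name": "headnode", "targetInstanceCount": 2 },
        { "name": "workernode", "targetInstanceCount": 3,
          "dataDisksGroups": [ { "disksPerNode": 2 } ] },
        { "name": "zookeepernode", "targetInstanceCount": 3 }
      ]
    }
  }
}
```

The `"kind": "kafka"` field is what selects the Kafka cluster type; the worker nodes host the Kafka brokers, with managed data disks attached per node for the log segments.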


Deep learning and AI are fundamentally changing the way data is used in computation. They enable computing capabilities that will transform almost every industry, scientific domain, and public usage of data and compute.

The recent success of deep learning algorithms can be seen as the culmination of decades of progress in three areas: research in DL algorithms, broad availability of big data infrastructure, and the massive growth of computation power produced by Moore’s law and the advent of parallel compute architectures.

Deep learning has been employed successfully in such diverse areas as healthcare, transportation, industrial IoT, finance, entertainment, and retail, in addition to high-performance computing.

Examples shown in this video illustrate how the approach works and how it complements high-performance data analytics and traditional business intelligence.

Mike Olson, Chief Strategy Officer and Co-Founder at Cloudera, explains Apache Spark’s origins, its rise in popularity in the open source community, and how Spark is primed to replace MapReduce as the general processing engine in Hadoop.

Talk about big data: CERN, best known for its nuclear research and the Large Hadron Collider, has released 300 TB of research data on its new data portal.

Previously, CERN had released around 27 terabytes of research information in November 2014. In that case, the data posted was collected from experiments done in 2010.

Kati Lassila-Perini, a physicist working on the Compact Muon Solenoid detector, said that “members of the CMS Collaboration put in lots of effort and thousands of person-hours each of service work in order to operate the CMS detector and collect this research data for our analysis.”

Furthermore, she added that “once we’ve exhausted our exploration of the data, we see no reason not to make them available publicly.”

She continued, “The benefits are numerous, from inspiring high-school students to the training of the particle physicists of tomorrow.”

It’s incredible that we live in an age where anyone with an internet connection has access to raw data that was previously available only to a select few advanced researchers.

Who knows where this can lead?