Apache Spark Turns 10

Apache Spark Turns 10: The Secret Sauce Behind One Of The World’s Most Popular Open Source Projects

As Apache Spark is 10 years old. This article in Analytics India Magazine explores what led to Spark’s widespread adoption and what will keep it going into the future.

Dubbed as the official “in-memory replacement for MapReduce”, the disk-based computational engine is at the heart of early Hadoop clusters. Why Spark took off was because it reflects the changing processing paradigm to a more memory intensive pipeline, so if your cluster has a decent memory and an API simpler than MapReduce, processing in Spark will be faster. The reason why Spark is faster is because most of the operations (including reads) decrease in processing time roughly linearly with the number of machines since it’s all distributed.

Frank

#DataScientist, #DataEngineer, Blogger, Vlogger, Podcaster at http://DataDriven.tv . Back @Microsoft to help customers leverage #AI Opinions mine. #武當派 fan. I blog to help you become a better data scientist/ML engineer Opinions are mine. All mine.