Apache Spark Streaming in K8s with ArgoCD & Spark Operator

Here’s an interesting talk by Albert Franziu Cros on a CI/CD setup composed of a Spark Streaming job in K8s consuming from Kafka.

Over the last year, we have been moving from a batch-processing setup running Airflow on EC2 instances to a more powerful and scalable setup using Airflow and Spark in K8s.

The increasing need to keep up with technology changes, new community advances, and multidisciplinary teams forced us to design a solution that lets us run multiple Spark versions at the same time while avoiding duplicated infrastructure and simplifying deployment, maintenance, and development.

In our talk, we will cover our journey toward a CI/CD setup composed of a Spark Streaming job in K8s consuming from Kafka, using the Spark Operator and deploying with ArgoCD.

Frank

#DataScientist, #DataEngineer, Blogger, Vlogger, Podcaster at http://DataDriven.tv . Back @Microsoft to help customers leverage #AI. #武當派 fan. I blog to help you become a better data scientist/ML engineer. Opinions are mine. All mine.