Databricks shows how their tech empowers Zillow’s developers via self-service ETL.
These tools abstract away the orchestration, deployment, and Apache Spark processing implementation from their respective users. In this talk, Zillow engineers discuss two internal platforms they created to address the specific needs of two distinct user groups: data analysts and data producers. Each platform addresses the use cases of its intended user, leverages internal services through its modular design, and empowers users to create their own ETL without having to worry about how the ETL is implemented.
Members of Zillow’s data engineering team discuss:
Why they created two separate user interfaces to meet the needs of different user groups
What degree of abstraction from the orchestration, deployment, processing, and other ancillary tasks they chose for each user group
How they leveraged internal services and packages, including their Apache Spark package, Pipeler, to democratize the creation of high-quality, reliable pipelines within Zillow.
Supply chain, healthcare, insurance, and finance often require highly accurate forecasting models at enterprise scale.
With Azure Machine Learning on Azure Databricks, the scale and speed needed for large-scale many-models forecasting can be achieved, and time-to-product decreases drastically.
The better-together story offers an enterprise approach to AI/ML.
Azure AutoML offers an elegant, efficient way to build forecasting models on Azure Databricks compute that solve sophisticated business problems.
This presentation covers the Azure Machine Learning + Azure Databricks approach (see slides attached), while the demo works through a hands-on business problem: building a forecasting model in Azure Databricks using Azure Machine Learning. The AI/ML better-together story is rounded out by MLflow for data science lifecycle management and Hyperopt for distributed model execution, which together complete AI/ML enterprise readiness for industry problems.
Databricks provides this course on Databricks Lakehouse.
In this course, you’ll discover how the Databricks Lakehouse Platform can help you compete in the world of big data and artificial intelligence. In the first half of the course, we’ll introduce you to foundational concepts in big data, explain key roles and abilities to look for when building data teams, and familiarize you with all parts of a complete data landscape. In the second half, we’ll review how the Databricks Lakehouse Platform can help your organization streamline workflows, break down silos, and make the most out of your data.
Please note: This course provides a high-level overview of big data concepts and the Databricks Lakehouse platform. It does not contain hands-on labs or technical deep dives into Databricks functionality.
No programming experience required
No experience with Databricks required
Business leads, executives, analysts, and data scientists rely on up-to-date information to make business decisions, adjust to the market, meet the needs of their customers, or run effective supply chain operations.
Come hear how Asurion used Delta, Structured Streaming, Auto Loader, and SQL Analytics to improve production data latency from day-minus-one to near real time. Asurion's technical team will share battle-tested tips and tricks you only get at a certain scale.
Asurion's production data lake on AWS executes 4000+ streaming jobs and hosts over 4000 tables.
The key idea behind Hyperspace is simple: Users specify the indexes they want to build. Hyperspace builds these indexes using Apache Spark, and maintains metadata in its write-ahead log that is stored in the data lake.
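To make that flow concrete, here is a minimal pure-Python sketch of the idea, not the Hyperspace API and not Spark. Everything in it is an illustrative assumption: the `IndexSpec` class, the `build_index` and `latest_state` helpers, the directory layout, and the `CREATING`/`ACTIVE` state names are hypothetical stand-ins. The user declares which columns to index, the builder writes a sorted projection of the data, and each step is recorded in a log stored next to the index.

```python
import json
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class IndexSpec:
    """What the user asks for: a name, key columns, and extra columns to carry along."""
    name: str
    indexed_columns: tuple
    included_columns: tuple = ()


def append_log(log_dir: Path, entry: dict) -> Path:
    """Write the next numbered log entry as a JSON file and return its path."""
    log_dir.mkdir(parents=True, exist_ok=True)
    next_id = len(list(log_dir.glob("*.json")))
    path = log_dir / f"{next_id}.json"
    path.write_text(json.dumps(entry))
    return path


def build_index(rows, spec: IndexSpec, lake_root: Path):
    """Build a sorted projection of `rows` and record each step in the log."""
    index_dir = lake_root / "indexes" / spec.name
    log_dir = index_dir / "_log"
    columns = list(spec.indexed_columns) + list(spec.included_columns)

    # Log the intent first, so an interrupted build is visible afterwards.
    append_log(log_dir, {"name": spec.name, "columns": columns, "state": "CREATING"})

    # Keep only the requested columns and sort by the indexed ones.
    projected = [{c: row[c] for c in columns} for row in rows]
    projected.sort(key=lambda r: tuple(r[c] for c in spec.indexed_columns))
    (index_dir / "data.json").write_text(json.dumps(projected))

    # Only after the data is written is the index marked usable.
    append_log(log_dir, {"name": spec.name, "columns": columns, "state": "ACTIVE"})
    return projected


def latest_state(lake_root: Path, name: str) -> str:
    """Read the most recent log entry for an index and return its state."""
    log_dir = lake_root / "indexes" / name / "_log"
    entries = sorted(log_dir.glob("*.json"), key=lambda p: int(p.stem))
    return json.loads(entries[-1].read_text())["state"]


if __name__ == "__main__":
    lake = Path(tempfile.mkdtemp())  # stand-in for a data lake location
    data = [
        {"id": 3, "name": "c", "city": "x"},
        {"id": 1, "name": "a", "city": "y"},
    ]
    build_index(data, IndexSpec("by_id", ("id",), ("name",)), lake)
    print(latest_state(lake, "by_id"))
```

In this sketch the log entry is written before the index data, so a reader of the log can tell a half-built index from a usable one. A real system would build the index with a distributed Spark job rather than an in-memory sort.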