
Where Should You Put Your Data in Azure?
A frequently asked question is: what goes where, or where should I put my data? With Amy Boyd, Frank (not me) invited different product teams to share what type of data goes in their service. Let's meet the Synapse Analytics, Cosmos DB, Azure Data Lake, and Azure Data Explorer product managers. Each one will […]
Read More
Databricks on Databricks: AMA with Data Engineering SMEs
Data engineers and data leaders are the linchpin of every data-driven organization. Today's data engineers face a number of critical use cases: ensuring the organization has access to clean, reliable data, maintaining governance and security as the organization scales, and providing access to data teams for analysis. Watch this session for live Q&A with Databricks […]
Read More
Clean Your Data Swamp by Migrating Off of Hadoop
In this session, learn how to quickly supplement your on-premises Hadoop environment with a simple, open, and collaborative cloud architecture that enables you to generate greater value with scaled application of analytics and AI on all your data. You will also learn five critical steps for a successful migration to the Databricks Lakehouse Platform along […]
Read More
Large Scale Lakehouse Implementation Using Structured Streaming
Business leads, executives, analysts, and data scientists rely on up-to-date information to make business decisions, adjust to the market, meet the needs of their customers, or run effective supply chain operations. Come hear how Asurion used Delta, Structured Streaming, AutoLoader, and SQL Analytics to improve production data latency from day-minus-one to near real time. Asurion's technical […]
Read More
Dave Wentzel on Why You Don’t Need a Data Warehouse
In this episode of Data Driven, Frank and Andy chat with Philadelphia Microsoft Technology Center Data Architect Dave Wentzel on why you do not need a data warehouse. Also, Frank discusses leaving Microsoft, Frank and Andy talk about five seasons of Data Driven, and even BAILeY has a sentimental moment. Show Notes: Coming soon. Press the play […]
Read More
What is Delta Lake?
Delta Lake is an open format storage layer that delivers reliability, security and performance on your data lake — for both streaming and batch operations. By replacing data silos with a single home for structured, semi-structured and unstructured data, Delta Lake is the foundation of a cost-effective, highly scalable lakehouse. Learn more: https://databricks.com/product/delta-lake-on-databricks
Read More
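A minimal sketch of what the post above describes, using the document's own SQL register (table and column names here are hypothetical, chosen for illustration): a Delta table is created like any other Spark table, with `USING DELTA` selecting the open storage format that both batch and streaming jobs can then write to.

```sql
-- Hypothetical table: events land here from both batch backfills
-- and streaming writers; Delta's transaction log keeps concurrent
-- readers and writers consistent (ACID).
CREATE TABLE events (
  event_id   STRING,
  event_time TIMESTAMP,
  payload    STRING
) USING DELTA;

-- An existing Parquet directory can also be converted in place
-- (path shown is a placeholder):
-- CONVERT TO DELTA parquet.`/path/to/events`;
```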
Introduction to Databricks Unified Data Platform
Simplify your data lake. Simplify your data architecture. Simplify your data engineering. Powered by Delta Lake, Databricks combines the best of data warehouses and data lakes into a lakehouse architecture, giving you one platform to collaborate on all of your data, analytics and AI workloads.
Read More
How to Design and Implement a Real-time Data Lake with Dynamically Changing Schema
Building a curated data lake on real-time data is an emerging data warehouse pattern with Delta. In the real world, however, we often face dynamically changing schemas, which pose a big challenge to incorporate without downtime. In this presentation we will show how we built a robust streaming ETL […]
Read More
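One common way to absorb a changing source schema without downtime is Delta's schema evolution during a merge. A hedged sketch (table names are hypothetical, and the auto-merge flag assumes a Delta release recent enough to support it):

```sql
-- Allow MERGE to add columns that appear in the source
-- but not yet in the target table (schema evolution).
SET spark.databricks.delta.schema.autoMerge.enabled = true;

-- Upsert the latest source rows; new source columns are
-- appended to the target schema instead of failing the job.
MERGE INTO events_silver AS t
USING events_bronze AS s
  ON t.event_id = s.event_id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;
```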
How to Use SQL with Delta Lake
Delta Lake is an open-source storage layer that brings ACID transactions and time travel to Apache Spark and big data workloads. The latest release, Delta Lake 0.7.0, requires Apache Spark 3 and includes full coverage of SQL DDL and DML commands.
Read More
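A short sketch of the SQL surface described above (names are hypothetical; the `VERSION AS OF` time-travel syntax is the flavor available on Databricks):

```sql
-- DML on a Delta table runs as an ACID transaction.
UPDATE events SET payload = NULL WHERE event_id = 'e42';
DELETE FROM events WHERE event_time < '2020-01-01';

-- Time travel: read the table as it was at an earlier version.
SELECT COUNT(*) FROM events VERSION AS OF 0;
```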
Delta Lakehouse Data Profiler and SQL Analytics Demo
Coming from a data warehousing and BI background, Franco Patano wanted to have a catalogue of the Lakehouse, including schema and profiling statistics. He created the Lakehouse Data Profiler notebook using Python and SQL to analyze the data and generate schema and statistics tables. He then used the new SQL Analytics product from Databricks to […]
Read More