Coming from a data warehousing and BI background, Franco Patano wanted to have a catalogue of the Lakehouse, including schema and profiling statistics.
He created the Lakehouse Data Profiler notebook, which uses Python and SQL to analyze the data and generate schema and statistics tables. He then uses the new SQL Analytics product from Databricks to build dashboards that visualize the data profiling statistics, and discusses how to use these dashboards to optimize JOINs and other operations.
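To make the idea of a "statistics table" concrete, here is a minimal sketch of per-column profiling in plain Python. It is not the notebook from the talk, which runs Python and SQL on Databricks. The function name `profile_table`, the sample `orders` rows, and the chosen statistics are all assumptions made for illustration.

```python
from typing import Any, Dict, List


def profile_table(rows: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Return per-column profiling statistics for a list of row dicts.

    Hypothetical helper for illustration; assumes the values in each
    column are hashable and mutually comparable.
    """
    # Collect column names in first-seen order.
    columns: List[str] = []
    for row in rows:
        for name in row:
            if name not in columns:
                columns.append(name)

    stats: Dict[str, Dict[str, Any]] = {}
    for name in columns:
        values = [row.get(name) for row in rows]
        non_null = [v for v in values if v is not None]
        stats[name] = {
            # Schema information: type of the first non-null value.
            "type": type(non_null[0]).__name__ if non_null else "unknown",
            # Profiling statistics.
            "row_count": len(values),
            "null_count": len(values) - len(non_null),
            "distinct_count": len(set(non_null)),
            "min": min(non_null) if non_null else None,
            "max": max(non_null) if non_null else None,
        }
    return stats


# Made-up sample data standing in for a Lakehouse table.
orders = [
    {"order_id": 1, "customer_id": 10, "amount": 25.0},
    {"order_id": 2, "customer_id": 10, "amount": None},
    {"order_id": 3, "customer_id": 11, "amount": 40.5},
]

profile = profile_table(orders)
for column, column_stats in profile.items():
    print(column, column_stats)
```

Statistics such as `distinct_count` and `null_count` are the kind of figures that can inform JOIN decisions, for example by showing whether a candidate join key is unique or sparsely populated.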
[Lightning talk from Data + AI Summit 2020]