
Moving Away from Data Flows and Simplifying Data Pipelines in Azure Synapse Analytics
In this session, you will learn how to use Spark SQL and Python to build notebooks that are called from integration pipelines, giving you an efficient, scalable, and maintainable way to handle data migration and transfer tasks.
Open Data Hub – the origin story (part 2)
- Frank
- July 6, 2022
- AI
- AI on Kubernetes
- Ai on OpenShift
- AI/ML
- artificial intelligence
- Data Science
- DevOps
- Git
- Jupyter
- JupyterHub
- JupyterLab
- Machine Learning
- ML
- ML on Kubernetes
- ML on OpenShift
- MLOps
- ODH
- Open Data Hub
- OpenShift
- OpenShift Data Science
- Python
- PyTorch
- Red Hat
- S2i
- Source-to-image
- Spark
- TensorFlow
In part 2 of the Open Data Hub origin story, fellow Red Hatters Steven Huels and Sherard Griffin describe some of the technical challenges and the growth of the Open Data Hub AI meta-project, including its evolution from Elasticsearch to multiple data discovery technologies. The evolution to a commercial service offering, Red Hat OpenShift Data Science, is also […]
Open Data Hub – the origin story (part 1)
- Frank
- July 6, 2022
- AI
- AI on Kubernetes
- Ai on OpenShift
- AI/ML
- artificial intelligence
- Data Science
- DevOps
- Git
- Jupyter
- JupyterHub
- JupyterLab
- Machine Learning
- ML
- ML on Kubernetes
- ML on OpenShift
- MLOps
- ODH
- Open Data Hub
- OpenShift
- OpenShift Data Science
- Python
- PyTorch
- Red Hat
- S2i
- Source-to-image
- Spark
- TensorFlow
Fellow Red Hatters Steven Huels and Sherard Griffin describe how the Open Data Hub meta-project grew from solving practical CI/CD build challenges to where it is today – providing an integrated blueprint that stitches together over 20 open source AI tools for running large and distributed AI workloads on OpenShift. Part 1 of a 2-part […]
Architecting for Data Quality in the Lakehouse with Delta Lake and PySpark
- Frank
- April 21, 2022
- Databricks
- PySpark
- Spark
From null values and duplicate rows to modeling errors and schema changes, data can break for millions of reasons. To combat this, teams are increasingly adopting best practices from DevOps and software engineering to identify, resolve, and even prevent this “data downtime” from happening in the first place. Join Prateek Chawla and Ryan Kearns as […]
Advancing Spark – Engineering behind Featurestore
In this video, Simon takes the same example notebook and applies some engineering best practices, and also looks at the Delta table that sits underneath the Featurestore. That way, we can understand the impact of these commands and properly get to grips with using Featurestore in a production environment. In a recent […]
Data Science and Predictive Analytics with Azure Synapse
Discover new Azure Synapse features to integrate predictive analytics capabilities into your organization—using both code-free and code-first options for AI/ML.
Jeeves Grows Up: An AI Chatbot for Performance and Quality
Jeeves is a chatbot created to simplify data operations management for enterprise Spark clusters. Powered by advanced AI algorithms and an intuitive conversational interface, it provides answers that get users in and out of problems quickly. Instead of being stuck at screens displaying logs and metrics, users can now have a more refreshing experience via a two-way […]
Funnel Analysis with Apache Spark and Druid
- Frank
- August 10, 2021
- Databricks
- Druid
- Spark
Every day, millions of advertising campaigns are happening around the world. For campaign owners, measuring ongoing campaign effectiveness (e.g., “how many distinct users saw my online ad vs. how many distinct users saw my online ad, clicked it, and purchased my product?”) is very important. However, this task (often referred to as “funnel analysis”) […]
Is the World Running Out Of Oil?
- Frank
- June 29, 2021
- climate change
- climate change 2021
- colonial pipeline hack
- colonial pipeline leak
- colonial pipeline news
- Documentary
- Education
- engineering
- factual
- fossil fuels
- fuel shortage 2021
- full documentary
- gasoline
- global warming
- how much oil is left
- how to
- Learning
- Mind blown
- oil pipeline
- oil pipeline hack
- oil pipeline shut down
- oil refinery process
- oil reserves documentary
- oil reserves in the world
- science
- science documentary
- Spark
- Technology
- what if oil ran out
Is the world in danger of running out of oil, or will a combination of new technologies, advanced analytics, and diversifying energy sources make this a non-issue?
Advancing Spark – Runtime 8.2 and Advanced Schema Evolution
Another week, another new Databricks Runtime. Runtime 8.2 brings some nice functionality around operational metrics, but the big star of the week is the new Schema Inference & Evolution functionality available through Autoloader. In this video, Simon takes a look through simple schema inference, applying schema hints and watching the schema metadata evolve through the […]