Big Data

Data

Dataiku End-to-End Demo

This demo uses a project that predicts flight delays to demonstrate connecting to data, preparing and enriching it, building machine learning models, and operationalizing your work entirely in Dataiku.

Big Data

Dealing With Big Data – Computerphile

Big Data may sound like a buzzword, and it is hard to quantify, but the problems with large data sets are very real. Dr Isaac Triguero explains some of the challenges.

AI Startups

The Rise of AI and the Canadian Silicon Valley

Here’s an interesting documentary (“Canada – The Rise of AI”, Ep. 11) on the “Canadian Silicon Valley.” Silicon Valley may be home to some of the biggest tech giants in the world but it’s being challenged like never before. Crazy tech geniuses have popped up all over the planet making things that will blow your […]

Machine Learning

The Most Important Topic in Machine Learning right now

Karolina Sowinska discusses the most important topic in Machine Learning right now: model explainability. It is one of the hottest discussion points in the data community because, ultimately, if we cannot understand how models arrive at their predictions, the models become useless in many practical applications.

Azure Synapse CosmosDB

Overview of Azure Synapse Link featuring CosmosDB

In this video, Chris Seferlis gives an overview of Azure Synapse Link, a newer feature of the Synapse Analytics suite of tools. Find out why this feature is important, how it moves operational data to analytical data, and what you can then do with it. More details about the service and some great tutorials […]

Spark

Unboxing Spark Standalone Architecture

Big Data Engineering closely examines the Spark Standalone architecture. Apache Spark has a well-defined layered architecture in which all the Spark components and layers are loosely coupled. This architecture is further integrated with various extensions and libraries. The Apache Spark architecture is based on two main abstractions: the Resilient Distributed Dataset (RDD) and the Directed Acyclic Graph (DAG).
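
To make the two abstractions a little more concrete, here is a minimal pure-Python sketch of the idea behind them: transformations only record a lineage of steps, and nothing is computed until an action is called. This is a toy illustration, not Spark's API. `ToyRDD`, `lineage`, and the helper functions are hypothetical names invented for this example, and the lineage here is a simple chain rather than a general DAG.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class ToyRDD:
    """Hypothetical stand-in for an RDD: one node in a lineage chain."""
    name: str
    compute: Callable[[], Iterable]
    parent: Optional["ToyRDD"] = None

    def map(self, fn):
        # Transformation: record a new node, compute nothing yet.
        return ToyRDD(f"map({fn.__name__})",
                      lambda: (fn(x) for x in self.compute()), self)

    def filter(self, pred):
        # Transformation: also lazy, linked back to its parent node.
        return ToyRDD(f"filter({pred.__name__})",
                      lambda: (x for x in self.compute() if pred(x)), self)

    def collect(self):
        # Action: evaluates the whole recorded chain and returns a list.
        return list(self.compute())

    def lineage(self):
        # Walk parent links back to the source, then list them source-first.
        node, steps = self, []
        while node is not None:
            steps.append(node.name)
            node = node.parent
        return list(reversed(steps))


def double(x):
    return x * 2


def is_multiple_of_four(x):
    return x % 4 == 0


source = ToyRDD("source", lambda: range(1, 6))
result = source.map(double).filter(is_multiple_of_four)

print(result.lineage())  # ['source', 'map(double)', 'filter(is_multiple_of_four)']
print(result.collect())  # [4, 8]
```

In real Spark the recorded graph also lets the scheduler split work into stages and recompute lost partitions; the sketch above only shows the lazy, lineage-based evaluation.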

Azure Synapse

Getting started with Spark Pools in Azure Synapse

Chris Seferlis introduces us to the newly added Apache Spark Pools in Azure Synapse Analytics for Big Data, Machine Learning, and Data Processing needs. From the description: I give an overview of what Spark is and where it came from, why the Synapse Team added it to the suite of offerings, and some sample workloads […]

Big Data Spark

How to Use SQL with Delta Lake

Delta Lake is an open-source storage layer that brings ACID transactions and time travel to Apache Spark and big data workloads. The latest release, Delta Lake 0.7.0, requires Apache Spark 3, and among its features is full coverage of SQL DDL and DML commands.
