Data Engineering

Big Data Data Python

How to Scale up your Pandas workflows with Modin

pandas is one of the most commonly used data science libraries in Python, with a convenient set of APIs to help data scientists prepare, analyze, and explore their data. However, despite its widespread adoption, pandas suffers from severe memory and performance issues on moderately large datasets. This presentation focuses on Modin, a fast, scalable drop-in […]

Databricks on Databricks: AMA with Data Engineering SMEs

Data engineers and data leaders are the linchpin of every data-driven organization. Today’s data engineers face a number of critical use cases: ensuring the organization has access to clean, reliable data, maintaining governance and security as the organization scales, and providing access to data teams for analysis. Watch this session for live Q&A with Databricks […]

AWS Databricks

Databricks on AWS Cloud Integration Demo

Did you ever wonder how Databricks plays a role in AWS data integrations? Well, wonder no more. Timestamps: 0:00 Databricks Lakehouse on AWS overview 0:20 Connecting to EC2, S3, Glue, and IAM 0:46 Ingesting Kinesis streams into Delta Lake 1:32 Viewing Delta Lake tables in the Glue console 1:49 Databricks – Redshift integration 2:20 […]

Dataiku End-to-End Demo

This demo uses a project that predicts flight delays to demonstrate connecting to data, preparing and enriching it, building machine learning models, and operationalizing your work entirely in Dataiku.

Big Data Data

Code Once Use Often with Declarative Data Pipelines

In this video, Anthony Awuley, a developer, and Carter Kilgour, a data engineer, explain the value of declarative data pipelines.

Big Data Databricks

Tech Talk Series Part Four: Continuous Integration and Continuous Delivery with Delta Lake

Join Databricks for the final installment of a four-part series with Salesforce Engineering.

Introduction to Databricks Unified Data Platform

Simplify your data lake. Simplify your data architecture. Simplify your data engineering. Powered by Delta Lake, Databricks combines the best of data warehouses and data lakes into a lakehouse architecture, giving you one platform to collaborate on all of your data, analytics and AI workloads.

Unboxing Spark Standalone Architecture

Big Data Engineering closely examines Spark Standalone Architecture. Apache Spark has a well-defined layered architecture in which all the Spark components and layers are loosely coupled. This architecture is further integrated with various extensions and libraries. Apache Spark Architecture is based on two main abstractions: the Resilient Distributed Dataset (RDD) and the Directed Acyclic Graph (DAG).

Azure Azure Synapse Data Warehouse

Azure Synapse and On-Demand Serverless Compute and Querying

Microsoft Mechanics learns how UK-based data engineering consultancy endjin is evaluating Azure Synapse for on-demand serverless compute and querying. Endjin specializes in big data analytics solutions for customers across a range of industries, such as ocean research, financial services, and retail. Host Jeremy Chapman speaks with Jess Panni, Principal and Data Architect at […]
