Big Data

Big Data Video Production

VFX Artist Reveals the TRUE Scale of Data!

Two of my favorite things in one video: VFX and data visualization. Wren uses VFX to explain how data storage has progressed over the ages, and to show the scale of all the data stored on the entire planet.

Big Data Data IoT

Kafka Streams 101: Getting Started

To understand Kafka Streams, you have to begin with Apache Kafka®, a distributed, scalable, elastic, and fault-tolerant event streaming platform. The storage nodes in Kafka, brokers, are just instances of the Kafka storage layer process running on your laptop or server. At the heart of each broker is a log, an append-only file that holds […]
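
The append-only log idea can be illustrated with a minimal sketch. This is a toy in-memory model, not Kafka's actual storage code; the class name `AppendOnlyLog`, its methods, and the sample key/value records are all hypothetical and chosen only for illustration.

```python
class AppendOnlyLog:
    """Toy in-memory stand-in for a log: records are only ever added at the end."""

    def __init__(self):
        self._records = []

    def append(self, key, value):
        """Add a record at the end and return its offset (its position in the log)."""
        offset = len(self._records)
        self._records.append((key, value))
        return offset

    def read(self, offset):
        """Return a copy of all records from `offset` onward; the log is not modified."""
        return list(self._records[offset:])


# Hypothetical sample events, appended in arrival order.
log = AppendOnlyLog()
log.append("user-1", "login")
log.append("user-2", "login")
log.append("user-1", "logout")

# A reader picks its own starting offset and re-reads from there.
print(log.read(1))
```

Because existing entries are never rewritten, several readers can each keep their own offset and replay the same records independently.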

Big Data Data

Lessons Learned from Deidentifying 700 Million Patient Notes

Providence embarked on an ambitious journey to de-identify all our clinical electronic medical record (EMR) data to support medical research and the development of novel treatments. This talk shares how this was done for patient notes and how you can achieve the same. First, we built a deidentification pipeline using pre-trained deep learning models, fine-tuned […]

AWS Big Data Data

DynamoDB design patterns with GraphQL APIs and AppSync

Organizations building modern applications are increasingly turning to serverless API architectures powered by GraphQL and NoSQL databases to increase development velocity and decrease operational overhead. This talk reviews key GraphQL and NoSQL concepts, shows how to use AWS AppSync and Amazon DynamoDB to create a serverless GraphQL API, and discusses the pros and cons of two […]

AWS Big Data Data

AWS Innovate 2022 – Data Edition

Organizations are managing more data than ever before, and there’s no sign of that slowing down. At AWS Innovate – Data Edition, learn from AWS experts how a modern data strategy can support your present and future use cases, including steps to build an end-to-end data solution to store and access, analyze and visualize, […]

Big Data Data Red Hat

Red Hat OpenShift Database Access (RHODA) Integration with Jupyter Notebook

Integrating Jupyter notebooks with OpenShift cloud services allows data scientists to get started quickly, without having to understand or manage any of the OpenShift infrastructure details. This video demonstrates the service binding between a Jupyter notebook and the Red Hat OpenShift Database Access connection to database cloud services. This allows a notebook user to access […]

Big Data Data

What is a serverless database? (in under 3 minutes)

Serverless architecture has revolutionized application development, but the database has been lagging behind. So what is a serverless database – what does it look like when we apply the principles of serverless development to the database? Time stamps: 0:00 What is a serverless database? 0:40 Elements of a serverless database 0:43 Automated elastic scale 1:16 […]

Big Data Research

Analyzing the Witcher’s Network | Relationship Extraction & Network Analysis with spaCy & NetworkX

In this video, a data scientist attempts to extract relationships between The Witcher’s characters from the books. We also did some graph analyses (centrality measures and community detection) and visualization using NetworkX and Pyvis. The analyses are still quite preliminary and I might have glossed over some details, but hopefully, this is a bit interesting […]
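
As a rough illustration of what a centrality measure computes, here is a small sketch of degree centrality in plain Python. It is not the code from the video, which uses NetworkX; the function name `degree_centrality` and the placeholder edge list are assumptions made up for this example, and the normalization (degree divided by n - 1) follows the usual definition.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Return {node: degree / (n - 1)} for an undirected graph given as edge pairs."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(neighbors)
    if n <= 1:
        # With zero or one node there is nobody to be connected to.
        return {node: 0.0 for node in neighbors}
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


# Placeholder relationships between four unnamed characters (illustrative only).
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
scores = degree_centrality(edges)

# List characters from most to least connected.
for name, score in sorted(scores.items(), key=lambda item: (-item[1], item[0])):
    print(f"{name}: {score:.2f}")
```

A character linked to every other character scores 1.0, so in a relationship graph the highest scores point to the most connected characters.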

Big Data Data Python

How to Scale up your Pandas workflows with Modin

pandas is one of the most commonly used data science libraries in Python, with a convenient set of APIs to help data scientists prepare, analyze, and explore their data. However, despite its widespread adoption, pandas suffers from severe memory and performance issues on moderately large datasets. This presentation focuses on Modin, a fast, scalable drop-in […]
