Scaling Transformer to 1M tokens and beyond with RMT (Paper Explained)

Yannic Kilcher explains a paper that promises to scale transformers to 1 million tokens and beyond. We take a look at the technique behind it, the Recurrent Memory Transformer, and at its strengths and weaknesses.

MIT Introduction to Deep Learning 6.S191: Lecture 6 with Ava Soleimany. Subscribe to stay up to date with new deep learning lectures at MIT, or follow us @MITDeepLearning on Twitter and Instagram to stay fully-connected!! Lecture outline: 0:00 – Introduction; 0:58 – Course logistics; 3:59 – Upcoming guest lectures; 5:35 – Deep learning and expressivity […]

Efficient Computing for Deep Learning, Robotics, and AI

Lex Fridman shared this lecture by Vivienne Sze in January 2020 as part of the MIT Deep Learning Lecture Series. Book coming out in Spring 2020! Outline: 0:00 – Introduction […]

