attention is all you need

AI Natural Language Processing Neural Networks

Illustrated Guide to Transformers Neural Network: A step by step explanation

Transformers are all the rage nowadays, but how do they work? This video demystifies the novel neural network architecture with a step-by-step explanation and illustrations of how transformers work. CORRECTION: The sine and cosine functions are actually applied across both the embedding dimensions and the time steps!
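To make the correction concrete, here is a minimal sketch of sinusoidal positional encoding as it is commonly described for the original Transformer. The function name `positional_encoding` and the `base` parameter are illustrative assumptions, not taken from the video; the point is only that the angle depends on both the time step (`pos`) and the embedding dimension (`dim`).

```python
import math

def positional_encoding(num_positions, d_model, base=10000.0):
    """Return a num_positions x d_model table of sinusoidal encodings.

    Hypothetical helper for illustration: even dimensions use sine,
    odd dimensions use cosine, and each (even, odd) pair shares a frequency.
    """
    table = []
    for pos in range(num_positions):          # time step
        row = []
        for dim in range(d_model):            # embedding dimension
            pair_index = dim // 2             # dimensions 2i and 2i+1 share index i
            angle = pos / (base ** (2 * pair_index / d_model))
            row.append(math.sin(angle) if dim % 2 == 0 else math.cos(angle))
        table.append(row)
    return table

pe = positional_encoding(num_positions=4, d_model=6)
```

Each row of `pe` would be added to the token embedding at that time step, so every position gets a distinct pattern across the embedding dimensions.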

Generative AI

TransGAN: Two Transformers Can Make One Strong GAN

Generative Adversarial Networks (GANs) hold the state of the art when it comes to image generation. However, while the rest of computer vision is slowly being taken over by transformers or other attention-based architectures, all working GANs to date contain some form of convolutional layers. This paper changes that and builds TransGAN, the first GAN where both the generator […]

AI Natural Language Processing

Transformers for Image Recognition at Scale

Yannic Kilcher explains why transformers are ruining convolutions. This paper, under review at ICLR, shows that given enough data, a standard Transformer can outperform Convolutional Neural Networks in image recognition, a task where CNNs have classically excelled. In this video, I explain the architecture of the Vision Transformer (ViT), the reason why it works […]

AI Research

Explaining the Paper: Hopfield Networks is All You Need

Yannic Kilcher explains the paper “Hopfield Networks is All You Need.” Hopfield Networks are one of the classic models of biological memory networks. This paper generalizes modern Hopfield Networks to continuous states and shows that the corresponding update rule is equal to the attention mechanism used in modern Transformers. It further analyzes a pre-trained BERT […]
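The equivalence mentioned above can be illustrated with a small sketch. It assumes a simplified setting that is not spelled out in the teaser: the stored patterns act as both keys and values, there are no learned projection matrices, and the inverse temperature `beta` is set to 1/sqrt(d). The function names `hopfield_update` and `attention` are hypothetical and chosen only for this example.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def hopfield_update(patterns, state, beta):
    """One continuous Hopfield update: a softmax-weighted average of stored patterns."""
    scores = [beta * sum(p * s for p, s in zip(pattern, state)) for pattern in patterns]
    weights = softmax(scores)
    return [sum(w * pattern[d] for w, pattern in zip(weights, patterns))
            for d in range(len(state))]

def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    d_k = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d_k) for key in keys]
    weights = softmax(scores)
    return [sum(w * value[d] for w, value in zip(weights, values))
            for d in range(len(values[0]))]

# Assumed toy data: three stored patterns and a noisy query near the first one.
patterns = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
state = [0.9, 0.1, 0.0]
beta = 1.0 / math.sqrt(len(state))

hopfield_result = hopfield_update(patterns, state, beta)
attention_result = attention(state, patterns, patterns)
```

Under these assumptions the two results agree up to floating-point rounding, which is the sense in which a single Hopfield update step matches an attention lookup.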
