Attention Mechanism

AI Research

GLOM: How to represent part-whole hierarchies in a neural network (Geoff Hinton’s Paper Explained)

Yannic Kilcher covers a paper in which Geoffrey Hinton describes GLOM, a computer vision model that combines transformers, neural fields, contrastive learning, capsule networks, denoising autoencoders, and RNNs. GLOM decomposes an image into a parse tree of objects and their parts. However, unlike previous systems, the parse tree is constructed dynamically and differently for each input, […]
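The core update is easy to sketch. Below is a minimal, purely illustrative PyTorch toy (the class and layer names like `GlomColumnUpdate` are my own, not from the paper, and the real GLOM also weights the contributions and trains with a denoising objective): each image location holds a column of level embeddings, and every level is repeatedly nudged toward a bottom-up projection from the level below, a top-down projection from the level above, and an attention-weighted average over the same level at other locations, which is what forms the "islands of agreement" that stand in for a parse tree.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlomColumnUpdate(nn.Module):
    """One toy GLOM iteration (illustrative, not Hinton's exact equations).

    Each of N image locations holds a column of `levels` embeddings.
    A level's new value averages: its previous state, a bottom-up MLP
    from the level below, a top-down MLP from the level above, and an
    attention-weighted mean over the same level at all locations.
    """
    def __init__(self, dim, levels):
        super().__init__()
        self.levels = levels
        self.bottom_up = nn.ModuleList(nn.Linear(dim, dim) for _ in range(levels))
        self.top_down = nn.ModuleList(nn.Linear(dim, dim) for _ in range(levels))

    def forward(self, x):                      # x: (locations, levels, dim)
        new = torch.zeros_like(x)
        for l in range(self.levels):
            parts = [x[:, l]]                  # previous state
            if l > 0:
                parts.append(self.bottom_up[l](x[:, l - 1]))
            if l < self.levels - 1:
                parts.append(self.top_down[l](x[:, l + 1]))
            # Lateral attention over the same level: similar embeddings
            # pull on each other, forming "islands of agreement".
            attn = F.softmax(x[:, l] @ x[:, l].T / x.shape[-1] ** 0.5, dim=-1)
            parts.append(attn @ x[:, l])
            new[:, l] = torch.stack(parts).mean(dim=0)
        return new

cols = torch.randn(49, 5, 64)                  # 7x7 locations, 5 levels
step = GlomColumnUpdate(dim=64, levels=5)
for _ in range(3):                             # a few settling iterations
    cols = step(cols)
print(cols.shape)                              # torch.Size([49, 5, 64])
```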

Read More
Generative AI

TransGAN: Two Transformers Can Make One Strong GAN

Generative Adversarial Networks (GANs) hold the state of the art when it comes to image generation. However, while the rest of computer vision is slowly being taken over by transformers or other attention-based architectures, all working GANs to date contain some form of convolutional layers. This paper changes that and builds TransGAN, the first GAN where both the generator […]
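To make the idea concrete, here is a toy transformer generator (my own simplification, not the paper's architecture: the actual TransGAN grows the token grid progressively with pixel-shuffle upsampling and pairs it with a transformer discriminator): a latent vector is projected to a grid of tokens, transformer blocks refine them with self-attention, and a linear head maps each token to an RGB pixel, with no convolution anywhere.

```python
import torch
import torch.nn as nn

class TransformerGenerator(nn.Module):
    """Toy TransGAN-style generator (illustrative simplification).

    A latent vector becomes a fixed grid of tokens, transformer blocks
    refine them, and a linear head emits one RGB pixel per token. The
    real TransGAN instead grows the grid progressively via upsampling.
    """
    def __init__(self, latent_dim=128, dim=256, grid=8, depth=4):
        super().__init__()
        self.grid = grid
        self.to_tokens = nn.Linear(latent_dim, grid * grid * dim)
        self.pos = nn.Parameter(torch.zeros(1, grid * grid, dim))
        layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=depth)
        self.to_rgb = nn.Linear(dim, 3)        # one RGB pixel per token

    def forward(self, z):                      # z: (batch, latent_dim)
        b = z.shape[0]
        tokens = self.to_tokens(z).view(b, self.grid * self.grid, -1) + self.pos
        tokens = self.blocks(tokens)           # self-attention, no convolutions
        img = self.to_rgb(tokens)              # (batch, grid*grid, 3)
        return img.transpose(1, 2).reshape(b, 3, self.grid, self.grid)

z = torch.randn(4, 128)
print(TransformerGenerator()(z).shape)         # torch.Size([4, 3, 8, 8])
```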

Read More
AI Natural Language Processing

Transformers for Image Recognition at Scale

Yannic Kilcher explains why transformers are ruining convolutions. This paper, under review at ICLR, shows that given enough data, a standard Transformer can outperform Convolutional Neural Networks in image recognition tasks, which are classically tasks where CNNs excel. In the video, he explains the architecture of the Vision Transformer (ViT), the reason why it works […]
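The recipe fits in a few lines. Here is a minimal, illustrative ViT-style classifier (dimensions are toy-sized; the real model uses pre-norm blocks, an MLP head, and large-scale pre-training): the image is cut into patches, each patch becomes a token via a linear projection (implemented as a strided convolution below), a learnable [CLS] token and position embeddings are added, and a standard transformer encoder does the rest.

```python
import torch
import torch.nn as nn

class TinyViT(nn.Module):
    """Minimal Vision Transformer sketch (illustrative sizes).

    Patches are embedded with a strided convolution, a [CLS] token is
    prepended, position embeddings are added, and classification reads
    the [CLS] output after a transformer encoder.
    """
    def __init__(self, image=32, patch=8, dim=192, depth=4, heads=3, classes=10):
        super().__init__()
        n = (image // patch) ** 2              # number of patch tokens
        self.patchify = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos = nn.Parameter(torch.zeros(1, n + 1, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.head = nn.Linear(dim, classes)

    def forward(self, x):                      # x: (batch, 3, image, image)
        tokens = self.patchify(x).flatten(2).transpose(1, 2)   # (B, N, dim)
        tokens = torch.cat([self.cls.expand(x.shape[0], -1, -1), tokens], dim=1)
        tokens = self.encoder(tokens + self.pos)
        return self.head(tokens[:, 0])         # classify from the [CLS] token

imgs = torch.randn(2, 3, 32, 32)
print(TinyViT()(imgs).shape)                   # torch.Size([2, 10])
```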

Read More
AI Machine Learning

Object-Centric Learning with Slot Attention

Visual scenes are often composed of sets of independent objects. Yet, current vision models make no assumptions about the nature of the pictures they look at. Yannic Kilcher explores a paper on object-centric learning. By imposing an objectness prior, this paper introduces a module that is able to recognize permutation-invariant sets of objects from pixels in […]
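The mechanism itself is compact. Below is a sketch that follows the paper's pseudocode in simplified form (the paper samples initial slots from a learned Gaussian and adds a small residual MLP after the GRU; here the slots start from a learned parameter and the MLP is omitted): the crucial twist is that the attention softmax is taken over the slots rather than over the inputs, so the slots compete to explain each input feature.

```python
import torch
import torch.nn as nn

class SlotAttention(nn.Module):
    """Simplified Slot Attention (after the paper's pseudocode).

    Slots compete for input features: the softmax runs over the slot
    axis, not the input axis, and each slot is updated with a GRU.
    """
    def __init__(self, dim=64, num_slots=4, iters=3):
        super().__init__()
        self.iters, self.scale = iters, dim ** -0.5
        self.mu = nn.Parameter(torch.randn(1, num_slots, dim))
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)
        self.gru = nn.GRUCell(dim, dim)
        self.norm_in = nn.LayerNorm(dim)
        self.norm_slots = nn.LayerNorm(dim)

    def forward(self, inputs):                 # inputs: (B, N, dim)
        b, n, d = inputs.shape
        inputs = self.norm_in(inputs)
        k, v = self.to_k(inputs), self.to_v(inputs)
        slots = self.mu.expand(b, -1, -1)
        for _ in range(self.iters):
            q = self.to_q(self.norm_slots(slots))
            # Softmax over slots: slots compete for each input feature.
            attn = (q @ k.transpose(1, 2) * self.scale).softmax(dim=1)
            # Normalize per slot so updates are weighted means over inputs.
            attn = attn / (attn.sum(dim=-1, keepdim=True) + 1e-8)
            updates = attn @ v                 # (B, num_slots, dim)
            slots = self.gru(updates.reshape(-1, d),
                             slots.reshape(-1, d)).view(b, -1, d)
        return slots

feats = torch.randn(2, 49, 64)                 # e.g. a 7x7 feature map, flattened
print(SlotAttention()(feats).shape)            # torch.Size([2, 4, 64])
```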

Read More