Transformers for Image Recognition at Scale

Yannic Kilcher explains why transformers are ruining convolutions. This paper, under review at ICLR, shows that given enough data, a standard Transformer can outperform Convolutional Neural Networks (CNNs) on image recognition, a task where CNNs have traditionally excelled. In this video, I explain the architecture of the Vision Transformer (ViT), the reason why it works […]
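The step that lets a standard Transformer read an image at all is turning the image into a sequence of tokens. ViT cuts the image into fixed-size square patches and flattens each patch into a vector. In the full model each vector is then linearly projected to the model width, a learnable class token is prepended, and position embeddings are added before a standard Transformer encoder processes the sequence. The sketch below covers only the patch-splitting part, in plain Python. The function name `image_to_patches`, the nested-list image layout, and the toy 4x4 image with a patch size of 2 are illustrative assumptions, not code from the paper or the video.

```python
def image_to_patches(image, patch_size):
    """Split an H x W x C image (nested lists) into a sequence of flattened patches.

    Hypothetical helper for illustration. Each patch covers patch_size x patch_size
    pixels and is flattened row by row, column by column, channel by channel, so the
    result is a list of N = (H // P) * (W // P) vectors, each of length P * P * C.
    """
    height = len(image)
    width = len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    patches = []
    # Walk the patch grid left to right, top to bottom (raster order).
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            flat = []
            for row in range(top, top + patch_size):
                for col in range(left, left + patch_size):
                    flat.extend(image[row][col])  # append all C channel values of this pixel
            patches.append(flat)
    return patches


# Toy 4x4 image with 3 channels; every channel of a pixel holds row * 4 + col.
toy = [[[r * 4 + c] * 3 for c in range(4)] for r in range(4)]
tokens = image_to_patches(toy, patch_size=2)
print(len(tokens), len(tokens[0]))  # 4 patches, each of length 2 * 2 * 3 = 12
```

With these toy inputs the call yields 4 tokens of length 12. From this point on, the Transformer treats the patches exactly like word tokens in a sentence, which is why no convolution is needed.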
