attention

AI Natural Language Processing

AlphaCode Explained: AI Code Generation

AlphaCode is DeepMind’s new massive language model for generating code. It is similar to OpenAI Codex, except that the accompanying paper provides a bit more analysis. The field of NLP within AI and ML has exploded, with more papers appearing all the time. Hopefully this video can help you understand how AlphaCode works […]
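
The core recipe the paper describes is large-scale sampling followed by filtering: generate a huge number of candidate programs, then keep only those that pass the problem's example tests (survivors are further clustered before submission). A minimal, hedged sketch of that sample-and-filter loop in Python; sample_program is a hypothetical stand-in for the real transformer so the sketch runs:

    import random

    def sample_program(rng):
        # Stand-in "model": emits a candidate program, here a guess at x -> x + 1.
        k = rng.randint(0, 3)
        return lambda x, k=k: x + k

    def solve(example_tests, num_samples=100, seed=0):
        rng = random.Random(seed)
        candidates = [sample_program(rng) for _ in range(num_samples)]
        # Filtering: a candidate survives only if it passes every example test.
        return [c for c in candidates
                if all(c(i) == o for i, o in example_tests)]

    survivors = solve([(1, 2), (5, 6)])  # example tests for an "add one" task
    print(len(survivors), "of 100 samples pass the example tests")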

Read More
AI Research

Explaining the Paper: Hopfield Networks is All You Need

Yannic Kilcher explains the paper “Hopfield Networks is All You Need.” Hopfield networks are one of the classic models of biological associative memory. This paper generalizes modern Hopfield Networks to continuous states and shows that the corresponding update rule is equal to the attention mechanism used in modern Transformers. It further analyzes a pre-trained BERT […]
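
The claimed equivalence is easy to check numerically: the modern Hopfield retrieval step xi_new = X softmax(beta X^T xi) has exactly the form of softmax attention, softmax(Q K^T / sqrt(d)) V, with the state pattern as the query and the stored patterns as keys and values. A minimal NumPy sketch (not the paper's code):

    import numpy as np

    def softmax(z, axis=-1):
        z = z - z.max(axis=axis, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=axis, keepdims=True)

    d, N = 8, 16                  # pattern dimension, number of stored patterns
    rng = np.random.default_rng(0)
    X = rng.normal(size=(d, N))   # stored patterns as columns (keys/values)
    xi = rng.normal(size=(d, 1))  # state pattern (the query)
    beta = 1.0 / np.sqrt(d)       # inverse temperature, matching attention's scaling

    # One Hopfield retrieval step: moves xi toward the closest stored pattern.
    xi_new = X @ softmax(beta * (X.T @ xi), axis=0)

    # The same computation written as attention with Q = xi^T and K = V = X^T.
    attn_out = softmax(beta * (xi.T @ X), axis=1) @ X.T
    assert np.allclose(xi_new.ravel(), attn_out.ravel())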

Read More
AI Machine Learning

Object-Centric Learning with Slot Attention

Visual scenes are often composed of sets of independent objects, yet current vision models make no assumptions about the nature of the pictures they look at. Yannic Kilcher explores a paper on object-centric learning. By imposing an objectness prior, this paper introduces a module that is able to recognize permutation-invariant sets of objects from pixels in […]
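
The distinctive detail of Slot Attention is that the attention weights are normalized over the slots rather than over the inputs, so slots compete to explain each input feature. A simplified NumPy sketch of one update step (the real module also uses learned projections, a GRU update, and LayerNorm, which this sketch omits):

    import numpy as np

    def softmax(z, axis=-1):
        z = z - z.max(axis=axis, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=axis, keepdims=True)

    def slot_attention_step(slots, inputs, eps=1e-8):
        # slots: (K, D) slot vectors; inputs: (N, D) input features.
        logits = inputs @ slots.T / np.sqrt(slots.shape[1])    # (N, K)
        attn = softmax(logits, axis=1)   # softmax over SLOTS -> competition
        attn = attn / (attn.sum(axis=0, keepdims=True) + eps)  # weighted mean over inputs
        return attn.T @ inputs           # (K, D) updated slots

    rng = np.random.default_rng(0)
    inputs = rng.normal(size=(64, 32))   # e.g. 64 pixel features of dimension 32
    slots = rng.normal(size=(4, 32))     # 4 randomly initialized slots
    for _ in range(3):                   # a few refinement iterations
        slots = slot_attention_step(slots, inputs)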

Read More
AI Natural Language Processing

GPT-3: Language Models are Few-Shot Learners

How far can you go with ONLY language modeling? Can a large enough language model perform NLP tasks out of the box? OpenAI takes on these and other questions by training a transformer that is an order of magnitude larger than anything built before, and the results are astounding. Yannic Kilcher […]
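
"Few-shot" here means the task examples live in the prompt itself and the model continues the pattern, with no gradient updates. A minimal illustration in the spirit of the paper's translation example (no real API call is made):

    # Few-shot prompting: the "training set" is just text in the context window.
    prompt = (
        "Translate English to French.\n"
        "sea otter => loutre de mer\n"
        "peppermint => menthe poivrée\n"
        "cheese =>"
    )
    # A sufficiently large language model completes this with " fromage"
    # purely from the in-context examples, without any fine-tuning.
    print(prompt)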

Read More
AI Natural Language Processing

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Yannic Kilcher investigates BERT and the paper associated with it: https://arxiv.org/abs/1810.04805 Abstract: We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations by jointly conditioning on both left and right context in all layers. As […]
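
The pre-training objective behind this bidirectionality is masked language modeling: hide a fraction of the tokens and train the model to recover them from both left and right context. A toy sketch of the masking step (BERT's full scheme also swaps some selected tokens for random or unchanged tokens, which this sketch omits):

    import random

    tokens = "the cat sat on the mat".split()
    random.seed(1)

    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if random.random() < 0.15:   # select ~15% of positions
            targets[i] = tok         # the model must predict the original token
            masked.append("[MASK]")
        else:
            masked.append(tok)

    print("input :", " ".join(masked))   # [MASK] cat sat on the mat
    print("labels:", targets)            # {0: 'the'}
    # Each [MASK] position sees context on both sides, which is what makes the
    # learned representations deeply bidirectional, unlike left-to-right LMs.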

Read More