
Scaling Transformer to 1M tokens and beyond with RMT (Paper Explained)
Yannic Kilcher explains this paper, which promises to scale transformers to 1 million tokens and beyond. We take a look at the technique behind it, the Recurrent Memory Transformer, and examine its strengths and weaknesses.
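
To make the core idea concrete, here is a minimal sketch of the segment-level recurrence behind the Recurrent Memory Transformer. This is not the paper's code; all names and sizes are illustrative assumptions, and the paper additionally places memory tokens at both ends of a segment for non-causal models, while this sketch uses a single prepended memory block for brevity. The long input is split into fixed-size segments, a small set of memory tokens is prepended to each segment, and the transformer's output at those memory positions is carried over as the memory input for the next segment.

```python
# Minimal sketch of RMT-style segment recurrence (illustrative, not the
# authors' implementation). A shared encoder processes one segment at a
# time; memory tokens pass information between segments.

import torch
import torch.nn as nn

d_model, n_mem, seg_len = 64, 4, 128  # assumed illustrative sizes

encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True),
    num_layers=2,
)
memory = nn.Parameter(torch.zeros(1, n_mem, d_model))  # learned initial memory

def rmt_forward(embeddings: torch.Tensor) -> torch.Tensor:
    """Process a long sequence of token embeddings segment by segment,
    passing memory tokens recurrently between segments."""
    batch = embeddings.size(0)
    mem = memory.expand(batch, -1, -1)
    outputs = []
    for start in range(0, embeddings.size(1), seg_len):
        segment = embeddings[:, start:start + seg_len]
        # Prepend the current memory tokens to the segment.
        x = torch.cat([mem, segment], dim=1)
        y = encoder(x)
        # The transformed memory positions become the next segment's memory.
        mem = y[:, :n_mem]
        outputs.append(y[:, n_mem:])
    return torch.cat(outputs, dim=1)

# Usage: a "long" sequence of 4 segments flows through one shared encoder.
long_seq = torch.randn(2, 4 * seg_len, d_model)
out = rmt_forward(long_seq)
print(out.shape)  # torch.Size([2, 512, 64])
```

The key property this illustrates: per-segment attention cost stays constant, so the total cost grows linearly with sequence length, while information from earlier segments can only reach later ones through the fixed-size memory, which is both the technique's strength and its bottleneck.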