Yannic Kilcher explains this paper, which promises to scale transformers to 1 million tokens and beyond. We take a look at the technique behind it, the Recurrent Memory Transformer, and examine its strengths and weaknesses.