deep learning tutorial

AI Neural Networks Research

Retentive Network: A Successor to Transformer for Large Language Models (Paper Explained)

This video is from Yannic Kilcher. Retention is an alternative to attention in Transformers that can be written in both a parallel and a recurrent form. This means the architecture achieves training parallelism while maintaining low-cost inference. Experiments in the paper look very promising. Paper: https://arxiv.org/abs/2307.08621
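
The claim that one operation has both a parallel and a recurrent form can be checked with a small sketch. It assumes a simplified single-head retention with a fixed decay `gamma`, and it leaves out the position rotation and normalization used in the paper. The function names and toy inputs are illustrative and do not come from the video.

```python
def retention_parallel(Q, K, V, gamma):
    """Masked-matrix form: output i sums gamma**(i-j) * (q_i . k_j) * v_j over j <= i.

    Each output row depends only on the inputs, so all rows could be
    computed at once; plain loops are used here for readability.
    """
    d_v = len(V[0])
    out = []
    for i in range(len(Q)):
        row = [0.0] * d_v
        for j in range(i + 1):  # causal mask: only positions up to i
            score = sum(q * k for q, k in zip(Q[i], K[j])) * gamma ** (i - j)
            for c in range(d_v):
                row[c] += score * V[j][c]
        out.append(row)
    return out


def retention_recurrent(Q, K, V, gamma):
    """Recurrent form: a d_k x d_v state is decayed and updated once per token."""
    d_k, d_v = len(K[0]), len(V[0])
    S = [[0.0] * d_v for _ in range(d_k)]
    out = []
    for q, k, v in zip(Q, K, V):
        # S_n = gamma * S_{n-1} + k_n^T v_n
        S = [[gamma * S[a][c] + k[a] * v[c] for c in range(d_v)] for a in range(d_k)]
        # o_n = q_n S_n
        out.append([sum(q[a] * S[a][c] for a in range(d_k)) for c in range(d_v)])
    return out


# Toy inputs (made up for illustration): 3 tokens, key size 2, value size 2.
Q = [[1.0, 0.0], [0.5, 1.0], [0.0, 2.0]]
K = [[1.0, 1.0], [0.0, 1.0], [2.0, 0.5]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

parallel_out = retention_parallel(Q, K, V, gamma=0.9)
recurrent_out = retention_recurrent(Q, K, V, gamma=0.9)
max_diff = max(
    abs(p - r)
    for p_row, r_row in zip(parallel_out, recurrent_out)
    for p, r in zip(p_row, r_row)
)
```

The recurrent version keeps only a fixed-size state per step, which is where the low-cost inference comes from, while the masked form has no step-to-step dependency and therefore suits parallel training.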

AI Deep Learning Large Language Models

Are Retentive Networks A Successor to Transformer for Large Language Models?

Retention is an alternative to attention in Transformers that can be written in both a parallel and a recurrent form. This means the architecture achieves training parallelism while maintaining low-cost inference. Experiments in the paper look very promising. Yannic Kilcher elaborates.

Natural Language Processing Research

LLaMA: Open and Efficient Foundation Language Models (Paper Explained)

Large Language Models (LLMs) are all the rage right now. ChatGPT is the LLM everyone talks about, but there are others. With the attention (and money) that OpenAI is getting, expect more of them. LLaMA is a series of large language models from 7B to 65B parameters, trained by Meta AI. They train for longer […]

AI Generative AI Natural Language Processing

ChatGPT: This AI has a JAILBREAK?!

Yannic explores ChatGPT and discovers that it has a JailBreak?! ChatGPT, OpenAI’s newest model, is a GPT-3 variant that has been fine-tuned using Reinforcement Learning from Human Feedback, and it is taking the world by storm!

AI Mathematics

This is a game changer! (AlphaTensor by DeepMind explained)

Matrix multiplication is the most used mathematical operation in all of science and engineering. Speeding this up has massive consequences. Thus, over the years, this operation has become more and more optimized. A fascinating discovery was made when it was shown that one actually needs fewer than N^3 multiplication operations to multiply two NxN matrices. […]
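
The best-known instance of needing fewer than N^3 multiplications is Strassen's scheme for 2x2 matrices, which uses 7 scalar multiplications instead of 8. The sketch below shows that classic scheme only, as an illustration of the idea; it is not one of the algorithms found by AlphaTensor, and the function name is made up for this example.

```python
def strassen_2x2(A, B):
    """Multiply two 2x2 matrices using 7 scalar multiplications (Strassen)."""
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B

    # The seven products; everything else is addition and subtraction.
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)

    return [
        [m1 + m4 - m5 + m7, m3 + m5],
        [m2 + m4, m1 - m2 + m3 + m6],
    ]


product = strassen_2x2([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(product)  # [[19, 22], [43, 50]]
```

Applied recursively to matrix blocks rather than scalars, saving one multiplication per 2x2 step is what brings the overall count below N^3.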

AI Hardware Research

How to make your CPU as fast as a GPU – Advances in Sparsity w/ Nir Shavit

Sparsity is awesome, but only recently has it become possible to run sparse models with good performance. Neural Magic does exactly this, using a plain CPU. No specialized hardware needed, just clever algorithms for pruning and forward-propagation of neural networks. Nir Shavit and I talk about how this is possible, what it means in […]

Generative AI

[ML News] Stable Diffusion Takes Over! (Open Source AI Art)

Stable Diffusion has been released and is riding a wave of creativity and collaboration. But not everyone is happy about this — especially artists!
