
Transformers are all the rage nowadays, but how do they work?
This video demystifies the novel neural network architecture with a step-by-step explanation and illustrations of how transformers work.
CORRECTIONS:
The sine and cosine functions of the positional encoding are actually applied across both the embedding dimensions and the time steps!
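For anyone who wants to see that correction in code, here is a minimal sketch of the sinusoidal positional encoding, assuming the standard formulation from the original transformer paper: even embedding dimensions use sine, odd ones use cosine, and each pair shares a frequency that depends on the dimension index. The function name `positional_encoding` and the parameter names are illustrative choices, not taken from the video.

```python
import math

def positional_encoding(num_steps, d_model, base=10000.0):
    """Return a num_steps x d_model table of sinusoidal position encodings."""
    table = []
    for pos in range(num_steps):        # time step (token position)
        row = []
        for dim in range(d_model):      # embedding dimension
            # Dimensions 2i and 2i+1 share one frequency: base ** (2i / d_model).
            angle = pos / base ** ((dim - dim % 2) / d_model)
            # Even dimensions take the sine, odd dimensions the cosine.
            row.append(math.sin(angle) if dim % 2 == 0 else math.cos(angle))
        table.append(row)
    return table

# Hypothetical sizes chosen only for illustration.
pe = positional_encoding(num_steps=4, d_model=6)
for pos, row in enumerate(pe):
    print(pos, [round(value, 3) for value in row])
```

Every entry depends on both the time step (`pos`) and the embedding dimension (`dim`), which is the point of the correction. In practice this table is added element-wise to the token embeddings before the first attention layer.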