If you’re a reader of this blog, then you know that I have mentioned AlphaGo Zero before on a few occasions. However, I think this video by the incomparable Siraj Raval explains it best. Watch it to get a technical overview of its neural components.

In case you didn’t already know, DeepMind’s AlphaGo Zero algorithm learned Go entirely through self-play and went on to beat the earlier AlphaGo versions that had defeated the world’s best human players. It played against itself repeatedly, getting better over time with no human gameplay input. AlphaGo Zero was a remarkable moment in AI history, a moment that will always be remembered.

For the next 10 weeks, Siraj Raval is going to go from the basics to the state of the art of reinforcement learning (RL), a popular subfield of machine learning, using video game environments as the testbed.

RL is a huge reason DeepMind and OpenAI have been so successful thus far in creating world-changing AI bots.