In case you didn’t know, I write a monthly column on AI for MSDN Magazine called “Artificially Intelligent.”

In the last two articles, I covered one of the most exciting topics in AI these days: reinforcement learning.

Here are snippets from, and links to, the full articles on MSDN.

Introduction to Reinforcement Learning

In previous articles, I’ve mentioned both supervised learning and unsupervised learning algorithms. Beyond these two methods of machine learning lies another type: Reinforcement Learning (RL). Formally defined, RL is a computational approach to goal-oriented learning through interaction with the environment under ideal learning conditions.
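That definition boils down to a simple loop: an agent observes a state, picks an action, and the environment responds with a new state and a reward. A minimal sketch of that loop, using a hypothetical toy environment (the names and rewards here are illustrative, not from the article):

```python
import random

def step(state, action):
    """Hypothetical environment: four cells (0-3), reward 1 for landing on cell 3."""
    next_state = max(0, min(3, state + action))
    reward = 1 if next_state == 3 else 0
    return next_state, reward

random.seed(0)
state, total_reward = 0, 0
for _ in range(20):
    action = random.choice([-1, 1])  # a purely random policy: no learning yet
    state, reward = step(state, action)
    total_reward += reward
print("total reward:", total_reward)
```

Everything interesting in RL happens in how the agent chooses `action` based on the rewards it has seen so far.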

Like other aspects of AI, many of the algorithms and approaches actively used today trace their origins back to the 1980s. With the advent of inexpensive storage and on-demand compute power, reinforcement learning techniques have re-emerged.

Read more

A Closer Look at Reinforcement Learning

In last month’s column, I explored a few basic concepts of reinforcement learning (RL), first trying a strictly random approach to navigating a simple environment and then implementing a Q-Table to remember both past actions and which actions led to which rewards. In the demo, an agent acting randomly reached the goal state approximately 1 percent of the time, while an agent using a Q-Table to remember previous actions succeeded roughly half the time. However, this experiment only scratched the surface of the promising and expanding field of RL.
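The Q-Table idea can be sketched in a few lines. This is not the column’s demo code, just a minimal illustration on a hypothetical 1-D corridor (start at cell 0, goal at cell 4, actions left/right); the hyperparameters and environment are my own assumptions:

```python
import random

GOAL, MAX_STEPS = 4, 10

def run_episode(policy):
    """Run one episode; return True if the agent reaches the goal in time."""
    state = 0
    for _ in range(MAX_STEPS):
        state = max(0, state + (1 if policy(state) == 1 else -1))
        if state == GOAL:
            return True
    return False

def train_q_table(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1):
    """Tabular Q-learning: q[state][action] estimates expected future reward."""
    q = [[0.0, 0.0] for _ in range(GOAL + 1)]
    for _ in range(episodes):
        state = 0
        for _ in range(MAX_STEPS):
            # Epsilon-greedy: mostly exploit the table, occasionally explore.
            if random.random() < epsilon:
                action = random.randint(0, 1)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            nxt = max(0, state + (1 if action == 1 else -1))
            reward = 1.0 if nxt == GOAL else 0.0
            # Q-learning update: move toward reward + discounted best next value.
            q[state][action] += alpha * (reward + gamma * max(q[nxt]) - q[state][action])
            state = nxt
            if state == GOAL:
                break
    return q

random.seed(0)
q = train_q_table()
greedy = lambda s: 0 if q[s][0] > q[s][1] else 1
rand_policy = lambda s: random.randint(0, 1)

rand_rate = sum(run_episode(rand_policy) for _ in range(1000)) / 1000
q_rate = sum(run_episode(greedy) for _ in range(1000)) / 1000
print(f"random: {rand_rate:.0%}, q-table: {q_rate:.0%}")
```

In this much simpler environment the gap is even starker than the 1-percent-versus-half split from the column: the trained table steers the agent straight to the goal, while the random agent succeeds only when a coin-flip walk happens to drift the right way.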

Read more

By leveraging powerful prior knowledge about how the world works, humans can quickly figure out efficient strategies in new and unseen environments.

Currently, even state-of-the-art Reinforcement Learning algorithms typically don’t have strong priors, and building them in is one of the fundamental challenges in current research on Transfer Learning.

Related Links

If you’re a reader of this blog, then you know that I have mentioned AlphaGo Zero before on a few occasions. However, I think this video by the incomparable Siraj Raval explains it best. Watch it for a technical overview of AlphaGo Zero’s neural components.

In case you didn’t already know, DeepMind’s AlphaGo Zero algorithm beat the best Go player in the world after training entirely through self-play. It played against itself repeatedly, improving over time with no human gameplay input. AlphaGo Zero was a remarkable moment in AI history, a moment that will always be remembered.