Microsoft Research recently had Warren Powell speak on Sequential Decision Analytics: A unified framework.

Warren Powell is Professor Emeritus at Princeton University and Chief Analytics Officer of Optimal Dynamics. He’s also Founder and Director of Castle Labs at Princeton, which manages over 70 grants and contracts with government agencies and leading companies working to develop models of algorithms, freight logistics, energy systems, and other industries.

He’s created a new field called sequential decision analytics which he covers in this talk and in his new book: Reinforcement Learning and Stochastic Optimization: A unified framework for sequential decisions.

Reinforcement learning is one of the most exciting collections of techniques for building self-learning systems.

Over the past five years, we’ve seen RL successfully meet such challenges as exceeding human performance on popular video games and board games.

Despite the excitement around the success of these RL agents, it has remained extraordinarily difficult for most people to build their own RL agents.

In this webinar, Microsoft researcher Kristian Holsheimer guides you through the landscape of RL agents and shows you how to build your own custom agent in just a few lines of code and without breaking a sweat.

Steve Brunton explains reinforcement learning.

Reinforcement learning is a powerful technique at the intersection of machine learning and control theory, and it is inspired by how biological systems learn to interact with their environment. In this video, we provide a high level overview of reinforcement learning, along with leading algorithms and impressive applications.

MSR’s New York City lab is home to some of the best reinforcement learning research on the planet but if you ask any of the researchers, they’ll tell you they’re very interested in getting it out of the lab and into the real world.

One of those researchers is Dr. Akshay Krishnamurthy and today, he explains how his work on feedback-driven data collection and provably efficient reinforcement learning algorithms is helping to move the RL needle in the real-world direction.

Are you curious how data scientists and researchers train agents that make decisions?

Learn how to use reinforcement learning to optimize decision making using Azure Machine Learning. We show you how to get started.

Time Index:

  • [00:36] – What is reinforcement learning?
  • [01:37] – How do reinforcement learning algorithms work?
  • [04:10] – Reinforcement Learning on Azure – Notebook sample
  • [05:17] – Reinforcement Learning Estimator
  • [07:21] – Sample training Python script
  • [09:06] – Training Result
  • [10:15] – What kind of problems can you solve with reinforcement learning?

Reinforcement Learning (RL) uses a “trial and error” method and interacts with the environment to learn an optimal policy for gaining maximum rewards by making the right decisions.

It is one of the most popular machine learning techniques among organizations to develop solutions like recommender systems, healthcare, robotics, and many more.

Analytics India Magazine has compiled a list of the top 10 free resources to learn RL.

Lex Fridman interviews David Silver for the Artificial Intelligence podcast.

David Silver leads the reinforcement learning research group at DeepMind. He was lead researcher on AlphaGo and AlphaZero, co-lead on AlphaStar and MuZero, and has done a lot of other important work in reinforcement learning.

Time Index:

  • 0:00 – Introduction
  • 4:09 – First program
  • 11:11 – AlphaGo
  • 21:42 – Rules of the game of Go
  • 25:37 – Reinforcement learning: personal journey
  • 30:15 – What is reinforcement learning?
  • 43:51 – AlphaGo (continued)
  • 53:40 – Supervised learning and self play in AlphaGo
  • 1:06:12 – Lee Sedol retirement from Go play
  • 1:08:57 – Garry Kasparov
  • 1:14:10 – AlphaZero and self play
  • 1:31:29 – Creativity in AlphaZero
  • 1:35:21 – AlphaZero applications
  • 1:37:59 – Reward functions
  • 1:40:51 – Meaning of life

Machine Learning with Phil explores reinforcement learning with SARSA in this video.

While Q-learning is a powerful algorithm, SARSA is equally powerful for many environments in the OpenAI Gym. In this complete reinforcement learning tutorial, I’ll show you how to code an n-step SARSA agent from scratch.

n-step temporal difference learning is a sort of unifying theory of reinforcement learning that bridges the gap between Monte Carlo methods and temporal difference learning. We extend the agent’s horizon from a single step to n steps, and in the limit where n reaches the episode length, we end up with Monte Carlo methods. For n = 1, we have vanilla temporal difference learning.
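To make the two limits concrete, here is the n-step return for SARSA written out in the notation of Sutton and Barto’s textbook. The symbols are not defined in the post itself, so treat them as assumptions: R is the reward, γ the discount factor, Q the action-value estimate, and T the time step at which the episode ends.

```latex
G_{t:t+n} = R_{t+1} + \gamma R_{t+2} + \cdots + \gamma^{n-1} R_{t+n}
            + \gamma^{n} Q(S_{t+n}, A_{t+n}), \qquad t + n < T

G_{t:t+n} = R_{t+1} + \gamma R_{t+2} + \cdots + \gamma^{T-t-1} R_{T}, \qquad t + n \ge T

% n = 1 recovers the one-step SARSA target:
G_{t:t+1} = R_{t+1} + \gamma Q(S_{t+1}, A_{t+1})
```

When n is at least as long as the rest of the episode, the bootstrap term drops out and only the observed rewards remain, which is the Monte Carlo return.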

We’ll implement the n-step SARSA algorithm directly from Sutton and Barto’s excellent reinforcement learning textbook, and use it to balance the cartpole from the OpenAI Gym.
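For readers who want to see the shape of the algorithm before watching, here is a minimal tabular sketch of n-step SARSA. It is not the code from the video: to stay dependency-free it swaps the cartpole for a hypothetical five-state chain (the `step` function below), and the hyperparameters are illustrative guesses.

```python
import random

N_STATES = 5            # hypothetical toy chain: states 0..4, state 4 is terminal
LEFT, RIGHT = 0, 1

def step(state, action):
    """Toy dynamics: move left or right; reaching the last state pays 1 and ends the episode."""
    next_state = max(0, state - 1) if action == LEFT else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def epsilon_greedy(q, state, epsilon, rng):
    """Random action with probability epsilon, otherwise greedy with random tie-breaking."""
    if rng.random() < epsilon:
        return rng.choice((LEFT, RIGHT))
    best = max(q[state])
    return rng.choice([a for a in (LEFT, RIGHT) if q[state][a] == best])

def n_step_sarsa(n=3, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        states = [0]
        actions = [epsilon_greedy(q, 0, epsilon, rng)]
        rewards = [0.0]             # rewards[i] holds R_i; index 0 is a placeholder
        T = float("inf")            # episode length, unknown until termination
        t = 0
        while True:
            if t < T:
                next_state, reward, done = step(states[t], actions[t])
                states.append(next_state)
                rewards.append(reward)
                if done:
                    T = t + 1
                else:
                    actions.append(epsilon_greedy(q, next_state, epsilon, rng))
            tau = t - n + 1         # time step whose estimate is updated now
            if tau >= 0:
                last = min(tau + n, T)
                # Discounted sum of the (up to) n rewards observed after step tau
                G = sum(gamma ** (i - tau - 1) * rewards[i]
                        for i in range(tau + 1, last + 1))
                # Bootstrap from Q only if the episode did not end within n steps
                if tau + n < T:
                    G += gamma ** n * q[states[tau + n]][actions[tau + n]]
                s, a = states[tau], actions[tau]
                q[s][a] += alpha * (G - q[s][a])
            if tau == T - 1:
                break
            t += 1
    return q

if __name__ == "__main__":
    q = n_step_sarsa()
    for s in range(N_STATES - 1):
        print(s, [round(v, 3) for v in q[s]])
```

Setting `n=1` gives ordinary one-step SARSA, while a large `n` makes every update use the full observed return, matching the Monte Carlo limit described above.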

Machine Learning with Phil has a great tutorial on how to do Deep Q Learning in PyTorch.

The PyTorch deep learning framework makes coding a deep Q-learning agent in Python easier than ever. We’re going to code up the simplest possible deep Q-learning agent, and show that we only need a replay memory to get some serious results in the Lunar Lander environment from the OpenAI Gym. We don’t really need the target network, though it has been known to help the deep Q-learning algorithm with convergence.
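As a rough idea of what a replay memory involves, here is a small standard-library sketch. It is not the implementation from the video and leaves out PyTorch and the network entirely; the names `ReplayMemory`, `Transition`, and `td_targets` are made up for illustration, and `next_q_values` stands in for whatever the Q-network would predict for each next state.

```python
import random
from collections import deque, namedtuple

# One stored interaction with the environment (field names are illustrative)
Transition = namedtuple("Transition", ["state", "action", "reward", "next_state", "done"])

class ReplayMemory:
    """Fixed-size buffer that forgets the oldest transitions first."""

    def __init__(self, capacity, seed=None):
        self.buffer = deque(maxlen=capacity)   # deque drops old items automatically
        self.rng = random.Random(seed)

    def store(self, state, action, reward, next_state, done):
        self.buffer.append(Transition(state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks up correlated consecutive steps
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)

def td_targets(batch, next_q_values, gamma):
    """Q-learning targets r + gamma * max_a Q(s', a), with no bootstrap on terminal steps."""
    return [
        t.reward + (0.0 if t.done else gamma * max(q_next))
        for t, q_next in zip(batch, next_q_values)
    ]
```

In an agent without a target network, the same network would supply both the current estimates and `next_q_values`; a target network would only change where `next_q_values` comes from.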