Schedule and Syllabus


This course meets on Mondays (3:00pm-4:55pm) and Tuesdays (3:00pm-3:55pm).
Note: SB = "Reinforcement Learning: An Introduction", Richard S. Sutton and Andrew G. Barto, 2nd Edition Link
Note: PSRPEE = "Probability, Statistics, and Random Processes for Electrical Engineering", 3rd Edition, Alberto Leon-Garcia Link
Note: MLAPP = "Machine Learning: A Probabilistic Perspective", Kevin P. Murphy Link
Lecture 1 Tuesday
Sep 01
Introduction
Course logistics and overview. Origin and history of reinforcement learning research, and its connections with related fields and with different branches of machine learning.
SB: Chapter 1 [slides (pptx)]
Lecture 2 Saturday
Sep 05
Lecture 3 Monday
Sep 07
Probability Primer
Review of probability concepts: axioms of probability, random variables, PMFs, PDFs, CDFs, and expectation. Joint and multiple random variables; joint, conditional, and marginal distributions. Correlation and independence. (See the sketch below.)
PSRPEE [slides (pdf)]
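As a quick refresher on these definitions, here is a minimal sketch; the joint PMF and all numbers are made up for illustration and are not course material:

    # A toy discrete example of the primer's definitions: joint PMF,
    # marginals, a conditional distribution, expectation, and correlation.
    import numpy as np

    # Hypothetical joint PMF p(x, y) over X in {0, 1}, Y in {0, 1, 2}.
    pxy = np.array([[0.10, 0.20, 0.10],
                    [0.20, 0.25, 0.15]])
    assert np.isclose(pxy.sum(), 1.0)

    px = pxy.sum(axis=1)            # marginal PMF of X
    py = pxy.sum(axis=0)            # marginal PMF of Y
    py_given_x0 = pxy[0] / px[0]    # conditional PMF p(y | X = 0)

    xs, ys = np.array([0, 1]), np.array([0, 1, 2])
    ex = (xs * px).sum()                   # E[X]
    ey = (ys * py).sum()                   # E[Y]
    exy = (np.outer(xs, ys) * pxy).sum()   # E[XY]
    cov = exy - ex * ey                    # Cov(X, Y); zero iff uncorrelated

    print(px, py, py_given_x0, cov)
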
Lecture 4 Monday
Sep 14
Markov Decision Process
Introduction to RL terminology, the Markov property, Markov chains, and Markov reward processes (MRPs). Derivation of the Bellman equations for MRPs, along with a proof that a solution to the Bellman equations of an MRP exists. Introduction to Markov decision processes (MDPs), state and action value functions, Bellman expectation equations, optimality of value functions and policies, and Bellman optimality equations. (See the sketch below.)
SB: Chapter 3 [slides (pdf)]
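A minimal sketch of the closed-form solution to the MRP Bellman equation, v = (I - gamma P)^{-1} r, which exists because (I - gamma P) is invertible for gamma < 1, as proved in lecture. The 3-state MRP below (transition matrix, rewards, discount) is invented for illustration:

    # Solve the Bellman equation of a hypothetical 3-state MRP in closed form.
    import numpy as np

    P = np.array([[0.5, 0.5, 0.0],
                  [0.0, 0.5, 0.5],
                  [0.0, 0.0, 1.0]])   # state 2 is absorbing
    r = np.array([1.0, 2.0, 0.0])     # expected immediate reward per state
    gamma = 0.9

    # v = r + gamma * P v  =>  (I - gamma * P) v = r
    v = np.linalg.solve(np.eye(3) - gamma * P, r)
    print("State values:", v)
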
Lecture 5 Tuesday
Sep 15
Lecture 6 Monday
Sep 21
Prediction and Control by Dynamic Programming
Overview of dynamic programming for MDPs: definition and formulation of planning in MDPs, the principle of optimality, iterative policy evaluation, policy iteration, value iteration, the Banach fixed-point theorem, proof of the contraction-mapping property of the Bellman expectation and optimality operators, proof of convergence of the policy evaluation and value iteration algorithms, and DP extensions. (See the sketch below.)
SB: Chapter 4 [slides (pdf)]
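A minimal sketch of value iteration, i.e., repeated application of the Bellman optimality operator, whose contraction property guarantees convergence. The toy MDP (model P, rewards R, all constants) is randomly generated for illustration and is not from the course:

    # Value iteration on a randomly generated toy MDP.
    import numpy as np

    n_states, n_actions, gamma = 3, 2, 0.9
    rng = np.random.default_rng(0)

    # Hypothetical tabular model: P[a, s, s'] transition probs, R[a, s] rewards.
    P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
    R = rng.standard_normal((n_actions, n_states))

    v = np.zeros(n_states)
    for _ in range(1000):
        # Bellman optimality operator: (Tv)(s) = max_a [R(s,a) + gamma * sum_s' P v]
        q = R + gamma * P @ v                   # q[a, s]
        v_new = q.max(axis=0)
        if np.max(np.abs(v_new - v)) < 1e-8:    # contraction => convergence
            break
        v = v_new

    policy = q.argmax(axis=0)   # greedy policy w.r.t. the converged values
    print(v, policy)
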
Lecture 7 Tuesday
Sep 22
Lecture 8 Monday
Sep 28
Lecture 9 Tuesday
Sep 29
Lecture 10 Monday
Oct 05
Monte Carlo Methods for Model-Free Prediction and Control
Overview of Monte Carlo methods for model-free RL: first-visit and every-visit Monte Carlo, Monte Carlo control, on-policy and off-policy learning, and importance sampling. (See the sketch below.)
SB: Chapter 5 [slides (pdf)]
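A minimal sketch of first-visit Monte Carlo prediction: average the return that follows the first visit to each state across many sampled episodes. The two-state episodic environment and all constants are hypothetical:

    # First-visit Monte Carlo prediction on a made-up two-state chain.
    import random
    from collections import defaultdict

    gamma = 1.0

    def sample_episode():
        """Hypothetical episodic chain: loop in state 0 with prob 0.3,
        otherwise move to state 1, which ends the episode; reward 1 per step."""
        episode, s = [], 0
        while True:
            if s == 0:
                s_next = 0 if random.random() < 0.3 else 1
                episode.append((0, 1.0))
            else:
                episode.append((1, 1.0))
                break
            s = s_next
        return episode          # list of (state, reward) pairs

    returns, counts = defaultdict(float), defaultdict(int)
    for _ in range(10_000):
        episode = sample_episode()
        g, g_after = 0.0, {}
        for s, rew in reversed(episode):   # compute returns backwards
            g = rew + gamma * g
            g_after[s] = g                 # last write = FIRST forward visit
        for s, g_first in g_after.items():
            returns[s] += g_first
            counts[s] += 1

    v = {s: returns[s] / counts[s] for s in returns}
    print(v)   # estimated state values
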
Lecture 11 Tuesday
Oct 06
Lecture 12 Monday
Oct 12
TD Methods
Incremental Monte Carlo methods for model-free prediction; overview of TD(0), TD(1), and TD(λ); k-step estimators; a unified view of DP, MC, and TD evaluation methods; TD control methods: SARSA, Q-learning, and their variants. (See the sketch below.)
SB: Chapter 6 [slides (pdf)]
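A minimal sketch of tabular Q-learning, one of the off-policy TD control methods above. The 1-D corridor environment and all hyperparameters are invented for illustration:

    # Tabular Q-learning on a hypothetical corridor: move left/right,
    # reward 1 on reaching the goal state.
    import numpy as np

    n_states, goal = 5, 4            # states 0..4; episode ends at the goal
    alpha, gamma, eps = 0.1, 0.95, 0.1
    rng = np.random.default_rng(0)
    Q = np.zeros((n_states, 2))      # actions: 0 = left, 1 = right

    for _ in range(2000):
        s = 0
        while s != goal:
            # epsilon-greedy behaviour policy
            a = rng.integers(2) if rng.random() < eps else int(Q[s].argmax())
            s_next = max(s - 1, 0) if a == 0 else s + 1
            r = 1.0 if s_next == goal else 0.0
            # Q-learning update: bootstrap from the greedy next-state value
            target = r + (0.0 if s_next == goal else gamma * Q[s_next].max())
            Q[s, a] += alpha * (target - Q[s, a])
            s = s_next

    print(np.round(Q, 2))   # right-moving actions should dominate
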
Lecture 13 Tuesday
Oct 13
Lecture 14 Monday
Oct 19
Lecture 15 Monday
Nov 02
Function Approximation Methods
Introduction to function approximation methods: revisiting risk minimization and gradient descent from machine learning, gradient MC and semi-gradient TD(0) algorithms, eligibility traces for function approximation, afterstates, control with function approximation, least-squares methods, and experience replay in deep Q-networks. (See the sketch below.)
SB: Chapters 9, 10, 6 [slides (pdf)]
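A minimal sketch of semi-gradient TD(0) with a linear approximator: the value estimate is v(s) = w . x(s), and the gradient is taken only through the estimate, not through the bootstrapped target, hence "semi-gradient". The feature map, dynamics, and constants are made up for illustration:

    # Semi-gradient TD(0) with linear features on a hypothetical random walk.
    import numpy as np

    n_states, gamma, alpha = 10, 0.9, 0.05
    rng = np.random.default_rng(0)

    def features(s):
        """Hypothetical feature map: one-hot state encoding plus a bias term."""
        x = np.zeros(n_states + 1)
        x[s], x[-1] = 1.0, 1.0
        return x

    w = np.zeros(n_states + 1)
    for _ in range(5000):
        s = rng.integers(n_states)
        # Hypothetical dynamics: random neighbour; reward equals the state index.
        s_next = (s + rng.choice([-1, 1])) % n_states
        r = float(s)
        # TD error with bootstrapped target r + gamma * v(s');
        # the update differentiates only v(s) = w . features(s).
        td_error = r + gamma * w @ features(s_next) - w @ features(s)
        w += alpha * td_error * features(s)

    print(np.round(w, 2))
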
Lecture 16 Tuesday
Nov 03
Lecture 17 Monday
Nov 09
Policy Gradients
Introduction to policy gradient methods: the log-derivative trick, the REINFORCE algorithm, bias and variance in reinforcement learning, variance reduction for policy gradient estimates, baselines, the advantage function, and actor-critic methods. (See the sketch below.)
DeepRL course (Sergey Levine), OpenAI Spinning Up [slides (pdf)]
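A minimal sketch of REINFORCE with a running-mean baseline on a hypothetical two-armed bandit: gradient ascent using the log-derivative trick, grad log pi(a) * (G - baseline), where subtracting the baseline reduces variance without introducing bias. Arm means and step sizes are invented for illustration:

    # REINFORCE with a running-mean baseline on a two-armed bandit.
    import numpy as np

    rng = np.random.default_rng(0)
    theta = np.zeros(2)                 # softmax preferences over the two arms
    alpha, baseline = 0.1, 0.0
    true_means = np.array([0.2, 0.8])   # arm 1 is better

    for t in range(1, 2001):
        pi = np.exp(theta - theta.max())
        pi /= pi.sum()                      # softmax policy
        a = rng.choice(2, p=pi)
        g = rng.normal(true_means[a], 1.0)  # sampled return for the pulled arm
        baseline += (g - baseline) / t      # running mean of returns
        # Log-derivative trick for softmax: grad log pi(a) = onehot(a) - pi
        grad_log_pi = -pi
        grad_log_pi[a] += 1.0
        theta += alpha * (g - baseline) * grad_log_pi

    pi = np.exp(theta - theta.max())
    pi /= pi.sum()
    print(pi)   # probability mass should concentrate on the better arm
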
Lecture 18 Tuesday
Nov 10