Schedule and Syllabus


This course meets on Thursdays (3:00pm-4:55pm) and Fridays (3:00pm-3:55pm).
Note: SB = "Reinforcement Learning: An Introduction", Richard S. Sutton and Andrew G. Barto, 2nd Edition
Note: PSRPEE = "Probability, Statistics, and Random Processes for Electrical Engineering", 3rd Edition, Alberto Leon-Garcia
Note: MLAPP = "Machine Learning: A Probabilistic Perspective", Kevin P. Murphy
Lecture 1: Thursday, Aug 12
Introduction
Course logistics and overview. Origin and history of reinforcement learning research, and its connections with related fields and with different branches of machine learning.
SB: Chapter 1 [slides (pptx)]
Lecture 2: Friday, Aug 13
Probability Primer
Review of probability concepts: axioms of probability, random variables, PMFs, PDFs, CDFs, and expectation. Joint and multiple random variables; joint, conditional, and marginal distributions. Correlation and independence.
PSRPEE [slides (pdf)]
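As a concrete illustration of these ideas (a minimal sketch, not part of the course materials; the joint PMF below is made up), marginals, conditionals, expectations, and covariance can be computed directly from a joint PMF with NumPy:

    import numpy as np

    # Illustrative joint PMF of two binary random variables X and Y;
    # rows index values of X, columns index values of Y.
    p_xy = np.array([[0.10, 0.20],
                     [0.30, 0.40]])
    x_vals = np.array([0.0, 1.0])
    y_vals = np.array([0.0, 1.0])

    p_x = p_xy.sum(axis=1)             # marginal PMF of X
    p_y = p_xy.sum(axis=0)             # marginal PMF of Y
    p_y_given_x0 = p_xy[0] / p_x[0]    # conditional PMF P(Y | X = 0)

    e_x = (x_vals * p_x).sum()                       # E[X]
    e_y = (y_vals * p_y).sum()                       # E[Y]
    e_xy = (np.outer(x_vals, y_vals) * p_xy).sum()   # E[XY]
    cov_xy = e_xy - e_x * e_y                        # Cov(X, Y); zero iff uncorrelated
    print(p_x, p_y_given_x0, cov_xy)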
Lecture 3: Friday, Aug 20
Markov Decision Process
Introduction to RL terminology, the Markov property, Markov chains, and Markov reward processes (MRPs). Introduction to and proof of the Bellman equations for MRPs, along with a proof of existence of a solution to the Bellman equations for an MRP. Introduction to Markov decision processes (MDPs), state and action value functions, Bellman expectation equations, optimality of value functions and policies, and Bellman optimality equations.
SB: Chapter 3 [slides (pdf)]
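For a finite MRP the Bellman equation v = r + γPv is a linear system, so the existence result discussed in lecture can be checked numerically by solving it directly (a minimal sketch; the 3-state transition matrix P and rewards r below are made up):

    import numpy as np

    # Illustrative 3-state Markov reward process; state 2 is absorbing.
    P = np.array([[0.5, 0.5, 0.0],
                  [0.1, 0.6, 0.3],
                  [0.0, 0.0, 1.0]])   # P[s, s'] = transition probability
    r = np.array([1.0, 2.0, 0.0])     # expected immediate reward in each state
    gamma = 0.9

    # Bellman equation for an MRP: v = r + gamma * P v. For gamma < 1 the
    # matrix (I - gamma P) is invertible, so a unique solution exists.
    v = np.linalg.solve(np.eye(3) - gamma * P, r)
    print(v)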
Lecture 4: Thursday, Aug 26
Lecture 5: Friday, Aug 27
Lecture 6: Friday, Sep 03
Lecture 7: Thursday, Sep 09
Prediction and Control by Dynamic Programming
Overview of dynamic programming for MDPs, definition and formulation of planning in MDPs, principle of optimality, iterative policy evaluation, policy iteration, value iteration, the Banach fixed-point theorem, proof of the contraction-mapping property of the Bellman expectation and optimality operators, proofs of convergence of the policy evaluation and value iteration algorithms, DP extensions.
SB: Chapter 4 [slides (pdf)]
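As a pointer to what value iteration looks like in practice, here is a minimal NumPy sketch on a made-up 2-state, 2-action MDP (the transition tensor P and reward table R are illustrative, not from the course):

    import numpy as np

    # Illustrative MDP: P[a, s, s'] are transition probabilities,
    # R[a, s] is the expected reward for taking action a in state s.
    P = np.array([[[0.8, 0.2],
                   [0.3, 0.7]],
                  [[0.1, 0.9],
                   [0.6, 0.4]]])
    R = np.array([[1.0, 0.0],
                  [2.0, 1.0]])
    gamma, theta = 0.9, 1e-8

    V = np.zeros(2)
    while True:
        # Bellman optimality backup:
        # V(s) <- max_a [ R(s,a) + gamma * sum_s' P(s'|s,a) V(s') ]
        Q = R + gamma * P @ V          # Q[a, s]
        V_new = Q.max(axis=0)
        if np.abs(V_new - V).max() < theta:
            break                      # contraction mapping guarantees convergence
        V = V_new
    policy = Q.argmax(axis=0)          # greedy policy w.r.t. the converged values
    print(V, policy)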
Lecture 8: Friday, Sep 10
Lecture 9: Friday, Sep 11
Lecture 10: Thursday, Sep 16
Lecture 11: Friday, Sep 17
Monte Carlo Methods for Model-Free Prediction and Control
Overview of Monte Carlo methods for model-free RL, first-visit and every-visit Monte Carlo, Monte Carlo control, on-policy and off-policy learning, importance sampling.
SB: Chapter 5 [slides (pdf)]
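A minimal sketch of first-visit Monte Carlo prediction, assuming a toy episode generator in place of a real environment (the chain dynamics and rewards below are made up for illustration):

    import random
    from collections import defaultdict

    def sample_episode():
        # Toy 3-state chain; state 2 terminates. Reward of 1 per step.
        episode, s = [], 0
        while s != 2:
            s_next = min(s + random.choice([0, 1]), 2)
            episode.append((s, 1.0))
            s = s_next
        return episode

    def first_visit_mc(num_episodes=5000, gamma=0.9):
        returns = defaultdict(list)
        for _ in range(num_episodes):
            episode = sample_episode()
            G, G_at = 0.0, {}
            # Walk backwards so G is the return from each time step; the
            # entry for each state is overwritten until only its FIRST
            # visit's return remains.
            for s, r in reversed(episode):
                G = r + gamma * G
                G_at[s] = G
            for s, G_first in G_at.items():
                returns[s].append(G_first)
        # Value estimate = average of first-visit returns per state.
        return {s: sum(g) / len(g) for s, g in returns.items()}

    print(first_visit_mc())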
Lecture 12: Thursday, Sep 23
Lecture 13: Friday, Sep 24
TD Methods
Incremental Monte Carlo methods for model-free prediction, overview of TD(0), TD(1), and TD(λ), k-step estimators, a unified view of DP, MC, and TD evaluation methods, TD control methods: SARSA, Q-learning, and their variants.
SB: Chapter 6 [slides (pdf)]
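For a sense of how the TD control methods fit together, here is a minimal tabular Q-learning sketch against a hypothetical reset/step environment interface (the toy chain environment is made up; SARSA would instead bootstrap off the action actually taken next):

    import random
    from collections import defaultdict

    def q_learning(reset, step, actions, episodes=1000,
                   alpha=0.1, gamma=0.99, eps=0.1):
        Q = defaultdict(float)                  # Q[(state, action)], default 0
        for _ in range(episodes):
            s, done = reset(), False
            while not done:
                # epsilon-greedy behaviour policy.
                if random.random() < eps:
                    a = random.choice(actions)
                else:
                    a = max(actions, key=lambda a_: Q[(s, a_)])
                s2, r, done = step(s, a)
                # Off-policy TD target: bootstrap off the greedy next action.
                best_next = 0.0 if done else max(Q[(s2, a_)] for a_ in actions)
                Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
                s = s2
        return Q

    # Toy 4-state chain: action 1 moves right, action 0 stays; reward at state 3.
    def reset():
        return 0

    def step(s, a):
        s2 = min(s + a, 3)
        return s2, (1.0 if s2 == 3 else 0.0), s2 == 3

    Q = q_learning(reset, step, actions=[0, 1])
    print([Q[(s, 1)] for s in range(3)])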
Lecture 14: Thursday, Sep 30
Lecture 15: Friday, Oct 01
Lecture 16: Thursday, Oct 07
Lecture 17: Thursday, Oct 21
Function Approximation Methods
Introduction to function approximation methods, revisiting risk minimization and gradient descent from machine learning, gradient MC and semi-gradient TD(0) algorithms, eligibility traces for function approximation, afterstates, control with function approximation, least squares methods, experience replay in deep Q-networks.
SB: Chapters 9, 10, 6 [slides (pdf)]
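As a small illustration of the semi-gradient idea, here is a sketch of semi-gradient TD(0) with a linear value function on a made-up 5-state random walk (the feature map, dynamics, and step sizes are all illustrative assumptions):

    import numpy as np

    # Linear value function v(s) = w . x(s); one-hot features for simplicity,
    # though any feature map x(s) would work.
    def features(s, n=5):
        x = np.zeros(n)
        x[s] = 1.0
        return x

    rng = np.random.default_rng(0)
    w, alpha, gamma = np.zeros(5), 0.1, 0.9

    for _ in range(2000):
        s = 2                              # start each episode in the middle
        while True:
            s2 = s + rng.choice([-1, 1])   # unbiased random walk
            done = s2 < 0 or s2 > 4
            r = 1.0 if s2 > 4 else 0.0     # reward only at the right edge
            v_next = 0.0 if done else w @ features(s2)
            # Semi-gradient update: the target r + gamma * v(s') is treated
            # as a constant, so only grad_w v(s) = x(s) enters the update.
            delta = r + gamma * v_next - w @ features(s)
            w += alpha * delta * features(s)
            if done:
                break
            s = s2
    print(w)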
Lecture 18: Friday, Oct 22
Lecture 19: Thursday, Oct 28
Policy Gradients
Introduction to policy gradient methods, the log-derivative trick, the naive REINFORCE algorithm, bias and variance in reinforcement learning, reducing variance in policy gradient estimates, baselines, the advantage function, actor-critic methods.
DeepRL course (Sergey Levine), OpenAI Spinning Up [slides (pdf)]
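A minimal sketch of the log-derivative trick and a variance-reducing baseline, using REINFORCE with a softmax policy on a made-up two-armed bandit (the arm means and step sizes are illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(0)
    theta = np.zeros(2)                    # one logit per action
    true_means = np.array([0.2, 0.8])      # made-up expected rewards per arm
    alpha, baseline = 0.05, 0.0

    for t in range(5000):
        pi = np.exp(theta - theta.max())
        pi /= pi.sum()                     # softmax policy pi(a)
        a = rng.choice(2, p=pi)
        r = rng.normal(true_means[a], 1.0)
        # Log-derivative trick: for a softmax, grad_theta log pi(a) is
        # onehot(a) - pi. Subtracting a baseline reduces variance without
        # introducing bias into the gradient estimate.
        grad_log_pi = -pi
        grad_log_pi[a] += 1.0
        theta += alpha * (r - baseline) * grad_log_pi
        baseline += 0.01 * (r - baseline)  # running-average baseline
    print(pi)                              # should favour the better arm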
Lecture 20: Friday, Oct 30
Lecture 21: Friday, Nov 05