Schedule and Syllabus


This course meets on Thursdays (3:00pm-4:55pm) and Fridays (3:00pm-3:55pm) in Classroom 1 of the Takshashila building (second floor).
Note: SB = "Reinforcement Learning: An Introduction", Richard S. Sutton and Andrew G. Barto, 2nd Edition Link
Note: PSRPEE = "Probability, Statistics, and Random Processes for Electrical Engineering", 3rd Edition, Alberto Leon-Garcia Link
Note: MLAPP = "Machine Learning: A Probabilistic Perspective", Kevin P. Murphy Link
Lecture 1 Thursday
Jul 18
Introduction
Course logistics and overview. Origin and history of reinforcement learning research, and its connections to related fields and to different branches of machine learning.
SB: Chapter 1 [slides (pptx)]
Lecture 2 Friday
Jul 19
Probability Primer
Review of probability concepts: axioms of probability, random variables, PMFs, PDFs, CDFs, and expectation. Joint and multiple random variables; joint, conditional, and marginal distributions. Correlation and independence.
PSRPEE [slides (pdf)]
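To make the review concrete, here is a minimal NumPy sketch (with made-up numbers, not taken from the lecture) that computes marginals, an expectation, a variance, and an independence check for a small discrete joint PMF.

```python
import numpy as np

# Hypothetical joint PMF p(x, y) for X in {0, 1} and Y in {0, 1, 2} (assumed values).
p_xy = np.array([[0.10, 0.20, 0.10],
                 [0.15, 0.25, 0.20]])
assert np.isclose(p_xy.sum(), 1.0)

p_x = p_xy.sum(axis=1)                     # marginal PMF of X
p_y = p_xy.sum(axis=0)                     # marginal PMF of Y

x_vals = np.array([0, 1])
E_x = (x_vals * p_x).sum()                 # E[X]
Var_x = ((x_vals - E_x) ** 2 * p_x).sum()  # Var(X)

# X and Y are independent iff p(x, y) = p(x) p(y) for every (x, y).
independent = np.allclose(p_xy, np.outer(p_x, p_y))
print(E_x, Var_x, independent)
```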
Lecture 3 Thursday
Jul 25
Lecture 4 Friday
Jul 26
Markov Decision Process
Introduction to RL terminology, the Markov property, Markov chains, and Markov reward processes (MRPs). Derivation of the Bellman equations for MRPs, with a proof that a solution exists. Introduction to Markov decision processes (MDPs), state and action value functions, Bellman expectation equations, optimality of value functions and policies, and Bellman optimality equations.
SB: Chapter 3 [slides (pdf)]
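As a concrete companion to the existence result, the following sketch (with a toy transition matrix and rewards assumed for illustration) solves the MRP Bellman equation v = r + gamma * P v in closed form as v = (I - gamma * P)^(-1) r, which exists whenever gamma < 1.

```python
import numpy as np

# Toy 3-state MRP (assumed numbers, not from the course).
P = np.array([[0.5, 0.5, 0.0],   # state-transition probability matrix
              [0.2, 0.3, 0.5],
              [0.0, 0.0, 1.0]])  # the last state is absorbing
r = np.array([1.0, 2.0, 0.0])    # expected immediate reward in each state
gamma = 0.9                      # discount factor

# Bellman equation v = r + gamma * P v  =>  (I - gamma * P) v = r.
v = np.linalg.solve(np.eye(3) - gamma * P, r)
print(v)                         # state-value function of the MRP
```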
Lecture 5 Thursday
Aug 01
Lecture 6 Friday
Aug 02
Lecture 7 Thursday
Aug 08
Prediction and Control by Dynamic Programming
Overview of dynamic programming for MDPs, definition and formulation of planning in MDPs, principle of optimality, iterative policy evaluation, policy iteration, value iteration, the Banach fixed point theorem, proof of the contraction mapping property of the Bellman expectation and optimality operators, proof of convergence of the policy evaluation and value iteration algorithms, DP extensions.
SB: Chapter 4 [slides (pdf)]
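A minimal value-iteration sketch for a tabular MDP, assuming a transition model P[s][a] given as a list of (probability, next_state, reward) triples (an assumed interface, not the course code); the loop applies the Bellman optimality backup until the value estimates stop changing, which the contraction property guarantees.

```python
import numpy as np

def value_iteration(P, n_states, n_actions, gamma=0.9, tol=1e-8):
    """P[s][a] is a list of (prob, next_state, reward) triples (assumed format)."""
    v = np.zeros(n_states)
    while True:
        q = np.zeros((n_states, n_actions))
        for s in range(n_states):
            for a in range(n_actions):
                # One-step lookahead: expected reward plus discounted next-state value.
                q[s, a] = sum(p * (rew + gamma * v[s2]) for p, s2, rew in P[s][a])
        v_new = q.max(axis=1)                    # Bellman optimality backup
        if np.max(np.abs(v_new - v)) < tol:      # contraction mapping => convergence
            return v_new, q.argmax(axis=1)       # optimal values and a greedy policy
        v = v_new
```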
Lecture 8 Friday
Aug 09
Lecture 9 Thursday
Aug 29
Lecture 10 Friday
Aug 30
Lecture 11 Thursday
Sep 05
Lecture 12 Friday
Sep 06
Monte Carlo Methods for Model-Free Prediction and Control
Overview of Monte Carlo methods for model-free RL, first-visit and every-visit Monte Carlo, Monte Carlo control, on-policy and off-policy learning, importance sampling.
SB: Chapter 5 [slides (pdf)]
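A first-visit Monte Carlo prediction sketch, assuming each episode is given as a list of (state, reward) pairs where the reward is the one received after leaving that state (an illustrative format, not the course code).

```python
from collections import defaultdict

def first_visit_mc(episodes, gamma=1.0):
    """Estimate state values by averaging first-visit returns."""
    returns_sum = defaultdict(float)
    returns_cnt = defaultdict(int)
    for episode in episodes:                  # episode = [(s_0, r_1), (s_1, r_2), ...]
        first_visit = {}                      # state -> index of its first occurrence
        for t, (s, _) in enumerate(episode):
            first_visit.setdefault(s, t)
        G = 0.0
        # Walk backwards, accumulating the return G_t = r_{t+1} + gamma * G_{t+1}.
        for t in reversed(range(len(episode))):
            s, r = episode[t]
            G = r + gamma * G
            if first_visit[s] == t:           # only the first visit to s counts
                returns_sum[s] += G
                returns_cnt[s] += 1
    return {s: returns_sum[s] / returns_cnt[s] for s in returns_sum}
```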
Lecture 13 Thursday
Sep 12
Lecture 14 Thursday
Sep 26
TD Methods
Incremental Monte Carlo methods for model-free prediction, overview of TD(0), TD(1), and TD(λ), k-step estimators, a unified view of DP, MC, and TD evaluation methods, TD control methods: SARSA, Q-learning, and their variants.
SB: Chapter 6 [slides (pdf)]
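A tabular Q-learning sketch, assuming an environment with the classic Gym-style reset()/step() interface returning (next_state, reward, done, info) (an assumption for illustration, not the course implementation); the update bootstraps from max over a' of Q(s', a'), which is what makes it off-policy.

```python
import numpy as np

def q_learning(env, n_states, n_actions, n_episodes=500,
               alpha=0.1, gamma=0.99, eps=0.1):
    Q = np.zeros((n_states, n_actions))
    for _ in range(n_episodes):
        s, done = env.reset(), False
        while not done:
            # epsilon-greedy behaviour policy
            a = np.random.randint(n_actions) if np.random.rand() < eps else int(Q[s].argmax())
            s2, r, done, _ = env.step(a)
            # Off-policy TD(0) target: r + gamma * max_a' Q(s', a').
            target = r + (0.0 if done else gamma * Q[s2].max())
            Q[s, a] += alpha * (target - Q[s, a])
            s = s2
    return Q
```

Replacing the max over next actions with the value of the action actually taken next gives the on-policy SARSA update instead.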
Lecture 15 Thursday
Oct 03
Lecture 16 Thursday
Oct 10
Lecture 17 Friday
Oct 11
Lecture 18 Thursday
Oct 17
Function Approximation Methods
Getting started with function approximation methods, revisiting risk minimization and gradient descent from machine learning, gradient MC and semi-gradient TD(0) algorithms, eligibility traces with function approximation, afterstates, control with function approximation, least-squares methods, experience replay in deep Q-networks.
SB: Chapters 9, 10, 6 [slides (pdf)]
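A sketch of semi-gradient TD(0) with a linear approximator v_hat(s, w) = w^T x(s), assuming a features(s) function and a stream of (s, r, s', done) transitions from some behaviour policy (both assumed names for illustration). It is "semi-gradient" because the update differentiates only the estimate v_hat(s, w), not the bootstrapped target.

```python
import numpy as np

def semi_gradient_td0(transitions, features, dim, alpha=0.01, gamma=0.99):
    """transitions yields (s, r, s2, done); features(s) returns a length-`dim` vector."""
    w = np.zeros(dim)
    for s, r, s2, done in transitions:
        x = features(s)
        v = w @ x                               # current estimate v_hat(s, w)
        v2 = 0.0 if done else w @ features(s2)  # bootstrapped next-state estimate
        delta = r + gamma * v2 - v              # TD error
        w += alpha * delta * x                  # grad_w v_hat(s, w) = x(s) in the linear case
    return w
```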
Lecture 19 Friday
Oct 18
Lecture 20 Thursday
Oct 24
Lecture 21 Friday
Oct 25
Lecture 22 Thursday
Oct 31
Policy Gradients
Getting started with policy gradient methods, the log-derivative trick, the naive REINFORCE algorithm, bias and variance in reinforcement learning, reducing variance in policy gradient estimates, baselines, the advantage function, actor-critic methods.
DeepRL course (Sergey Levine), OpenAI Spinning Up [slides (pdf)]
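A minimal REINFORCE sketch (no baseline) for a tabular softmax policy, assuming episodes arrive as lists of (state, action, reward) triples (an illustrative interface, not the course code). It relies on the log-derivative trick: for softmax logits theta[s], the gradient of log pi(a|s) with respect to theta[s, b] is 1{b = a} - pi(b|s).

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def reinforce_update(theta, episode, gamma=0.99, alpha=0.01):
    """theta: (n_states, n_actions) logits; episode: list of (s, a, r) triples."""
    # Compute returns-to-go G_t by walking the episode backwards.
    G, returns = 0.0, []
    for _, _, r in reversed(episode):
        G = r + gamma * G
        returns.append(G)
    returns.reverse()
    for (s, a, _), G_t in zip(episode, returns):
        pi = softmax(theta[s])
        grad_log = -pi                       # d log pi(a|s) / d theta[s, :]
        grad_log[a] += 1.0
        theta[s] += alpha * G_t * grad_log   # gradient ascent on expected return
    return theta
```

Subtracting a state-dependent baseline from G_t leaves the gradient unbiased while reducing its variance, which is the step toward the advantage function and actor-critic methods covered in the lecture.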
Lecture 23 Thursday
Nov 07
Lecture 24 Friday
Nov 08