Schedule and Syllabus


This course meets on Thursdays (3:00pm-4:55pm) and Fridays (3:00pm-3:55pm).
Note: SB = "Reinforcement Learning: An Introduction", Richard S. Sutton and Andrew G. Barto, 2nd Edition
Note: PSRPEE = "Probability, Statistics, and Random Processes for Electrical Engineering", 3rd Edition, Alberto Leon-Garcia
Note: MLAPP = "Machine Learning: A Probabilistic Perspective", Kevin P. Murphy
Lecture 1: Thursday, Aug 12
Introduction
Course logistics and overview. Origin and history of reinforcement learning research, and its connections with related fields and with different branches of machine learning.
SB: Chapter 1 [slides (pptx)]
Lecture 2: Friday, Aug 13
Probability Primer
Review of probability concepts: axioms of probability, random variables, PMFs, PDFs, CDFs, and expectation. Joint and multiple random variables; joint, conditional, and marginal distributions. Correlation and independence.
PSRPEE [slides (pdf)]
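As a concrete illustration of these ideas (a minimal sketch, not part of the course materials; the joint PMF below is made up), marginals, conditionals, expectations, and covariance can be computed directly from a joint PMF with NumPy:

    import numpy as np

    # Illustrative joint PMF of two binary random variables X and Y;
    # rows index values of X, columns index values of Y.
    p_xy = np.array([[0.10, 0.20],
                     [0.30, 0.40]])
    x_vals = np.array([0.0, 1.0])
    y_vals = np.array([0.0, 1.0])

    p_x = p_xy.sum(axis=1)             # marginal PMF of X
    p_y = p_xy.sum(axis=0)             # marginal PMF of Y
    p_y_given_x0 = p_xy[0] / p_x[0]    # conditional PMF P(Y | X = 0)

    e_x = (x_vals * p_x).sum()                       # E[X]
    e_y = (y_vals * p_y).sum()                       # E[Y]
    e_xy = (np.outer(x_vals, y_vals) * p_xy).sum()   # E[XY]
    cov_xy = e_xy - e_x * e_y                        # Cov(X, Y); zero iff uncorrelated
    print(p_x, p_y_given_x0, cov_xy)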
Lecture 3: Friday, Aug 20
Markov Decision Process
Introduction to RL terminology, the Markov property, Markov chains, and Markov reward processes (MRPs). Introduction to and proof of the Bellman equations for MRPs, along with a proof of existence of a solution to the Bellman equations for an MRP. Introduction to Markov decision processes (MDPs), state and action value functions, Bellman expectation equations, optimality of value functions and policies, and Bellman optimality equations.
SB: Chapter 3 [slides (pdf)]
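For a finite MRP the Bellman equation v = r + γPv is a linear system, so the existence result discussed in lecture can be checked numerically by solving it directly (a minimal sketch; the 3-state transition matrix P and rewards r below are made up):

    import numpy as np

    # Illustrative 3-state Markov reward process; state 2 is absorbing.
    P = np.array([[0.5, 0.5, 0.0],
                  [0.1, 0.6, 0.3],
                  [0.0, 0.0, 1.0]])   # P[s, s'] = transition probability
    r = np.array([1.0, 2.0, 0.0])     # expected immediate reward in each state
    gamma = 0.9

    # Bellman equation for an MRP: v = r + gamma * P v. For gamma < 1 the
    # matrix (I - gamma P) is invertible, so a unique solution exists.
    v = np.linalg.solve(np.eye(3) - gamma * P, r)
    print(v)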
Lecture 4: Thursday, Aug 26
Lecture 5: Friday, Aug 27
Lecture 6: Friday, Sep 03
Lecture 7: Thursday, Sep 09
Prediction and Control by Dynamic Programming
Overview of dynamic programming for MDPs, definition and formulation of planning in MDPs, principle of optimality, iterative policy evaluation, policy iteration, value iteration, the Banach fixed-point theorem, proof of the contraction-mapping property of the Bellman expectation and optimality operators, proofs of convergence of the policy evaluation and value iteration algorithms, DP extensions.
SB: Chapter 4 [slides (pdf)]
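As a pointer to what value iteration looks like in practice, here is a minimal NumPy sketch on a made-up 2-state, 2-action MDP (the transition tensor P and reward table R are illustrative, not from the course):

    import numpy as np

    # Illustrative MDP: P[a, s, s'] are transition probabilities,
    # R[a, s] is the expected reward for taking action a in state s.
    P = np.array([[[0.8, 0.2],
                   [0.3, 0.7]],
                  [[0.1, 0.9],
                   [0.6, 0.4]]])
    R = np.array([[1.0, 0.0],
                  [2.0, 1.0]])
    gamma, theta = 0.9, 1e-8

    V = np.zeros(2)
    while True:
        # Bellman optimality backup:
        # V(s) <- max_a [ R(s,a) + gamma * sum_s' P(s'|s,a) V(s') ]
        Q = R + gamma * P @ V          # Q[a, s]
        V_new = Q.max(axis=0)
        if np.abs(V_new - V).max() < theta:
            break                      # contraction mapping guarantees convergence
        V = V_new
    policy = Q.argmax(axis=0)          # greedy policy w.r.t. the converged values
    print(V, policy)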
Lecture 8: Friday, Sep 10
Lecture 9: Friday, Sep 11
Lecture 10: Thursday, Sep 16
Lecture 11: Friday, Sep 17
Monte Carlo Methods for Model-Free Prediction and Control
Overview of Monte Carlo methods for model-free RL, first-visit and every-visit Monte Carlo, Monte Carlo control, on-policy and off-policy learning, importance sampling.
SB: Chapter 5 [slides (pdf)]
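A minimal sketch of first-visit Monte Carlo prediction, assuming a toy episode generator in place of a real environment (the chain dynamics and rewards below are made up for illustration):

    import random
    from collections import defaultdict

    def sample_episode():
        # Toy 3-state chain; state 2 terminates. Reward of 1 per step.
        episode, s = [], 0
        while s != 2:
            s_next = min(s + random.choice([0, 1]), 2)
            episode.append((s, 1.0))
            s = s_next
        return episode

    def first_visit_mc(num_episodes=5000, gamma=0.9):
        returns = defaultdict(list)
        for _ in range(num_episodes):
            episode = sample_episode()
            G, G_at = 0.0, {}
            # Walk backwards so G is the return from each time step; the
            # entry for each state is overwritten until only its FIRST
            # visit's return remains.
            for s, r in reversed(episode):
                G = r + gamma * G
                G_at[s] = G
            for s, G_first in G_at.items():
                returns[s].append(G_first)
        # Value estimate = average of first-visit returns per state.
        return {s: sum(g) / len(g) for s, g in returns.items()}

    print(first_visit_mc())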
Lecture 12: Thursday, Sep 23
Lecture 13: Friday, Sep 24
TD Methods
Incremental Monte Carlo methods for model-free prediction, overview of TD(0), TD(1), and TD(λ), k-step estimators, a unified view of DP, MC, and TD evaluation methods, TD control methods: SARSA, Q-learning, and their variants.
SB: Chapter 6 [slides (pdf)]
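For a sense of how the TD control methods fit together, here is a minimal tabular Q-learning sketch against a hypothetical reset/step environment interface (the toy chain environment is made up; SARSA would instead bootstrap off the action actually taken next):

    import random
    from collections import defaultdict

    def q_learning(reset, step, actions, episodes=1000,
                   alpha=0.1, gamma=0.99, eps=0.1):
        Q = defaultdict(float)                  # Q[(state, action)], default 0
        for _ in range(episodes):
            s, done = reset(), False
            while not done:
                # epsilon-greedy behaviour policy.
                if random.random() < eps:
                    a = random.choice(actions)
                else:
                    a = max(actions, key=lambda a_: Q[(s, a_)])
                s2, r, done = step(s, a)
                # Off-policy TD target: bootstrap off the greedy next action.
                best_next = 0.0 if done else max(Q[(s2, a_)] for a_ in actions)
                Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
                s = s2
        return Q

    # Toy 4-state chain: action 1 moves right, action 0 stays; reward at state 3.
    def reset():
        return 0

    def step(s, a):
        s2 = min(s + a, 3)
        return s2, (1.0 if s2 == 3 else 0.0), s2 == 3

    Q = q_learning(reset, step, actions=[0, 1])
    print([Q[(s, 1)] for s in range(3)])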
Lecture 14: Thursday, Sep 30
Lecture 15: Friday, Oct 01
Lecture 16: Thursday, Oct 07
Lecture 17: Thursday, Oct 21
Function Approximation Methods
Introduction to function approximation methods, revisiting risk minimization and gradient descent from machine learning, gradient MC and semi-gradient TD(0) algorithms, eligibility traces for function approximation, afterstates, control with function approximation, least squares methods, experience replay in deep Q-networks.
SB: Chapters 9, 10, 6 [slides (pdf)]
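As a small illustration of the semi-gradient idea, here is a sketch of semi-gradient TD(0) with a linear value function on a made-up 5-state random walk (the feature map, dynamics, and step sizes are all illustrative assumptions):

    import numpy as np

    # Linear value function v(s) = w . x(s); one-hot features for simplicity,
    # though any feature map x(s) would work.
    def features(s, n=5):
        x = np.zeros(n)
        x[s] = 1.0
        return x

    rng = np.random.default_rng(0)
    w, alpha, gamma = np.zeros(5), 0.1, 0.9

    for _ in range(2000):
        s = 2                              # start each episode in the middle
        while True:
            s2 = s + rng.choice([-1, 1])   # unbiased random walk
            done = s2 < 0 or s2 > 4
            r = 1.0 if s2 > 4 else 0.0     # reward only at the right edge
            v_next = 0.0 if done else w @ features(s2)
            # Semi-gradient update: the target r + gamma * v(s') is treated
            # as a constant, so only grad_w v(s) = x(s) enters the update.
            delta = r + gamma * v_next - w @ features(s)
            w += alpha * delta * features(s)
            if done:
                break
            s = s2
    print(w)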
Lecture 18: Friday, Oct 22
Lecture 19: Thursday, Oct 28
Policy Gradients
Introduction to policy gradient methods, the log-derivative trick, the naive REINFORCE algorithm, bias and variance in reinforcement learning, reducing variance in policy gradient estimates, baselines, the advantage function, actor-critic methods.
DeepRL course (Sergey Levine), OpenAI Spinning Up [slides (pdf)]
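A minimal sketch of the log-derivative trick and a variance-reducing baseline, using REINFORCE with a softmax policy on a made-up two-armed bandit (the arm means and step sizes are illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(0)
    theta = np.zeros(2)                    # one logit per action
    true_means = np.array([0.2, 0.8])      # made-up expected rewards per arm
    alpha, baseline = 0.05, 0.0

    for t in range(5000):
        pi = np.exp(theta - theta.max())
        pi /= pi.sum()                     # softmax policy pi(a)
        a = rng.choice(2, p=pi)
        r = rng.normal(true_means[a], 1.0)
        # Log-derivative trick: for a softmax, grad_theta log pi(a) is
        # onehot(a) - pi. Subtracting a baseline reduces variance without
        # introducing bias into the gradient estimate.
        grad_log_pi = -pi
        grad_log_pi[a] += 1.0
        theta += alpha * (r - baseline) * grad_log_pi
        baseline += 0.01 * (r - baseline)  # running-average baseline
    print(pi)                              # should favour the better arm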
Lecture 20: Friday, Oct 30
Lecture 21: Friday, Nov 05