Schedule and Syllabus


This course meets on Mondays (3:00pm-4:55pm) and Tuesdays (3:00pm-3:55pm).
Note: SB = "Reinforcement Learning: An Introduction", Richard S. Sutton and Andrew G. Barto, 2nd Edition Link
Note: PSRPEE = "Probability, Statistics, and Random Processes for Electrical Engineering", 3rd Edition, Alberto Leon-Garcia Link
Note: MLAPP = "Machine Learning: A Probabilistic Perspective", Kevin P. Murphy Link
Lecture 1 Tuesday
Sep 01
Introduction
Course logistics and overview. Origin and history of reinforcement learning research, and its connections with related fields and with different branches of machine learning.
SB: Chapter 1 [slides (pptx)]
Lecture 2 Saturday
Sep 05
Lecture 3 Monday
Sep 07
Probability Primer
Review of probability concepts: axioms of probability, random variables, PMFs, PDFs, CDFs, and expectation. Joint and multiple random variables; joint, conditional, and marginal distributions. Correlation and independence. (See the sketch below.)
PSRPEE [slides (pdf)]
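As a quick refresher on these definitions, here is a minimal sketch; the joint PMF and all numbers are made up for illustration and are not course material:

    # A toy discrete example of the primer's definitions: joint PMF,
    # marginals, a conditional distribution, expectation, and correlation.
    import numpy as np

    # Hypothetical joint PMF p(x, y) over X in {0, 1}, Y in {0, 1, 2}.
    pxy = np.array([[0.10, 0.20, 0.10],
                    [0.20, 0.25, 0.15]])
    assert np.isclose(pxy.sum(), 1.0)

    px = pxy.sum(axis=1)            # marginal PMF of X
    py = pxy.sum(axis=0)            # marginal PMF of Y
    py_given_x0 = pxy[0] / px[0]    # conditional PMF p(y | X = 0)

    xs, ys = np.array([0, 1]), np.array([0, 1, 2])
    ex = (xs * px).sum()                   # E[X]
    ey = (ys * py).sum()                   # E[Y]
    exy = (np.outer(xs, ys) * pxy).sum()   # E[XY]
    cov = exy - ex * ey                    # Cov(X, Y); zero iff uncorrelated

    print(px, py, py_given_x0, cov)
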
Lecture 4 Monday
Sep 14
Markov Decision Process
Introduction to RL terminology, the Markov property, Markov chains, and Markov reward processes (MRPs). Derivation of the Bellman equations for MRPs, along with a proof that a solution to the Bellman equations of an MRP exists. Introduction to Markov decision processes (MDPs), state and action value functions, Bellman expectation equations, optimality of value functions and policies, and Bellman optimality equations. (See the sketch below.)
SB: Chapter 3 [slides (pdf)]
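A minimal sketch of the closed-form solution to the MRP Bellman equation, v = (I - gamma P)^{-1} r, which exists because (I - gamma P) is invertible for gamma < 1, as proved in lecture. The 3-state MRP below (transition matrix, rewards, discount) is invented for illustration:

    # Solve the Bellman equation of a hypothetical 3-state MRP in closed form.
    import numpy as np

    P = np.array([[0.5, 0.5, 0.0],
                  [0.0, 0.5, 0.5],
                  [0.0, 0.0, 1.0]])   # state 2 is absorbing
    r = np.array([1.0, 2.0, 0.0])     # expected immediate reward per state
    gamma = 0.9

    # v = r + gamma * P v  =>  (I - gamma * P) v = r
    v = np.linalg.solve(np.eye(3) - gamma * P, r)
    print("State values:", v)
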
Lecture 5 Tuesday
Sep 15
Lecture 6 Monday
Sep 21
Prediction and Control by Dynamic Programming
Overview of dynamic programming for MDPs: definition and formulation of planning in MDPs, the principle of optimality, iterative policy evaluation, policy iteration, value iteration, the Banach fixed-point theorem, proof of the contraction-mapping property of the Bellman expectation and optimality operators, proof of convergence of the policy evaluation and value iteration algorithms, and DP extensions. (See the sketch below.)
SB: Chapter 4 [slides (pdf)]
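A minimal sketch of value iteration, i.e., repeated application of the Bellman optimality operator, whose contraction property guarantees convergence. The toy MDP (model P, rewards R, all constants) is randomly generated for illustration and is not from the course:

    # Value iteration on a randomly generated toy MDP.
    import numpy as np

    n_states, n_actions, gamma = 3, 2, 0.9
    rng = np.random.default_rng(0)

    # Hypothetical tabular model: P[a, s, s'] transition probs, R[a, s] rewards.
    P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
    R = rng.standard_normal((n_actions, n_states))

    v = np.zeros(n_states)
    for _ in range(1000):
        # Bellman optimality operator: (Tv)(s) = max_a [R(s,a) + gamma * sum_s' P v]
        q = R + gamma * P @ v                   # q[a, s]
        v_new = q.max(axis=0)
        if np.max(np.abs(v_new - v)) < 1e-8:    # contraction => convergence
            break
        v = v_new

    policy = q.argmax(axis=0)   # greedy policy w.r.t. the converged values
    print(v, policy)
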
Lecture 7 Tuesday
Sep 22
Lecture 8 Monday
Sep 28
Lecture 9 Tuesday
Sep 29
Lecture 10 Monday
Oct 05
Monte Carlo Methods for Model-Free Prediction and Control
Overview of Monte Carlo methods for model-free RL: first-visit and every-visit Monte Carlo, Monte Carlo control, on-policy and off-policy learning, and importance sampling. (See the sketch below.)
SB: Chapter 5 [slides (pdf)]
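A minimal sketch of first-visit Monte Carlo prediction: average the return that follows the first visit to each state across many sampled episodes. The two-state episodic environment and all constants are hypothetical:

    # First-visit Monte Carlo prediction on a made-up two-state chain.
    import random
    from collections import defaultdict

    gamma = 1.0

    def sample_episode():
        """Hypothetical episodic chain: loop in state 0 with prob 0.3,
        otherwise move to state 1, which ends the episode; reward 1 per step."""
        episode, s = [], 0
        while True:
            if s == 0:
                s_next = 0 if random.random() < 0.3 else 1
                episode.append((0, 1.0))
            else:
                episode.append((1, 1.0))
                break
            s = s_next
        return episode          # list of (state, reward) pairs

    returns, counts = defaultdict(float), defaultdict(int)
    for _ in range(10_000):
        episode = sample_episode()
        g, g_after = 0.0, {}
        for s, rew in reversed(episode):   # compute returns backwards
            g = rew + gamma * g
            g_after[s] = g                 # last write = FIRST forward visit
        for s, g_first in g_after.items():
            returns[s] += g_first
            counts[s] += 1

    v = {s: returns[s] / counts[s] for s in returns}
    print(v)   # estimated state values
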
Lecture 11 Tuesday
Oct 06
Lecture 12 Monday
Oct 12
TD Methods
Incremental Monte Carlo methods for model-free prediction; overview of TD(0), TD(1), and TD(λ); k-step estimators; a unified view of DP, MC, and TD evaluation methods; TD control methods: SARSA, Q-learning, and their variants. (See the sketch below.)
SB: Chapter 6 [slides (pdf)]
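A minimal sketch of tabular Q-learning, one of the off-policy TD control methods above. The 1-D corridor environment and all hyperparameters are invented for illustration:

    # Tabular Q-learning on a hypothetical corridor: move left/right,
    # reward 1 on reaching the goal state.
    import numpy as np

    n_states, goal = 5, 4            # states 0..4; episode ends at the goal
    alpha, gamma, eps = 0.1, 0.95, 0.1
    rng = np.random.default_rng(0)
    Q = np.zeros((n_states, 2))      # actions: 0 = left, 1 = right

    for _ in range(2000):
        s = 0
        while s != goal:
            # epsilon-greedy behaviour policy
            a = rng.integers(2) if rng.random() < eps else int(Q[s].argmax())
            s_next = max(s - 1, 0) if a == 0 else s + 1
            r = 1.0 if s_next == goal else 0.0
            # Q-learning update: bootstrap from the greedy next-state value
            target = r + (0.0 if s_next == goal else gamma * Q[s_next].max())
            Q[s, a] += alpha * (target - Q[s, a])
            s = s_next

    print(np.round(Q, 2))   # right-moving actions should dominate
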
Lecture 13 Tuesday
Oct 13
Lecture 14 Monday
Oct 19
Lecture 15 Monday
Nov 02
Function Approximation Methods
Introduction to function approximation methods: revisiting risk minimization and gradient descent from machine learning, gradient MC and semi-gradient TD(0) algorithms, eligibility traces for function approximation, afterstates, control with function approximation, least-squares methods, and experience replay in deep Q-networks. (See the sketch below.)
SB: Chapters 9, 10, 6 [slides (pdf)]
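A minimal sketch of semi-gradient TD(0) with a linear approximator: the value estimate is v(s) = w . x(s), and the gradient is taken only through the estimate, not through the bootstrapped target, hence "semi-gradient". The feature map, dynamics, and constants are made up for illustration:

    # Semi-gradient TD(0) with linear features on a hypothetical random walk.
    import numpy as np

    n_states, gamma, alpha = 10, 0.9, 0.05
    rng = np.random.default_rng(0)

    def features(s):
        """Hypothetical feature map: one-hot state encoding plus a bias term."""
        x = np.zeros(n_states + 1)
        x[s], x[-1] = 1.0, 1.0
        return x

    w = np.zeros(n_states + 1)
    for _ in range(5000):
        s = rng.integers(n_states)
        # Hypothetical dynamics: random neighbour; reward equals the state index.
        s_next = (s + rng.choice([-1, 1])) % n_states
        r = float(s)
        # TD error with bootstrapped target r + gamma * v(s');
        # the update differentiates only v(s) = w . features(s).
        td_error = r + gamma * w @ features(s_next) - w @ features(s)
        w += alpha * td_error * features(s)

    print(np.round(w, 2))
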
Lecture 16 Tuesday
Nov 03
Lecture 17 Monday
Nov 09
Policy Gradients
Introduction to policy gradient methods: the log-derivative trick, the REINFORCE algorithm, bias and variance in reinforcement learning, variance reduction for policy gradient estimates, baselines, the advantage function, and actor-critic methods. (See the sketch below.)
DeepRL course (Sergey Levine), OpenAI Spinning Up [slides (pdf)]
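A minimal sketch of REINFORCE with a running-mean baseline on a hypothetical two-armed bandit: gradient ascent using the log-derivative trick, grad log pi(a) * (G - baseline), where subtracting the baseline reduces variance without introducing bias. Arm means and step sizes are invented for illustration:

    # REINFORCE with a running-mean baseline on a two-armed bandit.
    import numpy as np

    rng = np.random.default_rng(0)
    theta = np.zeros(2)                 # softmax preferences over the two arms
    alpha, baseline = 0.1, 0.0
    true_means = np.array([0.2, 0.8])   # arm 1 is better

    for t in range(1, 2001):
        pi = np.exp(theta - theta.max())
        pi /= pi.sum()                      # softmax policy
        a = rng.choice(2, p=pi)
        g = rng.normal(true_means[a], 1.0)  # sampled return for the pulled arm
        baseline += (g - baseline) / t      # running mean of returns
        # Log-derivative trick for softmax: grad log pi(a) = onehot(a) - pi
        grad_log_pi = -pi
        grad_log_pi[a] += 1.0
        theta += alpha * (g - baseline) * grad_log_pi

    pi = np.exp(theta - theta.max())
    pi /= pi.sum()
    print(pi)   # probability mass should concentrate on the better arm
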
Lecture 18 Tuesday
Nov 10