Event Type | Date | Description | Readings | Course Materials |
---|---|---|---|---|
Lecture 1 | Thursday Jul 18 | Introduction: Course logistics and overview. Origin and history of reinforcement learning research. Its connections with other related fields and with different branches of machine learning. | SB: Chapter 1 | [slides (pptx)] |
Lecture 2 | Friday Jul 19 | Probability Primer: A brush-up of probability concepts: axioms of probability, random variables, PMFs, PDFs, CDFs, and expectation. Joint and multiple random variables; joint, conditional, and marginal distributions. Correlation and independence. | PSRPEE | [slides (pdf)] |
Lecture 3 | Thursday Jul 25 | | | |
Lecture 4 | Friday Jul 26 | Markov Decision Process: Introduction to RL terminology, the Markov property, Markov chains, and Markov reward processes (MRPs). Statement and proof of the Bellman equations for MRPs, along with a proof that a solution to the Bellman equations of an MRP exists (a worked example appears after the schedule). Introduction to Markov decision processes (MDPs), state and action value functions, Bellman expectation equations, optimality of value functions and policies, and Bellman optimality equations. | SB: Chapter 3 | [slides (pdf)] |
Lecture 5 | Thursday Aug 01 | | | |
Lecture 6 | Friday Aug 02 | | | |
Lecture 7 | Thursday Aug 08 | Prediction and Control by Dynamic Programming: Overview of dynamic programming for MDPs, definition and formulation of planning in MDPs, the principle of optimality, iterative policy evaluation, policy iteration, value iteration, the Banach fixed-point theorem, proof of the contraction-mapping property of the Bellman expectation and optimality operators, proof of convergence of the policy evaluation and value iteration algorithms, and DP extensions (a value iteration sketch appears after the schedule). | SB: Chapter 4 | [slides (pdf)] |
Lecture 8 | Friday Aug 09 | | | |
Lecture 9 | Thursday Aug 29 | | | |
Lecture 10 | Friday Aug 30 | | | |
Lecture 11 | Thursday Sep 05 | | | |
Lecture 12 | Friday Sep 06 | Monte Carlo Methods for Model-Free Prediction and Control: Overview of Monte Carlo methods for model-free RL, first-visit and every-visit Monte Carlo, Monte Carlo control, on-policy and off-policy learning, and importance sampling (a first-visit MC sketch appears after the schedule). | SB: Chapter 5 | [slides (pdf)] |
Lecture 13 | Thursday Sep 12 | | | |
Lecture 14 | Thursday Sep 26 | TD Methods: Incremental Monte Carlo methods for model-free prediction, overview of TD(0), TD(1), and TD(λ), k-step estimators, a unified view of DP, MC, and TD evaluation methods, and TD control methods (SARSA, Q-learning, and their variants; a tabular Q-learning sketch appears after the schedule). | SB: Chapter 6 | [slides (pdf)] |
Lecture 15 | Thursday Oct 03 | | | |
Lecture 16 | Thursday Oct 10 | | | |
Lecture 17 | Friday Oct 11 | | | |
Lecture 18 | Thursday Oct 17 | Function Approximation Methods: Getting started with function approximation methods, revisiting risk minimization and gradient descent from machine learning, gradient MC and semi-gradient TD(0) algorithms, eligibility traces for function approximation, afterstates, control with function approximation, least squares, and experience replay in deep Q-networks (a semi-gradient TD(0) sketch appears after the schedule). | SB: Chapters 9, 10, 6 | [slides (pdf)] |
Lecture 19 | Friday Oct 18 | | | |
Lecture 20 | Thursday Oct 24 | | | |
Lecture 21 | Friday Oct 25 | | | |
Lecture 22 | Thursday Oct 31 | Policy Gradients: Getting started with policy gradient methods, the log-derivative trick, the naive REINFORCE algorithm, bias and variance in reinforcement learning, reducing variance in policy gradient estimates, baselines, the advantage function, and actor-critic methods (a REINFORCE sketch appears after the schedule). | Deep RL course (Sergey Levine), OpenAI Spinning Up | [slides (pdf)] |
Lecture 23 | Thursday Nov 07 | | | |
Lecture 24 | Friday Nov 08 | | | |
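
The schedule above names the algorithms but shows no code, so a few short, self-contained sketches follow. They are illustrative only: every environment, parameter, and hyperparameter in them is invented for this page and is not taken from the course materials or slides.

For Lecture 4, a minimal worked example of the MRP Bellman equation V = R + γPV and its closed-form solution V = (I - γP)^(-1)R. For γ < 1 the matrix I - γP is invertible, which is the existence result the lecture proves. The 3-state MRP below is hypothetical.

```python
# Closed-form solution of the MRP Bellman equation V = R + gamma * P V,
# i.e. V = inv(I - gamma * P) R, on a toy 3-state MRP (invented for
# illustration, not an example from the course).
import numpy as np

gamma = 0.9
P = np.array([[0.5, 0.5, 0.0],    # row-stochastic transition matrix
              [0.2, 0.3, 0.5],
              [0.0, 0.0, 1.0]])   # state 2 is absorbing
R = np.array([1.0, 0.5, 0.0])     # expected immediate reward per state

# For gamma < 1, (I - gamma P) is invertible, so the solution is unique.
V = np.linalg.solve(np.eye(3) - gamma * P, R)
print(np.round(V, 3))
```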
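For Lecture 7, a sketch of value iteration: the Bellman optimality backup is applied repeatedly, and the contraction-mapping property discussed in the lecture guarantees convergence. The 2-state, 2-action MDP is hypothetical.

```python
# Value iteration sketch on a toy 2-state, 2-action MDP (the MDP is
# invented for illustration).
import numpy as np

gamma = 0.9
# P[s, a, s'] = transition probability, R[s, a] = expected reward.
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.0, 1.0]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])

V = np.zeros(2)
for _ in range(1000):
    # Bellman optimality backup:
    # V(s) <- max_a [ R(s,a) + gamma * sum_s' P(s'|s,a) V(s') ]
    Q = R + gamma * P @ V            # shape (n_states, n_actions)
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:   # contraction => convergence
        break
    V = V_new

print("V* ≈", np.round(V, 3), "greedy policy:", Q.argmax(axis=1))
```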
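For Lecture 12, a first-visit Monte Carlo prediction sketch: the value of each state is estimated by averaging the returns that follow its first visit in each episode. The 5-state random-walk environment is hypothetical.

```python
# First-visit Monte Carlo prediction on a 5-state random walk
# (states 0..4, both ends terminal; environment invented for
# illustration). Estimates V for the uniform random policy.
import random
from collections import defaultdict

gamma = 1.0

def episode():
    """One rollout from the middle state; reward +1 only on reaching state 4."""
    s, traj = 2, []
    while 0 < s < 4:
        s_next = s + random.choice([-1, 1])
        r = 1.0 if s_next == 4 else 0.0
        traj.append((s, r))
        s = s_next
    return traj

returns = defaultdict(list)
for _ in range(10_000):
    traj = episode()
    first = {}
    for t, (s, _) in enumerate(traj):      # record first-visit time of each state
        first.setdefault(s, t)
    G = 0.0
    for t in reversed(range(len(traj))):   # backward pass accumulates returns
        s, r = traj[t]
        G = gamma * G + r
        if first[s] == t:                  # first-visit condition
            returns[s].append(G)

V = {s: sum(gs) / len(gs) for s, gs in returns.items()}
print({s: round(v, 3) for s, v in sorted(V.items())})
```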
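For Lecture 14, a tabular Q-learning sketch: an epsilon-greedy behaviour policy collects experience, while the off-policy TD target maximizes over next actions. The 1-D chain environment is hypothetical.

```python
# Tabular Q-learning on a deterministic 1-D chain (invented for
# illustration): states 0..5, actions {0: left, 1: right}, reward +1
# on reaching state 5.
import random
import numpy as np

n_states, alpha, gamma, eps = 6, 0.1, 0.95, 0.1
Q = np.zeros((n_states, 2))

def step(s, a):
    s2 = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
    r = 1.0 if s2 == n_states - 1 else 0.0
    return s2, r, s2 == n_states - 1

for _ in range(500):
    s, done = 0, False
    while not done:
        # Epsilon-greedy behaviour policy.
        a = random.randrange(2) if random.random() < eps else int(Q[s].argmax())
        s2, r, done = step(s, a)
        # Off-policy TD target uses max over next actions (Q-learning).
        target = r + (0.0 if done else gamma * Q[s2].max())
        Q[s, a] += alpha * (target - Q[s, a])
        s = s2

print(np.round(Q, 2))
```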
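For Lecture 18, a semi-gradient TD(0) sketch with a linear value function v(s, w) = w . x(s): only the estimate, not the bootstrapped target, is differentiated, hence "semi-gradient". The random-walk environment and one-hot features are hypothetical choices.

```python
# Semi-gradient TD(0) with linear function approximation on a 5-state
# random walk (environment invented for illustration). With one-hot
# features this reduces to tabular TD(0); any feature map x(s) would do.
import random
import numpy as np

n_states, alpha, gamma = 5, 0.05, 1.0

def x(s):
    phi = np.zeros(n_states)   # one-hot features
    phi[s] = 1.0
    return phi

w = np.zeros(n_states)
for _ in range(5_000):
    s = 2
    while 0 < s < 4:
        s2 = s + random.choice([-1, 1])
        r = 1.0 if s2 == 4 else 0.0
        v_next = 0.0 if s2 in (0, 4) else w @ x(s2)   # terminal value is 0
        # Semi-gradient update: only grad of v(s, w), not of the target.
        w += alpha * (r + gamma * v_next - w @ x(s)) * x(s)
        s = s2

print(np.round(w, 3))
```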
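For Lecture 22, a naive REINFORCE sketch built on the log-derivative trick: for a softmax policy over preferences θ, the score is grad log π(a) = onehot(a) - π, and the update is θ <- θ + α G grad log π(a). The two-armed bandit is hypothetical.

```python
# Naive REINFORCE on a two-armed bandit (invented for illustration)
# with a softmax policy over preferences theta.
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.2, 0.8])   # arm 1 is better
theta, alpha = np.zeros(2), 0.1

for _ in range(2_000):
    pi = np.exp(theta - theta.max())         # numerically stable softmax
    pi /= pi.sum()
    a = rng.choice(2, p=pi)
    G = rng.normal(true_means[a], 0.1)       # one-step return = reward
    grad_log_pi = -pi                        # grad log pi(a) = onehot(a) - pi
    grad_log_pi[a] += 1.0
    theta += alpha * G * grad_log_pi         # REINFORCE update

print("pi ≈", np.round(pi, 3))               # should concentrate on arm 1
```

As the lecture discusses, subtracting a baseline from G would reduce the variance of this estimator without introducing bias.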