Event Type | Date | Description | Readings | Course Materials |
---|---|---|---|---|
Lecture 1 | Thursday Aug 12 | Introduction: Course logistics and overview. Origin and history of Reinforcement Learning research. Its connections with other related fields and with different branches of machine learning. | SB: Chapter 1 | [slides (pptx)] |
Lecture 2 | Friday Aug 13 | Probability Primer: Brush-up of probability concepts, including the axioms of probability, random variables, PMFs, PDFs, CDFs, and expectation; joint and multiple random variables with joint, conditional, and marginal distributions; correlation and independence. | PSRPEE | [slides (pdf)] |
Lecture 3 | Friday Aug 20 | Markov Decision Process: Introduction to RL terminology, the Markov property, Markov chains, and the Markov reward process (MRP). Bellman equations for MRPs, with a proof of existence of a solution. Introduction to the Markov decision process (MDP), state and action value functions, Bellman expectation equations, optimality of value functions and policies, and Bellman optimality equations. | SB: Chapter 3 | [slides (pdf)] |
Lecture 4 | Thursday Aug 26 | | | |
Lecture 5 | Friday Aug 27 | | | |
Lecture 6 | Friday Sep 03 | | | |
Lecture 7 | Friday Sep 09 | Prediction and Control by Dynamic Programming: Overview of dynamic programming for MDPs, definition and formulation of planning in MDPs, principle of optimality, iterative policy evaluation, policy iteration, value iteration, Banach fixed point theorem, proof of the contraction mapping property of the Bellman expectation and optimality operators, proof of convergence of the policy evaluation and value iteration algorithms, DP extensions. | SB: Chapter 4 | [slides (pdf)] |
Lecture 8 | Thursday Sep 10 | | | |
Lecture 9 | Friday Sep 11 | | | |
Lecture 10 | Friday Sep 16 | | | |
Lecture 11 | Friday Sep 17 | Monte Carlo Methods for Model-Free Prediction and Control: Overview of Monte Carlo methods for model-free RL, first-visit and every-visit Monte Carlo, Monte Carlo control, on-policy and off-policy learning, importance sampling. | SB: Chapter 5 | [slides (pdf)] |
Lecture 12 | Thursday Sep 23 | | | |
Lecture 13 | Friday Sep 24 | TD Methods: Incremental Monte Carlo methods for model-free prediction; overview of TD(0), TD(1), and TD(λ); k-step estimators; a unified view of DP, MC, and TD evaluation methods; TD control methods (SARSA, Q-learning, and their variants). | SB: Chapter 6 | [slides (pdf)] |
Lecture 14 | Thursday Sep 30 | | | |
Lecture 15 | Friday Oct 01 | | | |
Lecture 16 | Thursday Oct 07 | | | |
Lecture 17 | Thursday Oct 21 | Function Approximation Methods: Getting started with function approximation methods; revisiting risk minimization and gradient descent from machine learning; gradient MC and semi-gradient TD(0) algorithms; eligibility traces for function approximation; afterstates; control with function approximation; least squares; experience replay in deep Q-networks. | SB: Chapters 9, 10, 6 | [slides (pdf)] |
Lecture 18 | Friday Oct 22 | | | |
Lecture 19 | Thursday Oct 28 | Policy Gradients: Getting started with policy gradient methods; the log-derivative trick; the naive REINFORCE algorithm; bias and variance in reinforcement learning; reducing variance in policy gradient estimates; baselines; the advantage function; actor-critic methods. | DeepRL course (Sergey Levine), OpenAI Spinning Up | [slides (pdf)] |
Lecture 20 | Friday Oct 30 | | | |
Lecture 21 | Friday Nov 05 |