Event Type | Date | Description | Readings | Course Materials |
---|---|---|---|---|
Lecture 1 | Tuesday Sep 01 | Introduction: Course logistics and overview. Origin and history of Reinforcement Learning research. Its connections with other related fields and with different branches of machine learning. | SB: Chapter 1 | [slides (pptx)] |
Lecture 2 | Saturday Sep 05 | | | |
Lecture 3 | Monday Sep 07 | Probability Primer: A brush-up on probability concepts - axioms of probability, random variables, PMFs, PDFs, CDFs, expectation. Joint and multiple random variables; joint, conditional and marginal distributions. Correlation and independence. | PSRPEE | [slides (pdf)] |
Lecture 4 | Tuesday Sep 08 | | | |
Lecture 4 | Monday Sep 14 | Markov Decision Process: Introduction to RL terminology, the Markov property, Markov chains, Markov reward processes (MRPs). Introduction to and proof of the Bellman equations for MRPs, along with a proof that a solution to the Bellman equations for an MRP exists. Introduction to Markov decision processes (MDPs), state and action value functions, Bellman expectation equations, optimality of value functions and policies, Bellman optimality equations. | SB: Chapter 3 | [slides (pdf)] |
Lecture 5 | Tuesday Sep 15 | | | |
Lecture 6 | Monday Sep 21 | Prediction and Control by Dynamic Programming: Overview of dynamic programming for MDPs, definition and formulation of planning in MDPs, principle of optimality, iterative policy evaluation, policy iteration, value iteration, Banach fixed-point theorem, proof of the contraction mapping property of the Bellman expectation and optimality operators, proof of convergence of the policy evaluation and value iteration algorithms, DP extensions. | SB: Chapter 4 | [slides (pdf)] |
Lecture 7 | Tuesday Sep 22 | | | |
Lecture 8 | Monday Sep 28 | | | |
Lecture 9 | Tuesday Sep 29 | | | |
Lecture 10 | Monday Oct 05 | Monte Carlo Methods for Model-Free Prediction and Control: Overview of Monte Carlo methods for model-free RL, first-visit and every-visit Monte Carlo, Monte Carlo control, on-policy and off-policy learning, importance sampling. | SB: Chapter 5 | [slides (pdf)] |
Lecture 11 | Tuesday Oct 06 | | | |
Lecture 12 | Monday Oct 12 | TD Methods: Incremental Monte Carlo methods for model-free prediction, overview of TD(0), TD(1) and TD(λ), k-step estimators, a unified view of DP, MC and TD evaluation methods, TD control methods - SARSA, Q-Learning and their variants. | SB: Chapter 6 | [slides (pdf)] |
Lecture 13 | Tuesday Oct 13 | | | |
Lecture 14 | Monday Oct 14 | | | |
Lecture 15 | Thursday Nov 02 | Function Approximation Methods: Getting started with function approximation methods, revisiting risk minimization and gradient descent from machine learning, gradient MC and semi-gradient TD(0) algorithms, eligibility traces for function approximation, afterstates, control with function approximation, least squares, experience replay in deep Q-networks. | SB: Chapters 9, 10, 6 | [slides (pdf)] |
Lecture 16 | Tuesday Nov 03 | | | |
Lecture 17 | Monday Nov 09 | Policy Gradients: Getting started with policy gradient methods, the log-derivative trick, the naive REINFORCE algorithm, bias and variance in Reinforcement Learning, reducing variance in policy gradient estimates, baselines, the advantage function, actor-critic methods. | DeepRL course (Sergey Levine), OpenAI Spinning Up | [slides (pdf)] |
Lecture 18 | Tuesday Nov 10 |