Event Type | Date | Description | Readings | Course Materials |
---|---|---|---|---|
Lecture 1 | Thursday Jul 18 | Introduction: Course logistics and overview. Origin and history of reinforcement learning research. Its connections with other related fields and with different branches of machine learning. | SB: Chapter 1 | [slides (pptx)] |
Lecture 2 | Friday Jul 19 | Probability Primer: A brush-up of probability concepts: axioms of probability, random variables, PMFs, PDFs, CDFs, and expectation. Joint and multiple random variables; joint, conditional, and marginal distributions. Correlation and independence. | PSRPEE | [slides (pdf)] |
Lecture 3 | Thursday Jul 25 | | | |
Lecture 4 | Friday Jul 26 | Markov Decision Process: Introduction to RL terminology, the Markov property, Markov chains, and Markov reward processes (MRPs). Statement and proof of the Bellman equations for MRPs, along with a proof that a solution to the Bellman equations of an MRP exists (a worked example appears after the schedule). Introduction to Markov decision processes (MDPs), state and action value functions, Bellman expectation equations, optimality of value functions and policies, and Bellman optimality equations. | SB: Chapter 3 | [slides (pdf)] |
Lecture 5 | Thursday Aug 01 | | | |
Lecture 6 | Friday Aug 02 | | | |
Lecture 7 | Thursday Aug 08 | Prediction and Control by Dynamic Programming: Overview of dynamic programming for MDPs, definition and formulation of planning in MDPs, the principle of optimality, iterative policy evaluation, policy iteration, value iteration, the Banach fixed-point theorem, proof of the contraction-mapping property of the Bellman expectation and optimality operators, proof of convergence of the policy evaluation and value iteration algorithms, and DP extensions (a value iteration sketch appears after the schedule). | SB: Chapter 4 | [slides (pdf)] |
Lecture 8 | Friday Aug 09 | | | |
Lecture 9 | Thursday Aug 29 | | | |
Lecture 10 | Friday Aug 30 | | | |
Lecture 11 | Thursday Sep 05 | | | |
Lecture 12 | Friday Sep 06 | Monte Carlo Methods for Model-Free Prediction and Control: Overview of Monte Carlo methods for model-free RL, first-visit and every-visit Monte Carlo, Monte Carlo control, on-policy and off-policy learning, and importance sampling (a first-visit MC sketch appears after the schedule). | SB: Chapter 5 | [slides (pdf)] |
Lecture 13 | Thursday Sep 12 | | | |
Lecture 14 | Thursday Sep 26 | TD Methods: Incremental Monte Carlo methods for model-free prediction, overview of TD(0), TD(1), and TD(λ), k-step estimators, a unified view of DP, MC, and TD evaluation methods, and TD control methods (SARSA, Q-learning, and their variants; a tabular Q-learning sketch appears after the schedule). | SB: Chapter 6 | [slides (pdf)] |
Lecture 15 | Thursday Oct 03 | | | |
Lecture 16 | Thursday Oct 10 | | | |
Lecture 17 | Friday Oct 11 | | | |
Lecture 18 | Thursday Oct 17 | Function Approximation Methods: Getting started with function approximation methods, revisiting risk minimization and gradient descent from machine learning, gradient MC and semi-gradient TD(0) algorithms, eligibility traces for function approximation, afterstates, control with function approximation, least squares, and experience replay in deep Q-networks (a semi-gradient TD(0) sketch appears after the schedule). | SB: Chapters 9, 10, 6 | [slides (pdf)] |
Lecture 19 | Friday Oct 18 | | | |
Lecture 20 | Thursday Oct 24 | | | |
Lecture 21 | Friday Oct 25 | | | |
Lecture 22 | Thursday Oct 31 | Policy Gradients: Getting started with policy gradient methods, the log-derivative trick, the naive REINFORCE algorithm, bias and variance in reinforcement learning, reducing variance in policy gradient estimates, baselines, the advantage function, and actor-critic methods (a REINFORCE sketch appears after the schedule). | Deep RL course (Sergey Levine), OpenAI Spinning Up | [slides (pdf)] |
Lecture 23 | Thursday Nov 07 | | | |
Lecture 24 | Friday Nov 08 | | | |
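
The schedule above names the algorithms but shows no code, so a few short, self-contained sketches follow. They are illustrative only: every environment, parameter, and hyperparameter in them is invented for this page and is not taken from the course materials or slides.

For Lecture 4, a minimal worked example of the MRP Bellman equation V = R + γPV and its closed-form solution V = (I - γP)^(-1)R. For γ < 1 the matrix I - γP is invertible, which is the existence result the lecture proves. The 3-state MRP below is hypothetical.

```python
# Closed-form solution of the MRP Bellman equation V = R + gamma * P V,
# i.e. V = inv(I - gamma * P) R, on a toy 3-state MRP (invented for
# illustration, not an example from the course).
import numpy as np

gamma = 0.9
P = np.array([[0.5, 0.5, 0.0],    # row-stochastic transition matrix
              [0.2, 0.3, 0.5],
              [0.0, 0.0, 1.0]])   # state 2 is absorbing
R = np.array([1.0, 0.5, 0.0])     # expected immediate reward per state

# For gamma < 1, (I - gamma P) is invertible, so the solution is unique.
V = np.linalg.solve(np.eye(3) - gamma * P, R)
print(np.round(V, 3))
```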
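For Lecture 7, a sketch of value iteration: the Bellman optimality backup is applied repeatedly, and the contraction-mapping property discussed in the lecture guarantees convergence. The 2-state, 2-action MDP is hypothetical.

```python
# Value iteration sketch on a toy 2-state, 2-action MDP (the MDP is
# invented for illustration).
import numpy as np

gamma = 0.9
# P[s, a, s'] = transition probability, R[s, a] = expected reward.
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.0, 1.0]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])

V = np.zeros(2)
for _ in range(1000):
    # Bellman optimality backup:
    # V(s) <- max_a [ R(s,a) + gamma * sum_s' P(s'|s,a) V(s') ]
    Q = R + gamma * P @ V            # shape (n_states, n_actions)
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:   # contraction => convergence
        break
    V = V_new

print("V* ≈", np.round(V, 3), "greedy policy:", Q.argmax(axis=1))
```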
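For Lecture 12, a first-visit Monte Carlo prediction sketch: the value of each state is estimated by averaging the returns that follow its first visit in each episode. The 5-state random-walk environment is hypothetical.

```python
# First-visit Monte Carlo prediction on a 5-state random walk
# (states 0..4, both ends terminal; environment invented for
# illustration). Estimates V for the uniform random policy.
import random
from collections import defaultdict

gamma = 1.0

def episode():
    """One rollout from the middle state; reward +1 only on reaching state 4."""
    s, traj = 2, []
    while 0 < s < 4:
        s_next = s + random.choice([-1, 1])
        r = 1.0 if s_next == 4 else 0.0
        traj.append((s, r))
        s = s_next
    return traj

returns = defaultdict(list)
for _ in range(10_000):
    traj = episode()
    first = {}
    for t, (s, _) in enumerate(traj):      # record first-visit time of each state
        first.setdefault(s, t)
    G = 0.0
    for t in reversed(range(len(traj))):   # backward pass accumulates returns
        s, r = traj[t]
        G = gamma * G + r
        if first[s] == t:                  # first-visit condition
            returns[s].append(G)

V = {s: sum(gs) / len(gs) for s, gs in returns.items()}
print({s: round(v, 3) for s, v in sorted(V.items())})
```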
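For Lecture 14, a tabular Q-learning sketch: an epsilon-greedy behaviour policy collects experience, while the off-policy TD target maximizes over next actions. The 1-D chain environment is hypothetical.

```python
# Tabular Q-learning on a deterministic 1-D chain (invented for
# illustration): states 0..5, actions {0: left, 1: right}, reward +1
# on reaching state 5.
import random
import numpy as np

n_states, alpha, gamma, eps = 6, 0.1, 0.95, 0.1
Q = np.zeros((n_states, 2))

def step(s, a):
    s2 = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
    r = 1.0 if s2 == n_states - 1 else 0.0
    return s2, r, s2 == n_states - 1

for _ in range(500):
    s, done = 0, False
    while not done:
        # Epsilon-greedy behaviour policy.
        a = random.randrange(2) if random.random() < eps else int(Q[s].argmax())
        s2, r, done = step(s, a)
        # Off-policy TD target uses max over next actions (Q-learning).
        target = r + (0.0 if done else gamma * Q[s2].max())
        Q[s, a] += alpha * (target - Q[s, a])
        s = s2

print(np.round(Q, 2))
```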
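For Lecture 18, a semi-gradient TD(0) sketch with a linear value function v(s, w) = w . x(s): only the estimate, not the bootstrapped target, is differentiated, hence "semi-gradient". The random-walk environment and one-hot features are hypothetical choices.

```python
# Semi-gradient TD(0) with linear function approximation on a 5-state
# random walk (environment invented for illustration). With one-hot
# features this reduces to tabular TD(0); any feature map x(s) would do.
import random
import numpy as np

n_states, alpha, gamma = 5, 0.05, 1.0

def x(s):
    phi = np.zeros(n_states)   # one-hot features
    phi[s] = 1.0
    return phi

w = np.zeros(n_states)
for _ in range(5_000):
    s = 2
    while 0 < s < 4:
        s2 = s + random.choice([-1, 1])
        r = 1.0 if s2 == 4 else 0.0
        v_next = 0.0 if s2 in (0, 4) else w @ x(s2)   # terminal value is 0
        # Semi-gradient update: only grad of v(s, w), not of the target.
        w += alpha * (r + gamma * v_next - w @ x(s)) * x(s)
        s = s2

print(np.round(w, 3))
```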
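For Lecture 22, a naive REINFORCE sketch built on the log-derivative trick: for a softmax policy over preferences θ, the score is grad log π(a) = onehot(a) - π, and the update is θ <- θ + α G grad log π(a). The two-armed bandit is hypothetical.

```python
# Naive REINFORCE on a two-armed bandit (invented for illustration)
# with a softmax policy over preferences theta.
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.2, 0.8])   # arm 1 is better
theta, alpha = np.zeros(2), 0.1

for _ in range(2_000):
    pi = np.exp(theta - theta.max())         # numerically stable softmax
    pi /= pi.sum()
    a = rng.choice(2, p=pi)
    G = rng.normal(true_means[a], 0.1)       # one-step return = reward
    grad_log_pi = -pi                        # grad log pi(a) = onehot(a) - pi
    grad_log_pi[a] += 1.0
    theta += alpha * G * grad_log_pi         # REINFORCE update

print("pi ≈", np.round(pi, 3))               # should concentrate on arm 1
```

As the lecture discusses, subtracting a baseline from G would reduce the variance of this estimator without introducing bias.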