CS60077 : Reinforcement Learning Autumn 2022, L-T-P: 3-0-0

Schedule

Instructor: Aritra Hazra
Timings: Thursday (15:00–17:00), Friday (15:00–16:00)
Venue: NC442 (Nalanda Complex)
Teaching Assistants: Ayan Maity | Somnath Hazra | Sriyash Poddar

Notices and Announcements

August 02, 2022

The first class will be held on 04-August-2022 (Thursday) at 3:00pm.
An e-mail with all relevant details will be sent to the enrolled students before that. Stay tuned ...

July 25, 2022

We shall consider requests made via ERP from all students until 8:00pm on 02-Aug-2022, and shall finalize approvals among the requesting students based on CGPA (only) immediately after 8:00pm, if possible. Please note that we cannot accept all requesting students due to the seat limit. Since the first class will be held on 04-Aug-2022, declined students are requested to switch to other courses before the subject registration deadline expires. We shall NOT consider any further requests for approval beyond that.

Course Pre-requisites: Probability and Linear Algebra (Basics), Programming Knowledge (preferably Python), Data Structures and Algorithms, Artificial Intelligence, Machine Learning and (Deep) Neural Networks

Syllabus and Coverage

Topic | Details | Date | Slides** | References
Introduction to RL | The RL Problem, Setup and Course Layout | 04-Aug-2022 | Lecture-1 | Sutton-Barto [1] (Chapter-1)
Markov Decision Process (MDP) | Markov Process, Markov Reward Process, Markov Decision Process and Bellman Equations, Partially Observable MDPs | 05-Aug-2022, 11-Aug-2022 | Lecture-2 | Sutton-Barto [1] (Chapter-3)
Planning by Dynamic Programming (DP) | Policy Evaluation, Value Iteration, Policy Iteration, DP Extensions and Convergence using Contraction Mapping | 12-Aug-2022, 25-Aug-2022 | Lecture-3 | Sutton-Barto [1] (Chapter-4)
Model-free Prediction | Monte-Carlo (MC) Learning, Temporal-Difference (TD) Learning, TD-Lambda and Eligibility Traces | 26-Aug-2022, 01-Sep-2022 | Lecture-4 | Sutton-Barto [1] (Chapter-5+6)
Model-free Control | On-Policy MC Control, On-Policy TD Learning and Off-Policy Learning | 02-Sep-2022, 08-Sep-2022 | Lecture-5 | Sutton-Barto [1] (Chapter-5+6+7)
Value Function Approximation | Incremental Methods and Batch Methods, Deep Q-Learning, Deep Q-Networks and Experience Replay | 09-Sep-2022, 10-Sep-2022 | Lecture-6 | Sutton-Barto [1] (Chapter-9+10+11+12)
Policy Gradient Methods | Finite-Difference, Monte-Carlo and Actor-Critic Methods | 13-Oct-2022 | Lecture-7 | Sutton-Barto [1] (Chapter-13)
Integrating Planning with Learning | Model-based RL, Integrated Architecture and Simulation-based Search | 14-Oct-2022, 20-Oct-2022 | Lecture-8 | Sutton-Barto [1] (Chapter-8)
Exploration and Exploitation (Bandits) | Multi-arm Bandits, Contextual Bandits and MDP Extensions | 20-Oct-2022, 21-Oct-2022 | Lecture-9 | Sutton-Barto [1] (Chapter-2)
Integrating AI Search and Learning | Classical Games: Combining Minimax Search and RL | 27-Oct-2022 | Lecture-10 | Sutton-Barto [1] (Chapter-16)
Hierarchical RL | Semi-Markov Decision Process, Learning with Options, Abstract Machines and MAXQ Decomposition | 28-Oct-2022, 03-Nov-2022 | Lecture-11a, Lecture-11b | Barto-Mahadevan [6], Dietterich [7]
Deep RL | PPO, DDPG, Double Q-Learning, Advanced Policy Gradients etc. | 03-Nov-2022 | Lecture-12a, Lecture-12b | Francois-Lavet et al. [8], Li [9], Vitay [10]
Multi-Agent RL | Cooperative vs. Competitive Settings, Mixed Setting, Games, MARL Algorithms | 04-Nov-2022 | Lecture-13a, Lecture-13b | Zhang-Yang-Başar [11], Yang-Wang [12]
Conclusion | Summary, Open Problems and Path Ahead | 10-Nov-2022 | Lecture-14 | Sutton-Barto [1] (Chapter-14+15+17)

** Slides Credit: Dr. David Silver (UCL and DeepMind) and Dr. Abir Das (CSE, IIT Kharagpur)

Books and References

  1. Richard S. Sutton and Andrew G. Barto; Reinforcement Learning: An Introduction; 2nd Edition, MIT Press, 2020.   [ TEXTBOOK ]

  2. Csaba Szepesvári; Algorithms of Reinforcement Learning; Synthesis Lectures on Artificial Intelligence and Machine Learning, vol. 4, no. 1, 2010.
  3. Dimitri P. Bertsekas; Reinforcement Learning and Optimal Control; 1st Edition, Athena Scientific, 2019.
  4. Dimitri P. Bertsekas; Dynamic Programming and Optimal Control (Vol. I and Vol. II); 4th Edition, Athena Scientific, 2017.
  5. Leslie Pack Kaelbling, Michael L. Littman and Andrew W. Moore; Reinforcement Learning: A Survey; Journal of Artificial Intelligence Research, vol. 4, pp. 237–285, 1996.
  6. Andrew G. Barto and Sridhar Mahadevan; Recent Advances in Hierarchical Reinforcement Learning; Discrete Event Dynamic Systems, vol. 13, pp. 341–379, 2003.
  7. Thomas G. Dietterich; Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition; Journal of Artificial Intelligence Research, vol. 13, pp. 227–303, 2000.
  8. Vincent Francois-Lavet, Peter Henderson, Riashat Islam, Marc G. Bellemare and Joelle Pineau; An Introduction to Deep Reinforcement Learning; ArXiv ePrint, 2018.
  9. Yuxi Li; Deep Reinforcement Learning: An Overview; ArXiv ePrint, 2018.
  10. Julien Vitay; Deep Reinforcement Learning, 2020.
  11. Kaiqing Zhang, Zhuoran Yang and Tamer Başar; Multi-Agent Reinforcement Learning: A Selective Overview of Theories and Algorithms; ArXiv ePrint, 2021.
  12. Yaodong Yang and Jun Wang; An Overview of Multi-Agent Reinforcement Learning from Game Theoretical Perspective; ArXiv ePrint, 2021.

Term-Project

Examinations
