CS60077 : Reinforcement Learning Autumn 2022, L-T-P: 3-0-0

Schedule

Instructor     Aritra Hazra
Timings     Thursday (15:00–17:00), Friday (15:00–16:00)
Venue     NC442 (Nalanda Complex)
Teaching Assistants     Ayan Maity   |   Somnath Hazra   |   Sriyash Poddar

Notices and Announcements

August 02, 2022

The first class will be held on 04-August-2022 (Thursday) at 3:00pm.
An e-mail with all relevant details will be sent to the enrolled students before that. Stay tuned ...

July 25, 2022

We shall consider requests submitted via ERP by all students until 8:00pm on 02-Aug-2022, and shall finalize approvals among the requesting students based on CGPA only, immediately after 8:00pm if possible. Please note that we cannot accept all requests because seats are limited. Since the first class will be held on 04-Aug-2022, students whose requests are declined should switch to other courses before the subject registration deadline expires. We shall NOT consider any further requests for approval beyond that.

Course Pre-requisites: Probability and Linear Algebra (Basics), Programming Knowledge (preferably Python), Data Structures and Algorithms, Artificial Intelligence, Machine Learning and (Deep) Neural Networks

Syllabus and Coverage

Topic | Details | Date | References
Introduction to RL | The RL Problem, Setup and Course Layout | 04-Aug-2022 | Sutton-Barto [1] (Chapter-1)
Markov Decision Process (MDP) | Markov Process, Markov Reward Process, Markov Decision Process and Bellman Equations, Partially Observable MDPs | 05-Aug-2022, 11-Aug-2022 | Sutton-Barto [1] (Chapter-3)
Planning by Dynamic Programming (DP) | Policy Evaluation, Value Iteration, Policy Iteration, DP Extensions and Convergence using Contraction Mapping | 12-Aug-2022, 25-Aug-2022 | Sutton-Barto [1] (Chapter-4)
Model-free Prediction | Monte-Carlo (MC) Learning, Temporal-Difference (TD) Learning, TD-Lambda and Eligibility Traces | 26-Aug-2022, 01-Sep-2022 | Sutton-Barto [1] (Chapter-5+6)
Model-free Control | On-Policy MC Control, On-Policy TD Learning and Off-Policy Learning | 02-Sep-2022, 08-Sep-2022 | Sutton-Barto [1] (Chapter-5+6+7)
Value Function Approximation | Incremental Methods and Batch Methods, Deep Q-Learning, Deep Q-Networks and Experience Replay | 09-Sep-2022, 10-Sep-2022 | Sutton-Barto [1] (Chapter-9+10+11+12)
Policy Gradient Methods | Finite-Difference, Monte-Carlo and Actor-Critic Methods | 13-Oct-2022 | Sutton-Barto [1] (Chapter-13)
Integrating Planning with Learning | Model-based RL, Integrated Architecture and Simulation-based Search | 14-Oct-2022, 20-Oct-2022 | Sutton-Barto [1] (Chapter-8)
Exploration and Exploitation (Bandits) | Multi-arm Bandits, Contextual Bandits and MDP Extensions | 20-Oct-2022, 21-Oct-2022 | Sutton-Barto [1] (Chapter-2)
Integrating AI Search and Learning | Classical Games: Combining Minimax Search and RL | 27-Oct-2022 | Sutton-Barto [1] (Chapter-16)
Hierarchical RL | Semi-Markov Decision Process, Learning with Options, Abstract Machines and MAXQ Decomposition | 28-Oct-2022, 03-Nov-2022 | Barto-Mahadevan [6], Dietterich [7]
Deep RL | PPO, DDPG, Double Q-Learning, Advanced Policy Gradients etc. | 03-Nov-2022 | Francois-Lavet et al. [8], Li [9], Vitay [10]
Multi-Agent RL | Cooperative vs. Competitive Settings, Mixed Setting, Games, MARL Algorithms | 04-Nov-2022 | Zhang-Yang-Başar [11], Yang-Wang [12]
Conclusion | Summary, Open Problems and Path Ahead | 10-Nov-2022 | Sutton-Barto [1] (Chapter-14+15+17)

** For Reference Slides/Materials, Visit the following Course Pages:

  • Course by Dr. David Silver (DeepMind and UCL, UK)
  • Course by Dr. Abir Das (IIT Kharagpur, India)
  • Course by Dr. Emma Brunskill (Stanford, USA)

Books and References

    1. Richard S. Sutton and Andrew G. Barto; Reinforcement Learning: An Introduction; 2nd Edition, MIT Press, 2020.   [ TEXTBOOK ]

    2. Csaba Szepesvári; Algorithms for Reinforcement Learning; Synthesis Lectures on Artificial Intelligence and Machine Learning, vol. 4, no. 1, 2010.
    3. Dimitri P. Bertsekas; Reinforcement Learning and Optimal Control; 1st Edition, Athena Scientific, 2019.
    4. Dimitri P. Bertsekas; Dynamic Programming and Optimal Control (Vol. I and Vol. II); 4th Edition, Athena Scientific, 2017.
    5. Leslie Pack Kaelbling, Michael L. Littman and Andrew W. Moore; Reinforcement Learning: A Survey; Journal of Artificial Intelligence Research, vol. 4, pp. 237–285, 1996.
    6. Andrew G. Barto and Sridhar Mahadevan; Recent Advances in Hierarchical Reinforcement Learning; Discrete Event Dynamic Systems, vol. 13, pp. 341–379, 2003.
    7. Thomas G. Dietterich; Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition; Journal of Artificial Intelligence Research, vol. 13, pp. 227–303, 2000.
    8. Vincent Francois-Lavet, Peter Henderson, Riashat Islam, Marc G. Bellemare and Joelle Pineau; An Introduction to Deep Reinforcement Learning; ArXiv ePrint, 2018.
    9. Yuxi Li; Deep Reinforcement Learning: An Overview; ArXiv ePrint, 2018.
    10. Julien Vitay; Deep Reinforcement Learning, 2020.
    11. Kaiqing Zhang, Zhuoran Yang, Tamer Başar; Multi-Agent Reinforcement Learning: A Selective Overview of Theories and Algorithms; ArXiv ePrint, 2021.
    12. Yaodong Yang, Jun Wang; An Overview of Multi-Agent Reinforcement Learning from Game Theoretical Perspective; ArXiv ePrint, 2021.

Term-Project

Examinations
