CS60077: Reinforcement Learning, Autumn 2022, L-T-P: 3-0-0


Instructor: Aritra Hazra
Timings: Thursday (15:00–17:00), Friday (15:00–16:00)
Venue: NC442 (Nalanda Complex)
Teaching Assistants: Ayan Maity | Somnath Hazra | Sriyash Poddar

Notices and Announcements

August 02, 2022

The first class will be held on 04-August-2022 (Thursday) at 3:00pm.
An e-mail with all relevant details will be sent to the enrolled students before that. Stay tuned ...

July 25, 2022

We shall consider requests submitted via ERP until 8:00pm on 02-Aug-2022, and shall finalize approvals among the requesting students based on CGPA only, immediately after 8:00pm if possible. Please note that we cannot accept all requesting students because seats are limited. Since the first class will be held on 04-Aug-2022, students whose requests are declined should switch to other courses before the subject registration deadline expires. We shall NOT consider any further requests for approval after that.

Course Pre-requisites: Probability and Linear Algebra (Basics), Programming Knowledge (preferably Python), Data Structures and Algorithms, Artificial Intelligence, Machine Learning and (Deep) Neural Networks

Syllabus and Coverage

Introduction to RL
  Coverage: The RL Problem, Setup and Course Layout
  Reference: Sutton-Barto [1]

Markov Decision Process (MDP)
  Coverage: Markov Process, Markov Reward Process, Markov Decision Process and Bellman Equations, Partially Observable MDPs
  Reference: Sutton-Barto [1]

Planning by Dynamic Programming (DP)
  Coverage: Policy Evaluation, Value Iteration, Policy Iteration, DP Extensions and Convergence using Contraction Mapping
  Reference: Sutton-Barto [1]

Model-free Prediction
  Coverage: Monte-Carlo (MC) Learning, Temporal-Difference (TD) Learning, TD-Lambda and Eligibility Traces
  Reference: Sutton-Barto [1]

Model-free Control
  Coverage: On-Policy MC Control, On-Policy TD Learning and Off-Policy Learning
  Reference: Sutton-Barto [1]

Value Function Approximation
  Coverage: Incremental Methods and Batch Methods, Deep Q-Learning, Deep Q-Networks and Experience Replay
  Reference: Sutton-Barto [1]

Policy Gradient Methods
  Coverage: Finite-Difference, Monte-Carlo and Actor-Critic Methods
  Reference: Sutton-Barto [1]

Integrating Planning with Learning
  Coverage: Model-based RL, Integrated Architecture and Simulation-based Search
  Reference: Sutton-Barto [1]

Exploration and Exploitation (Bandits)
  Coverage: Multi-arm Bandits, Contextual Bandits and MDP Extensions
  Reference: Sutton-Barto [1]

Integrating AI Search and Learning
  Coverage: Classical Games: Combining Minimax Search and RL
  Reference: Sutton-Barto [1]

Hierarchical RL
  Coverage: Semi-Markov Decision Process, Learning with Options, Abstract Machines and MAXQ Decomposition
  References: Barto-Mahadevan [6], Dietterich [7]

Deep RL
  Coverage: PPO, DDPG, Double Q-Learning, Advanced Policy Gradients etc.
  References: Francois-Lavet et al. [8], Li [9], Vitay [10]

Multi-Agent RL
  Coverage: Cooperative vs. Competitive Settings, Mixed Setting, Games, MARL Algorithms
  References: Zhang-Yang-Başar [11], Yang-Wang [12]

Summary, Open Problems and Path Ahead
  Reference: Sutton-Barto [1]

** For Reference Slides/Materials, Visit the following Course Pages:

  • Course by Dr. David Silver (Deepmind and UCL, UK)
  • Course by Dr. Abir Das (IIT Kharagpur, India)
  • Course by Dr. Emma Brunskill (Stanford, USA)

Books and References

    1. Richard S. Sutton and Andrew G. Barto; Reinforcement Learning: An Introduction; 2nd Edition, MIT Press, 2020.   [ TEXTBOOK ]

    2. Csaba Szepesvári; Algorithms of Reinforcement Learning; Synthesis Lectures on Artificial Intelligence and Machine Learning, vol. 4, no. 1, 2010.
    3. Dimitri P. Bertsekas; Reinforcement Learning and Optimal Control; 1st Edition, Athena Scientific, 2019.
    4. Dimitri P. Bertsekas; Dynamic Programming and Optimal Control (Vol. I and Vol. II); 4th Edition, Athena Scientific, 2017.
    5. Leslie Pack Kaelbling, Michael L. Littman and Andrew W. Moore; Reinforcement Learning: A Survey; Journal of Artificial Intelligence Research, vol. 4, pp. 237–285, 1996.
    6. Andrew G. Barto and Sridhar Mahadevan; Recent Advances in Hierarchical Reinforcement Learning; Discrete Event Dynamic Systems, vol. 13, pp. 341–379, 2003.
    7. Thomas G. Dietterich; Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition; Journal of Artificial Intelligence Research, vol. 13, pp. 227–303, 2000.
    8. Vincent Francois-Lavet, Peter Henderson, Riashat Islam, Marc G. Bellemare and Joelle Pineau; An Introduction to Deep Reinforcement Learning; ArXiv ePrint, 2018.
    9. Yuxi Li; Deep Reinforcement Learning: An Overview; ArXiv ePrint, 2018.
    10. Julien Vitay; Deep Reinforcement Learning, 2020.
    11. Kaiqing Zhang, Zhuoran Yang, Tamer Başar; Multi-Agent Reinforcement Learning: A Selective Overview of Theories and Algorithms; ArXiv ePrint, 2021.
    12. Yaodong Yang, Jun Wang; An Overview of Multi-Agent Reinforcement Learning from Game Theoretical Perspective; ArXiv ePrint, 2021.


