CS60077 : Reinforcement Learning

Autumn 2022, L-T-P: 3-0-0

Schedule

Instructor     Aritra Hazra
Timings     Thursday (15:00–17:00), Friday (15:00–16:00)
Venue     NC442 (Nalanda Complex)
Teaching Assistants     Ayan Maity | Somnath Hazra | Sriyash Poddar

Notices and Announcements

August 02, 2022

The first class will be held on 04-August-2022 (Thursday) at 3:00pm.
An e-mail with all relevant details will be sent to the enrolled students before that. Stay tuned ...

July 25, 2022

We shall consider requests submitted via ERP from all students until 8:00pm on 02-Aug-2022, and shall finalize approvals among the requesting students based on CGPA alone, immediately after 8:00pm if possible. Please note that we cannot accept all requesting students because seats are limited. Since the first class will be held on 04-Aug-2022, declined students are requested to switch to other courses before the subject registration deadline expires. We shall NOT consider any further requests for approval after that.

Course Pre-requisites: Probability and Linear Algebra (Basics), Programming Knowledge (preferably Python), Data Structures and Algorithms, Artificial Intelligence, Machine Learning and (Deep) Neural Networks

Syllabus and Coverage

Topic	Details	Date	References
Introduction to RL	The RL Problem, Setup and Course Layout	04-Aug-2022	Sutton-Barto [1] (Chapter-1)
Markov Decision Process (MDP)	Markov Process, Markov Reward Process, Markov Decision Process and Bellman Equations, Partially Observable MDPs	05-Aug-2022 11-Aug-2022	Sutton-Barto [1] (Chapter-3)
Planning by Dynamic Programming (DP)	Policy Evaluation, Value Iteration, Policy Iteration, DP Extensions and Convergence using Contraction Mapping	12-Aug-2022 25-Aug-2022	Sutton-Barto [1] (Chapter-4)
Model-free Prediction	Monte-Carlo (MC) Learning, Temporal-Difference (TD) Learning, TD-Lambda and Eligibility Traces	26-Aug-2022 01-Sep-2022	Sutton-Barto [1] (Chapter-5+6)
Model-free Control	On-Policy MC Control, On-Policy TD Learning and Off-Policy Learning	02-Sep-2022 08-Sep-2022	Sutton-Barto [1] (Chapter-5+6+7)
Value Function Approximation	Incremental Methods and Batch Methods, Deep Q-Learning, Deep Q-Networks and Experience Replay	09-Sep-2022 10-Sep-2022	Sutton-Barto [1] (Chapter-9+10+11+12)
Policy Gradient Methods	Finite-Difference, Monte-Carlo and Actor-Critic Methods	13-Oct-2022	Sutton-Barto [1] (Chapter-13)
Integrating Planning with Learning	Model-based RL, Integrated Architecture and Simulation-based Search	14-Oct-2022 20-Oct-2022	Sutton-Barto [1] (Chapter-8)
Exploration and Exploitation (Bandits)	Multi-arm Bandits, Contextual Bandits and MDP Extensions	20-Oct-2022 21-Oct-2022	Sutton-Barto [1] (Chapter-2)
Integrating AI Search and Learning	Classical Games: Combining Minimax Search and RL	27-Oct-2022	Sutton-Barto [1] (Chapter-16)
Hierarchical RL	Semi-Markov Decision Process, Learning with Options, Abstract Machines and MAXQ Decomposition	28-Oct-2022 03-Nov-2022	Barto-Mahadevan [6], Dietterich [7]
Deep RL	PPO, DDPG, Double Q-Learning, Advanced Policy Gradients etc.	03-Nov-2022	Francois-Lavet et al. [8], Li [9], Vitay [10]
Multi-Agent RL	Cooperative vs. Competitive Settings, Mixed Setting, Games, MARL Algorithms	04-Nov-2022	Zhang-Yang-Başar [11], Yang-Wang [12]
Conclusion	Summary, Open Problems and Path Ahead	10-Nov-2022	Sutton-Barto [1] (Chapter-14+15+17)

** For Reference Slides/Materials, Visit the following Course Pages:
Course by Dr. David Silver (DeepMind and UCL, UK)

Course by Dr. Abir Das (IIT Kharagpur, India)

Course by Dr. Emma Brunskill (Stanford, USA)

Books and References

[1] Richard S. Sutton and Andrew G. Barto; Reinforcement Learning: An Introduction; 2nd Edition, MIT Press, 2020. [ TEXTBOOK ]
[2] Csaba Szepesvári; Algorithms of Reinforcement Learning; Synthesis Lectures on Artificial Intelligence and Machine Learning, vol. 4, no. 1, 2010.
[3] Dimitri P. Bertsekas; Reinforcement Learning and Optimal Control; 1st Edition, Athena Scientific, 2019.
[4] Dimitri P. Bertsekas; Dynamic Programming and Optimal Control (Vol. I and Vol. II); 4th Edition, Athena Scientific, 2017.
[5] Leslie Pack Kaelbling, Michael L. Littman and Andrew W. Moore; Reinforcement Learning: A Survey; Journal of Artificial Intelligence Research, vol. 4, pp. 237–285, 1996.
[6] Andrew G. Barto and Sridhar Mahadevan; Recent Advances in Hierarchical Reinforcement Learning; Discrete Event Dynamic Systems, vol. 13, pp. 341–379, 2003.
[7] Thomas G. Dietterich; Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition; Journal of Artificial Intelligence Research, vol. 13, pp. 227–303, 2000.
[8] Vincent Francois-Lavet, Peter Henderson, Riashat Islam, Marc G. Bellemare and Joelle Pineau; An Introduction to Deep Reinforcement Learning; ArXiv ePrint, 2018.
[9] Yuxi Li; Deep Reinforcement Learning: An Overview; ArXiv ePrint, 2018.
[10] Julien Vitay; Deep Reinforcement Learning; 2020.
[11] Kaiqing Zhang, Zhuoran Yang and Tamer Başar; Multi-Agent Reinforcement Learning: A Selective Overview of Theories and Algorithms; ArXiv ePrint, 2021.
[12] Yaodong Yang and Jun Wang; An Overview of Multi-Agent Reinforcement Learning from Game Theoretical Perspective; ArXiv ePrint, 2021.

Term-Project

Phase-I  (Duration: 22-Aug-2022 – 23-Sep-2022, Marks: 30)

Phase-II  (Duration: 10-Oct-2022 – 11-Nov-2022, Marks: 30)

Examinations

Mid-Semester  [ Questions | Solutions ]  (29-Sep-2022, Thursday, 09:00-11:00 | Room: CSE-107)  [ Syllabus: Up to Value Function Approximation ]
End-Semester  [ Questions | Solutions ]  (25-Nov-2022, Friday, 14:00-17:00 | Room: CSE-107)  [ Syllabus: Full ]
