CS60077 : Reinforcement Learning | Autumn 2022, L-T-P: 3-0-0 |
Schedule
Instructor: Aritra Hazra
Timings: Thursday (15:00–17:00), Friday (15:00–16:00)
Venue: NC442 (Nalanda Complex)
Teaching Assistants: Ayan Maity | Somnath Hazra | Sriyash Poddar
Notices and Announcements
- August 02, 2022
- The first class will be held on 04-August-2022 (Thursday) at 3:00pm.
An e-mail with all relevant details will be sent to the enrolled students before that. Stay tuned ...
- July 25, 2022
- Enrollment requests submitted via ERP will be considered until 8:00pm on 02-Aug-2022. Approvals will be finalized among the requesting students based on CGPA only, immediately after 8:00pm if possible. Please note that not all requests can be accepted because seats are limited. Since the first class is on 04-Aug-2022, students whose requests are declined are asked to switch to other courses before the subject registration deadline. No further requests for approval will be considered after that.
- Course Pre-requisites: Probability and Linear Algebra (Basics), Programming Knowledge (preferably Python), Data Structures and Algorithms, Artificial Intelligence, Machine Learning and (Deep) Neural Networks
Syllabus and Coverage
| Topic | Details | Date | References |
|---|---|---|---|
| Introduction to RL | The RL Problem, Setup and Course Layout | 04-Aug-2022 | Sutton-Barto [1] (Chapter-1) |
| Markov Decision Process (MDP) | Markov Process, Markov Reward Process, Markov Decision Process and Bellman Equations, Partially Observable MDPs | 05-Aug-2022, 11-Aug-2022 | Sutton-Barto [1] (Chapter-3) |
| Planning by Dynamic Programming (DP) | Policy Evaluation, Value Iteration, Policy Iteration, DP Extensions and Convergence using Contraction Mapping | 12-Aug-2022, 25-Aug-2022 | Sutton-Barto [1] (Chapter-4) |
| Model-free Prediction | Monte-Carlo (MC) Learning, Temporal-Difference (TD) Learning, TD-Lambda and Eligibility Traces | 26-Aug-2022, 01-Sep-2022 | Sutton-Barto [1] (Chapter-5+6) |
| Model-free Control | On-Policy MC Control, On-Policy TD Learning and Off-Policy Learning | 02-Sep-2022, 08-Sep-2022 | Sutton-Barto [1] (Chapter-5+6+7) |
| Value Function Approximation | Incremental Methods and Batch Methods, Deep Q-Learning, Deep Q-Networks and Experience Replay | 09-Sep-2022, 10-Sep-2022 | Sutton-Barto [1] (Chapter-9+10+11+12) |
| Policy Gradient Methods | Finite-Difference, Monte-Carlo and Actor-Critic Methods | 13-Oct-2022 | Sutton-Barto [1] (Chapter-13) |
| Integrating Planning with Learning | Model-based RL, Integrated Architecture and Simulation-based Search | 14-Oct-2022, 20-Oct-2022 | Sutton-Barto [1] (Chapter-8) |
| Exploration and Exploitation (Bandits) | Multi-arm Bandits, Contextual Bandits and MDP Extensions | 20-Oct-2022, 21-Oct-2022 | Sutton-Barto [1] (Chapter-2) |
| Integrating AI Search and Learning | Classical Games: Combining Minimax Search and RL | 27-Oct-2022 | Sutton-Barto [1] (Chapter-16) |
| Hierarchical RL | Semi-Markov Decision Process, Learning with Options, Abstract Machines and MAXQ Decomposition | 28-Oct-2022, 03-Nov-2022 | Barto-Mahadevan [6], Dietterich [7] |
| Deep RL | PPO, DDPG, Double Q-Learning, Advanced Policy Gradients etc. | 03-Nov-2022 | Francois-Lavet et al. [8], Li [9], Vitay [10] |
| Multi-Agent RL | Cooperative vs. Competitive Settings, Mixed Setting, Games, MARL Algorithms | 04-Nov-2022 | Zhang-Yang-Başar [11], Yang-Wang [12] |
| Conclusion | Summary, Open Problems and Path Ahead | 10-Nov-2022 | Sutton-Barto [1] (Chapter-14+15+17) |
** For reference slides and materials, visit the following course pages:
- Course by Dr. David Silver (DeepMind and UCL, UK)
- Course by Dr. Abir Das (IIT Kharagpur, India)
- Course by Dr. Emma Brunskill (Stanford, USA)
Books and References
- [1] Richard S. Sutton and Andrew G. Barto; Reinforcement Learning: An Introduction; 2nd Edition, MIT Press, 2020. [ TEXTBOOK ]
- [2] Csaba Szepesvári; Algorithms of Reinforcement Learning; Synthesis Lectures on Artificial Intelligence and Machine Learning, vol. 4, no. 1, 2010.
- [3] Dimitri P. Bertsekas; Reinforcement Learning and Optimal Control; 1st Edition, Athena Scientific, 2019.
- [4] Dimitri P. Bertsekas; Dynamic Programming and Optimal Control (Vol. I and Vol. II); 4th Edition, Athena Scientific, 2017.
- [5] Leslie Pack Kaelbling, Michael L. Littman and Andrew W. Moore; Reinforcement Learning: A Survey; Journal of Artificial Intelligence Research, vol. 4, pp. 237–285, 1996.
- [6] Andrew G. Barto and Sridhar Mahadevan; Recent Advances in Hierarchical Reinforcement Learning; Discrete Event Dynamic Systems, vol. 13, pp. 341–379, 2003.
- [7] Thomas G. Dietterich; Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition; Journal of Artificial Intelligence Research, vol. 13, pp. 227–303, 2000.
- [8] Vincent Francois-Lavet, Peter Henderson, Riashat Islam, Marc G. Bellemare and Joelle Pineau; An Introduction to Deep Reinforcement Learning; arXiv e-print, 2018.
- [9] Yuxi Li; Deep Reinforcement Learning: An Overview; arXiv e-print, 2018.
- [10] Julien Vitay; Deep Reinforcement Learning, 2020.
- [11] Kaiqing Zhang, Zhuoran Yang and Tamer Başar; Multi-Agent Reinforcement Learning: A Selective Overview of Theories and Algorithms; arXiv e-print, 2021.
- [12] Yaodong Yang and Jun Wang; An Overview of Multi-Agent Reinforcement Learning from Game Theoretical Perspective; arXiv e-print, 2021.
Term-Project
- Phase-I (Duration: 22-Aug-2022 – 23-Sep-2022, Marks: 30)
- Phase-II (Duration: 10-Oct-2022 – 11-Nov-2022, Marks: 30)
Examinations