CS60077: Reinforcement Learning | Autumn 2022, L-T-P: 3-0-0
Schedule
Instructor: Aritra Hazra
Timings: Thursday (15:00–17:00), Friday (15:00–16:00)
Venue: NC442 (Nalanda Complex)
Teaching Assistants: Ayan Maity | Somnath Hazra | Sriyash Poddar
Notices and Announcements
- August 02, 2022
- The first class will be held on 04-August-2022 (Thursday) at 3:00pm.
An e-mail with all relevant details will be sent to the enrolled students before that. Stay tuned ...
- July 25, 2022
- We shall consider requests via ERP from all students until 8:00pm, 02-Aug-2022, and finalize approvals among the requesting students based on CGPA only, immediately after 8:00pm (if possible). Please note that we cannot accept all requesting students because of the seat limit. Since the first class will be held on 04-Aug-2022, declined students are requested to switch to other courses before the subject registration deadline expires. We shall NOT consider any further requests for approval beyond that.
- Course Pre-requisites: Probability and Linear Algebra (Basics), Programming Knowledge (preferably Python), Data Structures and Algorithms, Artificial Intelligence, Machine Learning and (Deep) Neural Networks
Syllabus and Coverage
| Topic | Details | Date | Slides** | References |
|---|---|---|---|---|
| Introduction to RL | The RL Problem, Setup and Course Layout | 04-Aug-2022 | Lecture-1 | Sutton-Barto [1] (Chapter-1) |
| Markov Decision Process (MDP) | Markov Process, Markov Reward Process, Markov Decision Process and Bellman Equations, Partially Observable MDPs | 05-Aug-2022, 11-Aug-2022 | Lecture-2 | Sutton-Barto [1] (Chapter-3) |
| Planning by Dynamic Programming (DP) | Policy Evaluation, Value Iteration, Policy Iteration, DP Extensions and Convergence using Contraction Mapping | 12-Aug-2022, 25-Aug-2022 | Lecture-3 | Sutton-Barto [1] (Chapter-4) |
| Model-free Prediction | Monte-Carlo (MC) Learning, Temporal-Difference (TD) Learning, TD-Lambda and Eligibility Traces | 26-Aug-2022, 01-Sep-2022 | Lecture-4 | Sutton-Barto [1] (Chapter-5+6) |
| Model-free Control | On-Policy MC Control, On-Policy TD Learning and Off-Policy Learning | 02-Sep-2022, 08-Sep-2022 | Lecture-5 | Sutton-Barto [1] (Chapter-5+6+7) |
| Value Function Approximation | Incremental Methods and Batch Methods, Deep Q-Learning, Deep Q-Networks and Experience Replay | 09-Sep-2022, 10-Sep-2022 | Lecture-6 | Sutton-Barto [1] (Chapter-9+10+11+12) |
| Policy Gradient Methods | Finite-Difference, Monte-Carlo and Actor-Critic Methods | 13-Oct-2022 | Lecture-7 | Sutton-Barto [1] (Chapter-13) |
| Integrating Planning with Learning | Model-based RL, Integrated Architecture and Simulation-based Search | 14-Oct-2022, 20-Oct-2022 | Lecture-8 | Sutton-Barto [1] (Chapter-8) |
| Exploration and Exploitation (Bandits) | Multi-arm Bandits, Contextual Bandits and MDP Extensions | 20-Oct-2022, 21-Oct-2022 | Lecture-9 | Sutton-Barto [1] (Chapter-2) |
| Integrating AI Search and Learning | Classical Games: Combining Minimax Search and RL | 27-Oct-2022 | Lecture-10 | Sutton-Barto [1] (Chapter-16) |
| Hierarchical RL | Semi-Markov Decision Process, Learning with Options, Abstract Machines and MAXQ Decomposition | 28-Oct-2022, 03-Nov-2022 | Lecture-11a, Lecture-11b | Barto-Mahadevan [6], Dietterich [7] |
| Deep RL | PPO, DDPG, Double Q-Learning, Advanced Policy Gradients etc. | 03-Nov-2022 | Lecture-12a, Lecture-12b | Francois-Lavet et al. [8], Li [9], Vitay [10] |
| Multi-Agent RL | Cooperative vs. Competitive Settings, Mixed Setting, Games, MARL Algorithms | 04-Nov-2022 | Lecture-13a, Lecture-13b | Zhang-Yang-Başar [11], Yang-Wang [12] |
| Conclusion | Summary, Open Problems and Path Ahead | 10-Nov-2022 | Lecture-14 | Sutton-Barto [1] (Chapter-14+15+17) |
** Slides Credit: Dr. David Silver (UCL and Deepmind) and Dr. Abir Das (CSE, IIT Kharagpur)
Books and References
- [1] Richard S. Sutton and Andrew G. Barto; Reinforcement Learning: An Introduction; 2nd Edition, MIT Press, 2020. [ TEXTBOOK ]
- [2] Csaba Szepesvári; Algorithms of Reinforcement Learning; Synthesis Lectures on Artificial Intelligence and Machine Learning, vol. 4, no. 1, 2010.
- [3] Dimitri P. Bertsekas; Reinforcement Learning and Optimal Control; 1st Edition, Athena Scientific, 2019.
- [4] Dimitri P. Bertsekas; Dynamic Programming and Optimal Control (Vol. I and Vol. II); 4th Edition, Athena Scientific, 2017.
- [5] Leslie Pack Kaelbling, Michael L. Littman and Andrew W. Moore; Reinforcement Learning: A Survey; Journal of Artificial Intelligence Research, vol. 4, pp. 237–285, 1996.
- [6] Andrew G. Barto and Sridhar Mahadevan; Recent Advances in Hierarchical Reinforcement Learning; Discrete Event Dynamic Systems, vol. 13, pp. 341–379, 2003.
- [7] Thomas G. Dietterich; Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition; Journal of Artificial Intelligence Research, vol. 13, pp. 227–303, 2000.
- [8] Vincent Francois-Lavet, Peter Henderson, Riashat Islam, Marc G. Bellemare and Joelle Pineau; An Introduction to Deep Reinforcement Learning; ArXiv ePrint, 2018.
- [9] Yuxi Li; Deep Reinforcement Learning: An Overview; ArXiv ePrint, 2018.
- [10] Julien Vitay; Deep Reinforcement Learning, 2020.
- [11] Kaiqing Zhang, Zhuoran Yang and Tamer Başar; Multi-Agent Reinforcement Learning: A Selective Overview of Theories and Algorithms; ArXiv ePrint, 2021.
- [12] Yaodong Yang and Jun Wang; An Overview of Multi-Agent Reinforcement Learning from Game Theoretical Perspective; ArXiv ePrint, 2021.
Term-Project
- Phase-I (Duration: 22-Aug-2022 – 23-Sep-2022, Marks: 30)
- Phase-II (Duration: 10-Oct-2022 – 11-Nov-2022, Marks: 30)
Examinations