CS60077 : Reinforcement Learning | Autumn 2023, L-T-P: 3-0-0 |
Schedule
Instructor: Aritra Hazra
Slot: V3
Timings: Thursday (03:00pm–05:00pm) and Friday (03:00pm–04:00pm)
Venue: CSE-120 (Ground Floor, CSE Dept.)
Teaching Assistants: Ayan Maity | Somnath Hazra
Course Pre-requisites: Probability and Linear Algebra (basics); Programming Knowledge (preferably Python); Data Structures and Algorithms; Artificial Intelligence, Machine Learning and (Deep) Neural Networks
Notices and Announcements
- November 24, 2023
- End-Semester Examination solutions are up! The answer scripts will be shown on 28-Nov-2023.
- October 07, 2023
- The Term Project (Phase II) is up! The submission deadline is 08-Nov-2023 (strict).
- September 27, 2023
- Mid-Semester Examination solutions are up! The answer scripts will be shown on 29-Sep-2023 after the class.
- August 19, 2023
- The Term Project (Phase I) is up! The submission deadline is 10-Sep-2023 (strict).
- August 01, 2023
- We have processed the requests. Approved students may now go ahead with their registration for this course.
The first class will be held on 03-August-2023 (Thursday) at 3:00pm.
An e-mail with all relevant details will be sent to the enrolled students before that. Stay tuned ...
- July 24, 2023
- Students are requested to apply for this course through the ERP portal. Because of the limited number of seats, I may have to apply some screening (if and as necessary) while approving student requests; I shall do that around the middle or end of next week. Please do not send individual emails to me, as I may not be able to respond to every email on this matter, and please do not ask about the timelines. I shall give you sufficient time to switch to other courses in case I am unable to accept you into this course.
The only (officially) listed pre-requisite of this course is Deep Learning (CS60010). However, being an advanced course, it depends heavily on all the pre-requisite topics listed above. If you are well-acquainted with those topics, you are encouraged to apply for a pre-requisite waiver through ERP.
Syllabus and Coverage
- Introduction to RL (03-Aug-2023)
  - Topics: The RL Problem and Overall Landscape; Recent Advancements and Highlights; Setup and Course Layout
  - References: Sutton-Barto[1] (Chapter 1); Szepesvári[5]
- Markov Decision Process (MDP) (04-Aug-2023, 10-Aug-2023)
  - Topics: Markov Process; Markov Reward Process; Markov Decision Process; Bellman Expectation and Optimality Equations; Partially Observable MDPs
  - References: Sutton-Barto[1] (Chapter 3); Szepesvári[5]
- Planning by Dynamic Programming (DP) (11-Aug-2023, 17-Aug-2023)
  - Topics: Policy Evaluation; Policy Iteration; DP Extensions; Value Iteration; Maximum Entropy Formulation; Convergence using Contraction Mapping
  - References: Sutton-Barto[1] (Chapter 4); Szepesvári[5]
- Model-free Prediction (24-Aug-2023)
  - Topics: Monte-Carlo (MC) Learning; Temporal-Difference (TD) Learning; TD(λ) and Eligibility Traces
  - References: Sutton-Barto[1] (Chapters 5, 6); Szepesvári[5]
- Model-free Control (25-Aug-2023, 31-Aug-2023)
  - Topics: On-Policy MC Control; On-Policy TD Learning; Off-Policy Learning; SARSA and Q-Learning; Double Q-Learning[11]
  - References: Sutton-Barto[1] (Chapters 5, 6, 7); Szepesvári[5]
- Integrating Planning with Learning (01-Sep-2023)
  - Topics: Model-based RL; Integrated Architecture; Simulation-based Search (Monte-Carlo Tree Search)
  - References: Sutton-Barto[1] (Chapter 8)
- Value Function Approximation (08-Sep-2023, 14-Sep-2023)
  - Topics: Incremental Methods (Linear and Gradient based); Batch Methods (Least Square based)
  - References: Sutton-Barto[1] (Chapters 9, 10, 11, 12)
- Deep RL (29-Sep-2023)
  - Topics: Deep Q-Networks (DQN) with Experience Replay[12,13]; Double DQN (DDQN)[14]; Prioritized Replay DDQN[15]; Dueling DQN[16]; Distributional DQN[17]; Noisy DQN[18]
  - References: Dong et al.[6]; Plaat[8]; Hessel et al.[19]; Francois-Lavet et al.[20]; Li[21]; Vitay[22]
- Policy Gradient Methods (05-Oct-2023, 06-Oct-2023)
  - Topics: Finite-Difference Method; Likelihood-Ratio Policy Gradient; Vanilla (Monte-Carlo) Policy Gradient; Actor-Critic Methods (A2C, A3C, GAE)[23,24,25,26]; Advantage Function and Compatible Function Approximation; Natural Policy Gradient[27,28]
  - References: Sutton-Barto[1] (Chapter 13); Agarwal et al.[9]
- Advanced Policy Gradients (12-Oct-2023, 13-Oct-2023)
  - Topics: Trust Region Policy Optimization (TRPO)[29]; Proximal Policy Optimization (PPO)[30]; Actor-Critic using Kronecker-Factored Trust Region (ACKTR)[31]; Deep Deterministic Policy Gradient (DDPG)[32]; Soft Actor-Critic (SAC)[33]
  - References: Dong et al.[6]; Plaat[8]; Kakade-Langford[34]; Achiam et al.[35]
- Integrating AI Search with Learning (19-Oct-2023)
  - Topics: Classical Games with Self-Play; Combining Minimax Search and RL[36,37,38,39]; Monte-Carlo Tree Search
  - References: Sutton-Barto[1] (Chapter 16); Plaat[7]
- Exploration and Exploitation (Bandits) (02-Nov-2023)
  - Topics: Exploration Principles (Greedy, Optimistic, Probabilistic, Informative); Multi-arm Bandits; Contextual Bandits and Upper Confidence Bounds (UCB); MDP Extensions
  - References: Sutton-Barto[1] (Chapter 2); Lattimore-Szepesvári[4]; Agarwal et al.[9]
- Hierarchical RL (03-Nov-2023, 09-Nov-2023)
  - Topics: Semi-Markov Decision Process; Learning with Options; Abstract Machines; MAXQ Decomposition
  - References: Barto-Mahadevan[40]; Dietterich[41]
- Multi-Agent RL (10-Nov-2023)
  - Topics: Cooperative vs. Competitive Settings; Mixed Setting; Game-Theoretic Formulation; MARL Algorithms
  - References: Zhang et al.[42]; Yang-Wang[43]
- Inverse RL (tentative as per available time)
  - Topics: Inferring Reward from Policy and Behavior
  - References: Ng-Russell[44]; Abbeel-Ng[45]; Arora-Doshi[46]
- Imitation RL (tentative as per available time)
  - Topics: Learning by Mimicking and Behavior Cloning
  - References: Ho-Ermon[47]; Price-Boutilier[48]; Le et al.[49]; Peng et al.[50]
- Conclusion (10-Nov-2023)
  - Topics: Summary, Open Problems and Path Ahead
  - References: Sutton-Barto[1] (Chapters 14, 15, 16, 17); Kaelbling et al.[10]
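For a concrete flavour of the Model-free Control unit above (SARSA and Q-Learning), here is a minimal tabular Q-learning sketch. The two-state chain environment, its rewards, and all hyper-parameter values below are hypothetical, chosen only for illustration.

```python
import random

def q_learning(n_states, n_actions, step, episodes=500,
               alpha=0.1, gamma=0.9, epsilon=0.1):
    """Tabular Q-learning: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy behaviour policy
            if random.random() < epsilon:
                a = random.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda x: Q[s][x])
            s2, r, done = step(s, a)
            target = r if done else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q

# Hypothetical 2-state chain: action 1 advances, action 0 stays;
# reaching state 2 ends the episode with reward +1.
def toy_step(s, a):
    if a == 1:
        return s + 1, (1.0 if s + 1 == 2 else 0.0), s + 1 == 2
    return s, 0.0, False
```

After training (e.g. `q_learning(3, 2, toy_step)`), the learned Q-table prefers the advancing action in both non-terminal states, which is the optimal policy for this toy chain.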
** For reference slides/materials, visit the following course pages:
- Course by Dr. David Silver (DeepMind and UCL, UK)
- Course by Dr. Abir Das (IIT Kharagpur, India)
- Course by Dr. Emma Brunskill (Stanford, USA)
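Similarly, as a taste of the Exploration and Exploitation (Bandits) unit, the following is a minimal UCB1 sketch for a multi-arm bandit. The Gaussian arms and their means are made up for illustration; the algorithm itself is the standard UCB1 rule (play the arm maximizing the empirical mean plus a sqrt(2 ln t / n_i) confidence bonus).

```python
import math
import random

def ucb1(pull, n_arms, horizon=2000):
    """UCB1: play the arm maximizing mean_i + sqrt(2 * ln t / n_i)."""
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    # initialize by playing every arm once
    for a in range(n_arms):
        sums[a] += pull(a)
        counts[a] += 1
    for t in range(n_arms, horizon):
        scores = [sums[a] / counts[a] + math.sqrt(2 * math.log(t + 1) / counts[a])
                  for a in range(n_arms)]
        a = max(range(n_arms), key=lambda i: scores[i])
        sums[a] += pull(a)
        counts[a] += 1
    return counts  # how often each arm was pulled

# Hypothetical bandit: three Gaussian arms with distinct means.
means = [0.1, 0.5, 0.9]
pull = lambda a: random.gauss(means[a], 0.1)
```

Running `ucb1(pull, 3)` concentrates most of the 2000 pulls on the best arm (mean 0.9) while still occasionally sampling the others, which is exactly the exploration-exploitation trade-off this unit formalizes.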
Term-Projects
- Phase-I (Duration: 20-Aug-2023 – 10-Sep-2023, Marks: 30)
- Phase-II (Duration: 08-Oct-2023 – 08-Nov-2023, Marks: 30)
Examinations
Marks Distribution: 30% Projects + 30% MidSem + 40% EndSem
- Mid-Semester [ Questions | Solutions ] (26-Sep-2023, Tuesday, 09:00am-11:00am (FN) | Marks: 60 | Room: CSE-120) [ Syllabus: Upto Value Function Approximation ]
- End-Semester [ Questions | Solutions ] (24-Nov-2023, Friday, 09:00am-12:00pm (FN) | Marks: 80 | Room: CSE-120) [ Syllabus: Full ]
Books and References
- Richard S. Sutton and Andrew G. Barto; Reinforcement Learning: An Introduction; 2nd Edition, MIT Press, 2020. [ TEXTBOOK ]
- Dimitri P. Bertsekas; Dynamic Programming and Optimal Control (Vol. I and Vol. II); 4th Edition, Athena Scientific, 2017.
- Dimitri P. Bertsekas; Reinforcement Learning and Optimal Control; 1st Edition, Athena Scientific, 2019.
- Tor Lattimore and Csaba Szepesvári; Bandit Algorithms; 1st Edition, Cambridge University Press, 2020. [Open-Access]
- Csaba Szepesvári; Algorithms of Reinforcement Learning; Synthesis Lectures on Artificial Intelligence and Machine Learning, vol. 4, no. 1, 2010.
- Hao Dong, Zihan Ding and Shanghang Zhang; Deep Reinforcement Learning: Fundamental, Research and Applications; Springer, 2020.
- Aske Plaat; Learning to Play: Reinforcement Learning and Games; Springer, 2020.
- Aske Plaat; Deep Reinforcement Learning; Springer, 2022.
- Alekh Agarwal, Nan Jiang, Sham M. Kakade and Wen Sun; Reinforcement Learning: Theory and Algorithms; Working Draft, Jan. 31, 2022.
- Leslie Pack Kaelbling, Michael L. Littman and Andrew W. Moore; Reinforcement Learning: A Survey; Journal of Artificial Intelligence Research, vol. 4, pp. 237-285, 1996.
- Hado van Hasselt; Double Q-learning; Advances in Neural Information Processing Systems (NIPS), vol. 23, 2010.
- Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra and Martin Riedmiller; Playing Atari with Deep Reinforcement Learning; arXiv preprint, arXiv:1312.5602, 2013.
- Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg and Demis Hassabis; Human-Level Control through Deep Reinforcement Learning; Nature, vol. 518, pp. 529-533, 2015.
- Hado van Hasselt, Arthur Guez and David Silver; Deep Reinforcement Learning with Double Q-learning; In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI), pp. 2094–2100, 2016.
- Tom Schaul, John Quan, Ioannis Antonoglou and David Silver; Prioritized Experience Replay; arXiv preprint arXiv:1511.05952, 2016 (ICLR 2016 Poster).
- Ziyu Wang, Tom Schaul, Matteo Hessel, Hado Van Hasselt, Marc Lanctot and Nando De Freitas; Dueling Network Architectures for Deep Reinforcement Learning; Proceedings of Machine Learning Research (PMLR), vol. 48, pp. 1995-2003, 2016.
- Marc G. Bellemare, Will Dabney and Rémi Munos; A Distributional Perspective on Reinforcement Learning; Proceedings of Machine Learning Research (PMLR), vol. 70, pp. 449-458, 2017.
- Meire Fortunato, Mohammad Gheshlaghi Azar, Bilal Piot, Jacob Menick, Ian Osband, Alex Graves, Vlad Mnih, Remi Munos, Demis Hassabis, Olivier Pietquin, Charles Blundell and Shane Legg; Noisy Networks for Exploration; arXiv preprint arXiv:1706.10295, 2017 (ICLR 2018).
- Matteo Hessel, Joseph Modayil, Hado Van Hasselt, Tom Schaul, Georg Ostrovski, Will Dabney, Dan Horgan, Bilal Piot, Mohammad Azar and David Silver; Rainbow: Combining Improvements in Deep Reinforcement Learning; In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), vol. 32, no. 1, pp. 3215-3222, 2018.
- Vincent Francois-Lavet, Peter Henderson, Riashat Islam, Marc G. Bellemare and Joelle Pineau; An Introduction to Deep Reinforcement Learning; arXiv preprint arXiv:1811.12560, 2018.
- Yuxi Li; Deep Reinforcement Learning: An Overview; arXiv preprint arXiv:1701.07274, 2018.
- Julien Vitay; Deep Reinforcement Learning, 2020.
- David Silver, Guy Lever, Nicolas Heess, Thomas Degris, Daan Wierstra and Martin Riedmiller; Deterministic Policy Gradient Algorithms; In Proceedings of the 31st International Conference on Machine Learning (ICML), PMLR, vol. 32, no. 1, pp. 387-395, 2014.
- Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver and Koray Kavukcuoglu; Asynchronous Methods for Deep Reinforcement Learning; In Proceedings of the 33rd International Conference on Machine Learning (ICML), PMLR, vol. 48, pp. 1928-1937, 2016.
- John Schulman, Philipp Moritz, Sergey Levine, Michael I. Jordan and Pieter Abbeel; High-Dimensional Continuous Control Using Generalized Advantage Estimation; arXiv preprint arXiv:1506.02438, 2018 (ICLR 2016 Poster).
- Ziyu Wang, Victor Bapst, Nicolas Heess, Volodymyr Mnih, Remi Munos, Koray Kavukcuoglu and Nando de Freitas; Sample Efficient Actor-Critic with Experience Replay; arXiv preprint arXiv:1611.01224, 2017 (ICLR 2017).
- Jan Peters and Stefan Schaal; Natural Actor-Critic; Neurocomputing, vol. 71, no. 7–9, pp. 1180-1190, 2008.
- Shalabh Bhatnagar, Richard S. Sutton, Mohammad Ghavamzadeh and Mark Lee; Natural Actor-Critic Algorithms; Automatica, vol. 45, no. 11, pp. 2471-2482, 2009.
- John Schulman, Sergey Levine, Pieter Abbeel, Michael Jordan and Philipp Moritz; Trust Region Policy Optimization; In Proceedings of the 32nd International Conference on Machine Learning (ICML), PMLR, vol. 37, pp. 1889-1897, 2015.
- John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford and Oleg Klimov; Proximal Policy Optimization Algorithms; arXiv preprint arXiv:1707.06347, 2017.
- Yuhuai Wu, Elman Mansimov, Roger B. Grosse, Shun Liao and Jimmy Ba; Scalable Trust-Region Method for Deep Reinforcement Learning using Kronecker-Factored Approximation; Advances in Neural Information Processing Systems (NIPS), vol. 30, 2017.
- Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver and Daan Wierstra; Continuous Control with Deep Reinforcement Learning; arXiv preprint arXiv:1509.02971, 2019 (ICLR Poster 2016).
- Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel and Sergey Levine; Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor; arXiv preprint arXiv:1801.01290, 2018.
- Sham Kakade and John Langford; Approximately Optimal Approximate Reinforcement Learning; In Proceedings of the 19th International Conference on Machine Learning (ICML), pp. 267-274, 2002.
- Joshua Achiam, David Held, Aviv Tamar and Pieter Abbeel; Constrained Policy Optimization; In Proceedings of the 34th International Conference on Machine Learning (ICML), PMLR, vol. 70, pp. 22-31, 2017.
- David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel and Demis Hassabis; Mastering the Game of Go with Deep Neural Networks and Tree Search; Nature, vol. 529, pp. 484–489, 2016.
- David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, Timothy Lillicrap, Karen Simonyan and Demis Hassabis; A General Reinforcement Learning Algorithm that Masters Chess, Shogi, and Go through Self-Play; Science, vol. 362, no. 6419, pp. 1140-1144, 2018.
- Oriol Vinyals, Igor Babuschkin, Wojciech M. Czarnecki, Michaël Mathieu, Andrew Dudzik, Junyoung Chung, David H. Choi, Richard Powell, Timo Ewalds, Petko Georgiev, Junhyuk Oh, Dan Horgan, Manuel Kroiss, Ivo Danihelka, Aja Huang, Laurent Sifre, Trevor Cai, John P. Agapiou, Max Jaderberg, Alexander S. Vezhnevets, Rémi Leblond, Tobias Pohlen, Valentin Dalibard, David Budden, Yury Sulsky, James Molloy, Tom L. Paine, Caglar Gulcehre, Ziyu Wang, Tobias Pfaff, Yuhuai Wu, Roman Ring, Dani Yogatama, Dario Wünsch, Katrina McKinney, Oliver Smith, Tom Schaul, Timothy Lillicrap, Koray Kavukcuoglu, Demis Hassabis, Chris Apps and David Silver; Grandmaster Level in StarCraft II using Multi-Agent Reinforcement Learning; Nature, vol. 575, pp. 350–354, 2019.
- Julian Schrittwieser, Ioannis Antonoglou, Thomas Hubert, Karen Simonyan, Laurent Sifre, Simon Schmitt, Arthur Guez, Edward Lockhart, Demis Hassabis, Thore Graepel, Timothy Lillicrap and David Silver; Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model; Nature, vol. 588, pp. 604-609, 2020.
- Andrew G. Barto and Sridhar Mahadevan; Recent Advances in Hierarchical Reinforcement Learning; Discrete Event Dynamic Systems, vol. 13, pp. 341–379, 2003.
- Thomas G. Dietterich; Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition; Journal of Artificial Intelligence Research, vol. 13, pp. 227-303, 2000.
- Kaiqing Zhang, Zhuoran Yang and Tamer Başar; Multi-Agent Reinforcement Learning: A Selective Overview of Theories and Algorithms; arXiv preprint, 2021.
- Yaodong Yang and Jun Wang; An Overview of Multi-Agent Reinforcement Learning from Game Theoretical Perspective; arXiv preprint, 2021.
- Andrew Y. Ng and Stuart J. Russell; Algorithms for Inverse Reinforcement Learning; In Proceedings of the Seventeenth International Conference on Machine Learning (ICML), pp. 663-670, 2000.
- Pieter Abbeel and Andrew Y. Ng; Apprenticeship Learning via Inverse Reinforcement Learning; In Proceedings of the Twenty-first International Conference on Machine Learning (ICML), 2004.
- Saurabh Arora and Prashant Doshi; A Survey of Inverse Reinforcement Learning: Challenges, Methods and Progress; Artificial Intelligence Journal, vol. 297, 2021.
- Jonathan Ho and Stefano Ermon; Generative Adversarial Imitation Learning; Advances in Neural Information Processing Systems (NIPS), vol. 29, 2016.
- Bob Price and Craig Boutilier; Accelerating Reinforcement Learning through Implicit Imitation; Journal of Artificial Intelligence Research (JAIR), vol. 19, pp. 569-629, 2003.
- Hoang Le, Nan Jiang, Alekh Agarwal, Miroslav Dudik, Yisong Yue and Hal Daumé III; Hierarchical Imitation and Reinforcement Learning; In Proceedings of the 35th International Conference on Machine Learning (ICML), PMLR, vol. 80, pp. 2917-2926, 2018.
- Xue Bin Peng, Pieter Abbeel, Sergey Levine and Michiel van de Panne; DeepMimic: Example-Guided Deep Reinforcement Learning of Physics-Based Character Skills; ACM Transactions on Graphics, vol. 37, no. 4, pp. 143:1–14, 2018.