CS60077: Reinforcement Learning (Autumn 2024), L-T-P: 3-0-0

Schedule

Instructors     Aritra Hazra and Soumyajit Dey
Timings [Slot: G3]     Wednesday (11:00am–12:00pm), Thursday (12:00pm–01:00pm), Friday (08:00am–09:00am)
Venue     CSE-108 (Ground Floor, CSE Dept.)
Teaching Assistants     Somnath Hazra   |   Suraj Singh
Course Pre-requisites     Probability and Linear Algebra (Basics),
Programming Knowledge (preferably Python),
Data Structures and Algorithms,
Artificial Intelligence, Machine Learning and (Deep) Neural Networks

Notices and Announcements

October 19, 2024

Term Projects (Phase II) are up! The submission deadline is 09-Nov-2024 (strict).

September 25, 2024

Mid-Semester Examination solutions are up! The answer scripts will be shown in class on 27-Sep-2024 (8:00am).

August 16, 2024

Term Projects (Phase I) are up! The submission deadline is 07-Sep-2024 (strict).

July 20, 2024

We have processed the requests. Approved students may now go ahead with their registration for this course.

The first class will be held on 24-July-2024 (Wednesday) at 11:00am.
An e-mail with all relevant details will be sent to the enrolled students before that. Stay tuned ...

July 08, 2024

Students are requested to apply for this course through the ERP portal. Because the number of seats is limited, we may have to screen applicants (if and as necessary) while approving student requests. We shall finalize approvals a couple of days before classes start. Please do not send individual e-mails to us; we may not be able to respond to every e-mail we receive about this matter. Moreover, please do not ask about the timelines. We shall give you sufficient time to switch to other courses in case we are unable to accept you into this course.

The only officially listed pre-requisite of this course is Deep Learning (CS60010). However, as an advanced course, it depends heavily on all the pre-requisite topics listed above. If you feel that you are well acquainted with those topics, you are encouraged to apply for a pre-requisite waiver through ERP.


Syllabus and Coverage

Topic | Details | Instructor | Date | References
Introduction to RL

The RL Problem and Overall Landscape,
Recent Advancements and Highlights,
Setup and Course Layout

Aritra Hazra

24-Jul-2024
25-Jul-2024

Sutton-Barto[1] (Chapter 1)
Szepesvári[5]
Markov Decision Process (MDP)

Markov Process, Markov Reward Process, Markov Decision Process,
Bellman Expectation and Optimality Equations,
Partially Observable MDPs

Aritra Hazra

26-Jul-2024
31-Jul-2024

Sutton-Barto[1] (Chapter 3)
Szepesvári[5]
Planning by Dynamic Programming (DP)

Policy Evaluation, Policy Iteration, DP Extensions, Value Iteration,
Maximum Entropy Formulation,
Convergence using Contraction Mapping

Aritra Hazra

01-Aug-2024
02-Aug-2024
07-Aug-2024

Sutton-Barto[1] (Chapter 4)
Szepesvári[5]
Model-free Prediction

Monte-Carlo (MC) Learning,
Temporal-Difference (TD) Learning, TD(λ) and Eligibility Traces

Aritra Hazra

08-Aug-2024
09-Aug-2024
14-Aug-2024

Sutton-Barto[1] (Chapters 5,6)
Szepesvári[5]
Model-free Control

On-Policy MC Control, On-Policy TD Learning,
Off-Policy Learning, SARSA and Q-Learning,
Double Q-Learning[11]

Aritra Hazra

16-Aug-2024
21-Aug-2024

Sutton-Barto[1] (Chapters 5,6,7)
Szepesvári[5]
Integrating Planning with Learning

Model-based RL, Integrated Architecture,
Simulation-based Search (Monte-Carlo Tree Search)

Aritra Hazra

22-Aug-2024
23-Aug-2024

Sutton-Barto[1] (Chapter 8)
Value Function Approximation

Incremental Methods (Linear and Gradient based),
Batch Methods (Least Square based)

Aritra Hazra

28-Aug-2024
29-Aug-2024
30-Aug-2024

Sutton-Barto[1] (Chapters 9,10,11,12)
Integrating AI Search with Learning

Classical Games with Self-Play,
Combining Minimax Search and RL[12,13,14,15],
Monte-Carlo Tree Search and TD Methods

Aritra Hazra

04-Sep-2024
05-Sep-2024

Sutton-Barto[1] (Chapter 16)
Plaat[7]
Exploration and Exploitation (Bandits)

Exploration Principles (Greedy, Optimistic, Probabilistic, Informative),
Multi-arm Bandits, Contextual Bandits,
Upper Confidence Bounds (UCB), MDP Extensions

Aritra Hazra

06-Sep-2024
11-Sep-2024
12-Sep-2024

Sutton-Barto[1] (Chapter 2)
Lattimore-Szepesvári[4]
Agarwal et al.[9]
Deep RL

Deep Q-Networks (DQN) with Experience Replay[16,17],
Double DQN (DDQN)[18],
Prioritized Replay DDQN[19],
Duelling DQN[20],
Distributional DQN[21],
Noisy DQN[22]

Soumyajit Dey

26-Sep-2024
27-Sep-2024
16-Oct-2024
17-Oct-2024
18-Oct-2024

Dong et al.[6]
Plaat[8]
Hessel et al.[23]
Francois-Lavet et al.[24]
Li[25]
Vitay[26]
Hierarchical RL

Semi-Markov Decision Process,
Learning with Options,
Abstract Machines,
MAXQ Decomposition

Aritra Hazra

03-Oct-2024
04-Oct-2024

Barto-Mahadevan[27]
Dietterich[28]
Policy Gradient Methods

Finite-Difference Method,
Likelihood-Ratio Policy Gradient,
Vanilla (Monte-Carlo) Policy Gradient,
Actor-Critic Methods (A2C, A3C, GAE)[29,30,31,32],
Advantage Function and Compatible Function Approximation,
Natural Policy Gradient[33,34]

Soumyajit Dey

23-Oct-2024
24-Oct-2024
25-Oct-2024

30-Oct-2024
01-Nov-2024
06-Nov-2024

Sutton-Barto[1] (Chapter 13)
Agarwal et al.[9]
Advanced Policy Gradients

Trust Region Policy Optimization (TRPO)[35],
Proximal Policy Optimization (PPO)[36],
Actor-Critic using Kronecker-Factored Trust Region (ACKTR)[37],
Deep Deterministic Policy Gradient (DDPG)[38],
Soft Actor-Critic (SAC)[39]

Soumyajit Dey

07-Nov-2024
08-Nov-2024
13-Nov-2024
14-Nov-2024

Dong et al.[6]
Plaat[8]
Kakade-Langford[40]
Achiam et al.[41]
Multi-Agent RL
(tentative, subject to available time)

Cooperative vs. Competitive Settings,
Mixed Setting, Adversarial Setting,
Game-Theoretic Formulation,
MARL Algorithms

Zhang et al.[42]
Yang-Wang[43]
Inverse RL
(tentative, subject to available time)

Inferring Reward from Policy and Behavior

Ng-Russell[44]
Abbeel-Ng[45]
Arora-Doshi[46]
Imitation RL
(tentative, subject to available time)

Learning by Mimicking and Behavior Cloning

Ho-Ermon[47]
Price-Boutilier[48]
Le et al.[49]
Peng et. al.[50]
Conclusion

Summary, Open Problems and Path Ahead

Soumyajit Dey

14-Nov-2024

Sutton-Barto[1] (Chapters 14,15,16,17)
Kaelbling et al.[10]

** For reference slides and materials, visit the following course pages:

  • Course by Dr. David Silver (DeepMind and UCL, UK)
  • Course by Dr. Emma Brunskill (Stanford, USA)
  • Course by Dr. Sergey Levine (UC Berkeley, USA)
  • Course by Dr. Abir Das (IIT Kharagpur, India)


  • Term-Projects


    Examinations


    Marks Distribution:   30% Projects + 30% MidSem + 40% EndSem


    Books and References

    1. Richard S. Sutton and Andrew G. Barto; Reinforcement Learning: An Introduction; 2nd Edition, MIT Press, 2018.   [ TEXTBOOK ]
    2. Dimitri P. Bertsekas; Dynamic Programming and Optimal Control (Vol. I and Vol. II); 4th Edition, Athena Scientific, 2017.
    3. Dimitri P. Bertsekas; Reinforcement Learning and Optimal Control; 1st Edition, Athena Scientific, 2019.
    4. Tor Lattimore and Csaba Szepesvári; Bandit Algorithms; 1st Edition, Cambridge University Press, 2020.   [Open-Access]

    5. Csaba Szepesvári; Algorithms for Reinforcement Learning; Synthesis Lectures on Artificial Intelligence and Machine Learning, vol. 4, no. 1, 2010.
    6. Hao Dong, Zihan Ding and Shanghang Zhang; Deep Reinforcement Learning: Fundamentals, Research and Applications; Springer, 2020.
    7. Aske Plaat; Learning to Play: Reinforcement Learning and Games; Springer, 2020.
    8. Aske Plaat; Deep Reinforcement Learning; Springer, 2022.
    9. Alekh Agarwal, Nan Jiang, Sham M. Kakade and Wen Sun; Reinforcement Learning: Theory and Algorithms; Working Draft, Jan. 31, 2022.
    10. Leslie Pack Kaelbling, Michael L. Littman and Andrew W. Moore; Reinforcement Learning: A Survey; Journal of Artificial Intelligence Research, vol. 4, pp. 237-285, 1996.

    11. Hado van Hasselt; Double Q-learning; Advances in Neural Information Processing Systems (NIPS), vol. 23, 2010.
    12. David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel and Demis Hassabis; Mastering the Game of Go with Deep Neural Networks and Tree Search; Nature, vol. 529, pp. 484–489, 2016.
    13. David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, Timothy Lillicrap, Karen Simonyan and Demis Hassabis; A General Reinforcement Learning Algorithm that Masters Chess, Shogi, and Go through Self-Play; Science, vol. 362, no. 6419, pp. 1140-1144, 2018.
    14. Oriol Vinyals, Igor Babuschkin, Wojciech M. Czarnecki, Michaël Mathieu, Andrew Dudzik, Junyoung Chung, David H. Choi, Richard Powell, Timo Ewalds, Petko Georgiev, Junhyuk Oh, Dan Horgan, Manuel Kroiss, Ivo Danihelka, Aja Huang, Laurent Sifre, Trevor Cai, John P. Agapiou, Max Jaderberg, Alexander S. Vezhnevets, Rémi Leblond, Tobias Pohlen, Valentin Dalibard, David Budden, Yury Sulsky, James Molloy, Tom L. Paine, Caglar Gulcehre, Ziyu Wang, Tobias Pfaff, Yuhuai Wu, Roman Ring, Dani Yogatama, Dario Wünsch, Katrina McKinney, Oliver Smith, Tom Schaul, Timothy Lillicrap, Koray Kavukcuoglu, Demis Hassabis, Chris Apps and David Silver; Grandmaster Level in StarCraft II using Multi-Agent Reinforcement Learning; Nature, vol. 575, pp. 350–354, 2019.
    15. Julian Schrittwieser, Ioannis Antonoglou, Thomas Hubert, Karen Simonyan, Laurent Sifre, Simon Schmitt, Arthur Guez, Edward Lockhart, Demis Hassabis, Thore Graepel, Timothy Lillicrap and David Silver; Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model; Nature, vol. 588, pp. 604-609, 2020.
    16. Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra and Martin Riedmiller; Playing Atari with Deep Reinforcement Learning; arXiv preprint, arXiv:1312.5602, 2013.
    17. Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg and Demis Hassabis; Human-Level Control through Deep Reinforcement Learning; Nature, vol. 518, pp. 529-533, 2015.
    18. Hado van Hasselt, Arthur Guez and David Silver; Deep Reinforcement Learning with Double Q-learning; In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI), pp. 2094–2100, 2016.
    19. Tom Schaul, John Quan, Ioannis Antonoglou and David Silver; Prioritized Experience Replay; arXiv preprint arXiv:1511.05952, 2016 (ICLR 2016 Poster).
    20. Ziyu Wang, Tom Schaul, Matteo Hessel, Hado Van Hasselt, Marc Lanctot and Nando De Freitas; Dueling Network Architectures for Deep Reinforcement Learning; Proceedings of Machine Learning Research (PMLR), vol. 48, pp. 1995-2003, 2016.
    21. Marc G. Bellemare, Will Dabney and Rémi Munos; A Distributional Perspective on Reinforcement Learning; Proceedings of Machine Learning Research (PMLR), vol. 70, pp. 449-458, 2017.
    22. Meire Fortunato, Mohammad Gheshlaghi Azar, Bilal Piot, Jacob Menick, Ian Osband, Alex Graves, Vlad Mnih, Remi Munos, Demis Hassabis, Olivier Pietquin, Charles Blundell and Shane Legg; Noisy Networks for Exploration; arXiv preprint arXiv:1706.10295, 2017 (ICLR 2018).
    23. Matteo Hessel, Joseph Modayil, Hado Van Hasselt, Tom Schaul, Georg Ostrovski, Will Dabney, Dan Horgan, Bilal Piot, Mohammad Azar and David Silver; Rainbow: Combining Improvements in Deep Reinforcement Learning; In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), vol. 32, no. 1, pp. 3215-3222, 2018.
    24. Vincent Francois-Lavet, Peter Henderson, Riashat Islam, Marc G. Bellemare and Joelle Pineau; An Introduction to Deep Reinforcement Learning; arXiv preprint arXiv:1811.12560, 2018.
    25. Yuxi Li; Deep Reinforcement Learning: An Overview; arXiv preprint arXiv:1701.07274, 2018.
    26. Julien Vitay; Deep Reinforcement Learning, 2020.
    27. Andrew G. Barto and Sridhar Mahadevan; Recent Advances in Hierarchical Reinforcement Learning; Discrete Event Dynamic Systems, vol. 13, pp. 341–379, 2003.
    28. Thomas G. Dietterich; Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition; Journal of Artificial Intelligence Research, vol. 13, pp. 227-303, 2000.
    29. David Silver, Guy Lever, Nicolas Heess, Thomas Degris, Daan Wierstra and Martin Riedmiller; Deterministic Policy Gradient Algorithms; In Proceedings of the 31st International Conference on Machine Learning (ICML), PMLR, vol. 32, no. 1, pp. 387-395, 2014.
    30. Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver and Koray Kavukcuoglu; Asynchronous Methods for Deep Reinforcement Learning; In Proceedings of the 33rd International Conference on Machine Learning (ICML), PMLR, vol. 48, pp. 1928-1937, 2016.
    31. John Schulman, Philipp Moritz, Sergey Levine, Michael I. Jordan and Pieter Abbeel; High-Dimensional Continuous Control Using Generalized Advantage Estimation; arXiv preprint arXiv:1506.02438, 2018 (ICLR 2016 Poster).
    32. Ziyu Wang, Victor Bapst, Nicolas Heess, Volodymyr Mnih, Remi Munos, Koray Kavukcuoglu and Nando de Freitas; Sample Efficient Actor-Critic with Experience Replay; arXiv preprint arXiv:1611.01224, 2017 (ICLR 2017).
    33. Jan Peters and Stefan Schaal; Natural Actor-Critic; Neurocomputing, vol. 71, no. 7–9, pp. 1180-1190, 2008.
    34. Shalabh Bhatnagar, Richard S. Sutton, Mohammad Ghavamzadeh and Mark Lee; Natural Actor-Critic Algorithms; Automatica, vol. 45, no. 11, pp. 2471-2482, 2009.
    35. John Schulman, Sergey Levine, Pieter Abbeel, Michael Jordan and Philipp Moritz; Trust Region Policy Optimization; In Proceedings of the 32nd International Conference on Machine Learning (ICML), PMLR, vol. 37, pp. 1889-1897, 2015.
    36. John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford and Oleg Klimov; Proximal Policy Optimization Algorithms; arXiv preprint arXiv:1707.06347, 2017.
    37. Yuhuai Wu, Elman Mansimov, Roger B. Grosse, Shun Liao and Jimmy Ba; Scalable Trust-Region Method for Deep Reinforcement Learning using Kronecker-Factored Approximation; Advances in Neural Information Processing Systems (NIPS), vol. 30, 2017.
    38. Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver and Daan Wierstra; Continuous Control with Deep Reinforcement Learning; arXiv preprint arXiv:1509.02971, 2019 (ICLR 2016 Poster).
    39. Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel and Sergey Levine; Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor; arXiv preprint arXiv:1801.01290, 2018.
    40. Sham Kakade and John Langford; Approximately Optimal Approximate Reinforcement Learning; In Proceedings of the 19th International Conference on Machine Learning (ICML), pp. 267-274, 2002.
    41. Joshua Achiam, David Held, Aviv Tamar and Pieter Abbeel; Constrained Policy Optimization; In Proceedings of the 34th International Conference on Machine Learning (ICML), PMLR, vol. 70, pp. 22-31, 2017.
    42. Kaiqing Zhang, Zhuoran Yang and Tamer Başar; Multi-Agent Reinforcement Learning: A Selective Overview of Theories and Algorithms; arXiv e-print, 2021.
    43. Yaodong Yang and Jun Wang; An Overview of Multi-Agent Reinforcement Learning from Game Theoretical Perspective; arXiv e-print, 2021.
    44. Andrew Y. Ng and Stuart J. Russell; Algorithms for Inverse Reinforcement Learning; In Proceedings of the Seventeenth International Conference on Machine Learning (ICML), pp. 663-670, 2000.
    45. Pieter Abbeel and Andrew Y. Ng; Apprenticeship Learning via Inverse Reinforcement Learning; In Proceedings of the Twenty-First International Conference on Machine Learning (ICML), 2004.
    46. Saurabh Arora and Prashant Doshi; A Survey of Inverse Reinforcement Learning: Challenges, Methods and Progress; Artificial Intelligence Journal, vol. 297, 2021.
    47. Jonathan Ho and Stefano Ermon; Generative Adversarial Imitation Learning; Advances in Neural Information Processing Systems (NIPS), vol. 29, 2016.
    48. Bob Price and Craig Boutilier; Accelerating Reinforcement Learning through Implicit Imitation; Journal of Artificial Intelligence Research (JAIR), vol. 19, pp. 569-629, 2003.
    49. Hoang Le, Nan Jiang, Alekh Agarwal, Miroslav Dudik, Yisong Yue and Hal Daumé III; Hierarchical Imitation and Reinforcement Learning; In Proceedings of the 35th International Conference on Machine Learning (ICML), PMLR, vol. 80, pp. 2917-2926, 2018.
    50. Xue Bin Peng, Pieter Abbeel, Sergey Levine and Michiel van de Panne; DeepMimic: Example-Guided Deep Reinforcement Learning of Physics-Based Character Skills; ACM Transactions on Graphics, vol. 37, no. 4, pp. 143:1–14, 2018.