CS60077: Reinforcement Learning (Autumn 2023), L-T-P: 3-0-0

Schedule

Instructor     Aritra Hazra
Slot     V3
Timings     Thursday (03:00pm–05:00pm) and Friday (03:00pm–04:00pm)
Venue     CSE-120 (Ground Floor, CSE Dept.)
Teaching Assistants     Ayan Maity   |   Somnath Hazra
Course Pre-requisites     Probability and Linear Algebra (Basics),
Programming Knowledge (preferably Python),
Data Structures and Algorithms,
Artificial Intelligence, Machine Learning and (Deep) Neural Networks

Notices and Announcements

November 24, 2023

End-Semester Examination solutions are up! The answer scripts will be shown on 28-Nov-2023.

October 07, 2023

Term Projects (Phase II) are up! The submission deadline is 08-Nov-2023 (strict).

September 27, 2023

Mid-Semester Examination solutions are up! The answer scripts will be shown on 29-Sep-2023 after the class.

August 19, 2023

Term Projects (Phase I) are up! The submission deadline is 10-Sep-2023 (strict).

August 01, 2023

We have processed the requests. The approved students may please go ahead with their registrations for this course.

The first class will be held on 03-August-2023 (Thursday) at 3:00pm.
An e-mail with all relevant details will be sent to the enrolled students before that. Stay tuned ...

July 24, 2023

Students are requested to apply for this course through the ERP portal. Because of the limited number of seats, I may have to apply some screening (if and as necessary) while approving student requests; I shall do that around the middle or end of next week. Please do not send me individual e-mails, as I may not be able to respond to every e-mail I receive about this matter. Moreover, please do not ask about the timelines; I shall give you sufficient time to switch to other courses in case I am unable to accept you into this course.

The only (officially) listed pre-requisite of this course is Deep Learning (CS60010). However, as an advanced course, it depends heavily on all the pre-requisite topics listed above. If you feel that you are well-acquainted with those topics, you are encouraged to apply for a pre-requisite waiver through ERP.


Syllabus and Coverage

Topic   |   Details   |   Date(s)   |   References
Introduction to RL

The RL Problem and Overall Landscape,
Recent Advancements and Highlights,
Setup and Course Layout

03-Aug-2023

Sutton-Barto[1] (Chapter 1)
Szepesvári[5]
Markov Decision Process (MDP)

Markov Process,
Markov Reward Process,
Markov Decision Process,
Bellman Expectation and Optimality Equations,
Partially Observable MDPs

04-Aug-2023
10-Aug-2023

Sutton-Barto[1] (Chapter 3)
Szepesvári[5]
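
For quick reference, the two Bellman equations named above take their standard forms (notation as in Sutton-Barto[1], Chapter 3, with transition kernel p, discount factor γ, and policy π):

```latex
% Bellman expectation equation (value of a fixed policy \pi)
v_\pi(s) = \sum_{a} \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a) \left[ r + \gamma \, v_\pi(s') \right]

% Bellman optimality equation (value under the best achievable policy)
v_*(s) = \max_{a} \sum_{s', r} p(s', r \mid s, a) \left[ r + \gamma \, v_*(s') \right]
```
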
Planning by Dynamic Programming (DP)

Policy Evaluation,
Policy Iteration, DP Extensions,
Value Iteration,
Maximum Entropy Formulation,
Convergence using Contraction Mapping

11-Aug-2023
17-Aug-2023

Sutton-Barto[1] (Chapter 4)
Szepesvári[5]
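
As an illustration of the value-iteration backup covered in this unit, here is a minimal NumPy sketch for a tabular MDP; the array names `P` and `R` and their shapes are assumptions of the sketch, not part of any course codebase. Convergence follows from the contraction-mapping argument discussed in class.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Value iteration on a tabular MDP (illustrative sketch).

    P: (S, A, S) array, P[s, a, s'] = transition probability.
    R: (S, A) array of expected immediate rewards.
    Returns optimal state values V and a greedy policy.
    """
    S, A, _ = P.shape
    V = np.zeros(S)
    delta = np.inf
    while delta > tol:
        # Bellman optimality backup: Q(s,a) = R(s,a) + gamma * sum_s' P(s,a,s') V(s')
        Q = R + gamma * (P @ V)              # shape (S, A)
        V_new = Q.max(axis=1)
        delta = np.max(np.abs(V_new - V))    # sup-norm change; shrinks under contraction
        V = V_new
    return V, Q.argmax(axis=1)
```
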
Model-free Prediction

Monte-Carlo (MC) Learning,
Temporal-Difference (TD) Learning,
TD(λ) and Eligibility Traces

24-Aug-2023

Sutton-Barto[1] (Chapters 5,6)
Szepesvári[5]
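
A minimal sketch of the TD(0) update at the heart of this unit, assuming a classic Gym-style environment (`reset()` returning a state, `step()` returning a 4-tuple) and a `policy` callable; these interface details are assumptions of the sketch, not course-provided code.

```python
from collections import defaultdict

def td0_prediction(env, policy, episodes=1000, alpha=0.1, gamma=0.99):
    """Tabular TD(0) policy evaluation: V(s) += alpha * [r + gamma*V(s') - V(s)]."""
    V = defaultdict(float)
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            a = policy(s)
            s_next, r, done, _ = env.step(a)
            # one-step bootstrapped target; terminal transitions contribute reward only
            target = r + (0.0 if done else gamma * V[s_next])
            V[s] += alpha * (target - V[s])
            s = s_next
    return V
```
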
Model-free Control

On-Policy MC Control,
On-Policy TD Learning,
Off-Policy Learning,
SARSA and Q-Learning,
Double Q-Learning[11]

25-Aug-2023
31-Aug-2023

Sutton-Barto[1] (Chapters 5,6,7)
Szepesvári[5]
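
For the off-policy case, a minimal tabular Q-learning sketch under the same assumed Gym-style interface; the target maximizes over next actions while behavior stays epsilon-greedy, which is exactly what makes it off-policy.

```python
import random
from collections import defaultdict

def q_learning(env, n_actions, episodes=5000, alpha=0.1, gamma=0.99, eps=0.1):
    """Tabular Q-learning with an epsilon-greedy behavior policy (sketch)."""
    Q = defaultdict(float)  # keyed by (state, action)
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # behavior policy: explore with probability eps, else act greedily
            if random.random() < eps:
                a = random.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda a_: Q[(s, a_)])
            s_next, r, done, _ = env.step(a)
            # off-policy target: greedy over next actions, independent of behavior
            best_next = 0.0 if done else max(Q[(s_next, a_)] for a_ in range(n_actions))
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s_next
    return Q
```
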
Integrating Planning with Learning

Model-based RL,
Integrated Architecture,
Simulation-based Search (Monte-Carlo Tree Search)

01-Sep-2023

Sutton-Barto[1] (Chapter 8)
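
The integrated (Dyna-style) architecture of Sutton-Barto[1], Chapter 8, interleaves real updates with simulated ones drawn from a learned model. A minimal sketch of the planning half, with `model` as a hypothetical dict mapping a (state, action) pair to the last observed (reward, next state):

```python
import random

def dyna_planning_updates(Q, model, n_actions, n_planning=10, alpha=0.1, gamma=0.99):
    """Replay n_planning simulated transitions from a simple deterministic model."""
    for _ in range(n_planning):
        s, a = random.choice(list(model.keys()))   # previously visited state-action pair
        r, s_next = model[(s, a)]                  # model's remembered outcome
        best_next = max(Q[(s_next, a_)] for a_ in range(n_actions))
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
```
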
Value Function Approximation

Incremental Methods (Linear and Gradient based),
Batch Methods (Least Square based)

08-Sep-2023
14-Sep-2023

Sutton-Barto[1] (Chapters 9,10,11,12)
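
A minimal sketch of incremental (semi-gradient) TD(0) with a linear approximator, where `phi` is an assumed user-supplied feature map from states to d-dimensional vectors and the environment follows the classic Gym-style interface:

```python
import numpy as np

def semi_gradient_td0(env, policy, phi, d, episodes=500, alpha=0.01, gamma=0.99):
    """Semi-gradient TD(0) for the linear value function v(s) ~ w . phi(s)."""
    w = np.zeros(d)
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            a = policy(s)
            s_next, r, done, _ = env.step(a)
            v_next = 0.0 if done else w @ phi(s_next)
            td_error = r + gamma * v_next - w @ phi(s)
            # 'semi'-gradient: differentiate only v(s), not the bootstrapped target
            w += alpha * td_error * phi(s)
            s = s_next
    return w
```
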
Deep RL

Deep Q-Networks (DQN) with Experience Replay[12,13],
Double DQN (DDQN)[14],
Prioritized Replay DDQN[15],
Dueling DQN[16],
Distributional DQN[17],
Noisy DQN[18]

29-Sep-2023

Dong et al.[6]
Plaat[8]
Hessel et al.[19]
Francois-Lavet et al.[20]
Li[21]
Vitay[22]
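
Two of the moving parts in this unit, sketched minimally and framework-free: a uniform replay buffer as in DQN[12,13], and Double-DQN[14] target construction, where `q_online` and `q_target` are assumed callables returning per-action value lists.

```python
import random
from collections import deque

class ReplayBuffer:
    """Uniform experience replay: store transitions, sample i.i.d. minibatches."""
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, s, a, r, s_next, done):
        self.buffer.append((s, a, r, s_next, done))

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)

def double_dqn_targets(batch, q_online, q_target, gamma=0.99):
    """Double-DQN targets: the online net picks the action, the target net evaluates it."""
    targets = []
    for s, a, r, s_next, done in batch:
        if done:
            targets.append(r)
        else:
            q_next = q_online(s_next)
            a_star = max(range(len(q_next)), key=lambda i: q_next[i])  # decoupled argmax
            targets.append(r + gamma * q_target(s_next)[a_star])
    return targets
```
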
Policy Gradient Methods

Finite-Difference Method,
Likelihood-Ratio Policy Gradient,
Vanilla (Monte-Carlo) Policy Gradient,
Actor-Critic Methods (A2C, A3C, GAE)[23,24,25,26],
Advantage Function and Compatible Function Approximation,
Natural Policy Gradient[27,28]

05-Oct-2023
06-Oct-2023

Sutton-Barto[1] (Chapter 13)
Agarwal et al.[9]
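
A minimal sketch of the likelihood-ratio (REINFORCE) update with a linear-softmax policy; `phi` and the Gym-style interface are assumptions of the sketch. The update raises log-probabilities of actions in proportion to the return-to-go that followed them.

```python
import numpy as np

def reinforce_episode(theta, phi, env, n_actions, alpha=0.01, gamma=0.99):
    """One REINFORCE update for pi(a|s) = softmax(theta @ phi(s)); theta: (n_actions, d)."""
    traj, s, done = [], env.reset(), False
    while not done:
        logits = theta @ phi(s)
        probs = np.exp(logits - logits.max())   # stable softmax
        probs /= probs.sum()
        a = np.random.choice(n_actions, p=probs)
        s_next, r, done, _ = env.step(a)
        traj.append((phi(s), a, r, probs))
        s = s_next
    G = 0.0
    for x, a, r, probs in reversed(traj):       # accumulate return-to-go backwards
        G = r + gamma * G
        grad_log = -np.outer(probs, x)          # d log pi / d theta: -pi(b|s) x for each row b
        grad_log[a] += x                        # ... plus x for the action actually taken
        theta = theta + alpha * G * grad_log
    return theta
```
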
Advanced Policy Gradients

Trust Region Policy Optimization (TRPO)[29],
Proximal Policy Optimization (PPO)[30],
Actor-Critic using Kronecker-Factored Trust Region (ACKTR)[31],
Deep Deterministic Policy Gradient (DDPG)[32],
Soft Actor-Critic (SAC)[33]

12-Oct-2023
13-Oct-2023

Dong et al.[6]
Plaat[8]
Kakade-Langford[34]
Achiam et al.[35]
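
The clipped surrogate that distinguishes PPO[30] from TRPO's explicit constraint, as a minimal NumPy sketch; here `ratio` is pi_new(a|s) / pi_old(a|s) evaluated on sampled state-action pairs, and `advantage` their estimated advantages.

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """PPO clipped surrogate: mean of min(r*A, clip(r, 1-eps, 1+eps)*A).

    Clipping removes any incentive to push the ratio outside [1-eps, 1+eps],
    giving a trust-region-like effect without TRPO's constrained optimization.
    """
    ratio = np.asarray(ratio, dtype=float)
    advantage = np.asarray(advantage, dtype=float)
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    return np.minimum(ratio * advantage, clipped * advantage).mean()
```
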
Integrating AI Search with Learning

Classical Games with Self-Play,
Combining Minimax Search and RL[36,37,38,39],
Monte-Carlo Tree Search

19-Oct-2023

Sutton-Barto[1] (Chapter 16)
Plaat[7]
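
The selection rule inside Monte-Carlo Tree Search is typically UCT; a minimal sketch, with each child summarized by its visit count and accumulated value (a simplifying assumption of the sketch):

```python
import math

def uct_select(children, c=1.41):
    """Pick the child index maximizing mean value plus an exploration bonus.

    children: list of (visit_count, total_value) pairs for one node.
    """
    N = sum(n for n, _ in children)    # parent visit count
    def score(n, w):
        if n == 0:
            return float("inf")        # visit unexplored children first
        return w / n + c * math.sqrt(math.log(N) / n)
    return max(range(len(children)), key=lambda i: score(*children[i]))
```
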
Exploration and Exploitation (Bandits)

Exploration Principles (Greedy, Optimistic, Probabilistic, Informative),
Multi-armed Bandits,
Contextual Bandits and Upper Confidence Bounds (UCB),
MDP Extensions

02-Nov-2023

Sutton-Barto[1] (Chapter 2)
Lattimore-Szepesvári[4]
Agarwal et al.[9]
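
A minimal UCB1 sketch for the multi-armed bandit setting, illustrating the optimism-in-the-face-of-uncertainty principle behind the UCB material above:

```python
import math

class UCB1:
    """UCB1: pull the arm maximizing empirical mean + sqrt(2 ln t / n_arm)."""
    def __init__(self, n_arms):
        self.counts = [0] * n_arms
        self.means = [0.0] * n_arms
        self.t = 0

    def select(self):
        self.t += 1
        for arm, n in enumerate(self.counts):
            if n == 0:
                return arm             # initialization: try each arm once
        return max(range(len(self.counts)),
                   key=lambda a: self.means[a]
                   + math.sqrt(2.0 * math.log(self.t) / self.counts[a]))

    def update(self, arm, reward):
        self.counts[arm] += 1
        # incremental running mean of observed rewards for this arm
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]
```
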
Hierarchical RL

Semi-Markov Decision Process,
Learning with Options,
Abstract Machines,
MAXQ Decomposition

03-Nov-2023
09-Nov-2023

Barto-Mahadevan[40]
Dietterich[41]
Multi-Agent RL

Cooperative vs. Competitive Settings,
Mixed Setting,
Game-Theoretic Formulation,
MARL Algorithms

10-Nov-2023

Zhang et al.[42]
Yang-Wang[43]
Inverse RL
(tentative, depending on available time)

Inferring Reward from Policy and Behavior

Ng-Russell[44]
Abbeel-Ng[45]
Arora-Doshi[46]
Imitation RL
(tentative, depending on available time)

Learning by Mimicking and Behavior Cloning

Ho-Ermon[47]
Price-Boutilier[48]
Le et al.[49]
Peng et. al.[50]
Conclusion

Summary, Open Problems and Path Ahead

10-Nov-2023

Sutton-Barto[1] (Chapters 14,15,16,17)
Kaelbling et. al.[10]

** For reference slides/materials, visit the following course pages:

  • Course by Dr. David Silver (DeepMind and UCL, UK)
  • Course by Dr. Abir Das (IIT Kharagpur, India)
  • Course by Dr. Emma Brunskill (Stanford, USA)


Term-Projects


Examinations

Marks Distribution:   30% Projects + 30% MidSem + 40% EndSem


Books and References

    1. Richard S. Sutton and Andrew G. Barto; Reinforcement Learning: An Introduction; 2nd Edition, MIT Press, 2018.   [ TEXTBOOK ]
    2. Dimitri P. Bertsekas; Dynamic Programming and Optimal Control (Vol. I and Vol. II); 4th Edition, Athena Scientific, 2017.
    3. Dimitri P. Bertsekas; Reinforcement Learning and Optimal Control; 1st Edition, Athena Scientific, 2019.
    4. Tor Lattimore and Csaba Szepesvári; Bandit Algorithms; 1st Edition, Cambridge University Press, 2020.   [Open-Access]

    5. Csaba Szepesvári; Algorithms of Reinforcement Learning; Synthesis Lectures on Artificial Intelligence and Machine Learning, vol. 4, no. 1, 2010.
    6. Hao Dong, Zihan Ding and Shanghang Zhang; Deep Reinforcement Learning: Fundamentals, Research and Applications; Springer, 2020.
    7. Aske Plaat; Learning to Play: Reinforcement Learning and Games; Springer, 2020.
    8. Aske Plaat; Deep Reinforcement Learning; Springer, 2022.
    9. Alekh Agarwal, Nan Jiang, Sham M. Kakade and Wen Sun; Reinforcement Learning: Theory and Algorithms; Working Draft, Jan. 31, 2022.
    10. Leslie Pack Kaelbling, Michael L. Littman and Andrew W. Moore; Reinforcement Learning: A Survey; Journal of Artificial Intelligence Research, vol. 4, pp. 237-285, 1996.

    11. Hado van Hasselt; Double Q-learning; Advances in Neural Information Processing Systems (NIPS), vol. 23, 2010.
    12. Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra and Martin Riedmiller; Playing Atari with Deep Reinforcement Learning; arXiv preprint, arXiv:1312.5602, 2013.
    13. Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg and Demis Hassabis; Human-Level Control through Deep Reinforcement Learning; Nature, vol. 518, pp. 529-533, 2015.
    14. Hado van Hasselt, Arthur Guez and David Silver; Deep Reinforcement Learning with Double Q-learning; In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI), pp. 2094–2100, 2016.
    15. Tom Schaul, John Quan, Ioannis Antonoglou and David Silver; Prioritized Experience Replay; arXiv preprint arXiv:1511.05952, 2016 (ICLR 2016 Poster).
    16. Ziyu Wang, Tom Schaul, Matteo Hessel, Hado Van Hasselt, Marc Lanctot and Nando De Freitas; Dueling Network Architectures for Deep Reinforcement Learning; Proceedings of Machine Learning Research (PMLR), vol. 48, pp. 1995-2003, 2016.
    17. Marc G. Bellemare, Will Dabney and Rémi Munos; A Distributional Perspective on Reinforcement Learning; Proceedings of Machine Learning Research (PMLR), vol. 70, pp. 449-458, 2017.
    18. Meire Fortunato, Mohammad Gheshlaghi Azar, Bilal Piot, Jacob Menick, Ian Osband, Alex Graves, Vlad Mnih, Remi Munos, Demis Hassabis, Olivier Pietquin, Charles Blundell and Shane Legg; Noisy Networks for Exploration; arXiv preprint arXiv:1706.10295, 2017 (ICLR 2018).
    19. Matteo Hessel, Joseph Modayil, Hado Van Hasselt, Tom Schaul, Georg Ostrovski, Will Dabney, Dan Horgan, Bilal Piot, Mohammad Azar and David Silver; Rainbow: Combining Improvements in Deep Reinforcement Learning; In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), vol. 32, no. 1, pp. 3215-3222, 2018.
    20. Vincent Francois-Lavet, Peter Henderson, Riashat Islam, Marc G. Bellemare and Joelle Pineau; An Introduction to Deep Reinforcement Learning; arXiv preprint arXiv:1811.12560, 2018.
    21. Yuxi Li; Deep Reinforcement Learning: An Overview; arXiv preprint arXiv:1701.07274, 2018.
    22. Julien Vitay; Deep Reinforcement Learning, 2020.
    23. David Silver, Guy Lever, Nicolas Heess, Thomas Degris, Daan Wierstra and Martin Riedmiller; Deterministic Policy Gradient Algorithms; In Proceedings of the 31st International Conference on Machine Learning (ICML), PMLR, vol. 32, no. 1, pp. 387-395, 2014.
    24. Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver and Koray Kavukcuoglu; Asynchronous Methods for Deep Reinforcement Learning; In Proceedings of the 33rd International Conference on Machine Learning (ICML), PMLR, vol. 48, pp. 1928-1937, 2016.
    25. John Schulman, Philipp Moritz, Sergey Levine, Michael I. Jordan and Pieter Abbeel; High-Dimensional Continuous Control Using Generalized Advantage Estimation; arXiv preprint arXiv:1506.02438, 2018 (ICLR 2016 Poster).
    26. Ziyu Wang, Victor Bapst, Nicolas Heess, Volodymyr Mnih, Remi Munos, Koray Kavukcuoglu and Nando de Freitas; Sample Efficient Actor-Critic with Experience Replay; arXiv preprint arXiv:1611.01224, 2017 (ICLR 2017).
    27. Jan Peters and Stefan Schaal; Natural Actor-Critic; Neurocomputing, vol. 71, no. 7–9, pp. 1180-1190, 2008.
    28. Shalabh Bhatnagar, Richard S. Sutton, Mohammad Ghavamzadeh and Mark Lee; Natural Actor-Critic Algorithms; Automatica, vol. 45, no. 11, pp. 2471-2482, 2009.
    29. John Schulman, Sergey Levine, Pieter Abbeel, Michael Jordan and Philipp Moritz; Trust Region Policy Optimization; In Proceedings of the 32nd International Conference on Machine Learning (ICML), PMLR, vol. 37, pp. 1889-1897, 2015.
    30. John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford and Oleg Klimov; Proximal Policy Optimization Algorithms; arXiv preprint arXiv:1707.06347, 2017.
    31. Yuhuai Wu, Elman Mansimov, Roger B. Grosse, Shun Liao and Jimmy Ba; Scalable Trust-Region Method for Deep Reinforcement Learning using Kronecker-Factored Approximation; Advances in Neural Information Processing Systems (NIPS), vol. 30, 2017.
    32. Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver and Daan Wierstra; Continuous Control with Deep Reinforcement Learning; arXiv preprint arXiv:1509.02971, 2019 (ICLR Poster 2016).
    33. Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel and Sergey Levine; Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor; arXiv preprint arXiv:1801.01290, 2018.
    34. Sham Kakade and John Langford; Approximately Optimal Approximate Reinforcement Learning; In Proceedings of the 19th International Conference on Machine Learning (ICML), pp. 267-274, 2002.
    35. Joshua Achiam, David Held, Aviv Tamar and Pieter Abbeel; Constrained Policy Optimization; In Proceedings of the 34th International Conference on Machine Learning (ICML), PMLR, vol. 70, pp. 22-31, 2017.
    36. David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel and Demis Hassabis; Mastering the Game of Go with Deep Neural Networks and Tree Search; Nature, vol. 529, pp. 484–489, 2016.
    37. David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, Timothy Lillicrap, Karen Simonyan and Demis Hassabis; A General Reinforcement Learning Algorithm that Masters Chess, Shogi, and Go through Self-Play; Science, vol. 362, no. 6419, pp. 1140-1144, 2018.
    38. Oriol Vinyals, Igor Babuschkin, Wojciech M. Czarnecki, Michaël Mathieu, Andrew Dudzik, Junyoung Chung, David H. Choi, Richard Powell, Timo Ewalds, Petko Georgiev, Junhyuk Oh, Dan Horgan, Manuel Kroiss, Ivo Danihelka, Aja Huang, Laurent Sifre, Trevor Cai, John P. Agapiou, Max Jaderberg, Alexander S. Vezhnevets, Rémi Leblond, Tobias Pohlen, Valentin Dalibard, David Budden, Yury Sulsky, James Molloy, Tom L. Paine, Caglar Gulcehre, Ziyu Wang, Tobias Pfaff, Yuhuai Wu, Roman Ring, Dani Yogatama, Dario Wünsch, Katrina McKinney, Oliver Smith, Tom Schaul, Timothy Lillicrap, Koray Kavukcuoglu, Demis Hassabis, Chris Apps and David Silver; Grandmaster Level in StarCraft II using Multi-Agent Reinforcement Learning; Nature, vol. 575, pp. 350–354, 2019.
    39. Julian Schrittwieser, Ioannis Antonoglou, Thomas Hubert, Karen Simonyan, Laurent Sifre, Simon Schmitt, Arthur Guez, Edward Lockhart, Demis Hassabis, Thore Graepel, Timothy Lillicrap and David Silver; Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model; Nature, vol. 588, pp. 604-609, 2020.
    40. Andrew G. Barto and Sridhar Mahadevan; Recent Advances in Hierarchical Reinforcement Learning; Discrete Event Dynamic Systems, vol. 13, pp. 341–379, 2003.
    41. Thomas G. Dietterich; Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition; Journal of Artificial Intelligence Research, vol. 13, pp. 227-303, 2000.
    42. Kaiqing Zhang, Zhuoran Yang and Tamer Başar; Multi-Agent Reinforcement Learning: A Selective Overview of Theories and Algorithms; arXiv preprint, 2021.
    43. Yaodong Yang and Jun Wang; An Overview of Multi-Agent Reinforcement Learning from Game Theoretical Perspective; arXiv preprint, 2021.
    44. Andrew Y. Ng and Stuart J. Russell; Algorithms for Inverse Reinforcement Learning; In Proceedings of the Seventeenth International Conference on Machine Learning (ICML), pp. 663-670, 2000.
    45. Pieter Abbeel and Andrew Y. Ng; Apprenticeship Learning via Inverse Reinforcement Learning; In Proceedings of the Twenty-first International Conference on Machine Learning (ICML), 2004.
    46. Saurabh Arora and Prashant Doshi; A Survey of Inverse Reinforcement Learning: Challenges, Methods and Progress; Artificial Intelligence, vol. 297, 2021.
    47. Jonathan Ho and Stefano Ermon; Generative Adversarial Imitation Learning; Advances in Neural Information Processing Systems (NIPS), vol. 29, 2016.
    48. Bob Price and Craig Boutilier; Accelerating Reinforcement Learning through Implicit Imitation; Journal of Artificial Intelligence Research (JAIR), vol. 19, pp. 569-629, 2003.
    49. Hoang Le, Nan Jiang, Alekh Agarwal, Miroslav Dudik, Yisong Yue and Hal Daumé III; Hierarchical Imitation and Reinforcement Learning; In Proceedings of the 35th International Conference on Machine Learning (ICML), PMLR, vol. 80, pp. 2917-2926, 2018.
    50. Xue Bin Peng, Pieter Abbeel, Sergey Levine and Michiel van de Panne; DeepMimic: Example-Guided Deep Reinforcement Learning of Physics-Based Character Skills; ACM Transactions on Graphics, vol. 37, no. 4, pp. 143:1–143:14, 2018.