43008 Reinforcement Learning

Warning: The information on this page is indicative. The subject outline for a particular session, location and mode of offering is the authoritative source of all information about the subject for that offering. Required texts, recommended texts and references in particular are likely to change. Students will be provided with a subject outline once they enrol in the subject.

UTS: Information Technology: Computer Science
Credit points: 6 cp
Result type: Grade and marks

Requisite(s): (31250 Introduction to Data Analytics OR 32130 Fundamentals of Data Analytics) AND (48024 Programming 2 OR 32555 Fundamentals of Software Development) AND (41040 Introduction to Artificial Intelligence OR 31005 Machine Learning OR 42172 Introduction to Artificial Intelligence)

Recommended studies:

Data Analytics, basics of statistics and probability, basics of machine learning, Python programming

Description

This subject introduces reinforcement learning (RL): in simple terms, how machines learn by interacting with their environment, something human beings do from birth. Reinforcement learning is a subfield of machine learning, but it is also a general-purpose formalism for automated decision-making and AI. It deals with building programs that learn how to predict and act in a stochastic environment, based on past experience or on interaction with the environment. Applications of reinforcement learning range from classical control problems, such as power-plant optimisation or dynamical system control, to game playing, inventory control, and many other fields. In this subject, students study the theoretical aspects and practical applications of reinforcement learning. State-of-the-art software tools are discussed and used to implement RL algorithms and train intelligent agents. Students learn how to formalise problems as Markov Decision Processes (MDPs) and study classic and modern reinforcement learning algorithms. Students learn to implement, train, and test their own RL agent. On completing the subject, students will understand value functions, basic exploration methods, and the trade-off between exploration and exploitation. Finally, in the final project, students explore how to deploy trained agents and build an AI system that solves a real-world problem.
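
As an illustration of what formalising a problem as an MDP and computing a value function can look like, the sketch below runs value iteration on a two-state MDP. It is not taken from the subject materials: the states, actions, rewards, transition probabilities, discount factor, and function names are all hypothetical and chosen only to keep the example small.

```python
# Hypothetical two-state MDP, invented purely for illustration.
# P[state][action] is a list of (probability, next_state, reward) tuples.
P = {
    "low": {
        "wait": [(1.0, "low", 0.0)],
        "charge": [(1.0, "high", -1.0)],
    },
    "high": {
        "wait": [(1.0, "high", 1.0)],
        "work": [(0.8, "high", 2.0), (0.2, "low", 2.0)],
    },
}
GAMMA = 0.9  # discount factor (assumed value)


def q_value(V, state, action):
    """Expected return of taking `action` in `state`, then following V."""
    return sum(p * (r + GAMMA * V[s2]) for p, s2, r in P[state][action])


def value_iteration(tol=1e-8):
    """Apply the Bellman optimality backup until the values stop changing."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            best = max(q_value(V, s, a) for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # Read off a greedy policy from the converged values.
    policy = {s: max(P[s], key=lambda a: q_value(V, s, a)) for s in P}
    return V, policy


V, policy = value_iteration()
print(V, policy)
```

Value iteration needs the full transition model P, which is why it belongs to dynamic programming rather than to the model-free methods covered later in the subject.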

Subject learning objectives (SLOs)

Upon successful completion of this subject students should be able to:

1.	Design Reinforcement Learning algorithms to solve classic reinforcement learning problems. (C.1)
2.	Apply state-of-the-art and customised Reinforcement Learning algorithms to solve real-world problems. (D.1)
3.	Communicate efficiently in a team, in both oral and written forms. (E.1)

Course intended learning outcomes (CILOs)

This subject also contributes specifically to the development of the following Course Intended Learning Outcomes (CILOs):

Design Oriented: FEIT graduates apply problem solving, design thinking and decision-making methodologies in new contexts or to novel problems, to explore, test, analyse and synthesise complex ideas, theories or concepts. (C.1)
Technically Proficient: FEIT graduates apply theoretical, conceptual, software and physical tools and advanced discipline knowledge to research, evaluate and predict future performance of systems characterised by complexity. (D.1)
Collaborative and Communicative: FEIT graduates work as an effective member or leader of diverse teams, communicating effectively and operating autonomously within cross-disciplinary and cross-cultural contexts in the workplace. (E.1)

Teaching and learning strategies

This subject has one 1.5-hour face-to-face lecture each week for 12 weeks. In each class, the lecturer or guest lecturer first introduces a designated RL topic, and students then discuss the topic in a Q&A session and work through a tutorial aligned with it. In the 2-hour tutorial session, students work on tasks that involve implementing state-of-the-art RL algorithms. Each week’s task addresses a component needed to complete the subject’s project.

An individual project is designed to help students practise experimental skills in implementing relevant algorithms, reviewing the literature, and presenting results. Together with reflections and a formative online quiz, the project requires students to explore the topics in breadth and depth to demonstrate a good understanding of their chosen RL topics.

Content (topics)

Introduction to Reinforcement Learning
Motivation & RL problems
Introduction to Markov Decision Processes
Value functions and Bellman Equations
Dynamic Programming
Monte Carlo Methods
Q-Learning
Introduction to Deep-RL
Deep Q-Network (DQN)
Policy Search Methods
Industry case studies and guest lectures
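
To give a sense of the Q-Learning topic listed above, the following is a minimal sketch of tabular Q-learning with epsilon-greedy exploration. It is not drawn from the subject materials: the five-state corridor environment, the hyperparameter values, and the function names are assumptions made for illustration only.

```python
import random

N_STATES = 5          # states 0..4; state 4 is the terminal goal (hypothetical)
ACTIONS = (-1, +1)    # move left or move right
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # assumed hyperparameters


def step(state, action):
    """Deterministic corridor: reward 1.0 only when the goal is reached."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def choose_action(Q, state, rng):
    """Epsilon-greedy: explore with probability EPSILON, otherwise exploit."""
    if rng.random() < EPSILON:
        return rng.choice(ACTIONS)
    best = max(Q[state].values())
    # Break ties at random so the untrained agent does not favour one action.
    return rng.choice([a for a in ACTIONS if Q[state][a] == best])


def train(episodes=500, seed=0):
    rng = random.Random(seed)
    Q = {s: {a: 0.0 for a in ACTIONS} for s in range(N_STATES)}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = choose_action(Q, state, rng)
            next_state, reward, done = step(state, action)
            # Q-learning target: bootstrap from the best next action.
            bootstrap = 0.0 if done else GAMMA * max(Q[next_state].values())
            Q[state][action] += ALPHA * (reward + bootstrap - Q[state][action])
            state = next_state
    return Q


Q = train()
greedy = {s: max(ACTIONS, key=lambda a: Q[s][a]) for s in range(N_STATES - 1)}
print(greedy)
```

Unlike dynamic programming, this update uses only sampled transitions and never reads a transition model. A Deep Q-Network replaces the table Q with a neural network, which is beyond this sketch.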

Assessment

Assessment task 1: Assignment 1

Intent:	To demonstrate a firm understanding of basic concepts such as Markov Decision Processes (MDPs) and dynamic programming, and to guide students in starting the project and assessment task 2.
Objective(s):	This assessment task addresses the following subject learning objectives (SLOs): 1 and 2. This assessment task contributes to the development of the following Course Intended Learning Outcomes (CILOs): C.1 and D.1.
Type:	Laboratory/practical
Groupwork:	Individual
Weight:	35%
Length:	No fixed length limit. As a guide, the report should be no more than about 10 pages.

Assessment task 2: Assignment 2

Intent:	To demonstrate a firm understanding of Q-learning, DQN, and policy search techniques.
Objective(s):	This assessment task addresses the following subject learning objectives (SLOs): 1 and 2. This assessment task contributes to the development of the following Course Intended Learning Outcomes (CILOs): C.1 and D.1.
Type:	Laboratory/practical
Groupwork:	Individual
Weight:	35%
Length:	No fixed length limit. As a guide, the report should be no more than about 10 pages.

Assessment task 3: Mini-Project

Intent:	To ensure that students have a clear understanding of the theoretical concepts, can implement and use them in a real-world problem context, and can independently or collaboratively design and implement solutions using Reinforcement Learning techniques.
Objective(s):	This assessment task addresses the following subject learning objectives (SLOs): 1, 2 and 3. This assessment task contributes to the development of the following Course Intended Learning Outcomes (CILOs): C.1, D.1 and E.1.
Type:	Project
Groupwork:	Group, individually assessed
Weight:	30%
Length:	No fixed length limit. As a guide, the report should be no more than 20 pages.

Minimum requirements

In order to pass the subject, a student must achieve an overall mark of 50% or more.

Required texts

Reinforcement Learning: An Introduction, Richard S. Sutton and Andrew G. Barto, 2nd Edition. An online version of the first edition is available at http://incompleteideas.net/book/first/ebook/the-book.html. Check the references on Canvas for more information.

Recommended texts

Reinforcement Learning: State-of-the-Art, Marco Wiering and Martijn van Otterlo, Eds.
David Silver's course on Reinforcement Learning (https://www.deepmind.com/learning-resources/introduction-to-reinforcement-learning-with-david-silver)

References

Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville.
Dive into Deep Learning (https://d2l.ai/index.html)