University of Technology Sydney

37373 Programming for Data Analysis

Warning: The information on this page is indicative. The subject outline for a particular session, location and mode of offering is the authoritative source of all information about the subject for that offering. Required texts, recommended texts and references in particular are likely to change. Students will be provided with a subject outline once they enrol in the subject.

Subject handbook information prior to 2024 is available in the Archives.

UTS: Science: Mathematical and Physical Sciences
Credit points: 6 cp
Result type: Grade and marks

Requisite(s): (37171 Introduction to Programming OR 41039 Programming 1) AND ((33230 Mathematics 2 OR 33290 Statistics and Mathematics for Science OR 37132 Introduction to Mathematical Analysis and Modelling) OR (33130 Mathematics 1 AND 33116 Design, Data, and Decisions))
Anti-requisite(s): 35383 Programming for Mathematical Modelling and Data Analysis

Description

The goal of this subject is to introduce students to data structures and programming techniques that can be used to collect, manipulate, model, and analyse a broad range of datasets, including datasets with missing values. Students use the Python programming language to learn how to work with numerical, string, and more complex data formats, and to perform basic mathematical modelling or statistical analyses based on the data. The subject places a strong emphasis on developing a clear understanding of the common features of data structures from diverse areas, which may include nonlinear dynamics, discrete optimisation, mathematical physics, statistics, computational biology, or stochastic processes. Students develop practical skills in problem solving by working on a real-world data analysis project, using publicly available datasets.
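As a rough illustration of what working with a dataset that has missing values can look like, the sketch below reads a small CSV table using only the Python standard library. The data, the column names and the helper `read_temperatures` are hypothetical and are not taken from the subject materials.

```python
import csv
import io
import statistics

# Hypothetical data: a tiny CSV table in which one reading is missing.
RAW = """city,temperature
Sydney,22.5
Melbourne,
Brisbane,26.1
"""


def read_temperatures(text):
    """Return (city, temperature) pairs, using None for missing values."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        value = record["temperature"].strip()
        rows.append((record["city"], float(value) if value else None))
    return rows


rows = read_temperatures(RAW)

# Analyse only the observed values, and keep track of what was missing.
observed = [t for _, t in rows if t is not None]
mean_temperature = statistics.mean(observed)
missing_cities = [city for city, t in rows if t is None]

print(f"Mean of {len(observed)} observed values: {mean_temperature:.2f}")
print("Cities with missing values:", missing_cities)
```

Dropping missing entries before averaging is only one possible choice; depending on the problem, imputing a value or flagging the record may be more appropriate.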

Subject learning objectives (SLOs)

Upon successful completion of this subject students should be able to:

1. Demonstrate basic Python coding skills for the management, manipulation, analysis, and visualisation of a broad variety of data formats.
2. Design a suitable programming workflow to analyse data from a variety of problems including ones not seen previously.
3. Use the Python standard library, find information about Python modules, and use that information to write code.
4. Write Python code from a simple algorithm.
5. Synthesise data analysis results and communicate the findings effectively.
6. Perform exploratory data analysis using Python.
7. Identify and use suitable online resources to improve their knowledge of the Python language.
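For objective 3, one possible way to look up information about a module from within Python is sketched below. The choice of `statistics.median` is purely illustrative; the same approach should work for most documented functions.

```python
import inspect
import statistics

# Look up the call signature and the first line of the docstring.
signature = inspect.signature(statistics.median)
summary = inspect.getdoc(statistics.median).splitlines()[0]
print(f"statistics.median{signature}")
print(summary)

# Use what was learned: the function takes a single collection of numbers.
middle = statistics.median([3, 1, 4, 1, 5])
print(middle)
```

In an interactive session or a Jupyter notebook, `help(statistics.median)` gives the same information in a more readable form.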

Course intended learning outcomes (CILOs)

This subject also contributes specifically to the development of the following course intended learning outcomes:

  • Demonstrate theoretical and technical knowledge of mathematical sciences including calculus, discrete mathematics, linear algebra, probability, statistics and quantitative management. (1.1)
  • Evaluate mathematical and statistical approaches to problem solving, analysis, application, and critical thinking to make mathematical arguments, and conduct experiments based on analytical, numerical, and statistical algorithms to solve new problems. (2.1)
  • Work autonomously or in teams to demonstrate professional and responsible analysis of real-life problems that require application of mathematics and statistics. (3.1)
  • Design creative solutions to contemporary mathematical sciences-related issues by incorporating innovative methods, reflective practices and self-directed learning. (4.1)
  • Use succinct and accurate presentation of reasoning and conclusions to communicate mathematical solutions, and their implications, to a variety of audiences, using a variety of approaches. (5.1)

Contribution to the development of graduate attributes

The Faculty of Science has determined that its courses will aim to develop the following attributes in students at the completion of their course of study. Each subject will contribute to the development of these attributes in ways appropriate to the subject and the stage of progression, thus not all attributes are expected to be addressed in all subjects.

This subject contributes to the development of the following graduate attributes:

1. Disciplinary knowledge
The computing work integrated in the subject, and assessed in assessments 1 and 2, develops the skills necessary to write Python code to analyse a broad range of data types, and demonstrates how to apply these skills to a variety of problems.

2. Research, inquiry, and critical thinking
The third assignment involves the research, processing and analysis of a dataset to investigate a specific topic. Students are encouraged to identify a problem that is of interest to them and find a relevant, publicly accessible dataset. They will then determine the most effective way to analyse the data in order to gain new insight into the problem.

3. Professional, ethical, and social responsibility
This subject helps students learn to manage their own work and to accept responsibility for their own learning. It also introduces students to the concept of reproducible research, which is becoming an essential aspect of any quantitative discipline. For the project component, students will need to manage their time and meet deadlines. Ethical understanding of the importance of privacy and licensing issues in relation to datasets is emphasised, and critical thinking is developed.

4. Reflection, innovation, creativity
The project assignment allows for the demonstration of creativity by requiring that the student identifies a problem of interest, locates the relevant data, and processes it to gain new insights (i.e., extract information not immediately obvious in the raw data) into the problem. The project also requires the student to reflect on their work and share their conclusions in the project notebook.

5. Communication
Presentation of written solutions to problems using appropriate professional language is emphasised by assessments 1, 2 and 3. In particular, one of the goals of the project assignment is to assess the student’s ability to take low-level information (raw dataset) and transform it, through careful analysis, into useful information that can be communicated effectively to non-experts. Oral communication skills are assessed in component 3 of the assessment.

Teaching and learning strategies

The emphasis of this subject is on real-world applications. The goal is for students to work on realistic datasets very quickly. The elegance of the language and the availability of sophisticated libraries are what make Python the ideal programming language for this task. Since programming skills are best acquired and refined by programming, all teaching activities in this subject will be conducted in a coding environment. This allows students to implement coding concepts as soon as they are introduced. Because coding skills are honed by coding, this subject requires a significant amount of personal work, and students are expected to actively engage with all the learning tasks.

We will be using the Jupyter Notebook throughout the subject. The notebook is an ideal platform for data analysis, and it also provides an environment within which the programming, analysis and reporting components of the project assessment can be blended together. This will encourage the student to reflect on their work and on how to best communicate their findings to a broad audience, including both experts and non-experts.

All the notebooks for the subject will be available on Canvas. Students are expected to consult and/or download the relevant materials as they become available. They are also expected to keep up-to-date with all the announcements posted on Canvas.

Content (topics)

Introduction to Python programming and the Jupyter notebook for data analysis; basic syntax, data types and control structures; functions, input and output operations, arrays and array operations, dataframes and dataframe operations, and data visualisation using Python.

Assessment

Assessment task 1: Quizzes

Intent:

This assessment item addresses the following graduate attributes:

1. Disciplinary Knowledge

2. Research, inquiry and critical thinking

3. Professional, ethical and social responsibility

Objective(s):

This assessment task addresses subject learning objective(s):

1, 2, 3, 4 and 6

This assessment task contributes to the development of course intended learning outcome(s):

1.1, 2.1 and 3.1

Type: Quiz/test
Groupwork: Individual
Weight: 40%
Criteria:

Use of appropriate programming techniques. Correctness of the results.

There are four quizzes, and students are expected to complete these before the due date and time.

Other Information: Late submissions for this task will not be accepted. Any submitted answers that are not your original work (including answers produced with generative AI tools) will be awarded a mark of zero.

Assessment task 2: Class Test

Intent:

This assessment item addresses the following graduate attributes:

1. Disciplinary Knowledge

2. Research, inquiry and critical thinking

3. Professional, ethical and social responsibility

Objective(s):

This assessment task addresses subject learning objective(s):

1, 2, 3, 4 and 6

This assessment task contributes to the development of course intended learning outcome(s):

1.1, 2.1 and 3.1

Type: Quiz/test
Groupwork: Individual
Weight: 30%
Criteria:

Use of appropriate programming techniques. Correctness of the results.

Other Information: Late submissions for this task will not be accepted or marked. Any student who is not in the class during this test will be awarded a mark of zero for this assessment task. Any submitted answers that are not your original work (including answers produced with generative AI tools) will be awarded a mark of zero.

Assessment task 3: Project

Intent:

This assessment item addresses the following graduate attributes:

1. Disciplinary Knowledge

2. Research, inquiry and critical thinking

3. Professional, ethical and social responsibility

4. Reflection, Innovation, Creativity

5. Communication

Objective(s):

This assessment task addresses subject learning objective(s):

1, 2, 3, 4, 5, 6 and 7

This assessment task contributes to the development of course intended learning outcome(s):

1.1, 2.1, 3.1, 4.1 and 5.1

Type: Project
Groupwork: Group, group assessed
Weight: 30%
Criteria:

* Code must parse without error (otherwise a zero mark will be given for the project component).
* Cogently written report with correct spelling and grammar.
* Originality of the problem tackled and appropriateness of the dataset(s).
* Python code demonstrating an adequate command of the language and performing the processing of the data in an efficient manner.
* Analysis of the results using appropriate mathematical or statistical approaches. Sound conclusions are drawn from the data and the results are communicated effectively using appropriate summaries and/or visualisations.
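As a hedged illustration of the first criterion, the sketch below shows one way a student might check that a piece of Python source parses before submitting. The helper `parses_without_error` is hypothetical, and how parsing is actually checked during marking is not specified here.

```python
import ast


def parses_without_error(source):
    """Return True if the source text is syntactically valid Python."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


valid = parses_without_error("total = sum([1, 2, 3])")
invalid = parses_without_error("total = sum([1, 2, 3)")  # mismatched bracket
print(valid, invalid)
```

Parsing successfully only shows that the syntax is valid; it does not guarantee that the code runs to completion or produces correct results. For a notebook, restarting the kernel and running all cells from top to bottom is a more thorough check.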

Minimum requirements

Students must score an overall mark of at least 50% in order to pass the subject.

Required texts

Wes McKinney, “Python for Data Analysis. Agile Tools for Real World Data”, 2nd edition, O'Reilly Media (2017).

Note that the eBook version is available from the UTS library website.

Charles Severance, “Python for Everybody”.

Thanks to the author’s generosity an electronic version of this book is available for free at

https://www.py4e.com/book.php

See the website for details.

Jake VanderPlas, “Python Data Science Handbook”.

Thanks to the author’s generosity an electronic version of this book is available for free at

https://jakevdp.github.io/PythonDataScienceHandbook/

See the website for details.

Other resources

DataCamp (https://www.datacamp.com/) has graciously provided access to their online learning platform for students enrolled in this subject at the start of semester.