36103 Statistical Thinking for Data Science
Warning: The information on this page is indicative. The subject outline for a particular session, location and mode of offering is the authoritative source of all information about the subject for that offering. Required texts, recommended texts and references in particular are likely to change. Students will be provided with a subject outline once they enrol in the subject.
Subject handbook information prior to 2024 is available in the Archives.
Credit points: 8 cp
Postgraduate
Result type: Grade, no marks
There are course requisites for this subject. See access conditions.
Statistical thinking is the foundational mindset in data science, emphasizing the use of statistical principles and methods to understand, analyze, and derive meaningful insights from data. It serves as the core of data science. This subject equips students with essential skills and concepts for applying statistical thinking in the context of applied data science. Initially, students are introduced to fundamental statistical principles, developing a simultaneous understanding of modern methods for statistical inference, and gaining valuable hands-on experience with real-world data. Subsequently, they delve into a range of statistical models and estimation techniques, applying their acquired knowledge to engage in a complete data science research cycle. Collaborating in teams, students learn how to formulate research inquiries, employ formal statistics and real-world datasets to address them, and effectively communicate their findings through both oral presentations and written reports.
The progression of this subject starts with more teaching-intensive methods such as workshops and lectures to give students the technical and conceptual know-how to work as practicing data scientists. As the subject progresses, students increasingly move towards an individually driven learning mode, allowing both teams and individuals the flexibility to enhance their statistical thinking and skills.
Upon completion of the subject, students possess a robust foundation in technical, conceptual, and practical aspects, empowering them to continue their development as Data Scientists.
Subject learning objectives (SLOs)
Upon successful completion of this subject students should be able to:
1. Manage the complexity of real data science projects and their inevitable compromises
2. Formulate authentic data science questions precise enough to be answered by valid statistical techniques
3. Justify the use of different statistical concepts and tools to audiences from a wide range of backgrounds
4. Find, clean, and merge datasets from a range of sources to answer real world data science problems
5. Apply statistical methods that are appropriate to a dataset and stakeholder requirements
6. Interpret the results of a statistical analysis correctly, visualizing and reporting upon them in ways that create value for, and are sensitive to the needs of, a wide range of stakeholders
7. Collaborate with and contribute to the professional community of data scientists, both local and global
Course intended learning outcomes (CILOs)
This subject also contributes specifically to the development of the following course outcomes:
- Exploring and testing models and describing behaviours of complex systems
Explore and test models and generalisations for describing the behaviour of sociotechnical systems and selecting data sources, taking into account the needs and values of different contexts and stakeholders (1.2)
- Making the invisible visible
Use transdisciplinary approaches to seeing and doing to uncover underrepresented, or misrepresented, elements of a system (1.4)
- Exploring, interpreting and visualising data
Explore, analyse, manipulate, interpret and visualise data using data science techniques, software and technologies to make sense of data rich environments (2.2)
- Designing and managing data investigations
Apply and assess data science concepts, theories, practices and tools for designing and managing data discovery investigations in professional environments that draw upon diverse data sources, including efforts to shed light on underrepresented components (2.4)
- Developing strategies for innovation
Explore, interrogate, generate, apply, test and evaluate problem-solving strategies to extract economic, business, social, strategic or other value from data (3.1)
- Working together
Develop a collaborative and team-oriented mindset to harness value for stakeholders to produce innovative solutions to challenges (3.3)
- Engaging audiences
Explore and craft interpretative narratives that engage key audiences with data analytics and potential significance for action, at a societal, industrial, organisational, group or individual levels (4.2)
- Informing decision making
Develop, test, justify and deliver data project propositions, methodologies, analytics outcomes and recommendations for informing decision-making, both to specialist and non-specialist audiences (4.3)
Contribution to the development of graduate attributes
Your experiences as a student in this subject support you to develop the following graduate attributes (GA):
GA 1 Sociotechnical systems thinking
GA 2 Creative, analytical and rigorous sense making
GA 3 Create value in problem solving and inquiry
GA 4 Persuasive and robust communication
Teaching and learning strategies
Authentic problem based learning: This subject relies heavily upon the principle that students learn best by doing. It offers a range of authentic data science problems to solve that help to develop students’ statistical thinking about complex problems. Students work on real world data analysis problems using datasets that they create using modern data harvesting techniques. These are used to answer realistic data science questions in broad areas of topical interest. This exposes them to the true ambiguities, constraints, and complexities of working as a data scientist for a variety of different stakeholders.
Blend of online and face to face activities: This subject is offered through a series of block sessions blending online with face-to-face learning. Students interact face-to-face with each other and the teaching team in three intensive modules that require the completion of both preparation and after class activities. They concurrently use a range of complementary online resources to develop their statistical thinking according to identified weaknesses in their background knowledge. They are expected to engage in online discussion and to actively participate in other blended activities.
Collaborative work: We place a strong emphasis on group activities and collaboration in diverse teams. As a data science professional you need to approach professional projects and challenges by working with people from different backgrounds, expectations, and expertise. This course simulates that environment by requiring students to work with a team of peers who come from many different backgrounds. Group assessments help students to develop effective strategies for working as a part of a data science team, as well as an appreciation that there are diverse perspectives on many different topics in data science and innovation.
Self-paced evaluation and improvement: This subject takes students from an exceptionally wide range of backgrounds, some of whom are better versed in statistical methods and Python than others. We help all students to self-diagnose their strengths and weaknesses, and to work to improve in the areas that they identify as a priority for the professional niche that they would like to occupy as a practicing data scientist. Students choose their own path through a wide variety of curated resources as needed.
Embedding English Language: An aim of this subject is to help you develop academic and professional language and communication skills in order to succeed at university and in the workplace. To determine your current academic language proficiency, you are required to complete an online language screening task, OPELA (information available at https://www.edu.au/research-and-teaching/learning-and-teaching/enhancing/language-and-learning/about-opela-students). If you receive a Basic grade for OPELA, you must attend additional Language Development Tutorials (each week from week [3/4] to week [11/12]) in order to pass the subject. These tutorials are designed to support you to develop your language and communication skills. Students who do not complete the OPELA and/or do not attend 80% of the Language Development Tutorials will receive a Fail X grade.
Module 1: Understanding the Data and Statistics
- Introduction to Statistics
- Exploratory data analysis
- Data visualisation
Module 2: Parametric and Non–Parametric Models
- Multivariate Linear Regression
- Logistic Regression
- Generalized Linear Models
- Model Selection
- Industry Data Science Practice
- Conceptualising and executing a data science project
Module 3: Estimation and Optimisation Methods
- Maximum likelihood estimation
- Bayesian method
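As a rough illustration of the two Module 3 topics, the sketch below estimates the success probability of a coin-flip style (Bernoulli) variable in two ways: by maximum likelihood and by a simple Bayesian update. It is not taken from the subject materials. The data, the function names, and the uniform Beta(1, 1) prior are all assumptions chosen for illustration.

```python
from math import log

def bernoulli_mle(data):
    """Closed-form maximum likelihood estimate of p: the sample mean."""
    return sum(data) / len(data)

def log_likelihood(p, data):
    """Bernoulli log-likelihood of the data for a candidate value of p."""
    successes = sum(data)
    failures = len(data) - successes
    return successes * log(p) + failures * log(1 - p)

def beta_posterior_mean(data, a=1.0, b=1.0):
    """Posterior mean of p under an assumed Beta(a, b) prior."""
    successes = sum(data)
    return (a + successes) / (a + b + len(data))

# Hypothetical outcomes (1 = success, 0 = failure); not real subject data.
data = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]

# Maximum likelihood, closed form.
p_hat = bernoulli_mle(data)

# Maximum likelihood again, by searching a grid of candidate values.
grid = [i / 100 for i in range(1, 100)]
p_grid = max(grid, key=lambda p: log_likelihood(p, data))

# Bayesian estimate under the assumed uniform Beta(1, 1) prior.
post_mean = beta_posterior_mean(data)

print(p_hat, p_grid, post_mean)
```

The grid search and the closed form agree on the maximum likelihood estimate, while the Bayesian posterior mean is pulled slightly towards the prior mean of 0.5. That pull shrinks as more data is observed.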
Assessment task 1: Exploration of data skills and issues
Subject learning objectives addressed: 3 and 5
Length: A maximum of 7 pages
Assessment task 2: Data analysis project
Subject learning objectives addressed: 1, 2, 3, 4, 5, 6 and 7
Groupwork: Group, group assessed
Length:
Group Presentation: 10-15 minutes
Group Report: 500-700 words
Assessment task 3: Individual project exploration
Subject learning objectives addressed: 2, 3 and 6
Length: 700 to 1000 words, submitted via Canvas.
To pass, students must attain a minimum of 50% of the marks.
Additionally, it is a requirement of this subject that all students complete OPELA. Students who receive a Basic grade in the OPELA are required to attend 80% of the Language Development Tutorials in order to pass the subject. Students who do not complete the OPELA and/or do not attend 80% of the Language Development Tutorials will receive a Fail X grade.
Other learning resources:
Depending on your background and what you are planning to learn, you will find at least one of these resources useful. You are not expected to read all of them cover-to-cover. Use them to help you solve specific problems.
- To learn statistical concepts: James, G., Witten, D., Hastie, T. and Tibshirani, R. (2021). An Introduction to Statistical Learning with Applications in R (Second Edition). New York: Springer. Available at statlearning.com.
- Bruce, P., & Bruce, A. (2017). Practical Statistics for Data Scientists: 50 Essential Concepts. O'Reilly Media, Inc. We will refer to it as PSDS in this subject.
- To learn linear regression modelling: Caffo, B. Regression Models for Data Science in R. Leanpub. A free copy is available at leanpub.com/regmods/read. It is written as a companion book to the Coursera Regression Models class, and also has a series of YouTube videos accompanying it. We will refer to it as RM throughout this subject.
- To run a good data science project: Godsey, B. (2017). Think Like a Data Scientist: Tackle the data science process step-by-step. Manning Publications Co.
Additional general and module-specific resources will be available on Canvas.