University of Technology Sydney

36106 Machine Learning Algorithms and Applications

Warning: The information on this page is indicative. The subject outline for a particular session, location and mode of offering is the authoritative source of all information about the subject for that offering. Required texts, recommended texts and references in particular are likely to change. Students will be provided with a subject outline once they enrol in the subject.

Subject handbook information prior to 2021 is available in the Archives.

UTS: Analytics and Data Science: Transdisciplinary Innovation
Credit points: 8 cp

Subject level: Postgraduate

Result type: Grade, no marks

There are course requisites for this subject. See access conditions.
Anti-requisite(s): 36113 Applied Data Science for Innovation AND 36114 Advanced Data Science for Innovation

Requisite elaboration/waiver:

Any student wishing to enrol in first- and second-year subjects concurrently must apply for a waiver.

Description

This subject introduces students to key machine learning algorithms and their application in real-world settings. Participants are guided in developing an intuitive understanding of how the algorithms work, as well as their strengths and weaknesses. In addition to gaining practical experience with the algorithms, students develop an understanding of the basic principles of machine learning and the connections between different algorithms. Additionally, they are exposed to industry standard methodologies for data mining and analytics via readings and assessments. Since data science problems are infused with assumptions, often with ethical and legal implications, due attention is given to questioning the assumptions behind data and approaches used to analyse it.

Subject learning objectives (SLOs)

Upon successful completion of this subject students should be able to:

1. Apply an industry standard analytics life cycle methodology for data mining and pattern discovery
2. Interpret, synthesise and communicate insights extracted from machine learning algorithms in a context-appropriate manner
3. Articulate the strengths, weaknesses and assumptions of a selection of machine learning algorithms in relation to structured and unstructured data
4. Execute and interpret machine learning approaches available for extracting value from data
5. Demonstrate an appreciation, with examples, of a critical, ethical perspective on decisions made throughout the analytics lifecycle

Course intended learning outcomes (CILOs)

This subject also contributes specifically to the development of the following course outcomes:

  • Understanding relationships & processes within systems
    Identify and represent the human and technical elements and processes within complex systems and organise them within frameworks of relationships (1.1)
  • Exploring and testing models and describing behaviours of complex systems
    Explore and test models and generalisations for describing the behaviour of sociotechnical systems and select data sources, taking into account the needs and values of different contexts and stakeholders (1.2)
  • Exploring, interpreting and visualising data
    Explore, analyse, manipulate, interpret and visualise data using data science techniques, software and technologies to make sense of data rich environments (2.2)
  • Designing and managing data investigations
    Apply and assess data science concepts, theories, practices and tools for designing and managing data discovery investigations in professional environments that draw upon diverse data sources, including efforts to shed light on underrepresented components (2.4)
  • Examining and articulating data value
    Critically examine the perceived value of data analytics outcomes and clearly articulate implications for different stakeholders and organisations (3.2)
  • Working together
    Develop a collaborative and team-oriented mindset to harness value for stakeholders to produce innovative solutions to challenges (3.3)
  • Developing communication skills
    Collaborate to develop and refine multimodal communication skills needed to successfully work in data science teams (4.1)
  • Engaging audiences
    Explore and craft interpretative narratives that engage key audiences with data analytics and its potential significance for action at societal, industrial, organisational, group or individual levels (4.2)
  • Becoming a reflective data practitioner
    Engage in active, reflective practice that supports flexible navigation of assumptions, alternatives and uncertainty in professional data science contexts (5.1)
  • Embracing ethical responsibilities
    Interrogate and justify ethical responsibilities related to data selection, access, analysis and governance to create a framework for practice (5.2)

Contribution to the development of graduate attributes

Your experiences as a student in this subject support you to develop the following graduate attributes (GA):

GA 1 Sociotechnical systems thinking
GA 2 Creative, analytical and rigorous sense making
GA 3 Create value in problem solving and inquiry
GA 4 Persuasive and robust communication
GA 5 Ethical citizenship

Teaching and learning strategies

Blend of online and face-to-face activities: This subject is offered through a series of block sessions and blends online with face-to-face learning. Students participate in interactive learning experiences in timetabled on-campus sessions, where they make use of the subject materials that they have already engaged with online. In between campus sessions, students will engage in individual and collaborative online activities designed to support the understanding of the machine learning algorithms and their application in real-world settings.

Collaborative work: A strong emphasis is placed on group activities and interaction, given that graduates of this course will need to approach professional projects and challenges from a collaborative and consensus position. Insights obtained and developed within the groups are then reworked by individual students to develop the final summative assessment activity. Group assessments and activities enable students to leverage peer-learning and demonstrate effective skills associated with the topics covered in this subject.

Transdisciplinary approaches: Starting from an elemental perspective on data and data science, students will approach learning from their specific professional and potential future contexts. As the subject progresses, the students will be able to combine their analytical and technical skills in developing and applying various machine-learning algorithms, as well as to consider standards and ethical implications of their work.

Assessment

Assessment task 1: Regression and classification models

Intent:

Gain hands-on experience of building regression and classification models using realistic datasets.

Objective(s):

1, 2, 3, 4 and 5

Type: Report
Groupwork: Individual
Weight: 30%
Length:

Deliverables for Part A and B:

  1. All R code used to generate the model.
  2. A report articulating an understanding of the problem, the identification and breakdown of tasks relating to the solution process (as per CRISP-DM) with appropriate visualisations, as well as the technical choices made and the reasons for them. In addition to a detailed discussion of the results, the report should also contain a listing of the key assumptions and their implications.

Length: Each of the two reports should be no more than 1000 words.

Criteria:

Both parts of the assignment will be assessed by the following criteria (see assessment brief for details)

  1. Quality of data exploration (visual + summary stats)
  2. Strength of justification for features selected and model used
  3. Quality of code and accuracy of results
  4. Appropriateness of the CRISP-DM framework usage
  5. Depth of discussion of ethics/privacy issues + mitigation

Assessment task 2: Building and interpreting a classification model

Intent:

This assignment is focused on classification modelling in detail.

Objective(s):

1, 2, 3, 4 and 5

Type: Report
Groupwork: Group, group and individually assessed
Weight: 40%
Length:

Deliverables for PART A

  1. All R code used to generate the model.
  2. A CSV file with two columns: row ID and predicted probabilities. This file should be uploaded to Kaggle.
  3. A comprehensive report articulating an understanding of the problem, the identification and breakdown of tasks relating to the solution process (as per CRISP-DM) as well as the technical choices made and the reasons for them. In addition to a detailed discussion of the results, the report should also contain a listing of the key assumptions and their implications. The appendix should list individual contributions to the projects. Length: 1500 words (excl. appendices)

Deliverables for PART B

  1. A 10-slide management presentation with the individual student’s interpretation of the group’s recommendations.

Criteria:

Assignments will be assessed by the following criteria (see assessment brief for details)

Part A

  1. Soundness of justification for selected technique
  2. Quality of code and visualisations
  3. Accuracy of results and evidence supporting claims
  4. Breadth of evidence of collaborative work (e.g. meeting minutes, details of contributions etc)

Part B

  1. Clarity of problem statement, appropriateness of approach taken and accuracy of results obtained
  2. Criticality and specificity in evaluating assumptions and potential ethical issues
  3. Appropriateness of communication style to audience

Assessment task 3: Analysis and interpretation of unstructured data

Intent:

The intent of this assessment is to help students gain an understanding of the basics of text analysis and the concepts underlying it. To this end, students will a) prepare, process and analyse a corpus of text documents, interpret results and provide insights and b) write a reflective piece on the use of text analytical methods in the workplace.

Objective(s):

1, 2, 3, 4 and 5

Type: Report
Groupwork: Individual
Weight: 30%
Length:

Deliverables for Part A:

A written report describing:

  1. The approach used, assumptions and supporting rationale for each stage of the CRISP-DM framework

  2. Results and recommendations, including supporting visualisations and summary data. Students should evaluate the results of different techniques, giving reasons for their final approach.

  3. An appendix including working code (this should be submitted as a separate .R file).

Length: 1500 words.

Deliverables for Part B

Blog post on CIC Around.

Length: 1000 words

Criteria:

Assignments will be assessed by the following criteria (see assessment brief for details)

Part A

  1. Appropriateness of techniques used
  2. Quality of code
  3. Quality of presentation and visualisations, findings and recommendations, and resolution of relevant ambiguous issues

Part B

  1. Depth of reflection
  2. Standard of presentation

Minimum requirements

Students must participate in all online and face-to-face requirements, as well as complete assessment tasks.

Required texts

James, G., Witten, D., Hastie, T. and Tibshirani, R. (2015), An introduction to statistical learning with applications in R, New York, NY: Springer-Verlag. PDF available at: http://www-bcf.usc.edu/~gareth/ISL/

Recommended texts

1. Lantz, B. (2015), Machine Learning with R (Second Edition), Birmingham: Packt Publishing. (R-focused machine learning text with less formal math than the prescribed textbook)


2. Hastie, T., Tibshirani, R. and Friedman, J. (2009), The Elements of Statistical Learning – Data Mining, Inference, and Prediction (Second Edition), New York, NY: Springer-Verlag. (More detailed and mathematically oriented than the textbook)


4. Kuhn, M. and Johnson, K. (2013), Applied Predictive Modeling, New York, NY: Springer. (A practitioner-focused introductory text, low on math. Uses the authors’ caret package)


5. Abu-Mostafa, Y., Magdon-Ismail, M. and Lin, H-T, Learning From Data: A Short Course, Palo Alto: AML Books. (Solid introduction to machine learning from a theoretical perspective, excellent resource…but assumes a decent mathematical background in linear algebra)


6. Foreman, J. (2014), Data Smart: Using Data Science to Transform Information into Insight, New York, NY: Wiley. (Implementing machine learning algorithms in Excel)

7. O’Neil, C. (2017), Weapons of Math Destruction, New York, NY: Crown Books. (Highly readable book on the effects of machine learning algorithms on society at large)

8. Wickham, H. and Grolemund, G. (2016), R for Data Science, CA: O'Reilly Media. Available online at: http://r4ds.had.co.nz/ (The authoritative text on the tidyverse)

Note: some of the above books have been held on reserve for 36106 students at the UTS Library. Please enquire at the library front desk for details.

References

Boyd, D., & Crawford, K. (2012). Critical questions for big data: Provocations for a cultural, technological, and scholarly phenomenon. Information, communication & society, 15(5), 662-679.

Breiman, L. (2001). Statistical modeling: The two cultures (with comments and a rejoinder by the author). Statistical science, 16(3), 199-231.


Wu, X., Kumar, V., Quinlan, J. R., Ghosh, J., Yang, Q., Motoda, H., ... & Zhou, Z. H. (2008). Top 10 algorithms in data mining. Knowledge and information systems, 14(1), 1-37.


Kirkpatrick, K. (2016). Battling algorithmic bias: how do we ensure algorithms treat us fairly? Communications of the ACM, 59(10), 16-17.


Goldman, E. (2005). Search engine bias and the demise of search engine utopianism. Yale Journal of Law & Tech., 8, 188.


Zarsky, T. (2016). The trouble with algorithmic decisions: An analytic road map to examine efficiency and fairness in automated and opaque decision making. Science, Technology, & Human Values, 41(1), 118-132.


Burrell, J. (2016). How the machine ‘thinks’: Understanding opacity in machine learning algorithms. Big Data & Society, 3(1), 1-12.

Note: Many of the above papers are available to UTS students via https://drr.lib.uts.edu.au/search.html?q=36106

Other resources

1. Please ensure that you check Canvas regularly for announcements and additional material (if any).

2. Students may want to check out various Sydney-based data science meetups. Here are some of the better known ones:

https://www.meetup.com/R-Users-Sydney/

https://www.meetup.com/Data-Science-Sydney/

https://www.meetup.com/Deep-Learning-Sydney/