University of Technology Sydney

36118 Applied Natural Language Processing

Warning: The information on this page is indicative. The subject outline for a particular session, location and mode of offering is the authoritative source of all information about the subject for that offering. Required texts, recommended texts and references in particular are likely to change. Students will be provided with a subject outline once they enrol in the subject.

Subject handbook information prior to 2024 is available in the Archives.

UTS: Transdisciplinary Innovation
Credit points: 8 cp
Result type: Grade, no marks

Requisite(s): 36100 Data Science for Innovation AND 36103 Statistical Thinking for Data Science AND 36106 Machine Learning Algorithms and Applications

Description

Analysis of naturally occurring human language offers significant potential for understanding our world. However, these forms of data are a challenge in data science as they are often more complex to collect, process, and interpret than numerical data. In this subject, students learn to work with textual data using automated methods of Natural Language Processing (NLP) and text mining. They develop technical and communicative skills in applying NLP to messy, unstructured data sets. The subject covers the core concepts of NLP and major techniques to extract insights and discover patterns from natural language text. It provides students with a working knowledge of practical NLP applications and an understanding of their ethical usage across contexts.

Subject learning objectives (SLOs)

Upon successful completion of this subject students should be able to:

1. Understand core concepts of Natural Language Processing (NLP) and computational linguistics, including their limitations (CILOs 2.2, 2.3)
2. Evaluate complex challenges for problem solving and sense making using textual data (CILOs 2.3, 4.2)
3. Apply text mining techniques on unstructured data sets using advanced NLP programming packages (CILOs 1.2, 2.2)
4. Interpret, extract value and effectively communicate insights from text analysis and create real-world applications suitable to a range of audiences (CILOs 2.4, 3.2, 4.2)
5. Articulate the strengths, weaknesses and underlying assumptions of NLP and text analysis to apply ethical practices (CILOs 5.1, 5.2)

Contribution to the development of graduate attributes

1.2 Explore and test models and generalisations for describing the behaviour of sociotechnical systems and selecting data sources, taking into account the needs and values of different contexts and stakeholders

2.2 Explore, analyse, manipulate, interpret and visualise data using data science techniques, software and technologies to make sense of data rich environments

2.3 Understand and deal critically and openly with the uncertainty, ambiguity and complexity associated with people, systems and data

2.4 Apply and assess data science concepts, theories, practices and tools for designing and managing data discovery investigations in professional environments that draw upon diverse data sources, including efforts to shed light on underrepresented components

3.2 Critically examine the perceived value of data analytics outcomes and clearly articulate implications for different stakeholders and organisations

4.2 Explore and craft interpretative narratives that engage key audiences with data analytics and potential significance for action, at societal, industrial, organisational, group or individual levels

5.1 Engage in active, reflective practice that supports flexible navigation of assumptions, alternatives and uncertainty in professional data science contexts

5.2 Interrogate and justify ethical responsibilities related to data selection, access, analysis and governance to create a framework for practice

Graduate attributes

GA 1 Sociotechnical systems thinking

GA 2 Creative, analytical and rigorous sense making

GA 3 Create value in problem solving and inquiry

GA 4 Persuasive and robust communication

GA 5 Ethical citizenship

Teaching and learning strategies

Blend of online and face-to-face activities: The subject is offered through a series of teaching sessions which blend online and face-to-face learning. Students learn through interactive lectures and classroom activities making use of the subject materials on Canvas. They also engage in individual and collaborative learning activities to understand and apply text analysis techniques in diverse settings.

Authentic problem-based learning: This subject offers a range of authentic data science problems that help develop students’ text analysis skills. Students work on real-world data analysis problems across broad areas of interest using unstructured data and contemporary techniques.

Collaborative work: Group activities will enable students to leverage peer learning and demonstrate effective team participation, as well as learn to work in professional teams with an appreciation of diverse perspectives on data science and innovation.

Future-oriented strategies: Students will be exposed to contemporary learning models using speculative thinking, ethical and human-centred approaches as well as reflection. Electronic portfolios will be used to curate, consolidate and provide evidence of learning and development of course outcomes, graduate attributes and professional evolution. Formative feedback will be offered with all assessment activities for successful engagement.

Content (topics)

• Introduction to unstructured data and natural language text
• Foundations of Natural Language Processing (NLP)
• Text analysis techniques using Python
• Advanced NLP and Deep Learning
• Natural Language Understanding (NLU) and Natural Language Generation (NLG)
• Real-world applications of NLP
• Ethical best practices in NLP
• NLP for social good

Content is indicative of the main topics covered, subject to minor changes.

Assessment

Assessment task 1: Evaluation of NLP capabilities and applications

Intent:

· Part A: Critical summary on the ethical use of NLP (Individual, 20%)

· Part B: NLP snippet for data analysis (Working Python code + Markdown report) (Individual, 30%)

Type: Report
Groupwork: Individual
Weight: 50%

Assessment task 2: End-to-end NLP project

Intent:

· Part A: Project summary for peer feedback (Group, 10%)

(Template provided for 1-page summary)

· Part B: Design and development of an NLP application (Group & Individual, 40%)

Group project report, plus individual contribution and reflection

A detailed assessment brief will be made available on Canvas once the assignment tasks are released during in-class sessions.

Type: Report
Groupwork: Group, group and individually assessed
Weight: 50%

Minimum requirements

1. Students must participate in all online and face-to-face requirements
2. Students must pass all assessment tasks