University of Technology Sydney

94692 Data Science Practice

Warning: The information on this page is indicative. The subject outline for a particular session, location and mode of offering is the authoritative source of all information about the subject for that offering. Required texts, recommended texts and references in particular are likely to change. Students will be provided with a subject outline once they enrol in the subject.

Subject handbook information prior to 2022 is available in the Archives.

UTS: Transdisciplinary Innovation
Credit points: 8 cp
Result type: Grade, no marks

Description

The subject covers the following topics in detail:

  • introduction to programming concepts

  • practical introduction to Python and R programming

  • collaboration using version control with Git

  • introduction to data stores and the SQL query language

  • working with UNIX systems and Docker containers

Subject learning objectives (SLOs)

Upon successful completion of this subject students should be able to:

1. Participate in the development of data related projects using popular programming languages including Python, SQL and R.
2. Articulate the strengths, weaknesses and use-cases of common code control workflows, and demonstrate ability to work collaboratively using these tools.
3. Interact with databases and query data sources using SQL.
4. Work confidently in a Unix environment, including the use of Docker and Bash via a Command Line Interface.
5. Understand basic programming concepts.

Course intended learning outcomes (CILOs)

This subject contributes specifically to the development of the following course intended learning outcomes:

  • Identify and represent the human and technical elements and processes within complex systems and organise them within frameworks of relationships (1.1)
  • Explore and test models and generalisations for describing the behaviour of sociotechnical systems and selecting data sources, taking into account the needs and values of different contexts and stakeholders (1.2)
  • Explore, analyse, manipulate, interpret and visualise data using data science techniques, software and technologies to make sense of data rich environments (2.2)
  • Understand and deal critically and openly with the uncertainty, ambiguity and complexity associated with people, systems and data (2.3)
  • Apply and assess data science concepts, theories, practices and tools for designing and managing data discovery investigations in professional environments that draw upon diverse data sources, including efforts to shed light on underrepresented components (2.4)
  • Develop a collaborative and team-oriented mindset to harness value for stakeholders to produce innovative solutions to challenges (3.3)
  • Collaborate to develop and refine multimodal communication skills needed to successfully work in data science teams (4.1)
  • Engage in active, reflective practice that supports flexible navigation of assumptions, alternatives and uncertainty in professional data science contexts (5.1)

Contribution to the development of graduate attributes

The subject gives students a practical introduction to Data Science practices that are commonly used in industry. Popular technologies and practices are covered and students are given opportunities to apply them in realistic settings. Students will come away with an understanding of how to work effectively in teams, an appreciation of how to “get things done” in corporate environments, and a familiarity with some of the common risks when deploying data science projects and the controls that can be used to minimize those risks.

The subject addresses the following graduate attributes (GA):

GA 1 Sociotechnical systems thinking

GA 2 Creative, analytical and rigorous sense making

GA 4 Persuasive and robust communication

GA 5 Ethical citizenship and leadership

Teaching and learning strategies

This subject is conducted through weekly online sessions, with weekly activities and readings assigned between classes. The classes are a mix of lecture components and collaborative “lab” sessions, in which students work on data science projects as a team. Each session runs for 3 hours on Thursdays, as set in the timetable.

The lab components involve two types of activities:

  • ‘code together’ sessions in which the instructor and students build understanding through collaboratively coding solutions to problems or implementing theoretical concepts.

  • Practical coding tasks for students to complete themselves or in small groups.

Assignments are a mix of practical coding exercises, report writing (for a business audience), and solution design and implementation tasks. Through these, students gain deep exposure to historical and current industry trends and challenges while developing tangible skills to implement these technologies in a work context.

Due to the rapidly advancing nature of this field, it is important for students to develop skills in quickly absorbing, dissecting and understanding new technologies and their value to business problems. The assignments in this subject are designed to help students develop these critical skills.

Assessment

Assessment task 1: Building Currency Converter in Python

Intent:

Develop a Python program that will perform currency conversion using data fetched from an open-source API.

Objective(s):

This task addresses the following subject learning objectives:

1 and 5

This assessment task contributes to the development of course intended learning outcome(s):

1.2, 2.3, 2.4 and 4.1

Type: Project
Groupwork: Individual
Weight: 30%
Criteria:
  1. Quality and reliability of Python code

  2. Readability and consistency of coding style

  3. Level of clarity of documentation for pseudocode and code

  4. Comprehensibility and relevance of unit tests
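
As an illustration only, the kind of program described in the intent of Assessment task 1 could be sketched as below. This is a minimal sketch under stated assumptions, not the expected solution: the response body `SAMPLE_RESPONSE`, its JSON layout, and the helper names `parse_rates` and `convert` are all hypothetical, and the rates are made-up numbers. A real API will define its own URL and schema, and the network call is left out so the sketch runs offline.

```python
import json
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical response body with made-up rates; a real API has its own schema.
SAMPLE_RESPONSE = '{"base": "AUD", "rates": {"USD": 0.65, "EUR": 0.6}}'


def parse_rates(body: str) -> dict[str, Decimal]:
    """Turn the (assumed) JSON payload into a mapping of currency code to rate."""
    payload = json.loads(body)
    # Go through str() so the Decimal matches the printed value of the float.
    return {code: Decimal(str(rate)) for code, rate in payload["rates"].items()}


def convert(amount: Decimal, rate: Decimal) -> Decimal:
    """Convert an amount at the given rate, rounded to two decimal places."""
    if amount < 0:
        raise ValueError("amount must be non-negative")
    return (amount * rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


rates = parse_rates(SAMPLE_RESPONSE)
print(convert(Decimal("100"), rates["USD"]))
```

Keeping the parsing and the arithmetic in separate small functions is one way to make each piece easy to cover with unit tests, which the criteria above ask for.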

Assessment task 2: Analysing Company Performance with SQL

Intent:

Load data into a database and perform data analysis on the performance of a company using SQL.

Objective(s):

This task addresses the following subject learning objectives:

3 and 5

This assessment task contributes to the development of course intended learning outcome(s):

2.2, 2.3, 2.4 and 4.1

Type: Report
Groupwork: Individual
Weight: 30%
Criteria:

  1. Efficiency and conciseness of SQL queries

  2. Readability and consistency of coding style

  3. Insightfulness and level of clarity of written explanations for business questions

  4. Level of clarity and quality of visualizations and written report
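
As an illustration only, the load-then-query pattern described in the intent of Assessment task 2 could be sketched as below, using Python's built-in `sqlite3` module with an in-memory database. This is a sketch under stated assumptions: the table name `quarterly_results`, its columns, and every row in it are hypothetical, made-up values chosen for the example, and the subject may use a different database system.

```python
import sqlite3

# In-memory database so the sketch leaves no files behind.
conn = sqlite3.connect(":memory:")

# Hypothetical schema for illustration.
conn.execute(
    "CREATE TABLE quarterly_results ("
    "quarter TEXT PRIMARY KEY, revenue REAL, cost REAL)"
)

# Made-up rows; real data would be loaded from the supplied dataset.
rows = [
    ("2023-Q1", 120.0, 90.0),
    ("2023-Q2", 150.0, 100.0),
    ("2023-Q3", 130.0, 110.0),
]
conn.executemany("INSERT INTO quarterly_results VALUES (?, ?, ?)", rows)

# One business question: which quarter was the most profitable?
query = """
SELECT quarter, revenue - cost AS profit
FROM quarterly_results
ORDER BY profit DESC
"""
profit_by_quarter = conn.execute(query).fetchall()

for quarter, profit in profit_by_quarter:
    print(quarter, profit)
```

Parameterised inserts (`?` placeholders) are used rather than string formatting, which keeps the loading step safe regardless of what the data contains.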

Assessment task 3: Collaborative Development of Data Explorer Web App

Intent:

Collaborate as a team to develop a containerised web application in Python and analyse the content of a dataset.

Objective(s):

This task addresses the following subject learning objectives:

1, 2, 4 and 5

This assessment task contributes to the development of course intended learning outcome(s):

1.1, 1.2, 2.4, 3.3, 4.1 and 5.1

Type: Report
Groupwork: Group, group assessed
Weight: 40%
Criteria:
  1. Quality and reliability of Python code and Unix commands

  2. Readability and consistency of coding style

  3. Level of clarity and relevance of documentation, flowcharts and instructions for installing and running the web application

  4. Robustness of the web application

  5. Comprehensiveness of repository structure and level of clarity of documentation of Git workflows (branch management, code review and pull request)

  6. Level of clarity of explanation of the web application and data extraction design

  7. Level of clarity and quality of the analysis and visualizations displayed by the web application, and of the written report highlighting individual and team efforts and problems faced

Minimum requirements

Students must participate in all online requirements, as well as complete assessment tasks.

References

https://www.atlassian.com/git/tutorials/learn-git-with-bitbucket-cloud

D. Sculley et al., 2014. Machine Learning: The High Interest Credit Card of Technical Debt. SE4ML: Software Engineering for Machine Learning (NIPS 2014 Workshop).

https://seankross.com/the-unix-workbench/

https://earlconf.com/archive/_downloads/london_speakers/EARL2018_-_London_-_Leanne_Fitzpatrick.pdf

https://jeroen.github.io/uros2018/#1

https://fivebooks.com/best-books/computer-science-data-science-hadley-wickham/

http://ropenscilabs.github.io/r-docker-tutorial/