University of Technology Sydney

94692 Data Science Practice

Warning: The information on this page is indicative. The subject outline for a particular session, location and mode of offering is the authoritative source of all information about the subject for that offering. Required texts, recommended texts and references in particular are likely to change. Students will be provided with a subject outline once they enrol in the subject.

Subject handbook information prior to 2021 is available in the Archives.

UTS: Transdisciplinary Innovation
Credit points: 8 cp
Result type: Grade, no marks

Description

The subject covers the following topics in detail:

  • introduction to assorted data stores and access techniques, such as SQL
  • introduction to programming concepts
  • a practical introduction to R and Python for data science
  • working with UNIX systems
  • collaboration using version control with GitHub
  • building complete machine learning systems

Subject learning objectives (SLOs)

Upon successful completion of this subject students should be able to:

1. Participate in the development of data science projects using popular programming languages including R, Python and SQL.
2. Articulate the strengths, weaknesses and use-cases of common code control workflows, and demonstrate the ability to work collaboratively using these tools.
3. Interact with and query data from modern data warehousing and data lake technologies.
4. Work confidently in a Unix environment, including the use of Bash via a Command Line Interface.
5. Understand basic programming concepts.

Course intended learning outcomes (CILOs)

This subject contributes specifically to the development of the following course intended learning outcomes:

  • Explore and test models and generalisations for describing the behaviour of sociotechnical systems and selecting data sources, taking into account the needs and values of different contexts and stakeholders (1.2)
  • Critique contemporary trends and theoretical frameworks in data science for relevance to one's own practice (2.1)
  • Explore, analyse, manipulate, interpret and visualise data using data science techniques, software and technologies to make sense of data rich environments (2.2)
  • Apply and assess data science concepts, theories, practices and tools for designing and managing data discovery investigations in professional environments that draw upon diverse data sources, including efforts to shed light on underrepresented components (2.4)
  • Critically examine the perceived value of data analytics outcomes and clearly articulate implications for different stakeholders and organisations (3.2)
  • Develop a collaborative and team-oriented mindset to harness value for stakeholders to produce innovative solutions to challenges (3.3)
  • Collaborate to develop and refine multimodal communication skills needed to successfully work in data science teams (4.1)
  • Engage in active, reflective practice that supports flexible navigation of assumptions, alternatives and uncertainty in professional data science contexts (5.1)

Contribution to the development of graduate attributes

The subject gives students a practical introduction to data science practices that are commonly used in industry. Popular technologies and practices are covered and students are given opportunities to apply them in realistic settings. Students will come away with an understanding of how to work effectively in teams, an appreciation of how to “get things done” in corporate environments, and a familiarity with some of the common risks when deploying data science projects and the controls that can be used to minimize those risks.

The subject addresses the following graduate attributes (GA):

GA 1 Sociotechnical systems thinking

GA 2 Creative, analytical and rigorous sense making

GA 4 Persuasive and robust communication

GA 5 Ethical citizenship and leadership

Teaching and learning strategies

This subject is conducted in weekly face-to-face sessions, with activities and readings assigned between classes. The classes are a mix of lecture components and collaborative “lab” sessions, in which students work on data science projects as a team. Each session runs for 3 hours on Wednesdays, as set in the timetable.

The lab components involve two types of activities:

  • ‘Code together’ sessions, in which the instructor and students build understanding by collaboratively coding solutions to problems or implementing theoretical concepts.
  • Practical coding tasks for students to complete by themselves or in small groups.

Assignments are a mix of practical coding exercises, report writing (for a business audience), and solution design and implementation tasks. Through these, students get deep exposure to historical and current industry trends and challenges, while developing tangible skills to implement these technologies in a work context.

Due to the rapidly advancing nature of this field, it is important for students to develop skills in quickly absorbing, dissecting and understanding new technologies and their value to business problems. The assignments in this subject are designed to help students develop these critical new skills.

Assessment

Assessment task 1: Designing Data Stores and accessing this data via SQL

Intent:

Be able to model your own data for analysis, and be able to extract and analyse that data using SQL, R and Python.

Objective(s):

This task addresses the following subject learning objectives:

1 and 3

This assessment task contributes to the development of course intended learning outcome(s):

1.2, 2.2, 2.4 and 3.2

Type: Report
Groupwork: Individual
Weight: 30%
Criteria:

1. Quality and clarity of the written justification for why certain data stores were chosen.
2. Clarity of the database design for speed and accuracy.
3. Justification of the optimization employed on the SQL queries used to answer the analytical questions.
4. Readability and consistency of style of SQL queries, R code and Python code.
5. Efficiency and conciseness in the use of SQL, R and Python to perform and interpret analysis on the dataset.
6. Insightfulness and clarity of written explanations of the differences between SQL, R and Python.
7. Creation of a clear cheatsheet comparing SQL, R and Python.

Assessment task 2: Programming using R and Python Analysis

Type: Report
Groupwork: Individual
Weight: 30%

Assessment task 3: Collaborative Development of an end-to-end project using Centralised Code Repositories (PART A) and GitHub usage analysis and reflection on the project (PART B)

Intent:

To tie together all the pieces taught in this subject: conduct an end-to-end analysis and collaborate as a team using GitHub, then analyse the GitHub usage of the entire group and reflect on your own project in terms of what worked and what didn’t.

Objective(s):

This task addresses the following subject learning objectives:

1, 2, 3 and 5

This assessment task contributes to the development of course intended learning outcome(s):

2.1, 2.2, 2.4, 3.3, 4.1 and 5.1

Type: Report
Groupwork: Group, individually assessed
Weight: 40%
Criteria:

Group

1. Research on effective data stores and appropriate design of the data warehouse.

2. Use of SQL/R/Python to perform basic analysis.

3. Clarity of the explanation of why a certain programming language was chosen.

4. Appropriateness of commits and branches to collaborate within a team using Git, adhering to one of the documented workflows.

5. Clarity and efficiency of content review and change negotiation using Pull Requests, and successful incorporation of individual changes into the team’s master branch.

6. Presentation and clear communication of your findings as a report.

Individual

1. Clarity in highlighting the individual’s and the team’s efforts in GitHub usage.

2. Articulation of what worked and what didn’t work during the course of the project.

Minimum requirements

Students must participate in all online and face-to-face requirements, as well as complete all assessment tasks.

References

https://www.atlassian.com/git/tutorials/learn-git-with-bitbucket-cloud

D. Sculley et al., 2014. Machine Learning: The High Interest Credit Card of Technical Debt. SE4ML: Software Engineering for Machine Learning (NIPS 2014 Workshop).

https://seankross.com/the-unix-workbench/

https://earlconf.com/archive/_downloads/london_speakers/EARL2018_-_London_-_Leanne_Fitzpatrick.pdf

https://jeroen.github.io/uros2018/#1

https://docs.google.com/presentation/d/1n2RlMdmv1p25Xy5thJUhkKGvjtV-dkAIsUXP-AL4ffI/mobilepresent?slide=id.g362da58057_0_1

https://boards.greenhouse.io/uptake/jobs/159426

https://fivebooks.com/best-books/computer-science-data-science-hadley-wickham/

http://ropenscilabs.github.io/r-docker-tutorial/?utm_content=buffer66d72&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer

https://www.featuretools.com/wp-content/uploads/2018/03/ml20.pdf