University of Technology Sydney

31250 Introduction to Data Analytics

Warning: The information on this page is indicative. The subject outline for a particular session, location and mode of offering is the authoritative source of all information about the subject for that offering. Required texts, recommended texts and references in particular are likely to change. Students will be provided with a subject outline once they enrol in the subject.

Subject handbook information prior to 2024 is available in the Archives.

UTS: Information Technology: Computer Science
Credit points: 6 cp

Subject level: Undergraduate

Result type: Grade and marks

Anti-requisite(s): 31040 Data Mining and Knowledge Discovery AND 32130 Fundamentals of Data Analytics

Description

Data analytics is the art and science of turning large quantities of usually incomprehensible data into meaningful and commercially valuable information. It is the basis of modern computer analytics and intelligence. It includes a number of IT areas, such as statistical methods for identifying patterns in data and making inferences; database technologies for managing the data sets to be mined; a range of intelligent technologies that automatically derive patterns from data; and visualisation and other multimedia techniques that support human pattern discovery capabilities. This subject offers the foundations of data analytics, data mining and knowledge discovery methods and their application to practical problems. It brings together state-of-the-art research and practical techniques in data analytics, providing students with the necessary knowledge and capacity to initiate and conduct data mining research and development projects, and to communicate professionally with analytics experts.

Subject learning objectives (SLOs)

Upon successful completion of this subject students should be able to:

1. Explain the background of data analytics including the business and society context. (B.1)
2. Use data analytics to explore and gain a broad understanding of a dataset. (D.1)
3. Outline the scope and limitations of several state-of-the-art data analytics methods. (D.1)
4. Use data analytics methods to make predictions for a dataset. (D.1)
5. Organise and implement a data analytics project in a business environment. (C.1)
6. Communicate the results of a data analytics project. (E.1)

Course intended learning outcomes (CILOs)

This subject also contributes specifically to the development of the following Course Intended Learning Outcomes (CILOs):

  • Socially Responsible: FEIT graduates identify, engage, interpret and analyse stakeholder needs and cultural perspectives, establish priorities and goals, and identify constraints, uncertainties and risks (social, ethical, cultural, legislative, environmental, economics etc.) to define the system requirements. (B.1)
  • Design Oriented: FEIT graduates apply problem solving, design and decision-making methodologies to develop components, systems and processes to meet specified requirements. (C.1)
  • Technically Proficient: FEIT graduates apply abstraction, mathematics and discipline fundamentals, software, tools and techniques to evaluate, implement and operate systems. (D.1)
  • Collaborative and Communicative: FEIT graduates work as an effective member or leader of diverse teams, communicating effectively and operating within cross-disciplinary and cross-cultural contexts in the workplace. (E.1)

Contribution to the development of graduate attributes

Engineers Australia Stage 1 Competencies

This subject contributes to the development of the following Engineers Australia Stage 1 Competencies:

  • 1.1. Comprehensive, theory based understanding of the underpinning natural and physical sciences and the engineering fundamentals applicable to the engineering discipline.
  • 1.5. Knowledge of engineering design practice and contextual factors impacting the engineering discipline.
  • 2.1. Application of established engineering methods to complex engineering problem solving.
  • 2.2. Fluent application of engineering techniques, tools and resources.
  • 3.2. Effective oral and written communication in professional and lay domains.

Teaching and learning strategies

Subject presentation includes combined workshop and laboratory sessions (3 hours) and research and development work for the assignments. Students will need to undertake preparation using material on Canvas to make effective use of their class time. Online lectures will present the theoretical aspects of data mining. Guest lectures about case studies of real-world business applications of data mining techniques will be face-to-face. The laboratory sessions focus on hands-on experience with data mining and data analytics tools, and on understanding and interpreting the results. Practical assignments can be performed anywhere. The labs will provide the tools necessary to complete these assignments. Preparation will help students to participate in the in-class individual and group exercises. Regular quizzes throughout the semester will allow students to gauge their progress.

Content (topics)

The subject will cover topics from the following:

  1. Introduction to data mining: problems; data mining concepts, types of data that we collect, the data mining and knowledge discovery process (CRISP-DM methodology, SAS SEMMA methodology), differences between data mining and knowledge discovery, what can be discovered; the concepts of 'interestingness', 'usefulness' and 'novelty' of discovered patterns; overview of application areas, the data mining professional.
  2. Visual data exploration and mining: data visualisation techniques and their applicability in data mining, visual data mining methods.
  3. Data pre-processing and transformation: problems; small and large data sets; missing data and dealing with it; noisy data and sampling; techniques for data cleaning; techniques for removing sensitive information, legal issues.
  4. Classification and Prediction: problems for classification and prediction; classification by decision tree induction; classification by support vector machine; ensemble methods and random forest; classification accuracy; issues in prediction; applications in medical diagnosis, credit approval, target marketing, DNA microarray analysis.
  5. Clustering: problems for cluster analysis; types of data; partitioning methods, hierarchical methods; density-based methods; k-means and related methods.
  6. Deployment of results: representing patterns as rules, functions, cases; model deployment; industry applications.

Assessment

Assessment task 1: Quizzes

Objective(s):

This assessment task addresses the following subject learning objectives (SLOs):

1 and 2

This assessment task contributes to the development of the following Course Intended Learning Outcomes (CILOs):

B.1 and D.1

Type: Quiz/test
Groupwork: Individual
Weight: 30%
Length: About 15 mins.

Assessment task 2: Data Exploration and Preparation

Objective(s):

This assessment task addresses the following subject learning objectives (SLOs):

2 and 3

This assessment task contributes to the development of the following Course Intended Learning Outcomes (CILOs):

D.1

Type: Report
Groupwork: Individual
Weight: 35%
Length: A report of about 20 pages in an 11 or 12 point font.

Assessment task 3: Data mining in action

Objective(s):

This assessment task addresses the following subject learning objectives (SLOs):

3, 4, 5 and 6

This assessment task contributes to the development of the following Course Intended Learning Outcomes (CILOs):

C.1, D.1 and E.1

Type: Report
Groupwork: Individual
Weight: 35%
Length: A report of about 15 pages in an 11 or 12 point font.

Minimum requirements

In order to pass the subject, a student must achieve an overall mark of 50% or more.

Recommended texts

  1. Pang-Ning Tan, Michael Steinbach and Vipin Kumar (2006). Introduction to Data Mining, Addison-Wesley.
  2. Graham Williams (2011). Data Mining with Rattle and R, Springer. This is a nice simple introduction to data mining using the R statistical language and Rattle, a package that sits on top of it.
  3. Margaret H. Dunham (2002). Data Mining: Introductory and Advanced Topics, Prentice Hall. The book offers the undergraduate Computing and IT student an introduction to the full spectrum of data mining concepts and algorithms in a comprehensive and consistent manner. The depth of coverage of each topic or method is exactly right and appropriate. Each algorithm is presented in pseudocode sufficient for any interested student to convert it into a working implementation.
  4. Han, J. and Kamber, M. (2001). Data Mining: Concepts and Techniques, Morgan Kaufmann. The book comes from an experienced database professional and also provides an introduction to the data mining concepts and techniques, but from a database perspective. The book provides details about data warehousing and OLAP techniques, examines algorithms, data structures, data types, and complexity of algorithms.
  5. Pyle, D. (1999). Data preparation for data mining, San Francisco, Calif.: Morgan Kaufmann Publishers. A key book on data pre-processing.
  6. Hand, D. J., Mannila, H. and Smyth, P. (2001). Principles of Data Mining, Bradford Books, MIT Press. This text provides a more engineering-oriented approach to the subject.
  7. Witten, I. H. and Frank, E. (2000). Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, Morgan Kaufmann, CA. The book offers a light, broad view of data mining and complements the WEKA toolkit used in the class.
  8. Westphal, C. and Blaxton, T. (1998). Data Mining Solutions: Methods and Tools for solving real world problems. John Wiley and Sons, NY. An excellent light text with lots of tools discussed (but at the level of 1997-98 developments).
  9. Michael Friendly (2001). Visualizing Categorical Data, SAS Press. The issues in this book relate to visualisation, visual data mining and the representation of data mining results.

References

Krzysztof J. Cios (ed.) (2000), IEEE Engineering in Medicine and Biology Magazine, Special Issue on Data Mining and Knowledge Discovery in Medical Data. This special issue presents the latest developments in the application of data mining methods for discovering medical knowledge.

Michael J. A. Berry, Gordon Linoff (2000). Mastering data mining: the art and science of customer relationship management, New York, Chichester: Wiley Computer Publishing. This book is devoted to one of the hottest specialized applications of data mining – customer relationship management.

Kovalerchuk B. and Vityaev E. (2000), Data Mining in Finance: Advances in Relational and Hybrid Methods, Kluwer Academic. This book focuses on financial data mining (requires a good mathematical background).

Other resources

Subject announcements, the topic discussion boards for the subject and other communication tools will be in UTS Canvas: http://canvas.uts.edu.au