The course teaches students comprehensive and specialised subjects in data science; it develops sophisticated skills in statistics, mathematical modelling, and coding in support of such analyses. It further grounds students in the disciplinary history and methodology of data science, preparing them either for further study or for work as practitioners in the field. The program prominently features a major capstone project, which requires students to identify a real-world problem that would benefit from a data-driven approach; to collect and prepare the data needed to address the problem; and to build visualisations in support of their arguments. The combination of rigorous mathematical training with practical approaches enables learners to keep developing their skills independently after graduation, making them lifelong learners of data science methods.
Most industry analysis starts with exploratory data analysis, and a thorough study of it helps learners to perform data health checks and provide initial business insights. The module helps learners to understand and compute descriptive statistics and to present data using appropriate graphs and diagrams, and it serves as a foundation for advanced analytics. The module also introduces the basics of programming in R and Python, the languages most commonly used for data science. It culminates in data management practices, which are essential for both exploratory data analysis and advanced analytics. In particular, the module focuses on SQL as a highly practical language for data preprocessing, addresses ways to connect SQL with R and Python tools, and teaches the skills required to prepare data for machine learning and efficient data modelling.
This module provides learners with an in-depth understanding of statistical distributions and hypothesis testing, taught through a practical, applied approach.
The statistical distributions covered include the Binomial, Poisson, Normal, Log-normal, Exponential, t, F and Chi-square distributions. Parametric and non-parametric tests used in research problems are also covered in this unit.
The module will help learners to formulate research hypotheses, select appropriate hypothesis tests, write programs (primarily in R) to perform hypothesis testing, and draw inferences from the output generated. After taking this course, students will understand the broad directions of statistical inference and be able to use this understanding to make informed choices when analysing data.
This module provides a strong foundation for predictive modelling. Its objective is to define the entire modelling process with the help of real-life case studies.
Many concepts are common to different predictive modelling methods, so these shared concepts are covered in detail in this module.
Students will learn how to carry out exploratory data analysis to gain insights and prepare data for predictive modelling, an essential skill valued in many industries.
The module also builds on material covered in the module Exploratory Data Analysis, adding hands-on applications of summarising and visualising datasets through plots in order to present results in compelling and meaningful ways.
The ability to render large data sets intelligible, especially by visual means and to potentially non-expert audiences, is a core part of data science. Building on the introduction provided in Exploratory Data Analysis and Data Management, Data Visualisation grounds students in the theory and practice of modern data visualisation, drawing on expertise from graphic design, cognitive psychology, user experience, and related fields.
By the end of Data Visualisation, students will have developed strategies for making both subtle details and large patterns visible, and for telling visual stories with data.
This module builds on the concepts introduced in the module Fundamentals of Predictive Modelling.
In this module, learners are introduced to model development for categorical dependent variables. Binary dependent variables are encountered in many domains, such as risk management, marketing and clinical research, and this unit covers the model-building process for binary dependent variables in detail. A primary goal of the module is for students to be able to select and successfully apply appropriate advanced regression models in applied settings.
The module culminates with multinomial models and ordinal-scaled variables.
Machine learning algorithms are a newer generation of algorithms used in conjunction with classical predictive modelling methods.
In this Machine Learning 2 module, learners will understand applications of decision tree, random forest and neural network algorithms to classification and regression problems. Additionally, students will develop practical machine learning and data science skills, including the theoretical basics of a broad range of machine learning concepts and methods, with practical applications to sample datasets.
Machine learning algorithms are a newer generation of algorithms used in conjunction with classical predictive modelling methods.
In this Machine Learning 1 module, learners will understand applications of the Support Vector Machine, K-Nearest Neighbours and Naive Bayes algorithms to classification and regression problems. Additionally, students will develop practical machine learning and data science skills, including the theoretical basics of a broad range of machine learning concepts and methods, with practical applications to sample datasets.
In this module, students will look at analysing unstructured data such as that found in social media, newspaper articles, videos and more.
Specifically, students will look at techniques for text mining and natural language processing, using R and Python code to produce graphical representations of unstructured data and to carry out sentiment analysis.
This module focuses on learning key concepts, tools and methodologies for natural language processing and emphasises hands-on learning through guided tutorials and real-world examples.
Power BI and Excel are fundamental parts of the data analytics toolkit. A strong understanding of these tools also provides a basis for more advanced data analytics with other techniques and technologies. In this unit, learners will gain experience in collecting, processing, analysing, and communicating with data using Excel. In addition, data visualisation is a powerful way to communicate meaning in data and support business decision-making. This unit will cover the main commercial tools used in data visualisation, such as Tableau and Power BI, enabling learners to create a wide range of graphs, charts, and dashboards and to use them appropriately in context.
This advanced graduate class addresses a unique topic on a rotating basis in order to keep the program at the forefront of scholarly research and industry practice. Every year the academic staff member will approve a new topic to be covered. The bibliography will contain no fewer than eight peer-reviewed articles or scholarly publications reflecting the current topic.
Current Topic
Data Mining and Social Media
Thirty years ago, people used to say “on the internet, no one knows you’re a dog.” Using the analytic and inferential tools of social media data mining, however, we are now able to learn a great deal about the individuals who participate online, how they participate, and the different ways that the networks they’re a part of are activated by that participation. A wide variety of organisations, from law enforcement to advertisers to academic researchers and public policy makers, apply data mining techniques to social media to learn more about the public.
This course will focus on practical methods for scraping and analysing social media data, as well as some theoretical implications of these practices.
This unit provides learners with an opportunity to apply key knowledge and skills through project work. They will be able to select a project from a specific domain and will be required to carry out various data management, exploratory data analysis, data visualisation and predictive modelling tasks.
If a student is pursuing either specialisation A or B, the Data Science in Practice work should deepen their engagement with the material of that specialisation.