The course teaches students comprehensive and specialised subjects in computer science; it teaches students cutting edge engineering skills to solve real-world problems using computational thinking and tools, as well as soft skills in communication, collaboration, and project management that enable students to succeed in real-world business environments. Most of this program is case (or) project-based where students learn by solving real-world problems end to end. This program has core courses that focus on computational thinking and problems solving from first principles. The core courses are followed by specialization courses that teach various aspects of building real-world systems. This is followed by more advanced courses that focus on research level topics, which cover state of the art methods. The program also has a capstone project at the end, wherein students can either work on building end to end solutions to real world problems (or) work on a research topic. The program also focuses on teaching the students the “ability to learn” so that they can be lifelong learners constantly upgrading their skills. Students can choose from a spectrum of courses to specialize in a specific sub-area of Computer Science like Artificial Intelligence and Machine Learning, Cloud Computing, Software Engineering, or Data Science, etc.
This course helps students translate advanced mathematical/ statistical/ scientific concepts into code. This is a module for writing code to solve real-world problems. It introduces programming concepts (such as control structures, recursion, classes and objects) assuming no prior programming knowledge, to make this course accessible to advanced professionals from scientific fields like Biology, Physics, Medicine, Chemistry, Civil & Mechanical Engineering etc. After building a strong foundation for converting scientific knowledge into programming concepts, the course advances to dive deeply into Object-Oriented Programming and its methodologies. It also covers when and how to use inbuilt-data structures like 1-Dimensional and 2-Dimensional Arrays before introducing the concepts of computational complexity to help students write optimised code using appropriate data structures and algorithmic design methods. The module can be taught to allow students to learn these concepts using a modern programming language such as Java or Python. The course offers students the ability to identify and solve computer programming problems in scientific fields at a graduate level. The course prepares students to handle advanced data structures and algorithm design methods in the separate module, ‘Data Structures’.
This is a core and foundational course which aims to equip the student with the ability to model, design, implement and query relational database systems for real-world data storage & processing needs. Students would start with diagrammatic tools (ER-diagram) to map a real world data storage problem into entities, relationships and keys. Then, they learn to translate the ER-diagram into a relational model with tables. SQL is then introduced as a de facto tool to create, modify, append, delete, query and manipulate data in a relational database. Due toSQL’s popularity, the course spends considerable time building the ability to write optimized and complex queries for various data manipulation tasks. The module exposes students to various real world SQL examples to build solid practical knowledge. Students then move on to understanding various trade-offs in modern relational databases like the ones between storage space and latency. Designing a database would need a solid understanding of normal forms to minimize data duplication, indexing for speedup and flattening tables to avoid complex joins in low-latency environments. These real-world database design strategies are discussed with practical examples from various domains. Most of this course uses the open source MySQL database and cloud-hosted relational databases (like Amazon RDS) to help students apply the concepts learned on real databases via assignments.
The ability to solve problems is a skill, and just like any other skill, the more one practices, the better one gets. So how exactly does one practice problem solving? Learning about different problem-solving strategies and when to use them will give a good start. Problem solving is a process. Most strategies provide steps that help you identify the problem and choose the best solution.Building a toolbox of problem-solving strategies will improve problem solving skills.With practice, students will be able to recognize and choose among multiple strategies to find the most appropriate one to solve complex problems. The course will focus on developing problem-solving strategies such as abstraction, modularity, recursion, iteration, bisection, and exhaustive enumeration. The course will also introduce arrays and some of their real-world applications, such as prefix sum, carry forward, subarrays, and 2-dimensional matrices. Examples will include industry-relevant problems and dive deeply into building their solutions with various approaches, recognizing each’s limitations (i.e when to use a data structure and when not to use a data structure). By the end of this course a student can come up with the best strategy which can optimize both time and space complexities by choosing the best data structure suitable for a given problem.
This course helps students translate mathematical/statistical/scientific concepts into code. This is a foundational course for writing code to solve Data Science ML & AI problems. It introduces basic programming concepts (like control structures, recursion, classes and objects) from scratch, assuming no prerequisites, to make this course accessible to students from non-computational scientific fields like Biology, Physics, Medicine, Chemistry, Civil & Mechanical Engineering etc. After building a strong foundation, the course advances to dive deep into core Mathematical libraries like NumPy, Scipy and Pandas. Students also learn when and how to use inbuilt-data structures like Lists, Dicts, Sets and Tuples. The module introduces the concepts of computational complexity to help students write optimized code using appropriate data structures and algorithmic design methods. The module does not dive deep into the data structures and algorithm design methods in this course -– that is available in the ‘Data Structures and Algorithms’ module. This course is valuable for all students specializing in mathematical sub-areas of CS like ML, Data Science, Scientific Computing etc.
This course introduces basic probability theory , statistical methods and computational algorithms to perform mathematically rigorous data analysis. The course starts with basic foundational concepts of random variables, histograms, and various plots (PMF, PDF and CDF). Students learn various popular discrete and continuous distributions like Bernoulli, Binomial, Poisson, Gaussian, Exponential,Pareto, log-normal etc., both mathematically and from an applicative perspective.Students learn various measures like mean, median, percentiles, quantiles, variance and interquartile-range. Students learn the pros and cons of each metric and understand when and how to use them in practice. Students will learn conditional probability and Bayes theorem in the applied context of real-world problems in medicine and healthcare. The module teaches the foundations of non-parametric statistics and applies them to solve problems using computational tools. Students learn various methods to determine correlations rigorously in data. This is followed by applied and mathematical understanding of the statistics underlying control treatment (A/B) experiments and hypothesis testing. The module engages computation tools in modern statics like Bootstrapping, Monte-Carlo methods,RANSAC etc.
This course focuses on building basic classification and regression models and understanding these models rigorously both with a mathematical and an applicative focus. The module starts with a basic introduction to high dimensional geometry of points, distance-metrics, hyperplanes and hyperspheres. We build on top this to introduce the mathematical formulation of logistic regression to find a separating hyperplane. Students learn to solve the optimization problem usingvector calculus and gradient descent (GD) based algorithms. The module introduces computational variations of GD like mini-batch and stochastic gradient descent.Students also learn other popular classification and regression methods like k-Nearest Neighbours, NaiveI Bayes, Decision Trees, Linear Regression etc. Students also learn how each of these techniques under various real world situations like the presence of outliers, imbalanced data, multi class classification etc. Students learn bias and variance trade-off and various techniques to avoid overfitting and underfitting. Students also study these algorithms from a Bayesian viewpoint along with geometric intuition. This module is hands-on and students apply all these classical techniques to real world problems.
This course introduces more advanced ML techniques like ensembles: bagging, boosting, cascading and stacking classifiers and regressors. It covers both the theoretical foundations and applicative details of these techniques along with popular implementations of boosting like LightGBM, CatBoost and XGBoost. Students also delve into kernel methods with specific focus on SVMs for classification and regression. Students will study state of the art model agnostic feature importance and model interpretability techniques like LIME and SHAP.Students also study classical NLP based text encoding methods like Bag-of-words,TF-IDF etc. The module teaches various classical methods in time series analysis and forecasting like ARMA, ARIMA etc. Students also learn how to pose time series forecasting problems as regression and classification problems to leverage well studied ML techniques. This is followed by various domain and problem specificFeature engineering techniques that are often helpful in real world problem solving. Students will study methods like error analysis, ablative analysis etc., to debug and understand why and where a model is performing well and where it is not performing well. This will further help us in designing appropriate features.Students study model calibration techniques like Platt Scaling, Isotonic Regression etc. Later in this course, we cover how to build recommender systems using content-based and collaborative filtering methods. The module also teaches the detailed solution of the Netflix prize (2009) and various recent advances in RecSys.
This course provides a strong mathematical and applicative introduction to DeepLearning. The module starts with the perceptron model as an over simplified approximation to a biological neuron. We motivate the need for a network of neurons and how they can be connected to form a Multi Layered Perceptron(MLPs). This is followed by a rigorous understanding of back-propagation algorithms and its limitations from the 1980s. Students study how modern deep learning took off with improved computational tools and data sets. We teach moremodern activation units (like ReLU and SeLU) and how they overcome problemswith the more classical Sigmoid and Tanh units. Students learn weight initializationmethods, regularization by dropouts, batch normalization etc., to ensure that deepMLPs can be successfully trained. The module teaches variants of Gradient Descent that have been specifically designed to work well for deep learning systems likeADAM, AdaGrad, RMSProp etc. Students also learn AutoEncoders, VAEs andWord2Vec as unsupervised, encoding deep-learning architectures. We apply all ofthe foundational theory learned to various real world problems using TensorFlow 2and Keras. Students also understand how TensorFlow 2 works internally with specific focus on computational graph processing.
This course provides a comprehensive overview of Computer vision problems and how they can be tackled using various Convolutional Neural networks (CNNs). Students start with classical image processing operations like edge detection, convolution, shape detectors and colour space conversions. This is followed by a foundational understanding of Deep-Convolutional Neural networks and how their training and evaluation works. We introduce various CNN specific layers like pooling-layers and upsampling layers. We also introduce various Data Augmentation techniques that are very helpful for image-related problems. This is followed by a dive deep into the internals of popular CNN architectures like: AlexNet, VGGNet, ResNet etc. Students also learn how to use these methods practically for transfer learning. Students will study how various computer-vision related tasks like image segmentation, image-generation, object detection and localization, contrastive learning etc., can be performed using state of the art algorithms for each of these tasks. Most of these techniques would be studied directly from the original research papers and open-source code provided by the authors. Students would also implement some of these algorithms from scratch in this course.
This course focuses on modelling sequences (text, music, time-series, genes) using deep-learning models. We start with a simple Recurrent Neural Network and its limitations with long-sequences. Students learn LSTMs and GRUs which can handle significantly longer sequences to model sequence data like text, music, gene-sequences and time-series data. We study variations of LSTM like bi-directionalLSTMs and encoder-decoder architectures. This is followed by a detailed study of attention mechanism and Transformer based models which are currently the state-of-the-art for NLP and sequence modelling. The module teaches encoder decoder Transformers, BERT, BERT-variations, GPT-1,2 &3 models from both the architectural and mathematical viewpoints and also a practical viewpoint. Students learn to implement many of these complex models from scratch (using TensorFlow2 and Keras) to gain a deeper understanding of how they work internally. Students will study popular applications of deep-learning in NLP like parts-of-speech tagging, question-answering systems, conversational engines (chatbots), Semantic search with low-latency etc. For each of these problems, Students will study cutting edge deep-learning models along with code implementations.
This course aims to build the core competency of building real world end-to-endML systems and deploy them into production for a variety of problems and scenarios. Students would learn a variety of ML systems ranging from high throughput and low latency internet scale systems to low compute power and energy constrained IoT devices like smart watches. Students will study the ML lifecycle and various components in detail. We also use real world ML platforms likeGoogle’s KubeFlow, TensorFlow Lite, and Amazon’s SageMaker to implement real world systems and understand the engineering trade-offs and challenges. Students also learn relevant technologies and tools like Containerization (Docker) and Container Orchestration (Kubernetes) and Git which are often used extensively in real world scalable ML systems. This course is a hands-on course where we solve multiple real world cases and discuss solutions built by various companies and organizations to provide the students a comprehensive understanding of varied systems and design choices.
This course provides an in-depth understanding of distributed systems for ML andDeep Learning using CPU, GPU and TPU clusters. It starts with foundations of Map-reduce framework and in-memory distributed and resilient data structures that form the backbone of Spark. Students will learn the architectural details of these distributed system platforms and how they can be leveraged to perform data analysis and model training on petabyte scale datasets. We cover how distributed training is achieved for popular ML algorithms on Spark by understanding the internal working of SparkMLLib. The module then focuses on understanding distributed graph processing using GraphX. Students move on to Deep-Learning algorithms and how distributed algorithms can be designed for them when we have GPU or TPU clusters at our disposal. We also dive deep into how TensorFlow archives distributed computing for popular Deep Learning algorithms. Students will study distributed data stores and how they can be used for ML using popular datastore systems like Hive and SparkSQL. The module concludes by discussing state of the art distributed, low latency approximate nearest neighbour algorithms along with their implementations in Elastic Search
Advanced Applied Computer Science is a capstone project, an end-to-end deployable solution to a real-world computational problem that students build in the last phase of the programme. Its objective is to help students rigorously solve a technically challenging problem where they would apply all of the concepts, techniques and tools learned in the programme. Students typically pick a problem from their specialisation after discussing it with the course instructor(s). Students also have the option of working on a real-world problem in their company/ organization/ institution. They can be mentored by an expert supervisor from their organization along with an academic supervisor from Woolf. All external expert-supervisors and projects need to be approved by the instructor(s) to ensure that the project is technically challenging and the solution being built is rigorous
and of high quality. Students start with identifying a technically challenging problem. Once approved by the instructor(s), they start the literature survey to read research papers and technical reports of prior related work. Then, they build the system design and write a design document to solve the problem. This would be followed by designing and implementing individual modules and testing them. This would be followed by deploying the solution and making it available to end users while satisfying the problem’s real-world constraints and objectives. Students then document their work into a detailed technical report.