Supervised Learning
Note:
Whilst every effort is made to keep the syllabus and assessment records correct
for this course, the precise details must be checked with the lecturer(s).
Code: M055 (also taught as GI01)
Year: 4
Prerequisites: Basic mathematics, calculus, probability and statistics, linear algebra
Term: 1
Taught By: John Shawe-Taylor (100%)
Aims: This module covers supervised approaches to machine learning. It starts by reviewing the fundamentals of statistical decision theory and probabilistic pattern recognition, followed by an in-depth introduction to supervised learning algorithms such as the perceptron, the backpropagation algorithm, decision trees, instance-based learning and support vector machines. It also treats algorithm-independent principles such as inductive bias, side information, and approximation and estimation errors; assessment of algorithms by jackknife and bootstrap error estimation; and improvement of algorithms by voting methods such as boosting. The module further introduces statistical learning theory: hypothesis classes, the PAC learning model, VC-dimension, growth functions, empirical risk minimization and structural risk minimization.
Learning Outcomes: To gain in-depth familiarity with various classical and contemporary supervised learning algorithms; to understand the underlying limitations and principles that govern learning algorithms, and ways of assessing and improving their performance; and to understand the fundamentals of statistical learning theory, the complexity of learning and its relationship to generalization ability.
Content:
Overview and Introduction to Bayes Decision Theory: machine intelligence and applications; pattern recognition concepts; classification, regression, feature selection; supervised learning; class-conditional probability distributions; examples of classifiers; the Bayes optimal classifier and its error; approaches to learning classifiers.
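For reference, the Bayes optimal classifier and its error, which anchor this topic, can be stated compactly; this is the standard textbook formulation rather than an excerpt from the lecture notes:

    h^{*}(x) = \arg\max_{y} P(y \mid x) = \arg\max_{y} p(x \mid y)\,P(y),
    \qquad
    R^{*} = \mathbb{E}_{x}\!\left[\, 1 - \max_{y} P(y \mid x) \,\right]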
Linear machines: general and linear discriminants; decision regions; single-layer neural networks; linear separability, general position, number of dichotomies; general gradient descent; the perceptron learning algorithm; the mean-square criterion and the Widrow-Hoff learning algorithm.
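As a concrete illustration of the perceptron learning algorithm listed above, here is a minimal Python sketch; the function name, toy data and learning-rate choice are assumptions made for this example, not course code:

    # Minimal perceptron learning rule for binary labels y in {-1, +1}.
    import numpy as np

    def perceptron(X, y, epochs=100, lr=1.0):
        """Return a weight vector w (bias folded in) found by the perceptron rule."""
        Xb = np.hstack([X, np.ones((X.shape[0], 1))])    # append constant feature for the bias
        w = np.zeros(Xb.shape[1])
        for _ in range(epochs):
            mistakes = 0
            for xi, yi in zip(Xb, y):
                if yi * np.dot(w, xi) <= 0:              # misclassified (or on the boundary)
                    w += lr * yi * xi                    # update: w <- w + lr * y * x
                    mistakes += 1
            if mistakes == 0:                            # converged on linearly separable data
                break
        return w

    # Toy usage: two linearly separable clusters
    X = np.array([[2.0, 2.0], [1.5, 2.5], [-2.0, -1.5], [-1.0, -2.0]])
    y = np.array([1, 1, -1, -1])
    w = perceptron(X, y)
    print(np.sign(np.hstack([X, np.ones((4, 1))]) @ w))  # predicted labels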
Multi-Layer Perceptrons: introduction to neural networks; two-layer networks as universal approximators; backpropagation learning, on-line and off-line; the error surface and its important parameters.
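The following minimal sketch shows two-layer backpropagation learning in the off-line (full-batch) setting; the layer sizes, learning rate and toy regression data are illustrative assumptions:

    # Two-layer network trained by full-batch backpropagation on a toy regression task.
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, (200, 1))
    t = np.sin(np.pi * X)                                # regression targets
    W1 = rng.normal(0, 0.5, (1, 10)); b1 = np.zeros(10)  # input -> hidden weights
    W2 = rng.normal(0, 0.5, (10, 1)); b2 = np.zeros(1)   # hidden -> output weights
    lr = 0.5
    for _ in range(5000):
        h = np.tanh(X @ W1 + b1)                         # forward pass: hidden activations
        yhat = h @ W2 + b2                               # network output
        err = (yhat - t) / len(X)                        # error signal (squared-error gradient, up to a constant)
        dW2 = h.T @ err; db2 = err.sum(axis=0)           # backward pass: output layer
        dh = (err @ W2.T) * (1 - h ** 2)                 # propagate error through tanh
        dW1 = X.T @ dh; db1 = dh.sum(axis=0)             # input layer gradients
        W1 -= lr * dW1; b1 -= lr * db1
        W2 -= lr * dW2; b2 -= lr * db2
    print(float(np.mean((yhat - t) ** 2)))               # final training error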
Learning decision trees: inference model, general domains; symbolic decision trees, consistency; learning trees from training examples; entropy and mutual information; the ID3 algorithm and its criterion; the C4.5 algorithm; continuous test nodes, confidence; pruning; learning with incomplete data.
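The entropy and mutual-information criterion used by ID3 to choose split attributes can be sketched as follows; the tiny weather-style dataset is an illustrative assumption, not an example taken from the course:

    # Entropy and information gain, the split criterion of ID3.
    import numpy as np

    def entropy(labels):
        """Shannon entropy of a label sequence, in bits."""
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return float(-np.sum(p * np.log2(p)))

    def information_gain(attribute, labels):
        """Mutual information between a discrete attribute and the class labels."""
        total = entropy(labels)
        cond = 0.0
        for v in np.unique(attribute):
            mask = attribute == v
            cond += mask.mean() * entropy(labels[mask])  # weighted conditional entropy
        return total - cond

    # Toy usage: how well does this attribute split the labels?
    attribute = np.array(["sunny", "sunny", "rain", "rain", "rain"])
    labels    = np.array(["no", "no", "yes", "yes", "no"])
    print(information_gain(attribute, labels))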
Instance-based Learning: nearest neighbor classification; k-nearest neighbors; nearest neighbor error probability, with proof; simplification and editing; example: document retrieval; case-based reasoning; example: learning graphical structures.
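A brute-force k-nearest-neighbor classifier of the kind covered in this topic can be sketched in a few lines; the function and variable names are assumptions made for the example:

    # k-nearest-neighbor classification by brute-force Euclidean distance.
    import numpy as np
    from collections import Counter

    def knn_predict(X_train, y_train, x, k=3):
        """Predict the majority label among the k nearest training points to x."""
        dists = np.linalg.norm(X_train - x, axis=1)  # distances to all training points
        nearest = np.argsort(dists)[:k]              # indices of the k closest
        return Counter(y_train[nearest]).most_common(1)[0][0]

    # Toy usage
    X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
    y_train = np.array([0, 0, 1, 1])
    print(knn_predict(X_train, y_train, np.array([0.2, 0.1]), k=3))  # -> 0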
Machine learning concepts and limitations: fundamental algorithm-independent concepts; hypothesis class, target class; inductive bias, Occam's razor; empirical risk; limitations of inference machines; approximation and estimation errors and the trade-off between them.
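The approximation/estimation trade-off mentioned above is usually written as a decomposition of the excess risk of the learned hypothesis \hat{h} over a class H (standard formulation, stated here for reference):

    R(\hat{h}) - R^{*}
    = \underbrace{\bigl(R(h_{H}^{*}) - R^{*}\bigr)}_{\text{approximation error}}
    + \underbrace{\bigl(R(\hat{h}) - R(h_{H}^{*})\bigr)}_{\text{estimation error}},
    \qquad h_{H}^{*} = \arg\min_{h \in H} R(h)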
Machine learning assessment and improvement: statistical model selection; structural risk minimization; practical methods for risk assessment based on resampling (jackknife, bootstrap); improving the accuracy of general algorithms (bagging, boosting).
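As an example of the resampling-based risk assessment listed above, the sketch below computes an out-of-bag bootstrap estimate of a classifier's error; the 1-nearest-neighbor classifier, the synthetic data and the 200 replicates are illustrative assumptions:

    # Out-of-bag bootstrap estimate of a classifier's error rate.
    import numpy as np

    def one_nn_error(train_idx, test_idx, X, y):
        """Error of a 1-NN classifier trained on train_idx and evaluated on test_idx."""
        errors = 0
        for i in test_idx:
            j = train_idx[np.argmin(np.linalg.norm(X[train_idx] - X[i], axis=1))]
            errors += int(y[j] != y[i])
        return errors / len(test_idx)

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(2, 1, (30, 2))])
    y = np.array([0] * 30 + [1] * 30)
    n = len(y)

    estimates = []
    for _ in range(200):                               # 200 bootstrap replicates
        boot = rng.integers(0, n, size=n)              # sample indices with replacement
        out_of_bag = np.setdiff1d(np.arange(n), boot)  # points not drawn into this replicate
        if len(out_of_bag):
            estimates.append(one_nn_error(boot, out_of_bag, X, y))
    print(np.mean(estimates))                          # bootstrap error estimate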
Learning Theory: a formal model of the learnable; sample complexity; learning in the zero-Bayes and realizable cases; growth function, VC-dimension; the VC-dimension of a vector space of functions, with proof; empirical risk minimization over finite classes, sample complexity, with proof; empirical risk minimization over infinite classes, risk upper bound, with proof; a lower bound on sample complexity.
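To give one concrete instance of the empirical risk minimization results listed above: the standard bound for a finite hypothesis class (Hoeffding's inequality plus a union bound; stated for reference, not quoted from the lecture notes) says that with probability at least 1 - \delta over a sample of m i.i.d. examples, every h in a finite class H satisfies

    R(h) \;\le\; \hat{R}(h) + \sqrt{\frac{\ln|H| + \ln(2/\delta)}{2m}}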
Support Vector Machines: the margin of a classifier; the dual perceptron algorithm; learning non-linear hypotheses with the perceptron; kernel functions and the implicit non-linear feature space; theory: zero-Bayes, realizable infinite hypothesis classes, finite covering, margin-based bounds on risk; the maximal margin classifier; learning support vector machines as a dual optimization problem.
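The dual perceptron with a kernel function, which underlies the move to support vector machines in this topic, can be sketched as follows; the RBF kernel, the XOR-style toy data and all names are illustrative assumptions:

    # Dual (kernelised) perceptron: the hypothesis is kept as coefficients alpha_i on
    # training points and evaluated through a kernel, so the non-linear feature space
    # is used only implicitly.
    import numpy as np

    def rbf_kernel(x, z, gamma=1.0):
        return np.exp(-gamma * np.sum((x - z) ** 2))

    def dual_perceptron(X, y, kernel=rbf_kernel, epochs=20):
        """Return dual coefficients alpha learned by the kernel perceptron (y in {-1, +1})."""
        n = len(y)
        K = np.array([[kernel(X[i], X[j]) for j in range(n)] for i in range(n)])  # Gram matrix
        alpha = np.zeros(n)
        for _ in range(epochs):
            for i in range(n):
                if y[i] * np.sum(alpha * y * K[:, i]) <= 0:  # misclassified in the implicit space
                    alpha[i] += 1.0                          # dual update: bump this point's weight
        return alpha

    def predict(alpha, X, y, x, kernel=rbf_kernel):
        return np.sign(np.sum(alpha * y * np.array([kernel(xi, x) for xi in X])))

    # Toy usage: XOR-style data that no linear classifier separates
    X = np.array([[0., 0.], [1., 1.], [0., 1.], [1., 0.]])
    y = np.array([-1, -1, 1, 1])
    alpha = dual_perceptron(X, y)
    print([predict(alpha, X, y, x) for x in X])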
Method of Instruction:
Lecture presentations with associated class problems
Assessment:
The course has the following assessment components:
- Written Examination (2.5 hours, 75%)
- Coursework Section (1 piece, 25%)
To pass this course, students must:
- Obtain an overall pass mark of 50% for all sections combined
The examination rubric is: Answer three questions out of five. All questions carry equal marks. N.B. This course is examined in the pre-Easter examination session.
Resources:
Text Book 1: The Elements of Statistical Learning: Data Mining, Inference and Prediction, Hastie, T., Tibshirani, R., and Friedman, J., Springer (2001)
Reference Book 1: Pattern Classification, Duda, R.O., Hart, P.E., and Stork, D.G., John Wiley and Sons (2001)
Reference Book 2: Pattern Recognition and Machine Learning, Bishop, C.M., Springer (2006)
Reference Book 3: An Introduction to Support Vector Machines, Shawe-Taylor, J., and Cristianini, N., Cambridge University Press (2000)
Reference Book 4: Kernel Methods for Pattern Analysis, Shawe-Taylor, J., and Cristianini, N., Cambridge University Press (2004)
Lecture notes and module information