
Supervised Learning

Note: Whilst every effort is made to keep the syllabus and assessment records correct for this course, the precise details must be checked with the lecturer(s).


Code: M055 (Also taught as: GI01)
Year: 4
Prerequisites: Basic mathematics, calculus, probability and statistics, linear algebra
Term: 1
Taught By: John Shawe-Taylor (100%)
Aims: This module covers supervised approaches to machine learning. It starts by reviewing the fundamentals of statistical decision theory and probabilistic pattern recognition, followed by an in-depth introduction to supervised learning algorithms such as the perceptron, the backpropagation algorithm, decision trees, instance-based learning and support vector machines. It also treats algorithm-independent principles such as inductive bias, side information, and approximation and estimation errors; assessment of algorithms by jackknife and bootstrap error estimation; and improvement of algorithms by voting methods such as boosting. The module concludes with an introduction to statistical learning theory: hypothesis classes, the PAC learning model, VC-dimension, growth functions, empirical risk minimization and structural risk minimization.
Learning Outcomes: Gain in-depth familiarity with various classical and contemporary supervised learning algorithms; understand the principles and limitations that govern learning algorithms and the ways of assessing and improving their performance; understand the fundamentals of statistical learning theory, the complexity of learning and its relationship to generalization ability.

Content:

Overview and Introduction to Bayes Decision Theory
Machine Intelligence and Applications
Pattern Recognition concepts
Classification, Regression, Feature Selection
Supervised Learning
Class conditional probability distributions
Examples of classifiers
Bayes optimal classifier and error (see the formula after this list)
Learning classification approaches
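
For reference, the Bayes optimal classifier and its error covered in this topic can be written as follows (standard decision-theoretic notation, not prescribed by the syllabus):

$$h^{*}(x) = \arg\max_{y} P(y \mid x) = \arg\max_{y} p(x \mid y)\,P(y), \qquad R^{*} = \mathbb{E}_{x}\!\left[1 - \max_{y} P(y \mid x)\right]$$

No classifier can achieve risk below the Bayes error $R^{*}$.
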
Linear Machines
General and Linear Discriminants
Decision regions
Single layer neural network
Linear separability, general position, number of dichotomies
General gradient descent
Perceptron learning algorithm (see the sketch after this list)
Mean square criterion and Widrow-Hoff learning algorithm
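
A minimal sketch of the perceptron learning algorithm, assuming NumPy arrays with labels in {-1, +1}; function and parameter names are illustrative only:

```python
import numpy as np

def perceptron(X, y, epochs=100, lr=1.0):
    """Perceptron learning: update the weights after every misclassified example.

    X: (n_samples, n_features) array; y: labels in {-1, +1}.
    """
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (np.dot(w, xi) + b) <= 0:   # wrong side of (or on) the boundary
                w += lr * yi * xi               # move the boundary towards the example
                b += lr * yi
                mistakes += 1
        if mistakes == 0:                       # no mistakes in a full pass: converged
            break
    return w, b
```

On linearly separable data the loop stops after finitely many updates (the perceptron convergence theorem); otherwise it runs for the full epoch budget.
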
Multi-Layer Perceptrons
Introduction to Neural Networks, two-layer networks
Universal approximators
Backpropagation learning, on-line, off-line (see the sketch after this list)
Error surface, important parameters
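
A compact sketch of off-line (batch) backpropagation for a two-layer sigmoid network with squared-error loss, assuming NumPy and labels in {0, 1}; the architecture and hyperparameters are illustrative choices:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_mlp(X, y, hidden=8, lr=0.1, epochs=1000, seed=0):
    """Two-layer perceptron trained by batch gradient descent (backpropagation)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(scale=0.1, size=(d, hidden))   # input -> hidden weights
    W2 = rng.normal(scale=0.1, size=(hidden, 1))   # hidden -> output weights
    y = y.reshape(-1, 1)
    for _ in range(epochs):
        # forward pass
        H = sigmoid(X @ W1)            # hidden activations
        out = sigmoid(H @ W2)          # network output in (0, 1)
        # backward pass: gradients of 0.5 * sum((out - y)^2)
        delta_out = (out - y) * out * (1 - out)
        delta_hid = (delta_out @ W2.T) * H * (1 - H)
        W2 -= lr * (H.T @ delta_out) / n
        W1 -= lr * (X.T @ delta_hid) / n
    return W1, W2
```
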
Learning Decision Trees
Inference model, general domains, symbolic
Decision trees, consistency
Learning trees from training examples
Entropy, mutual information
ID3 algorithm criterion (see the sketch after this list)
C4.5 algorithm
Continuous test nodes, confidence
Pruning
Learning with incomplete data
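
A sketch of the entropy and information-gain calculations behind the ID3 split criterion, assuming discrete attributes and labels held in NumPy arrays; names are illustrative:

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a vector of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(attribute, labels):
    """Mutual information between a discrete attribute and the labels;
    ID3 splits on the attribute with the largest gain."""
    values, counts = np.unique(attribute, return_counts=True)
    conditional = sum((c / len(labels)) * entropy(labels[attribute == v])
                      for v, c in zip(values, counts))
    return entropy(labels) - conditional
```
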
Instance-based Learning
Nearest neighbor classification
k-Nearest neighbor (see the sketch after this list)
Nearest Neighbor error probability, proof
Simplification, Editing
Example: Document retrieval
Case-based reasoning
Example: learning graphical structures
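
A minimal k-nearest-neighbour classifier with a Euclidean metric, assuming NumPy arrays; in practice editing and simplification reduce the stored set, and data structures speed up the search:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Classify a single point x by majority vote among its k nearest neighbours."""
    dists = np.linalg.norm(X_train - x, axis=1)   # Euclidean distance to every training point
    nearest = np.argsort(dists)[:k]               # indices of the k closest points
    return Counter(y_train[nearest]).most_common(1)[0][0]
```
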
Machine Learning Concepts and Limitations
Fundamental algorithm-independent concepts
Hypothesis class, Target class
Inductive bias, Occam's razor
Empirical risk
Limitations of inference machines
Approximation and estimation errors (see the decomposition after this list)
Tradeoff
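
In standard notation (not fixed by the syllabus), the excess risk of a hypothesis $\hat{h}$ chosen from a class $H$ decomposes as

$$R(\hat{h}) - R^{*} = \underbrace{\Big(R(\hat{h}) - \inf_{h \in H} R(h)\Big)}_{\text{estimation error}} + \underbrace{\Big(\inf_{h \in H} R(h) - R^{*}\Big)}_{\text{approximation error}}$$

Enlarging $H$ shrinks the approximation error but, for a fixed sample size, tends to increase the estimation error; this is the tradeoff referred to above.
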
Machine Learning Assessment and Improvement
Statistical Model Selection
Structural Risk Minimization
Practical methods for risk assessment based on resampling: jackknife, bootstrap (see the sketch after this list)
Improving accuracy of general algorithms, Bagging, Boosting
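
A sketch of bootstrap (out-of-bag) error estimation, assuming user-supplied fit and predict callables and NumPy arrays; the names and the number of resamples are illustrative:

```python
import numpy as np

def bootstrap_error(X, y, fit, predict, n_boot=200, seed=0):
    """Average out-of-bag misclassification rate over bootstrap resamples."""
    rng = np.random.default_rng(seed)
    n = len(y)
    errors = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)          # sample n points with replacement
        oob = np.setdiff1d(np.arange(n), idx)     # points left out of this resample
        if len(oob) == 0:
            continue
        model = fit(X[idx], y[idx])
        errors.append(np.mean(predict(model, X[oob]) != y[oob]))
    return float(np.mean(errors))
```
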
Learning Theory
Formal model of the learnable
Sample complexity
Learning in zero-Bayes and realizable case
Growth function, VC-dimension
VC-dimension of Vector space of functions, proof
Empirical Risk Minimization over finite classes, sample complexity, proof (see the bound after this list)
Empirical Risk Minimization over infinite classes, risk upper bound, proof
Lower bound on sample complexity
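
For orientation, the standard sample-complexity statement for a finite hypothesis class $H$ in the realizable case is that

$$m \ge \frac{1}{\epsilon}\left(\ln|H| + \ln\frac{1}{\delta}\right)$$

examples suffice for any consistent hypothesis to have risk at most $\epsilon$ with probability at least $1-\delta$; the lectures give the proof and the corresponding infinite-class and lower-bound results.
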
Support Vector Machines
Margin of a classifier
Dual Perceptron algorithm (see the sketch after this list)
Learning non-linear hypotheses with perceptron
Kernel functions, implicit non-linear feature space
Theory: zero-Bayes, realizable infinite hypothesis class, finite covering, margin-based bounds on risk
Maximal Margin classifier
Learning support vector machines as a dual-optimization problem
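
A sketch of the dual (kernel) perceptron, illustrating how a kernel function supplies an implicit non-linear feature space; the Gaussian kernel and the names below are illustrative choices, assuming NumPy arrays with labels in {-1, +1}:

```python
import numpy as np

def rbf_kernel(x, z, gamma=1.0):
    """Gaussian (RBF) kernel: an inner product in an implicit feature space."""
    return np.exp(-gamma * np.sum((x - z) ** 2))

def dual_perceptron(X, y, kernel=rbf_kernel, epochs=10):
    """Perceptron in dual form: the hypothesis is sum_i alpha_i * y_i * k(x_i, x)."""
    n = len(y)
    alpha = np.zeros(n)                                  # per-example mistake counts
    K = np.array([[kernel(xi, xj) for xj in X] for xi in X])
    for _ in range(epochs):
        for i in range(n):
            if y[i] * np.sum(alpha * y * K[:, i]) <= 0:  # mistake on example i
                alpha[i] += 1
    return alpha

def dual_predict(alpha, X_train, y_train, x, kernel=rbf_kernel):
    """Sign of the kernel expansion at a new point x."""
    score = sum(a * yi * kernel(xi, x) for a, yi, xi in zip(alpha, y_train, X_train))
    return 1 if score >= 0 else -1
```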

Method of Instruction:

Lecture presentations with associated class problems

Assessment:

The course has the following assessment components:

  • Written Examination (2.5 hours, 75%)
  • Coursework Section (1 piece, 25%)
To pass this course, students must:
  • Obtain an overall pass mark of 50% for all sections combined
The examination rubric is:
Answer three questions out of five. All questions carry equal marks. N.B. This course is examined in the pre-Easter examination session.

Resources:

Text Book 1: The Elements of Statistical Learning: Data Mining, Inference and Prediction, Hastie, T., Tibshirani, R. and Friedman, J., Springer (2001)

Reference Book 1: Pattern Classification, Duda, R. O., Hart, P. E. and Stork, D. G., John Wiley and Sons (2001)

Reference Book 2: Pattern Recognition and Machine Learning, Bishop, C. M., Springer (2006)

Reference Book 3: An Introduction to Support Vector Machines, Shawe-Taylor, J. and Cristianini, N., Cambridge University Press (2000)

Reference Book 4: Kernel Methods for Pattern Analysis, Shawe-Taylor, J. and Cristianini, N., Cambridge University Press (2004)

Lecture notes and module information
