Supervised Learning
Note:
Whilst every effort is made to keep the syllabus and assessment records correct
for this course, the precise details must be checked with the lecturer(s).
Code: M055 (also taught as GI01)
Year: 4
Prerequisites: Basic mathematics, calculus, probability and statistics, linear algebra
Term: 1
Taught By: John Shawe-Taylor (100%)
Aims: This module covers supervised approaches to machine learning. It starts by reviewing the fundamentals of statistical decision theory and probabilistic pattern recognition, followed by an in-depth introduction to supervised learning algorithms such as the perceptron, the backpropagation algorithm, decision trees, instance-based learning and support vector machines. It also treats algorithm-independent principles such as inductive bias, side information, and approximation and estimation errors; assessment of algorithms by jackknife and bootstrap error estimation; and improvement of algorithms by voting methods such as boosting. The module further introduces statistical learning theory: hypothesis classes, the PAC learning model, VC-dimension, growth functions, empirical risk minimization and structural risk minimization.
Learning Outcomes: To gain in-depth familiarity with various classical and contemporary supervised learning algorithms; to understand the underlying limitations and principles that govern learning algorithms, and ways of assessing and improving their performance; and to understand the fundamentals of statistical learning theory, the complexity of learning and its relationship to generalization ability.
Content:
Overview and Introduction to Bayes Decision Theory: machine intelligence and applications; pattern recognition concepts; classification, regression, feature selection; supervised learning; class-conditional probability distributions; examples of classifiers; the Bayes optimal classifier and its error; approaches to learning classifiers.
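For reference, the Bayes optimal classifier and its error, which anchor this topic, can be stated compactly; this is the standard textbook formulation rather than an excerpt from the lecture notes:

    h^{*}(x) = \arg\max_{y} P(y \mid x) = \arg\max_{y} p(x \mid y)\,P(y),
    \qquad
    R^{*} = \mathbb{E}_{x}\!\left[\, 1 - \max_{y} P(y \mid x) \,\right]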
Linear machines: general and linear discriminants; decision regions; single-layer neural networks; linear separability, general position, number of dichotomies; general gradient descent; the perceptron learning algorithm; the mean-square criterion and the Widrow-Hoff learning algorithm.
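As a concrete illustration of the perceptron learning algorithm listed above, here is a minimal Python sketch; the function name, toy data and learning-rate choice are assumptions made for this example, not course code:

    # Minimal perceptron learning rule for binary labels y in {-1, +1}.
    import numpy as np

    def perceptron(X, y, epochs=100, lr=1.0):
        """Return a weight vector w (bias folded in) found by the perceptron rule."""
        Xb = np.hstack([X, np.ones((X.shape[0], 1))])    # append constant feature for the bias
        w = np.zeros(Xb.shape[1])
        for _ in range(epochs):
            mistakes = 0
            for xi, yi in zip(Xb, y):
                if yi * np.dot(w, xi) <= 0:              # misclassified (or on the boundary)
                    w += lr * yi * xi                    # update: w <- w + lr * y * x
                    mistakes += 1
            if mistakes == 0:                            # converged on linearly separable data
                break
        return w

    # Toy usage: two linearly separable clusters
    X = np.array([[2.0, 2.0], [1.5, 2.5], [-2.0, -1.5], [-1.0, -2.0]])
    y = np.array([1, 1, -1, -1])
    w = perceptron(X, y)
    print(np.sign(np.hstack([X, np.ones((4, 1))]) @ w))  # predicted labels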
Multi-Layer Perceptrons: introduction to neural networks; two-layer networks as universal approximators; backpropagation learning, on-line and off-line; the error surface and its important parameters.
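The following minimal sketch shows two-layer backpropagation learning in the off-line (full-batch) setting; the layer sizes, learning rate and toy regression data are illustrative assumptions:

    # Two-layer network trained by full-batch backpropagation on a toy regression task.
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, (200, 1))
    t = np.sin(np.pi * X)                                # regression targets
    W1 = rng.normal(0, 0.5, (1, 10)); b1 = np.zeros(10)  # input -> hidden weights
    W2 = rng.normal(0, 0.5, (10, 1)); b2 = np.zeros(1)   # hidden -> output weights
    lr = 0.5
    for _ in range(5000):
        h = np.tanh(X @ W1 + b1)                         # forward pass: hidden activations
        yhat = h @ W2 + b2                               # network output
        err = (yhat - t) / len(X)                        # error signal (squared-error gradient, up to a constant)
        dW2 = h.T @ err; db2 = err.sum(axis=0)           # backward pass: output layer
        dh = (err @ W2.T) * (1 - h ** 2)                 # propagate error through tanh
        dW1 = X.T @ dh; db1 = dh.sum(axis=0)             # input layer gradients
        W1 -= lr * dW1; b1 -= lr * db1
        W2 -= lr * dW2; b2 -= lr * db2
    print(float(np.mean((yhat - t) ** 2)))               # final training error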
Learning decision trees: inference model, general domains; symbolic decision trees, consistency; learning trees from training examples; entropy and mutual information; the ID3 algorithm and its criterion; the C4.5 algorithm; continuous test nodes, confidence; pruning; learning with incomplete data.
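The entropy and mutual-information criterion used by ID3 to choose split attributes can be sketched as follows; the tiny weather-style dataset is an illustrative assumption, not an example taken from the course:

    # Entropy and information gain, the split criterion of ID3.
    import numpy as np

    def entropy(labels):
        """Shannon entropy of a label sequence, in bits."""
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return float(-np.sum(p * np.log2(p)))

    def information_gain(attribute, labels):
        """Mutual information between a discrete attribute and the class labels."""
        total = entropy(labels)
        cond = 0.0
        for v in np.unique(attribute):
            mask = attribute == v
            cond += mask.mean() * entropy(labels[mask])  # weighted conditional entropy
        return total - cond

    # Toy usage: how well does this attribute split the labels?
    attribute = np.array(["sunny", "sunny", "rain", "rain", "rain"])
    labels    = np.array(["no", "no", "yes", "yes", "no"])
    print(information_gain(attribute, labels))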
Instance-based Learning: nearest neighbor classification; k-nearest neighbors; nearest neighbor error probability, with proof; simplification and editing; example: document retrieval; case-based reasoning; example: learning graphical structures.
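A brute-force k-nearest-neighbor classifier of the kind covered in this topic can be sketched in a few lines; the function and variable names are assumptions made for the example:

    # k-nearest-neighbor classification by brute-force Euclidean distance.
    import numpy as np
    from collections import Counter

    def knn_predict(X_train, y_train, x, k=3):
        """Predict the majority label among the k nearest training points to x."""
        dists = np.linalg.norm(X_train - x, axis=1)  # distances to all training points
        nearest = np.argsort(dists)[:k]              # indices of the k closest
        return Counter(y_train[nearest]).most_common(1)[0][0]

    # Toy usage
    X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
    y_train = np.array([0, 0, 1, 1])
    print(knn_predict(X_train, y_train, np.array([0.2, 0.1]), k=3))  # -> 0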
Machine learning concepts and limitations: fundamental algorithm-independent concepts; hypothesis class, target class; inductive bias, Occam's razor; empirical risk; limitations of inference machines; approximation and estimation errors and the trade-off between them.
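The approximation/estimation trade-off mentioned above is usually written as a decomposition of the excess risk of the learned hypothesis \hat{h} over a class H (standard formulation, stated here for reference):

    R(\hat{h}) - R^{*}
    = \underbrace{\bigl(R(h_{H}^{*}) - R^{*}\bigr)}_{\text{approximation error}}
    + \underbrace{\bigl(R(\hat{h}) - R(h_{H}^{*})\bigr)}_{\text{estimation error}},
    \qquad h_{H}^{*} = \arg\min_{h \in H} R(h)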
Machine learning assessment and improvement: statistical model selection; structural risk minimization; practical methods for risk assessment based on resampling (jackknife, bootstrap); improving the accuracy of general algorithms (bagging, boosting).
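As an example of the resampling-based risk assessment listed above, the sketch below computes an out-of-bag bootstrap estimate of a classifier's error; the 1-nearest-neighbor classifier, the synthetic data and the 200 replicates are illustrative assumptions:

    # Out-of-bag bootstrap estimate of a classifier's error rate.
    import numpy as np

    def one_nn_error(train_idx, test_idx, X, y):
        """Error of a 1-NN classifier trained on train_idx and evaluated on test_idx."""
        errors = 0
        for i in test_idx:
            j = train_idx[np.argmin(np.linalg.norm(X[train_idx] - X[i], axis=1))]
            errors += int(y[j] != y[i])
        return errors / len(test_idx)

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(2, 1, (30, 2))])
    y = np.array([0] * 30 + [1] * 30)
    n = len(y)

    estimates = []
    for _ in range(200):                               # 200 bootstrap replicates
        boot = rng.integers(0, n, size=n)              # sample indices with replacement
        out_of_bag = np.setdiff1d(np.arange(n), boot)  # points not drawn into this replicate
        if len(out_of_bag):
            estimates.append(one_nn_error(boot, out_of_bag, X, y))
    print(np.mean(estimates))                          # bootstrap error estimate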
Learning Theory: a formal model of the learnable; sample complexity; learning in the zero-Bayes and realizable cases; growth function, VC-dimension; the VC-dimension of a vector space of functions, with proof; empirical risk minimization over finite classes, sample complexity, with proof; empirical risk minimization over infinite classes, risk upper bound, with proof; a lower bound on sample complexity.
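To give one concrete instance of the empirical risk minimization results listed above: the standard bound for a finite hypothesis class (Hoeffding's inequality plus a union bound; stated for reference, not quoted from the lecture notes) says that with probability at least 1 - \delta over a sample of m i.i.d. examples, every h in a finite class H satisfies

    R(h) \;\le\; \hat{R}(h) + \sqrt{\frac{\ln|H| + \ln(2/\delta)}{2m}}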
Support Vector Machines: the margin of a classifier; the dual perceptron algorithm; learning non-linear hypotheses with the perceptron; kernel functions and the implicit non-linear feature space; theory: zero-Bayes, realizable infinite hypothesis classes, finite covering, margin-based bounds on risk; the maximal margin classifier; learning support vector machines as a dual optimization problem.
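The dual perceptron with a kernel function, which underlies the move to support vector machines in this topic, can be sketched as follows; the RBF kernel, the XOR-style toy data and all names are illustrative assumptions:

    # Dual (kernelised) perceptron: the hypothesis is kept as coefficients alpha_i on
    # training points and evaluated through a kernel, so the non-linear feature space
    # is used only implicitly.
    import numpy as np

    def rbf_kernel(x, z, gamma=1.0):
        return np.exp(-gamma * np.sum((x - z) ** 2))

    def dual_perceptron(X, y, kernel=rbf_kernel, epochs=20):
        """Return dual coefficients alpha learned by the kernel perceptron (y in {-1, +1})."""
        n = len(y)
        K = np.array([[kernel(X[i], X[j]) for j in range(n)] for i in range(n)])  # Gram matrix
        alpha = np.zeros(n)
        for _ in range(epochs):
            for i in range(n):
                if y[i] * np.sum(alpha * y * K[:, i]) <= 0:  # misclassified in the implicit space
                    alpha[i] += 1.0                          # dual update: bump this point's weight
        return alpha

    def predict(alpha, X, y, x, kernel=rbf_kernel):
        return np.sign(np.sum(alpha * y * np.array([kernel(xi, x) for xi in X])))

    # Toy usage: XOR-style data that no linear classifier separates
    X = np.array([[0., 0.], [1., 1.], [0., 1.], [1., 0.]])
    y = np.array([-1, -1, 1, 1])
    alpha = dual_perceptron(X, y)
    print([predict(alpha, X, y, x) for x in X])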
Method of Instruction:
Lecture presentations with associated class problems
Assessment:
The course has the following assessment components:
- Written Examination (2.5 hours, 75%)
- Coursework Section (1 piece, 25%)
To pass this course, students must:
- Obtain an overall pass mark of 50% for all sections combined
The examination rubric is: Answer three questions out of five. All questions carry equal marks. N.B. This course is examined in the pre-Easter examination session.
Resources:
Text Book 1: The Elements of Statistical Learning: Data Mining, Inference and Prediction, Hastie, T., Tibshirani, R., and Friedman, J., Springer (2001)
Reference Book 1: Pattern Classification, Duda, R.O., Hart, P.E., and Stork, D.G., John Wiley and Sons (2001)
Reference Book 2: Pattern Recognition and Machine Learning, Bishop, C.M., Springer (2006)
Reference Book 3: An Introduction to Support Vector Machines, Shawe-Taylor, J., and Cristianini, N., Cambridge University Press (2000)
Reference Book 4: Kernel Methods for Pattern Analysis, Shawe-Taylor, J., and Cristianini, N., Cambridge University Press (2004)
Lecture notes and module information