Semi-Supervised Learning Using an Unsupervised Atlas

      Nikolaos Pitelis, Chris Russell, Lourdes Agapito,
      Machine Learning and Knowledge Discovery in Databases (
      ECML/PKDD 2014), Nancy, France, September 2014. Lecture Notes in Computer Science Volume 8725, 2014, pp 565-580.

    • Abstract
    • In many machine learning problems, high-dimensional datasets often lie on or near manifolds of locally low rank. This knowledge can be exploited to avoid the “curse of dimensionality” when learning a classifier. Explicit manifold learning formulations such as LLE are rarely used for this purpose; instead, classifiers may use methods such as local coordinate coding or auto-encoders to characterise the manifold implicitly. We propose novel manifold-based kernels for semi-supervised and supervised learning.
      We show how smooth classifiers can be learnt from existing descriptions of manifolds that characterise the manifold as a set of piecewise affine charts, or an atlas. We experimentally validate the importance of this smoothness vs. the more natural piecewise smooth classifiers, and we show a significant improvement over competing methods on standard datasets. In the semi-supervised learning setting our experiments show how using unlabelled data to learn the detailed shape of the underlying manifold substantially improves the accuracy of a classifier trained on limited labelled data.

    • Motivation

    • Knowledge of the underlying manifold structure of data can improve classification accuracy.

    • Method Overview

    • Our complete method is a two-step approach:
      • Unsupervised learning of the underlying manifold
        Approximate the data manifold in the original space by fitting an atlas of low-dimensional, overlapping affine charts.
      • Supervised training of an SVM
        We introduce a new family of Mercer kernels for SVM-based supervised learning that use a soft assignment of datapoints to the underlying low-dimensional affine charts.
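      The two steps above can be sketched as follows. This is an illustrative sketch only, not the paper's exact atlas-fitting objective or kernel: the chart fitting here uses simple local PCA around randomly chosen centres, the soft assignment is an assumed softmax over per-chart reconstruction errors (with a hypothetical temperature `beta`), and the kernel sums weighted linear kernels over each chart's low-dimensional coordinates, which is a valid Mercer kernel since it is an explicit inner product of feature maps.

```python
import numpy as np

def fit_affine_charts(X, n_charts=4, chart_dim=2, seed=0):
    """Fit overlapping low-dimensional affine charts to the data.

    Sketch: pick chart centres at random, gather each centre's nearest
    points (overlap between charts is allowed), and run local PCA to get
    an affine basis per chart. Not the paper's optimisation procedure.
    """
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), n_charts, replace=False)]
    charts = []
    for c in centres:
        d = np.linalg.norm(X - c, axis=1)
        nbrs = X[np.argsort(d)[: max(chart_dim + 1, len(X) // n_charts)]]
        mean = nbrs.mean(axis=0)
        # local PCA: top chart_dim principal directions span the chart
        _, _, Vt = np.linalg.svd(nbrs - mean, full_matrices=False)
        charts.append((mean, Vt[:chart_dim]))
    return charts

def soft_assignments(X, charts, beta=1.0):
    """Soft-assign points to charts via projection (reconstruction) error."""
    errs = []
    for mean, basis in charts:
        proj = (X - mean) @ basis.T @ basis + mean  # project onto chart
        errs.append(np.sum((X - proj) ** 2, axis=1))
    E = np.stack(errs, axis=1)
    # softmax over negative errors (shifted for numerical stability)
    W = np.exp(-beta * (E - E.min(axis=1, keepdims=True)))
    return W / W.sum(axis=1, keepdims=True)

def atlas_kernel(X, Y, charts, beta=1.0):
    """Mercer kernel from soft chart assignments and chart coordinates.

    K(x, y) = sum_c w_c(x) w_c(y) <z_c(x), z_c(y)>, i.e. an inner product
    of the explicit feature map phi_c(x) = w_c(x) z_c(x), hence PSD.
    """
    Wx = soft_assignments(X, charts, beta)
    Wy = soft_assignments(Y, charts, beta)
    K = np.zeros((len(X), len(Y)))
    for c, (mean, basis) in enumerate(charts):
        Zx, Zy = (X - mean) @ basis.T, (Y - mean) @ basis.T
        K += (Wx[:, c:c + 1] * Zx) @ (Wy[:, c:c + 1] * Zy).T
    return K
```

      Because the kernel is an explicit inner product, the resulting Gram matrix can be passed directly to an off-the-shelf SVM that accepts precomputed kernels (e.g. scikit-learn's `SVC(kernel='precomputed')`), with the atlas fitted on all available unlabelled data and the SVM trained only on the labelled subset.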

    • Learning Outcome
    • The huge improvement in classification error, especially with very few training samples, shows how powerful the manifold structure is: our intuition for semi-supervised learning was right!
      • 1/100 of the training data is only 600 labelled points.

    • Results