On Pairwise Clustering with Side Information,  S. Pasteris, F. Vitale, C. Gentile, M. Herbster, ALT 2018.  

Mistake Bounds for Binary Matrix Completion,  M. Herbster, S. Pasteris, M. Pontil, NIPS 2016.  

Online Prediction at the Limit of Zero Temperature,  M. Herbster, S. Pasteris, S. Ghosh, NIPS 2015.  

Predicting a Switching Sequence of Graph Labelings, M. Herbster, S. Pasteris, M. Pontil, Journal of Machine Learning Research pp 2003-2022, September 2015  

Online Similarity Prediction of Networked Data from Known and Unknown Graphs,  C. Gentile, M. Herbster, S. Pasteris, COLT 2013.  [paper]

Online Sum-Product Computation over Trees,  M. Herbster, S. Pasteris, F. Vitale, NIPS 2012.  [paper]

Efficient Prediction for Tree Markov Random Fields in a Streaming Model, M.Herbster, S. Pasteris, F. Vitale, NIPS Workshop on Discrete Optimization in Machine Learning (DISCML) 2011: Uncertainty, Generalization and Feedback.   [paper] 

A Triangle Inequality for p-Resistance , M. Herbster, Networks Across Disciplines: Theory and Applications : Workshop @ NIPS 2010.   [paper] 

Predicting the Labelling of a Graph via Minimum p-Seminorm Interpolation, M. Herbster and G. Lever, COLT 2009.   [paper]  [slides]1

Fast Prediction on a Tree, M. Herbster, M. Pontil, S. Rojas Galeano, NIPS 22, 2008.   [paper]  [slides]

Online Prediction on Large Diameter Graphs, M. Herbster, G. Lever, and M. Pontil, NIPS 22, 2008.   [paper]  [1-slide]

Exploiting cluster-structure to predict the labeling of a graph, M.Herbster, Proceedings of The 19th International Conference on Algorithmic Learning Theory (ALT'08), 2008.   [paper]  [slides]

A Linear Lower Bound for the Perceptron for Input Sets of Constant Cardinality, M. Herbster, Research Note, Dept. of Computer Science, UCL, March 2008. (Updated 18 May 08)

A fast method to predict the labeling of a tree S.R. Galeano, and M.Herbster, Graph Labeling Workshop (ECML-2007), 2007.

Prediction on a graph with a perceptron M. Herbster, and M. Pontil, NIPS 20, 2006.  

Combining graph laplacians for semi--supervised learning A. Argyriou, M. Herbster, and M. Pontil, NIPS 19, 2005.  

Online learning over graphs M. Herbster, M. Pontil, and L. Wainer, Proc. 22nd Int. Conf. Machine Learning (ICML'05), 2005.  [paper]  [slides]

Relative Loss Bounds and Polynomial-time Predictions for the K-LMS-NET Algorithm  M. Herbster, Proceedings of The 15th International Conference on Algorithmic Learning Theory, October 2004.

An online algorithm is given whose hypothesis class is a union of parameterized kernel spaces, for example the set of spaces induced by Gaussian kernel when the width is varied. We give relative loss bounds and a tractable algorithm for specific kernels.

Relative loss bounds for predicting almost as well as any function in a union of Gaussian reproducing kernel spaces with varying widths  M. Herbster, Poster at Mathematical Foundations Learning Theory, June 2004

Tracking the best linear predictor Mark Herbster and Manfred Warmuth, Journal of Machine Learning Research pp 281-309, September 2001.

We extend the results of "Tracking the best expert" (see below) to linear combinations.

Learning additive models online with fast evaluating kernels Mark Herbster, An abstract appeared in Proceedings 14th Annual Conference on Computational Learning Theory pp 444-460, July 2001.

Exponentially many local minima for single neurons Peter Auer, Mark Herbster and Manfred Warmuth, Neural Information Processing Systems 1996
We show that for a single neuron with the logistic function as the transfer function the number of local minima of the error function based on the square loss can grow exponentially in the dimension.
Tracking the best expert [Long Version] Mark Herbster and Manfred Warmuth, Machine Learning, Aug. 1998, vol.32, (no.2):151-78
We generalize the recent worst-case loss bounds for on-line algorithms where the additional loss of the algorithm on the whole sequence of examples over the loss of the best expert is bounded. The generalization allows the sequence to be partitioned into segments and the goal is to bound the additional loss of the algorithm over the sum of the losses of the best experts of each segment. This is to model situations in which the examples change and different experts are best for certain segments of the sequence of examples. In the single expert case the additional loss is proportional to $\log n$, where $n$ is the number of experts and the constant of proportionality depends on the loss function. When the number of segments is at most $k+1$ and the sequence of length $\ell$ then we can bound the additional loss of our algorithm over the best partitioning by $O(k \log n + k \log(\ell/k))$. Note that it takes the same order of bits to denote the sequence of experts and the boundaries of the segments. When the loss per trial is bounded by one then we obtain additional loss bounds that are independent of the length of the sequence. The bound becomes $O(k\log n+ k \log(L/k))$, where $L$ is the loss of the best partition into $k+1$ segments. Our algorithms for tracking the best expert are simple adaptations of Vovk's original algorithm for the single best expert case. These algorithms keep one weight per expert and spend $O(1)$ time per weight in each trial.
RNA Modeling Using Gibbs Sampling and Stochastic Context Free Grammars, Leslie Grate, Mark Herbster, Richard Hughey, David Haussler I. Saira Mian, and Harry Noller, Proceedings of Intelligent Systems in Molecular Biology 1994
A new method of discovering the common secondary structure of a family of homologous RNA sequences using Gibbs sampling and stochastic context-free grammars is proposed. Given an unaligned set of sequences, a Gibbs sampling step simultaneously estimates the secondary structure of each sequence and a set of statistical parameters describing the common secondary structure of the set as a whole. These parameters describe a statistical model of the family. After the Gibbs sampling has produced a crude statistical model for the family, this model is translated into a stochastic context-free grammar, which is then refined by an Expectation Maximization (EM) procedure to produce a more complete model. A prototype implementation of the method is tested on tRNA, pieces of 16S rRNA and on U5 snRNA with good results.

  1. Sparsity in Machine Learning and Statistics Workshop @ Cumberland Lodge, April 1-3, 2009"

maintained by Mark Herbster /