NIPS Workshop: 11 December 2009, Whistler, Canada
Now is the time to revisit some of the fundamental grammar/language learning tasks such as grammar acquisition, language acquisition, language change, and the general problem of automatically inferring generic representations of language structure in a data driven manner.
Though the underlying problems have been known to be computationally intractable for the standard representations of the Chomsky hierarchy, such as regular grammars and context free grammars, progress has been made by modifying or restricting these classes to make them more observable. Generalisations of distributional learning have shown promise in unsupervised learning of linguistic structure using tree based representations, or using non-parametric approaches to inference. More radically, significant advances in this domain have been made by switching to different representations such as the work in Clark, Eyrand & Habrard (2008) that addresses the issue of language acquisition, but has the potential to cross-fertilise a wide range of problems that require data driven representations of language. Such approaches are starting to make inroads into one of the fundamental problems of cognitive science: that of learning complex representations that encode meaning. This adds a further motivation for returning to this topic at this point.
Grammar induction was the subject of an intense study in the early days of Computational Learning Theory, with the theory of query learning largely developing out of this research. More recently the study of new methods of representing language and grammars through complex kernels and probabilistic modelling together with algorithms such as structured output learning has enabled machine learning methods to be applied successfully to a range of language related tasks from simple topic classification through parts of speech tagging to statistical machine translation. These methods typically rely on more fluid structures than those derived from formal grammars and yet are able to compete favourably with classical grammatical approaches that require significant input from domain experts, often in the form of annotated data.