ABSTRACT

Can we make Information Extraction more adaptive?

Prof. Yorick Wilks, Dept. of Computer Science, University of Sheffield

It seems widely agreed that IE (Information Extraction) is now a tested language technology that has reached precision+recall values of around 70%, and much higher at some sub-tasks, which puts it in about the same class as Information Retrieval and Machine Translation, both of which are widely used commercially. There is also a clear range of practical applications that would be eased by the sort of template-style data that IE provides. The problem for wider deployment of the technology is adaptability: the ability to customize IE rapidly to new domains.

The seminar will cover some methods that have been tried to ease this problem, and to create something more rapid than the benchmark 3-month figure, which was roughly what ARPA teams in IE needed to adapt an existing system by hand to a new domain of corpora and templates. An important distinction in discussing the issue is the degree to which a user can be assumed to know what is wanted (at best, having pre-existing templates ready to hand), as opposed to having only a vague idea of what is needed from a corpus.

There will also be coverage of attempts to derive templates directly from corpora, and to derive knowledge structures and lexicons directly from corpora, including the recent LE project ECRAN, which attempted to tune existing lexicons to new corpora. An important issue is how far established methods in IR of tuning to a user's needs with feedback at an interface can be transferred to IE.
