ABSTRACT

Tagging and Parsing Linguistic Corpora for Language and Speech Applications

Alex Chengyu Fang, Department of Phonetics and Linguistics, University College London

Linguistic "corpora" are collections of texts that are organised, and often analysed, according to the requirements of their intended applications. They have been applied extensively in cognitive psychology and in linguistics, e.g. to show which usages and idioms actually occur in the language used in different places (such as the different countries where English is an official language) or by different groups.

Before computer programs can exploit corpora for practical applications, such as support for translation or language teaching, the texts must be preprocessed to add linguistic information: part-of-speech tags, other information derived from parsing, and even semantic information.

The seminar will introduce how this preprocessing can be done automatically, using knowledge extracted from corpora at the lexical, grammatical and syntactic levels, and will summarise the state of the art in such computing. Open research issues and possible future applications of corpora will also be described; the two topics are linked.
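
As a rough illustration of what "knowledge extracted from corpora at the lexical level" can mean, the sketch below builds a most-frequent-tag lexicon from a tiny hand-tagged corpus and uses it to tag new words. This is a generic baseline, not the method presented in the seminar; the corpus, the tag names and the function names `train_lexicon` and `tag` are all invented for the example.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus: sentences of (word, tag) pairs, invented for illustration.
TAGGED_CORPUS = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("dogs", "NOUN"), ("bark", "VERB")],
    [("the", "DET"), ("bark", "NOUN"), ("is", "VERB"), ("rough", "ADJ")],
    [("dogs", "NOUN"), ("bark", "VERB"), ("loudly", "ADV")],
]


def train_lexicon(corpus):
    """Map each word to the tag it carries most often in the tagged corpus."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        for word, tag_label in sentence:
            counts[word.lower()][tag_label] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def tag(words, lexicon, default_tag="NOUN"):
    """Tag each word by lexicon lookup; unknown words get default_tag (an assumption)."""
    return [(word, lexicon.get(word.lower(), default_tag)) for word in words]


lexicon = train_lexicon(TAGGED_CORPUS)
print(tag(["the", "dogs", "bark"], lexicon))
# prints [('the', 'DET'), ('dogs', 'NOUN'), ('bark', 'VERB')]
```

Because this baseline looks at each word in isolation, it would also tag "bark" as a verb in "the bark", where it is a noun. Resolving such cases needs context, which is where knowledge at the grammatical and syntactic levels comes in.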
