Before computer programs can exploit corpora for practical applications, such as support for translation or language teaching, the texts must be preprocessed to add linguistic information: part-of-speech tags, other information derived from parsing, and even semantic information.
The seminar will introduce how this annotation can be done automatically, using knowledge extracted from corpora at the lexical, grammatical and syntactic levels, and will summarise the state of the art in this kind of processing. It will also describe open research issues and possible future applications of corpora, two topics that are linked.