Dealing with text
Nowadays, instead of querying a given database with structured constructs, you often first need to find out where the data in question is stored and then deduce the relevant facts from it.
When searching for documents, you can hardly foresee where and in what form the information will turn up, so you need some flexibility in posing your questions. Mostly you are interested in texts concerning a specific topic, so you can classify all documents in scope according to criteria such as word occurrence. This is the task of information retrieval, text categorization, and data extraction applications.
From a given set of documents, information retrieval strives to find
a subset that corresponds best to the query. In one way or another,
such a program searches all documents for occurrences of the
user-provided keywords and returns the relevant ones.
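To make this concrete, here is a minimal sketch of such a free-text search, assuming a toy in-memory document set and simple whitespace tokenization (both are illustrative assumptions, not a description of any particular system): it scans every document for the user's keywords and returns the matching documents, best first.

    def free_text_search(documents, keywords):
        """Return document ids ranked by how many keywords they contain."""
        hits = []
        for doc_id, text in documents.items():
            words = set(text.lower().split())
            score = sum(1 for kw in keywords if kw.lower() in words)
            if score > 0:
                hits.append((score, doc_id))
        # Best-matching documents first.
        return [doc_id for score, doc_id in sorted(hits, reverse=True)]

    docs = {
        "a": "information retrieval finds relevant documents",
        "b": "text categorization sorts texts into fixed categories",
    }
    print(free_text_search(docs, ["retrieval", "documents"]))  # ['a']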
In the beginning of IR history, documents that existed on paper were stored in machine-readable form and manually indexed. Keeping the index in the computer's memory allows for much larger and more precise indexes, but the indexing itself remains time-consuming and therefore expensive. So it would be preferable to skip indexing altogether and operate not on pre-made indexes but on the full texts themselves. This is the idea behind free-text searching, which soon became popular.
Nevertheless, critics argued that free-text search might turn out not to be as successful as indexed searching. Indexing is not merely taking words from the document and putting them into a structured database; it also involves judging the quality of the selected keywords. Not all words play equally important roles in a text: conjunctions such as "and" and "or" or articles like "a" are so ubiquitous that using them as search keywords would be pointless, since nearly all texts contain them.
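The sketch below illustrates this point about keyword quality: it builds a simple inverted index while filtering out such ubiquitous stop words. The tiny stop-word list and whitespace tokenization are deliberate simplifications for the example.

    from collections import defaultdict

    # A deliberately tiny stop-word list; real systems use far larger ones.
    STOP_WORDS = {"and", "or", "a", "an", "the"}

    def build_index(documents):
        """Map each non-stop word to the set of documents containing it."""
        index = defaultdict(set)
        for doc_id, text in documents.items():
            for word in text.lower().split():
                if word not in STOP_WORDS:
                    index[word].add(doc_id)
        return index

    docs = {"a": "the cat and the dog", "b": "a dog or a fox"}
    index = build_index(docs)
    print(sorted(index["dog"]))   # ['a', 'b']
    print(index.get("the"))       # None: stop words are never indexed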
Eventually it turned out that automatic free-text search performs no worse than manual indexing, and consequently this information retrieval technique became the subject of intensified research.
Such models plainly leave out any syntactic or semantic aspects; they work "almost entirely at word level". The artificial intelligence community tried to prove that analyzing the text with more sophisticated natural language processing methods can give better results.
In the news service market, several providers perform automatic text categorization ("sorting texts into fixed categories"). This task traditionally involved a large amount of human labor; NLP systems can reliably take it over, minimizing both costs and inconsistencies while maximizing speed.
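As a word-level illustration of such categorization, the sketch below assigns a text to whichever category shares the most keywords with it; the categories and their keyword lists are invented for this example.

    # Hypothetical keyword lists per category, invented for illustration.
    CATEGORIES = {
        "sports": {"match", "team", "goal", "season"},
        "finance": {"stock", "market", "shares", "profit"},
    }

    def categorize(text):
        """Assign the text to the category with the largest keyword overlap."""
        words = set(text.lower().split())
        return max(CATEGORIES, key=lambda cat: len(CATEGORIES[cat] & words))

    print(categorize("the team scored a late goal"))        # sports
    print(categorize("shares rallied as the market rose"))  # finance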
A similar task is data extraction. Here, you are given a text; from it you try to extract whatever information it contains and store that in a database for later querying.
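A minimal sketch of such an extractor, assuming the information of interest is dates and e-mail addresses (the patterns and record layout are assumptions made for the example): it pulls matches out of the text with regular expressions and stores them as a record that can be queried later.

    import re

    # Simple illustrative patterns; real extractors are far more elaborate.
    DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
    EMAIL_RE = re.compile(r"\b[\w.%+-]+@[\w.-]+\.[a-zA-Z]{2,}\b")

    def extract(text):
        """Pull structured fields out of free text for later querying."""
        return {
            "dates": DATE_RE.findall(text),
            "emails": EMAIL_RE.findall(text),
        }

    record = extract("Meet on 2024-05-01; write to alice@example.org.")
    print(record["emails"])  # ['alice@example.org']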