Fishing for information using WordNet

 

 

Finding and organizing information systematically for knowledge management can be overwhelmingly challenging. It is perhaps on a par with trying to catch a fish with your bare hands, or trying to sculpt jelly with a chisel!

 

Much of the knowledge management approach relies on knowledge being in textual form at some stage. Much information is already in textual form, and many organizations are implementing processes to capture more of their knowledge as text. One of the key goals of technology for knowledge management is therefore to better harness this textual information.

 

Information management is clearly relevant here: the vast experience in using lexical resources such as thesauri is readily applicable. However, knowledge management raises wider problems than those addressed by information management, since it covers much more diverse types of information and much greater volumes of it. Consider, for example, trying to glean information on the state of internet retailing by trawling the web. The problems of finding and organizing relevant information are therefore much more complex, and handling word sense is a significant factor in these problems (see Box 1).

 

One route to getting on top of this wider-ranging information is to use the more sophisticated lexical resources that are emerging from research in artificial intelligence. Of particular relevance is the technology of semantic networks (see Box 2). These have been incorporated into a variety of software systems, including machine translation tools, information retrieval systems and web search engines. The best way of structuring semantic networks is still the subject of much research, but a highly influential example is WordNet®, not least because of its size.

 

WordNet is an online lexical reference system. It is based on a massive semantic network containing over 90,000 word senses. Its design is inspired by current psycholinguistic theories of human lexical memory, and it has been developed over the last ten years by the Cognitive Science Laboratory at Princeton University, led by George Miller. The basic building block is the synset, which is essentially a context-sensitive grouping of synonyms (for examples, see Box 3). Synsets are linked by various types of relationship (for examples, see Box 4).

 

Whilst WordNet has been a vehicle for much research in cognitive science, the system is intended to be used by anyone as a general-purpose reference source on words. It is freely available for consultation over the web (www.cogsci.princeton.edu/~wn/) and can be downloaded for installation on your own machine. Furthermore, it can be used, copied, modified and distributed as part of commercial software, and its lexical knowledge can be extended and adapted for particular applications.

 

Given the value of WordNet as a resource, a number of projects have sprung up that attempt to exploit its semantic network. Information retrieval is an obvious application area. The WordNet group has been working on mapping the WordNet noun hierarchies to hierarchies of subject headings such as the Library of Congress Subject Headings (LCSH). In particular, they have developed heuristics to map the Thesaurus for Graphic Materials I (TGM I) to WordNet. Other groups have been trialing its use in machine translation and in web search engines.

 

It is likely that general-purpose semantic networks of this magnitude will be incorporated into products in the relatively near future. Most likely, they will be adapted into semantic networks for particular communities, such as the pharmaceutical industry or the legal profession.

 


Anthony Hunter is a lecturer in computer science at University College London and can be contacted on a.hunter@cs.ucl.ac.uk.


Box 1: What is a word sense?

Most words are polysemous (i.e. most words have multiple meanings). Consider a dictionary entry for the word "bank":

 

Bank (noun): (1) raised shelf of ground, slope; (2) ground at the edge of a river; (3) mass of clouds; (4) establishment for the custody of money; (5) money held by the keeper of a gaming table.

 

Each of the numbered statements gives a different meaning, or word sense, of the word. One difficulty with this breakdown is that we may wish to subdivide a definition. For example, sense (4) could be divided into (4a) a building you go to in order to cash a cheque, and (4b) a company that holds your savings. Sense (4) is then both a type of "company" and a type of "building". Another difficulty is that the meaning of a word partly depends on the overlaps between its different senses. For example, senses (1) and (2) overlap and intuitively reinforce each other. Finding the optimal breakdown for a definition is difficult, but if it is done appropriately it can be an important resource for finding and organizing information.
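
One way to see why the breakdown into senses matters for finding information is to try choosing a sense automatically. The sketch below uses gloss overlap, a simplified form of the Lesk algorithm (a technique this article does not itself describe): it picks the sense whose definition shares the most words with the surrounding text. The glosses paraphrase the entry above, with sense (4) split into (4a) and (4b); the stop-word list and the names `content_words` and `pick_sense` are hypothetical.

```python
from __future__ import annotations

import re

# Senses of "bank", paraphrasing the dictionary entry above, with sense (4)
# split into (4a) and (4b). Keys are the sense labels used in the text.
BANK_SENSES = {
    "1": "raised shelf of ground or slope",
    "2": "ground at the edge of a river",
    "3": "mass of clouds",
    "4a": "building where you cash a cheque",
    "4b": "company that holds your savings",
    "5": "money held by the keeper of a gaming table",
}

# Hypothetical stop-word list: common words that should not count as overlap.
STOP_WORDS = {"a", "an", "the", "of", "at", "to", "in", "by", "or",
              "that", "where", "you", "your", "i", "we"}


def content_words(text: str) -> set[str]:
    """Lower-case the text, split it into alphabetic words, drop stop words."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS}


def pick_sense(context: str, senses: dict[str, str]) -> str | None:
    """Simplified Lesk: return the label of the sense whose gloss shares the
    most content words with the context, or None if no gloss overlaps."""
    context_words = content_words(context)
    best_label, best_score = None, 0
    for label, gloss in senses.items():
        score = len(context_words & content_words(gloss))
        if score > best_score:  # strict '>' keeps the earlier sense on a tie
            best_label, best_score = label, score
    return best_label


print(pick_sense("We walked along the river to the edge of the water.", BANK_SENSES))  # 2
print(pick_sense("I want to cash a cheque.", BANK_SENSES))  # 4a
```

The overlap between senses (1) and (2) noted above shows up here too: a context such as "soft ground" scores equally for both, and this sketch simply keeps the first.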


Box 2: What is a semantic network?

A semantic network is a way of representing relationships between concepts. Often each concept is represented by a word or a set of words. A simple example is a hierarchical network in which the concepts are taxonomic terms from biology and the only type of relationship is type-of:

[Figure: type-of hierarchy of taxonomic terms from biology]
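
A type-of hierarchy like this can be held as a mapping from each term to its parent. The minimal sketch below assumes a strict tree (one parent per term); the biological terms and the names `TYPE_OF`, `ancestors` and `is_type_of` are illustrative assumptions, not taken from the original figure.

```python
from __future__ import annotations

# Hypothetical type-of hierarchy. The terms are illustrative assumptions;
# each term maps to its single, more general parent term.
TYPE_OF = {
    "animal": "organism",
    "plant": "organism",
    "mammal": "animal",
    "bird": "animal",
    "dog": "mammal",
    "whale": "mammal",
    "robin": "bird",
    "oak": "plant",
}


def ancestors(term: str) -> list[str]:
    """Follow type-of links upwards, from the most specific to the most general."""
    chain = []
    while term in TYPE_OF:
        term = TYPE_OF[term]
        chain.append(term)
    return chain


def is_type_of(x: str, y: str) -> bool:
    """True if x is a type of y, directly or through intermediate terms."""
    return y in ancestors(x)


print(ancestors("dog"))                # ['mammal', 'animal', 'organism']
print(is_type_of("robin", "animal"))   # True
print(is_type_of("whale", "bird"))     # False
```

Because type-of is transitive, a query about "mammal" could in principle be widened to cover dogs and whales without either document mentioning the word "mammal".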

More complex semantic networks may include a variety of types of relationship, such as hardness, temp, made-of, texture and colour:

[Figure: semantic network with hardness, temp, made-of, texture and colour relationships]
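
With several relationship types, each link needs a label. A common way to sketch this is as (concept, relationship, value) triples. In the minimal example below, the relationship names follow the text, but the concepts, values and the functions `values` and `concepts_with` are hypothetical.

```python
from __future__ import annotations

# Hypothetical labelled edges: (concept, relationship, value). The relationship
# names follow the text; the concepts and values are illustrative assumptions.
EDGES = [
    ("ice", "made-of", "water"),
    ("ice", "temp", "cold"),
    ("ice", "hardness", "hard"),
    ("ice", "colour", "clear"),
    ("snow", "made-of", "water"),
    ("snow", "temp", "cold"),
    ("snow", "texture", "powdery"),
    ("snow", "colour", "white"),
    ("steam", "made-of", "water"),
    ("steam", "temp", "hot"),
]


def values(concept: str, relationship: str) -> set[str]:
    """Values linked to a concept by one type of relationship (empty if none)."""
    return {v for c, r, v in EDGES if c == concept and r == relationship}


def concepts_with(relationship: str, value: str) -> set[str]:
    """Concepts linked to the given value by the given type of relationship."""
    return {c for c, r, v in EDGES if r == relationship and v == value}


print(values("snow", "colour"))               # {'white'}
print(sorted(concepts_with("temp", "cold")))  # ['ice', 'snow']
```

Because every edge carries a label, a query has to say which relationship to follow: asking which concepts are cold returns ice and snow, while asking what is made of water returns all three.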

Box 3: What is a synset?

A set of words that can be regarded as strict synonyms (i.e. the words can be interchanged in a sentence without changing its meaning) is called a synset. An example of a synset is the following:

 

{Molotov cocktail, petrol bomb, gasoline bomb}

 

Words that are in the same synset are not necessarily synonyms in all contexts. For example, "grease" and "lard" are synonyms in the context of cooking, but not necessarily in other contexts. For a given word, it can be useful to locate the different synsets that contain the word and to find related synsets. As an example, there are 8 synsets containing the word "board". We list four of them below.

 

SYNSET 1: {board, mess, ration}

SYNSET 2: {board, plank}

SYNSET 3: {control panel, display panel, panel, board}

SYNSET 4: {board, diving board}

 

An important and useful form of relationship is hyponymy. A word X is a hyponym of Y if X is a "type of" Y. Continuing the example above, we list a few of the hyponymic relations involving the synsets above.

 

SYNSET 1 is a hyponym of {food, nutrient}

SYNSET 2 is a hyponym of {lumber, timber}

SYNSET 3 is a hyponym of {electrical device}

SYNSET 4 is a hyponym of {springboard}
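
The "board" synsets and their hyponym links can be modelled with ordinary sets and dictionaries. This minimal sketch is not how WordNet stores its data, and the names `senses_of`, `broader_terms`, `sense_in_category` and `substitutes` are hypothetical. It shows how hyponymy can tell the senses of "board" apart by their broader category, and how the rest of a synset gives substitutes for the word in that sense.

```python
from __future__ import annotations

# The four "board" synsets listed above, stored as frozensets so they can be
# dictionary keys. WordNet has eight "board" synsets; only these four are modelled.
MESS = frozenset({"board", "mess", "ration"})
PLANK = frozenset({"board", "plank"})
PANEL = frozenset({"control panel", "display panel", "panel", "board"})
DIVING = frozenset({"board", "diving board"})

# Hyponymy: each synset maps to the synset it is a "type of" (its hypernym).
HYPERNYM = {
    MESS: frozenset({"food", "nutrient"}),
    PLANK: frozenset({"lumber", "timber"}),
    PANEL: frozenset({"electrical device"}),
    DIVING: frozenset({"springboard"}),
}


def senses_of(word: str) -> list[frozenset[str]]:
    """Every modelled synset containing the word: one per sense."""
    return [synset for synset in HYPERNYM if word in synset]


def broader_terms(word: str) -> set[str]:
    """Words from the hypernyms of all senses of the word."""
    return set().union(*(HYPERNYM[s] for s in senses_of(word)))


def sense_in_category(word: str, category: str) -> frozenset[str] | None:
    """The sense of the word whose hypernym contains `category`, or None."""
    for synset in senses_of(word):
        if category in HYPERNYM[synset]:
            return synset
    return None


def substitutes(word: str, category: str) -> set[str]:
    """Words that could replace `word` in the sense belonging to `category`."""
    synset = sense_in_category(word, category)
    return set(synset - {word}) if synset else set()


print(len(senses_of("board")))               # 4
print(substitutes("board", "timber"))        # {'plank'}
print(substitutes("board", "springboard"))   # {'diving board'}
```

For example, a search about timber could use this to prefer the plank sense of "board" over the food, panel or diving-board senses.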


Box 4: Some further relationships in WordNet

Some of the relationships in WordNet hold between synsets. Some of these relationships are listed below, each with an example.

 

HYPERNYM: {oak} -> {tree}

HAS-MEMBER: {family, family unit} -> {child, kid}

HAS-STUFF: {tank, army tank} -> {steel}

ENTAIL: {snore, saw wood} -> {sleep, slumber}

CAUSE-TO: {develop} -> {grow, become larger}

ATTRIBUTE: {hypocritical} -> {insincerity}

Other relationships in WordNet hold between individual words. Some examples are:

 

PERTAINYM: academic -> academia

ANTONYM: presence -> absence

SIMILAR-TO: abridge -> shorten

SEE-ALSO: touch -> touch down
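
To illustrate the difference between the two kinds of relationship, the sketch below stores synset relations and word relations separately, using the examples listed above. The tuple layout and the `related` function are hypothetical. A synset relation is taken to apply to every word in its source synset and to reach every word in its target synset, whereas a word relation applies only to the exact word it was recorded for.

```python
from __future__ import annotations

# Relations between synsets: (relation, source synset, target synset).
SYNSET_RELATIONS = [
    ("HYPERNYM", frozenset({"oak"}), frozenset({"tree"})),
    ("HAS-MEMBER", frozenset({"family", "family unit"}), frozenset({"child", "kid"})),
    ("HAS-STUFF", frozenset({"tank", "army tank"}), frozenset({"steel"})),
    ("ENTAIL", frozenset({"snore", "saw wood"}), frozenset({"sleep", "slumber"})),
    ("CAUSE-TO", frozenset({"develop"}), frozenset({"grow", "become larger"})),
    ("ATTRIBUTE", frozenset({"hypocritical"}), frozenset({"insincerity"})),
]

# Relations between individual words: (relation, source word, target word).
WORD_RELATIONS = [
    ("PERTAINYM", "academic", "academia"),
    ("ANTONYM", "presence", "absence"),
    ("SIMILAR-TO", "abridge", "shorten"),
    ("SEE-ALSO", "touch", "touch down"),
]


def related(word: str) -> list[tuple[str, str]]:
    """(relation, target word) pairs reachable from a word. Links are directed,
    so a word that only appears as a target gets no results."""
    out = []
    for rel, source, target in SYNSET_RELATIONS:
        if word in source:  # any member of the source synset inherits the link
            out.extend((rel, w) for w in sorted(target))
    for rel, source, target in WORD_RELATIONS:
        if word == source:  # only the recorded word form has the link
            out.append((rel, target))
    return out


print(related("saw wood"))  # [('ENTAIL', 'sleep'), ('ENTAIL', 'slumber')]
print(related("presence"))  # [('ANTONYM', 'absence')]
```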