Taxonomies in knowledge management

Taxonomies

Setting up a knowledge management system across an organization may sometimes seem like an impossible goal. Trying to systematically organize knowledge, whether documented or tacit, within a company calls for structuring of information that seems to cover anything that the company can potentially touch upon. This in turn calls for extensive meta-knowledge about the organization.

A surprisingly simple yet effective approach to meeting this need for meta-knowledge is the use of organizational taxonomies. See Box 1 for "What is a taxonomy?" Taxonomies have been used in many fields for a long time. For example, in botany, taxonomies are used for plant classification. Here the complexity may lie in the sheer number of entities that need to be considered rather than in the need for sophisticated constructs. There are many species of plant, but the ontology is based on a relatively small number of groupings such as phylum, class, family, etc., with "type of" used as the relationship.

Taxonomies are the basis of classification schemes and indexing systems in information management, such as the Dewey Decimal System. Taxonomies are even more widespread than this, with applications including post codes (zip codes) used by postal services, and job categories used by tax collection agencies. With the advent of the internet, there has been increased interest in using taxonomies for structuring information for easier management and retrieval.

One of the first big ebusiness organizations to harness taxonomies was Yahoo (www.yahoo.com). To help users navigate the web, they developed a broad and deep structuring of topics covered on the web. Starting from a general topic, users can navigate to desired topics of interest at an appropriate granularity. Whilst this is a large taxonomy, it is not sophisticated in terms of the underlying formalisation. Yet it is an approach that is being pushed further by organizations such as Wordmap (www.wordmap.co.uk), who have added some context-sensitive disambiguation of search terms.

To illustrate the taxonomy used in Wordmap, typing in the search term Lotus will result in fragments of the taxonomy being returned that end in Lotus. Some of these fragments include Shopping > Vehicles > Cars > Lotus, Computers > Software > Groupware > Lotus Notes, Society > Religion > Yoga > Postures > Lotus position, and Regional > North America > United States > Regions > California > Localities > Lotus. Each of these branches therefore ends in a different interpretation of the search term. Clicking on one of the leaves will take you to a set of corresponding web pages.
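The kind of lookup described above can be sketched in a few lines of Python. This is only an illustration, not Wordmap's actual method: the nested-dictionary representation, the function name find_paths, and the substring matching rule are all assumptions made for the example.

```python
# Hypothetical fragment of a topic taxonomy: each key is a category,
# each value holds its child categories (an empty dict marks a leaf).
TAXONOMY = {
    "Shopping": {"Vehicles": {"Cars": {"Lotus": {}}}},
    "Computers": {"Software": {"Groupware": {"Lotus Notes": {}}}},
    "Society": {"Religion": {"Yoga": {"Postures": {"Lotus position": {}}}}},
}


def find_paths(tree, term, prefix=()):
    """Return every path whose final category name contains the search term."""
    matches = []
    for name, children in tree.items():
        path = prefix + (name,)
        if term.lower() in name.lower():
            matches.append(" > ".join(path))
        # Keep descending, since deeper categories may also match.
        matches.extend(find_paths(children, term, path))
    return matches


for path in find_paths(TAXONOMY, "Lotus"):
    print(path)
```

Each printed path ends in a different interpretation of the term, which is what lets the user disambiguate the search.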

Within ebusiness, taxonomies are implicitly or explicitly the subject of much deliberation at the development stage of more complex websites. Any site that gives users a number of routes through the site to a particular destination should be managed consistently so that updates take account of inter-dependencies. A common ontology, based on a taxonomy, can help both the content managers and the users to be clear about the inter-dependencies between terms.

Consider a navigation path in an online catalogue that starts at computers, goes to multimedia computers, and ends at multimedia computers at home. Now consider another path that starts at home electronics, goes to home entertainment, and ends at home multimedia computers. These two end points probably refer to the same item. If so, it would be desirable for the same term to be used in both cases. By adopting a common ontology, the use of different but equivalent terms for the same item can be avoided.

Structural problems with content can also be addressed with taxonomies. An error that can easily arise is for a cycle to occur in a hierarchy. So for example, we could have a path that says Product A is a type of Product B, Product B is a type of Product C, and Product C is a type of Product A. Whilst this violates the definition of a hierarchy, it is difficult to spot.
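A cycle of this kind is hard to spot by eye but easy to detect mechanically. The following is a minimal sketch, assuming that the "type of" links are held in a dictionary mapping each category to its single parent; the name find_cycle and the data layout are illustrative assumptions.

```python
def find_cycle(type_of):
    """Return a list of categories forming a cycle, or None if there is none.

    `type_of` maps each category to the category it is a type of.
    """
    for start in type_of:
        seen = [start]
        current = type_of.get(start)
        # Follow the "type of" links upwards until they run out or repeat.
        while current is not None:
            if current in seen:
                return seen[seen.index(current):] + [current]
            seen.append(current)
            current = type_of.get(current)
    return None


links = {
    "Product A": "Product B",
    "Product B": "Product C",
    "Product C": "Product A",
}
print(find_cycle(links))
```

Running such a check whenever the catalogue is updated would flag the offending links before they reach users.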

A third kind of problem involves deciding on an appropriate set of constructs. Consider an online catalogue for a motor manufacturer such as Ford. We can start with a set of vehicles and then identify subsets for the different brands (Ford, Volvo, Land Rover, etc.) and, within each of these, subsets for models such as Mondeo, Range Rover, etc. But now, suppose we want to capture parts. Some parts only work for certain models built in certain periods. What is the most efficient way of capturing this? Now consider accessories. If we have a customer who is navigating down a path to Range Rover models, how do we capture the relationship with Land Rover Accessories, or even Land Rover Branded Clothes?

A number of software suppliers have developed products to support the creation and management of taxonomic catalogues for B2B ecommerce. A leading specialist supplier of catalogue content management software is Requisite Technology (www.requisite.com) based in Colorado in the US. Products include eMerge, which allows organizations to construct an online catalogue for procurement. Suppliers load product information into eMerge, and this information is organized into a consistent structure and staged for review and approval before loading into an eProcurement catalogue. Searching an online catalogue is then via text searching, including key word searching, or tree searching. In tree searching, the user navigates through the taxonomy of categories of items. Organizations using this software include Reuters and Delta Airlines. Recently various net marketplaces have also adopted it. These include SciQuest, PlasticsNet.com, and Petrocosm.

Within knowledge management, the role of taxonomies can be pushed even wider. A taxonomy provides a perspective on an organization. Each taxonomy breaks the organization down in some way, and the range of possibilities includes: types of revenue stream, types of services offered by the organization, types of knowledge experts offered by the organization, types of customers, and types of services bought in. Each of these taxonomies can be illuminating for the participants involved in their construction, and they constitute valuable transferable knowledge that can support decision making. They may also lead to creative improvements in the organization.

One of the things to note in the examples of taxonomies for an organization is the use of the phrase "type of". This gives an important handle on ways of constructing taxonomies. We focus on this in Box 2 on "Constructing hierarchies". Whilst there are no hard and fast rules for constructing taxonomies, they draw easily on established knowledge about an organization, including its products, people, or customers.

In the short term, the two key applications of taxonomies in knowledge management are likely to be in helping users navigate to web-based resources such as web pages and PDF files on a knowledge management intranet, and in the construction of a taxonomic breakdown of experts in an organization. Users navigate these taxonomies to find the information or experts that they require. There are clearly pros and cons of using tree searching, but it can be combined with key-word search to offer a hybrid approach. One of the key advantages of tree searching is that users can browse more easily than with key-word searching. And of course having users browsing information can be an ideal form of knowledge dissemination in an organization.

Anthony Hunter is a lecturer in computer science at University College London. He can be contacted at: a.hunter@cs.ucl.ac.uk

Box 1: What is a taxonomy?

A taxonomy is a classification system. Normally, the aim of a taxonomy is to group things according to similarities in some respect such as similarities in structure, role, behaviour, etc. As the Greek root "taxis" implies, it is about putting things in order.

Taxonomies have played a particularly profound role in biology, where for a long time much progress was made in understanding the natural world during the course of developing a taxonomy for all living things. So for example, animals are grouped into sets including mammals, reptiles, felines, and domestic cats. Notice also how these sets can be related by the subset relation. So mammal is a subset of animal, feline is a subset of mammal, and domestic cat is a subset of feline, whereas, for example, mammal and reptile are not related in this way.

In the simplest case a taxonomy is represented by a tree. This is a set of nodes and a set of connections between the nodes such that for any pair of nodes, there is a unique path (sequence of connections) that connects them. A tree is like a simplistic sketch of a real tree - though normally it is drawn upside down. The node at the top is called the root, and the nodes at the bottom are called leaves. A simple example is given below.

Any path from the root to a leaf is called a branch. So in the example taxonomy below, animals is the root, and the leaves include tigers, domestic cats, bovines, and snakes. Each node except the root has a parent node, e.g. mammals is the parent of felines, and each node except the leaf nodes has one or more children, e.g. mammals has felines and bovines as children.
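The terminology of root, leaf, parent, child, and branch can be made concrete with a small Python sketch of the animal taxonomy described here. The child-to-parent dictionary is just one possible representation, and placing snakes under a reptiles node is an assumption, since the text only names snakes as a leaf.

```python
# Child -> parent links. Putting snakes under reptiles is an assumption.
PARENT = {
    "mammals": "animals",
    "reptiles": "animals",
    "felines": "mammals",
    "bovines": "mammals",
    "tigers": "felines",
    "domestic cats": "felines",
    "snakes": "reptiles",
}


def children(node):
    """Return the nodes whose parent is `node`, in alphabetical order."""
    return sorted(child for child, parent in PARENT.items() if parent == node)


def root():
    """The root is the only node that has no parent of its own."""
    (top,) = set(PARENT.values()) - set(PARENT)
    return top


def leaves():
    """Leaves are the nodes that are never anyone's parent."""
    return sorted(set(PARENT) - set(PARENT.values()))


def branch(leaf):
    """Return the path from the root down to the given leaf."""
    path = [leaf]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return list(reversed(path))


print(root())
print(leaves())
print(children("mammals"))
print(" > ".join(branch("tigers")))
```

Because every node has exactly one parent, following the parent links from any leaf always leads back to the root along a single branch.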

Clearly simple set theory provides a formal basis for developing and using a taxonomy. The set operations of union, intersection, and complement, allow us to manipulate the groupings directly and completely. So the root node is the set of all things that we are interested in. Then each child is a subset of the parent. Normally, children are disjoint sets (i.e. the intersection of each pair of children is empty).
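These set relationships map directly onto the built-in set type in Python. The sketch below uses a made-up handful of animals purely for illustration.

```python
# Illustrative groupings; the members are invented for the example.
animals = {"tiger", "domestic cat", "cow", "snake"}
mammals = {"tiger", "domestic cat", "cow"}
reptiles = {"snake"}

print(mammals <= animals)             # True: a child is a subset of its parent
print(mammals.isdisjoint(reptiles))   # True: the two children share no members
print(mammals | reptiles == animals)  # True: here the children cover the parent
print(animals - mammals)              # {'snake'}: complement of mammals in animals
```

Note that the children covering the whole of the parent is a property of this particular example rather than a requirement stated above.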

Box 2: Constructing taxonomies.

Constructing a taxonomy involves finding an appropriate breakdown. We start with the most general category, which will be the root of the tree. Then we need to find the subcategories for this. For any category, each subcategory is a taxonym. For this, we need a taxonym test: X is a taxonym of Y if X is a type or kind of Y. For example, a labrador is a taxonym of dog.

Unfortunately, the taxonym test is not fool-proof, since "kind of" can give us false positives. It can be incorrect when "kind of" is used in the sense of drawing an analogy. Consider, for example, an abacus is a kind of computer. Further examples of "kind of" relationships that might be problematical include a kitten is a kind of cat, a queen is a kind of monarch, and a waiter is a type of man.

A way of flagging problem cases of "X is a type of Y" is to consider whether X would be an appropriate answer to the question "what types of Y are there?". For example, if we ask what types of cat there are, normally we wouldn't include kittens in the reply. Of course there are situations where we might, such as if we were considering the stock of a pet shop.

Essentially, creating a taxonomy involves splitting a set into subsets, and repeating the process on the subsets by recursion. The criteria used to choose appropriate splits depend on the application. For example, criteria for splitting customers into a taxonomy for marketing could be based on geography, types of product bought, or budgets, whereas criteria for splitting a product catalogue for use by customers should be based on the categories and subcategories of product that the customer is likely to be familiar with.
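The recursive splitting can be sketched as follows. This is a minimal illustration under stated assumptions: each item is a dictionary of attributes, the splitting criteria are given as an ordered list of attribute names, and the customer records are invented.

```python
def build_taxonomy(items, criteria):
    """Split `items` by the first criterion, then recurse on each subset."""
    if not criteria:
        # No criteria left: this subset is a leaf, so list its members.
        return sorted(item["name"] for item in items)
    first, rest = criteria[0], criteria[1:]
    groups = {}
    for item in items:
        groups.setdefault(item[first], []).append(item)
    return {label: build_taxonomy(group, rest) for label, group in groups.items()}


# Hypothetical customer records for a marketing taxonomy.
customers = [
    {"name": "Acme", "region": "Europe", "budget": "large"},
    {"name": "Bolt", "region": "Europe", "budget": "small"},
    {"name": "Cargo", "region": "Asia", "budget": "large"},
]

tree = build_taxonomy(customers, ["region", "budget"])
print(tree)
```

Changing the order of the criteria, for example to budget first and then region, gives a different taxonomy over the same customers, which is why the choice of split depends on the application.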

Splitting sets into appropriate subsets takes care so that the criteria for division are systematic. For example, the division of animals into sheep and horses is a different sort of division from that of sheep into ewes and rams. The division used should therefore be consistent with the expectation of the users, otherwise it is hard for users to navigate intuitively.

Normally, each set needs to be split into disjoint sets otherwise there can be confusions over which branch to take. For example, in a catalogue, if the set books is split into the sets novels and paperbacks, then both branches might need to be traversed to ascertain whether or not some particular book is listed in the catalogue. Though, in some situations it is necessary to drop this condition and allow some elements to appear in multiple branches.
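A catalogue can be checked for overlapping sibling categories automatically. The sketch below assumes each subcategory is held as a Python set of item names; the function name overlapping_siblings and the book titles are illustrative.

```python
from itertools import combinations


def overlapping_siblings(subsets):
    """Return (name_a, name_b, shared_items) for each pair of subsets that overlap."""
    overlaps = []
    for (name_a, set_a), (name_b, set_b) in combinations(subsets.items(), 2):
        shared = set_a & set_b
        if shared:
            overlaps.append((name_a, name_b, shared))
    return overlaps


# Hypothetical split of "books" that mixes two criteria (genre and binding).
books = {
    "novels": {"Emma", "Dracula"},
    "paperbacks": {"Dracula", "Road Atlas"},
}

print(overlapping_siblings(books))
```

An empty result means the split is disjoint; a non-empty one shows exactly which items a user might have to look for under more than one branch.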

The choice of categories also needs consideration. Normally, in taxonomies, each category is chosen so that there is the highest possible degree of resemblance between members of the category, though this may need to be offset against having the maximum distinctiveness from members of other categories. Often identifying a prototypical example of a member of a category is a helpful tool. A prototypical example has all the attributes that one would expect of a member of the category, and normally is an example that can be easily articulated. For birds, we could use blackbirds, but not penguins. Prototypes are important when there are multiple participants involved in developing a taxonomy. The prototypical example does not even have to be a member of the set - it is just a tool to help conceptualize the category. Prototypes are also helpful in explaining a taxonomy to users.