Question Answering

Why? Why? Why? It is a common plea from toddlers. It can be exasperating to hear, and yet we know that at that age it is critically important to try one's best to co-operate by providing answers. Yet providing the right answer can be difficult. Toddlers can ask questions ranging from the remarkable, such as "why is the sky blue?" or "why don't dogs talk?", to the quite mundane.

I was reminded of the mundane kind when walking through a park last week. A van pulled up along the path, and a toddler who was walking with her mother in front of me wanted to know what the van was doing. The mother replied that she didn't know. The child then asked exactly the same question, and the mother gave exactly the same reply. This cycle was repeated a few times before, I am glad to say, they moved out of earshot.

Irrespective of how interesting a kid's question may be, we do need to consider what is really required as an answer. What would the questioner understand? Do they really need to know? Can we just tell them that we don't know? And of course, the same issues arise with adults asking questions.

Questions and answers offer an important, and often quick, way of transmitting information. I see this with teaching. If a student misses a one-hour lecture, perhaps because of a medical appointment, they may come to see me to find out what they missed. It sometimes surprises me that I can compress a one-hour lecture into a 10-minute session, with the student also asking questions, and at the end of the 10 minutes the student seems to be no worse off, and perhaps better off, than colleagues who went to the lecture. Of course, there are a number of issues here: one-to-one teaching puts more pressure on the student to participate in the teaching process, and asking questions is a form of active learning, whereas lectures are largely passive learning. However, amongst all these issues is the fact that asking questions is a useful way to acquire information.

This fact has long been recognised in library and information services, where information professionals are very important in helping users locate information. Via a process of questions and answers by both parties, the information required can be ascertained. Automation of this role is very limited because it is not possible to replicate the intelligence and knowledge that an information professional brings to it. However, there are ways that software can be harnessed to address some of the tasks that an information professional may undertake in question answering.

Perhaps the simplest kind of question answering capability comes with FAQs (Frequently Asked Questions). Whilst these can quite commonly be seen in paper-based documentation, they really came to prominence with the internet. There is even a website dedicated to the topic (www.faqs.org). Often a set of FAQs was collated by the organisers of an internet newsgroup as an efficiency and quality measure, in response to numerous users asking the same questions. These FAQs then became the first place users would look to solve problems. And since the answers were being reused, more effort was put into making them better.

With more people being familiar with the idea and utility of FAQs, they are becoming increasingly common in many areas beyond newsgroups. Much documentation now includes FAQs, with application in areas as diverse as IT, financial products, and administration processes. FAQs offer a more accessible way of locating information than, say, searching documentation using a contents list, index, or section headings. A FAQ gives a much clearer indication of the information being imparted at that point in the documentation. Perhaps more importantly, the writer of the documentation is assuming that the reader is not reading the documentation linearly from start to finish, and so each answer must be as self-contained as possible. This may, however, call for a hierarchical structure to relate the different FAQs. So for example, if a reader is looking at a question on a very specific topic, then it is assumed that the reader has also looked at the more general questions first.

In the unregulated, decentralised way of the internet, vast archives of FAQs have built up. For the users of these archives, they are an interesting and useful resource. However, if the idea of creating and using large sets of FAQs within organisations is to be appealing, then there is a need for tools for managing the FAQs.

One company that is trying to address this need is Orbital Software with its Organik system (www.orbitalsw.com). The approach is to build communities within an organisation of people who are interested in a particular topic. Some members of a community may be experts in the topic, and be in a position to provide answers. Other members of a community may normally be consumers of this expertise. When a question is posted, the system first looks in the knowledge base of previously asked questions to see if an answer can be obtained there. If not, then the question is directed to the most appropriate expert. To facilitate this, experts provide a resume of their knowledge and expertise to the system.
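To make the two-step flow concrete, here is a minimal sketch in Python. It is not how Organik is implemented, which I have not seen. The function names, the word-overlap similarity measure, the 0.5 threshold, and the sample data are all my own assumptions, chosen only to illustrate "check the knowledge base first, otherwise route to an expert".

```python
import re

def tokens(text):
    """Lowercase word set; a crude stand-in for real language analysis."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def overlap(a, b):
    """Jaccard similarity between two token sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def route_question(question, knowledge_base, experts, threshold=0.5):
    """Return ("answer", text) from the knowledge base, else ("expert", name).

    knowledge_base maps past questions to answers; experts maps names to
    free-text resumes. Both structures are hypothetical.
    """
    q = tokens(question)
    # Step 1: look for a sufficiently similar previously asked question.
    best = max(knowledge_base, key=lambda past: overlap(q, tokens(past)),
               default=None)
    if best is not None and overlap(q, tokens(best)) >= threshold:
        return ("answer", knowledge_base[best])
    # Step 2: otherwise choose the expert whose resume shares the most words.
    expert = max(experts, key=lambda name: len(q & tokens(experts[name])),
                 default=None)
    return ("expert", expert)

# Hypothetical sample data.
kb = {"How do I reset my email password?": "Use the self-service portal."}
staff = {
    "alice": "email accounts passwords directory",
    "bob": "printers networking cabling",
}

known = route_question("How can I reset my email password?", kb, staff)
new = route_question("Why are the printers offline?", kb, staff)
```

Here the first question is close enough to a stored one to be answered directly, while the second is passed to the expert whose resume mentions printers. A real system would need something far better than word overlap, which is exactly where the difficulty lies.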

Within an organisation, there may be a number of communities. Each would have question and answer sessions, and also dialogues that can be viewed by members of the community, in the spirit of newsgroups. If appropriate, these communities can be extended to suppliers and even customers. Companies that are using the Organik system include Ericsson, PR Newswire, and Novell.

From the online demo of the Organik system, the natural language understanding capability does not seem much different from keyword search. However, many software systems that incorporate natural language processing only really offer useful performance in more focussed domains.

The problems of getting natural language understanding to work well are only too apparent from the AskJeeves system (www.askjeeves.co.uk). Given the inclination of users to be drawn to posing natural language queries, it is an interesting technical and commercial goal to field queries on potentially any topic. AskJeeves is really a natural language front-end to a search engine. You ask a question, it tries to determine what you are asking about, and it then matches this to websites that may have the answer. Central to understanding each query is determining the keywords in the query. In a sense, it is a slightly more sophisticated version of a normal keyword search engine. Of course, this does not mean it is necessarily better than a keyword search engine, because some, such as Google (www.google.com), do have impressive indexing of useful webpages.
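The idea of a natural language front-end that reduces a question to keywords can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not a description of how AskJeeves works: the stop-word list, the function names, and the example pages are invented.

```python
import re

# A tiny, hypothetical stop-word list; a real one would be much longer.
STOP_WORDS = {"a", "an", "the", "is", "are", "do", "does", "what", "why",
              "how", "where", "when", "who", "i", "can", "of", "to", "in"}

def extract_keywords(question):
    """Drop punctuation and stop words, keeping the remaining words in order."""
    words = re.findall(r"[a-z]+", question.lower())
    return [w for w in words if w not in STOP_WORDS]

def rank_pages(question, pages):
    """Order page names by how many distinct query keywords each page contains."""
    keywords = set(extract_keywords(question))
    scored = []
    for name, text in pages.items():
        score = len(keywords & set(re.findall(r"[a-z]+", text.lower())))
        if score:
            scored.append((score, name))
    # Highest score first; ties are broken alphabetically by page name.
    return [name for score, name in sorted(scored, key=lambda p: (-p[0], p[1]))]

# Hypothetical pages standing in for an index of websites.
pages = {
    "optics": "Rayleigh scattering makes the sky look blue",
    "dogs": "Dogs bark and cannot talk",
    "weather": "Clouds fill the sky before rain",
}

keywords = extract_keywords("Why is the sky blue?")
ranking = rank_pages("Why is the sky blue?", pages)
```

With this data the question is reduced to the keywords "sky" and "blue", and the page about optics is ranked above the page about weather. Note that once the stop words are thrown away, the system has no idea that a "why" was asked, which is the sense in which such a front-end is only slightly more than a keyword search.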

AskJeeves is also available for use as a front-end for websites. Customers including Ford, Nike and Dell have tailored versions of AskJeeves to help their customers navigate their websites more easily. Looking at the Ford version (ford.ask.com) perhaps gives a better idea of how natural language querying may be useful in more focussed domains. Of course, using a tailored version of AskJeeves to help users navigate a website does not necessarily have to be about offering superior performance to a keyword search engine. Maybe customers prefer the interaction offered by natural language queries. Furthermore, the website gains useful information about customers in the form of queries, and AskJeeves offers the Insight product to track and analyse these customer queries.

The promise shown by companies like AskJeeves is leading other companies to further develop technologies for fielding natural language queries. LingoMotors (www.lingomotors.com), a start-up in Cambridge MA with funding from companies including Reuters and Softbank, is developing a web search technology called TurboSearch for use on websites and portals. Underlying this technology is an idea called Qualia that captures concepts used in language to allow a deeper understanding of any given question. Unfortunately, there is no demo of this system available on their website.

Another company that is trying to address the need to capture concepts in question understanding is Inquizit (www.inquizit.com), based in Santa Monica CA. They have developed a Concept Engine which has a vocabulary of 300,000 concepts and 1.2 million word forms for translating users' queries into corresponding context-specific concepts. Clients include the US Army, which has used it to enable logistics staff to locate the status of Army equipment, and AllergyAve.com, which provides information on allergies via a public website.

A partial list of further companies working on search engines that accept natural language queries includes EasyAsk (www.easyask.com), AnswerLogic (www.answerlogic.com), and AnswerFriend (www.answerfriend.com). So it seems we are now only beginning to see how question answering is going to change with new software developments. At this stage it seems difficult to differentiate the performance of these offerings. It will be interesting, though, to see whether these suppliers make any attempts to characterise what types of query their systems can and can't handle.

Anthony Hunter is a lecturer in computer science at University College London. He can be contacted at: a.hunter@cs.ucl.ac.uk