Introduction to EVA

Introduction

Motivation

The Goals of this thesis

Scope of this thesis

Contributions

Introduction

Advances in science and commerce have often been characterised by inventions that allow people to see old things in new ways. Computers combine both new instruments and new visual representations, and this combination has given rise to the emerging field of information visualisation. By taking advantage of the processing speed and graphical capabilities of computers, information visualisation enables users to interpret large amounts of information, reveal structure, extract meaning, and navigate large and complex information worlds. This project describes research investigating the use of naturalistic visual structures to represent multivariate data sets, with the aim of allowing faster and better understanding of the data.

Motivation

Much of the most interesting data is multidimensional, which raises the question of how it can be represented graphically. Such data is also large, in the sense that it cannot be viewed and understood using conventional statistical techniques. Moreover, only a relatively small class of problems is directly amenable to statistical analysis: those where the variables are understood, potential relationships are already known, and there are hypotheses to be tested. Numerous techniques that attempt to visualise such data have been described in the literature, together with their advantages and disadvantages.

Here we consider data for which even the questions to be posed have not been fully formulated. A classic example, described in the proposal for this thesis, is financial data such as stock exchange data over a number of years, for example the movement of share prices. Here the question might be ``What determines the changes in share prices?'' Such a vague question is not yet ready for statistical analysis: there are a huge number of potential variables (economic indicators, social conditions, the interrelationships between movements on the world's stock exchanges), and the amount of data itself is massive (over how many years?).

In such situations, information visualisation helps users ``understand'' a set of data (which may be unfolding dynamically over time) by letting them view representations of the data, thus using vision to build ``understanding'' and supporting the formation of hypotheses for later statistical analysis.

There are two main problems involved in this:

The Goals of this thesis

For this PhD we investigate an automatic mapping paradigm onto a naturalistic visual representation. For reasons that should become apparent below, we call this ``Empathic Visualisation''. This image shows the placement of EVA within the classification proposed in the background chapter.

Instead of processing individual details and producing numbers and text as output, we examine the use of the visual system to process visual structures holistically and so obtain a global overview of a data set. Information is gained from an overall view of qualitative measurements. This approach allows us to examine the important features of the visual structures and to notice abnormalities very quickly. It can be viewed as the process of ``using vision to think'', or of using the ``mind's eye'' to gain insight into what are often complex, abstract data systems.

The method is based on a technique devised by Slater that uses genetic programming to map data to visual structures automatically. We call the method the Empathic Visualisation Algorithm (EVA) because it takes into account the impact of the visual structure on the user's emotions. Given a data set and an observer, the objective of EVA is to construct a visualisation in which salient features of the data can be intuitively recognised by the observer. In other words, the observer should be able to detect patterns that reveal the underlying structure of the data more readily than by direct analysis of the numbers. To achieve this, we use visual structures that are naturalistic, in the sense that no human requires special knowledge to interpret them, and the mapping from data to visual structure is automated. The mapping should be such that the `important' features of the data set are mapped to features of the visual structure that are `important' and significant to human perception or human emotions.
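To make the idea of an automated, fitness-driven mapping more concrete, the following is a minimal Python sketch of genetic programming applied to this kind of problem. It is not Slater's technique or the EVA implementation. It assumes a single visual parameter per data record, a user-supplied per-record `importance` score, and a fitness defined as the Pearson correlation between the two; every name in it (`evolve`, `importance`, `random_tree`, and so on) is hypothetical.

```python
import math
import random

# Hypothetical sketch, not the EVA implementation: GP evolves an expression
# tree mapping a data record to one visual parameter. Fitness is the Pearson
# correlation between that parameter and an assumed per-record importance score.
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b if abs(b) > 1e-9 else 1.0,  # protected division
}

def random_tree(n_vars, depth):
    """Grow a random tree; leaves are data variables or constants."""
    if depth == 0 or random.random() < 0.3:
        if random.random() < 0.7:
            return ("var", random.randrange(n_vars))
        return ("const", random.uniform(-1.0, 1.0))
    op = random.choice(list(OPS))
    return (op, random_tree(n_vars, depth - 1), random_tree(n_vars, depth - 1))

def evaluate(tree, record):
    """Compute the visual parameter that `tree` assigns to one data record."""
    if tree[0] == "var":
        return record[tree[1]]
    if tree[0] == "const":
        return tree[1]
    return OPS[tree[0]](evaluate(tree[1], record), evaluate(tree[2], record))

def depth(tree):
    return 1 + max(depth(tree[1]), depth(tree[2])) if tree[0] in OPS else 0

def paths(tree, path=()):
    """Yield the index path to every node in the tree."""
    yield path
    if tree[0] in OPS:
        yield from paths(tree[1], path + (1,))
        yield from paths(tree[2], path + (2,))

def subtree(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def replace(tree, path, new):
    """Return a copy of `tree` with the node at `path` replaced by `new`."""
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = replace(tree[path[0]], path[1:], new)
    return tuple(parts)

def correlation(xs, ys):
    """Pearson correlation; 0.0 for constant or numerically invalid outputs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) * (x - mx) for x in xs)
    syy = sum((y - my) * (y - my) for y in ys)
    denom = math.sqrt(sxx) * math.sqrt(syy)
    if not (denom > 0 and math.isfinite(denom)):
        return 0.0
    r = sxy / denom
    return r if math.isfinite(r) else 0.0

def evolve(records, importance, pop_size=60, generations=40, max_depth=6, seed=0):
    """Return (best_tree, best_fitness) after a simple generational GP run."""
    random.seed(seed)
    n_vars = len(records[0])

    def fitness(tree):
        return correlation([evaluate(tree, r) for r in records], importance)

    def tournament(scored, k=3):
        return max(random.sample(scored, k), key=lambda s: s[0])[1]

    pop = [random_tree(n_vars, 4) for _ in range(pop_size)]
    for _ in range(generations):
        scored = [(fitness(t), t) for t in pop]
        children = [max(scored, key=lambda s: s[0])[1]]  # elitism
        while len(children) < pop_size:
            a, b = tournament(scored), tournament(scored)
            # Subtree crossover: graft a random subtree of b into a.
            child = replace(a, random.choice(list(paths(a))),
                            subtree(b, random.choice(list(paths(b)))))
            if random.random() < 0.2:  # subtree mutation
                child = replace(child, random.choice(list(paths(child))),
                                random_tree(n_vars, 2))
            children.append(child if depth(child) <= max_depth else a)
        pop = children
    best_fit, best_tree = max(((fitness(t), t) for t in pop), key=lambda s: s[0])
    return best_tree, best_fit

# Synthetic records with three variables in [0, 1]; the importance score
# x0 - x2 is an arbitrary stand-in for a real measure of data salience.
records = [((i % 7) / 7, (3 * i % 11) / 11, (5 * i % 13) / 13) for i in range(50)]
importance = [r[0] - r[2] for r in records]
tree, fit = evolve(records, importance)
print(fit, tree)
```

Correlation is used here only because it rewards mappings in which more `important' records receive a larger visual value. A fitness function for EVA itself would presumably also have to reflect how the visual structure is perceived, which this sketch does not attempt.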

EVA, the method introduced here to visualise complex data sets, can be thought of as an initial step towards a clearer pictorial representation of a problem. It is not an alternative to statistical analysis but a complement to it. It can be seen as the first line of attack in uncovering the intricate structure of data.

The purpose of the thesis is to construct a system for such a representation and then to test it in an experimental setting. The system should be usable with as wide a range of data sets as possible, i.e. it should be generic rather than tied to a particular form of data.

Scope of this thesis

For this project we are interested in abstract data with the following attributes: it is large in volume, multidimensional, non-physical in nature, and contains hidden information. Examples include financial data, business information, collections of documents and abstract concepts. Such information has no obvious spatial mapping.

The visual structure should be naturalistic: something encountered in everyday life, which a normal human observer can interpret without special knowledge.

We are only interested in mappings from data to visual structure that are automated. Arbitrary mappings are outside the scope of this project, since we believe they are a barrier to a generalisable system and widen the gap between expert and non-expert users.

EVA, the method used throughout this thesis, requires a search technique to find a good mapping from data to visual structure. We have chosen to use genetic programming (GP), but we do not perform research in GP itself. However, since the validity of the method depends on the search technique converging to a ``good'' solution, the convergence of the GP will be tested experimentally.
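One simple way to make "convergence" testable in such experiments is sketched below in Python. It is an assumption for illustration, not the protocol used in this thesis: a run is treated as converged when its best fitness has stopped improving over a window of generations, and repeated runs with different random seeds should end at similar fitness values. The function names, window size and tolerances are all hypothetical.

```python
# Hypothetical convergence checks (an assumption, not the thesis's protocol).

def has_converged(best_per_generation, window=10, tol=1e-3):
    """True if the best fitness gained less than `tol` over the last `window` generations."""
    if len(best_per_generation) <= window:
        return False  # too few generations to judge
    return best_per_generation[-1] - best_per_generation[-1 - window] < tol

def runs_agree(final_fitnesses, tol=0.05):
    """True if independent runs reached final best fitnesses within `tol` of each other."""
    return max(final_fitnesses) - min(final_fitnesses) <= tol

# Illustrative histories only, not experimental results.
plateau = [0.2, 0.5, 0.7, 0.8] + [0.8] * 10
climbing = [0.05 * g for g in range(20)]
print(has_converged(plateau), has_converged(climbing))  # True False
print(runs_agree([0.81, 0.80, 0.83]))  # True
```

Checking both conditions guards against two different failures: a single run that is still improving, and runs that each plateau but at different quality levels, which would suggest premature convergence to local optima.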

Contributions

The overall contribution of the thesis can be summarised as follows: automated mapping from multidimensional abstract data to naturalistic visual structures is achievable and offers certain advantages. More specifically, the contributions are:

  1. A comprehensive survey of information visualisation research to date.
  2. Implementation of genetic programming techniques to automate the mapping from data to visual structures.
  3. The use of naturalistic visual cues to represent the data in this automated mapping.
  4. Results from experiments evaluating this novel method.