MIXED REALITY TOOLKIT (MRT)
5.1. Marker Tracking
Mixed Realities (MR) are created by combining Computer Generated Imagery (CGI) with elements of our real world in one of two ways: Augmented Reality (AR), where the computer graphics are superimposed over the real world, or Augmented Virtuality (AV), where images of the real world are integrated within a Virtual Environment (VE).
Knowing what is present in the real world, and where, is a vital component for the effective operation of many MR systems. This real world description can be used to aid virtual object placement, occlusion culling, collision detection and many other visual or simulated effects. However, building up and reconstructing this real world description, whilst also minimising user interaction and processing time, is a common hurdle for many MR systems.
For an MR system to operate in real-time, all its constituent processes must be efficient enough to be performed many times a second. Therefore, the significant time required to build up this real world model often means that its reconstruction and specification are performed prior to normal system operation, in an off-line initialisation phase - a considerable limiting factor for many MR systems.
As an alternative to more conventional approaches, we propose a novel on-line video based modelling solution that can be rapidly applied to interactively reconstruct, register and semantically describe observed geometry.
Real World Scene Description Used to Augment Computer Generated Graphics: Imaged scene modelled and described (left) and subsequently augmented with virtual characters (right).
Lightweight and flexible, we created the Mixed Reality Toolkit (MRToolKit or MRT) using our new video-based modelling solution. The simple interactive primitive modelling techniques, which enable users to align and deform shapes directly over live video, not only give an operator the opportunity to identify and reconstruct each object, component, volume or location of an imaged scene, but also perform these tasks in an almost instantaneous manner. The toolkit consists of a small library of C++ classes, and can be freely downloaded and used in accordance with the GNU Lesser General Public License.
To examine the effectiveness of using primitive modelling techniques, as implemented in the Mixed Reality Toolkit (MRT), we produced a demonstration application which could model multiple cube-shaped objects lying on a common flat surface. The application included an integrated MR run-time component that augments the scene with animated virtual content.
Scene Objects Modelled From Video Images: Two cube-shaped objects modelled using the MRT (left) before coloured virtual animated bouncing balls interact with the scene (right).
The MRT uses a single pre-calibrated camera, in a fixed position, which is assumed not to move throughout the demonstration. Whilst it is possible to hand-hold the camera in an approximately fixed position, doing so often introduced jitter or apparent scene movement.
As many scenes contain recurring classes of object, such as chairs or tables, the problem of scene reconstruction can be simplified. Instead of combining multiple primitive shapes to build up more complex shapes, the process can potentially be made more rapid by identifying, deforming and registering various classes of object to fit an image view.
Virtual Reality (VR) Scenario Turned into Mixed Reality Scenario: Original FOPS VR experiment (left) made into MR experiment (right) with virtual audience integrated into real world scene.
To examine this idea further we adapted an existing Virtual Reality (VR) experiment, called the Fear of Public Speaking (FOPS) experiment, in an attempt to make it become an MR experiment. The original FOPS experiment asked human participants to give a short presentation to a virtual audience. The participants' reactions to the varying behaviours of the audience could then be studied to find their negative or positive effects. Therefore, instead of the participants giving a presentation to a virtual audience situated within a completely virtual scene, we wanted the virtual audience to become integrated into a real world environment.
MR Modeller and MR Runtime Separation.
Because the FOPS experiment was originally implemented for the VR DIVE platform, which did not support video underlay, it was necessary to incorporate a video underlay in order to augment the virtual audience members (or avatars) onto an imaged scene. Whilst it was technically feasible for the MRT to be directly integrated into the DIVE platform, it would have taken considerable time and effort to achieve. Therefore, the scene modelling and the MR run-time components were operated separately on the same machine. This separation also enabled us to begin to explore the communications gap between an MR run-time application and the modelling and scene specification application that uses the MRT.
Scene Modelling and Specification: Initial object (table) modelled to extrinsically calibrate camera relative to the object (left). Subsequent objects modelled using ground plane constraints defined by first object (middle and right).
All the modelled scene objects in this example rest on a common flat floor surface, and so share a common vertical free axis of rotation. It was therefore possible to further constrain the modelling process using Rodrigues' rotation formula. Once the initial object was registered to the image, the floor surface was also defined, making it possible to drag subsequent objects along the floor surface into the required position whilst rotating them only about their vertical axes.
The scene description in this case contains two major types of objects: real objects, such as the tables, and place-holder objects, in this case the chairs that indicate where the virtual audience members sit. However, the chairs are not required for occlusion purposes. We used the MRT to register the fixed bounding volumes of pre-fabricated virtual furniture VRML models over a single live video stream. It was therefore possible to rapidly 'mark-up' a scene's layout whilst systematically specifying each individual object's class type: e.g. chair or table.
Scene Description Used to Register Virtual Avatars: Initial scene model loaded into DIVE (left). The fully configured MR scene (right).
The scene description was passed to the MR Run-Time component via a collection of files, including VRML 3D model files and 3D DIVE files for the avatars. To bring these together, a single scene description file was created, which could not only link all of these files together but also specify the individual placements and orientations of the various scene objects within the environment. The format used for this scene description file was the DIVE '.vr' format. However, the need for a single file format that could effectively bring together all of the information required to fully specify a scene has not yet been adequately addressed. During the implementation of the above system it became obvious, due to the separation of the MR Modeller and MR Run-Time, that for many MR applications a new file format needs to be developed that can describe not only those aspects discussed here, but also other aspects, such as lighting, relighting, or scene and camera tracking information. Therefore, in our paper published in the proceedings of the 2005 VRST conference, we discussed the need for a new modelling language we have termed the Mixed Reality Modelling Language, or MRML.
Although the above work effectively demonstrated how interactive primitive-based modelling techniques could be rapidly applied over live video images to interactively reconstruct, register and/or semantically describe an observed scene, it was limited to using a camera held in a fixed position. However, as many MR systems would ideally use a freely moving camera, a more flexible approach was required. Combining camera position and orientation tracking techniques with these model reconstruction techniques was therefore an obvious addition.
To demonstrate the effectiveness of combining these modelling and tracking techniques, we used [ARToolKitPlus] to find a camera's extrinsic parameters (orientation and position) relative to a flat visible marker placed into the scene. The position and orientation of the marker were used to define a ground plane, relative to which geometry could be registered, located and reconstructed.