1 Executive Summary

The aim of this report is to state the current understanding of the Virtual Reality Transfer Protocol (vrtp) while providing comments with the intent of requesting elucidation or instigating further discussion. It should be stated up front that vrtp resembles a framework more than a protocol. As such, the document will frequently use the word framework when referencing vrtp.

The thoughts and ideas contained herein result from the author's interpretation of the existing documentation, available either on-line or through publications, so there is no warranty that what is written corresponds to what the designers of vrtp intended. The document has benefited from the input from several people…

Introduction

The design rationale of vrtp was published in mid 1997 [Brutzman97], and vrtp has since been proposed as a Web3D working group, inviting contributions from the entire community. The main idea of vrtp is to support 3D content in the same simple and scalable way that HTTP supports the HyperText Markup Language (HTML).

Unarguably, one of the main reasons for the exponential growth of the World Wide Web (WWW) is the simple reality that anyone can provide content by building a webpage using HTML. Whether the webpage is built with a powerful WYSIWYG tool or a simple text editor is irrelevant. The equivalent of HTML for producing 3D content is the Virtual Reality Modelling Language (VRML) [VRML97] and, despite its flaws [Marrin99] and inherent problems, the language has been embraced by the industry as the means of creating 3D on the WWW. This adoption may be justified by the fact that the language is an approved standard that empowers anyone to build virtual worlds in 3D without any programming effort. This capability is of utmost importance since it permits people to focus on content rather than on technological details. Nevertheless VRML is not the ideal solution and remains plagued by deficiencies, mainly due to its heavyweight nature, which leads to bulky implementations with too many architectural interdependencies. It is hoped that the next generation of the standard (VRML-NG) will reduce this complexity while increasing its flexibility. This report will not concern itself with these issues unless they have a direct impact upon the design criteria of the underlying vrtp layer.

One of the most frequently cited shortcomings of the current version of VRML is that a virtual world cannot be shared simultaneously amongst all users. Although people point out the necessity of this requirement, it does not belong within the scope of the objectives of any current or future VRML version. This is a pitfall to avoid when designing VRML-NG: no explicit or implicit constructs should be present in the language to support multiple users. The reason for this concern is twofold:

The world modeller cannot accurately predict the emergent behaviour of the users because of their unpredictable and volatile nature. Therefore all design assumptions intended to optimise network usage are for naught.

The world modeller should not be burdened with unnecessary skills, nor be constrained by anything. Nevertheless, guidelines should be provided to ease the demands on the network layer, much as polygon budgets guide the 3D modelling of characters in the gaming industry.

These arguments demonstrate the need to delegate the responsibility of managing multiple users, and other management tasks, to an underlying layer. Interconnecting multiple users who share a common virtual environment implies the use of networking to extend the illusion of presence to all those involved, however geographically distributed. The most prominent network infrastructure that encompasses the world is, without possible debate, the Internet. However, it has been clearly identified that the existing protocol HTTP, although very well suited to HTML, fails miserably at supporting large-scale multi-user virtual environments. Therefore vrtp proposes to provide the protocol that delivers the much sought interconnectivity amongst many users while remaining completely transparent to the person building a world, much in the same way as when building a webpage. Since it is assumed that the target network will be the Internet, or some incarnation of it, vrtp will naturally focus on the problems associated with the inherent characteristics of this infrastructure, although it is not constrained to it.

Upon analysing the WWW, it is concluded that, for the building of HTML content, the main artifice that provides access to the network is the Uniform Resource Locator (URL). People may not know exactly what a URL represents, but they definitely know its purpose and what is achieved by using it; thus the WWW consists of content linked via URLs. vrtp borrows the same philosophy by exposing the networking layer through URLs. Although URLs provide a simple abstraction, a series of components is necessary to make vrtp a reality, as the remainder of the document will demonstrate.

Data Streams

The core of any application or system is the data it handles. Within the scope of a virtual environment there exists a rich set of data requirements with different characteristics, which makes it unfeasible to adopt a single data protocol that satisfies all situations.

For the purpose of establishing terms, the word stream will be used to convey the concept of a virtual channel over which data is communicated. A stream reflects the related topology, being one-to-one, one-to-many or many-to-many, depending upon the requirements of the data to be communicated and the resources available. The nature of a stream is either continuous or discrete, where the former is appropriate for TCP connections while the latter requires UDP connections. Independent of its nature, every stream has an associated protocol stack containing all the allocated protocols that are valid for that particular stream.
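The terms just established can be captured in a small data structure. The following is only a sketch of how a stream descriptor might look; the class names, field names and example streams are assumptions made for illustration and do not come from the vrtp documentation.

```python
from dataclasses import dataclass, field
from enum import Enum


class Topology(Enum):
    ONE_TO_ONE = "one-to-one"
    ONE_TO_MANY = "one-to-many"
    MANY_TO_MANY = "many-to-many"


class Nature(Enum):
    CONTINUOUS = "continuous"  # suited to TCP, as stated in the text
    DISCRETE = "discrete"      # suited to UDP, as stated in the text


@dataclass
class Stream:
    """Hypothetical descriptor of a virtual channel (not a vrtp API)."""
    name: str
    topology: Topology
    nature: Nature
    # Protocols layered above the transport; free-form labels in this sketch.
    protocol_stack: list = field(default_factory=list)

    def transport(self) -> str:
        """Pick the transport implied by the nature of the stream."""
        return "TCP" if self.nature is Nature.CONTINUOUS else "UDP"


updates = Stream("avatar-updates", Topology.MANY_TO_MANY, Nature.DISCRETE,
                 ["dead-reckoning"])
scene = Stream("scenegraph-download", Topology.ONE_TO_ONE, Nature.CONTINUOUS,
               ["http"])
```

Keeping the protocol stack as data attached to the stream, rather than hard-wiring it, matches the view that the choice of protocol should rest with the application.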

The problem domain regarding the network layer within virtual environments only becomes interesting with the consideration of a large number of multi-users.

In [Brutzman95] a careful categorisation of the types of data within a virtual environment is presented, which is the foundation of vrtp. These categories are the primitive types from which all others may be constructed by composition. The categories are discussed in the following subsections, with further elaboration to stress the need to adopt the protocols most appropriate to each particular data stream. The same terminology is used to facilitate the matching of terms and concepts.

Light Weight Interactions

Light-weight interactions encompass all data that is discrete in time, small in size and unique in the timeline, without any guarantee that similar data preceded it or will follow. The very nature of these interactions makes UDP an ideal candidate to support these streams due to its flexibility and time efficiency.

Adopting UDP places the burden of network management upon the application, since no error recovery, packet-loss control or packet sequencing is done. The light-weight streams should be implemented using a common structure for both ordinary unicast UDP and multicast. Considering the wealth of existing protocols, each with its strengths and weaknesses, it should be the decision of the application which one is adopted. However, vrtp should allow the integration of different types of protocols, each most appropriate to the particular task at hand, since there is a vast family of data types with either similar or completely different requirements.

With the purpose of clarifying further what light weight interactions are, some concrete examples will be presented.

Update Data

This is the data most frequently present on the network [ref]. When analysing the nature of the data it is possible to discern the following characteristics:

No temporal memory. The main interest is to have the most updated version of the data. Should packets arrive containing data that is outdated then they will simply be discarded without requiring further processing. It is undesirable for an avatar to move two steps forward to retrace a step backwards just because the second packet containing the intermediate step arrived after the one with the final position.

Predictable. Considering that the co-ordinates are absolute, it is possible to interpolate intermediate data that may be missing. There exist numerous interpolation techniques in numerical analysis; however, they do not produce realistic results. The best approach is to add information and fit the path of the data with functions from the domain of physics that describe movement. The simplest movement formula is based on a given position x_0 and velocity vector x'_0, both sampled at time t_0:

$$x(t) = x_0 + x'_0 \, (t - t_0)$$

Naturally this is a linear approach; however, it is possible to increase the accuracy by increasing the order of the equation. All systems that use dead reckoning rely on the predictability of the update data to avoid sending data unless the prediction error exceeds a pre-defined threshold. In [Singal thesis] this approach is improved with the use of past information, which allows smoother and more realistic results.
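The dead reckoning scheme described above can be sketched as follows. This is a minimal first-order illustration under stated assumptions: the names `State`, `extrapolate` and `needs_update`, the use of Euclidean distance as the error measure and the example values are all hypothetical, and are not taken from any particular system.

```python
from dataclasses import dataclass


@dataclass
class State:
    t: float    # time at which the last update was sent (seconds)
    pos: tuple  # position x_0 at that time
    vel: tuple  # velocity x'_0 at that time


def extrapolate(state, now):
    """First-order dead reckoning: x(t) = x_0 + x'_0 * (t - t_0)."""
    dt = now - state.t
    return tuple(p + v * dt for p, v in zip(state.pos, state.vel))


def needs_update(state, true_pos, now, threshold):
    """True when the remote prediction has drifted beyond the threshold."""
    predicted = extrapolate(state, now)
    error = sum((a - b) ** 2 for a, b in zip(predicted, true_pos)) ** 0.5
    return error > threshold


# Last state sent to the other hosts: moving along x at one unit per second.
last_sent = State(t=0.0, pos=(0.0, 0.0, 0.0), vel=(1.0, 0.0, 0.0))
```

The sender runs the same extrapolation as the receivers and transmits a fresh update only when `needs_update` returns true, which is what keeps the update traffic low.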

This kind of data is ideal for multicast and requires only a very light-weight protocol due to its weak reliability constraints. The decision of which protocol to associate with the stream should nevertheless be application driven, although a default setting may be provided.
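The "no temporal memory" property of update data can be illustrated with a receiver that keeps only the newest update per entity. The sketch assumes that every update carries a per-entity sequence number, which the source does not specify; the class name is hypothetical and sequence-number wraparound is ignored.

```python
class LatestStateFilter:
    """Keeps only the newest update per entity; older packets are dropped."""

    def __init__(self):
        self._latest = {}  # entity id -> (sequence number, payload)

    def accept(self, entity_id, seq, payload):
        """Return True if the update is newer than anything seen so far."""
        current = self._latest.get(entity_id)
        if current is not None and seq <= current[0]:
            return False  # outdated or duplicate: discard without processing
        self._latest[entity_id] = (seq, payload)
        return True

    def state(self, entity_id):
        """Most recent payload accepted for the entity."""
        return self._latest[entity_id][1]


updates = LatestStateFilter()
updates.accept("avatar", 1, (1.0, 0.0))
updates.accept("avatar", 3, (3.0, 0.0))
late = updates.accept("avatar", 2, (2.0, 0.0))  # arrives after the final step
```

Here the intermediate step arrives late and is rejected, so the avatar never retraces a step backwards.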

Event Data

This data is transmitted using light-weight streams because of its size; however, unlike update data, it is much more demanding with regard to reliability. The packets must be processed in the same order in which they were transmitted, without any missing data. When analysing the data it is possible to identify trends that are completely opposite to the above:

Temporal Sequence. An event must happen in the appropriate time slot. The sequence is important because there exists a time correlation between states. If an avatar knocks over a glass of water, which breaks upon hitting the floor, it is important that the event of knocking over the glass is processed on the remote hosts before the event of breaking. Otherwise some clients will have serious inconsistencies and visualise a glass breaking mysteriously and then appearing on the floor. The situation worsens as the number of interrelated events increases.

Idempotent. Some event data is completely chaotic, with no temporal causality between events. For such data it is not crucial to receive the events in the exact order in which they were transmitted, since the end result is identical.
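The temporal sequence requirement can be sketched as a resequencing buffer on the receiving host. This is an illustration only: it assumes that the sender numbers its events consecutively, and the class and method names are hypothetical rather than part of any vrtp proposal.

```python
class OrderedDelivery:
    """Releases events strictly in sequence order, holding back early arrivals."""

    def __init__(self, first_seq=0):
        self._next = first_seq  # sequence number expected next
        self._pending = {}      # early arrivals, keyed by sequence number

    def receive(self, seq, event):
        """Store the event and return the list of events now deliverable."""
        if seq < self._next or seq in self._pending:
            return []  # duplicate of something already seen
        self._pending[seq] = event
        ready = []
        while self._next in self._pending:
            ready.append(self._pending.pop(self._next))
            self._next += 1
        return ready

    def missing(self):
        """Sequence numbers blocking delivery (candidates for retransmission)."""
        if not self._pending:
            return []
        return [s for s in range(self._next, max(self._pending))
                if s not in self._pending]


buf = OrderedDelivery()
first = buf.receive(1, "glass breaks")       # arrives early, held back
gap = buf.missing()
second = buf.receive(0, "glass knocked over")
```

Idempotent events need no such buffering and could bypass it altogether, which is one more argument for letting the application choose the protocol per stream.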

Control Data

This type of data has specific requirements from the network and quite possibly may require TCP connections in cases where total reliability is needed. The control data is responsible for a set of tasks such as congestion control, joining and leaving multicast groups, dynamic management of groups of clients and common service requests, amongst many others.

The control data may either have a constant rate or come in bursts resulting in high peaks of traffic. While the former may be appropriate for protocols such as SRM [Floyd97], the latter does not suit them.

System Data

The requirements of system data are analogous to those of control data; however, it is a stream used for the purposes of the application. Examples of system data include opening an audio data stream, receiving an indication to retrieve portions of the scenegraph, etc.

Network Pointers

This type of data is basically a pointer to a bulk of data, which may be provided upon demand or announced to a multicast group. An example of a network pointer is a URL. Also network pointers should indicate the location of services and possibly the means of accessing them.

In [Brutzman95] this was a category itself, however when analysing the nature of the data, it is another form of light interaction. Nevertheless network pointers have specific properties, which differ from the above descriptions, since they may be used either in the context of the application or the content itself in the form of a URL.

Heavy Weight Objects

This type of data stream consists of transferring large chunks of data from a source to a destination over a point-to-point connection between two entities. The uploading of a portion of a scenegraph from a server to a client application is a clear example of the use of this type of stream.

The data involved is of a continuous nature and requires additional properties such as reliability and error checking, thus being an ideal candidate for TCP connections. The time overhead of initially setting up a connection is minimal when compared with the overall time consumed in transferring the data.

Another type of heavy-weight object is mobile code, which is dynamically integrated into the runtime application kernel of the client as the need arises.

Real Time Streams

This type of data connection is required when handling any form of streaming data such as video and audio. Naturally the implementation of these streams is highly dependent upon the protocol adopted by the application.

VRTP Component Analysis

The block diagram of Fig.1 provides a general overview of the current identified components that are part of vrtp. Each of the displayed components will be detailed in the following subsections.

The existing documentation and publications allowed a more or less accurate understanding of the components' general functionality.

Although an overview and understanding of the principal concepts is possible, more serious problems arise when drilling into the detailed internals of each component. The obscurity worsens when trying to discern how the components interact with each other and what their combined functionality is. This is the direct consequence of the total absence of a clear specification of the interfaces of the components involved. It would be useful to have a description of various scenarios, based upon state transition diagrams, to better understand how all of it works together.

Also absent are the actual protocols to support the interaction between remote hosts. Although vrtp seems to be more of a framework, it was initially thought of as a protocol; however, no proposal of its details is currently available. Even though it is possible to use the Dial a Behaviour Protocol to define customised protocols, it is necessary to have, at the very least, a simple protocol indicating how a new host connects itself to the virtual environment. Proprietary protocols may be used only once that connection is established.

Universal Platform

The universal platform is portrayed in Fig. 2 and is composed of the Java Virtual Machine (JVM), Bamboo [Watsen98a, Watsen98b] plug-ins and the Adaptive Communications Environment (ACE) [Schmidt94].

This is the major starting point for the vrtp stack, upon which all the remaining components are based, with the Bamboo architecture being the underlying backbone. The core foundation of Bamboo is a small kernel, containing the absolute minimum but allowing extensibility through the integration of plug-ins. The current version of Bamboo is implemented in C++ along with the ACE library. Although ACE provides and implements a lot of functionality to support distributed applications, Bamboo was required to develop and implement the plug-in architecture. This provides the basis for the universal platform.

An interesting issue is that most, if not all, of the work undertaken by Bamboo could have been greatly simplified if Java had been adopted for the development of the kernel, since the aim is a cross-platform architecture that allows the dynamic integration of code modules as required. It is true that the JVM is included, but merely to support the definition of behaviours because of the integration of Java with VRML.

Java's ClassLoader allows code to be loaded dynamically on the fly, subject to the security policies enforced by the active SecurityManager. The ability to implement interfaces also allows the adoption of a delegation mechanism, where incoming modules register themselves to receive specific events of certain characteristics from other modules, much in the same way as the event model of the AWT (part of the JFC) works. The implementation of such a strategy is greatly facilitated by the whole architecture and inherent properties of the Java language. Java's reflection model could also be adopted, thereby allowing runtime discovery of the interfaces of each module.

Not all the advantages are technical, since the adoption of Java would imply the use of the Java Virtual Machine (JVM), which is widely supported by several leading vendors of Internet browsers. Should the JVM not be readily available, it is always possible to download the Java Plug-in. Therefore Jamboo (Java + Bamboo) would benefit from a much wider dissemination strategy.

The only advantage seen in the development of a generic platform is the possibility for C/C++ programmers to interact with the whole architecture without the penalties of mechanisms such as the Java Native Interface (JNI). However, the number of man-years invested in the JVM makes it an attractive alternative, so most likely (hopefully) there will exist the implementation as is and, in the near future, another in the form of Jamboo.

When Bamboo is stripped of the actual implementation details, there does not seem to exist an established interface definition for plug-in interaction. It is true that this could be defined by the application, however Bamboo would benefit from establishing one that could be extended if necessary.
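To make the suggestion concrete, the following sketch shows what a minimal, extensible plug-in contract could look like. It is purely hypothetical: `Plugin`, `Kernel`, `Recorder` and their methods are invented for this illustration and do not reflect the actual Bamboo interfaces.

```python
from abc import ABC, abstractmethod


class Plugin(ABC):
    """Hypothetical minimal contract every module would implement."""

    name: str = "unnamed"
    requires: tuple = ()  # names of plug-ins that must be loaded first

    @abstractmethod
    def start(self, kernel): ...

    @abstractmethod
    def stop(self): ...


class Kernel:
    """Small kernel: it only tracks plug-ins and enforces load order."""

    def __init__(self):
        self.plugins = {}

    def load(self, plugin):
        missing = [dep for dep in plugin.requires if dep not in self.plugins]
        if missing:
            raise RuntimeError(f"{plugin.name} needs {missing}")
        self.plugins[plugin.name] = plugin
        plugin.start(self)

    def unload(self, name):
        self.plugins.pop(name).stop()


class Recorder(Plugin):
    """Toy plug-in used only to exercise the kernel."""

    def __init__(self, name, requires=()):
        self.name, self.requires, self.running = name, tuple(requires), False

    def start(self, kernel):
        self.running = True

    def stop(self):
        self.running = False


kernel = Kernel()
kernel.load(Recorder("sockets"))
kernel.load(Recorder("multicast", requires=["sockets"]))
```

An application could extend such a base contract with its own methods, while the kernel would rely only on the common part.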

Server

When analysing the topologies of existing systems, one is confronted with a chaotic mess in which no dominant best-practice case exists. Therefore the idea of vrtp was to provide the means to support client-server and peer-to-peer topologies, with all the variations in between. This means that the essential capabilities for networking would be provided, delegating to the application the responsibility of how the services are used and interconnected, thus defining the required topology as necessary. In this way vrtp would present no barriers to adoption by either the client-server or the peer-to-peer camp.

Unfortunately according to the components proposed, this does not seem to be the case. The server component in particular, which is illustrated in Fig. 3, far exceeds the responsibilities of the vrtp since the optimisation mechanisms proposed by QuICk belong to the application domain.

The component as presented also provides the means for the local host to become a temporary server from which other hosts may download content, such as the avatar representation of that user.

This is the component responsible for the dissemination of heavy-weight objects, as described in section 3. Considering that it contains the HTTP server, perhaps it should also be responsible for all the data streams that require high reliability, where the overhead associated with establishing TCP connections may be considered negligible. Unfortunately, according to the existing documentation, this type of data stream seems to be completely neglected by this component and by every other.

The layers of this component are as follows:

HTTP Server. This is responsible for receiving vrtp requests and allows remote parties to download content. No mention of security mechanisms is made, nor is there any concern with validating that the local content made available will not disrupt the illusion of the virtual environment as a whole. The latter is of utmost importance to particular applications, such as a game where users may cheat by using avatars with properties that should be illegal.

QuICk. This seems to be the core layer of this component. The acronym stands for Quality, Interest and Cost. Currently according to [Capps98] the work being developed here concentrates mainly on the local host perception of the environment without any global considerations. Although important, the placement of this layer at this level makes vrtp unnecessarily heavyweight. In [Oliveira99] proper argumentation will be provided to suggest that QuICk should be moved to the application and made optional.

Scenegraph Management. This layer is tightly coupled with QuICk since it is managed by it. Basically the operations available are loading and unloading portions of the scenegraph with either information available locally or retrieved from remote parties.

Peer-to-Peer (Transient Behaviour)

This component is probably the one for which the most information is available, and it appears to be the one that has currently received the most thought. It is therefore only natural that a better understanding results in a more accurate description. All the data streams mentioned in section 3 that correspond to light-weight interactions, network pointers and real-time streams are handled by this component. The communication model is therefore based upon different variations of UDP and, naturally, multicast.

The various layers that are involved are depicted in Fig. 4 and are briefly explained as follows:

Area Of Interest Manager (AOIM). This is the low end of the stack, responsible for the data streams at the level of the sockets. Interesting work is being conducted using a 3-tier approach [Abrams98], and the results of an implementation of the first tier have been presented. However, a simulation running on a single computer under ideal assumptions is not sufficient to validate the idea.

The AOIM has 3 tiers, where the first is responsible for the actual joining and leaving of multicast groups (address plus port) along with the dynamic partitioning of the octree, which is the supporting data structure. Naturally the partitioning of the octree depends upon the size and scale of the area of interest (AOI) of the user. While the principle is sound on a local basis, it does not say what the relation is between different hosts that share overlapping AOIs. There is a need to understand how different AOIMs remain consistent considering that, although their octrees are completely different, common groups remain.

The second tier applies filtering and this goes along the principle that no matter what the address space is, there will never be enough address groups to provide each entity with the necessary addresses for per-entity subscription. Thus the second tier groups together those entities that share a common protocol.

The third tier applies further filtering to the previous tier and isolates entities that meet particular characteristics, meaning that they share common interests.

Although the 3-tier approach is innovative, there remains the issue of how exactly the second and third tiers are implemented. It seems that both tiers are merely the application of two sequential filters to the packets received via the multicast address group. A clearer understanding of this approach of addressing combined with filtering can be found in [Levine99]. So maybe a more accurate description would be a 2-tier approach, where the first tier is addressing and the second is filtering. The latter would not be restricted to the application of merely one filter.
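The reading of the AOIM as addressing followed by filtering can be sketched as below. It is an interpretation, not the implementation of [Abrams98]: the packet fields, the two filter functions and the group address are hypothetical values chosen for illustration.

```python
def make_receiver(subscribed_groups, filters):
    """Addressing first (group membership), then any number of filters in order."""

    def deliver(packet):
        if packet["group"] not in subscribed_groups:
            return False  # in a real multicast setup this never reaches the host
        return all(f(packet) for f in filters)

    return deliver


# Hypothetical filters mirroring the second and third tiers described above.
def same_protocol(packet):
    return packet["protocol"] == "position-update"


def of_interest(packet):
    return packet["entity_type"] in {"avatar", "vehicle"}


receive = make_receiver({"224.2.0.1:5000"}, [same_protocol, of_interest])

packet = {"group": "224.2.0.1:5000", "protocol": "position-update",
          "entity_type": "avatar"}
```

Because the filters are an ordered list, a third or fourth filter can be appended without changing the addressing step.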

Behaviour Streaming Buffer (BSB). This layer is where the data streams are created. These streams are associated with dedicated sockets that are managed by the AOIM. Naturally maximum parallelism is sought, so there exists a separate thread for each stream. Unfortunately there are no hints as to whether the BSB is flexible enough to allow different settings according to the requirements of the data in a particular stream. These settings depend heavily upon the parameters of the given protocol, such as Scalable Reliable Multicast [Floyd97] or the Lightweight Reliable Multicast Protocol [Liao??], amongst others.

Real-time Transport Protocol (RTP). This should be part of the BSB as another protocol.

Dial a Behaviour Protocol (DaBP). Although there is no concrete documentation on how it is done, the design principles and objectives of DaBP are very attractive. The possibility of creating and designing new protocols based upon the eXtensible Markup Language (XML) [Bray9?] allows a person to have the system running and add additional protocols at run-time. However, the details remain elusive. The role of DaBP is to define the payload of packets, independently of the stream protocol. For this purpose the use of XML is ideal, where a person defines the various fields along with the appropriate Document Type Definition (DTD). Nevertheless, for efficiency purposes there is the possibility of having a local compiler that receives the XML along with the DTD and produces the necessary instance of an object that allows serialisation of the protocol. An interesting alternative may be raised: instead of having an XML parser and compiler at each host, Bamboo (Jamboo) makes it possible to upload the appropriate code for understanding the new protocol. This alternative assumes that not every user will be interested in creating behaviour protocols, so only those with this requirement will have the capabilities for doing so (XML compiler and parser). Once a protocol is created, the code is generated and becomes available for download. The details of how this is announced should be defined by the application, although the facilities for doing so are not present at the moment. The DaBP has been designed within the context of only defining behaviours for the scenegraph; however, it is possible to extend DaBP to define all application protocols, such as control data.
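Since the details of DaBP remain elusive, the following is only a sketch of the general idea of compiling an XML payload description into a serialiser. The XML vocabulary (`protocol`, `field`, the `type` names) is invented for this example, and DTD validation is omitted because Python's `xml.etree.ElementTree` does not perform it.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical protocol description; element and attribute names are invented.
PROTOCOL_XML = """
<protocol name="position-update">
  <field name="entity_id" type="uint32"/>
  <field name="x" type="float32"/>
  <field name="y" type="float32"/>
  <field name="z" type="float32"/>
</protocol>
"""

# Mapping from the invented type names to struct format characters.
TYPE_CODES = {"uint8": "B", "uint16": "H", "uint32": "I",
              "float32": "f", "float64": "d"}


def compile_protocol(xml_text):
    """Turn the XML description into (field names, struct codec)."""
    root = ET.fromstring(xml_text.strip())
    fields = [(f.get("name"), f.get("type")) for f in root.findall("field")]
    fmt = "!" + "".join(TYPE_CODES[t] for _, t in fields)  # network byte order
    return [n for n, _ in fields], struct.Struct(fmt)


def encode(names, codec, values):
    """Pack a dict of field values into the packet payload."""
    return codec.pack(*(values[n] for n in names))


def decode(names, codec, payload):
    """Unpack a packet payload back into a dict of field values."""
    return dict(zip(names, codec.unpack(payload)))


names, codec = compile_protocol(PROTOCOL_XML)
message = {"entity_id": 7, "x": 1.0, "y": 2.0, "z": 0.5}
payload = encode(names, codec, message)
```

The compile step runs once per protocol, so the per-packet cost is a single `pack` or `unpack` call; a host without the XML tooling could equally be handed the compiled codec, in the spirit of the code-upload alternative discussed above.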

Event Dispatcher. This layer is responsible for receiving the events after they have been decoded and passing them on to their respective destinations. The event dispatcher, as described, seems tightly coupled with the scenegraph, since it appears to deliver the events directly to the corresponding nodes. This also seems to indicate that the behaviours being defined are basically for scenegraph purposes, so no control or application protocols are foreseen to take advantage of the DaBP. Opening the DaBP to such protocols would provide powerful flexibility, since the system designer would only have to produce new protocols to replace the old ones and the underlying network-related components would adjust themselves.

The dispatcher currently seems to be tightly coupled with the application layer where the scenegraph resides. An improved approach would be to adopt a delegation model where events were announced and those interested in them would use them, thus no prior knowledge of how the application is organised would be necessary.
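A delegation model of this kind can be sketched in a few lines. The class below is a hypothetical illustration, not a proposed vrtp interface: the event type names and the two example listeners are assumptions.

```python
from collections import defaultdict


class EventDispatcher:
    """Delegation model: listeners register for the event types they care about."""

    def __init__(self):
        self._listeners = defaultdict(list)

    def subscribe(self, event_type, listener):
        self._listeners[event_type].append(listener)

    def unsubscribe(self, event_type, listener):
        self._listeners[event_type].remove(listener)

    def dispatch(self, event_type, event):
        """Announce the event; return how many listeners received it."""
        listeners = list(self._listeners.get(event_type, ()))
        for listener in listeners:
            listener(event)
        return len(listeners)


dispatcher = EventDispatcher()
scenegraph_log, control_log = [], []

# Any part of the application may register; the dispatcher knows nothing of them.
dispatcher.subscribe("position-update", scenegraph_log.append)
dispatcher.subscribe("join-group", control_log.append)

delivered = dispatcher.dispatch("position-update", {"entity_id": 7, "x": 1.0})
ignored = dispatcher.dispatch("audio-open", {"codec": "pcm"})
```

The dispatcher holds no reference to the scenegraph, so control or application protocols can consume events through exactly the same mechanism.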

Although to a much lesser degree, the Peer-to-Peer component shares the same obscurity as the others where several questions arise to the proposed and supported functionality.

Monitoring

The inclusion of monitoring is a major capability that has been overlooked in existing virtual environment systems; it will empower the user with more precise information than merely stating that some node is inaccessible. As illustrated in Fig. 5, this component is further divided into 3 distinct layers:

Network Time Protocol (NTP). This layer is required to have accurate absolute time measurements; however, it is completely unclear why it is not included in the universal platform, either as one of the required modules of the Bamboo kernel or as one of the recommended layers residing immediately above the universal platform. The reason for this observation is that some time synchronisation is required amongst all the hosts involved in a virtual environment session for the various data streams. In MiMaze [Gautier98] NTP is a crucial requirement.

Simple Network Management Protocol (SNMP). This protocol is the core of the monitoring component since it is well documented and supported on routers. Therefore reasonable functionality may be expected.

Monitoring. This block is ambiguous and not enough relevant documentation was found describing what it aims at. Maybe it is targeted at world administrators, who are informed of problems and require a more sophisticated suite of diagnosis and monitoring tools. Alas, this is merely speculation, considering that no details were found as to its role.

Client

The motivation for a client component remains unclear. Once again vrtp may be exceeding its boundaries and becoming too heavyweight in the process. The component has 3 layers as illustrated in Fig. 5, which are briefly explained as follows:

Graphics acceleration card & Monitor. This is the layer that actually displays the image to the end-user. The device may be a monitor or an HMD. Therefore vrtp should be completely oblivious to how the data is visualised and what the output devices are. This should actually be the responsibility of the API used for rendering.

Scenegraph and Rendering. This layer is where the rendering API resides, which transforms a scenegraph into whatever is necessary for viewing purposes.

Scenegraph Abstraction. This layer is unnecessary if vrtp does not assume the responsibility of having a client component. The delegation model for event dissemination allows this, making a general scenegraph abstraction completely unnecessary.

Conclusions

The first and most important comment is that vrtp is a wonderful concept and the community has been yearning for such a thing for some time now. The main problem is that vrtp, as it stands, has little to do with being a protocol and much more closely resembles a framework whose main components have been identified. However, what is presented seems much too heavyweight and quite possibly will end up as yet another framework for sharing 3D content. Some of the components are necessary but others belong to the application layer.

Unfortunately there is an absence of documentation with a well-defined design, both as an integrated overview and on an individual component basis. This may produce working implementations that do not integrate well together.

The design goals of VRML-NG to reduce VRML to components in order to have a light kernel should be used as guideline in the design of vrtp. Another document [Oliveira99] contains proposed modifications and extensions to the current vrtp framework based upon the principle of a reduced functional kernel, which reduces the barriers of adoption by not only browsers but also current systems.

References

[Abrams98] H. Abrams, K. Watsen and M. Zyda, "Three-Tiered Interest Management for Large-Scale Virtual Environments", VRST’98, Taipei, 1998. http://watsen.net/Bamboo/vrst98.pdf

[Bray9?] T. Bray, J. Paoli and C. M. Sperberg-McQueen, "Extensible Markup Language (XML)", approved W3C recommendation, http://www.w3.org/TR/REC-xml

[Brutzman97] D. Brutzman et al, "Virtual Reality Transfer Protocol (vrtp) Design Rationale", WET ICE: Sharing a Distributed Virtual Reality, Massachusetts, June 1997. http://www.stl.nps.navy.mil/~brutzman/vrtp/vrtp_design.ps

[Capps98] M. Capps, "The QuICk Model for Virtual Environments System Optimization", Proposal for PhD dissertation, Naval Postgraduate School, Monterey, 1998

[Floyd97] S. Floyd et al, "A Reliable Multicast Framework for Light-Weight Sessions and Application Framing", IEEE/ACM Transactions on Networking, Vol. 5, No. 6, pp. 784-803, December 1997.

[Gautier98] L. Gautier and C. Diot, "Design and Evaluation of MiMaze, a Multi-Player Game on the Internet", ftp://ftp_sophia.fr/rodeo/diot/acm_multimedia.ps.gz

[Levine99] B. Levine et al, "Consideration of Content and Receiver Interest in IP Routing", TBD, 1999

[Liao??] T. Liao, "Light-weight Reliable Multicast Protocol", ??

[Marrin99] C. Marrin, "Beyond VRML", Whitepaper, http://www.marin.com/vrml/private/EmmaWhitePaper.htm

[Oliveira99] M. Oliveira, "Extensions and Modification Proposal for VRTP", in preparation.

[Schmidt94] D. Schmidt, "The Adaptive Communications Environment: Object-Oriented Network Programming Components for Developing Client/Server Applications", 12th Sun Users Group, http://www.cs.wustl.edu/~schmidt/SUG-94.ps.gz

[VRML97] "Virtual Reality Modelling Language", ISO/IEC DIS 14772-1, April 1997

[Watsen98a] K. Watsen and M. Zyda, "BAMBOO – A Portable System for Dynamically Extensible, Real-time, Networked, Virtual Environments", IEEE VRAIS, Georgia, 1998

[Watsen98b] K. Watsen and M. Zyda, "BAMBOO – Supporting Dynamic Protocols for Virtual Environments", IMAGE Conference, Arizona 1998