1. Executive Summary
    The aim of this report is to state the current understanding of the Virtual Reality Transfer Protocol (vrtp) and to offer comments intended to request clarification or to prompt further discussion. It should be stated up front that vrtp resembles a framework more than a protocol. As such, the document will frequently use the word framework when referring to vrtp.

    The thoughts and ideas contained herein reflect the author's interpretation of the existing documentation, available either on-line or through publications, so there is no guarantee that what is written here corresponds to what was intended. The document has benefited from the input from several people…

  2. Introduction

The design rationale of vrtp was published in mid 1997 [Brutzman97], and vrtp has since been proposed as a Web3D working group that welcomes contributions from the entire community. The main idea of vrtp is to support 3D content in the same simple and scalable way that HTTP supports the HyperText Markup Language (HTML).

Unarguably, one of the main reasons for the exponential growth of the World Wide Web (WWW) is the simple reality that anyone can provide content by building a webpage using HTML. Whether the webpage is built with a powerful WYSIWYG tool or a simple text editor is irrelevant. The equivalent of HTML for producing 3D content is the Virtual Reality Modelling Language (VRML) [VRML97] and, despite its flaws [Marrin99] and inherent problems, the language has been embraced by the industry as the means of creating 3D content on the WWW. This adoption may be explained by the fact that the language is an approved standard that gives anyone the ability to build 3D virtual worlds without any programming effort. This capability is of utmost importance since it permits people to focus on content rather than on technological details. Nevertheless, VRML is not the ideal solution and remains plagued by deficiencies, mainly due to its heavyweight nature, which leads to bulky implementations with too many architectural interdependencies. It is hoped that the next generation of the standard (VRML-NG) will reduce this complexity while increasing its flexibility. This report will not concern itself with these issues unless they have a direct impact upon the design criteria of the underlying vrtp layer.

One of the most frequently cited shortcomings of the current version of VRML is that a virtual world cannot be shared simultaneously amongst all users. Although people point out the need for this capability, it does not belong within the scope of any current or future VRML version. This is a pitfall to avoid when designing VRML-NG: no explicit or implicit constructs should be present in the language to support multiple users. The reason for this concern is twofold:

These arguments demonstrate the need to delegate the responsibility for managing multiple users, and other management tasks, to an underlying layer. Interconnecting multiple users who share a common virtual environment implies the use of networking to extend the illusion of presence, thus incorporating all those involved even though they are geographically distributed. The most prominent network infrastructure that encompasses the world is, without possible debate, the Internet. However, it has been clearly identified that the existing protocol HTTP, although very well suited to HTML, fails miserably at supporting multi-user large-scale virtual environments. vrtp therefore sets out to provide the protocol that delivers the much sought-after interconnectivity amongst many users while remaining completely transparent to the person building a world, much as the network is transparent to someone building a webpage. Since it is assumed that the target network will be the Internet, or some incarnation of it, vrtp will naturally focus on the problems associated with the inherent characteristics of this infrastructure, although it is not constrained to it.

Upon analysing the WWW, it can be concluded that, as far as building HTML content is concerned, the main device that provides access to the network is the Uniform Resource Locator (URL). People may not know exactly what a URL represents, but they definitely know what its purpose is and what is achieved by using it; the WWW thus consists of content linked via URLs. vrtp borrows the same philosophy by exposing the networking layer through URLs. Although URLs provide a simple abstraction, a series of components is necessary to make vrtp a reality, as the remainder of the document will demonstrate.

  3. Data Streams

    The core of any application or system is the data it handles. A virtual environment has a rich set of data requirements with different characteristics, which makes it infeasible to adopt a single data protocol that satisfies all situations.

    For purposes of establishing terms, the word stream will be used to convey the concept of a virtual channel over which data is communicated. A stream reflects the underlying topology (one-to-one, one-to-many or many-to-many), depending upon the requirements of the data to be communicated and the resources available. The nature of a stream is either continuous or discrete: the former is suited to TCP connections, while the latter calls for UDP. Independent of its nature, every stream has an associated protocol stack containing all the protocols that are valid for that particular stream.
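The notion of a stream described above can be pictured as a small record. The following is only a minimal sketch in Python: the names `Stream`, `Topology`, `Nature` and `default_transport`, as well as the example protocol stacks, are assumptions made for illustration and do not come from the vrtp documentation.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Topology(Enum):
    ONE_TO_ONE = "one-to-one"
    ONE_TO_MANY = "one-to-many"
    MANY_TO_MANY = "many-to-many"


class Nature(Enum):
    CONTINUOUS = "continuous"  # the text pairs this with TCP
    DISCRETE = "discrete"      # the text pairs this with UDP


@dataclass
class Stream:
    """Hypothetical descriptor of one virtual channel (not a vrtp type)."""
    name: str
    topology: Topology
    nature: Nature
    # Protocols valid for this stream, lowest layer first (illustrative only).
    protocol_stack: List[str] = field(default_factory=list)

    def default_transport(self) -> str:
        """Transport suggested by the stream's nature, as stated in the text."""
        return "TCP" if self.nature is Nature.CONTINUOUS else "UDP"


# Two illustrative streams; the stack contents are assumptions.
updates = Stream("entity-updates", Topology.MANY_TO_MANY, Nature.DISCRETE,
                 ["IP multicast", "UDP", "application update protocol"])
scene = Stream("scenegraph-transfer", Topology.ONE_TO_ONE, Nature.CONTINUOUS,
               ["IP", "TCP", "HTTP"])
```

Keeping the protocol stack as data attached to the stream is one way of letting the application, rather than the framework, decide which protocols apply to which channel.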

    The problem domain of the network layer within virtual environments only becomes interesting when a large number of simultaneous users is considered.

    In [Brutzman95] a careful categorisation of the types of data within a virtual environment is presented, and it forms the foundation of vrtp. These categories are the primitive types from which all others may be constructed by composition. The categories are discussed in the following subsections, with further elaboration to stress the need to adopt the protocols most appropriate to each particular data stream. The same terminology is used to make it easier to match terms and concepts.

    1. Light Weight Interactions

      Light-weight interactions encompass all data that is discrete in time: the amount of information is small and each item is unique in the timeline, with no guarantee that similar data preceded it or will follow. The very nature of these interactions makes UDP an ideal candidate to support these streams because of its flexibility and time efficiency.

      Adopting UDP places the burden of network management upon the application, since UDP performs no error recovery, packet loss control or packet sequencing. The light weight streams should be implemented using a common structure for both ordinary unicast UDP and multicast. Given the wealth of existing protocols, each with its strengths and weaknesses, the application should decide which one is adopted. However, vrtp should allow the integration of different types of protocols, each the most appropriate to the task at hand, since there is a vast family of data types with either similar or completely different requirements.

      To clarify further what light weight interactions are, some concrete examples follow.
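One possible reading of "a common structure for both ordinary unicast UDP and multicast" is a single endpoint type whose only difference between the two cases is the group join. The sketch below uses the standard Python `socket` and `ipaddress` modules and is limited to IPv4; the class name `LightStreamEndpoint` and its methods are hypothetical and are not part of vrtp.

```python
import ipaddress
import socket
from dataclasses import dataclass


@dataclass(frozen=True)
class LightStreamEndpoint:
    """Hypothetical receive endpoint of a light weight stream (IPv4 only)."""
    address: str  # unicast host address or multicast group address
    port: int

    @property
    def is_multicast(self) -> bool:
        # 224.0.0.0/4 is the IPv4 multicast range.
        return ipaddress.ip_address(self.address).is_multicast

    def open_receiver(self) -> socket.socket:
        """Create a UDP socket; join the group only if the address is multicast."""
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        if self.is_multicast:
            sock.bind(("", self.port))
            # ip_mreq structure: group address followed by the local interface (any).
            mreq = socket.inet_aton(self.address) + socket.inet_aton("0.0.0.0")
            sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
        else:
            sock.bind((self.address, self.port))
        return sock
```

Code that reads from the returned socket does not need to know which case applies, which is the point of sharing one structure. Loss detection and sequencing remain the responsibility of whichever protocol the application layers on top.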

      1. Update Data

This is the data most frequently present on the network [ref]. Analysis of its nature shows that it has the following characteristics:

Naturally this is a linear approach; however, accuracy can be increased by raising the order of the equation. All systems that use dead reckoning rely on the predictability of the update data to avoid sending data unless the prediction error exceeds a pre-defined threshold. In [Singal thesis] this approach is improved by using past information, which allows smoother and more realistic results.
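The threshold rule can be sketched as follows, assuming first-order (linear) extrapolation from the last transmitted position and velocity. The names `EntityState`, `extrapolate` and `DeadReckoningSender` are hypothetical and chosen only for this illustration. The history-based refinement of [Singal thesis] is not shown.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class EntityState:
    """Hypothetical update record: timestamp, position and velocity."""
    t: float
    pos: Vec3
    vel: Vec3


def extrapolate(state: EntityState, t: float) -> Vec3:
    """Linear prediction: position + velocity * elapsed time."""
    dt = t - state.t
    return tuple(p + v * dt for p, v in zip(state.pos, state.vel))


def distance(a: Vec3, b: Vec3) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class DeadReckoningSender:
    """Sends an update only when receivers' prediction would be too far off."""

    def __init__(self, threshold: float, initial: EntityState) -> None:
        self.threshold = threshold
        self.last_sent = initial  # what every receiver is extrapolating from

    def maybe_update(self, actual: EntityState) -> Optional[EntityState]:
        predicted = extrapolate(self.last_sent, actual.t)
        if distance(predicted, actual.pos) > self.threshold:
            self.last_sent = actual
            return actual  # caller transmits this state
        return None        # prediction is good enough, nothing is sent
```

An entity moving in a straight line at constant velocity therefore generates no traffic after its first update, while a turn or a change of speed triggers a new update as soon as the accumulated error exceeds the threshold.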

This kind of data is ideal for multicast and, because of its weak reliability constraints, requires only a very light weight protocol. The decision of which protocol to associate with the stream should nevertheless be application driven, although a default setting may be provided.

      2. Event Data

This data is transmitted using light weight streams because of its small size; however, unlike update data, it is much more demanding with respect to reliability. The packets must be processed in the order in which they were transmitted, with no data missing. Analysis of the data reveals trends that are completely opposite to the above:

      3. Control Data

        This type of data places specific requirements on the network and may well require TCP connections in cases where total reliability is needed. Control data is responsible for a set of tasks such as congestion control, joining and leaving multicast groups, dynamic management of groups of clients and common service requests, amongst many others.

        The control data may either have a constant rate or arrive in bursts that result in high peaks of traffic. While the former may be appropriate for protocols such as SRM [Floyd97], the latter is not suited to them.

      4. System Data

        The requirements of system data are analogous to those of control data; however, it is a stream used for the purposes of the application. Examples of system data include opening an audio data stream and receiving an indication to retrieve portions of the scenegraph.

      5. Network Pointers

This type of data is basically a pointer to a bulk of data, which may be provided upon demand or announced to a multicast group. An example of a network pointer is a URL. Network pointers should also indicate the location of services and possibly the means of accessing them.

In [Brutzman95] this was a category in itself; however, on analysing the nature of the data, it is another form of light weight interaction. Nevertheless, network pointers have specific properties that differ from those described above, since they may be used either in the context of the application or in the content itself in the form of a URL.

    2. Heavy Weight Objects

      This type of data stream consists of transferring large chunks of data from a source to a destination over a point-to-point connection between two entities. The transfer of a portion of a scenegraph from a server to a client application is a clear example of the usage of this type of stream.

      The data involved is of a continuous nature and requires additional properties such as reliability and error checking, making it an ideal candidate for TCP connections. The time overhead of initially setting up a connection is minimal compared with the overall time spent transferring the data.

      Another type of heavy weight object is mobile code, which is transmitted and dynamically integrated into the runtime application kernel of the client as the need arises.

    3. Real Time Streams

This type of data connection is required when handling any form of streaming data such as video and audio. Naturally the implementation of these streams is highly dependent upon the protocol adopted by the application.

  4. VRTP Component Analysis

    The block diagram of Fig. 1 provides a general overview of the components currently identified as part of vrtp. Each of the displayed components is detailed in the following subsections.

    The existing documentation and publications allow a reasonably accurate understanding of the components' general functionality.

    Although an overview and an understanding of the principal concepts are possible, more serious problems arise when drilling into the detailed internals of each component. The obscurity worsens when trying to discern how the components interact with each other and what their combined functionality is. This is the direct consequence of the total absence of a clear specification of the interfaces of the components involved. It would be useful to have a description of various scenarios, based upon state transition diagrams, to understand better how everything works together.

    Also absent are the actual protocols to support the interaction between remote hosts. Although vrtp seems to be more of a framework, it was initially thought of as a protocol; however, no proposal of its details is currently available. Even though it is possible to use the Dial a Behaviour Protocol to define customised protocols, it is necessary to have, at the very least, a simple protocol indicating how a new host connects itself to the virtual environment. Proprietary protocols may then be used, but only after the connection is established.

    1. Universal Platform

      The universal platform, portrayed in Fig. 2, is composed of the Java Virtual Machine (JVM), Bamboo [Watsen98a, Watsen98b] plug-ins and the Adaptive Communications Environment (ACE) [Schmidt94].

      This is the starting point of the vrtp stack, upon which all the remaining components are based, with the Bamboo architecture as the underlying backbone. The core of Bamboo is a small kernel that contains the absolute minimum but allows extensibility through the integration of plug-ins. The current version of Bamboo is implemented in C++ along with the ACE library. Although ACE implements a great deal of functionality to support distributed applications, Bamboo still had to develop and implement the plug-in architecture itself. This provides the basis for the universal platform.

      An interesting issue is that most, if not all, of the work undertaken for Bamboo could have been greatly simplified had Java been adopted for the development of the kernel, since the aim is a cross-platform architecture that allows code modules to be integrated dynamically as required. It is true that the JVM is included, but merely to support the definition of behaviours, because of the integration of Java with VRML.

      The Java ClassLoader allows code to be loaded dynamically on the fly, subject to the security policies enforced by the active SecurityManager. The ability of classes to implement interfaces also allows the adoption of a delegation mechanism, in which incoming modules register themselves to receive events with certain characteristics from other modules, much as the event model of the JFC (which includes AWT and Swing) works. The implementation of such a strategy is greatly facilitated by the architecture and inherent properties of the Java language. The Java reflection model could also be adopted, thereby allowing runtime discovery of the interfaces of each module.

      Not all the advantages are technical, since adopting Java would imply the use of the Java Virtual Machine (JVM), which is widely supported by several leading vendors of Internet browsers. Should the JVM not be readily available, it is always possible to download the Java Plug-in. Jamboo (Java + Bamboo) would therefore benefit from a much wider dissemination strategy.

      The only advantage seen in the development of a generic platform is that C/C++ programmers can interact with the whole architecture without the penalties associated with mechanisms such as the Java Native Interface (JNI). However, the number of man-years invested in the JVM makes it an attractive alternative, so most likely (and hopefully) there will be the implementation as it stands and, in the near future, a second one in the form of Jamboo.

      When Bamboo is stripped of the actual implementation details, there does not seem to exist an established interface definition for plug-in interaction. It is true that this could be defined by the application, however Bamboo would benefit from establishing one that could be extended if necessary.

    2. Server

When analysing the topologies of existing systems, one is confronted with a chaotic mess in which no best-practice case dominates. The idea of vrtp was therefore to provide the means to support client-server and peer-to-peer topologies, with all the variations in between. This means that the essential networking capabilities would be provided, delegating to the application the responsibility for how the services are used and interconnected, and thus for defining the required topology. vrtp would therefore present no barriers to adoption by either the client-server or the peer-to-peer camp.

Unfortunately, according to the components proposed, this does not seem to be the case. The server component in particular, which is illustrated in Fig. 3, far exceeds the responsibilities of vrtp, since the optimisation mechanisms proposed by QuICk belong to the application domain.

The component as presented also provides the means for the local host to become a temporary server from which other hosts may download content, such as the avatar representation of that user.

This is the component responsible for the dissemination of heavy weight objects, as described in section 3. Considering that it contains an HTTP server, perhaps it should also be responsible for all the data streams that require high reliability, when the time overhead of establishing TCP connections may be considered negligible. Unfortunately, according to the existing documentation, this type of data stream is completely neglected by this component and by every other.

The layers of this component are as follows:

    3. Peer-to-Peer (Transient Behaviour)

This component is probably the best documented and appears to be the one to which most thought has been given. It is therefore only natural that a better understanding results in a more accurate description. All the data streams mentioned in section 3 that correspond to light weight interactions, network pointers and real-time streams are handled by this component. The communication model is thus based upon different variations of UDP and, naturally, multicast.

The various layers that are involved are depicted in Fig. 4 and are briefly explained as follows:

The area of interest manager (AOIM) has 3 tiers, of which the first is responsible for the actual joining and leaving of multicast groups (address plus port), along with the dynamic partitioning of the octree, which is the supporting data structure. Naturally, the partitioning of the octree depends upon the size and scale of the area of interest (AOI) of the user. While the principle is sound on a local basis, it does not say what the relation is between different hosts whose AOIs overlap. There is a need to understand how different AOIMs remain consistent, considering that, although their octrees are completely different, they still have groups in common.

The second tier applies filtering, following the principle that, no matter how large the address space is, there will never be enough multicast groups to give each entity its own address for per-entity subscription. The second tier thus groups together those entities that share a common protocol.

The third tier applies further filtering to the previous tier and isolates entities that meet particular characteristics, meaning that they share common interests.

Although the 3-tier approach is innovative, there remains the issue of how exactly the second and third tiers are implemented. It seems that both tiers are merely the application of two sequential filters to the packets received via the multicast group address. A clearer understanding of this approach of addressing combined with filtering can be found in [Levine99]. A more accurate description might therefore be a 2-tier approach in which the first tier is addressing and the second is filtering. The latter would not be restricted to the application of a single filter.
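The reading suggested above, addressing followed by an arbitrary chain of filters, can be sketched as below. This is an illustration of that interpretation only, not of the actual AOIM implementation: the names `Packet`, `InterestManager`, `join`, `leave`, `add_filter` and `accept`, and the packet fields, are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass(frozen=True)
class Packet:
    """Hypothetical packet as seen after reception."""
    group: str        # multicast group ("address:port") it arrived on
    protocol: str     # protocol identifier carried by the packet
    entity_kind: str  # some characteristic an application may care about
    payload: bytes


Filter = Callable[[Packet], bool]


class InterestManager:
    """Tier 1: group membership (addressing). Tier 2: any number of filters."""

    def __init__(self) -> None:
        self.groups: Set[str] = set()
        self.filters: List[Filter] = []

    def join(self, group: str) -> None:
        self.groups.add(group)

    def leave(self, group: str) -> None:
        self.groups.discard(group)

    def add_filter(self, keep: Filter) -> None:
        self.filters.append(keep)

    def accept(self, packet: Packet) -> bool:
        # Addressing: on a real network, packets for groups that were not
        # joined would never arrive; the check models that here.
        if packet.group not in self.groups:
            return False
        # Filtering: every registered filter must agree, in order.
        return all(keep(packet) for keep in self.filters)
```

Under this view, the second and third tiers of the original description are simply the first two entries in `filters` (one on `protocol`, one on a characteristic such as `entity_kind`), and further filters can be appended without changing the structure.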

The dispatcher currently seems to be tightly coupled with the application layer, where the scenegraph resides. An improved approach would be to adopt a delegation model in which events are announced and interested parties register to receive them, so that no prior knowledge of how the application is organised is necessary.
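A delegation model of this kind can be sketched in a few lines. The example is a generic publish/subscribe dispatcher written for illustration; the class name `Dispatcher`, its methods and the event type strings are assumptions and do not describe the existing vrtp dispatcher.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List

Listener = Callable[[str, Any], None]


class Dispatcher:
    """Announces events to whoever registered for them; knows nothing else."""

    def __init__(self) -> None:
        self._listeners: DefaultDict[str, List[Listener]] = defaultdict(list)

    def subscribe(self, event_type: str, listener: Listener) -> None:
        self._listeners[event_type].append(listener)

    def unsubscribe(self, event_type: str, listener: Listener) -> None:
        if listener in self._listeners.get(event_type, []):
            self._listeners[event_type].remove(listener)

    def announce(self, event_type: str, data: Any) -> int:
        """Deliver the event to every registered listener; return how many."""
        # Copy the list so a listener may unsubscribe while being called.
        listeners = list(self._listeners.get(event_type, ()))
        for listener in listeners:
            listener(event_type, data)
        return len(listeners)
```

The scenegraph, or any other part of the application, would register a listener for the event types it cares about, so the dispatcher needs no knowledge of how the application is organised.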

Although to a much lesser degree, the Peer-to-Peer component shares the same obscurity as the others, and several questions arise regarding the proposed and supported functionality.

    4. Monitoring

Monitoring is a major capability that has been overlooked in existing virtual environment systems; its inclusion will give the user more precise information than a mere statement that some node is inaccessible. As illustrated in Fig. 5, this component is further divided into 3 distinct layers:

 

    5. Client

The motivation for a client component remains unclear. Once again, vrtp may be exceeding its boundaries and becoming too heavyweight in the process. The component has 3 layers as illustrated in Fig. 5, which are briefly explained as follows:

  5. Conclusions

    The first and most important comment is that vrtp is a wonderful concept and the community has been yearning for such a thing for some time now. The main problem is that vrtp, as it stands, is not a protocol and much more closely resembles a framework whose main components have been identified. However, what is presented seems much too heavyweight and may well end up as yet another framework for sharing 3D content. Some of the components are necessary, but others belong to the application layer.

    Unfortunately, documentation with a well-defined design is lacking, both as an integrated overview and for the individual components. This may lead to working implementations that do not integrate well with one another.

    The design goal of VRML-NG, to reduce VRML to components in order to have a light kernel, should be used as a guideline in the design of vrtp. Another document [Oliveira99] contains proposed modifications and extensions to the current vrtp framework based upon the principle of a reduced functional kernel, which lowers the barriers to adoption not only by browsers but also by current systems.

  6. References

[Abrams98] H. Abrams, K. Watsen and M. Zyda, "Three-Tiered Interest Management for Large-Scale Virtual Environments", VRST’98, Taipei, 1998. http://watsen.net/Bamboo/vrst98.pdf

[Bray9?] T. Bray, J. Paoli and S. McQueen, "Extensible Markup Language (XML)", approved W3C recommendation, http://www.w3.org/TR/REC-xml

[Brutzman97] D. Brutzman et al, "Virtual Reality Transfer Protocol (vrtp) Design Rationale", WET ICE: Sharing a Distributed Virtual Reality, Massachusetts, June 1997. http://www.stl.nps.navy.mil/~brutzman/vrtp/vrtp_design.ps

[Capps98] M. Capps, "The QuICk Model for Virtual Environments System Optimization", Proposal for PhD dissertation, Naval Postgraduate School, Monterey, 1998

[Floyd97] S. Floyd et al, "A Reliable Multicast Framework for Light-Weight Sessions and Application Framing", IEEE/ACM Transactions on Networking, Vol. 5, Nº6, pp. 784-803, December, 1997.

[Gautier98] L. Gautier and C. Diot, "Design and Evaluation of MiMaze, a Multi-Player Game on the Internet", ftp://ftp_sophia.fr/rodeo/diot/acm_multimedia.ps.gz

[Levine99] B. Levine et al, "Consideration of Content and Receiver Interest in IP Routing", TBD, 1999

[Liao??] T. Liao, "Light-weight Reliable Multicast Protocol", ??

[Marrin99] C. Marrin, "Beyond VRML", Whitepaper, http://www.marin.com/vrml/private/EmmaWhitePaper.htm

[Oliveira99] M. Oliveira, "Extensions and Modification Proposal for VRTP", in preparation.

[Schmidt94] D. Schmidt, "The Adaptive Communications Environment: Object-Oriented Network Programming Components for Developing Client/Server Applications", 12th Sun Users Group, http://www.cs.wustl.edu/~schmidt/SUG-94.ps.gz

[VRML97] "Virtual Reality Modelling Language", ISO/IEC DIS 14772-1, April 1997

[Watsen98a] K. Watsen and M. Zyda, "BAMBOO – A Portable System for Dynamically Extensible, Real-time, Networked, Virtual Environments", IEEE VRAIS, Georgia, 1998

[Watsen98b] K. Watsen and M. Zyda, "BAMBOO – Supporting Dynamic Protocols for Virtual Environments", IMAGE Conference, Arizona 1998