Real-Time Massive Model Rendering
Jean-Daniel
Nahmias
Supervised by Anthony Steed
Computer Science Department,
Master’s Thesis
Msc Vision, Imaging, & Virtual Environments
September 2002
Abstract
Disclaimer
This report is submitted as part requirement for the MSc Degree
in Vision Imaging and Virtual
Environments at
There are various fields, including Computer Aided Design and Scientific Visualization, that require the interactive display of large and complex 3D virtual environments. There is a very strong motivation for the display performance of these systems to be high (i.e. real-time). The performance of display systems is usually measured in frames per second: the total number of images sent to the display in one second. This is a good indicator of the performance level of a virtual reality system, and should not be confused with the refresh rate of the display. Various studies have been carried out to find the lowest acceptable performance at which users are not impeded. Different studies come to different conclusions; performance levels that are considered reasonable for certain applications are considered insufficient for others. However, most developers of such systems would agree that in general 30 fps per eye is considered adequate and 60 fps is considered excellent [45]. When these frame rates are not achieved the results can vary from a user wearing a Head Mounted Display feeling motion sickness to the user not achieving the desired productivity from the system and experiencing frustration. Another motivation for these frame rates comes from interaction. The simplest form of interaction is an environment that the user can navigate (i.e. a walkthrough or fly-through). For this interaction to be continuous the system lag must be below a certain threshold [10]. This allows the user to interact with the system without having to wait for confirmation of his/her actions. One factor determining lag is frame rate; the others are input sampling rate, output sampling rate, and software overhead. Unfortunately, computer systems cannot display infinitely complex environments infinitely quickly. Therefore a compromise has to be made between visual quality and rendering speed. A multitude of solutions have been used to address this problem. One example of a very naïve solution, used time and time again in the domain of computer games, is for the renderer to simply draw as much as it can in its allocated time slot, starting from the objects closest to the user and ignoring the rest. Developers using this technique try to mask it with an effect called fog. This gives the user a very short viewing distance and is very noticeable. If a user were to navigate a vast virtual environment it would be very difficult to make any kind of navigational decision when the target cannot be seen. This shortfall is simply unacceptable given the current state of technology in this field.
The last few years have seen incredible development of high performance consumer level Graphics Processing Units. The rate of development of these GPUs has exceeded the growth rate predicted by Moore's law for general purpose processors.
This report will examine ways of improving the rendering performance of general 3D environments. It will do so by looking at techniques for high level optimization as well as low level optimization. The former was achieved with the use of carefully chosen algorithms, while the latter was obtained by careful use of hardware as well as implementation details. Section two will discuss high level optimization; this includes Visibility Determination, Level of Detail, and Impostors. Optimizing is an iterative procedure and therefore the development cycle of the software produced for this report was also iterative. Section three will describe the various cycles of development. The design decisions that were made along the way are discussed in section four. The design section is partitioned in two; firstly the Code Base Design is described and secondly the Algorithmic Design is explained. Section five discusses some of the various implementation details. This section is followed by an overview of the Testing procedures as well as the techniques that were used to Evaluate performance. Section seven presents the results of the different rendering algorithms as well as some of the parameter tuning. The report then concludes with suggestions for further improvement in section eight.
This section of the report will focus on some of the current state of the art algorithms used to optimize visibility determination, level of detail and impostors.
Visibility Determination: The fastest polygons are the ones that are not displayed. These are the polygons that are not viewed by the user: the polygons that lie outside the viewing frustum or that are occluded by other objects. Extensive research has been carried out to find algorithms that can quickly discard these polygons and therefore speed up rendering, since fewer triangles or polygons need to be drawn.
Level of Detail: Another tool that can be used to improve rendering performance is level of detail. Level of Detail refers to simplifying complex 3D geometric objects. There are various algorithms that can achieve this. However, once an object has been simplified one loses visual quality. This is why these algorithms are generally applied to objects that are positioned far away from the user. When objects are at a great distance from the viewer, the difference between the original object and its simplified counterpart is greatly diminished: when these objects are actually displayed they only occupy a small number of pixels, and therefore the lower detail version of the object is indistinguishable from its high detail counterpart.
Impostors: Impostors are used to fake geometric detail. There are various forms of impostors. One common impostor is a billboard; this is a 2D representation of a 3D object that is placed into the scene as a textured quad. This quad is continuously rotated as the viewer navigates so that it always faces the viewer. This is one type of impostor that falls into a general category referred to as textured impostors. There are also other forms of impostors, such as point based impostors. Triangles located far from the viewer may have a screen projection smaller than a pixel. In this case it is redundant to represent the geometric surface with a triangle; one can simply use a point based representation for the surface. This may not only speed up rendering performance but may also improve visual quality by reducing the aliasing of triangles.
Visibility determination has been an active research topic that has produced many algorithms over the past decades [30]. It is the process of determining which objects are visible and which are not. The 3D virtual environment objects presented in this report are represented with triangles, therefore this report will focus on visibility algorithms that determine which sets of triangles are visible at any given point in time. They are visible because they are inside the camera's View Frustum and are not occluded by other objects. One very simple solution to the visibility problem is hidden surface removal with the Z-buffer [23]. The Z-buffer is a buffer that stores the depth values of the currently projected triangles. Before a triangle is rasterized it is projected onto the Z-buffer, and if any of its projected pixels have a greater depth value than the current values, those pixels are not overwritten and subsequently not drawn. The Z-buffer offers a simple way of determining which pixels of which triangles should be drawn at the cost of a small amount of memory. However, this operation is performed almost at the very end of the rendering pipeline. This means that a triangle that is not visible still has to be sent down most of the rendering pipeline, wasting resources and impacting performance. This is why the Z-buffer is often used in conjunction with various other algorithms. Computing the exact visibility can be very expensive, more so than sending the extra triangles through the rendering pipeline, therefore most algorithms compute the potentially visible set of triangles (PVS). These usually fall into three categories.
Exact Visibility: This is the most precise form of visibility computation; exact visibility means that all visible triangles are sent down the pipeline, clipped if necessary, and that all non-visible triangles are culled. This type of visibility algorithm is usually very slow, and requires a vast amount of pre-processing if used for real-time applications.
Conservative Visibility: These algorithms determine a superset of the visible triangles. They remove most but not all of the non-visible triangles. The Z-buffer can then be used to remove the remaining triangles. These algorithms are not as precise as exact visibility algorithms but usually perform better in terms of speed.
Approximate Visibility: Approximate visibility algorithms are neither exact nor conservative and can therefore, in certain situations, actually remove triangles that would in fact be visible. If these algorithms are well designed, the errors that they produce are barely noticeable. The performance gain of these algorithms can sometimes more than justify their errors.
Classification of Visibility Culling Techniques
Cohen-Or, Chrysanthou, and Silva [30] use a set of features to classify visibility algorithms; these include:
· Conservative vs. Approximate:
Most visibility algorithms are conservative and are used in combination to produce exact visibility. Techniques such as the Z-buffer are often referred to as hidden surface removal algorithms and are used as a back end for other conservative or approximate culling algorithms.
· Point vs. Region:
Certain techniques calculate the visibility for a particular view point while others perform visibility calculations for regions of space. Point based algorithms tend to be more flexible than region based algorithms at the cost of performance or pre-processing.
· Pre-computed vs. Online:
Online algorithms perform the visibility calculations during each rendering loop, although they may use certain pre-computed data structures, while pre-computed algorithms compute all the visibility off line. One example of pre-computed visibility is the algorithm used in the Quake engine [46]. These algorithms usually require a great amount of pre-processing and usually impose great constraints on the user's navigation and the 3D environment.
· Image space vs. Object space:
This refers to where the visibility calculations are performed. Certain algorithms perform their calculations on the 2D projections of the 3D environment while others perform their calculations on the 3D objects before they are projected into 2D.
· Software vs. Hardware:
Certain techniques are implemented on specialized hardware, giving them a considerable speed increase. Usually a less efficient algorithm implemented in hardware will outperform a more efficient algorithm implemented on a general purpose processing unit.
· Dynamic vs. Static scenes:
Static scenes are environments that can be navigated by the user but have the restriction that their objects cannot move in relation to each other. This constraint allows spatial coherence to be taken advantage of with a certain amount of pre-processing. Dynamic scenes have no such restrictions.
· Individual vs. Fused occluders:
Both these terms are used when discussing occlusion culling. Occlusion culling is the process of determining which objects (occludees) are obscured by which other objects (occluders). Certain objects that are not occluders individually can become occluders when combined. Occluder fusion is a technique that takes advantage of this fact.
Visibility algorithms are usually designed around one fundamental principle: remove as much as possible as soon as possible. This reduces the amount of unnecessary processing of triangles along the rendering pipeline. One good method of achieving this is View Frustum Culling. The View Frustum is defined by the virtual camera. It represents a volume or region of space that the camera can see. View frustum culling is therefore a region based visibility algorithm.
Figure 1 View Frustum
Sutherland and Hodgman [23] developed a 2D algorithm that is very efficient at clipping polygons against clipping boundaries. This algorithm can trivially be extended to 3D; for a more detailed description please refer to [23]. However, with scenes containing millions of triangles it is still unfeasible to clip every triangle against the view frustum, no matter how efficient the algorithm. This limitation can easily be overcome using spatial coherence. The following section will describe various hierarchical data structures enabling one to exploit spatial coherence to speed up view frustum culling. All of these techniques are based more or less on the same principles. They work by testing regions or volumes; if a volume is discarded by the clipping algorithm then all objects contained by that volume can also be discarded. If the volume being tested is fully contained by the view frustum then all of its contents can be included; however, if the volume intersects the view frustum then further testing is required to determine exactly which objects or triangles are to be clipped. This last case is the worst case. Since most of the scene is usually either clipped or visible, the penalty for this worst case is greatly outweighed by its benefits. Spatial coherence usually reduces the number of clipping tests. Most of the data structures presented in the following section were developed to speed up ray-tracing but have been adapted for VFC. One can find good reviews of these data structures in the ray-tracing literature, such as [41].
The following data structures fall into one of two categories: top-down and bottom-up. The former is based on selecting a region of space and recursively subdividing it, while the latter works by starting at the bottom of the scene hierarchy and recursively grouping objects. This section will present some of the techniques and data structures used to exploit spatial coherence, starting with the simplest.
· Bounding Volumes
These are the smallest geometric volumes encapsulating an object, usually spheres or boxes. Because these are simple geometric shapes they are very easy to test against the View Frustum; one can therefore quickly discard objects that are outside the View Frustum without the need to test each triangle of the contained object. Spheres have the advantage of a quick intersection test, while boxes have the advantage of fitting geometric objects better (i.e. a tighter fit), thereby reducing the number of redundant intersection tests. Boxes come in two main varieties: the Axis Aligned Bounding Box (AABB) and the Object Oriented Bounding Box (OOBB) [22]. Again there is a trade-off between the cost of intersection and a tighter fit. Generally OOBBs are able to fit objects more tightly than AABBs. An OOBB can be calculated using Principal Component Analysis (PCA), a statistical tool that involves building a covariance matrix and finding its eigensolution. The eigenvectors can then be used to find the orientation of the OOBB. The size of the box can then be inferred by finding the largest distance from the centre of the object to any vertex along each of the eigenvectors; for more information on PCA see [18].
Figure 2 Bounding Boxes
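To make the test concrete, the following is a minimal C++ sketch of conservatively culling a bounding sphere against the six frustum planes. The type names are illustrative rather than taken from this project's code base, and plane normals are assumed to point into the frustum.

struct Vector3f { float x, y, z; };

struct Plane {
    Vector3f n;   // unit normal, assumed to point into the frustum
    float    d;   // offset such that n.p + d = 0 on the plane
    float distance(const Vector3f& p) const {
        return n.x * p.x + n.y * p.y + n.z * p.z + d;
    }
};

// A sphere lying entirely behind any one plane is outside the frustum.
// The test is conservative: spheres that straddle a plane are kept, and
// any invisible triangles are later resolved by the Z-buffer.
bool sphereInFrustum(const Plane frustum[6], const Vector3f& centre, float radius)
{
    for (int i = 0; i < 6; ++i)
        if (frustum[i].distance(centre) < -radius)
            return false;   // completely outside this plane: cull
    return true;            // inside or intersecting: keep
}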
·
Hierarchical
Bounding Volumes
This again is an extension of bounding volumes. It is a bottom-up approach where groups of objects are formed recursively. Objects close together can be grouped and encapsulated by another bounding volume, and so on. This can yield advantages in the sense that whole groups of objects can quickly be discarded. However, this comes at a cost: if a group intersects one of the view frustum's clipping planes, more intersection tests are required. The deeper the hierarchy, the greater the cost of the worst case. Balancing these data structures can be difficult [24].
Figure 3 Bounding Sphere Hierarchy
· BSP Trees
Binary Space Partitioning Trees were introduced by Fuchs et al. [47]. The BSP tree for a scene is constructed recursively; this is done by selecting a partitioning plane against which the scene's triangles are tested. This splits the triangles into two groups: those in front of the plane and those behind it. If a polygon intersects the plane it is split in two and each part is allocated to its correct section of the tree. This process is then repeated for each subsection recursively. Usually the partitioning planes contain a polygon from the scene during BSP tree construction. The end result is a tree where each node represents a partitioning plane and the leaves represent polygons. This data structure can be traversed in depth order. Starting at the root node, the tree is traversed recursively, visiting first the polygons on the far side of the node's plane with respect to the centre of projection, then the node itself, then the near side. This is based on the idea that objects on the same side of the plane as the centre of projection, or view point, cannot be obscured by polygons on the far side. BSP trees can yield significant speedups for static geometry such as architectural walkthroughs [13].
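The depth-ordered traversal can be sketched as follows, reusing the Plane and Vector3f types from the frustum sketch above; the node layout is an assumption for illustration. Visiting the far subtree first gives the back-to-front order used by the painter's algorithm; swapping the two recursive calls gives front-to-back.

struct BspNode {
    Plane    partition;   // partitioning plane stored at this node
    BspNode* front;       // subtree on the positive side of the plane
    BspNode* back;        // subtree on the negative side of the plane
    // ... polygons lying on the partitioning plane itself
};

void traverseBackToFront(const BspNode* node, const Vector3f& eye)
{
    if (!node) return;
    if (node->partition.distance(eye) >= 0.0f) {
        traverseBackToFront(node->back, eye);   // far side first
        // render this node's polygons here
        traverseBackToFront(node->front, eye);  // then the eye's side
    } else {
        traverseBackToFront(node->front, eye);
        // render this node's polygons here
        traverseBackToFront(node->back, eye);
    }
}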
·
Oct-Trees
The Oct-tree is the 3D equivalent of a Quad-tree [24]. It is constructed by taking an axis aligned cubic volume that encompasses the whole virtual environment. This cube is recursively subdivided into eight identically sized axis aligned cubes. If any of the child cubes satisfy certain conditions they are in turn subdivided, and so on. These conditions are met if a cube contains more than a certain number of polygons, the cube itself has not reached a minimum size threshold, and the tree has not reached its maximum depth. If any polygon intersects the boundary between cubes it can either be split or allocated to both children. Once the tree is constructed, view frustum culling is recursively performed starting with the root node. If any node intersects the view frustum then each of its children is tested.
Figure 5 Oct-Tree
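A sketch of the construction follows, reusing the Vector3f type from the earlier sketch. The helpers makeChildCube and distributeTriangles are hypothetical placeholders for the octant and triangle-assignment logic, and the stopping parameters match those discussed above.

#include <vector>

struct OctreeNode {
    Vector3f         min, max;      // axis aligned cube for this node
    OctreeNode*      children[8];   // all null if this node is a leaf
    std::vector<int> triangles;     // indices of intersecting triangles
};

void build(OctreeNode* node, int depth,
           int maxDepth, std::size_t minTriangles, float minSize)
{
    bool smallEnough = (node->max.x - node->min.x) <= minSize;
    if (depth >= maxDepth || smallEnough ||
        node->triangles.size() <= minTriangles)
        return;                                         // leaf: stop subdividing
    for (int i = 0; i < 8; ++i) {
        node->children[i] = makeChildCube(*node, i);    // hypothetical helper: octant i
        distributeTriangles(*node, *node->children[i]); // hypothetical helper: triangles
                                                        // straddling a boundary go to every
                                                        // child they touch (splitting them
                                                        // instead would grow the scene)
        build(node->children[i], depth + 1, maxDepth, minTriangles, minSize);
    }
    node->triangles.clear();                            // interior nodes hold no geometry
}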
·
Kd-Trees
Kd-trees, Oct-trees and BSP-trees can be seen as generalizations or specializations of each other. A Kd-tree can be viewed as a BSP-tree in which each partitioning plane is axis aligned. It is also recursively constructed, by selecting each axis in turn and then selecting a partitioning plane along that axis to subdivide the space. In practice Kd-trees look very similar to Oct-trees but with much better optimized bounding volumes.
As a user navigates a virtual environment in real-time, the difference between one frame and the next is usually marginal. One can exploit this temporal coherence to achieve culling optimizations. This is done by assuming that if an object is visible in one frame it is most likely to be visible in the next. Therefore each object that fails to be culled by the view frustum can be assumed to be visible for the next n frames, where n is a threshold. However, we cannot make the same assumption about objects that did get culled; therefore all objects that were culled in the previous frame must also be tested in the current frame. The worst possible case for this assumption is if the user rotates 180 degrees; in this case all of the previously visible objects are no longer visible but would still be sent down the rendering pipeline. One could however vary the n threshold depending on the user's movement, cancelling the previously made assumption if the user moves beyond a certain threshold.
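A minimal sketch of this caching scheme, assuming a per-object record and the sphere-against-frustum test sketched earlier:

struct CullState {
    int visibleUntilFrame;   // last frame this object is assumed visible
};

// Returns true if the object should be treated as visible this frame.
// Objects that pass the frustum test are cached as visible for the next
// n frames; objects that fail must be re-tested every frame.
bool isVisible(CullState& state, int currentFrame, int n,
               const Plane frustum[6], const Vector3f& centre, float radius)
{
    if (currentFrame <= state.visibleUntilFrame)
        return true;                                   // cached: skip the test
    if (sphereInFrustum(frustum, centre, radius)) {
        state.visibleUntilFrame = currentFrame + n;    // assume visible for n frames
        return true;
    }
    return false;                                      // culled: re-test next frame
}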
In the previous section we examined how to take advantage of View Frustum Culling and how to improve it by exploiting spatial and temporal coherence. This section will examine another area of visibility determination, occlusion culling, and in particular one of the more recent occlusion culling algorithms, developed by Zhang et al. [4] and known as HOM (Hierarchical Occlusion Maps). This algorithm makes no assumptions about the scene and works very well for densely occluded scenes. The algorithm can be broken down into two sections:
Construction of the occlusion map hierarchy.
Figure 6 Rendering Pipeline of HOM algorithm
Visibility culling with the use of that hierarchy.
Figure 7 Example of Hierarchical Occlusion Map
For a more detailed explanation of this algorithm please refer to [4]. This algorithm has some interesting properties. It is an image space algorithm that implicitly performs occluder fusion with the use of the occlusion maps.
Figure 8 Occluder Fusion
When rendering incredibly large datasets even the best visibility determination algorithms may not suffice. In these situations one must look for alternatives. One such alternative is Level of Detail algorithms, whose purpose is to simplify 3D meshes (Oliveira et al. [44]). Level of Detail algorithms work by taking the highest detail mesh and reducing the number of polygons used to represent that mesh in such a way as to preserve the topology of the mesh. The topology of the mesh defines its general appearance. Good algorithms achieve this by using a combination of error metrics. If the topology of the mesh is preserved then the lower detail mesh will appear almost indistinguishable from its higher detail equivalent when projected from a distance. Like visibility algorithms, Level of Detail algorithms fall into various categories. Erikson et al. [35] classified these algorithms into three basic categories.
· Geometry removal
This refers to simplifying a mesh by removing vertices or collapsing edges in its representation.
· Sampling
Here a lower level of detail mesh is reconstructed from the sampled data of the higher detail mesh.
· Adaptive subdivision
The different level of detail meshes are constructed by starting with a rough model that is recursively refined by adding detail locally where needed.
When examining various Level of Detail algorithms one can deduce that there are also other categories that these algorithms can fall into; these include:
· Continuous vs. Discrete
The most noticeable artefacts created by level of detail algorithms are usually exhibited when swapping the representation of an object from one level of detail to another; this can create popping or gaps. One approach to minimizing this effect involves blending from one level of detail to another; continuous algorithms such as progressive meshes [8] avoid the problem altogether. When we refer to continuous Level of Detail with regard to a polygonal representation, we generally mean that any level of detail can be obtained by sampling at any point in the space domain of the algorithm. Therefore an infinite number of meshes can be created between any two distinct levels of detail. In practice, however, we are limited by displays, which are discrete approximations.
· View Dependent vs. View Independent
Certain level of detail algorithms produce different results based on the viewing position while others don't. View dependent algorithms can produce more pleasing results at the cost of extra resources.
· Pre-processed vs. Online
Some algorithms such as progressive meshes [8] pre-calculate all the possible mesh simplifications while others perform their calculations online.
· Screen-Error Bounded vs. Screen-Error Unbounded
Screen errors are the projected errors: the difference between the projected simplified mesh and the projected high detail mesh. This error metric is much more accurate with respect to the perceived result. It also has the advantage of allowing users to select their own preference with respect to the visual quality vs. performance trade-off. It can also be used to reduce temporal coherence aliasing artefacts.
· Global vs. Local
Global algorithms apply a change globally across the whole mesh while local algorithms might apply changes to local features.
One of the more elegant solutions to the Level of Detail problem is that of Hoppe [8]. His algorithm, referred to as progressive meshes, is continuous. This report will give a brief overview of the algorithm; for a more detailed explanation please refer to his paper [8]. Progressive Meshes address many problems. However, this report is only concerned with the aspects that speed up rendering; it will therefore ignore certain details of Hoppe's paper [8]. Progressive meshes work on the premise that one can define a sequence of meshes differing in level of detail with the careful use of a sequence of edge collapses. However, the sequence of edge collapses must be carefully chosen, since it determines the quality of the approximating meshes. The edge collapse transformation is also invertible, by a vertex split transformation. Therefore one can represent an arbitrary mesh at any given level of detail by a coarse mesh and a sequence of vertex split transformations.
Figure 9 Example of edge collapse
The progressive mesh algorithm has two very nice features. The first is geomorphs. This feature enables a smooth transition between two meshes of arbitrary level of detail with the use of linear interpolation, see [8]. The other nice feature is selective refinement. This enables certain local features of the mesh to be refined. Progressive meshes can be constructed using various algorithms, which trade off speed against accuracy. One can use different metrics to determine the accuracy: Schroeder et al. [48] use a "distance to plane" metric while Hoppe [8] uses the Edist metric. Different metrics can be used to preserve different aspects of the mesh such as geometry, scalar attributes and discontinuity curves.
Figure 10 Mesh at different levels of detail
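The representation can be pictured as follows. This is an illustrative data layout following the structure described in [8], not Hoppe's actual code; the Mesh type and field names are assumptions, and Vector3f is as in the earlier sketches.

#include <vector>

struct VertexSplit {
    int      vs;       // vertex to split
    int      vl, vr;   // neighbouring vertices bounding the two new faces
    Vector3f newPos;   // position of the vertex introduced by the split
};

struct ProgressiveMesh {
    Mesh                     baseMesh;  // coarsest mesh M^0
    std::vector<VertexSplit> splits;    // applying the first k splits yields
                                        // mesh M^k; the inverse of each
                                        // vertex split is an edge collapse
};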
Impostor algorithms exploit image data from recently rendered frames. This data is reused in subsequent frames to avoid the expensive re-rendering of the complete 3D scene. This effectively fakes geometric surfaces with an approximate representation. One of the most common representations used to achieve the desired result is an opaque image textured onto a transparent polygon. These types of impostors are known as textured impostors. There are also other types, such as point based impostors [7] or line based impostors. There are many different impostor algorithms and, like most algorithms examined by this report, they can be classified into various categories.
· View dependent vs. View independent
View independent impostors can be used from any given view point while view dependent impostors need to be updated for different view points.
· Dynamically generated vs. Pre-calculated
Dynamically generated impostors are calculated on the fly (as they are needed) while pre-calculated impostors, as their name implies, can be computed off line.
Single layer view dependent impostors have the disadvantage of not containing any depth information. This makes them very vulnerable to errors; they cannot be used for points of view that differ to any significant extent from where the image was taken. To avoid serious visual degradation one must either dynamically update them very often, or pre-calculate a great number of impostors from various view points and swap between them frequently. Another alternative is to use multi-layered impostors. As opposed to a single layer impostor, which consists of a single transparent polygon onto which an image is mapped, multi-layered impostors consist of many of these transparent polygons. Every layer represents a slice of the object at a different distance from the user. Using a greater number of layers produces a better approximation.
Figure 11 Example of Multi-Layered Impostors
As opposed to single layered impostors, multi-layered impostors are stored using RGBz, where z is the depth information. Most of today's graphics hardware supports RGBA, where A is the alpha component; this alpha component can be used to store the depth information. When these impostors are rendered, multiple semi-transparent polygons are drawn as a pyramidal stack with the view point at the apex. For each layer, only the areas where the stored z component closely matches the distance from the view point to the layer are drawn. To avoid cracks, slightly overlapping depth intervals are drawn onto every layer. See figure 12.
Figure 12 Example of Overlap in Multi-Layered Impostors
The advantage that multi-layered impostors have over single layered ones is accuracy. Multi-layered impostors have a much longer life span, because the view point can move greater distances without producing as much error as with single layered impostors. They do however have the drawback of taking longer to generate. Another disadvantage of this algorithm is that it produces aliasing artefacts in the form of gaps for objects that are occluding. Please refer to [5] for a more detailed description of this algorithm.
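As an illustration, a minimal OpenGL sketch of drawing such an impostor is given below, under the assumption that each layer has been baked into its own RGBA texture in which only the texels whose depth falls in that layer's (slightly overlapping) interval are opaque; the alpha test then discards everything else. The drawLayerQuad helper is hypothetical.

#include <GL/gl.h>

void drawLayeredImpostor(const GLuint* layerTextures, int numLayers,
                         void (*drawLayerQuad)(int layer))
{
    glEnable(GL_TEXTURE_2D);
    glEnable(GL_ALPHA_TEST);
    glAlphaFunc(GL_GREATER, 0.5f);                 // keep only opaque texels
    for (int i = numLayers - 1; i >= 0; --i) {     // layers drawn back to front
        glBindTexture(GL_TEXTURE_2D, layerTextures[i]);
        drawLayerQuad(i);                          // textured quad for slice i
    }
    glDisable(GL_ALPHA_TEST);
    glDisable(GL_TEXTURE_2D);
}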
This project focuses on optimizing real-time rendering with the use of various techniques. However, optimization is an iterative process. One must optimize the appropriate areas of the rendering pipeline in order to achieve the maximum possible benefit. Please refer to section 4.2 to see the different rendering pipelines that were created during the development cycles. The optimization was initially performed at a high level and then progressively refined at lower levels. It would be very costly in terms of time to focus on optimizing a certain algorithm's implementation, only to come to the conclusion that another algorithm altogether would achieve far superior results. Due to the nature of this problem the development of this software went through several cycles or iterations. Each cycle can be broken down into the same steps:
· Requirements Analysis
· Design
· Implementation
· Testing
· Evaluation
The following sections of this report will focus on each one of these steps and their various iterations. Each step produces information that is passed on to the following step. Once the last step was reached one iteration was completed, and the results were used to start the next iteration until satisfactory results were achieved. Note that the first iterations had a tendency to focus more on design; these were usually high level optimizations, while the last iterations, which were low level in nature, focused more on implementation. This type of life cycle is often referred to by software engineers as the prototyping approach to developing software. It is often used when the initial requirements are not clearly defined. In this case the high level requirement is simple: "to render large and complex 3D environments in real time". However, the lower level requirements were not initially defined. For research oriented projects prototyping approaches are often employed, and this suited the needs of this project perfectly.
Before embarking on any form of design, a feature set for the application had to be established. One subset of the application feature set is the rendering feature set. Because the goal of this project was to explore how to improve the rendering performance of geometrically complex environments, the rendering feature set was kept to an adequate minimum.
· Rendering Large and Complex 3D Environments
· Navigation of these Environments
· Benchmarking Capabilities
· Simple Lighting (Point & Directional Lights)
· Material Properties
· Texturing
· Geometric Objects
This was the initial rendering feature set; as development progressed this feature set was expanded to encompass all the acceleration techniques.
This section of the report will not only look at the design of the software but will study some of the design decisions affecting implementation. Due to the nature of this work not all the design could be completed initially and some of it changed during development. Good design at the beginning of a project can save a lot of unnecessary work later on. The design can be broken down into two sections: Algorithmic Design and Code Base Design. Most of the code base design could be done at the outset, while the algorithmic design went through a few iterations and changed along the way. Certain considerations that could potentially influence the design had to be taken into account during this phase of development, and these had a strong impact on the choices made during this project. One of these was the target platform. The target platform for this project was consumer level hardware, not a high end special purpose SGI Onyx. To be more precise, the target platform was a PC with a good consumer level OpenGL capable 3D GPU (GeForce3 or higher), a Pentium IV or equivalent CPU, and 512MB of RAM. It would be unrealistic to expect to achieve anything near real time performance when rendering models containing 1 million+ polygons without the use of hardware acceleration. For this project this hardware acceleration came in the form of a GeForce3. However, we are still waiting for a fully programmable GPU and have to rely on the features provided by the current crop of GPUs. This imposes certain limitations in terms of which algorithms can be hardware accelerated. With this in mind, certain algorithms were chosen over others on the merit that they could be hardware accelerated or that they could leverage the hardware acceleration. Another underlying thought that influenced certain design decisions was portability. This software is not portable to other platforms; however, it was designed and implemented with care in order to ease a potential transition to another platform.
It was important to develop a strong code base that could be built upon and easily extended. The code base was also designed with simplicity in mind. It included the data structures of the application, certain basic tools, and the main application functionality. An Object Oriented approach was used to design the code base. The core of the code base was the scene graph: the set of objects that represent the 3D environment. Scene graphs are common to all 3D rendering applications and usually have a lot in common. Rather than using one of the more commonly available open source scene graphs, such as Open Scene Graph, a simple one was developed specifically for this project. This added the benefits of flexibility as well as simplicity. Benchmarking was also a fundamental part of the code base. To improve the software, appropriate benchmarks had to be performed. The benchmarks were performed by recording camera paths and then playing them back while recording the time spent on each frame, see the Testing and Evaluation Section 6. Another important tool that was developed as part of the code base was a 3D Studio Max plugin. There are many commonly used 3D file formats (VRML, DXF, 3DS, OBJ, etc.) and supporting most of them directly would be a great undertaking. 3D Studio Max is a very successful 3D authoring package that supports most 3D file formats. Therefore, by developing a 3DS Max export plugin that can export any scene, one implicitly supports all the formats supported by this package.
The scene graph was kept to a minimum and extended as needed. It supports Lights, Cameras, Geometric Objects, and Materials. The following will describe all the classes used to represent the scene graph through their iterations of design.
Figure 15 Scene Graph Hierarchies
Camera
Vector3f View Reference Point
Vector3f View Up Vector
Vector3f View Plane
VF View Frustum

This class represents the view of the user. It also supports writing its state to a file; this was used to record camera paths that were later used for benchmarking. User navigation is supported by this class's methods. This class also contains the View Frustum class.
View Frustum
Plane TOP, BOTTOM, FRONT, BACK, LEFT, RIGHT

This class represents the View Frustum used for certain visibility calculations. It contains the six planes that define the viewing volume.
GeoObject
Vector3f* Vertices
Vector3f* Vertex Normals
Vector3f* Bounding Box Vertices
Vector2f* UVs
Integer* Triangle Indices
Material* Material
Matrix4f Current Transformation Matrix

This class is used to represent the geometric objects occupying the virtual world. The geometric information is represented by indexed triangles. There are many other data structures that could be used for this purpose, such as the winged edge data structure [23]. For the purpose of this report indexed triangles sufficed. The object's location is determined by the use of the current transformation matrix [23]. This data structure also contains bounding boxes, see section 4.2.2.
Material
Colour Specular
Colour Diffuse
Colour Ambient
Float Transparency
Float Shininess
Char* Map Path
Integer Id
Char* Material Name

This class is used to describe the material properties of each object. It contains all the necessary data to implement Phong shading [23]. The map path is used if the material has a texture associated with it.
Light
Vector3f Light Position
Vector3f Light Direction (used if light is directional)
Colour Light Colour
Integer Type (i.e. Directional or Point)
Float Intensity

One of the rendering features was to have simple lighting. This class provides support for this.
Oct-Tree Node
Vector3f* Boundary
Octree Node* Children Nodes (pointers to eight children nodes)
Integer Intersecting Object Index
Integer** Intersecting Triangle Index
Integer Depth of tree

This class was used to represent the Oct-Tree. The boundary defines the cube for that node. Each node contains pointers to its children, should it have any. Nodes also contain pointers to all intersecting geometric objects, and for each of these geometric objects the node contains an index into all of its intersecting faces. The depth of each node is also stored, see section 2.1.2.
Scene Graph
Light Lights
Material Materials
GeoObject Geometric Objects
Oct-Tree Node Oct-Tree
Camera Cameras

The scene graph contains the lights, cameras, materials, geometric objects and the Oct-Tree. This class is used to manage all these objects and provide a few necessary utilities. It can read and write all its data to a file store. For a much more detailed description of all these classes please refer to the Appendix.
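To make the indexed-triangle layout concrete, the following C++ sketch mirrors the GeoObject listing above. The field names are paraphrased from the listing, Vector3f is as in the earlier sketches, and the accessor comment is illustrative.

struct Vector2f { float u, v; };

struct GeoObject {
    Vector3f* vertices;         // shared vertex positions
    Vector3f* vertexNormals;    // one normal per vertex
    Vector2f* uvs;              // texture coordinates
    int*      triangleIndices;  // 3 * numTriangles indices into vertices
    int       numVertices;
    int       numTriangles;
};

// Triangle t's corners are vertices[triangleIndices[3*t + 0]],
// vertices[triangleIndices[3*t + 1]] and vertices[triangleIndices[3*t + 2]];
// shared vertices are stored, and transformed, only once.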
In order to view various 3D models stored in a multitude of different file formats, a 3DS Max plugin was developed. 3DS Max is a professional 3D authoring package that supports a very large feature set. It also comes with an SDK that allows developers to extend its capabilities by developing plugins, ranging from advanced renderers to animation tools. The plugin developed for this report was an exporter that allows any scene currently loaded in 3DS Max to be saved to a file directly supported by the scene graph of this renderer. The plugin supports a subset of the Max scene graph: materials with one diffuse texture map, basic lights and cameras, and geometric objects. 3DS Max supports a wide variety of representations for geometric objects, including indexed face sets and curved surfaces in the form of NURBS, Bezier curves and various others. The plugin takes any one of these representations and converts it to an indexed triangle list before saving it to disk. The plugin is very simple to use: the user enters the file menu, selects export, and then selects the format, filename and location to which he/she wants to save the file. Having selected the format designed for this renderer, the user is presented with a very simple options menu. This menu allows the user to select which parts of the scene to export; it also has a debug option that was implemented for testing purposes.
Figure 16 Screen Shot from 3DS Max Plugin
4.1.2.1 3DS Max Plugin File Format
The file format used to store the scene graph was designed along with the 3DS Max plugin. It is an ASCII format. This was chosen because ASCII files can easily be modified by a standard text editor, no care has to be taken with regard to the endianness of the platform, and they can very easily be debugged. The drawback of ASCII is that the space required to store these files is greater than a binary equivalent, and they take a greater amount of time to read and write. The structure of the file is however very simple, and is as follows.
Example File *.JDN

Number of Materials = 1
Material Name = Sky
Ambient = 0 0 0
Diffuse = 0.2 0.552941 0.176471
Specular = 0.9 0.9 0.9
Shine = 0.25
Shine Strength = 0.05
Transparency = 0
Material ID = 0
Map Path = C:\3dsmax4\maps\Skies\Duskcld1.jpg
:
Number of Lights = 1
Pos = -53.9952 455.163 0
Direction = 0 0 0
Colour = 1 1 1
Intensity = 1
Fall = 45
Type = 2
:
Number of Objects = 1
Material ID = 0
Num Vertices = 8
Num Faces = 12
Num textured vertices = 12
Vert = 44.89 92.909 0
:
Vert = 78.8883 171.119 60
Face = 0 2 3
:
Face = 4 6 2
Textured Vert = 0 0 0
:
Textured Vert = 1 1 0
Textured Face = 9 11 10
:
Textured Face = 3 2 0
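A hedged sketch of reading the start of such a file with C stdio is shown below; the exact record grammar is an assumption based on the example above.

#include <cstdio>

bool readSceneHeader(const char* path)
{
    FILE* f = std::fopen(path, "r");
    if (!f) return false;
    int numMaterials = 0;
    // Literal words in the format string must match; whitespace is flexible.
    if (std::fscanf(f, " Number of Materials = %d", &numMaterials) != 1) {
        std::fclose(f);
        return false;
    }
    for (int i = 0; i < numMaterials; ++i) {
        char  name[256];
        float ambient[3];
        std::fscanf(f, " Material Name = %255s", name);   // single-word names only
        std::fscanf(f, " Ambient = %f %f %f",
                    &ambient[0], &ambient[1], &ambient[2]);
        // ... remaining material fields, then the lights and objects
    }
    std::fclose(f);
    return true;
}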
The application code handles the application initialisation and the user interface, and also contains the main rendering loop.
Application Initialization
This part of the application consists of initializing the hardware and the Scene Graph. Once the Scene Graph has been instantiated, it reads the file containing the 3D scene data. Once this has been completed, the necessary data structure pre-processing occurs.
User Interface
The user interface is a simple one. Navigation of the environment is performed with fly-through interaction. The mouse controls the viewing direction, while the keyboard allows the user to move forwards or backwards along the viewing direction. Other keys allow a strafing action. The user can also enable certain viewing cues, including the visualization of certain data structures used to perform visibility operations; these structures are not actually part of the 3D environment. Lastly there are two more keys. One allows the user to record a camera path that is subsequently stored to the file system. The other allows the user to play back a recorded path while engaging a benchmarking feature that stores the results in a benchmark file.
The rendering loop and the pre-processing are discussed in the algorithmic design section 4.2 of this report, while more detail with regard to benchmarking is presented in the evaluation and results sections 6 & 7.
Figure 17 Initialization
Figure 18 Rendering Pipeline
The above diagram shows the various iterations of the rendering pipeline. Please note that this is a very simplified representation of each pipeline and is by no means complete. However, it does illustrate the differences between each pipeline as well as the evolution the pipeline went through in order to achieve real time performance. Each rendering pipeline is split in two: the first part represents the calculations performed by the CPU, while the second part represents the calculations performed by the GPU. As GPUs evolve they are taking more of the burden off the CPU. This section will review each iteration of the rendering pipeline and discuss the different algorithms selected to perform different pipeline tasks. It will also justify the decisions that were made. These decisions were made with the careful use of evaluation tools; the evaluation techniques will be covered in greater depth in section 6.
The GPU can perform visibility with View Frustum Culling. The GeForce3 also supports a certain level of occlusion culling, hardwired into its architecture. Finally, hidden surface removal is performed with the Z-buffer. The first step was simply to read the scene graph, throw all the triangles across the AGP bus, and see what happened. As expected this yielded very poor results. It was quite obvious from the benchmarks (see Appendix) that too many triangles were being passed across the AGP bus and into the GPU. Care was taken to minimise the state changes that would have to occur in the GPU in order to set up materials; therefore all the objects were sorted by material.
One very simple way of reducing this burden was to remove as much as possible as soon as possible. However, there was no point in implementing an exact View Frustum Clipping algorithm when the GPU could perform this task much more efficiently than the CPU. The easiest solution was to implement conservative View Frustum Culling (see Section 2.1.1) with the use of bounding volumes. For this task an Axis Aligned Bounding Box [section 2.1.2] was created for every geometric object in the scene graph. These bounding volumes were then tested against the view frustum. If an intersection test with an AABB succeeded, the whole geometric object was passed on to the GPU. This produced much better results than the previous version of the rendering pipeline, see Section 7. This was an improvement; however, it was nowhere close to achieving adequate real-time performance. The target scene contained in excess of six thousand objects, and testing each one of these objects' bounding volumes was too costly. The evaluation of this pipeline led to the conclusion that the rendering loop was CPU limited, see Section 6.
Having come to the conclusion that the pipeline was CPU limited, the CPU conservative VFC had to be optimized. It was time to start taking advantage of spatial coherence. This could potentially reduce the CPU burden greatly. A decision had to be made between spatial subdivision structures. With a little foresight it was clear that even spatial subdivision would not suffice and occlusion culling would eventually have to be used; the spatial subdivision structure would therefore have to be useful to both VFC and occlusion culling. Two structures satisfied both those needs perfectly: Oct-Trees and Kd-Trees. The Oct-Tree data structure was selected because of its simplicity and elegance. The construction of an Oct-Tree [Section 2.1.2] is very simple and the only parameter that has to be set is the depth at which the tree stops. This depth can depend on two factors: the minimum number of triangles contained by any one node and the minimum size that a node can reach. These two factors can affect the performance of both the VFC and the occlusion culling, more on this in Section 7. There are also small implementation decisions that can be made when implementing Oct-Trees [Section 5]. The Oct-Tree creation was performed as a pre-processing step, straight after the scene graph had been read. A conservative hierarchical View Frustum Culling algorithm would walk the tree and remove all the non-visible nodes, as sketched below. This produced very good results, but with a variable frame rate: in certain situations it yielded very adequate frame rates, while in others the frame rate dropped considerably. At this point in the development, most of the triangles that were not contained by the View Frustum were culled very early in the pipeline. However, there were still triangles that were not visible due to occlusion, and these were still being sent down most of the pipeline until they were removed by the Z-buffer.
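The walk itself can be sketched as a simple recursion over the Oct-Tree, with types as in the earlier sketches; aabbIntersectsFrustum is a hypothetical box-against-planes test analogous to the sphere test shown in section 2.

#include <vector>

void cullOctree(const OctreeNode* node, const Plane frustum[6],
                std::vector<const OctreeNode*>& visible)
{
    if (!node) return;
    if (!aabbIntersectsFrustum(node->min, node->max, frustum))
        return;                      // node and its whole subtree are culled
    if (!node->children[0]) {        // leaf: its contents survive the VFC
        visible.push_back(node);
        return;
    }
    for (int i = 0; i < 8; ++i)      // recurse into the children
        cullOctree(node->children[i], frustum, visible);
}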
It was clear from the previous evaluation that the geometric load was still too great; however, the previous rendering pipeline was not fill rate limited [see Section 6]. This indicated that some of the fill rate could be sacrificed in order to reduce the geometry throughput. This is referred to as balancing the rendering pipeline between fill rate and geometry throughput [see Section 7]. The goal of this iteration was to achieve occlusion culling [see Section 2.1.4], further reducing the geometric load on the GPU. The first choice was which type of occlusion algorithm should be selected. Because of the spare fill rate, an image space occlusion algorithm was selected over an object space one. This not only enables the rendering pipeline to be balanced, but image based algorithms also tend to be much more flexible in terms of the information they can be given, as well as implicitly supporting occluder fusion. There was still a limiting factor imposed by the architecture of the target platform: the AGP bus. Unfortunately, the AGP bus can only achieve high throughput in one direction. This means that one can feed a lot of data to the GPU, but it is incredibly slow to retrieve data from the GPU. The problem with some of the previously discussed image based occlusion culling algorithms is that they require reading the Z-buffer back from the GPU. Doing this for each frame would have a very serious impact on performance. An alternative had to be found. This alternative came in the form of an OpenGL extension, supported by NVidia and various other manufacturers, that will soon make it into the OpenGL 2 standard. This extension allows queries to be sent to the graphics card; these queries return the number of Z-buffer pixels that were written to by a set of primitives. I.e. one can create a query, proceed to draw a triangle, and subsequently obtain the number of pixels that were projected from that triangle onto the depth buffer. For a pixel to be written to the depth buffer it has to have a lower depth value (i.e. be in front). With this extension in mind the following algorithm was implemented. It consists of firstly performing hierarchical View Frustum Culling of the Oct-Tree. The Z-buffer from the previous frame is kept. Occlusion queries are then generated and used in conjunction with the rendering of every Oct-Tree node that was not culled previously. Once these nodes have been rendered, the queries return exactly which nodes are visible (the Z-buffer is then cleared), and the triangles contained in this subset of nodes are selected and rendered. It is important to note that this is an approximate visibility algorithm. This means that in certain situations it will not render objects that should in fact be visible. This seems worse than it actually is. The worst possible case is for the camera to rotate 180 degrees about its View Up Vector, for all the objects contained within its newly placed viewing frustum to be further away than all of the previously rendered objects, and for the previously rendered objects to have taken up most of the Z-buffer. This would create an almost empty frame where very few objects would be visible. However, it would only happen for one frame; the following frame would clear the previous Z-buffer and all the visible objects would reappear. In practice this very rarely occurs. The faster the frame rate, the less time that worst case frame would stay present on the screen and the less noticeable this aliasing would be. This potential error could also be capped to a maximum by introducing a pre-emptive mechanism that would detect very extreme movement from one frame to the next and clear the Z-buffer should such an event occur. The advantage this approach has over a more conservative equivalent is that it does not need to perform any kind of occluder selection. Therefore no occluder pre-processing steps are required and no resources are used in determining which objects should be selected as occluders. I would also like to mention that this algorithm was originally implemented with some skepticism but was found to produce very good results.
The evaluation of the 4th pipeline yielded very good results; real-time performance with the test scenes was achieved. However, it did introduce a fill rate limitation. This was due to the fact that all the Oct-Tree nodes not removed by the hierarchical View Frustum Culling were being rendered for each and every frame. One way to improve this limitation was to use temporal coherence [see Section 2.1.3]. This can be achieved by considering an Oct-Tree node visible for a certain number of frames. By doing so, it does not need to be rendered for occlusion queries for the specified number of frames. If its contents are in actual fact not visible, it is the responsibility of the View Frustum Culling on the GPU to discard them. This allows greater tuning of the graphics pipeline and enables much finer grained control when balancing fill rate vs. geometry throughput [see Section 7]. Another very simple way to double the fill rate was to introduce back face culling: the process of removing all triangles that are facing away from the camera. This is very easily implemented with one OpenGL function call (see the snippet at the end of this section). It does impose the constraint that double sided surfaces are no longer supported. Unfortunately, the scenes used to test this pipeline are architectural in origin, so the geometry describes the topology of the objects far more than surface detail. This has a tendency to make level of detail less useful, because further reductions of triangles alter the topology. A little experimentation was carried out and produced very poor results. An alternative had to be found. The only alternative left was impostors. Textured impostors are very prone to serious aliasing artefacts, and for that reason a point based approach was selected. This was also very well suited to the previous pipeline. As one may recall, extensive use was made of one OpenGL extension that enabled the number of Z-buffer pixels that were touched to be retrieved. This could also be used to determine which object representation to use. A threshold could be set: if the threshold was exceeded the original object representation would be used, otherwise a point impostor would replace it. However, due to time constraints the point based impostor algorithm was not completed and therefore could not be tested. Therefore the only difference separating the 4th and 5th rendering pipelines was the temporal coherence technique.
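For reference, the back face culling mentioned above amounts to the following two OpenGL calls:

glEnable(GL_CULL_FACE);   // discard triangles wound away from the camera
glCullFace(GL_BACK);      // at the cost of losing double sided surfaces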
Pseudo Code for Final Rendering

ClearColourBuffer();
pCamera->look();                                     // Update camera location
pCamera->m_pFrustum->calculateFrustum();             // Calculate new view frustum
pOctree->CullOctree(pptrNodes, *(pCamera->m_pFrustum)); // Hierarchical view frustum culling
glGenOcclusionQueriesNV(g_NumVisibleNodes, ptrOcclusionQueries); // Generate occlusion queries
glDisable(GL_LIGHTING);                              // Disable lighting
glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE); // Disable colour buffer writes
glDepthMask(GL_FALSE);                               // Disable writing to Z-buffer
for (int i = 0; i < g_NumVisibleNodes; i++) {        // For each node surviving VFC
    glBeginOcclusionQueryNV(ptrOcclusionQueries[i]); // Begin occlusion query
    pptrNodes[i]->DrawAABB();                        // Render bounding box (skipped if the
                                                     // node was seen within the last 3 frames)
    glEndOcclusionQueryNV();                         // End occlusion query
}
glEnable(GL_LIGHTING);                               // Enable lighting
glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);     // Enable colour buffer writes
glDepthMask(GL_TRUE);                                // Enable write access to Z-buffer
glClear(GL_DEPTH_BUFFER_BIT);                        // Clear Z-buffer
GLuint pixelCount;
for (int j = 0; j < g_NumVisibleNodes; j++) {
    glGetOcclusionQueryuivNV(ptrOcclusionQueries[j], GL_PIXEL_COUNT_NV,
                             &pixelCount);           // Retrieve occlusion query result
    if (pixelCount > 0 && pixelCount < 20) {
        pptrNodes[j]->DrawImpostor();                // Fewer than 20 pixels touched: impostor
    } else if (pixelCount > 0) {
        pptrNodes[j]->DrawNode();                    // Otherwise draw true representation
    }                                                // Zero pixels touched: node is occluded
}
glDeleteOcclusionQueriesNV(g_NumVisibleNodes, ptrOcclusionQueries); // Delete queries
This section will focus on implementation details. It is very important to take extreme care while implementing any kind of system that requires high levels of performance. The key to the success of this project was selecting good algorithms that could truly harness the power of the GPU. It is therefore necessary to fully understand the underlying architecture of the platform. With this understanding one can tailor certain aspects of the implementation to take advantage of the hardware or, in other cases, avoid some of its pitfalls.
Gone are the days when highly optimized rendering loops were completely implemented in assembly. Developing in assembly allows the developer full, unrestricted control over the hardware. However, as compilers have evolved and can produce optimized assembly, there is no longer the need to develop at such a low level most of the time; the average developer would not be capable of matching a compiler's performance when it came to developing a full-blown application in assembly, although there are some exceptions to the rule. This renderer was fully developed in C++. This language was selected because it is Object Oriented, which matches the design; it is easier to port; and it has now reached a certain level of maturity, with many good compilers that can produce optimized code. Some may argue that standard C can yield better performance, but it is my opinion that, if care is taken when implementing software in C++, this is generally not the case. The advantages that C++ offers in terms of productivity greatly outweigh its disadvantages.
To harness the power of the GPU one must be able to interface with it. This functionality is provided by an API, and there are really only two to choose from: OpenGL and DirectX. The distinction between them is gradually fading away; DirectX has finally reached a level of maturity that OpenGL obtained a while ago. The choice was simple; once again portability comes to mind, and OpenGL has much wider support for a variety of different platforms. However, with the very rapid evolution of GPUs the OpenGL standard has not had the time to catch up. This has led to a multitude of different OpenGL extensions, each supporting different GPUs. Unfortunately, using these extensions has become unavoidable. There is hope with the new OpenGL 2 standard, which aims to unify most of these conflicts. To achieve good performance levels one must be very careful in choosing which OpenGL functions to use when drawing triangles. OpenGL supports a whole variety of different ways of drawing triangles, including: immediate mode, display lists, vertex buffers, compiled vertex buffers, and vendor-specific extensions such as NVidia's Vertex Array Range (VAR).
All of these methods, with the exception of compiled vertex buffers, were implemented, with very different results. Immediate mode is the easiest to use and probably the most flexible, however this comes at a very high price: its performance is slow (see Section 7). The reason for this is that too much information has to be sent across the AGP bus, including many OpenGL commands, at least three and usually four per triangle (one for each vertex and one for the vertex normal). This can be greatly reduced by wrapping these commands into a display list that can then be cached by the GPU. Display lists improve performance to a small extent, at the cost of only being usable for static geometry. Vertex buffers improved the performance very substantially, because much less information has to be sent across the AGP bus and shared vertices only have to be sent once. However, the fastest way to draw triangles on NVidia's GPUs is to use one of their extensions.
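As an illustration, the following is a minimal sketch of the first three submission paths for an indexed triangle mesh; the array names (vertices, normals, faces, numFaces) are hypothetical, not taken from the renderer:

    // Immediate mode: roughly four OpenGL commands per triangle cross the AGP bus.
    glBegin(GL_TRIANGLES);
    for (int f = 0; f < numFaces; ++f) {
        for (int v = 0; v < 3; ++v) {
            int idx = faces[f * 3 + v];
            glNormal3fv(&normals[idx * 3]);
            glVertex3fv(&vertices[idx * 3]);
        }
    }
    glEnd();

    // Display list: the same commands, recorded once and cached by the driver/GPU;
    // only usable for static geometry.
    GLuint list = glGenLists(1);
    glNewList(list, GL_COMPILE);
    /* ... the immediate mode block above ... */
    glEndList();
    glCallList(list); // one call per frame replays the cached commands

    // Vertex arrays: shared vertices are stored once and referenced by index.
    glEnableClientState(GL_VERTEX_ARRAY);
    glEnableClientState(GL_NORMAL_ARRAY);
    glVertexPointer(3, GL_FLOAT, 0, vertices);
    glNormalPointer(GL_FLOAT, 0, normals);
    glDrawElements(GL_TRIANGLES, numFaces * 3, GL_UNSIGNED_INT, faces);

The VAR extension goes one step further by placing the arrays in memory the GPU can pull from directly, which is how the final pipeline submits its geometry (see SetupVAR in the appendix).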
There were certain details that had to be addressed when implementing the Hierarchical Oct-Tree Culling. Firstly, during the pre-processing stage when the Oct-Tree is constructed, certain triangles intersect two or more nodes, and a decision had to be made on how to deal with them. They could be split into two or more triangles, each associated with its appropriate node; this would however greatly increase the number of scene triangles, which is not desirable. The alternative is to associate the triangle with more than one node. To avoid drawing the same triangle multiple times, such triangles can be flagged when they are rendered each frame, thereby avoiding overdraw.
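A minimal sketch of this per-frame flagging, using a frame counter so the flags never need clearing; the names g_FrameID and m_LastDrawnFrame are assumptions for illustration, not taken from the renderer:

    extern int g_FrameID; // incremented once at the start of every frame

    struct SharedTriangleRecord {
        int m_LastDrawnFrame; // frame stamp, initialised to -1
    };

    void drawSharedTriangle(SharedTriangleRecord& tri) {
        if (tri.m_LastDrawnFrame == g_FrameID)
            return;                       // already rendered this frame by another node
        tri.m_LastDrawnFrame = g_FrameID; // stamp it for the current frame
        // ... submit the triangle to the GPU ...
    }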
When building the Oct-Tree there is one crucial decision that has to be made: determining when to stop subdividing the tree. One trivial case is to stop subdividing if a node does not contain any triangles. However, it would be very costly to build a tree where each node contained one triangle; this would also defeat the purpose of the tree. Oct-Trees are usually subdivided until each node contains no more than a certain number of triangles, or until the node has reached a minimum size. The thresholds used determine how the final tree is balanced. Obtaining a good balance is also crucial: it is a trade-off between VFC and Occlusion Culling efficiency as well as GPU cache hits. This balance is quite tricky to achieve and also very dependent on the type of scene being rendered (see Section 7). Sparse scenes, where the objects are quite spread out, will favour a coarser Oct-Tree, while very dense scenes will favour a finer-grained Oct-Tree. Then again, if the Oct-Tree is too fine-grained this will slow down the VFC and minimise vertex cache hits on the GPU, while obtaining more accurate Occlusion Culling. Since building the Oct-Tree is an offline process, the best way to balance the tree would be to perform some kind of cost-benefit analysis for various parameters and a particular scene.
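The stopping rule itself reduces to a few lines; this sketch mirrors the criterion used in CreateNode in the appendix, where MAXTRIANGLES and MAXRECURSION are the two balancing thresholds:

    #define MAXRECURSION 32   // maximum recursion depth (see Oct-Tree.h)
    #define MAXTRIANGLES 2000 // maximum triangles per leaf node

    // Subdivide only while both thresholds allow it.
    if (tri > MAXTRIANGLES && m_depth < MAXRECURSION) {
        SubDivideNode();              // node still too heavy: recurse into 8 children
    } else {
        if (tri > 0) AssignFaces();   // leaf node: attach its triangles
    }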
There are also small implementation details with regard to the Hierarchical View Frustum Culling of the Oct-Tree that can yield minor performance advantages. Obtaining a front-to-back ordering of the tree will improve rendering performance, because modern GPUs also have some kind of Occlusion Culling in their pipeline. Some other small advantages can be gained by not testing the children of a node that falls completely within the View Frustum, as well as by testing only a subset of a node's children depending on which vertices of the parent node were contained within the Frustum. All of these are small examples of simple implementation details that help squeeze that little bit of extra performance out of the culling algorithms.
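A minimal sketch of the first of these shortcuts follows. It assumes a three-way containment test (outside / intersecting / fully inside) instead of the boolean isCubeInFrustum used in the appendix, so classifyCube and the traversal helpers are hypothetical names:

    enum Containment { OUTSIDE, INTERSECTS, INSIDE };

    void cullNode(Octree* node, bool parentFullyInside) {
        Containment c = parentFullyInside
                            ? INSIDE                              // inherited: no test needed
                            : classifyCube(node->m_BBoxVertices); // hypothetical 3-way frustum test
        if (c == OUTSIDE) return;                                 // cull the whole subtree
        if (node->isSubdivided()) {
            for (int i = 0; i < 8; ++i)
                cullNode(node->child(i), c == INSIDE);            // children of a fully
        } else {                                                  // contained node skip the test
            markVisible(node);
        }
    }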
This part of the report will focus on the little details that had to be addressed during the implementation. Most of these only became apparent after testing. To test the system a model of the Millennium Dome was used. This model was more or less polygon soup, which presented its own challenges. Firstly, the ordering of the triangles was not consistent, meaning that consistent back face culling could not be achieved. Given the size of this model (1 million+ triangles) it was not feasible to fix it by hand. Fortunately [26] developed an algorithm that could automate this process; it could also be used to fix normals that were pointing in opposite directions and therefore producing certain lighting artefacts. Another issue that quickly came to attention was one of numerical accuracy. Careful consideration had to be taken when constructing bounding boxes. The low precision of the Z-Buffer on consumer-level graphics cards also created tearing artefacts. This could be remedied in various ways. Firstly, the viewing volume should be kept as small as possible while still containing all the visible objects; this can be achieved by dynamically moving the front and back clipping planes. It could also be achieved by first rendering the front half of the scene and subsequently rendering the second half after having appropriately changed the clipping planes; this would however turn the rendering pipeline into a two-pass pipeline. It was found that moving the front clipping plane as far out as possible, combined with moving the back clipping plane as close as possible, used in conjunction with the NVidia VAR extension and Back Face Culling, eliminated most if not all of the tearing artefacts.
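A minimal sketch of the single-pass remedy, assuming nearPlane and farPlane have been derived each frame from the bounding boxes of the visible nodes (the variable names are illustrative):

    // Tighten the depth range to what is actually visible: a narrower
    // near-to-far interval spreads the Z-buffer's limited precision over
    // less depth, which reduces the tearing (Z-fighting) artefacts.
    glMatrixMode(GL_PROJECTION);
    glLoadIdentity();
    gluPerspective(45.0,      // field of view in degrees
                   aspect,    // window aspect ratio
                   nearPlane, // pushed as far out as the scene allows
                   farPlane); // pulled as close in as the scene allows
    glMatrixMode(GL_MODELVIEW);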
Figure 19 Above: example of tearing artefacts; below: example of triangle ordering artefacts
Various tests were performed to verify the algorithms and their implementations. Two test scenes were utilised for this project. The first was a relatively simple scene of islands and a sailing boat; this small scene could be quickly loaded and was used mainly to test all the small changes that were progressively made to the code base. If everything worked with this scene, the large Millennium Dome scene was then used. This scene was stored in a large file that would take a few minutes to load and pre-process. Some of the testing was performed with visual cues; one example of this was that when generating AABBs and Oct-Trees they would be displayed on the screen. Other tests involved rendering various camera positions with different algorithms and comparing their results. Various debugging tools were also employed when necessary, as well as output of various data to the console during the rendering loop. Because this project was of a prototyping and research nature it was not tested as rigorously as a commercial product, but the necessary care was taken to ensure that the results are accurate.
Figure 20 Simple Terrain Test Scene
Figure 21 Screen shot of the development environment, displaying the rendering window containing AABBs and the console window indicating the number of objects passing the VFC
Figure 22 Screen shot showing the Oct-Tree partitions
Evaluation is absolutely crucial in order to optimize any type of algorithm. The techniques used to evaluate also have to be relevant to the type of problem. Benchmarks therefore have to be carried out on the various algorithms; these benchmarks involve being able to consistently perform timing analysis on the same scene with the same set of camera locations. The timing functions used also have to be very accurate; sub-millisecond accuracy is therefore needed. With the correct evaluation techniques, such as profiling, one can pinpoint which algorithms are currently the limiting factor.
The appropriate timing functions have to be used. These have to be very accurate, have a low latency, and must not require too much system resources. Various timing functions were tested so that the appropriate one could be selected. When timing it is very important to know how much time the actual timing function itself takes. The following are the results of the timing function benchmarks; note that all these timing functions were benchmarked consistently. The output below covers the various timing functions available under Windows on the PC platform, measuring the performance of each over different frequencies of use. For each method it lists the number of times the function was called, the total time taken for all the iterations, and the average time each call takes.
QueryPerformanceFrequency() freq = 0 1193182

method 0: QueryPerfCntr..() 100 times     tot: 0 760     avg time: 6.36952e-006
method 0: QueryPerfCntr..() 500 times     tot: 0 3842    avg time: 6.43992e-006
method 0: QueryPerfCntr..() 1000 times    tot: 0 11492   avg time: 9.63139e-006
method 0: QueryPerfCntr..() 10000 times   tot: 0 98118   avg time: 8.22322e-006
method 1: GetTickCount() 100 times        tot: 0 10      avg time: 8.38095e-008
method 1: GetTickCount() 500 times        tot: 0 20      avg time: 3.35238e-008
method 1: GetTickCount() 1000 times       tot: 0 72      avg time: 6.03428e-008
method 1: GetTickCount() 10000 times      tot: 0 233     avg time: 1.95276e-008
method 2: TimeGetTime() 100 times         tot: 0 30      avg time: 2.51429e-007
method 2: TimeGetTime() 500 times         tot: 0 111     avg time: 1.86057e-007
method 2: TimeGetTime() 1000 times        tot: 0 214     avg time: 1.79352e-007
method 2: TimeGetTime() 10000 times       tot: 0 2634    avg time: 2.20754e-007
method 3: Pentium internal high-freq cntr() 100 times     tot: 0 11    avg time: 9.21905e-008
method 3: Pentium internal high-freq cntr() 500 times     tot: 0 20    avg time: 3.35238e-008
method 3: Pentium internal high-freq cntr() 1000 times    tot: 0 38    avg time: 3.18476e-008
method 3: Pentium internal high-freq cntr() 10000 times   tot: 0 320   avg time: 2.6819e-008
Again, various techniques were used to evaluate the rendering pipeline. Evaluating the pipeline as a whole was achieved with camera paths that were recorded and played back. When they were played back the exact time spent on each frame was recorded and written to a file. This data was used to tweak certain parameters such as the Oct-Tree balancing [see Section 7]. Timing the frame rate only gives a certain idea of the performance levels: it allows the rendering pipeline to be evaluated as a combination of algorithms, but it does not give a good indication of each individual algorithm's performance, and other techniques had to be employed for this. The camera paths were also carefully created. Different algorithms work more or less well in a variety of situations, so different camera paths were created to try to explore the strengths and weaknesses of each algorithm or of a particular rendering pipeline. The test model of the Millennium Dome had the advantage of containing a whole variety of different situations that could challenge the pipeline: it contained dense as well as sparse separations of objects, and it had small and very large objects, from the thin wire structures used to hold up the roof to the large towers used to hold the lighting equipment.
The key to refining the rendering pipeline is to find bottlenecks: the specific stages that are limiting the overall pipeline. When discussing rendering pipeline bottlenecks the terms Fill Rate limited, CPU limited and Geometry limited are usually used.

Fill Rate limited: This can be thought of as the pixel rate. A GPU can only write a limited number of pixels to its buffers; various factors influence this, such as the GPU's memory architecture. This limitation occurs at the GPU. Detecting whether a rendering pipeline is fill rate limited is very simple: one just has to increase the display resolution and monitor the frame rate; if the rate drops at higher resolutions then the pipeline is fill rate limited. The causes are rasterisation and resolution. For example, textures use up fill rate: the higher the resolution of the texture, the more fill rate is used. Image based occlusion culling also utilises fill rate, because more objects, such as the Oct-Tree nodes, need to be rasterised in order to test them for occlusion. Fill rate limitation usually occurs towards the end of the pipeline.
CPU limited: This is a very general term that refers to any bottleneck occurring at the CPU stage of the rendering pipeline. One example would be using a conservative View Frustum Culling algorithm on AABBs with a scene containing thousands of objects: the time taken by the CPU to perform the culling stalls the rest of the pipeline. Detecting CPU limitation is again usually straightforward; profiling the application can pinpoint which part of the rendering pipeline calculated by the CPU is using up the most resources. CPU limitation usually occurs at the beginning of the pipeline.
Geometry limited: This refers to the maximum amount of geometry, or triangles, that can be sent down a rendering pipeline for a given frame rate. The causes of this limitation are much more difficult to pinpoint: they can be at the CPU, the GPU, or the bus that links them. Because of its nature, this type of bottleneck can occur more or less anywhere in the rendering pipeline.
During every iteration of development the appropriate tools were employed to find the bottlenecks and remove them. There are various tools on the market that can help this process. One example is VTune, a tool designed by Intel that specifically analyses code for their range of CPUs. It produces information indicating which portions of code the CPU spends most of its time executing. It also comes with a very specialised compiler that optimises code to maximise its performance on the Pentium processor; AMD also has an equivalent tool. One can use such tools to help speed up the process of finding bottlenecks. An alternative is to time very specific portions of the rendering pipeline, which was done for this project. By doing so, one can determine the exact percentage of the frame rendering time that is spent at each stage of the pipeline. This can be a very powerful tool for establishing where to focus attention. For example, the second rendering pipeline [see Section 4.2] used conservative view frustum culling on bounding boxes; with the use of specific timing one could clearly see that a disproportionate amount of the frame time was spent at this stage. With this in mind the next pipeline concentrated on optimising this culling procedure with the use of spatial subdivision. All of these tools were used and reused time and time again with each generation of the rendering pipeline. They were also used to optimise certain parameters such as the Oct-Tree balancing. The following section of this report will present in much greater detail some of the results that were achieved, all of which were obtained with such tools.
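A minimal sketch of such per-stage timing, built on the Win32 high-resolution counter benchmarked above; the helper names are illustrative, not from the renderer:

    #include <windows.h>

    static LARGE_INTEGER g_Freq, g_T0;

    void timerInit()  { QueryPerformanceFrequency(&g_Freq); } // once at start-up
    void stageBegin() { QueryPerformanceCounter(&g_T0); }

    // Returns the elapsed time of the stage in seconds.
    double stageEnd() {
        LARGE_INTEGER t1;
        QueryPerformanceCounter(&t1);
        return double(t1.QuadPart - g_T0.QuadPart) / double(g_Freq.QuadPart);
    }

    // Usage in the render loop (CullOctree as in the appendix):
    //   stageBegin();
    //   pOctree->CullOctree(pptrNodes, *(pCamera->m_pFrustum));
    //   double vfcSeconds = stageEnd(); // proportion of the frame spent in the VFC stage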
This section of the report will present some of the different benchmark results that were obtained when exploring different algorithms and their implementations, as well as some of the effects different parameters had on performance. Please note that during the Development Cycle [see Section 3] not only the algorithms changed but also the implementation: different OpenGL routines were used and various GPU-specific extensions were utilized, and these extensions add substantial performance to the application. Before taking the final benchmarks, some of the previous rendering pipelines were re-implemented using some of the more advanced extensions so that they could be compared more objectively. This did, however, change some of the results that were obtained during the earlier development; these effects will be described in more detail where appropriate. This section will explore the various pipelines, their performance and their properties. Please note that camera 2 represents a walkthrough while camera 3 is a worst case; camera 3 exploits the rendering pipeline's weaknesses. Camera 1 is a simple path that consists of a 360 degree rotation about the centre of the scene.
The following is the technical specification of the system these various benchmarks were run on.

CPU: AMD Athlon 650MHz
GPU: NVidia GeForce3
Driver Revision: Detonator 28.32
RAM: 768 MB SDRAM 100MHz
OS: Windows XP
HD: Ultra Wide SCSI Atlas IV
BUS: AGP 2X
Motherboard: ASUS K7M revision 2
Monitor Refresh: 100 Hz
The PC system used to conduct the benchmarking was outdated; it did have a good GPU, but one major bottleneck in this system was the AGP 2x bus, which did not permit the GPU to be fully exploited. Other benchmarks were performed on more recent and up-to-date systems, and the results obtained on those machines were substantially better. Again this was most probably because those machines had a much faster bus from the CPU to the GPU.
As mentioned previously [see Section 4.2.3] it is very important to balance the Oct-Tree. In this case this was achieved by choosing two thresholds: the first is the maximum recursion depth and the second is the maximum number of triangles a leaf node can contain. While constructing the Oct-Tree, a node is subdivided if it contains more triangles than the maximum threshold and has not reached the maximum depth. These two thresholds balance the tree. Benchmarks were taken with four different threshold values for three given camera paths.
Series 1: Max Depth = 12, Max Triangles = 2000 (Pipeline 5)
Series 2: Max Depth = 32, Max Triangles = 2000 (Pipeline 5)
Series 3: Max Depth = 46, Max Triangles = 1000 (Pipeline 5)
Series 4: Max Depth = 32, Max Triangles = 4000 (Pipeline 5)
As one can see from the above graphs, these values can affect the frame rate quite substantially. The best threshold values also vary from scene to scene; one must therefore select the most appropriate values for each particular scene individually. These thresholds set the trade-off between culling optimisation and hardware cache hits. Greater depth will improve culling, both for the View Frustum Culling and by optimising the Occlusion Culling through using up less fill rate. However, greater depth will also decrease the number of cache hits in the GPU when drawing triangles. Both these factors can impact performance greatly, and the Oct-Tree balance controls the trade-off between them. One should consider both the hardware platform and the type of scene to determine which values are best suited. It would also be possible to implement a rendering application that would perform certain pre-processing tests to determine how to balance the Oct-Tree for a given GPU and a given scene. This could be achieved simply by running various sets of benchmarks to find the best performance and balancing the tree appropriately.
Another threshold that can have an impact on the frame rate is the temporal coherence value used for the occlusion culling. This value indicates how long a node is considered visible after it has passed the occlusion test. Please note that if the node is culled by the view frustum this assumption is cancelled.
Series 1: No temporal coherence (Pipeline 4)
Series 2: 3 Frames (Pipeline 5)
Series 3: 5 Frames (Pipeline 5)
From these graphs we can see that using temporal coherence does improve rendering performance. One must however be very careful: if the user does not navigate the scene very erratically this value can be relatively large, but if there is a lot of user navigation then temporal coherence may actually slow down the rendering pipeline. This is the trade-off between small and large thresholds for temporal coherence. One possible improvement would be to make this value dynamic, based on the user's navigation: should navigational movement increase, this value would decrease, and conversely should it decrease, this value would increase. This would add a certain flexibility to the pipeline.
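For reference, a hedged reconstruction of the rule as it appears in drawGL in the appendix; the point at which m_Life is reset to the threshold is not shown there, so that line is an assumption:

    const int kCoherenceFrames = 3; // Series 2 value; Series 3 used 5 frames

    // Occlusion pass: only nodes whose coherence has expired are re-tested.
    if (node->m_Life == 0) node->DrawAABB(); // rendered inside an occlusion query

    // Draw pass: a node is drawn if enough pixels passed the Z-test, or if it
    // was recently visible. Frustum culling elsewhere resets m_Life to 0.
    if (pixelCount > 50 || node->m_Life > 0) {
        if (pixelCount > 50) node->m_Life = kCoherenceFrames; // assumed reset point
        else                 node->m_Life--;
        node->DrawNode();
    }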
Finally we can compare the differences between the different rendering pipelines.

Series 1: View Frustum Culling on Bounding Boxes (Pipeline 2)
Series 2: Hierarchical View Frustum Culling (Pipeline 3)
Series 3: Hierarchical View Frustum Culling, Occlusion Culling and the use of Temporal Coherence of 5 frames (Pipeline 5)
There is a problem with these graphs: they seem to indicate that pipeline 3 performs better than pipeline 5, which should not be the case. The reason for this contradiction is that, to perform these benchmarks, pipeline 3 was re-implemented; in doing so, asynchronicity was added between the Hierarchical View Frustum Culling performed on the CPU and the rest of the stages performed on the GPU [see Section 4.2]. This enabled the GPU's pipe to get fuller than in pipeline 5, so the GPU was kept busy most of the time and its resources were maximised. This produced superior results to the pipeline 5 implementation, where unfortunately the GPU is stalled waiting for the Occlusion Queries [see Section 4.2.4] to return. However, during the development of pipeline 5 the initial implementation of pipeline 3 did not contain this asynchronicity, and its performance was significantly inferior to pipeline 5 most of the time. These results suggest that a new algorithm for the occlusion culling should be designed, or that the current algorithm should be re-implemented to take advantage of asynchronicity between the CPU and the GPU [see Section 8]. This would probably yield much better performance than the current implementation.
This report has examined various techniques to speed up the rendering of large and complex environments. It has emphasised some of the more subtle parameter tuning that is required to get the most out of a particular algorithm, and it has covered the importance of implementation: how best to get the most out of the target platform. Optimising a rendering pipeline to gain substantial speed improvements is not a black art; this report has shown that it can be achieved with careful testing, benchmarking and experimentation. The end pipeline is by no means optimal; there is still a lot of potential to improve performance further. One great way to achieve substantial gains is to keep the GPU pipeline full of data. This would involve introducing asynchronicity between the CPU and the GPU [see Section 7.4]; one example would be to start sending culled triangles to the GPU before the CPU has finished culling [see Section 7.4]. This would effectively overlap some of the rendering pipeline stages and keep the GPU busy for more of the rendering time. The same principle could also be applied to the occlusion stages of the pipeline. This kind of technique should yield some very substantial improvements in performance.
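As a sketch of what this overlap could look like using the NV_fence extension already loaded in setupExt in the appendix; the interleaving shown and the helper names (submitBatch, cullNextBatch) are assumptions, not an implemented pipeline:

    // Hypothetical overlap of CPU culling with GPU drawing using NV_fence.
    GLuint fence;
    glGenFencesNV(1, &fence);

    submitBatch(previousVisibleNodes);        // issue last frame's culled draw calls
    glSetFenceNV(fence, GL_ALL_COMPLETED_NV); // mark the end of the submitted batch

    cullNextBatch();                          // CPU: view frustum culling runs while the GPU draws

    if (!glTestFenceNV(fence)) {              // is the GPU still working on the batch?
        glFinishFenceNV(fence);               // block only when we actually must
    }
    glDeleteFencesNV(1, &fence);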
Another way to further optimise the rendering pipeline would be to use K-D Trees as a replacement for Oct-Trees. The K-D Tree has a much more optimal bounding structure than the Oct-Tree. This would introduce certain questions that would have to be addressed, such as how to balance the K-D Tree, but it would also improve the View Frustum Culling as well as the Occlusion Culling. The Occlusion Culling algorithm introduced in this report is far from optimal, and looking at alternatives would yield greater insight; as the report has shown, in certain circumstances the Occlusion Culling can reduce performance. Developing a rendering pipeline that could dynamically modify itself depending on the rendering circumstances could potentially achieve greater overall performance while overcoming this weakness; for example, such a pipeline would be able to disable Occlusion Culling if there were no good occluders. Another technique that could help speed up the pipeline would be to build an adjacency graph on top of the spatial subdivision data structure. This would enable the data structure to be traversed in a specified order, which could be used to speed up the occlusion algorithm as well as maximise performance on the GPU, since GPUs perform better when they are given triangles in front-to-back order. These are just some of the ways to further improve on the work performed for this project. It would also have been beneficial to have fully explored point-based impostors; this would have improved performance as well as possibly removed some aliasing artefacts. Point-based impostor algorithms would be ideal for this type of project.
Figure 23 Screen shot of Millennium Dome
Figure 24 Screen shot of Millennium Dome
Figure 25 Screen shot of Millennium Dome
Figure 26 Screen shot of Millennium Dome
Figure 27 Screen shot of Millennium Dome
Figure 28 Screen shot of Millennium Dome
[1] William
V.Baxter III, Avneesh Sud, Naga K.Govindaraju, Dinesh Manocha, “GigaWalk:
Interactive Walkthrough of Complex Environments”, UNC-CH Technical Report TR02-013
[2] D.Aliaga, J.Cohen, A.Wilson, H.Zhang, C.Erikson, K.Hoff, T.Hudson, W.Stuerzlinger, E.Baker, R.Bastos, M.Whitton, F.Brooks, D.Manocha, “A Framework for the Real-Time Walkthrough of Massive Models”, UNC-CH Technical Report TR# 98-013
[3] Thomas
A.Funkhouser and Carlo H.Sequin, “Adaptive Display Algorithm for Interactive
Frame Rates During Visualization of Complex Visual Environments”,
[4] H.Zhang, D.Manocha, T.Hudson, K.E.Hoff III, “Visibility Culling using Hierarchical Occlusion Maps”, University of North Carolina, Chapel Hill
[5] Gernot
Schaufler, “Per-Object Image Warping with Layered Impostors”, EUROGRAPHICS Workshop ‘99
[6] Marc
Stamminger, George Drettakis, “Interactive Sampling and Rendering for Complex
and Procedural Geometry”
[7] Michael Wimmer, Peter Wonka, Francois Sillion, “Point-Based Impostors for Real-Time Visualization”
[8] H.Hoppe, “Progressive
Meshes”, Computer Graphics SIGGRAPH ’96
Proceedings
[9] Xavier Decoret, Gernot Schaufler, Francois Sillion, Julie Dorsey, “Multi-Layered impostors for accelerated rendering”, EUROGRAPHICS ‘99
[10]
Scott MacKenzie and Colin Ware, “Lag as a Determinant of Human
Performance in Interactive Systems”, Proceedings
of the ACM Conference on Human Factors in Computing Systems INTERCHI ’93,
488-493
[11] Seth
J.Teller, Carlo H.Sequin, “Visibility Preprocessing For Interactive
Walkthroughs”, ACM Computer Graphics,
Volume 25, Number 4, July 1991
[12] N.Greene,
M.Kass and G.Miller, “Hierarchical Z-buffer Visibility”, Proc of ACM Siggraph, pages 231-238, 1993
[13] J.Airey,
J.Rohlf, and F.Brooks, “Towards image realism with interactive update rates in
complex virtual building environments”,
Symposium on Interactive 3D Graphics, pages 41-50, 1990
[14] D.Blythe and
T.McReynolds, “Programming with OpenGL: Advanced course”, SIGGRAPH ’96 Course Notes, 1996
[15] S R.Ellis,
“Nature and Origins of Virtual Environments: A Bibliography Essay”, Computing Systems in Engineering Vol. 2,
No.4, pages 321-347, 1991
[16] Paeth,
“Graphics Gems 5”, AP Academic Press
[17] Weiss, “Data
Structures & Algorithm Analysis”, Addison
Wesley
[18] Press, Teukolsky, Vetterling, Flannery, “Numerical Recipes in C”, Cambridge
[19] Foley, van
Dam, Feiner, Hughes, “Computer Graphics Principles and Practice”, Addison Wesley
[20] Angel,
“Interactive Computer Graphics with OpenGL”,
Addison Wesley
[21] Woo, Neider,
Davis, “OpenGL Programming Guide”, Addison
Wesley
[22] Alan Watt,
“3D Computer Graphics 3rd Edition”, Addison Wesley
[23] Mel Slater, Anthony
Steed, Yiorgos Chrysanthou, “Computer Graphics and Virtual Environments”, Addison Wesley
[24] Alan Watt, Mark Watt, “Advanced Animation and Rendering Techniques: Theory and Practice”, Addison Wesley
[25] Tomas Akenine-Moller, Eric
Haines, “Real-Time Rendering”, A.K.
Peters Ltd
[26] J. Oliveira, A.Steed, “Determining Orientation
of Laser Scanned Surfaces”, SIACG 2002, 2-5 Julho 2002
[27] Florent Duguet and George Drettakis, “Robust
Epsilon Visibility”, Proceedings of ACM SIGGRAPH 2002
[28] Seth Jared Teller, “Visibility
Computations in Densely Occluded Polyhedral Environments”,
[29] Szymon Rusinkiewicz, Marc Levoy, “QSplat: A Multiresolution Point Rendering System for Large Meshes”, SIGGRAPH 2000
[30] D.Cohen-Or, Y.Chrysanthou, and
C.Silva, “A survey of visibility for walkthrough applications.” SIGGRAPH Course Notes #30, 2001
[31] S.Coorg and S.Teller, “Real-time occlusion
culling for models with large occluders.” Proc
of ACM Symposium on Interactive 3D Graphics, 1997
[32] D.Bartz, M.Meißner and T.Huttner, “OpenGL assisted occlusion culling for large polygonal models”, Computer and Graphics, 23(3):667-679, 1999
[33] C.Andujar, C.Saona-Vazquez, I.Navazo
and P.Brunet, “Integrating occlusion culling and level of detail through
hardly-visible sets”, Proceedings of
Eurographics, 2000
[34] F.Durand, G.Drettakis, J.Thollot and
C.Puech, “Conservative visibility preprocessing using extended projections”, Proc. Of ACM SIGGRAPH, pages 239-248, 2000
[35] C.Erikson and D.Manocha, “Gaps:
General and Automatic Polygon Simplification”, Proc. Of ACM Symposium on Interactive 3D Graphics, 1999
[36] C.Erikson, D.Manocha and W.Baxter,
“HLODS for fast display of large static and dynamic environments”, Proc of ACM Symposium on Interactive 3D
Graphics, 2001
[37] J.El-Sana, N.Sokolovsky, and
C.Silva, “Integrating occlusion culling with view dependent rendering”, Proc. Of IEEE Visualization, 2001
[38] T.A.Funkhouser, D.Khorramabadi, C.H.Sequin, and S.Teller, “The ucb system for interactive visualization of large architectural models”, Presence, 5(1):13-44, 1996
[39] B.Garlick, D.Baum, and J.Winget, “Interactive Viewing of Large Geometric Databases Using Multiprocessor Graphics Workstations”, SIGGRAPH ’90 Course Notes (Parallel Algorithms and Architectures for 3D Image Generation), Volume 28, 1990
[40] J.Klosowski and C.Silva, “The prioritized-layered projection algorithm for visible set estimation”, IEEE Trans. on Visualization and Computer Graphics, 6(2):108-123, 2000
[41] I.Wald, P.Slusallek, and C.Benthin,
“Interactive distributed ray-tracing of highly complex models”, Rendering Techniques, p274-285, 2001
[42] P.Wonka, M.Wimmer, and D.Schmalstieg,
“Visibility preprocessing with occluder fusion for urban walkthroughs”, Rendering Techniques, p71-82, 2000
[43] P.Wonka, M.Wimmer, and F.Sillion,
“Instant visibility”, Proc. of
EUROGRAPHICS, 2001
[44] J.Oliveira,
B.Buxton, “Non-linear simplification of scanned models, Numérisation”, 3D -
SCANNING 2002, Paris-France (invited paper)
[45] K.M.Stanney, R.R.Mourant, and R.S.Kennedy, “Human Factors Issues in Virtual Environments: A Review of the Literature”
[46] Alan Watt, Fabio
Policarpo, “3D Games Real-time Rendering and Software Technology”, Addison
Wesley
[47] Fuchs, H.,
Kedem, Z.M., Naylor, B. “On visible surface generation by a priori tree
structures”, Proc. SIGGRAPH
'80, Comput. Graph., 14 (1980), 124--133.
[48] Schroder, and Zarge,
and Lorensen, “Decimation of triangle meshes”, Computer Graphics (SIGGRAPH ’92 Proceedings) 26, 2(1992), 65-70
CPU: Intel Pentium 3 or higher
GPU: NVidia GeForce 3 or higher
RAM: 512 MB
HD: 100 MB
Drivers: OpenGL 1.2 Drivers
OS: Windows 2000, XP
Navigation: Hold the left mouse button and move the mouse to change the camera orientation; the keys A, W, S, D move the camera left, forwards, backwards and right respectively.
Keys
Z: Stop/Start Camera path recording
X: Play/Stop Camera path
O: Draw Oct-Tree
ESC: Close Application
MassiveMain.cpp
#include "MassiveMain.h"
#ifdef _WIN32
#include <windows.h>   // Header File For Windows
#endif
#include <GL\gl.h>     // Header File For The OpenGL32 Library
#include <GL\glu.h>    // Header File For The GLu32 Library
#include <GL\glaux.h>  // Header File For The Glaux Library
#include <GL\glut.h>
#include <iostream.h>
#include <ostream.h>
#include <istream.h>
#include <fstream.h>
#include <string.h>
#include "WGLEXT.h"
#include "glext.h"
#include "SceneGraph.h"
#include "Timer.h"
#include "JDNCamera.h"
#include "Octree.h"
// NV extension function pointers
PFNGLGENOCCLUSIONQUERIESNVPROC glGenOcclusionQueriesNV = NULL;
PFNGLDELETEOCCLUSIONQUERIESNVPROC glDeleteOcclusionQueriesNV = NULL;
PFNGLISOCCLUSIONQUERYNVPROC glIsOcclusionQueryNV = NULL;
PFNGLBEGINOCCLUSIONQUERYNVPROC glBeginOcclusionQueryNV = NULL;
PFNGLENDOCCLUSIONQUERYNVPROC glEndOcclusionQueryNV = NULL;
PFNGLGETOCCLUSIONQUERYIVNVPROC glGetOcclusionQueryivNV = NULL;
PFNGLGETOCCLUSIONQUERYUIVNVPROC glGetOcclusionQueryuivNV = NULL;
PFNGLGENFENCESNVPROC glGenFencesNV = NULL;
PFNGLDELETEFENCESNVPROC glDeleteFencesNV = NULL;
PFNGLSETFENCENVPROC glSetFenceNV = NULL;
PFNGLTESTFENCENVPROC glTestFenceNV = NULL;
PFNGLFINISHFENCENVPROC glFinishFenceNV = NULL;
PFNGLISFENCENVPROC glIsFenceNV = NULL;
PFNGLGETFENCEIVNVPROC glGetFenceivNV = NULL;
PFNGLVERTEXARRAYRANGENVPROC glVertexArrayRangeNV = NULL;
PFNGLFLUSHVERTEXARRAYRANGENVPROC glFlushVertexArrayRangeNV = NULL;
PFNWGLALLOCATEMEMORYNVPROC wglAllocateMemoryNV = NULL;
ofstream bfile;
ofstream ofile;
ifstream ifile;
int g_DisplayListCtr;
int g_NumNodes;
int g_NumVisibleNodes;
Octree** pptrNodes;
SceneGraph* pScene;
JDNCamera* pCamera;
bool bLigting;
int iWidth, iHeight;
TimingInfo timings;
DWORD tmp_tick, last_tick;
int lastX, lastY;
LARGE_INTEGER freq;
char buffer[50];
Vector4f* pBBox;
int totalVertices;
Octree* pOctree;
bool bDrawOctree;
bool bDrawScene;
GLuint* ptrOcclusionQueries;
GLuint pixelCount;
bool isRecording;
bool isPlayingback;
void setupExt(void){
    // Occlusion query extensions
    glGenOcclusionQueriesNV = (PFNGLGENOCCLUSIONQUERIESNVPROC) wglGetProcAddress("glGenOcclusionQueriesNV");
    glDeleteOcclusionQueriesNV = (PFNGLDELETEOCCLUSIONQUERIESNVPROC) wglGetProcAddress("glDeleteOcclusionQueriesNV");
    glIsOcclusionQueryNV = (PFNGLISOCCLUSIONQUERYNVPROC) wglGetProcAddress("glIsOcclusionQueryNV");
    glBeginOcclusionQueryNV = (PFNGLBEGINOCCLUSIONQUERYNVPROC) wglGetProcAddress("glBeginOcclusionQueryNV");
    glEndOcclusionQueryNV = (PFNGLENDOCCLUSIONQUERYNVPROC) wglGetProcAddress("glEndOcclusionQueryNV");
    glGetOcclusionQueryivNV = (PFNGLGETOCCLUSIONQUERYIVNVPROC) wglGetProcAddress("glGetOcclusionQueryivNV");
    glGetOcclusionQueryuivNV = (PFNGLGETOCCLUSIONQUERYUIVNVPROC) wglGetProcAddress("glGetOcclusionQueryuivNV");
    // Fence extensions
    glGenFencesNV = (PFNGLGENFENCESNVPROC) wglGetProcAddress("glGenFencesNV");
    glDeleteFencesNV = (PFNGLDELETEFENCESNVPROC) wglGetProcAddress("glDeleteFencesNV");
    glSetFenceNV = (PFNGLSETFENCENVPROC) wglGetProcAddress("glSetFenceNV");
    glTestFenceNV = (PFNGLTESTFENCENVPROC) wglGetProcAddress("glTestFenceNV");
    glFinishFenceNV = (PFNGLFINISHFENCENVPROC) wglGetProcAddress("glFinishFenceNV");
    glIsFenceNV = (PFNGLISFENCENVPROC) wglGetProcAddress("glIsFenceNV");
    glGetFenceivNV = (PFNGLGETFENCEIVNVPROC) wglGetProcAddress("glGetFenceivNV");
    // VAR extensions
    glVertexArrayRangeNV = (PFNGLVERTEXARRAYRANGENVPROC) wglGetProcAddress("glVertexArrayRangeNV");
    glFlushVertexArrayRangeNV = (PFNGLFLUSHVERTEXARRAYRANGENVPROC) wglGetProcAddress("glFlushVertexArrayRangeNV");
    wglAllocateMemoryNV = (PFNWGLALLOCATEMEMORYNVPROC) wglGetProcAddress("wglAllocateMemoryNV");
}
static void output( float x, float y, char *string )
{
    int len, i;
    glRasterPos2f(x, y);
    len = (int) strlen(string);
    for (i = 0; i < len; i++)
    {
        glutBitmapCharacter(GLUT_BITMAP_8_BY_13, string[i]);
    }
}
void recordfunc(void){
    if ( isRecording ){
        //ofile << '\0';
        ofile.close();
        isRecording = false;
        cout << "Stop recording" << endl;
        return;
    }
    if ( isPlayingback ){
        cout << "Sorry cannot record camera path while playing back" << endl;
        return;
    }
    ofile.open("CameraPath.txt");
    if ( ofile.is_open() ){
        isRecording = true;
        cout << "Start recording" << endl;
    }
    else{
        cout << "Sorry could not create camera path file" << endl;
        isRecording = false;
    }
    return;
}
void playbackfunc(int cam){
    if ( isPlayingback ){
        isPlayingback = false;
        ifile.close();
        bfile.close();
        cout << "Stop playback" << endl;
        return;
    }
    if ( isRecording ){
        cout << "Sorry cannot play back camera path while recording" << endl;
        return;
    }
    switch ( cam ){
        case 1:
            ifile.open("CameraPath1.txt");
            bfile.open("Benchmarks.txt");
            break;
        case 2:
            ifile.open("CameraPath2.txt");
            bfile.open("Benchmarks.txt");
            break;
        case 3:
            ifile.open("CameraPath3.txt");
            bfile.open("Benchmarks.txt");
            break;
    }
    if ( ifile.is_open() && bfile.is_open()){
        isPlayingback = true;
        cout << "Start playing back " << endl;
    }
    else{
        cout << "Could not open camera path file" << endl;
        isPlayingback = false;
    }
    return;
}
void drawGL(void)
{
    glClearColor(0.0, 0.0, 0.0, 0.0);
    glClear(GL_COLOR_BUFFER_BIT);//| GL_DEPTH_BUFFER_BIT);
    glMatrixMode(GL_MODELVIEW);
    glLoadIdentity();
    int numObjects=0;
    pCamera->update();
    if ( isPlayingback ){
        if ( pCamera->read(ifile) ){
        }else{
            ifile.close();
            bfile.close();
            isPlayingback = false;
            cout << "Stop Playingback" << endl;
        }
    }
    pCamera->look();
    pCamera->m_pFrustum->calculateFrustum();
    if ( isRecording ){
        pCamera->write(ofile);
    }
    //if ( bDrawScene ){
    //    pOctree->NaiveDraw(*(pCamera->m_pFrustum));
    //}
    g_NumVisibleNodes = 0;
    pOctree->CullOctree(pptrNodes,*(pCamera->m_pFrustum));
    //QueryPerformanceCounter( &( timings.m_start_clk ) );
    glGenOcclusionQueriesNV(g_NumVisibleNodes, ptrOcclusionQueries);
    glDisable(GL_LIGHTING);
    glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
    glDepthMask(GL_FALSE);
    if ( bDrawScene ){
        for ( int i=0 ; i<g_NumVisibleNodes ; i++ ){
            glBeginOcclusionQueryNV(ptrOcclusionQueries[i]);
            // render bounding box for object i
            if ( pptrNodes[i]->m_Life == 0 ) pptrNodes[i]->DrawAABB();
            glEndOcclusionQueryNV();
        }
        glEnable(GL_LIGHTING);
        glEnable(GL_LIGHT0);
        glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
        glDepthMask(GL_TRUE);
        glClear(GL_DEPTH_BUFFER_BIT);
        for ( int j=0 ; j<g_NumVisibleNodes ; j++ ){
            glGetOcclusionQueryuivNV(ptrOcclusionQueries[j], GL_PIXEL_COUNT_NV, &pixelCount);
            if (pixelCount > 50 || pptrNodes[j]->m_Life > 0) {
                // render object j
                pptrNodes[j]->m_Life--;
                pptrNodes[j]->DrawNode();
            }
        }
    }
    //glFlushVertexArrayRangeNV();
    glDeleteOcclusionQueriesNV(g_NumVisibleNodes, ptrOcclusionQueries);
    //QueryPerformanceCounter( &( timings.m_end_clk ) );
/*
glPushMatrix();
for ( int i=0 ;
i<pScene->numMaterials ; i++ ){
JDNMaterial mat =
pScene->pMaterials[i];
if (
mat.numGeoObjects > 0 ){
GLfloat no_mat[]
= { 0.0, 0.0, 0.0,
1.0};
GLfloat mat_a[4];
GLfloat mat_d[4];
GLfloat mat_s[4];
GLfloat mat_shine[1];
mat_a[0]
= (GLfloat)mat.ambient.x;
mat_a[1]
= (GLfloat)mat.ambient.y;
mat_a[2]
= (GLfloat)mat.ambient.z;
mat_a[3]
= 1.0;
mat_d[0]
= (GLfloat)mat.diffuse.x;
mat_d[1]
= (GLfloat)mat.diffuse.y;
mat_d[2]
= (GLfloat)mat.diffuse.z;
mat_d[3]
= 1.0;
mat_s[0]
= (GLfloat)mat.specular.x;
mat_s[1]
= (GLfloat)mat.specular.y;
mat_s[2]
= (GLfloat)mat.specular.z;
mat_s[3]
= 1.0;
mat_shine[0] = 0.0;
glMaterialfv(
GL_FRONT_AND_BACK, GL_AMBIENT, mat_a );
glMaterialfv(
GL_FRONT_AND_BACK, GL_DIFFUSE, mat_d );
glMaterialfv(
GL_FRONT_AND_BACK, GL_SPECULAR, mat_s );
glMaterialfv(
GL_FRONT_AND_BACK, GL_SHININESS,
mat_shine );
glMaterialfv(
GL_FRONT_AND_BACK, GL_EMISSION, no_mat);
if (
true ){
//glEnable(GL_TEXTURE_2D);
for
( int x=0 ; x<mat.numGeoObjects ; x++ ){
JDNGeoObject*
obj = mat.pGeoObjects[x];
float cm[16] = {
obj->currentTM.m[0][0],obj->currentTM.m[0][1],obj->currentTM.m[0][2],obj->currentTM.m[0][3],
obj->currentTM.m[1][0],obj->currentTM.m[1][1],obj->currentTM.m[1][2],obj->currentTM.m[1][3],
obj->currentTM.m[2][0],obj->currentTM.m[2][1],obj->currentTM.m[2][2],obj->currentTM.m[2][3],
obj->currentTM.m[3][0],obj->currentTM.m[3][1],obj->currentTM.m[3][2],obj->currentTM.m[3][3],
};
pBBox = new Vector4f[8];
for ( int q=0 ; q<8 ; q++){
pBBox[q] = obj->currentTM*obj->pBBVertices[q];
}
if ( pCamera->m_pFrustum->isCubeInFrustum(pBBox) ){
numObjects++;
glPushMatrix();
glMultMatrixf(cm);
// -- Draw
BBox --- If selected
/*
if ( obj->drawBB ){
glColor3f(1.0,1.0,1.0);
glBegin(GL_LINES);
glVertex3f(obj->pBBVertices[0].x,obj->pBBVertices[0].y,
obj->pBBVertices[0].z);
glVertex3f(obj->pBBVertices[1].x,obj->pBBVertices[1].y,
obj->pBBVertices[1].z);
glVertex3f(obj->pBBVertices[1].x,obj->pBBVertices[1].y,
obj->pBBVertices[1].z);
glVertex3f(obj->pBBVertices[2].x,obj->pBBVertices[2].y,
obj->pBBVertices[2].z);
glVertex3f(obj->pBBVertices[2].x,obj->pBBVertices[2].y,
obj->pBBVertices[2].z);
glVertex3f(obj->pBBVertices[3].x,obj->pBBVertices[3].y,
obj->pBBVertices[3].z);
glVertex3f(obj->pBBVertices[3].x,obj->pBBVertices[3].y,
obj->pBBVertices[3].z);
glVertex3f(obj->pBBVertices[0].x,obj->pBBVertices[0].y,
obj->pBBVertices[0].z);
glVertex3f(obj->pBBVertices[0].x,obj->pBBVertices[0].y,
obj->pBBVertices[0].z);
glVertex3f(obj->pBBVertices[4].x,obj->pBBVertices[4].y,
obj->pBBVertices[4].z);
glVertex3f(obj->pBBVertices[4].x,obj->pBBVertices[4].y,
obj->pBBVertices[4].z);
glVertex3f(obj->pBBVertices[5].x,obj->pBBVertices[5].y,
obj->pBBVertices[5].z);
glVertex3f(obj->pBBVertices[5].x,obj->pBBVertices[5].y,
obj->pBBVertices[5].z);
glVertex3f(obj->pBBVertices[6].x,obj->pBBVertices[6].y,
obj->pBBVertices[6].z);
glVertex3f(obj->pBBVertices[6].x,obj->pBBVertices[6].y,
obj->pBBVertices[6].z);
glVertex3f(obj->pBBVertices[7].x,obj->pBBVertices[7].y,
obj->pBBVertices[7].z);
glVertex3f(obj->pBBVertices[7].x,obj->pBBVertices[7].y,
obj->pBBVertices[7].z);
glVertex3f(obj->pBBVertices[4].x,obj->pBBVertices[4].y,
obj->pBBVertices[4].z);
glVertex3f(obj->pBBVertices[1].x,obj->pBBVertices[1].y,
obj->pBBVertices[1].z);
glVertex3f(obj->pBBVertices[5].x,obj->pBBVertices[5].y,
obj->pBBVertices[5].z);
glVertex3f(obj->pBBVertices[2].x,obj->pBBVertices[2].y,
obj->pBBVertices[2].z);
glVertex3f(obj->pBBVertices[6].x,obj->pBBVertices[6].y,
obj->pBBVertices[6].z);
glVertex3f(obj->pBBVertices[3].x,obj->pBBVertices[3].y,
obj->pBBVertices[3].z);
glVertex3f(obj->pBBVertices[7].x,obj->pBBVertices[7].y,
obj->pBBVertices[7].z);
glEnd();
}
//
--------------------------------
// ----
Draw Object ---------------
//glBindTexture(GL_TEXTURE_2D, pTextures[i]);
//for ( int j=0 ; j<obj->numFaces ; j++){
glBegin(GL_TRIANGLES);
//glColor3f(mat_d[0],mat_d[1],mat_d[2]);
glNormal3f( obj->pVNormals[obj->pFaces[j*3]].x,
obj->pVNormals[obj->pFaces[j*3]].y,
obj->pVNormals[obj->pFaces[j*3]].z);
//glTexCoord2f(obj->pTVertices[obj->pTFaces[j*3]].x,
obj->pTVertices[obj->pTFaces[j*3]].y);
glVertex3f(obj->pLVertices[obj->pFaces[j*3]].x,
obj->pLVertices[obj->pFaces[j*3]].y,
obj->pLVertices[obj->pFaces[j*3]].z);
glNormal3f( obj->pVNormals[obj->pFaces[j*3+1]].x,
obj->pVNormals[obj->pFaces[j*3+1]].y,
obj->pVNormals[obj->pFaces[j*3+1]].z);
//glTexCoord2f(obj->pTVertices[obj->pTFaces[j*3+1]].x,
obj->pTVertices[obj->pTFaces[j*3+1]].y);
glVertex3f(obj->pLVertices[obj->pFaces[j*3+1]].x,
obj->pLVertices[obj->pFaces[j*3+1]].y,
obj->pLVertices[obj->pFaces[j*3+1]].z);
glNormal3f( obj->pVNormals[obj->pFaces[j*3+2]].x,
obj->pVNormals[obj->pFaces[j*3+2]].y,
obj->pVNormals[obj->pFaces[j*3+2]].z);
//glTexCoord2f(obj->pTVertices[obj->pTFaces[j*3+2]].x, obj->pTVertices[obj->pTFaces[j*3+2]].y);
glVertex3f(obj->pLVertices[obj->pFaces[j*3+2]].x,
obj->pLVertices[obj->pFaces[j*3+2]].y,
obj->pLVertices[obj->pFaces[j*3+2]].z);
glEnd();
glVertexPointer(3,GL_FLOAT,4*sizeof(GLfloat),obj->pLVertices);
glNormalPointer(GL_FLOAT,4*sizeof(GLfloat),obj->pVNormals
);
glDrawElements(GL_TRIANGLES,obj->numFaces*3,
GL_UNSIGNED_INT, obj->pFaces);
glPopMatrix();
}
delete [] pBBox;
}
//glDisable(GL_TEXTURE_2D);
//cout <<
"\nHas Texture";
}
}
}
glPopMatrix();
*/
    glDisable(GL_LIGHTING);
    if ( bDrawOctree ){
        glColor3f(1.0f,1.0f,1.0f);
        pOctree->DrawDebug();
    }
    glLoadIdentity();
    gluLookAt(0.0f,0.0f,-11.0f, 0.0f,0.0f,1.0f, 0.0f,1.0f,0.0f);
    QueryPerformanceCounter( &( timings.m_end_clk ) );
    timings.m_interval_sum.HighPart = timings.m_end_clk.HighPart - timings.m_start_clk.HighPart;
    timings.m_interval_sum.LowPart = timings.m_end_clk.LowPart - timings.m_start_clk.LowPart;
    timings.m_start_clk = timings.m_end_clk;
    _gcvt( ( 1.0f/(double)timings.m_interval_sum.LowPart)*(double)freq.LowPart, 7, buffer );
    //_gcvt( (double)totalVertices, 7, buffer );
    output(0.0f,0.0f,buffer);
    if ( isPlayingback ){
        bfile << buffer << endl;
    }
    //cout << "\nNumber of objects drawn = " << numObjects;
    glutSwapBuffers();
}
void idle(void){
    glutPostRedisplay();
}
void initGL(void)
{
    g_NumVisibleNodes=0;
    g_NumNodes=0;
    g_DisplayListCtr=0;
    totalVertices = 0;
    pScene = new SceneGraph();
    cout << "Reading Scene Graph file \n" << endl;
    pScene->read("new.JDN");
    cout << "Setting Materials \n" << endl;
    pScene->SetupMaterials();
    cout << "Checking for textures\n" << endl;
    pScene->CheckMatForTextures();
    cout << "\nScene Statistics\n" << endl;
    cout << "Num of geometric objects = " << pScene->numGeoObjects << "\n" << endl;
    cout << "Num of lights = " << pScene->numLights << "\n" << endl;
    cout << "Num of materials = " << pScene->numMaterials << "\n" << endl;
    for ( int i=0 ; i<pScene->numGeoObjects ; i++){
        totalVertices += pScene->pGeoObjects[i].numVertices;
    }
    pCamera = new JDNCamera();
    pBBox = new Vector4f[8];
    Vector4f centre = pScene->CalculateSceneCentre();
    float width = pScene->CalculateSceneWidth();
    cout << "Centre = " << centre.x << " " << centre.y << " " << centre.z << endl;
    cout << "Width = " << width;
    cout << "\n\n" << endl;
    pOctree = new Octree(pScene,0,centre,width);
    int* sceneIndex = new int[pScene->numGeoObjects];
    for ( int z=0; z<pScene->numGeoObjects ; z++){
        sceneIndex[z]=z;
    };
    pOctree->CreateNode(pScene->numGeoObjects, sceneIndex);
    cout << "Number of Octree Nodes = " << g_NumNodes << endl;
    ptrOcclusionQueries = new GLuint[g_NumNodes];
    pptrNodes = new Octree*[g_NumNodes];
    pOctree->CreateAABBDisplayList();
    bDrawOctree = true;
    bDrawScene = true;
    pScene->SetupVAR();
    glEnableClientState(GL_VERTEX_ARRAY_RANGE_NV);
    glEnableClientState(GL_VERTEX_ARRAY);
    glEnableClientState(GL_NORMAL_ARRAY);
    glEnableClientState(GL_INDEX_ARRAY);
    /* Enable a single OpenGL light. */
    GLfloat light0_ambient[]   = { 0.2f, 0.2f, 0.2f, 1.0f};
    GLfloat light0_diffuse[]   = { 1.0f, 1.0f, 1.0f, 1.0f};
    GLfloat light0_specular[]  = { 1.0f, 1.0f, 1.0f, 1.0f};
    GLfloat light0_position[]  = {-120000.0f, 40000.0f, -28000.0f, 1.0f};
    GLfloat light0_direction[] = {-100, -100.0, -100.0};
    GLfloat mat_ambient[]   = { 0.7f, 0.7f, 0.7f, 1.0f};
    GLfloat mat_diffuse[]   = { 1.0f, 0.5f, 0.8f, 1.0f};
    GLfloat mat_specular[]  = { 1.0f, 1.0f, 1.0f, 1.0f};
    GLfloat mat_shininess[] = { 50.0f};
    GLfloat no_mat[]        = { 0.0f, 0.0f, 0.0f, 1.0f};
    glLightfv(GL_LIGHT0, GL_AMBIENT, light0_ambient);
    glLightfv(GL_LIGHT0, GL_DIFFUSE, light0_diffuse);
    glLightfv(GL_LIGHT0, GL_SPECULAR, light0_specular);
    glLightfv(GL_LIGHT0, GL_POSITION, light0_position);
    //glLightfv(GL_LIGHT0, GL_SPOT_DIRECTION, light0_direction);
    glMaterialfv( GL_FRONT, GL_AMBIENT, mat_ambient );
    glMaterialfv( GL_FRONT, GL_DIFFUSE, no_mat );
    glMaterialfv( GL_FRONT, GL_SPECULAR, mat_specular );
    glMaterialfv( GL_FRONT, GL_SHININESS, mat_shininess );
    glMaterialfv( GL_FRONT, GL_EMISSION, no_mat);
    glEnable(GL_LIGHTING);
    glEnable(GL_LIGHT0);
    /* Use depth buffering for hidden surface elimination. */
    glEnable(GL_DEPTH_TEST);
    glEnable(GL_NORMALIZE);
    glEnable(GL_CULL_FACE);
    glCullFace(GL_BACK);
    /* Setup the view of the cube. */
    glMatrixMode(GL_PROJECTION);
    gluPerspective( /* field of view in degree */ 45.0,
                    /* aspect ratio */ 1.33,
                    /* Z near */ 10.0, /* Z far */ 250000.0);
    QueryPerformanceFrequency( & freq );
}
void reshape(int w, int h){
    glMatrixMode(GL_PROJECTION);
    glLoadIdentity();
    gluPerspective( /* field of view in degree */ 45.0,
                    /* aspect ratio */ (GLdouble)w/(GLdouble)h,
                    /* Z near */ 10.0, /* Z far */ 250000.0);
    glMatrixMode(GL_MODELVIEW);
    glViewport(0, 0, w, h);
    iWidth = w;
    iHeight = h;
}
void cleanUp(){
    //if ( pScene ) delete pScene;
    if ( bfile.is_open() ) bfile.close();
    if ( ofile.is_open() ) ofile.close();
    if ( ifile.is_open() ) ifile.close();
}
static
void
key(unsigned
char c, int x, int y)
{
switch (c){
case 27:
cleanUp();
exit(0); /* IRIS GLism, Escape quits. */
break;
case 'w':
pCamera->move(1000.5);
break;
case 's':
pCamera->move(-1000.5);
break;
case 'a':
pCamera->strafe(-1000.5);
break;
case 'd':
pCamera->strafe(1000.5);
break;
case 'o':
bDrawOctree
= !bDrawOctree;
break;
case 'p':
bDrawScene
= !bDrawScene;
break;
case 'z':
recordfunc();
break;
case 'x':
playbackfunc(1);
break;
case 'b':
playbackfunc(1);
break;
case 'n':
playbackfunc(2);
break;
case 'm':
playbackfunc(3);
break;
}
glutPostRedisplay();
}
static
void leftMotion (int x, int y){
float xRot, yRot;
yRot =
(float)(x-lastX)/500.0f;
xRot =
(float)(y-lastY)/500.0f;
Vector4f axis, vuv, vpn;
vuv =
pCamera->VUV();
vpn =
pCamera->VPN();
axis =
Vector4f::cross(vuv,vpn);
pCamera->rotate(-yRot,
Vector4f(0.0f,1.0f,0.0f));
pCamera->update();
pCamera->rotate(xRot,
axis);
lastX = x;
lastY = y;
}
static
void rightMotion (int x, int y){
}
static
void mouseCallback (int button, int state, int x, int y){
if
(state==GLUT_DOWN) {
/* A button is being pressed. Set the correct
motion function */
if
(button==GLUT_LEFT){
lastX
= x;
lastY
= y;
glutMotionFunc
(leftMotion);
}
//else if (button==GLUT_RIGHT)
// glutMotionFunc (rightMotion);
}
}
static
GLboolean QueryExtension(char* extName){
GLubyte* ext = (GLubyte*
)glGetString(GL_EXTENSIONS);
char *p = (char
*)ext;
char *end = p +
strlen(p);
while (p <
end){
int n =
strcspn(p, " ");
if
((strlen(extName)==n) && (strncmp(extName,p,n)==0)){
return
GL_TRUE;
}
p += (n+1);
}
return GL_FALSE;
}
int
main(int argc, char **argv)
{
    isRecording = false;
    isPlayingback = false;
    glutInit(&argc, argv);
    //glutInitDisplayMode(GLUT_DOUBLE | GLUT_RGB | GLUT_DEPTH);
    glutInitDisplayString("rgb double depth=32 ");
    glutInitWindowSize(640,480);
    glutCreateWindow("Jean-Daniel Nahmias Msc VIVE Massive Model Rendering");
if (
!QueryExtension("GL_NV_occlusion_query") ){
cout
<< "Your Graphics card does not support occlusion extension"
<< endl;
return 0;
}
if (
!QueryExtension("GL_NV_fence") ){
cout
<< "Your Graphics card does not support fence extension"
<< endl;
return 0;
}
if (
!QueryExtension("GL_NV_vertex_array_range") ){
cout
<< "Your Graphics card does not support VAR extension" <<
endl;
return 0;
}
//ofstream bfile = ofstream("b.txt");
//bfile = ofstream("b.txt");
//bfile.open("b.txt");
setupExt();
initGL();
glutDisplayFunc(drawGL);
glutReshapeFunc(reshape);
glutKeyboardFunc(key);
glutMouseFunc(mouseCallback);
glutMotionFunc(leftMotion);
glutIdleFunc(idle);
int ctr=0;
bool add;
cout <<
"Number of Materials = " << pScene->numMaterials <<
endl;
for ( int i=0 ;
i<pScene->numMaterials-1 ; i++){
add =
true;
for ( int
j=(i+1) ; j<pScene->numMaterials ; j++){
if (
pScene->pMaterials[i]==pScene->pMaterials[j] ){
add
= false;
break;
}
}
if ( add )
ctr++;
}
cout <<
"Number of unique Material = " << ctr << endl;
/*
GLubyte* ext = (GLubyte*
)glGetString(GL_EXTENSIONS);
while ( *ext != '\0' ){
char t =
(char)*ext;
ext++;
if ( t !=
' ' ) cout << t; else cout << endl;
}
*/
glutMainLoop();
return 0; /* ANSI C requires main to return
int. */
}
Oct-Tree.h
// Octree.h: interface for the Octree
class.
//
//////////////////////////////////////////////////////////////////////
#ifndef _OCTREE_H
#define _OCTREE_H
#include "MassiveMain.h"
#ifdef _WIN32
#include <windows.h> // Header File For Windows
#endif
#include <GL\gl.h> // Header File For The OpenGL32 Library
#include <GL\glu.h> // Header File For The GLu32 Library
#include <GL\glaux.h> // Header File For The Glaux Library
#include <GL\glut.h>
#include <iostream.h>
#include "WGLEXT.h"
#include "glext.h"
#include "SceneGraph.h"
#include "JDNGeoObject.h"
#include "Vector4.h"
#include "Matrix4.h"
#include "ViewFrustum.h"
#define MAXRECURSION 32
#define MAXTRIANGLES 2000
extern PFNGLGENOCCLUSIONQUERIESNVPROC glGenOcclusionQueriesNV;
extern PFNGLDELETEOCCLUSIONQUERIESNVPROC glDeleteOcclusionQueriesNV;
extern PFNGLISOCCLUSIONQUERYNVPROC glIsOcclusionQueryNV;
extern PFNGLBEGINOCCLUSIONQUERYNVPROC glBeginOcclusionQueryNV;
extern PFNGLENDOCCLUSIONQUERYNVPROC glEndOcclusionQueryNV;
extern PFNGLGETOCCLUSIONQUERYIVNVPROC glGetOcclusionQueryivNV;
extern PFNGLGETOCCLUSIONQUERYUIVNVPROC glGetOcclusionQueryuivNV;
extern PFNGLGENFENCESNVPROC glGenFencesNV;
extern PFNGLDELETEFENCESNVPROC glDeleteFencesNV;
extern PFNGLSETFENCENVPROC glSetFenceNV;
extern PFNGLTESTFENCENVPROC glTestFenceNV;
extern PFNGLFINISHFENCENVPROC glFinishFenceNV;
extern PFNGLISFENCENVPROC glIsFenceNV;
extern PFNGLGETFENCEIVNVPROC glGetFenceivNV;
extern PFNGLVERTEXARRAYRANGENVPROC glVertexArrayRangeNV;
extern PFNGLFLUSHVERTEXARRAYRANGENVPROC glFlushVertexArrayRangeNV;
extern PFNWGLALLOCATEMEMORYNVPROC wglAllocateMemoryNV;
extern int g_DisplayListCtr;
extern int g_NumNodes;
extern int g_NumVisibleNodes;
enum eOctreeNodes
{
TOP_LEFT_FRONT, // 0
TOP_LEFT_BACK, // 1
TOP_RIGHT_BACK, // etc...
TOP_RIGHT_FRONT,
BOTTOM_LEFT_FRONT,
BOTTOM_LEFT_BACK,
BOTTOM_RIGHT_BACK,
BOTTOM_RIGHT_FRONT
};
class Octree
{
public:
    Octree(SceneGraph* pScene, int depth, Vector4f& centre, float width){
        m_pScene = pScene;
        m_depth = depth;
        m_centre = centre;
        m_width = width;
        m_bSubDivided = false;
        m_BBoxVertices[0] = Vector4f(centre.x - width, centre.y - width, centre.z - width);
        m_BBoxVertices[1] = Vector4f(centre.x + width, centre.y - width, centre.z - width);
        m_BBoxVertices[2] = Vector4f(centre.x + width, centre.y - width, centre.z + width);
        m_BBoxVertices[3] = Vector4f(centre.x - width, centre.y - width, centre.z + width);
        m_BBoxVertices[4] = Vector4f(centre.x - width, centre.y + width, centre.z - width);
        m_BBoxVertices[5] = Vector4f(centre.x + width, centre.y + width, centre.z - width);
        m_BBoxVertices[6] = Vector4f(centre.x + width, centre.y + width, centre.z + width);
        m_BBoxVertices[7] = Vector4f(centre.x - width, centre.y + width, centre.z + width);
        m_Life = 0;
    };
virtual ~Octree(){};
    void CreateNode(int numParentObjects, int* parentObjectsIndices){
        int tmp, ctr, tri;
        tmp = ctr = tri = 0;
        m_NodeID = g_NumNodes+10000;
        g_NumNodes++;
        // Iterate through each of the parent's objects and find the number of
        // objects that intersect this child; add each intersecting object's
        // index to the child and accumulate the total number of intersecting
        // triangles.
        for ( int i=0 ; i<numParentObjects ; i++){
            tmp = IntersectNode(m_pScene->pGeoObjects[parentObjectsIndices[i]]);
            tri += tmp;
            if ( tmp != 0 ){
                ctr++;
            }
        }
        m_numIntersectingObjects = ctr;
        m_pObjectIndices = new int[ctr];
        m_pObjectNumTri = new int[ctr];
        m_numTriangles = tri;
        ctr = 0;
        for ( int j=0 ; j<numParentObjects ; j++){
            tmp = IntersectNode(m_pScene->pGeoObjects[parentObjectsIndices[j]]);
            if ( tmp != 0 ){
                m_pObjectIndices[ctr] = parentObjectsIndices[j];
                m_pObjectNumTri[ctr] = tmp;
                ctr++;
            }
        }
        // If the depth has not reached the maximum and the number of intersecting
        // triangles is greater than the maximum allowed triangle count per node,
        // subdivide the node.
        if ( tri > MAXTRIANGLES && m_depth < MAXRECURSION ){
            SubDivideNode();
        }else{
            if ( tri > 0 ) AssignFaces();
        }
    };
    void DrawDebug(void){
        // Draw the twelve edges of the node's bounding box, one corner
        // pair per edge (same traversal order as the original listing)
        static const int edges[12][2] = {
            {0,1},{1,2},{2,3},{3,0},        // bottom face
            {0,4},{4,5},{5,6},{6,7},{7,4},  // first vertical and top face
            {1,5},{2,6},{3,7}               // remaining verticals
        };
        glBegin(GL_LINES);
        for ( int e=0 ; e<12 ; e++ ){
            const Vector4f& a = m_BBoxVertices[edges[e][0]];
            const Vector4f& b = m_BBoxVertices[edges[e][1]];
            glVertex3f(a.x, a.y, a.z);
            glVertex3f(b.x, b.y, b.z);
        }
        glEnd();
        if ( m_bSubDivided ){
            m_octreeNodes[TOP_LEFT_FRONT]->DrawDebug();
            m_octreeNodes[TOP_LEFT_BACK]->DrawDebug();
            m_octreeNodes[TOP_RIGHT_BACK]->DrawDebug();
            m_octreeNodes[TOP_RIGHT_FRONT]->DrawDebug();
            m_octreeNodes[BOTTOM_LEFT_FRONT]->DrawDebug();
            m_octreeNodes[BOTTOM_LEFT_BACK]->DrawDebug();
            m_octreeNodes[BOTTOM_RIGHT_BACK]->DrawDebug();
            m_octreeNodes[BOTTOM_RIGHT_FRONT]->DrawDebug();
        }
        return;
    };
    void CullOctree(Octree** pptrNodes, ViewFrustum& viewFrustum){
        // Check if the current node is in the view frustum
        if( !viewFrustum.isCubeInFrustum( m_BBoxVertices ) ){
            m_Life = 0;
            return;
        }
        // If this node is subdivided, recurse through each of its eight
        // children, as when creating the octree; otherwise record the
        // leaf in the visible-node list.
        if( m_bSubDivided )
        {
            m_octreeNodes[TOP_LEFT_FRONT]->CullOctree(pptrNodes, viewFrustum);
            m_octreeNodes[TOP_LEFT_BACK]->CullOctree(pptrNodes, viewFrustum);
            m_octreeNodes[TOP_RIGHT_BACK]->CullOctree(pptrNodes, viewFrustum);
            m_octreeNodes[TOP_RIGHT_FRONT]->CullOctree(pptrNodes, viewFrustum);
            m_octreeNodes[BOTTOM_LEFT_FRONT]->CullOctree(pptrNodes, viewFrustum);
            m_octreeNodes[BOTTOM_LEFT_BACK]->CullOctree(pptrNodes, viewFrustum);
            m_octreeNodes[BOTTOM_RIGHT_BACK]->CullOctree(pptrNodes, viewFrustum);
            m_octreeNodes[BOTTOM_RIGHT_FRONT]->CullOctree(pptrNodes, viewFrustum);
        }
        else
        {
            pptrNodes[g_NumVisibleNodes] = this;
            g_NumVisibleNodes++;
        }
    }
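    // Note on the assumed calling convention (inferred from the globals, not
    // stated in the listing): the renderer presumably resets g_NumVisibleNodes
    // to zero before each traversal, then iterates
    // pptrNodes[0..g_NumVisibleNodes-1], rendering each visible leaf with
    // DrawNode() or issuing its bounding box with DrawAABB() for occlusion tests.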
    void NaiveDraw(ViewFrustum& viewFrustum){
        // Check if the current node is in the view frustum
        if( !viewFrustum.isCubeInFrustum( m_BBoxVertices ) ){
            return;
        }
        // If this node is subdivided, recurse through each of its eight
        // children; otherwise draw the leaf immediately.
        if( m_bSubDivided )
        {
            m_octreeNodes[TOP_LEFT_FRONT]->NaiveDraw(viewFrustum);
            m_octreeNodes[TOP_LEFT_BACK]->NaiveDraw(viewFrustum);
            m_octreeNodes[TOP_RIGHT_BACK]->NaiveDraw(viewFrustum);
            m_octreeNodes[TOP_RIGHT_FRONT]->NaiveDraw(viewFrustum);
            m_octreeNodes[BOTTOM_LEFT_FRONT]->NaiveDraw(viewFrustum);
            m_octreeNodes[BOTTOM_LEFT_BACK]->NaiveDraw(viewFrustum);
            m_octreeNodes[BOTTOM_RIGHT_BACK]->NaiveDraw(viewFrustum);
            m_octreeNodes[BOTTOM_RIGHT_FRONT]->NaiveDraw(viewFrustum);
        }
        else
        {
            DrawNode();
        }
    };
    void DrawAABB(void){
        m_Life = 5;
        glCallList(m_NodeID);
    }
    void CreateAABBDisplayList(void){
        if( m_bSubDivided )
        {
            // Recurse through each of the eight children down to the leaves
            m_octreeNodes[TOP_LEFT_FRONT]->CreateAABBDisplayList();
            m_octreeNodes[TOP_LEFT_BACK]->CreateAABBDisplayList();
            m_octreeNodes[TOP_RIGHT_BACK]->CreateAABBDisplayList();
            m_octreeNodes[TOP_RIGHT_FRONT]->CreateAABBDisplayList();
            m_octreeNodes[BOTTOM_LEFT_FRONT]->CreateAABBDisplayList();
            m_octreeNodes[BOTTOM_LEFT_BACK]->CreateAABBDisplayList();
            m_octreeNodes[BOTTOM_RIGHT_BACK]->CreateAABBDisplayList();
            m_octreeNodes[BOTTOM_RIGHT_FRONT]->CreateAABBDisplayList();
        }
        else
        {
            // Compile the six faces of the leaf's bounding box into a
            // display list, one corner quadruple per quad (same order as
            // the original listing)
            static const int faces[6][4] = {
                {0,3,2,1}, {0,1,5,4}, {1,2,6,5},
                {2,3,7,6}, {3,0,4,7}, {4,5,6,7}
            };
            glNewList(m_NodeID, GL_COMPILE);
            glBegin(GL_QUADS);
            for ( int f=0 ; f<6 ; f++ ){
                for ( int v=0 ; v<4 ; v++ ){
                    const Vector4f& p = m_BBoxVertices[faces[f][v]];
                    glVertex3f(p.x, p.y, p.z);
                }
            }
            glEnd();
            glEndList();
        }
        return;
    }
    void DrawNode(void){
        if ( m_numIntersectingObjects > 0 ){
            for ( int i=0 ; i<m_numIntersectingObjects ; i++ ){
                JDNGeoObject* obj = &(m_pScene->pGeoObjects[m_pObjectIndices[i]]);
                JDNMaterial*  mat = &(m_pScene->pMaterials[obj->matID]);
                int numFaces = m_pObjectNumTri[i];
                /*
                // Disabled per-object material setup; material state is
                // instead applied via glCallList(mat->id) below.
                GLfloat no_mat[] = { 0.0, 0.0, 0.0, 1.0 };
                GLfloat mat_a[4];
                GLfloat mat_d[4];
                GLfloat mat_s[4];
                GLfloat mat_shine[1];
                mat_a[0] = (GLfloat)mat->ambient.x;
                mat_a[1] = (GLfloat)mat->ambient.y;
                mat_a[2] = (GLfloat)mat->ambient.z;
                mat_a[3] = 1.0;
                mat_d[0] = (GLfloat)mat->diffuse.x;
                mat_d[1] = (GLfloat)mat->diffuse.y;
                mat_d[2] = (GLfloat)mat->diffuse.z;
                mat_d[3] = 1.0;
                mat_s[0] = (GLfloat)mat->specular.x;
                mat_s[1] = (GLfloat)mat->specular.y;
                mat_s[2] = (GLfloat)mat->specular.z;
                mat_s[3] = 1.0;
                mat_shine[0] = 0.0;
                glMaterialfv( GL_FRONT_AND_BACK, GL_AMBIENT,   mat_a );
                glMaterialfv( GL_FRONT_AND_BACK, GL_DIFFUSE,   mat_d );
                glMaterialfv( GL_FRONT_AND_BACK, GL_SPECULAR,  mat_s );
                glMaterialfv( GL_FRONT_AND_BACK, GL_SHININESS, mat_shine );
                glMaterialfv( GL_FRONT_AND_BACK, GL_EMISSION,  no_mat );
                */
                glCallList(mat->id);
                glVertexPointer(3, GL_FLOAT, 6*sizeof(GLfloat), m_pScene->m_ptrVertices+(obj->m_offset*6));
                glNormalPointer(GL_FLOAT, 6*sizeof(GLfloat), m_pScene->m_ptrVertices+(obj->m_offset*6)+3);
                //glIndexPointer(GL_UNSIGNED_INT, 0, m_pGeoObjectsFaceIdx[i]);
                glDrawElements(GL_TRIANGLES, numFaces*3, GL_UNSIGNED_INT, m_pGeoObjectsFaceIdx[i]);
            }
        }
        return;
    };
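    // Vertex format note: the stride of 6*sizeof(GLfloat) and the +3 normal
    // offset above imply an interleaved (x,y,z,nx,ny,nz) layout per vertex in
    // m_ptrVertices, with obj->m_offset counting vertices from the start of
    // the shared array.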
int m_Life;
private:
    void SubDivideNode(void){
        m_bSubDivided = true;
        // Create the eight children, each one level deeper and half the width
        m_octreeNodes[TOP_LEFT_FRONT]     = new Octree(m_pScene, m_depth+1, GetCentre(TOP_LEFT_FRONT),     m_width/2.0f);
        m_octreeNodes[TOP_LEFT_BACK]      = new Octree(m_pScene, m_depth+1, GetCentre(TOP_LEFT_BACK),      m_width/2.0f);
        m_octreeNodes[TOP_RIGHT_BACK]     = new Octree(m_pScene, m_depth+1, GetCentre(TOP_RIGHT_BACK),     m_width/2.0f);
        m_octreeNodes[TOP_RIGHT_FRONT]    = new Octree(m_pScene, m_depth+1, GetCentre(TOP_RIGHT_FRONT),    m_width/2.0f);
        m_octreeNodes[BOTTOM_LEFT_FRONT]  = new Octree(m_pScene, m_depth+1, GetCentre(BOTTOM_LEFT_FRONT),  m_width/2.0f);
        m_octreeNodes[BOTTOM_LEFT_BACK]   = new Octree(m_pScene, m_depth+1, GetCentre(BOTTOM_LEFT_BACK),   m_width/2.0f);
        m_octreeNodes[BOTTOM_RIGHT_BACK]  = new Octree(m_pScene, m_depth+1, GetCentre(BOTTOM_RIGHT_BACK),  m_width/2.0f);
        m_octreeNodes[BOTTOM_RIGHT_FRONT] = new Octree(m_pScene, m_depth+1, GetCentre(BOTTOM_RIGHT_FRONT), m_width/2.0f);
        // Distribute this node's intersecting objects among the children
        m_octreeNodes[TOP_LEFT_FRONT]->CreateNode(m_numIntersectingObjects, m_pObjectIndices);
        m_octreeNodes[TOP_LEFT_BACK]->CreateNode(m_numIntersectingObjects, m_pObjectIndices);
        m_octreeNodes[TOP_RIGHT_BACK]->CreateNode(m_numIntersectingObjects, m_pObjectIndices);
        m_octreeNodes[TOP_RIGHT_FRONT]->CreateNode(m_numIntersectingObjects, m_pObjectIndices);
        m_octreeNodes[BOTTOM_LEFT_FRONT]->CreateNode(m_numIntersectingObjects, m_pObjectIndices);
        m_octreeNodes[BOTTOM_LEFT_BACK]->CreateNode(m_numIntersectingObjects, m_pObjectIndices);
        m_octreeNodes[BOTTOM_RIGHT_BACK]->CreateNode(m_numIntersectingObjects, m_pObjectIndices);
        m_octreeNodes[BOTTOM_RIGHT_FRONT]->CreateNode(m_numIntersectingObjects, m_pObjectIndices);
        return;
    };
    // Return the centre of the given child octant
    Vector4f GetCentre(int node){
        switch ( node ){
        case TOP_LEFT_FRONT:
            return Vector4f(m_centre.x-(m_width/2), m_centre.y+(m_width/2), m_centre.z+(m_width/2));
        case TOP_LEFT_BACK:
            return Vector4f(m_centre.x-(m_width/2), m_centre.y+(m_width/2), m_centre.z-(m_width/2));
        case TOP_RIGHT_BACK:
            return Vector4f(m_centre.x+(m_width/2), m_centre.y+(m_width/2), m_centre.z-(m_width/2));
        case TOP_RIGHT_FRONT:
            return Vector4f(m_centre.x+(m_width/2), m_centre.y+(m_width/2), m_centre.z+(m_width/2));
        case BOTTOM_LEFT_FRONT:
            return Vector4f(m_centre.x-(m_width/2), m_centre.y-(m_width/2), m_centre.z+(m_width/2));
        case BOTTOM_LEFT_BACK:
            return Vector4f(m_centre.x-(m_width/2), m_centre.y-(m_width/2), m_centre.z-(m_width/2));
        case BOTTOM_RIGHT_BACK:
            return Vector4f(m_centre.x+(m_width/2), m_centre.y-(m_width/2), m_centre.z-(m_width/2));
        case BOTTOM_RIGHT_FRONT:
            return Vector4f(m_centre.x+(m_width/2), m_centre.y-(m_width/2), m_centre.z+(m_width/2));
        }
        return Vector4f(6.666f, 6.666f, 6.666f);  // unreachable sentinel
    };
    void AssignFaces(void){
        m_pGeoObjectsFaceIdx = new GLuint*[m_numIntersectingObjects];
        // Allocate memory for the face set of each intersecting object
        for ( int i=0 ; i<m_numIntersectingObjects ; i++ ){
            m_pGeoObjectsFaceIdx[i] = new GLuint[m_pObjectNumTri[i]*3];
        }
        Vector4f* verts = new Vector4f[3];
        for ( int j=0 ; j<m_numIntersectingObjects ; j++ ){
            JDNGeoObject* obj = &(m_pScene->pGeoObjects[m_pObjectIndices[j]]);
            int ctr=0;
            for ( int k=0 ; k<obj->numFaces ; k++ ){
                // If the current face intersects the node, add its three
                // vertex indices to the node's face set
                /*
                if ( isVertexInNode(obj->pVertices[obj->pFaces[k*3]])   ||
                     isVertexInNode(obj->pVertices[obj->pFaces[k*3+1]]) ||
                     isVertexInNode(obj->pVertices[obj->pFaces[k*3+2]]) ) {
                    m_pGeoObjectsFaceIdx[j][ctr]   = obj->pFaces[k*3];
                    m_pGeoObjectsFaceIdx[j][ctr+1] = obj->pFaces[k*3+1];
                    m_pGeoObjectsFaceIdx[j][ctr+2] = obj->pFaces[k*3+2];
                    ctr += 3;
                }
                */
                verts[0] = obj->pVertices[obj->pFaces[k*3]];
                verts[1] = obj->pVertices[obj->pFaces[k*3+1]];
                verts[2] = obj->pVertices[obj->pFaces[k*3+2]];
                if ( faceIntersectNode(verts) ){
                    m_pGeoObjectsFaceIdx[j][ctr]   = obj->pFaces[k*3];
                    m_pGeoObjectsFaceIdx[j][ctr+1] = obj->pFaces[k*3+1];
                    m_pGeoObjectsFaceIdx[j][ctr+2] = obj->pFaces[k*3+2];
                    ctr += 3;
                }
            }
        }
        delete [] verts;
    };
    // Test an object against the node and return the number of intersecting triangles
    int IntersectNode(JDNGeoObject& obj){
        int num=0;
        Vector4f* verts = new Vector4f[3];
        for ( int j=0 ; j<obj.numFaces ; j++ ){
            /*
            if ( isVertexInNode(obj.pVertices[obj.pFaces[j*3]])   ){ num++; continue; }
            if ( isVertexInNode(obj.pVertices[obj.pFaces[j*3+1]]) ){ num++; continue; }
            if ( isVertexInNode(obj.pVertices[obj.pFaces[j*3+2]]) ){ num++; continue; }
            */
            verts[0] = obj.pVertices[obj.pFaces[j*3]];
            verts[1] = obj.pVertices[obj.pFaces[j*3+1]];
            verts[2] = obj.pVertices[obj.pFaces[j*3+2]];
            if ( faceIntersectNode(verts) ) num++;
        }
        delete [] verts;
        return num;
    };
    bool faceIntersectNode(Vector4f* verts){
        // Test whether any of the triangle's vertices lie in the node
        if ( isVertexInNode(verts[0]) ) return true;
        if ( isVertexInNode(verts[1]) ) return true;
        if ( isVertexInNode(verts[2]) ) return true;
        /*
        // Disabled: test each triangle edge against the six faces of the
        // node, given here as one corner triple per quad (in the same
        // order as the original listing).
        static const int quads[6][3] = {
            {2,3,0}, {6,5,4}, {6,2,1}, {5,1,0}, {4,0,3}, {7,3,2}
        };
        Vector4f e1 = verts[1] - verts[0];
        Vector4f e2 = verts[2] - verts[0];
        Vector4f e3 = verts[2] - verts[1];
        Vector4f quad[3];
        float t = 0.0f;
        for ( int q=0 ; q<6 ; q++ ){
            quad[0] = m_BBoxVertices[quads[q][0]];
            quad[1] = m_BBoxVertices[quads[q][1]];
            quad[2] = m_BBoxVertices[quads[q][2]];
            t = JDNGeoObject::quadRay(verts[0], e1, quad);
            if ( (t != 0.0f) && (t <= 1.0f) && (t >= -1.0f) ) return true;
            t = JDNGeoObject::quadRay(verts[0], e2, quad);
            if ( (t != 0.0f) && (t <= 1.0f) && (t >= -1.0f) ) return true;
            t = JDNGeoObject::quadRay(verts[1], e3, quad);
            if ( (t != 0.0f) && (t <= 1.0f) && (t >= -1.0f) ) return true;
        }
        // Test each diagonal against the face (not implemented)
        */
        return false;
    }
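    // Caveat: with the edge tests above disabled, faceIntersectNode()
    // reduces to a vertex-in-node test, so a triangle that crosses a node
    // without having any vertex inside it will be missed. The disabled
    // quadRay tests (and the unimplemented diagonal-versus-face test) would
    // be needed for a more robust triangle/AABB overlap test.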
    // Test whether a vertex lies inside the node's bounding box
    bool isVertexInNode(Vector4f& vert){
        if ( (vert.x > (m_centre.x+m_width)) ||
             (vert.y > (m_centre.y+m_width)) ||
             (vert.z > (m_centre.z+m_width)) ||
             (vert.x < (m_centre.x-m_width)) ||
             (vert.y < (m_centre.y-m_width)) ||
             (vert.z < (m_centre.z-m_width)) ) return false;
        else return true;
    };
    GLuint**    m_pGeoObjectsFaceIdx;      // Indices of intersecting faces
    bool        m_bSubDivided;             // Is the node subdivided, i.e. does it have children?
    int         m_displayListID;           // Display list ID
    int         m_numTriangles;            // Total number of intersecting triangles
    int         m_numIntersectingObjects;  // Number of intersecting objects
    int         m_depth;                   // Depth of node
    Vector4f    m_centre;                  // Coordinates of the node's centre
    Vector4f    m_BBoxVertices[8];         // Vertices of the node's eight corners
    float       m_width;                   // Width from centre to the node's boundary
    Octree*     m_octreeNodes[8];          // Pointers to the node's children
    SceneGraph* m_pScene;                  // Pointer to scene graph
    int*        m_pObjectIndices;          // Intersecting object indices
    int*        m_pObjectNumTri;           // Number of intersecting triangles per intersecting object
    int*        m_pDisplayListID;
    GLuint**    m_Indices;
    GLfloat**   m_Vertices;
    int         m_NodeID;
};
#endif
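As a minimal usage sketch, the fragment below illustrates how the class above would typically be driven per frame. It is illustrative only: the names RenderFrame, pAllIndices, numGeoObjects, sceneCentre and sceneHalfWidth are assumptions for the example, not part of the implementation.

// Build step, run once at start-up (assumes SceneGraph exposes a
// numGeoObjects count and that sceneCentre/sceneHalfWidth bound the model):
//   int* pAllIndices = new int[pScene->numGeoObjects];
//   for ( int i=0 ; i<pScene->numGeoObjects ; i++ ) pAllIndices[i] = i;
//   Octree* pRoot = new Octree(pScene, 0, sceneCentre, sceneHalfWidth);
//   pRoot->CreateNode(pScene->numGeoObjects, pAllIndices);

// Per-frame step: reset the visible-leaf counter, collect the visible
// leaves, then render them.
void RenderFrame(Octree* pRoot, Octree** ppVisible, ViewFrustum& frustum)
{
    g_NumVisibleNodes = 0;                 // CullOctree appends from index 0
    pRoot->CullOctree(ppVisible, frustum);
    for ( int i=0 ; i<g_NumVisibleNodes ; i++ ){
        ppVisible[i]->DrawNode();          // or DrawAABB() for occlusion tests
    }
}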