Point-Based Rendering for City-Scale Models

CSE557 Final Project
Aditya Sankar and Paramjit Singh Sandhu
(aditya, paramsan @cs.washington.edu)


                   
The old city of Dubrovnik                                                                        Saint Peters Basilica

Problem/Goal
State of the art 3D reconstruction systems are capable of reproducing city-scale dense point-cloud representations of famous landmarks and urban landscapes from an almost infinite library of photographs on the web. However, one of the pressing issues with such large data-sets is the ability to render these points, in the order of millions, in a real-time, interactive and visually compelling manner. Many point-based rendering systems have been implemented to tackle these issues, each with their own specific pros and cons, based on the domain of application.

Our goal in this project is to build upon existing point-based rendering systems in an attempt to improve the overall rendering quality without compromising on performance and real-time interactivity, by incorporating the idea of GPU-based splatting.

Related Work
QSplat: A multiresolution Point Rendering System for Large Meshes
QSplat [1] is an interactive point rendering system for large meshes, primarily obtained from laser scans (such as those from the Digital Michelangelo Project). By using a hierarchy of bounding spheres, QSplat creates a tree representation of the point-cloud which is subsequently utilized for rendering tasks such as level-of-detail control and visibility culling. The major advantage of the QSplat rendering scheme is the fact that the LOD can by dynamically altered on the fly, based on a target framerate. By maintaining a tradeoff between visual quality and rendering speed, the system allows a user to interact with the models in real-time. When the user ceases to interact, the static frame is progressively rendered at the highest level of detail. One of the main drawbacks of this approach is that it puts high load on the CPU and also on the GPU memory bus in order to render complex primitives such as spheres and ellipsoids, which provide good visual quality at the cost of rendering performance, therefore resulting in poor level-of-detail for interaction.

GPU-based Splatting on Modern GPUs
Recent improvements in graphics hardware and programming models allow for point-based geometry to be efficiently rendered on the Graphics Processing Unit (GPU), with minimal computation and data-transfer from the Central Processing Unit (CPU). One such hardware-accelerated rendering method is described by Botsch et. al. in [2], uses the programmable graphics hardware to delegate computationally expensive rendering tasks to the GPU, reducing the need for data transfer and expensive CPU cycles. Their implementation utilizes the in-built NV_POINT_SPRITE extension to the OpenGL framework, to rasterize gaussian-filtered splats on the graphics hardware, resulting in improved performance over similar software-based methods.

Our Approach
The basic idea of our approach is to combine the two methods described above, by implementing GPU-based Splatting as a primitive for the QSplat rendering system. We began with a detailed performance analysis of the base QSplat system to determine bottlenecks and potential areas for improvement. Our first observation pertained to QSplat's use of a Memory Mapped data file, which dynamically paged pertinent parts of the LOD-tree from disk, based on current viewpoint. By eliminating memory-mapping and loading the entire data-file into RAM, we were able to achieve a 2-3% performance improvement at the cost of a significantly larger memory footprint.

For complex geometry primitives, such as ellipses and spheres we observed that about 30-35% of the running time was spent in computing and drawing texture-mapped polygons (GL_TRIANGLE_STRIP) to form the desired shape. This included modelview (camera to world) transformations for occlusion and calculating intermediate texture co-ordinates for pasting the shapes to the polygon.

The last major bottleneck was found to be in the tree traversal, which is run entirely on a single-core, single thread CPU process. Regardless of the efficiency of the primitive rendering, the tree traversal would certainly serve as the lower-bound to the overall system performance.

By interpreting the results of the profiling, supported by some rough performance calculations, we were able to determine that we would possibly achieve interactive frame-rates without loss in visual quality by replacing the complex geometry rendering with GPU-based splatting, via point sprites, which pushes a single vertex, normal, radius and texture to the graphics pipeline and subsequently expands the point to form a splat on-hardware. To preserve visual quality, we add a gaussian texture (with an alpha fall-off) to the face of the sprite, to result in a smoothly blended and anti-aliased surfaces.

Current Progress
Currently we have implemented an extension to the QSplat system, in the form of an extra GPU-based rendering primitive. This can be activated from the regular QSplat user-interface via the "Driver" menu option. Our implementation can be roughly divided into three major parts:

Setup
Our rendering primitive utilizes the GL_POINT_SPRITE_ARB extension of the OpenGL Framework. This extension is widely supported across platforms and hardware architectures. We programmatically generate an alpha texture-map with a gaussian fall-off as our base texture, to be applied on every sprite. We utilize the glPointParameterfARB() function to modulate various point-sprite parameters such as distance attenuation, minimum and maximum sizes and fade threshold. We also enable alpha-blending along with the aforementioned texture-map, which is then uniformly applied to all point sprites. It's important to note that we disable GL_DEPTH_TEST, since we have a manual scheme for efficient depth-sorting (described below)

Render Loop
In our render loop, we first compute the depth of each point from the camera and test it against a threshold, in order to partition the points into two sets, near and far. We proceed to render the far points (which would serve as a background for blending) without any form of sorting. Though this is an approximate approach, the effect of the unsorted pixels is diminished as they lie in the far background. For the near (foreground) pixels, we proceed to perform O(n) radix sort, based on their depths and then render them over the background pixels to obtain a faithful alpha-blended result. This radix sort is performed over the quantized distance of the points from the viewer. Also, the point-sprite size is varied based on the diameter of the LOD bounding sphere. A sparse cloud would result in a large gaussian sprite and vice versa, buy the following quadratic function, where r is the radius and a, b, and c are empirically determined, based on the density and scale of the underlying data-set.


End of Render Loop
We disable all the extensions and clear the buffers which contained the depth information for the splats.

Flow Diagram
Flow Diagram for Rendering Algorithm

Some Failed Approaches
Due to the exploratory nature of the project, during it's course, we ran into multiple failed approaches, some of which did not provide any significant performance or quality improvement and some which actually performed worse than the original QSplat implementation. We list some of these below for posterity and future consideration:





                   
Aliasing due to Radix Sort                                                                Holes due to slow STL sort


Results and Conclusion
Using the above approach, we implemented a hardware based fuzzy gaussian point sprite splat primitive which demonstrates improved quality at a lower cost (higher, interactive frame-rates). In comparison, the splat is visually more appealing than rendering points alone and is comparable to using highest-quality spherical/elliptical splats. Also, the frame-rate is higher than rendering elliptical splats alone. Apart from the improved system, this project has helped us expand our knowledge about the capabilities of modern GPUs, OpenGL extensions and the landscape of point-based rendering research.



Points Gaussian Gl_Points Ellipses Spheres
St. Peters 3,281,560 0.564s 0.333s 0.659s >10s
Dubrovnik 2,750,112 0.320s 0.254s 0.536s >10s

Performance comparison of various primatives for view-dependent static rendering.


Splat Quality

Side-by-side quality comparison of Gaussian, Point and Elliptical splats.

Future work
Future work for this project includes parallelizing the LOD tree traversal, normal-based warping of gaussian splats, storing the vertices (and associated data) in the GPU vertex buffer objects (passing only the indices), rendering in front to back order with intelligent occlusion culling and making full use of the rapidly evolving GPU architecture and programming model as they evolve over time.

Acknowledgements
We acknowledge the help and guidance we received for this project from Brian Curless, Alex Colburn, Yasutaka Furukawa, the PhotoTourism/Rome in a Day teams for providing the datasets and finally the QSplat team for making their code available for research use.

References
[1] Rusinkiewicz, S. and Levoy, M. 2000. QSplat: a multiresolution point rendering system for large meshes. In Proceedings of the 27th Annual Conference on Computer Graphics and interactive Techniques International Conference on Computer Graphics and Interactive Techniques (SIGGRAPH '00)

[2] Mario Botsch, Leif Kobbelt, "High-Quality Point-Based Rendering on Modern GPUs," pg, pp.335, 11th Pacific Conference on Computer Graphics and Applications (PG'03), 2003