Point-Based Rendering for City-Scale Models

State-of-the-art 3D reconstruction systems can produce city-scale dense point-cloud representations of famous landmarks and urban landscapes from the almost unlimited library of photographs on the web. However, a pressing issue with such large data-sets is rendering these points, which number in the millions, in a real-time, interactive and visually compelling manner. Many point-based rendering systems have been implemented to tackle this issue, each with its own pros and cons depending on the application domain.

QSplat [1] is an interactive point rendering system for large meshes, primarily obtained from laser scans (such as those from the Digital Michelangelo Project). Using a hierarchy of bounding spheres, QSplat creates a tree representation of the point-cloud, which is then used for rendering tasks such as level-of-detail (LOD) control and visibility culling. The major advantage of the QSplat rendering scheme is that the LOD can be altered dynamically, based on a target frame-rate. By trading visual quality against rendering speed, the system lets a user interact with the models in real time. When the user stops interacting, the static frame is progressively rendered at the highest level of detail. One of the main drawbacks of this approach is that rendering complex primitives such as spheres and ellipsoids, which provide good visual quality, puts a high load on the CPU and on the GPU memory bus. This rendering cost forces the system down to a poor level of detail during interaction.
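The LOD control described above can be illustrated with a minimal C++ sketch of a bounding-sphere hierarchy traversal. The node layout, the `Viewer` struct with its projection scale, and the function names are illustrative assumptions, not QSplat's actual (compact, quantized) data structures, and visibility culling is omitted for brevity.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// Hypothetical flat-array node: children are stored contiguously.
struct SphereNode {
    float x, y, z, radius;  // bounding sphere
    int firstChild;         // index of the first child in the node array
    int childCount;         // 0 for a leaf
};

// Hypothetical viewer: position plus a scale that converts
// (radius / distance) into an approximate on-screen radius in pixels.
struct Viewer {
    float x, y, z;
    float pixelScale;
};

// Approximate projected radius of a node's bounding sphere, in pixels.
float projectedRadius(const SphereNode& n, const Viewer& v) {
    float dx = n.x - v.x, dy = n.y - v.y, dz = n.z - v.z;
    float dist = std::sqrt(dx * dx + dy * dy + dz * dz);
    if (dist <= n.radius)  // viewer inside the sphere: always refine
        return std::numeric_limits<float>::infinity();
    return n.radius * v.pixelScale / dist;
}

// Collect the indices of the nodes that would be drawn as splats.
// A node is drawn if it is a leaf or if its projected radius is at or
// below the threshold; otherwise its children are visited.
void collectSplats(const std::vector<SphereNode>& nodes, int index,
                   const Viewer& v, float thresholdPixels,
                   std::vector<int>& out) {
    assert(index >= 0 && index < static_cast<int>(nodes.size()));
    const SphereNode& n = nodes[index];
    if (n.childCount == 0 || projectedRadius(n, v) <= thresholdPixels) {
        out.push_back(index);
        return;
    }
    for (int i = 0; i < n.childCount; ++i)
        collectSplats(nodes, n.firstChild + i, v, thresholdPixels, out);
}
```

Raising `thresholdPixels` stops the recursion higher in the tree and yields fewer, larger splats; a frame-rate controller could adjust this value between frames to approach a target frame-rate.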

Recent improvements in graphics hardware and programming models allow point-based geometry to be rendered efficiently on the Graphics Processing Unit (GPU), with minimal computation on, and data transfer from, the Central Processing Unit (CPU). One such hardware-accelerated rendering method, described by Botsch et al. [2], uses the programmable graphics hardware to delegate computationally expensive rendering tasks to the GPU, reducing the need for data transfer and expensive CPU cycles. Their implementation uses the built-in NV_POINT_SPRITE extension to OpenGL to rasterize Gaussian-filtered splats on the graphics hardware, resulting in improved performance over similar software-based methods.

The basic idea of our approach is to combine the two methods described above by implementing GPU-based splatting as a primitive for the QSplat rendering system. We began with a detailed performance analysis of the base QSplat system to determine bottlenecks and potential areas for improvement. Our first observation concerned QSplat's use of a memory-mapped data file, which dynamically pages the pertinent parts of the LOD tree from disk based on the current viewpoint. By eliminating memory-mapping and loading the entire data file into RAM, we achieved a 2-3% performance improvement at the cost of a significantly larger memory footprint.

For complex geometry primitives, such as ellipses and spheres, we observed that about 30-35% of the running time was spent computing and drawing texture-mapped polygons (GL_TRIANGLE_STRIP) to form the desired shape. This included modelview (camera to world) transformations for occlusion and the calculation of intermediate texture co-ordinates for mapping the shapes onto the polygon.

The last major bottleneck was the tree traversal, which runs entirely in a single-threaded CPU process on a single core. Regardless of how efficiently the primitives are rendered, the traversal time sets a lower bound on the time per frame and therefore limits overall system performance.

From the profiling results, supported by some rough performance calculations, we determined that we could likely achieve interactive frame-rates without loss of visual quality by replacing the complex geometry rendering with GPU-based splatting via point sprites. This approach pushes a single vertex, normal, radius and texture to the graphics pipeline, and the hardware then expands the point to form a splat. To preserve visual quality, we add a Gaussian texture (with an alpha fall-off) to the face of the sprite, which results in smoothly blended and anti-aliased surfaces.

Currently we have implemented an extension to the QSplat system, in the form of an extra GPU-based rendering primitive. This can be activated from the regular QSplat user-interface via the "Driver" menu option. Our implementation can be roughly divided into three major parts:

Our rendering primitive uses the GL_POINT_SPRITE_ARB extension of the OpenGL framework. This extension is widely supported across platforms and hardware architectures. We programmatically generate an alpha texture-map with a Gaussian fall-off as our base texture, to be applied to every sprite. We use the glPointParameterfARB() function to control point-sprite parameters such as distance attenuation, minimum and maximum sizes, and fade threshold. We also enable alpha-blending along with the aforementioned texture-map, which is then applied uniformly to all point sprites. Note that we disable GL_DEPTH_TEST, since we use a manual scheme for efficient depth-sorting (described below).
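As a rough illustration of the programmatically generated texture, the following C++ sketch fills a single-channel alpha map with a Gaussian fall-off from the center. The function name, the texture size and the `sigma` parameter (expressed as a fraction of the half-width) are assumptions made for this example; the values used in our implementation are not reproduced here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Build a size x size 8-bit alpha map whose opacity falls off as a
// Gaussian of the normalized distance from the texture center.
// sigma is relative to the half-width (hypothetical parameterization).
std::vector<std::uint8_t> makeGaussianAlpha(int size, float sigma) {
    assert(size > 0 && sigma > 0.0f);
    std::vector<std::uint8_t> alpha(static_cast<std::size_t>(size) * size);
    const float half = 0.5f * static_cast<float>(size - 1);  // center texel
    const float norm = (half > 0.0f) ? half : 1.0f;
    for (int y = 0; y < size; ++y) {
        for (int x = 0; x < size; ++x) {
            float u = (static_cast<float>(x) - half) / norm;  // in [-1, 1]
            float v = (static_cast<float>(y) - half) / norm;  // in [-1, 1]
            float a = std::exp(-(u * u + v * v) / (2.0f * sigma * sigma));
            alpha[static_cast<std::size_t>(y) * size + x] =
                static_cast<std::uint8_t>(std::lround(a * 255.0f));
        }
    }
    return alpha;
}
```

The resulting buffer could be uploaded once as an alpha texture (for example with glTexImage2D using the GL_ALPHA format) and bound while the point sprites are drawn.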

In our render loop, we first compute the depth of each point from the camera and test it against a threshold, in order to partition the points into two sets, near and far. We render the far points (which serve as a background for blending) without any form of sorting. Though this is an approximation, the effect of the unsorted points is diminished because they lie in the far background. We then sort the near (foreground) points by depth with an O(n) radix sort and render them over the background points to obtain a faithful alpha-blended result. The radix sort operates on the quantized distance of the points from the viewer. The point-sprite size is varied based on the diameter of the LOD bounding sphere, so that a sparse cloud results in a large Gaussian sprite and a dense cloud in a small one. The size is given by a quadratic function of r, the radius of the bounding sphere, whose coefficients a, b, and c are determined empirically from the density and scale of the underlying data-set.
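The render-loop ordering described above can be sketched in C++ as follows. This is a minimal illustration under stated assumptions, not our actual implementation: the 16-bit quantization, the two 8-bit radix passes, the back-to-front order within the near set, all names, and the assignment of a, b and c to the quadratic, linear and constant terms are choices made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct DepthPoint {
    float depth;  // distance from the viewer
    int id;       // index into the point data
};

// Quantize a depth in [0, maxDepth] to a 16-bit radix key (assumed width).
std::uint16_t quantizeDepth(float depth, float maxDepth) {
    float t = std::min(std::max(depth / maxDepth, 0.0f), 1.0f);
    return static_cast<std::uint16_t>(t * 65535.0f);
}

// Stable LSD radix sort (two 8-bit passes), ascending by quantized depth.
void radixSortByDepth(std::vector<DepthPoint>& pts, float maxDepth) {
    assert(maxDepth > 0.0f);
    const std::size_t n = pts.size();
    std::vector<std::uint16_t> keys(n), keysTmp(n);
    std::vector<DepthPoint> tmp(n);
    for (std::size_t i = 0; i < n; ++i)
        keys[i] = quantizeDepth(pts[i].depth, maxDepth);
    for (int shift = 0; shift < 16; shift += 8) {
        std::size_t count[257] = {0};
        for (std::size_t i = 0; i < n; ++i)
            ++count[((keys[i] >> shift) & 0xFF) + 1];
        for (int b = 0; b < 256; ++b)  // prefix sums give bucket offsets
            count[b + 1] += count[b];
        for (std::size_t i = 0; i < n; ++i) {
            std::size_t dst = count[(keys[i] >> shift) & 0xFF]++;
            tmp[dst] = pts[i];
            keysTmp[dst] = keys[i];
        }
        pts.swap(tmp);
        keys.swap(keysTmp);
    }
}

// Far points first (unsorted), then near points from back to front.
std::vector<int> buildDrawOrder(const std::vector<DepthPoint>& pts,
                                float nearThreshold) {
    std::vector<int> order;
    std::vector<DepthPoint> nearPts;
    for (const DepthPoint& p : pts) {
        if (p.depth > nearThreshold) order.push_back(p.id);
        else nearPts.push_back(p);
    }
    radixSortByDepth(nearPts, nearThreshold);
    for (auto it = nearPts.rbegin(); it != nearPts.rend(); ++it)
        order.push_back(it->id);
    return order;
}

// Quadratic sprite size; which coefficient belongs to which term is assumed.
float spriteSize(float r, float a, float b, float c) {
    return a * r * r + b * r + c;
}
```

Points whose depths quantize to the same key keep their input order, so they are not ordered relative to each other.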

Due to the exploratory nature of the project, we ran into multiple failed approaches during its course. Some did not provide any significant performance or quality improvement, and some actually performed worse than the original QSplat implementation. We list some of these below for posterity and future consideration:

Geometry Shaders: Geometry Shaders are a relatively new addition to graphics hardware, which allow extremely efficient geometry synthesis and modification on the GPU. We attempted to leverage Geometry Shaders to rasterize a point into an elliptical splat by programmatically adding vertices in the shader program. However, due to their recent advent, our development hardware did not support these shaders. This approach would also limit the use and distribution of the system to very recent hardware.

Two-pass alpha blending: To render the Gaussian splats correctly, we required a correct blend order, which can be achieved by two-pass rendering or by manual depth sorting. Upon investigation, we found that two-pass rendering was slower than a manual depth sort for the given data set. Moreover, the two-pass approach relies on a z-epsilon parameter, which determines the depth range within which splats are alpha-blended correctly. For a city-view application like ours, this parameter cannot simply be fixed at a reasonably small value, and artifacts remain unless z-epsilon is adjusted correctly. After first trying the STL's built-in sort function, we obtained the best performance by writing our own radix-based depth sort.

Aliasing due to Radix sort: Because our linear-time radix sort operates on quantized depths rather than on the floating-point values themselves, it produced sorted buckets, each of which contained multiple unsorted points within a small depth neighborhood. This incomplete sorting manifested itself visually as an aliasing artifact, especially apparent when moving the camera position (as shown in the video below). However, the visual artifact seemed a reasonable trade-off for the linear running time of radix sort. The effect can be easily mitigated by multiplying the depth function by a factor of 1000 to achieve higher precision in the radix sort.
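The effect of the scale factor can be seen in a small C++ sketch. The function name and the truncation to an integer key are assumptions made for illustration; they stand in for whatever quantization the sort actually uses.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical integer radix key: the depth is scaled, then truncated.
// A larger scale preserves more of the fractional depth in the key.
std::uint32_t depthKey(float depth, float scale) {
    assert(depth >= 0.0f && scale > 0.0f);
    return static_cast<std::uint32_t>(depth * scale);
}
```

With a scale of 1, depths of 1.2 and 1.7 both map to key 1 and land in the same bucket, so their relative order is left to chance. With a scale of 1000 they map to distinct keys and are ordered correctly. The larger key range may require more bits per key, and therefore more radix passes.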

Results and Conclusion
Using the above approach, we implemented a hardware-based, fuzzy Gaussian point-sprite splat primitive that delivers improved quality at a lower cost (higher, interactive frame-rates). The splat is visually more appealing than rendering points alone and is comparable to the highest-quality spherical/elliptical splats, while its frame-rate is higher than that of elliptical splats. Apart from the improved system, this project has helped us expand our knowledge of the capabilities of modern GPUs, OpenGL extensions and the landscape of point-based rendering research.

Model	Points	Gaussian	GL_POINTS	Ellipses	Spheres
St. Peter's	3,281,560	0.564s	0.333s	0.659s	>10s
Dubrovnik	2,750,112	0.320s	0.254s	0.536s	>10s

Performance comparison of various primitives for view-dependent static rendering.

Side-by-side quality comparison of Gaussian, Point and Elliptical splats.

Future work
Future work for this project includes parallelizing the LOD tree traversal, normal-based warping of Gaussian splats, storing the vertices (and associated data) in GPU vertex buffer objects (passing only the indices), rendering in front-to-back order with intelligent occlusion culling, and making full use of GPU architectures and programming models as they evolve.

[1] Rusinkiewicz, S. and Levoy, M. 2000. QSplat: A multiresolution point rendering system for large meshes. In Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH '00).