The ever-growing size of data sets resulting from industrial and scientific simulations and measurements has created the need to employ multi-resolution techniques for both analysis speedup and data reduction. Among the most sophisticated approaches are wavelets and sparse grids. Recently, the best of both worlds has been merged by using wavelet bases in the sparse grid representation of multi-resolution data sets.
New algorithms that work entirely on sparse grids can create data sets that are too large to be handled on uniform grids. On the other hand, most visualization techniques can only display uniform grids. As interpolation on sparse grids is a complicated and time-consuming process, direct volume visualization is impractical for larger data sets unless the underlying interpolation is accelerated by several orders of magnitude. However, quite a number of supercomputers and PC clusters now exist that can be used for parallelization. By streaming the data sets from, and the resulting images to, the end user's workstation, scientists can utilize high processing power without leaving the office.
Parallelizing visualization techniques raises the need to balance the computational load. Additionally, for time-consuming rendering methods, previews are useful to the user. In most cases, both preview generation and load balancing are performed explicitly. We approach these problems by applying a special pixel rendering sequence that achieves very good results implicitly, without generating communication overhead.
For interpolation on sparse grids, a hierarchy of basis functions is used, some of which are defined on the entire grid. To interpolate a value, all basis functions that are accessed during the hierarchy traversal have to be evaluated. In contrast, tri-linear interpolation on full grids needs only 8 basis functions, independent of the grid size. Thus, interpolation is much more expensive on sparse grids than on full grids.
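The cost difference can be made concrete with a small sketch. The Python below is illustrative only: `trilinear` and `count_sparse_subspaces` are hypothetical names, and the count assumes the textbook sparse grid construction from hat functions without boundary points (hierarchical subspaces whose level sum is at most n + d - 1), in which exactly one basis function per subspace is nonzero at a given point. Bases with wider supports, such as wavelets, would touch more functions per point.

```python
import itertools
import math


def trilinear(grid, x, y, z):
    """Tri-linear interpolation on a full grid with unit spacing.

    grid[i][j][k] holds the sample at integer position (i, j, k); the query
    point is assumed to lie inside the grid. Only the 8 corners of the
    enclosing cell are read, whatever the grid size.
    """
    # Clamp the cell index so that points on the upper boundary stay in range.
    i = min(int(math.floor(x)), len(grid) - 2)
    j = min(int(math.floor(y)), len(grid[0]) - 2)
    k = min(int(math.floor(z)), len(grid[0][0]) - 2)
    fx, fy, fz = x - i, y - j, z - k
    value = 0.0
    for di, wx in ((0, 1.0 - fx), (1, fx)):
        for dj, wy in ((0, 1.0 - fy), (1, fy)):
            for dk, wz in ((0, 1.0 - fz), (1, fz)):
                value += wx * wy * wz * grid[i + di][j + dj][k + dk]
    return value


def count_sparse_subspaces(level, dim=3):
    """Count the hierarchical subspaces of a sparse grid (assumed construction).

    Under the stated assumption this equals the number of basis functions
    that contribute to one interpolated value.
    """
    return sum(
        1
        for levels in itertools.product(range(1, level + 1), repeat=dim)
        if sum(levels) <= level + dim - 1
    )


if __name__ == "__main__":
    grid = [[[i + 2 * j + 3 * k for k in range(3)] for j in range(3)]
            for i in range(3)]
    print(trilinear(grid, 0.5, 1.25, 1.75))
    for n in (4, 8, 12):
        print(n, count_sparse_subspaces(n))
```

Under these assumptions the per-point work on the sparse grid grows with the refinement level (220 contributing functions at level 10 in three dimensions), while the full-grid interpolation stays at 8.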
A short introduction to sparse grids is given on our former webpage about sparse grid visualization.
With MPI, the parallelization itself is relatively straightforward: the rays are spread across the available processors in a domain decomposition scheme. Memory management is not an issue, as sparse grids need very little storage and can thus be replicated throughout the cluster.
A noteworthy problem is that scientists are often unable to work directly at the front-end nodes of the cluster. Thus, the rendered data has to be streamed to the users' workstations. This is done by a dedicated communication node (typically, not all nodes have a direct internet connection) that collects incoming ray data and serves the TCP stream. In the meantime, the workstation can generate preview images from the rays rendered so far.
As the clusters are often shielded by firewalls, ssh tunneling may be required. This seems to be a severe bottleneck, but in fact the interpolation process on sparse grids is so computationally intensive that slow communication does not hinder the visualization process.
With replicated data sets, the distribution of rays among the nodes can be chosen freely. Usually, a 'master' node selects by some scheme which node renders which ray and sends new orders when a job has finished. However, when several nodes finish their jobs at the same time, the lag between delivering rays and receiving new job data can reduce the rendering speed significantly. Implicit assignment of rays avoids any additional communication overhead and reduces the idle time between rendered rays to the time needed to calculate the next ray assignment.
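One way such an implicit assignment could look is sketched below in Python. This is an assumption for illustration, not the rendering sequence used in this work, which the text does not spell out: `pixel_at` orders the pixels of a square power-of-two image from coarse to fine (a bit-reversed Z-order), and `rays_for_rank` hands out that sequence round-robin with a per-round rotation, so that every node derives its own rays from its rank and the node count alone.

```python
def pixel_at(n, bits):
    """Map sequence position n to (x, y) in a 2**bits x 2**bits image.

    Hypothetical ordering: the lowest bits of n select the highest bits of
    x and y, so the first 4**k positions form a regular 2**k x 2**k lattice.
    """
    x = y = 0
    for level in range(bits):
        x |= ((n >> (2 * level)) & 1) << (bits - 1 - level)
        y |= ((n >> (2 * level + 1)) & 1) << (bits - 1 - level)
    return x, y


def rays_for_rank(rank, size, bits):
    """List the pixels one node renders, in order, without any messages.

    Round r covers sequence positions r*size .. r*size + size - 1. The
    offset inside a round is rotated by r so that a node is not tied to a
    fixed residue of n, which would confine it to one image region when
    size is a power of two.
    """
    total = 4 ** bits
    rays = []
    for round_index in range((total + size - 1) // size):
        n = round_index * size + (rank - round_index) % size
        if n < total:  # the last round may be incomplete
            rays.append(pixel_at(n, bits))
    return rays


if __name__ == "__main__":
    size, bits = 5, 3  # 5 nodes, 8 x 8 image (illustrative values)
    for rank in range(size):
        print(rank, rays_for_rank(rank, size, bits)[:4])
```

In this sketch, the rays finished after any number of complete rounds are a prefix of the coarse-to-fine sequence, which is what makes early previews possible, and each node's rays are scattered over the whole image. The balancing is therefore statistical: it relies on expensive and cheap rays being mixed within every node's share.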
The parallelized version has been tested both on a set of workstations with a TCP/IP implementation of MPI (LAM) and on the new PC cluster Kepler of the University of Tübingen. This cluster consists of 96 dual PIII nodes connected with Myrinet, and two additional front-end nodes. The results were streamed to the University of Stuttgart. All rendering times presented here include the communication lag, which of course affects the rendering speedup significantly. The visualization of the incoming ray data is performed in a sparse grid visualization toolkit that effectively hides the parallelization technique from the user.
As one can see in Figure 3, the system scales almost perfectly with the number of processors. Load balancing also works extremely well for a system that does not require any additional communication at all.
We found that being able to generate previews completely eliminates the need to reduce the image resolution, e.g. for finding good views of the volume. As soon as one is satisfied with the image precision, the rendering process is interrupted and a new view can be set. Different stages of this process can be seen in Figure 2.
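How a preview might be assembled from an incomplete set of rays can be sketched as follows. The Python below is a hypothetical, brute-force illustration (the names `build_preview` and `rendered` are not taken from the toolkit): every pixel without a finished ray copies the value of the nearest finished one, so the image sharpens as more rays arrive.

```python
def build_preview(width, height, rendered):
    """Fill an image from the rays rendered so far.

    rendered maps (x, y) to the value of a finished ray. Pixels that are
    still missing copy the nearest finished pixel (squared Euclidean
    distance). Brute force: fine for a sketch, too slow for large images.
    """
    if not rendered:
        raise ValueError("at least one rendered ray is required")
    image = []
    for y in range(height):
        row = []
        for x in range(width):
            if (x, y) in rendered:
                row.append(rendered[(x, y)])
            else:
                nearest = min(
                    rendered,
                    key=lambda p: (p[0] - x) ** 2 + (p[1] - y) ** 2,
                )
                row.append(rendered[nearest])
        image.append(row)
    return image


if __name__ == "__main__":
    # Four finished rays on a coarse lattice of a 4 x 4 image (made-up values).
    rendered = {(0, 0): 10, (2, 0): 20, (0, 2): 30, (2, 2): 40}
    for row in build_preview(4, 4, rendered):
        print(row)
```

The returned image is indexed as `image[y][x]`. A production viewer would more likely update only the pixels affected by each incoming ray instead of rebuilding the whole image.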