Large-Scale Data Visualization Using
Parallel Data Streaming
James Ahrens and Kristi Brislawn, Los Alamos National Laboratory.
Ken Martin, Berk Geveci, and C. Charles Law, Kitware.
Michael Papka, Argonne National Laboratory.
Why Computer Simulation?
● In engineering and product design, computer simulation continues to replace physical prototypes, reducing design-cycle times and costs.
Agenda
● Why Parallel Computing?
● Why Data Streaming?
● How Does It Work?
● Mixed Topologies
● Supporting Parallelism
● Sort-Last Parallel Rendering
● Result
● Conclusion
Glossary:
● Visualization Pipeline
● Streamlines
● Message Passing Interface (MPI)
● Data Streaming
Visualization Pipeline:
★ The visualization pipeline describes the path from data acquisition to image generation.
★ The role of the visualization pipeline is to transform information into graphical data.
Streamlines :
A streamline is a line that is tangent to the velocity vector at every point.
For example, streamlines in the astrophysics dataset, seeded outside the proto-neutron star, illustrate the complex magnetic field inside the supernova shock front.
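The tangency property can be made concrete with a few lines of code. A minimal sketch, not from the paper: the vortex field, Euler step size, and function names below are our own illustration (production systems use higher-order integrators such as Runge-Kutta).

```python
# Illustrative only: a hypothetical 2D vortex field v(x, y) = (-y, x),
# integrated with simple Euler steps.

def velocity(x, y):
    """Velocity vector at (x, y): a counter-clockwise vortex."""
    return -y, x

def trace_streamline(seed, step=0.01, n_steps=100):
    """Step along the velocity field from a seed point; the resulting
    polyline is tangent to the velocity vector at every point."""
    x, y = seed
    points = [(x, y)]
    for _ in range(n_steps):
        vx, vy = velocity(x, y)
        x, y = x + step * vx, y + step * vy
        points.append((x, y))
    return points

line = trace_streamline((1.0, 0.0))
```

Seeding several such lines around a feature of interest (like the proto-neutron star above) reveals the structure of the field.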
Message Passing Interface (MPI):
Message Passing Interface (MPI) is a standardized and portable message-passing system developed for distributed and parallel computing.
Why Parallel Computing ?
● In the big-data era, datasets can grow to enormous sizes, and it may therefore be impossible to load them into a single machine.
● In parallel computing, such datasets can take advantage of multiple machines: they are partitioned and loaded in a distributed fashion.

Systems Supporting Parallel Computing:
1. Open Data Explorer (OpenDX)
2. Application Visualization System (AVS)
3. Demand Driven Visualizer (DDV)
4. SCIRun
These systems provide a pipeline infrastructure and can support parallel execution.
OpenDX and AVS
● OpenDX (formerly IBM Data Explorer) and AVS are dataflow-based visualization systems, providing numerous visualization and analysis algorithms for their users.
● Both systems' architectures rely to some degree on a centralized executive to initiate modules, allocate memory, and execute modules.
SCIRun
● SCIRun is a dataflow-based simulation and visualization system that supports interactive computational steering.
● SCIRun provides threaded-task and data parallelism on shared-memory multiprocessors.
● An extension to SCIRun permits distributed-memory task parallelism.
DDV (Demand Driven Visualizer)
● DDV provides a pipeline-based, demand-driven execution model that handles large data sets by requesting only the minimum amount of data required to produce the results.
● This is a significant advantage for data sets with a large number of stored or computed fields.
Looks good, but…
● All the approaches we describe here lack the ability to stream data in memory when the data set topologies change.
● Because many visualization techniques can change the data's topology, this is an important consideration.
Streaming Data :
In this paper's context, streaming data means passing a large dataset through the visualization pipeline one piece at a time, so that only a fraction of it must be resident in memory at once.
Why data streaming ?
Streaming data through a visualization pipeline offers two main benefits:
❏ Run visualizations on data that would not otherwise fit into memory.
❏ Run visualizations with a smaller memory footprint, resulting in higher cache-hit rates and less swapping to disk.
Advantages of Data Streaming:
➔ Real-Time Insights: streaming data allows for real-time processing and analysis.
➔ Scalability: the visualization pipeline can be scaled up to handle increasing amounts of data as needed.
Advantages of Data Streaming:
➔ Flexibility: a visualization pipeline can be designed to handle a wide variety of data types and formats.
➔ Interactivity: streaming data through a visualization pipeline enables interactive visualizations that let users explore the data in real time.
Conditions for Data Streaming:
The dataset must be:
● Separable: the algorithm must be able to break the dataset into pieces efficiently.
● Mappable: we must be able to determine what portion of the input data is needed to generate a given portion of the output.
● Result invariant: the result must be independent of the number of pieces into which the data is divided.
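These conditions can be sketched concretely. In this hypothetical example (the data and function names are ours, not the paper's), a global maximum is computed by streaming a dataset piece by piece; because the operation is separable and result invariant, any piece count gives the same answer:

```python
data = list(range(1000))  # stand-in for a dataset too large for memory

def stream_max(dataset, n_pieces):
    """Visit the data one piece at a time, keeping only a small
    running result in memory -- the essence of data streaming."""
    piece_size = (len(dataset) + n_pieces - 1) // n_pieces
    running = float("-inf")
    for p in range(n_pieces):
        piece = dataset[p * piece_size:(p + 1) * piece_size]  # "load" one piece
        if piece:
            running = max(running, max(piece))
    return running

# Result invariance: the answer does not depend on the piece count.
assert stream_max(data, 1) == stream_max(data, 7) == stream_max(data, 64)
```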
Going smoothly?
What about streaming UNSTRUCTURED DATA?
Mixed Topologies :
With regularly sampled volumetric data, such as images, we can use an extent defined as (imin, imax, jmin, jmax, kmin, kmax), but this doesn't work for unstructured data.
One approach is to use a geometric extent, such as (xmin, xmax, ymin, ymax, zmin, zmax), whose coordinates bound the data in space.
However, determining these bounding coordinates can be computationally expensive.
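For the regularly sampled case, splitting such an extent into pieces is straightforward. A minimal sketch (the slab-along-k strategy and the function name are our own illustration):

```python
def split_extent(extent, n_pieces):
    """Split a volumetric extent (imin, imax, jmin, jmax, kmin, kmax)
    into up to n_pieces slabs along the k axis."""
    imin, imax, jmin, jmax, kmin, kmax = extent
    total = kmax - kmin + 1
    slabs, start = [], kmin
    for p in range(n_pieces):
        # Spread any remainder over the first few slabs.
        count = total // n_pieces + (1 if p < total % n_pieces else 0)
        if count == 0:
            continue
        slabs.append((imin, imax, jmin, jmax, start, start + count - 1))
        start += count
    return slabs

pieces = split_extent((0, 63, 0, 63, 0, 63), 4)
```

Each piece here is a 16-slice slab of a 64³ volume; no comparably cheap split exists for unstructured data, which is the problem the following slides address.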
Unstructured and Curvilinear
Addressing these challenges becomes even more intricate when dealing with datasets that are both unstructured and curvilinear.
[Figure: examples of curvilinear, structured, and unstructured data]
Unstructured extent
A more practical approach is to define an unstructured extent as piece M out of N possible pieces: we split the dataset into N pieces, but we have no control over which cells a given piece will contain.
This raises the issue of how to support unstructured algorithms that require neighborhood information…
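The M-of-N scheme can be sketched as a plain division of the cell-id range (function and variable names below are hypothetical):

```python
def cells_for_piece(n_cells, piece, n_pieces):
    """Return the half-open cell-id range for piece `piece` of
    `n_pieces`. We get no say in which cells land in a piece --
    only that every cell lands in exactly one."""
    base, extra = divmod(n_cells, n_pieces)
    start = piece * base + min(piece, extra)
    size = base + (1 if piece < extra else 0)
    return start, start + size

ranges = [cells_for_piece(10, p, 3) for p in range(3)]
# The ranges tile the whole cell list with no gaps or overlaps.
```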
K-Nearest Neighbor (K-NN) Algorithm
K-Nearest Neighbors Algorithm
This assigns new data to a class or category based on its K nearest neighbors.
Ghost cells
● Imagine you have a dataset that's organized as an unstructured grid, where the data points are connected in a way that doesn't follow a regular pattern like a grid of squares or cubes.
● Now, some algorithms need more information than just the main data points. They may need "ghost cells": extra cells that surround a piece's own cells and provide extra context for calculations and analyses, like a buffer zone that helps neighborhood-based algorithms work correctly.
Ghost Cells:
So the solution is to use ghost cells, which are included when algorithms require them; that is, all unstructured grid data should be capable of producing ghost cells.
● Points on the edge between different sections can end up being counted twice, leading to duplicated shapes when processed.
● To solve this, we indicate which points in an extent are owned by that extent and which are ghost points.
Breaking up a sphere into a piece (red) and ghost-level cells and points (blue and green).
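The ownership rule can be sketched in a few lines. In this hypothetical example (the piece lists and the lowest-piece-wins rule are our own illustration), each shared point is owned by exactly one piece and treated as a ghost everywhere else:

```python
# Two pieces of a split grid; point 3 lies on their shared boundary.
piece_points = {0: [0, 1, 2, 3], 1: [3, 4, 5, 6]}

def owned_points(piece_points):
    """Assign each point to the lowest-numbered piece holding it;
    in every other piece that point is a ghost."""
    owner = {}
    for piece in sorted(piece_points):
        for pt in piece_points[piece]:
            owner.setdefault(pt, piece)
    return owner

owner = owned_points(piece_points)
ghosts_in_piece_1 = [p for p in piece_points[1] if owner[p] != 1]
# Skipping ghost points keeps global counts correct: point 3 is
# counted once, in piece 0.
```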
Conversion Between Structured and Unstructured Data :
1. Structured Data to Unstructured Data
2. Unstructured Data to Structured Data
Structured to Unstructured Data
For most operations that take in structured data and produce unstructured data, the architecture can use block-based division to divide the structured data into pieces until there are the N pieces requested.
[Figure: structured data and the unstructured data produced from it]
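Block-based division can be sketched as repeated halving (the longest-axis heuristic and all names here are our own illustration, not necessarily the architecture's exact rule):

```python
def block_divide(extent, n_pieces):
    """Split a structured extent ((i0,i1),(j0,j1),(k0,k1)) into
    n_pieces blocks by repeatedly halving the largest remaining
    piece along its longest axis."""
    def volume(e):
        return ((e[0][1] - e[0][0] + 1) * (e[1][1] - e[1][0] + 1)
                * (e[2][1] - e[2][0] + 1))
    pieces = [extent]
    while len(pieces) < n_pieces:
        pieces.sort(key=volume, reverse=True)
        e = list(pieces.pop(0))            # largest piece so far
        axis = max(range(3), key=lambda a: e[a][1] - e[a][0])
        lo, hi = e[axis]
        mid = (lo + hi) // 2
        left, right = list(e), list(e)
        left[axis], right[axis] = (lo, mid), (mid + 1, hi)
        pieces += [tuple(left), tuple(right)]
    return pieces

pieces = block_divide(((0, 63), (0, 63), (0, 63)), 8)
```

The blocks always partition the original extent exactly, so each piece can feed a downstream unstructured-output filter independently.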
Unstructured to Structured Data
The conversion is possible, but it is inappropriate for most algorithms that convert unstructured data to structured data.
To convert an unstructured grid to a structured grid, a structured input is needed to define the geometry of the desired output.
The conversion process resamples the data from the unstructured grid onto the geometry of the structured input.
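The resampling step can be sketched in one dimension. This toy example is entirely our own (scattered samples, nearest-neighbor lookup, and names; real systems use probe filters with proper interpolation): it evaluates unstructured samples on a regular grid whose geometry defines the output.

```python
# Unstructured input: scattered (position, value) samples in 1D.
samples = [(0.1, 10.0), (0.45, 20.0), (0.9, 30.0)]

def resample_to_grid(samples, n):
    """Evaluate scattered data on a regular n-point grid over [0, 1]
    using nearest-neighbor lookup -- the structured output geometry
    defines where we sample."""
    grid = []
    for i in range(n):
        x = i / (n - 1)
        _, value = min(samples, key=lambda s: abs(s[0] - x))
        grid.append(value)
    return grid

grid = resample_to_grid(samples, 5)
```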
28
Supporting Parallelism
❖ Most large-scale simulations use parallel processing, and their results are often distributed across many processing nodes.
❖ Parallelism requires some of these conditions:
● Data separability
● Result invariance
● Asynchronous execution
● Collection
Reason for asynchronous execution:
We require asynchronous execution so that one process isn’t unnecessarily blocked, waiting for input from another process.
Asynchronous execution:
● A naive approach would be to simply ask each input to generate its data in order. The problem is that while Filter 3 is waiting for Filter 1 to compute its data, Filter 2 is idle.
● Two modifications address this. The first is to trigger an asynchronous update: this method traverses upstream in the pipeline, and when it encounters a port, the port calls Update data on its input.
● The second is to use the locality of the inputs to determine in what order to invoke Update data on them:
Locality = 1 if the input is generated in the same process;
Locality = 0 if the input is generated in a different process;
Locality = x, x ∈ (0, 1), if the input is partially generated in different processes.
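The locality ordering can be sketched as a simple sort. In this hypothetical example (the input names and the remote-first policy are our own reading of the idea), inputs generated in other processes are triggered first so this process is not left idle waiting on them:

```python
# (name, locality) pairs: 1.0 = same process, 0.0 = different
# process, values in between = partially remote.
inputs = [("local_filter", 1.0), ("remote_a", 0.0), ("partial_b", 0.5)]

def update_order(inputs):
    """Invoke Update on the most remote inputs first, so their work
    proceeds in parallel while local inputs compute."""
    return [name for name, locality in sorted(inputs, key=lambda i: i[1])]

order = update_order(inputs)
```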
Data-parallel visualization example and image resulting from it
Encapsulation of Initialization and Communication
● Process initialization and communication tasks have been encapsulated into a class, shielding users from low-level initialization and communication details.
Concrete Subclasses for Different Process Types :
● Concrete subclasses have been created for distributed-memory and shared-memory processes.
● MPI (Message Passing Interface) and pthreads are used for distributed-memory and shared-memory parallelism, respectively.
Sort-Last Parallel Rendering :
Sort-last parallel rendering is a technique used in computer graphics to render large data sets on tiled displays.
➔ It is a type of parallel rendering, the application of parallel programming to the computational domain of computer graphics.
➔ Interprocess communication is used to gather the parallel renderings and composite them into a final image.
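A sort-last composite can be sketched per pixel. This toy example (the image layout and names are ours) keeps, for each pixel, the color whose depth value is smallest, as a z-buffer compare across processes would:

```python
def composite(image_a, image_b):
    """Depth-compare two (color, depth) images pixel by pixel,
    keeping the nearer fragment."""
    return [a if a[1] <= b[1] else b for a, b in zip(image_a, image_b)]

# Two 4-pixel renderings from two processes; lower depth wins.
proc0 = [("red", 0.2), ("red", 0.9), ("bg", 1.0), ("red", 0.5)]
proc1 = [("blue", 0.7), ("blue", 0.3), ("bg", 1.0), ("blue", 0.5)]
final = composite(proc0, proc1)
```

With more than two processes, the pairwise composite is applied in a reduction tree (for example, a binary-swap schedule).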
Centralized Rendering :
● For centralized rendering, polygonal data is collected using ports between processes; the ports are connected to an append filter in the collecting process.
Result
● The results here are based on using an in-memory analytic function as a data source. We organized the data as regular volumetric data.
● This also avoids dealing with the issues of massively parallel I/O, which are beyond the scope of this article
Result:
● The first visualization example involves a data-parallel pipeline: it computes an isosurface and a gradient magnitude, applies the gradient magnitude as colors to the isosurface using a probe filter, and renders the result using a parallel rendering technique.
● The tests were conducted on data sizes of 39 Gbytes, 1.1 Tbytes, and 0.9 Pbytes, using processor configurations ranging from 1 to 1,024 processors.
● Results are measured in terms of efficiency vs. processor count.
Results of a 39-Gbyte data-parallel visualization.
● It is worth noting that the 0.9-Pbyte run took 360,000 seconds on 1,024 processors.
● The second visualization example demonstrates task parallelism: multiple independent visualization pipelines each perform different tasks asynchronously.
● The first pipeline is the isosurface pipeline used in the first example.
● The second pipeline computes a gradient vector field from the input data.
● The third pipeline extracts a cut from the input data and displays it.
Conclusions
● In many simulations with distributed data, ghost cells can only be obtained from other processes. Currently, there is no standard mechanism for one process to determine where to find specific ghost cells.
● Certain algorithms, such as streamlines, need special parallel versions to handle situations where a streamline crosses from one data section to another and to keep track of that transition.