TITLE:

High Performance Computing With Spatial Datasets

PRESENTER:

Shashi Shekhar

AFFILIATION:

Computer Science Department, University of Minnesota.

URL:

http://www.cs.umn.edu/~shekhar

SLIDES: PDF (2 MB)

ABSTRACT:

The importance of geo-spatial data is growing with the increasing availability of large geo-spatial datasets such as maps, remote-sensing images, and the decennial census. Applications include geo-spatial intelligence; real-time situation assessment (e.g., during disaster response); high-fidelity terrain visualization (e.g., Google Earth, flight simulators); location-based services; predicting the clustering or spread of disease; finding crime hot spots; and Mission to Planet Earth (global change and climatology, land-use classification). Many of these applications impose stringent performance and response-time constraints that today's sequential Geographic Information Systems (GIS) often cannot meet, owing to the large volume of geo-spatial datasets and the complexity of geo-spatial data items such as satellite imagery and extended objects (e.g., polygons and line-strings).

High-performance computing, e.g., parallelization of GIS, may meet the requirements of some of these applications. In this talk, we illustrate this message in the context of a few case studies: the neural-network backpropagation learning algorithm, real-time terrain visualization, change-interval detection, multi-scale multi-granular geo-image classification, and parameter estimation for spatial auto-regression. These case studies span a broad range of parallel high-performance computing, including GPGPUs, shared-memory platforms (e.g., OpenMP, Unified Parallel C), and distributed platforms (e.g., MPI, Hadoop/MapReduce).

We also explore cross-cutting issues such as spatial data partitioning and dynamic load balancing, which improve speed-ups by reducing idle time as well as the costs of communication and synchronization. For example, data partitioning is an effective approach to achieving high performance in GIS. However, partitioning extended spatial objects is difficult, and special techniques, such as systematic declustering that goes beyond random partitioning, are needed. Experiments also show that data replication may be needed to facilitate dynamic load balancing, because for spatial objects the cost of local processing is often less than the cost of data transfer.
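To make the partitioning point concrete, here is a minimal sketch that compares how a rectangular range query spreads work across processors under three cell-to-processor assignments. It assumes, hypothetically, that the data has already been bucketed into an n × n uniform grid of equally costly cells, and it uses the classic Disk Modulo scheme, (i + j) mod P, as one example of systematic declustering; this is not necessarily the method used in the talk or the cited papers, extended objects spanning several cells are not modeled, and all function names are illustrative.

```python
from collections import Counter
import random


def stripe_owner(i, j, n, p):
    """Contiguous row stripes: each processor owns a block of adjacent rows."""
    rows_per_proc = -(-n // p)  # ceiling division
    return i // rows_per_proc


def disk_modulo_owner(i, j, n, p):
    """Disk Modulo declustering: neighbouring cells go to different processors."""
    return (i + j) % p


def make_random_owner(seed=0):
    """Random partitioning: each cell gets a random processor (memoised per cell)."""
    rng = random.Random(seed)
    assignment = {}

    def owner(i, j, n, p):
        if (i, j) not in assignment:
            assignment[(i, j)] = rng.randrange(p)
        return assignment[(i, j)]

    return owner


def query_load(owner, n, p, rows, cols):
    """Number of grid cells each processor must scan for a rectangular range query."""
    load = Counter(owner(i, j, n, p) for i in rows for j in cols)
    return [load.get(k, 0) for k in range(p)]


if __name__ == "__main__":
    n, p = 16, 4
    rows, cols = range(2, 6), range(3, 11)  # a 4 x 8 query window = 32 cells
    for name, owner in [("row stripes", stripe_owner),
                        ("random", make_random_owner(seed=42)),
                        ("disk modulo", disk_modulo_owner)]:
        load = query_load(owner, n, p, rows, cols)
        # Processors scan in parallel, so the busiest one bounds response time.
        print(f"{name:12s} load={load} max={max(load)}")
```

With these parameters, row stripes put all 32 cells on two processors (load [16, 16, 0, 0]) while two sit idle, whereas Disk Modulo gives every processor 8 cells, the ideal 32/4. Random assignment typically lands somewhere in between, and its balance is only probabilistic, which is the motivation for systematic declustering.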

KEYWORDS: Spatial Datasets, High-Performance, Parallel, Geographic Information Systems, Range Query, Spatial Auto-regression.

NOTE: Some of the results discussed in this talk appeared in the following publications:

  1. Parallel Processing over Spatial-Temporal Datasets from Geo, Bio, Climate and Social Science Communities: A Research Roadmap, IEEE International Congress on BigData, 2017: 232-250, doi: 10.1109/BigDataCongress.2017.39 (with S. Prasad et al.).
  2. A vision for GPU-accelerated parallel computation on geo-spatial datasets, ACM SIGSPATIAL Special, 6(3): 19-26, 2014.
  3. GPGPU-accelerated interesting interval discovery and other computations on GeoSpatial datasets: a summary of results, ACM International Workshop on Big Spatial Data, ACM SIGSPATIAL, 2013: 65-72 (with Sushil K. Prasad et al.).
  4. Spatiotemporal data mining in the era of big spatial data: algorithms and applications, Proceedings of the 1st ACM SIGSPATIAL International Workshop on Analytics for Big Geospatial Data, 2012: 1-10 (with R. Vatsavai et al.).
  5. V. Gandhi, M. Celik, and S. Shekhar, Parallelizing Multiscale and Multigranular Spatial Data Mining Algorithms, Workshop on Partitioned Global Address Space, 2006.
  6. B. M. Kazar, S. Shekhar, D. J. Lilja, D. Boley, A Parallel Formulation of the Spatial Auto-Regression Model for Mining Large Geo-Spatial Datasets, Proc. of 2004 SIAM International Conf. on Data Mining Workshop on High Performance and Distributed Mining (HPDM2004), Florida, USA, April 2004.
  7. S. Shekhar, Q. Lu, S. Kim, A Novel Approach to Evacuation Route Planning, in Army AHPCRC Research Center Bulletin, 15(4), 2005.
  8. S. Shekhar, S. Ravada, G. Turner, D. Chubb, and V. Kumar. Declustering and Load-Balancing Methods for Parallelizing Geographic Information Systems, IEEE Trans. on Knowledge and Data Eng., IEEE, Vol. 10, No. 4, July-Aug. 1998.
  9. Duen-Ren Liu and S. Shekhar, Partitioning similarity graphs: A framework for declustering problems, Information Systems, 21(6):475-496, Elsevier, September 1996.
  10. S. Shekhar, S. Ravada, G. Turner, D. Chubb, and V. Kumar. Parallelizing a GIS on a Shared Address Space Architecture, Computer (Special Issue on Shared Memory Multiprocessors), IEEE, Vol. 29, No. 12, Dec. 1996.
  11. Y. Zhou and S. Shekhar, Disk allocation methods for parallelizing grid files, Proc. Intl. Conf. on Data Eng., IEEE, 1994.
  12. A Scalable Parallel Formulation of the Backpropagation Algorithm for Hypercubes and Related Architectures, IEEE Transactions on Parallel and Distributed Systems, 5(10):1073-1090, October 1994, doi: 10.1109/71.313123 (with V. Kumar and B. Amin).
  13. S. Shekhar and S. Chawla, Spatial Databases: A Tour (Chapters 5 and 7), Prentice Hall 2003, ISBN 0-13-017480-7.