SLIDES:
pdf (2 Mb)
ABSTRACT:
The importance of geo-spatial data is growing with
the increasing availability of large geo-spatial datasets such as
maps, remote-sensing images, and the decennial census.
Applications include geo-spatial intelligence;
real-time situation assessment (e.g., during disaster response);
high-fidelity terrain visualization (e.g., Google Earth, flight simulators);
location-based services;
predicting the clustering or spread of disease;
finding crime hot spots;
and mission to planet Earth (global change
and climatology, land-use classification).
Many of these applications impose stringent performance
and response-time constraints
that often cannot be met by today's sequential Geographic Information
Systems (GIS), due to the large volume of geo-spatial datasets and the
complexity of geo-spatial data items, including
satellite imagery and extended objects (e.g., polygons and line-strings).
High-performance computing, e.g., parallelization of GIS, may meet the
requirements of some of these applications.
In this talk, we illustrate this message in
the context of a few case studies, including
the neural network backpropagation learning algorithm,
real-time terrain visualization,
change-interval detection,
multi-scale multi-granular geo-image classification,
and parameter estimation for spatial auto-regression.
These case studies span a broad range of parallel high-performance computing
platforms, including GPGPUs, shared-memory platforms (e.g., OpenMP, Unified Parallel C),
and distributed platforms (e.g., MPI, Hadoop/MapReduce).
We also explore cross-cutting issues such as spatial data partitioning
and dynamic load balancing, which improve speed-ups by reducing idle time as well as
the costs associated with communication and synchronization.
For example, data partitioning is an effective approach to achieving high
performance in GIS. However, partitioning extended spatial objects is difficult,
and special techniques that go beyond random partitioning, such as systematic declustering, are needed.
Experiments also show that the replication of data may be needed to facilitate
dynamic load balancing, as the cost of local processing is often less than the
cost of data transfer for spatial objects.
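The contrast between random partitioning and systematic declustering can be illustrated with a minimal sketch. It assumes a uniform grid of cells, four processors, and a disk-modulo style rule in which cell (i, j) goes to processor (i + j) mod P. The rule, the grid size, and all function names are illustrative assumptions, not the methods evaluated in the talk. The sketch measures the largest number of cells that any one processor must handle for a rectangular range query; a lower maximum means less idling.

```python
import random
from collections import Counter


def disk_modulo(cell, p):
    """Systematic rule (illustrative): cell (i, j) -> processor (i + j) mod p."""
    i, j = cell
    return (i + j) % p


def random_assignment(n, p, seed=0):
    """Random partitioning: each cell of an n x n grid gets a random processor."""
    rng = random.Random(seed)
    return {(i, j): rng.randrange(p) for i in range(n) for j in range(n)}


def range_query(i0, j0, height, width):
    """Grid cells touched by a rectangular range query."""
    return [(i, j)
            for i in range(i0, i0 + height)
            for j in range(j0, j0 + width)]


def max_load(cells, owner):
    """Largest number of query cells assigned to any single processor."""
    counts = Counter(owner(cell) for cell in cells)
    return max(counts.values())


# Hypothetical parameters chosen only for illustration.
N, P = 16, 4
query = range_query(3, 5, 4, 4)          # 16 cells in total
ideal = -(-len(query) // P)              # ceiling division: perfect balance

rand_map = random_assignment(N, P)
dm_load = max_load(query, lambda cell: disk_modulo(cell, P))
rnd_load = max_load(query, lambda cell: rand_map[cell])

print(f"ideal={ideal} systematic={dm_load} random={rnd_load}")
```

With these parameters every row of the query covers all four residues of (i + j) mod 4 exactly once, so the systematic rule reaches the ideal of four cells per processor. Random assignment can never do better than that and usually does worse. Extended objects such as polygons span several cells, which is one reason the simple rule above is only a starting point.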
KEYWORDS:
Spatial Datasets, High-Performance, Parallel, Geographic Information Systems,
Range Query, Spatial Auto-regression.
NOTE:
Some of the results discussed in this talk appeared
in the following publications:
-
Parallel Processing over Spatial-Temporal Datasets from Geo, Bio, Climate and Social Science Communities:
A Research Roadmap,
IEEE International Congress on BigData, 2017: 232-250 (doi: 10.1109/BigDataCongress.2017.39)
(with S. Prasad et al.).
-
A vision for GPU-accelerated parallel computation on geo-spatial datasets,
ACM SIGSPATIAL Special, 6(3): 19-26, 2014.
-
GPGPU-accelerated interesting interval discovery and other computations on GeoSpatial datasets:
a summary of results,
ACM International Workshop on Big Spatial Data, ACM SIGSPATIAL, 2013: 65-72
(with Sushil K. Prasad et al.).
-
Spatiotemporal data mining in the era of big spatial data: algorithms and applications,
Proceedings of the 1st ACM SIGSPATIAL International Workshop on Analytics for Big Geospatial Data, 2012,
Pages 1-10.
(with R. Vatsavai et al.).
-
V. Gandhi, M. Celik, and S. Shekhar,
Parallelizing Multiscale and Multigranular Spatial Data Mining Algorithms,
Workshop on Partitioned Global Address Space, 2006.
-
B. M. Kazar, S. Shekhar, D. J. Lilja, D. Boley,
A Parallel Formulation of the Spatial Auto-Regression Model
for Mining Large Geo-Spatial Datasets,
Proc. of 2004 SIAM International Conf. on Data Mining Workshop on High
Performance and Distributed Mining (HPDM2004), Florida, USA, April 2004.
- S. Shekhar, Q. Lu, S. Kim,
A Novel Approach to Evacuation Route Planning,
in Army AHPCRC Research Center Bulletin, 15(4), 2005.
-
S. Shekhar, S. Ravada, G. Turner, D. Chubb, and V. Kumar.
Declustering and Load-Balancing
Methods for Parallelizing Geographic Information Systems,
IEEE Trans. on Knowledge and Data Eng,
IEEE, Vol. 10, No. 4, July-Aug. 1998.
-
Duen-Ren Liu and S. Shekhar,
Partitioning similarity graphs: A framework for declustering problems,
Information Systems, 21(6):475-496, Elsevier, September 1996.
-
S. Shekhar, S. Ravada, G. Turner, D. Chubb, and V. Kumar.
Parallelizing a GIS on a Shared Address Space Architecture,
Computer (Special Issue on Shared Memory Multiprocessors),
IEEE, Vol. 29, No. 12, Dec. 1996.
- Y. Zhou and S. Shekhar,
Disk allocation methods for parallelizing grid files,
Proc. Intl. Conf. on Data Eng., IEEE, 1994.
-
A Scalable Parallel Formulation of the Backpropagation Algorithm for Hypercubes and Related Architectures,
IEEE Transactions on Parallel and Distributed Systems, 5(10):1073-1090, October 1994,
doi: 10.1109/71.313123
(with V. Kumar and B. Amin).
-
S. Shekhar and S. Chawla,
Spatial Databases: A Tour (Chapters 5 and 7),
Prentice Hall 2003, ISBN 0-13-017480-7.