Scientific Computing: Still Mainstream Computer Science



Picture: Left to right: Top Row: Peg Howland, Shiv Gowda, Irene Moulitsas.
Middle Row: Mahbubur Rahim Khan, Hyunsoo Kim, Kris Kampshoff, Dong Wei Cao.
Bottom Row: Cheonghee Park, Haesun Park, George Karypis, Na Li.

"Should scientific computing (or numerical analysis) be a part of computer science?" is a question often heard in computer science departments. Many, if not most, computer science departments started in the 60s and 70s as spin-offs from mathematics departments. Then the dominant topic was numerical computing, and the most common programming language was FORTRAN. Since then computer science has expanded to include many more specialties, and the field of scientific computing has itself grown to become what it is today: a field which comprises a rich variety of interrelated engineering and theoretical core disciplines, including, for example, numerical analysis, matrix computation, sparse matrix methods, and parallel numerical algorithms.

That the answer to the above question is 'yes' becomes clear once one understands that the field of scientific computing is itself undergoing important changes. Techniques of scientific computing are penetrating non-traditional areas, and this is changing the view of many computer scientists. As an example, methods for information retrieval may exploit a vector space representation, and some of these methods require singular values and vectors of large, sparse matrices. Another example is the development of graph partitioning methods based on spectral analysis.
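
To make the first example concrete, here is a minimal sketch (in Python with SciPy) of the core step behind latent-semantic-style retrieval: computing a few leading singular triplets of a sparse term-document matrix. The tiny corpus and the rank k=2 are illustrative choices, not taken from the work described.

    import numpy as np
    import scipy.sparse as sp
    from scipy.sparse.linalg import svds

    # Toy 5-term x 4-document count matrix (rows: terms, cols: documents).
    A = sp.csc_matrix(np.array([
        [2, 0, 1, 0],
        [1, 1, 0, 0],
        [0, 2, 0, 1],
        [0, 0, 3, 1],
        [1, 0, 0, 2],
    ], dtype=float))

    # svds uses a Lanczos-type iterative method, so only matrix-vector
    # products with the sparse matrix are needed -- the key point for
    # "large and sparse" term-document matrices.
    U, s, Vt = svds(A, k=2)          # two largest singular triplets
    print("singular values:", s)

    # Project the documents into the 2-dimensional latent space.
    doc_coords = np.diag(s) @ Vt     # shape (2, number of documents)
    print(doc_coords)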

This interesting link between the continuous and the discrete is prevalent in many other applications. Conversely, methods in scientific computing are drawing on tools from computer science. For example, Metis, a graph-partitioning tool developed by George Karypis and Vipin Kumar, is widely used as a preprocessing step in high-performance computing jobs that take a parallel approach, i.e., that harness the power of many processors to solve a single large problem. In another example, Yousef Saad uses tools from graph theory when designing solvers for sparse linear systems.
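
The spectral partitioning idea mentioned above can be sketched in a few lines. This is not Metis itself, which uses a multilevel scheme; it is classical spectral bisection, which splits a graph by the signs of the entries of the Fiedler vector of its Laplacian. The six-node path graph is made up for illustration.

    import numpy as np
    import scipy.sparse as sp

    # Adjacency matrix of a 6-node path graph and its Laplacian L = D - A.
    edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
    n = 6
    rows, cols = zip(*(edges + [(j, i) for (i, j) in edges]))
    A = sp.csr_matrix((np.ones(len(rows)), (rows, cols)), shape=(n, n))
    L = sp.diags(np.asarray(A.sum(axis=1)).ravel()) - A

    # At this toy size a dense eigendecomposition suffices; for large
    # graphs one would use a sparse solver such as scipy.sparse.linalg.eigsh.
    vals, vecs = np.linalg.eigh(L.toarray())
    fiedler = vecs[:, 1]   # eigenvector of the second-smallest eigenvalue
    print("partition:", (fiedler > 0).astype(int))   # e.g. [0 0 0 1 1 1]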

Haesun Park has recently developed a feature extraction method called LDA/GSVD. LDA/GSVD extends the applicability of the widely used linear discriminant analysis (LDA) to high-dimensional and undersampled data by means of a matrix pair decomposition called the generalized singular value decomposition (GSVD). LDA/GSVD has been successfully applied to document classification, facial recognition, and visualization. This research is closely tied to the development of knowledge-based prediction systems that scale to large data sets. One such system successfully predicts protein secondary structure and solvent accessibility using support vector machine classifiers.
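
For orientation, here is a hedged sketch of the classical LDA that LDA/GSVD generalizes, not of LDA/GSVD itself. Classical LDA requires the within-class scatter matrix to be nonsingular, which is precisely the assumption that fails for high-dimensional, undersampled data and that the GSVD formulation removes. The toy data are invented.

    import numpy as np
    from scipy.linalg import eigh

    # Toy data: two classes in 3 dimensions (made up for illustration).
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal([0, 0, 0], 1.0, size=(20, 3)),
                   rng.normal([3, 1, 0], 1.0, size=(20, 3))])
    y = np.array([0] * 20 + [1] * 20)

    mean = X.mean(axis=0)
    Sw = np.zeros((3, 3))   # within-class scatter
    Sb = np.zeros((3, 3))   # between-class scatter
    for c in (0, 1):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        d = (mc - mean).reshape(-1, 1)
        Sb += len(Xc) * (d @ d.T)

    # Classical LDA solves the generalized eigenproblem Sb w = lambda Sw w,
    # which breaks down whenever Sw is singular.
    vals, vecs = eigh(Sb, Sw)
    w = vecs[:, -1]          # direction with the largest between/within ratio
    print("discriminant direction:", w)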

George Karypis's research focuses primarily on developing tools that enable the efficient solution of scientific applications on parallel computers. Examples include the graph and mesh partitioning libraries METIS and ParMETIS, which can effectively decompose a variety of mesh-based parallel numerical simulations; the PSPASES library for solving symmetric positive definite systems on large numbers of processors; and the MGridGen library for automatically generating a sequence of coarse grids for multigrid solvers. He has also worked on serial and parallel algorithms for mining scientific datasets and is leading a major research effort on graph mining.
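
PSPASES itself is a parallel direct solver; as a hedged, single-process stand-in, the sketch below solves a small sparse symmetric positive definite system iteratively with SciPy's conjugate gradient instead. The 1-D Poisson matrix is a standard illustrative choice, not taken from the library.

    import numpy as np
    import scipy.sparse as sp
    from scipy.sparse.linalg import cg

    # A small SPD system: the 1-D Poisson matrix (tridiagonal with 2 on
    # the diagonal and -1 off it). The size is illustrative.
    n = 100
    A = sp.diags([-np.ones(n - 1), 2.0 * np.ones(n), -np.ones(n - 1)],
                 offsets=[-1, 0, 1], format='csr')
    b = np.ones(n)

    # Conjugate gradient needs only matrix-vector products with A.
    x, info = cg(A, b)
    print("converged:", info == 0)
    print("residual norm:", np.linalg.norm(b - A @ x))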

Finally, Daniel Boley has developed a fast unsupervised clustering algorithm by bringing together ideas from machine learning, statistics, pattern recognition, and especially sparse linear algebra methods. The result, the method of Principal Direction Divisive Partitioning (PDDP), has been publicly available for download for five years. It is a method that has been particularly successful on text document collections, largely because it takes great advantage of the high degree of sparsity in the data. Its success depends on the use of sparse linear algebra methods similar to those used in the past in spectral graph partitioning.
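
The core step of a PDDP-style split can be suggested briefly; this is a hedged illustration of the idea, not Boley's distributed code, and the data are synthetic. Each document is projected onto the leading principal direction of the mean-centered collection, and the cluster is divided by the sign of the projection.

    import numpy as np
    import scipy.sparse as sp
    from scipy.sparse.linalg import svds

    rng = np.random.default_rng(1)
    # Toy stand-in for a term-document matrix (documents as rows).
    X = np.vstack([rng.normal(0, 1, (30, 50)), rng.normal(4, 1, (30, 50))])

    mean = X.mean(axis=0)
    # The leading singular triplet of the centered matrix gives the
    # principal direction without forming a covariance matrix. Note that
    # explicit centering destroys sparsity; the actual PDDP method can
    # avoid this by treating the mean as a separate rank-one correction.
    U, s, Vt = svds(sp.csr_matrix(X - mean), k=1)
    proj = (X - mean) @ Vt[0]
    left, right = np.where(proj <= 0)[0], np.where(proj > 0)[0]
    print(len(left), "documents on one side,", len(right), "on the other")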

A second important trend in scientific computing is its emergence as a force in a new era where multidisciplinary research has become mandatory. A generation ago one could spend a lifetime on a narrowly defined research topic and be quite prolific. As numerical methods have matured, new needs have emerged, and collaborations with experts from other areas have become the norm. In this situation the standard approach of just providing a 'canned' product is limiting. Yousef Saad has had one of these collaborations: he started working with Professor Jim Chelikowsky from the Department of Chemical Engineering and Materials Science about twelve years ago. The basic research concerns 'electronic structure calculations' in quantum mechanics, where the goal is to study the electronic or optical properties of materials. The methods used allow one to characterize systems of a few hundred to a thousand atoms, and the simulations are among the most computationally intensive ones today. A typical simulation done in the Saad and Chelikowsky group could consume a whole week on the IBM SP computer available at the Minnesota Supercomputing Institute. The core computation is a large eigenvalue problem. A decade ago, the main goal was just to write a parallel program that could run on a parallel platform, and the fact that a group from computer science was collaborating with a physics team was a huge advantage. Some of the computations done in the mid to late 90s were the first and biggest of their kind.
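
The eigenvalue computation at the heart of such simulations can be suggested with a small sketch: computing a few lowest eigenpairs of a large sparse symmetric matrix with a Lanczos-type solver. The "Hamiltonian" here is a made-up Laplacian-plus-potential stand-in, not a real electronic-structure operator.

    import numpy as np
    import scipy.sparse as sp
    from scipy.sparse.linalg import eigsh

    # Stand-in "Hamiltonian": the 2-D Laplacian on a 50x50 grid plus a
    # random diagonal potential. Purely illustrative, not a physical model.
    n = 50
    I = sp.identity(n)
    T = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n))
    H = sp.kron(I, T) + sp.kron(T, I)      # 2500 x 2500, very sparse
    H = H + sp.diags(np.random.default_rng(0).uniform(0, 0.1, n * n))

    # Lowest six eigenpairs via the implicitly restarted Lanczos method;
    # only matrix-vector products with H are needed, which is what makes
    # very large eigenproblems of this kind tractable.
    vals, vecs = eigsh(H, k=6, which='SA')
    print("lowest eigenvalues:", vals)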

The field of scientific computing is currently at a turning point. As computer power increases, there is no doubt that research in more effective and more powerful new algorithms will only gain importance. Computing - as in number crunching - is basic to almost every engineering and scientific field, and the need will grow as new technologies develop and mature.

-Yousef Saad, Daniel Boley, George Karypis and Haesun Park

Software Available:

Many of the software packages mentioned are freely available. Yousef Saad has made available a number of packages for sparse matrix computations over the past 12 years. Those who are familiar with iterative methods and sparse matrices know SPARSKIT, a package for performing some of the basic computations with sparse matrices. The package, the first version of which appeared in 1989, is written in FORTRAN 77 and is still widely used for its 'iterative solvers' suite. A more recent package called pARMS provides parallel solvers for a message passing environment. pARMS is written in C and uses MPI for communication. Visit http://www-users.cs.umn.edu/~saad/software/home.html.
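
For readers who have not used such a suite, the sketch below shows the same kind of solve in SciPy rather than SPARSKIT itself: a preconditioned GMRES iteration (GMRES being one of the Krylov methods such suites provide) on a sparse nonsymmetric system. The matrix and the preconditioner choice are illustrative.

    import numpy as np
    import scipy.sparse as sp
    from scipy.sparse.linalg import gmres, spilu, LinearOperator

    # A nonsymmetric sparse system (a convection-diffusion-like stand-in;
    # the matrix is made up, not from any package's test set).
    n = 200
    A = sp.diags([-1.3, 2.0, -0.7], [-1, 0, 1], shape=(n, n), format='csc')
    b = np.ones(n)

    # Incomplete-LU preconditioning, the usual companion to Krylov
    # solvers in iterative-solver suites of this kind.
    ilu = spilu(A)
    M = LinearOperator((n, n), matvec=ilu.solve)

    x, info = gmres(A, b, M=M)
    print("converged:", info == 0)
    print("residual norm:", np.linalg.norm(b - A @ x))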

The tools developed by George Karypis described above are freely distributed via the department's website; they are used extensively at hundreds of sites worldwide and have been incorporated into many commercial applications. Information about these tools and directions for downloading can be found at http://www.cs.umn.edu/~karypis/software.

For information on getting Daniel Boley's software, see http://www-users.cs.umn.edu/~boley/Distribution/PDDP2.html.