DATA: A KEY FOUNDATION OF THE DIGITAL AGE

Over half a century of experience with the use of computers has shown that information is key to the functioning of any organization. Furthermore, experience with using information leads to the desire to obtain more of it, invariably leading to a complete metamorphosis of the process using it, as the understanding of the information increases. Information is generated when an interpretation is applied, in the context of a process, to a piece of data, and hence data is our best approximation of reality. This key role of data in modeling reality from a computational perspective has led to the phenomenal success of databases in practically all applications. As the role of data continues to increase in the digital age, there is an increasing need to develop new computational techniques for the storage, management, and analysis of data.

The Department of Computer Science and Engineering has significant ongoing activity in databases and data mining, spread across five laboratories. Professors John Carlis, Vipin Kumar, James Slagle, Shashi Shekhar, and Jaideep Srivastava, each an expert in his respective field, direct ongoing projects with both graduate and undergraduate students.

 

John Carlis

Professor John Carlis' main research interest is database management systems (DBMS). Within DBMS he is interested in data modeling, language extensions, and nontraditional applications. Data models are the focus of much of his work. He has built data models for a number of real, complex systems, thereby tempering theory with practice. His current interests include improving the user-DBMS interface, specifically by providing more powerful, natural commands that allow users to confidently and concisely express queries that might otherwise go unasked. The need to integrate separately developed business applications drove the development of DBMSs. Now other application areas (e.g., scientific computation, expert systems, CAD, and software design) are being similarly driven, and DBMS capability must be extended. Professor Carlis is interested in creating data models for such applications, assessing the match between DBMS capabilities and user requirements, and then improving the DBMS. Scientific applications provide fertile ground for database research, and Professor Carlis has three interdisciplinary database projects in progress. He is working with biologists and other computer scientists on plant genome, neuroscientific, and chimpanzee databases.

 

Vipin Kumar

In addition to his long-standing interest in high-performance computing, Professor Vipin Kumar has an active interest in data mining. In this area, his research group is developing novel methods for mining information in high-dimensional data, which poses major challenges for conventional data mining algorithms. Recent developments include a novel methodology for finding clusters in large high-dimensional data sets. In this scheme, relations among data items are captured using a graph or a hypergraph, and efficient multi-level graph-based algorithms are used to find clusters of highly related items.

This methodology has been applied successfully to a variety of domains, such as stock market data, DNA data, and documents on the Web. These experiments demonstrate that the graph-based approach is applicable and effective in a wide range of domains, and that it outperforms conventional clustering techniques such as K-Means even when they are used in conjunction with dimensionality reduction methods such as Principal Component Analysis. Graph-based methodology is also being used in a nearest-neighbor classification scheme in which the importance of discriminating variables is learned using mutual information and weight adjustment techniques. Empirical evaluations on many sets of real-world documents demonstrate that this scheme outperforms state-of-the-art classification algorithms such as C4.5, Ripper, Naive-Bayesian, and PEBLS. This research is being done in collaboration with a number of companies such as GTE, Fingerhut, and West Publishing.
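To make the idea of graph-based clustering concrete, the following is a minimal sketch and not the research group's multi-level algorithm. It assumes a much simpler stand-in technique: link each item to its mutual k nearest neighbors, then treat each connected component of that graph as a cluster. The function names, the sample points, and the choice of Euclidean distance are all illustrative assumptions.

```python
from math import dist


def mutual_knn_graph(points, k):
    """Link i and j only when each is among the other's k nearest neighbors.

    Assumption: Euclidean distance; real high-dimensional data would likely
    need a different similarity measure.
    """
    n = len(points)
    neighbors = []
    for i in range(n):
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: dist(points[i], points[j]),
        )
        neighbors.append(set(others[:k]))
    return {i: {j for j in neighbors[i] if i in neighbors[j]} for i in range(n)}


def connected_components(graph):
    """Return each connected component of the graph as a sorted list of nodes."""
    seen = set()
    clusters = []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in graph[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        clusters.append(sorted(component))
    return clusters


# Hypothetical toy data: two well-separated groups of three points each.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
clusters = connected_components(mutual_knn_graph(points, k=2))
print(clusters)
```

Building the graph first separates the question of which items are related from the question of how to partition them, which is the property the graph-based approach described above relies on.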

 

James Slagle

Professor Slagle heads the Datatool project, whose overall goal is to apply computer science techniques to traffic engineering. Sponsored by the Minnesota Department of Transportation, the Datatool system provides a graphical user interface to a database management system. This is integrated with data analysis and display facilities. Current work is directed towards applying advanced computer science techniques to error detection and correction in data obtained from highway sensors.

 

Shashi Shekhar

Professor Shekhar's main research interest is in geographic information systems (GIS), including databases for managing spatial networks (e.g., road maps), parallelization of GIS, routing algorithms for Advanced Traveler Information Systems, and archival of traffic measurements. His research group has developed some of the most efficient indexing methods for large road maps, as well as algorithms for path evaluation and for computing shortest paths. Connectivity-Clustered Access Method (CCAM), a new storage and access method for spatial networks, has been developed and outperforms alternative schemes in carrying out network computations. In knowledge engineering, work has been done on the problem of discovery in databases. Symbolic data mining techniques as well as neural networks have been studied. One of the fastest scalable parallel formulations of back-propagation learning algorithms for neural networks computes more than one billion connections per second. Research sponsors include the National Science Foundation, National Aeronautics and Space Administration, Army Research Laboratories, Control Data Inc., U.S. Department of Transportation, Minnesota Department of Transportation, and the ITS Institute.
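As a point of reference for the shortest-path computations mentioned above, here is a minimal sketch of textbook Dijkstra search over a small road network. It is not CCAM and not the group's own algorithm; the `roads` dictionary, its node names, and its edge costs are hypothetical.

```python
import heapq


def shortest_path(graph, source, target):
    """Return (cost, path) for the cheapest route, or (inf, []) if unreachable.

    graph maps each node to a dict of {neighbor: non-negative edge cost}.
    """
    best = {source: 0.0}
    previous = {}
    heap = [(0.0, source)]
    settled = set()
    while heap:
        cost, node = heapq.heappop(heap)
        if node in settled:
            continue  # stale heap entry
        settled.add(node)
        if node == target:
            break
        for nxt, weight in graph.get(node, {}).items():
            candidate = cost + weight
            if candidate < best.get(nxt, float("inf")):
                best[nxt] = candidate
                previous[nxt] = node
                heapq.heappush(heap, (candidate, nxt))
    if target not in best:
        return float("inf"), []
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    path.reverse()
    return best[target], path


# Hypothetical one-way road segments with travel costs.
roads = {
    "A": {"B": 4.0, "C": 1.0},
    "B": {"D": 1.0},
    "C": {"B": 2.0, "D": 5.0},
    "D": {},
}
cost, route = shortest_path(roads, "A", "D")
print(cost, route)
```

On a disk-resident road map, the expensive step is fetching each node's neighbors, which is why a storage method that keeps connected nodes close together can speed up computations of this kind.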

 

Jaideep Srivastava

Professor Srivastava's research interests are in databases, data mining, and multimedia computing. One of his current projects investigates the application of data mining techniques to Web data. Sponsored by the National Science Foundation, this project investigates how information about content, structure, and usage of the Web can be mined for knowledge useful to various applications. A critical issue is the modeling of human interaction with the Web. Page hits are too fine-grained to provide useful information, so user behavior must be analyzed at a coarser granularity. The approach is to group Web page hits, based on clustering, into user transactions, which serve as the units of human interaction with the Web. Ongoing work is using Markov models to approximate the process a user goes through in browsing the Web. Another issue is mining Web logs for interesting usage patterns. Hyperlinks in Web pages capture the author's view of how pieces of information are linked together, while browsing patterns capture the users' view of it. A usage pattern is interesting if there is significant disagreement between the two views. The framework of logic with supports is being used to model the beliefs in this environment, and information about content, structure, and usage of Web pages is used to estimate the degrees of these beliefs.
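The two steps described above, grouping hits into transactions and fitting a Markov model of browsing, can be illustrated with a small sketch. It rests on assumptions not taken from the project: transactions are formed with a simple time-gap rule rather than the clustering the project uses, the model is first-order, and the log records, page names, and 1800-second gap are hypothetical.

```python
from collections import Counter, defaultdict


def group_into_transactions(hits, max_gap):
    """Split each user's hits into transactions wherever the gap exceeds max_gap.

    hits is a list of (user, timestamp_seconds, page) tuples in any order.
    Assumption: a time-gap heuristic stands in for clustering.
    """
    transactions = []
    last_seen = {}  # user -> (timestamp of last hit, index of open transaction)
    for user, timestamp, page in sorted(hits, key=lambda h: (h[0], h[1])):
        if user in last_seen and timestamp - last_seen[user][0] <= max_gap:
            index = last_seen[user][1]
            transactions[index].append(page)
        else:
            transactions.append([page])
            index = len(transactions) - 1
        last_seen[user] = (timestamp, index)
    return transactions


def transition_probabilities(transactions):
    """Estimate first-order Markov transition probabilities between pages."""
    counts = defaultdict(Counter)
    for pages in transactions:
        for current, following in zip(pages, pages[1:]):
            counts[current][following] += 1
    return {
        current: {page: n / sum(nxt.values()) for page, n in nxt.items()}
        for current, nxt in counts.items()
    }


# Hypothetical Web log entries: (user, seconds since start, page).
hits = [
    ("u1", 0, "/home"), ("u1", 30, "/research"),
    ("u1", 4000, "/home"), ("u1", 4020, "/people"),
    ("u2", 10, "/home"), ("u2", 25, "/research"),
]
transactions = group_into_transactions(hits, max_gap=1800)
model = transition_probabilities(transactions)
print(transactions)
print(model)
```

Comparing the estimated transitions with the hyperlinks actually present on each page is one simple way to look for the kind of disagreement between author and user views that the paragraph above calls interesting.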

 

For additional information on these faculty members, please visit the Software Systems Faculty Profiles page.

