Research

My primary research interests include data mining, machine learning, and algorithm development. Specifically, I am interested in applying data mining and machine learning approaches to anomaly detection in application domains such as aircraft safety, web mining, network intrusion detection, earth science data analysis, and gene expression data analysis.

My resume is available here [pdf, ps].

My Ph.D. thesis focuses on detecting anomalies in sequence data. I investigate different techniques to address this problem for large symbolic as well as continuous sequence databases, and the application of such techniques to anomaly detection in a variety of real application domains such as aircraft safety, system call intrusion detection, and biological data sets.

I have been involved in the following projects at the University of Minnesota:

  • Discovery of Changes from the Global Carbon Cycle and Climate System Using Data Mining (funded by the Earth-sciences group at NASA Ames). I investigated the application of various statistical time-series modeling techniques to detect interesting trends in spatio-temporal data sets capturing different climate features over the last 50 years for the entire earth.

  • Minnesota INtrusion Detection System (MINDS) (funded by Army Research Labs, ARL) is a data mining based approach to address various aspects of cyber/network security. I have focused on the MINDS Anomaly Detector, which is a distance-based outlier detection technique applied in a network setting to detect behavioral anomalies. Apart from this, I have also worked on second-level analysis of cyber attacks to detect sophisticated attacks on large organizations.

  • Summarization of categorical data. This work deals with the problem of summarizing large datasets defined over categorical attributes to obtain a compact yet informative summary for the analyst. We have proposed a novel formulation of this problem with well-defined metrics and developed algorithms to obtain these summaries. This work has been applied to network data sets. A variant of this research is also used as the summarizing component of MINDS, where anomalous traffic is compressed to a few lines corresponding to attack (or attack-like) patterns in the network traffic.

  • Situational Awareness Analysis Tool for Aiding Discovery of Security Events and Patterns (funded by ARDA). Developed a two-level analysis tool to detect sophisticated cyber-attacks through second-level analysis of cyber attacks on large organizations (patent pending).

  • Privacy Preserving Data Mining (funded by AGNIK Corporation). Involved in the development of a cross-domain intrusion detection system (PURSUIT) using privacy-preserving distributed data mining techniques.

  • Analyzing gene expression data. Investigated how data mining and machine learning approaches can be used to analyze gene expression data. I have adopted concepts from association analysis and unsupervised clustering to obtain functional groups from gene expression data.

  • Anomaly Detection in Flight Record Data (funded by the Intelligent Systems Division at NASA Ames). Currently I am involved in a project that addresses detecting anomalies in flight sequence data. I am applying various outlier detection and sequence modeling techniques to detect anomalous flight sequences that would correspond to potential flight hazards.

I have also gained valuable industry experience applying my research to various real-life problems.

  • Tax Fraud Detection (Department of Revenue, Minnesota). Applied supervised classification approaches and unsupervised clustering-based outlier detection approaches to detect fraudulent income tax returns.

  • Click Fraud Detection (Yahoo! Data Mining Research). This involved detecting fraudulent clicks in the web ad click data for Yahoo! by applying a Kalman filtering based outlier detection technique. I also implemented a time-series outlier detection component, which was integrated into the Yahoo Data Mining Platform (YDMP).

  • User Categorization (Yahoo! Data Mining Research). Investigated data mining techniques to infer deep user interest categories from Yahoo! search query logs. Developed a hierarchical classification technique to classify users into deep interest categories from user query history on Yahoo! search. Implemented a multi-class hierarchical classifier using SVMs in C++.

My undergraduate project at IIT Madras, India, dealt with semi-structured data storage and retrieval. A research paper based on this project can be found here.

Please visit my publications page for more details on my previous and ongoing research.