My primary research interests include data mining, machine learning, and
algorithm development. Specifically, I am interested in the
application of data mining and machine learning approaches to
anomaly detection in application domains such as aircraft
safety, web mining, network intrusion detection, earth science data
analysis, and gene expression data analysis.
My resume is available here [pdf, ps].
My Ph.D. thesis focuses on detecting
anomalies in sequence data. I investigate different techniques
to address this problem for large symbolic as well as continuous
sequence databases, and the application of such techniques to anomaly
detection in a variety of real application domains such as aircraft
safety, system call intrusion detection, and anomaly detection in
biological data sets.
I have been involved in the following projects at the University
of Minnesota:
-
Discovery of Changes from the Global
Carbon Cycle and Climate System Using Data Mining
(funded by the Earth-sciences
group at NASA Ames). I investigated the application of various
statistical time-series modeling techniques to detect interesting
trends in spatio-temporal data sets capturing different climate
features over the last 50 years for the entire Earth.
-
Minnesota INtrusion Detection System (MINDS) (funded by Army Research Labs, ARL) is a
data mining based approach to addressing various aspects of cyber/network
security. I have focused on the MINDS Anomaly Detector, which is a
distance-based outlier detection technique applied in a network setting
to detect behavioral anomalies. Apart from this, I have also worked on
second-level analysis of cyber attacks to detect sophisticated attacks
on large organizations.
-
Summarization of categorical data. This work deals with the problem of
summarizing large data sets defined over categorical attributes to
obtain a compact yet informative summary for the analyst. We have
proposed a novel formulation of this problem with well-defined metrics
and developed algorithms to obtain these summaries. This work has
been applied to network data sets. A variant of this research is also
used as the summarization component of MINDS, where
anomalous traffic is compressed to a few lines corresponding to attack
(or attack-like) patterns in the network traffic.
-
Situational Awareness Analysis Tool for
Aiding Discovery of Security Events and Patterns (funded by ARDA). Developed a two-level
analysis tool to detect sophisticated
cyber attacks through second-level analysis of cyber attacks on large
organizations (patent pending).
-
Privacy Preserving Data Mining (funded by AGNIK Corporation). Involved
in the development of a cross-domain intrusion detection system (PURSUIT)
using privacy-preserving distributed data mining techniques.
-
Analyzing
gene expression data. Investigated the application of data mining and machine learning
approaches to the analysis of gene expression data. I have adopted
concepts from association analysis and unsupervised clustering
to obtain functional groups from gene expression data.
-
Anomaly
Detection in Flight Record Data (funded by the Intelligent Systems Division at
NASA Ames). Currently, I am involved in a project on
detecting anomalies in flight sequence data. I am applying various
outlier detection and sequence modeling techniques to detect anomalous
flight sequences that would correspond to potential flight hazards.
I have also gained valuable industry experience applying my research to
various real-life problems.
-
Tax Fraud Detection (Department of
Revenue, Minnesota). Applied supervised classification approaches and
unsupervised clustering-based outlier detection approaches to detect
fraudulent income tax returns.
-
Click Fraud Detection (Yahoo! Data
Mining Research). This involved detecting fraudulent
clicks in the web ad click data for Yahoo! by applying a Kalman
filtering based outlier detection technique. I also implemented a
time-series outlier detection component, which was integrated into the Yahoo
Data Mining Platform (YDMP).
-
User
Categorization (Yahoo! Data Mining Research).
Investigated data mining techniques to infer deep user interest
categories from Yahoo! search
query logs. Developed a hierarchical classification technique to
classify users into deep interest categories based on their query history on
Yahoo! search. Implemented a multi-class hierarchical classifier using
SVMs in C++.
My undergraduate
project at IIT Madras, India, dealt with semi-structured
data storage and retrieval. A research paper based on my
undergraduate project can be found here.
Please visit my publications
page for more details on my previous and ongoing research.