Pattern Directed Mining for Frequent Episodes

Date of Submission: 
February 9, 1998
Report Number: 
98-007
Abstract: 
Sequence data arises naturally in many applications. Forexample,marketing and sales data collected over a period of timeprovidessequenceswhich can be analyzed and used for projections andforecasting.Abstractly, such adata collection can be viewed as a sequence of events, whereeach eventhas an associated time of occurence. An important andinterestingcharacteristic of event sequences is the occurrence of {itepisodes},i.e. a collection of events occuring in a certain patterncite{mannila-kdd95}. Of special interest are {it frequenteepisodes},i.e.episodes occuring with a frequency above a certainthreshold.In this paper, we study the problem of mining for frequentepisodes insequence data. We present a framework for efficient datamining frequentepisodes which goes beyond previous work in a number ofways. First, wepresent a language for specifying episodes of interest.Second, wedescribe anovel data structure, called the {it sequential patterntree}, whichcaptures the relationships specified in the patternslanguage in a verycompact manner. Third, we show how this data structure canbe used toby a standard bottom-up frequent episode mining algorithm todramaticallyreduce its computational cost. We present the results of apreliminaryevaluationof the proposed techniques.