Date and time: Friday, June 29, 2007 (Heisei 19), 14:00-16:00
Venue: Lecture Room 3 (West Building 2, 3rd floor)
Speaker: Prof. Jean-François Boulicaut
INSA-Lyon, LIRIS CNRS UMR5205, Villeurbanne, France
Title: Condensed representations for frequent sets: theory and applications
Abstract: Solving inductive queries that must return the complete collection of patterns satisfying a given predicate has been studied extensively over the last few years. For instance, the problem of frequent set mining from huge boolean matrices has given rise to many efficient solvers. Frequent sets are indeed useful for many data mining tasks, including the popular association rule mining task but also feature construction, association-based classification, clustering, etc. Research in this area has been boosted by the fascinating concept of condensed representations w.r.t. frequency queries. Such representations can be used to support the discovery of every frequent set and its support without looking back at the data. Furthermore, in many cases, the condensed representations are useful in themselves (e.g., to directly provide association rules with minimal left-hand sides, or maximal associations between sets of properties and sets of objects). Interestingly, the size of a condensed representation can be several orders of magnitude smaller than the size of the corresponding frequent set collection. Most of the proposals concern exact representations (e.g., closed sets, generators and free sets, non-derivable itemsets), while it is also possible to consider approximate ones (e.g., delta-free itemsets). In other words, it can be useful to trade a bounded approximation of the computed support values for lower computational complexity.
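As a rough illustration of the claim that a condensed representation yields every frequent set and its support without looking back at the data, the following minimal Python sketch uses closed sets on a toy dataset. The dataset, the threshold, and all function names (support, frequent_sets, closed_sets, support_from_closed) are assumptions made for this example, and the brute-force enumeration is not one of the efficient solvers mentioned above.

```python
from itertools import combinations

def support(itemset, transactions):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)

def frequent_sets(transactions, min_support):
    """Brute-force enumeration of all frequent sets with their supports."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            count = support(candidate, transactions)
            if count >= min_support:
                result[candidate] = count
    return result

def closed_sets(frequent):
    """Keep the frequent sets that have no proper superset with equal support."""
    return {s: c for s, c in frequent.items()
            if not any(s < t and c == ct for t, ct in frequent.items())}

def support_from_closed(itemset, closed):
    """Support of itemset derived from the closed sets only (no data access).

    It is the largest support among the closed supersets of itemset;
    None means the set is not frequent.
    """
    counts = [c for s, c in closed.items() if itemset <= s]
    return max(counts) if counts else None

# Hypothetical toy data: five transactions over the items a, b, c, d.
transactions = [frozenset("abc"), frozenset("abc"), frozenset("ab"),
                frozenset("ac"), frozenset("d")]
frequent = frequent_sets(transactions, min_support=2)
closed = closed_sets(frequent)

# Every frequent set and its support is regenerated from the closed sets alone.
recovered = {s: support_from_closed(s, closed) for s in frequent}
print(len(frequent), len(closed), recovered == frequent)
```

On this toy data, the 7 frequent sets and their supports are recovered from only 4 closed sets; on real data the gap can be far larger, as the abstract notes.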
In this seminar, we will survey the core concepts used in recent work on condensed representations for frequent sets. Targeted applications in molecular biology (gene expression data analysis) will support our discussion of the added value for practitioners.
An introduction to the topics covered in this seminar can be found in:
Toon Calders, Christophe Rigotti, Jean-François Boulicaut: A survey on condensed representations for frequent sets. Constraint-Based Mining and Inductive Databases. Jean-François Boulicaut, Luc De Raedt, and Heikki Mannila, eds. Springer-Verlag LNCS 3848, pp. 64-80, 2005.