Date: Friday, June 29, 2007 (Heisei 19), 14:00-16:00
Venue: Lecture Room 3 (West Building 2, 3F)
Speaker: Prof. Jean-François Boulicaut
INSA-Lyon, LIRIS CNRS UMR5205
Villeurbanne, France
Title: Condensed representations for frequent sets: theory and applications
Abstract:
Solving inductive queries which have to return complete
collections of patterns satisfying a given predicate has been
studied extensively over the last few years. For instance, the problem of
frequent set mining from huge boolean matrices has given rise to
many efficient solvers. Frequent sets are indeed useful for
many data mining tasks, including the popular association rule
mining task but also feature construction, association-based
classification, clustering, etc. The research in this area has
been boosted by the fascinating concept of condensed representations
w.r.t. frequency queries. Such representations can be used to
support the discovery of every frequent set and its support
without looking back at the data. Furthermore, in many cases,
the condensed representations are useful in themselves (e.g.,
to directly provide association rules with minimal left-hand side or
maximal associations between sets of properties and sets of objects).
Interestingly, the size of condensed representations can be several
orders of magnitude smaller than the size of frequent set collections.
Most of the proposals concern exact representations (e.g., closed sets,
generators and free sets, non-derivable itemsets) while it is also possible
to consider approximate ones (e.g., delta-free itemsets). In other
words, it can be useful to accept a bounded approximation of the
computed support values in exchange for lower computational complexity.
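
As a rough illustration of the idea of an exact condensed representation,
the following minimal Python sketch uses closed sets on a tiny dataset. It
is not taken from the seminar material: the toy transactions, the
threshold of 2, and all function names are assumptions made for this
example, and the brute-force enumeration is only meant for very small
inputs. It relies on the standard property that the support of a frequent
set equals the largest support among the closed frequent sets containing it.

```python
from itertools import combinations

# Hypothetical toy data: each transaction is a set of items.
TRANSACTIONS = [
    {"a", "b", "c"},
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"d"},
]


def support(itemset, transactions):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)


def frequent_sets(transactions, min_support):
    """Brute-force enumeration of all non-empty frequent sets."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            s = support(candidate, transactions)
            if s >= min_support:
                result[candidate] = s
    return result


def closed_sets(frequent):
    """Keep the frequent sets with no proper superset of equal support."""
    return {
        x: s
        for x, s in frequent.items()
        if not any(x < y and s == sy for y, sy in frequent.items())
    }


def support_from_closed(itemset, closed):
    """Recover a support from the closed sets only, without the data.

    Returns None when no closed frequent set contains itemset, which
    means that itemset is not frequent.
    """
    candidates = [s for c, s in closed.items() if itemset <= c]
    return max(candidates) if candidates else None


frequent = frequent_sets(TRANSACTIONS, min_support=2)
closed = closed_sets(frequent)
```

On this toy dataset, `closed` holds fewer sets than `frequent`, yet
`support_from_closed` returns the same support as the data for every
frequent set. Real miners avoid enumerating all candidate sets as this
sketch does.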
In this seminar, we will survey the core concepts used in recent
work on condensed representations for frequent sets. Targeted
applications in molecular biology (gene expression data analysis)
will support our discussion of the added value for practitioners.
An introduction to the topics covered in this seminar can be
found in:
Toon Calders, Christophe Rigotti, Jean-François Boulicaut:
A survey on condensed representations for frequent sets.
Constraint-based mining and Inductive Databases.
Jean-François Boulicaut, Luc de Raedt, and Heikki Mannila, eds.
Springer-Verlag LNCS 3848, pp. 64-80, 2005.