13. Discovering Characteristic Expressions from Literary Works: A New Text Analysis Method beyond N-Gram Statistics and KWIC

   Masayuki Takeda, Tetsuya Matsumoto, Tomoko Fukuda and Ichiro Nanri
Proc. Third International Conference on Discovery Science (DS2000), pp. 112-126, December 2000

【Abstract】
We attempt to extract characteristic expressions from literary works. That is, our problem is: given literary works by a particular writer as positive examples and works by another writer as negative examples, find expressions that appear frequently in the positive examples but not in the negative examples. This can be regarded as a special case of optimal pattern discovery from textual data in which only substring patterns are considered. One reasonable approach is to create a list of substrings arranged in descending order of their goodness and to have a human expert examine the top part of the list. Since Japanese texts have no word boundaries, a substring is often a fragment of a word or a phrase, so how to assist the human expert is a key to successful discovery. In this paper, we propose (1) restricting the list to prime substrings in order to remove redundancy, and (2) a way of browsing the neighborhood of a focused string as well as its context. Using this method, we report successful results on two pairs of anthologies of classical Japanese poems. We expect that the extracted expressions may lead to the discovery of overlooked aspects of individual poets.
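
The ranking step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the goodness measure, so the score used here (fraction of positive texts containing a substring minus the fraction of negative texts containing it) is an assumption, as are the function names and the `max_len` cutoff. The sketch also assumes both example lists are non-empty.

```python
from collections import Counter


def substrings(text, max_len):
    """Return the set of all substrings of `text` up to `max_len` characters."""
    return {
        text[i:i + n]
        for n in range(1, max_len + 1)
        for i in range(len(text) - n + 1)
    }


def document_frequency(texts, max_len):
    """Count, for each substring, how many texts contain it at least once."""
    counts = Counter()
    for text in texts:
        counts.update(substrings(text, max_len))
    return counts


def rank_substrings(positive, negative, max_len=4):
    """Rank substrings of the positive texts by an assumed goodness score.

    Assumed score (not taken from the paper):
    fraction of positive texts containing s - fraction of negative texts containing s.
    """
    pos_df = document_frequency(positive, max_len)
    neg_df = document_frequency(negative, max_len)
    scored = [
        (s, pos_df[s] / len(positive) - neg_df[s] / len(negative))
        for s in pos_df
    ]
    # Highest score first; ties broken by longer string, then alphabetically.
    scored.sort(key=lambda pair: (-pair[1], -len(pair[0]), pair[0]))
    return scored


# Toy data with made-up strings standing in for poems.
ranked = rank_substrings(["abcx", "abcy"], ["xyz", "bzz"], max_len=3)
for s, score in ranked[:5]:
    print(s, score)
```

On this toy input the top of the list is filled with fragments of the same expression (`abc`, `ab`, `bc`, `a`, `c`), all with the same score. This is the kind of redundancy the abstract proposes to remove by restricting the list to prime substrings; that restriction is not implemented here because the abstract does not define it.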