A text annotation method based on semantic sequences
Abstract
This paper presents a text annotation method based on semantic sequences for labeling both a single document and a cluster of documents. The basic idea of the semantic sequence approach is to find locally frequent meanings, identified with the help of an ontology such as WordNet, and to use them as the labels of a document. The ontology is also used to measure the semantic similarity between labels, which in turn indicates the similarity between documents. Further, a text clustering method based on four natural rules is introduced to cluster documents and label each cluster. Unlike partitioning clustering methods, this method does not require a pre-defined number of clusters, and unlike hierarchical clustering methods, it avoids the need to select appropriate levels.
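
As a rough illustration of the labeling idea, the following Python sketch maps words to meanings, keeps the meanings that recur within a short window of text, and compares labels through their distance in a concept hierarchy. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy ontology (`WORD_TO_MEANING`, `PARENT`) stands in for WordNet, the `window` and `min_count` parameters are one possible reading of "locally frequent", and the path-based similarity is a common generic measure rather than the measure the paper defines.

```python
from collections import Counter

# Hypothetical toy ontology standing in for WordNet (illustration only):
# each word maps to one meaning, and each meaning has at most one parent.
WORD_TO_MEANING = {
    "car": "automobile", "auto": "automobile", "truck": "truck",
    "dog": "dog", "puppy": "dog", "cat": "cat",
}
PARENT = {
    "automobile": "vehicle", "truck": "vehicle",
    "dog": "animal", "cat": "animal",
    "vehicle": "entity", "animal": "entity",
}


def locally_frequent_meanings(tokens, window=5, min_count=2):
    """Return meanings that occur at least min_count times inside some window."""
    meanings = [WORD_TO_MEANING.get(t) for t in tokens]
    labels = set()
    for start in range(max(1, len(meanings) - window + 1)):
        in_window = meanings[start:start + window]
        counts = Counter(m for m in in_window if m is not None)
        labels.update(m for m, c in counts.items() if c >= min_count)
    return labels


def ancestors(meaning):
    """Path from a meaning up to the root, starting with the meaning itself."""
    path = [meaning]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def meaning_similarity(a, b):
    """Assumed path-based similarity: 1 / (1 + edges between a and b)."""
    path_a, path_b = ancestors(a), ancestors(b)
    for steps_a, node in enumerate(path_a):
        if node in path_b:
            # First shared node is the lowest common ancestor.
            return 1.0 / (1 + steps_a + path_b.index(node))
    return 0.0


def label_similarity(labels_a, labels_b):
    """Average, over labels in A, of the best match in B (0.0 if either is empty)."""
    if not labels_a or not labels_b:
        return 0.0
    best = [max(meaning_similarity(a, b) for b in labels_b) for a in labels_a]
    return sum(best) / len(best)


doc_a = "the car and the auto drove past".split()
doc_b = "a truck hit another truck today".split()
labels_a = locally_frequent_meanings(doc_a)
labels_b = locally_frequent_meanings(doc_b)
print(labels_a, labels_b, label_similarity(labels_a, labels_b))
```

In this sketch, "car" and "auto" share one meaning, so that meaning becomes a label even though neither word repeats; this is the benefit of counting meanings rather than surface words. The clustering step is not sketched, because the four rules are not stated in the abstract.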