A text annotation method based on semantic sequences

Bao, J.; Lyon, C.; Lane, P.C.R.

Abstract

This paper presents a text annotation method based on semantic sequences for labelling a document and a cluster of documents. The basic idea of the semantic sequence approach is to find locally frequent meanings, using an ontology such as WordNet, and to use them as the labels of a document. The ontology is also used to measure the semantic similarity of labels, which in turn indicates the similarity between documents. Further, a text clustering method based on four natural rules is introduced to cluster documents and label each cluster. This method does not require a predefined number of clusters, as partitioning clustering methods do, and it avoids the need to set appropriate levels, as hierarchical clustering methods require.
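
The idea of labelling a document by its locally frequent meanings can be illustrated with a minimal sketch. This is not the authors' algorithm: the tiny `TOY_ONTOLOGY` dictionary is a hypothetical stand-in for WordNet, the `window` and `min_count` parameters are assumptions, and a plain Jaccard overlap of label sets is swapped in for the ontology-based similarity measure described in the abstract.

```python
from collections import Counter

# Hypothetical toy ontology mapping a word to a more general meaning.
# It stands in for a real resource such as WordNet.
TOY_ONTOLOGY = {
    "cat": "animal",
    "dog": "animal",
    "horse": "animal",
    "car": "vehicle",
    "bus": "vehicle",
    "train": "vehicle",
}


def locally_frequent_meanings(tokens, window=5, min_count=2):
    """Return meanings seen at least min_count times within some window.

    The window size and threshold are illustrative assumptions.
    """
    meanings = [TOY_ONTOLOGY.get(t.lower()) for t in tokens]
    labels = set()
    # Slide a fixed-size window over the token positions.
    for start in range(max(1, len(meanings) - window + 1)):
        chunk = meanings[start:start + window]
        counts = Counter(m for m in chunk if m is not None)
        labels.update(m for m, c in counts.items() if c >= min_count)
    return labels


def label_similarity(labels_a, labels_b):
    """Jaccard overlap of two label sets (an assumed, simplified measure)."""
    if not labels_a and not labels_b:
        return 0.0
    return len(labels_a & labels_b) / len(labels_a | labels_b)


doc1 = "the cat chased the dog while a bus passed".split()
doc2 = "a horse and a dog stood near the train and the bus".split()

labels1 = locally_frequent_meanings(doc1)
labels2 = locally_frequent_meanings(doc2)
print(labels1, labels2, label_similarity(labels1, labels2))
```

In this toy setting the first document is labelled only with the meaning that recurs within a short span, while the second picks up two such meanings, so the two documents are judged partially similar through their shared label.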