Comparing Different Text Similarity Methods

Bao, J., Lyon, C., Lane, P.C.R., Ji, W. and Malcolm, J. (2007) Comparing Different Text Similarity Methods. [Report]

This paper reports experiments on a corpus of news articles from the Financial Times, comparing different text similarity models. The Ferret system, which uses a method based solely on lexical similarities, is applied first; methods based on semantic similarities are then investigated. Different criteria for selecting feature strings are used, for instance with and without synonyms obtained from WordNet, or with noun phrases extracted for comparison. The results indicate that synonyms, rather than lexical strings, are important for finding similar texts. Hypernyms and noun phrases also contribute to the identification of text similarity, though they are not better than synonyms. However, precision is a problem for the semantic similarity methods because too many irrelevant texts are retrieved.
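
The contrast between lexical and synonym-based matching described above can be illustrated with a minimal sketch. It is not the report's implementation: the use of word trigrams with a Jaccard overlap score is an assumption made for illustration, the small SYNONYMS table is a hypothetical stand-in for WordNet lookups, and all function names are invented here. The precision helper shows the measure that the abstract identifies as the weak point of the semantic methods.

```python
# Illustrative sketch only; feature choice, scoring, and names are assumptions.

def trigrams(text):
    """Return the set of overlapping word trigrams in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + 3]) for i in range(len(words) - 2)}

def jaccard(a, b):
    """Set overlap: shared features divided by all distinct features."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical toy synonym table standing in for WordNet synonym lookups.
SYNONYMS = {
    "purchase": "buy",
    "acquire": "buy",
    "firm": "company",
    "shares": "stock",
}

def normalise(text):
    """Map each word to a canonical synonym so that paraphrases align."""
    return " ".join(SYNONYMS.get(w, w) for w in text.lower().split())

def lexical_similarity(a, b):
    """Similarity on the surface word strings only."""
    return jaccard(trigrams(a), trigrams(b))

def synonym_similarity(a, b):
    """Similarity after replacing words with their canonical synonyms."""
    return jaccard(trigrams(normalise(a)), trigrams(normalise(b)))

def precision(retrieved, relevant):
    """Fraction of retrieved texts that are actually relevant."""
    retrieved = set(retrieved)
    if not retrieved:
        return 0.0
    return len(retrieved & set(relevant)) / len(retrieved)

a = "the firm will purchase the shares"
b = "the company will buy the stock"
print(lexical_similarity(a, b))   # no shared surface trigrams
print(synonym_similarity(a, b))   # identical after synonym mapping
print(precision(["d1", "d2", "d3", "d4"], ["d1", "d2"]))
```

In this toy case the two sentences share no surface trigrams but match fully once synonyms are mapped, which mirrors the abstract's finding; the same loosening of the match is also what lets irrelevant texts through and lowers precision.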

Item Type	Report
Date Deposited	15 May 2025 15:58
Last Modified	25 Feb 2026 00:13
