
dc.contributor.author: Bao, J.
dc.contributor.author: Lyon, C.
dc.contributor.author: Lane, P.C.R.
dc.contributor.author: Ji, W.
dc.contributor.author: Malcolm, J.
dc.date.accessioned: 2008-03-06T17:31:53Z
dc.date.available: 2008-03-06T17:31:53Z
dc.date.issued: 2007
dc.identifier.citation: Bao, J, Lyon, C, Lane, P C R, Ji, W & Malcolm, J 2007, Comparing Different Text Similarity Methods. UH Computer Science Technical Report, vol. 461, University of Hertfordshire.
dc.identifier.other: dspace: 2299/1772
dc.identifier.uri: http://hdl.handle.net/2299/1772
dc.description.abstract: This paper reports experiments on a corpus of news articles from the Financial Times, comparing different text similarity models. First, the Ferret system, which uses a method based solely on lexical similarities, is applied; then methods based on semantic similarities are investigated. Different feature string selection criteria are used, for instance with and without synonyms obtained from WordNet, or with noun phrases extracted for comparison. The results indicate that synonyms, rather than lexical strings, are important for finding similar texts. Hypernyms and noun phrases also contribute to the identification of text similarity, though they are not better than synonyms. However, precision is a problem for the semantic similarity methods because too many irrelevant texts are retrieved. [en]
dc.format.extent: 427769
dc.language.iso: eng
dc.publisher: University of Hertfordshire
dc.relation.ispartofseries: UH Computer Science Technical Report
dc.title: Comparing Different Text Similarity Methods [en]
dc.contributor.institution: School of Computer Science
dc.contributor.institution: Science & Technology Research Institute
rioxxterms.type: Other
herts.preservation.rarelyaccessed: true
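
The contrast the abstract draws between lexical and synonym-based matching can be illustrated with a small sketch. This is not the Ferret system or the report's implementation: the word n-gram size of 3, the Jaccard overlap measure, the helper names, and the toy synonym table are all assumptions made for illustration (the report obtains synonyms from WordNet).

```python
import re


def word_ngrams(text, n=3):
    """Return the set of lowercase word n-grams in text (n=3 is an assumption)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def normalise(text, synonyms):
    """Replace each word by its canonical synonym when the table lists one."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return " ".join(synonyms.get(w, w) for w in words)


# Hypothetical toy synonym table standing in for a WordNet lookup.
SYNONYMS = {"firm": "company", "earnings": "profits"}

doc_a = "The company reported higher profits this year"
doc_b = "The firm reported higher earnings this year"

# Purely lexical comparison: no word trigram is shared between the two texts.
lexical = jaccard(word_ngrams(doc_a), word_ngrams(doc_b))

# Synonym-aware comparison: both texts map to the same canonical wording.
semantic = jaccard(
    word_ngrams(normalise(doc_a, SYNONYMS)),
    word_ngrams(normalise(doc_b, SYNONYMS)),
)

print(f"lexical={lexical:.2f} semantic={semantic:.2f}")
```

In this toy pair the lexical score is zero while the synonym-aware score is maximal, which mirrors the abstract's point that synonyms help find similar texts. The same normalisation also makes unrelated texts look more alike, which is one way the precision problem noted in the abstract can arise.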

