Show simple item record

dc.contributor.author: Malcolm, J.
dc.contributor.author: Lane, P.C.R.
dc.date.accessioned: 2008-11-04T13:02:52Z
dc.date.available: 2008-11-04T13:02:52Z
dc.date.issued: 2008
dc.identifier.citation: Malcolm, J & Lane, P C R 2008, 'Efficient Search for Plagiarism on the Web', Proceedings (International Conference on Technology, Communication and Education), vol. 2008, pp. 206-211.
dc.identifier.issn: 1997-7697
dc.identifier.other: PURE: 92712
dc.identifier.other: PURE UUID: 9e858f47-bf0a-4510-825a-88d25c524ed9
dc.identifier.other: dspace: 2299/2549
dc.identifier.uri: http://hdl.handle.net/2299/2549
dc.description: http://www.i-tce.org/
dc.description.abstract: Understanding the characteristics of written English allows Internet search for the source of a document to be carried out efficiently. There is a Zipfian distribution of word frequencies in natural language, with some words common and many words rare. If we take a group of three words, the rarity of most of these triples is extreme. This can be exploited to detect web pages similar to a given target document: while a Google search for some triples from the target may return many hits, other triples will only be found in a few documents on the Internet. These documents may well be similar to the target, and are certainly worth examining more closely. Initial experiments show that this approach is very promising, and it is being implemented in a software tool called WebFerret. [en]
dc.language.iso: eng
dc.relation.ispartof: Proceedings (International Conference on Technology, Communication and Education)
dc.subject: plagiarism
dc.subject: search engines
dc.subject: ferret
dc.subject: natural language processing
dc.title: Efficient Search for Plagiarism on the Web [en]
dc.contributor.institution: School of Computer Science
dc.contributor.institution: Science & Technology Research Institute
dc.description.status: Peer reviewed
rioxxterms.type: Journal Article/Review
herts.preservation.rarelyaccessed: true
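
The abstract's idea of searching on rare word triples can be illustrated with a minimal sketch. This is not the WebFerret implementation, which the record does not describe. It assumes a small in-memory corpus (a hypothetical dict of document names to text) standing in for the web, and it counts how many corpus documents contain a triple as a stand-in for search-engine hit counts. The names triples, rank_candidates and max_hits are illustrative only.

```python
import re
from collections import Counter


def triples(text):
    """Return the set of consecutive three-word sequences in text."""
    words = re.findall(r"[a-z']+", text.lower())
    return {tuple(words[i:i + 3]) for i in range(len(words) - 2)}


def rank_candidates(target, corpus, max_hits=2):
    """Rank corpus documents by how many rare target triples they contain.

    corpus:   dict of document name -> text (assumed stand-in for the web).
    max_hits: a triple is treated as rare if at most this many corpus
              documents contain it (assumed stand-in for search hit counts).
    """
    doc_triples = {name: triples(text) for name, text in corpus.items()}
    # Number of corpus documents in which each triple occurs.
    hits = Counter(t for ts in doc_triples.values() for t in ts)
    # Keep only target triples that occur somewhere, but in few documents.
    rare = {t for t in triples(target) if 0 < hits[t] <= max_hits}
    scores = {name: len(rare & ts) for name, ts in doc_triples.items()}
    # Highest score first; ties broken by document name.
    return sorted(
        ((name, score) for name, score in scores.items() if score > 0),
        key=lambda pair: (-pair[1], pair[0]),
    )
```

Documents that share several rare triples with the target come first in the result and are the ones worth examining more closely. Lowering max_hits makes the filter stricter, so only triples found in very few documents count toward the score.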

