Show simple item record

dc.contributor.author: Bao, J.
dc.contributor.author: Lyon, C.
dc.contributor.author: Lane, P.C.R.
dc.contributor.author: Ji, W.
dc.contributor.author: Malcolm, J.
dc.date.accessioned: 2008-03-06T17:31:55Z
dc.date.available: 2008-03-06T17:31:55Z
dc.date.issued: 2006
dc.identifier.citation: Bao, J, Lyon, C, Lane, P C R, Ji, W & Malcolm, J 2006, Copy detection in Chinese documents using the Ferret: a report on experiments. University of Hertfordshire.
dc.identifier.other: dspace: 2299/1773
dc.identifier.uri: http://hdl.handle.net/2299/1773
dc.description.abstract [en]: The Ferret copy detector has been used for some years on English texts to find plagiarism in large collections of students' coursework. This article reports on extending its application to Chinese, which differs from English in many respects: the sequence of characters that makes up a Chinese text does not have word boundaries marked, there is a vast Chinese alphabet, or number of different characters, and they are represented with multi-byte encoding. We discuss issues of representation, focus on the effectiveness of a sub-symbolic approach, and show how the Ferret can circumvent the classic problem of finding word boundaries with an automated system. Corpora of students' coursework from two Chinese universities have been collected, and we apply Ferret to investigate the detection of plagiarism. Our experiments show that Ferret can find both artificially constructed plagiarism and actually occurring, previously undetected plagiarism. We also investigate the parameters of the system, and report on typical optimum settings. Experiments reported in this article show that Ferret can work well on Chinese texts, and achieve consistent performance. The investigation into the representation of written Chinese is likely to be of use in other language processing tasks.
dc.format.extent: 201302
dc.language.iso: eng
dc.publisher: University of Hertfordshire
dc.title [en]: Copy detection in Chinese documents using the Ferret: a report on experiments
dc.contributor.institution: School of Computer Science
dc.contributor.institution: Science & Technology Research Institute
dc.contributor.institution: School of Engineering and Technology
rioxxterms.type: Other
herts.preservation.rarelyaccessed: true


