
dc.contributor.author: Malcolm, J.
dc.contributor.author: Lane, P.C.R.
dc.contributor.editor: Stein, Benno
dc.contributor.editor: Rosso, Paolo
dc.date.accessioned: 2009-09-30T13:30:39Z
dc.date.available: 2009-09-30T13:30:39Z
dc.date.issued: 2009
dc.identifier.citation: Malcolm, J & Lane, P C R 2009, Tackling the PAN’09 External Plagiarism Detection Corpus with a Desktop Plagiarism Detector. in B Stein & P Rosso (eds), Procs of the SEPLN'09 Workshop on Uncovering Plagiarism, Authorship and Social Software Misuse. pp. 29-33.
dc.identifier.other: dspace: 2299/3911
dc.identifier.uri: http://hdl.handle.net/2299/3911
dc.description.abstract: Ferret is a fast and effective tool for detecting similarities in a group of files. Applying it to the PAN’09 corpus required modifications to meet the requirements of the competition, mainly to deal with the very large number of files, the large size of some of them, and to automate some of the decisions that would normally be made by a human operator. Ferret was able to detect numerous files in the development corpus that contain substantial similarities not marked as plagiarism, but it also identified quite a lot of pairs where random similarities masked actual plagiarism. An improved metric is therefore indicated if the “plagiarised” or “not plagiarised” decision is to be automated. [en]
dc.format.extent: 182304
dc.language.iso: eng
dc.relation.ispartof: Procs of the SEPLN'09 Workshop on Uncovering Plagiarism, Authorship and Social Software Misuse
dc.subject: plagiarism
dc.subject: ferret
dc.subject: text analysis
dc.subject: trigrams
dc.title: Tackling the PAN’09 External Plagiarism Detection Corpus with a Desktop Plagiarism Detector [en]
dc.contributor.institution: School of Computer Science
dc.contributor.institution: Science & Technology Research Institute
rioxxterms.type: Other
herts.preservation.rarelyaccessed: true

