An improved semantic plagiarism detection scheme based on chi-squared automatic interaction detection

Bibliographic Details
Main Authors: Osman, Ahmed Hamza, Salim, Naomie
Format: Conference or Workshop Item
Published: 2013
Online Access: http://eprints.utm.my/50888/
Description
Summary: This paper introduces an improved semantic text plagiarism detection technique based on Chi-squared Automatic Interaction Detection (CHAID). The proposed technique analyses and compares texts based on the semantic allocation of each term inside a sentence. It also captures the underlying semantic meaning, in terms of the relationships between concepts, via Semantic Role Labeling (SRL), which offers significant advantages in generating the semantic arguments of each sentence. A main feature of the proposed method is the use of CHAID to vote on each generated argument in order to select the important ones. Only the arguments that CHAID selects as most important are used in the similarity calculation. Testing was done on the CS11 and PAN-PC-10 datasets. The results show that the proposed method improves on the SRL-based plagiarism detection method and achieves higher Recall, Precision and F-Measure than current plagiarism detection methods.
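The summary does not state how CHAID scores the SRL arguments or which similarity measure is applied afterwards. The Python sketch below illustrates the general idea under loudly stated assumptions. It is a simplified stand-in, not the authors' implementation. Each sentence is represented as a hypothetical `SRLFrame` mapping role labels (such as `ARG1` or `ARGM-TMP`) to sets of terms. Each role is then tested independently with a single Pearson chi-squared test of association between "this argument overlaps" and "the pair is plagiarized", and only the roles that pass are used in the similarity calculation. A real CHAID tree goes further, merging categories and splitting recursively. The names `select_roles`, `argument_similarity`, `overlap_threshold` and `critical`, and the use of Jaccard overlap, are all illustrative choices.

```python
# Hypothetical representation: SRL role label -> set of terms filling that role.
SRLFrame = dict[str, set[str]]


def jaccard(a: set[str], b: set[str]) -> float:
    """Term-overlap similarity of two argument fillers (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def chi_squared_2x2(table: list[list[int]]) -> float:
    """Pearson chi-squared statistic for a 2x2 contingency table (no Yates correction)."""
    total = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            if expected > 0:  # skip empty margins to avoid division by zero
                stat += (table[i][j] - expected) ** 2 / expected
    return stat


def select_roles(pairs, labels, overlap_threshold=0.5, critical=3.841):
    """Keep roles whose overlap indicator is associated with the plagiarism label.

    3.841 is the chi-squared critical value for df=1 at the 5% level (assumed cut-off).
    """
    roles = {r for s, t in pairs for r in s.keys() | t.keys()}
    selected = []
    for role in sorted(roles):
        table = [[0, 0], [0, 0]]  # rows: overlap yes/no; columns: plagiarized yes/no
        for (s, t), label in zip(pairs, labels):
            overlap = jaccard(s.get(role, set()), t.get(role, set())) >= overlap_threshold
            table[0 if overlap else 1][0 if label else 1] += 1
        if chi_squared_2x2(table) >= critical:
            selected.append(role)
    return selected


def argument_similarity(s: SRLFrame, t: SRLFrame, roles: list[str]) -> float:
    """Mean Jaccard similarity over the selected roles present in either sentence."""
    scores = [jaccard(s.get(r, set()), t.get(r, set())) for r in roles if r in s or r in t]
    return sum(scores) / len(scores) if scores else 0.0


# Toy data (invented for illustration): ARG1 separates the classes, ARGM-TMP does not.
plag = [({"ARG1": {w}, "ARGM-TMP": {"yesterday"}}, {"ARG1": {w}, "ARGM-TMP": {"yesterday"}})
        for w in ("data", "model", "results")]
clean = [({"ARG1": {w}, "ARGM-TMP": {"yesterday"}}, {"ARG1": {"cats"}, "ARGM-TMP": {"yesterday"}})
         for w in ("data", "model", "results")]
pairs, labels = plag + clean, [True] * 3 + [False] * 3

roles = select_roles(pairs, labels)
print(roles)                                   # ['ARG1']
print(argument_similarity(*plag[0], roles))    # 1.0
print(argument_similarity(*clean[0], roles))   # 0.0
```

In this toy data, `ARGM-TMP` is identical in every pair, so it carries no information about the label and is discarded. This mirrors the summary's point that only the most important arguments should enter the similarity score.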