A holistic approach to duplicate publication and plagiarism detection using probabilistic ontologies