Evaluating Text Comparison Mechanisms For Authorship Authenticity

A large number of resources are available on the internet, and people habitually use them without properly citing the original source or crediting the author. Content taken directly from previously published sources without attribution is called plagiarized text. Text comparison is needed to detect unauthorized or illegal use of views, ideas and publications, so a suitable technique is required to measure the similarity between two documents. Many text matching mechanisms exist, such as Levenshtein’s edit distance, the cosine similarity measure, the Jaccard similarity coefficient, n-grams, Hamming distance, the SCAM algorithm, fingerprinting and substring matching. However, these techniques have disadvantages such as:

  • Some techniques operate only on syntax, whereas others are sensitive to semantics.
  • Many techniques fail when computational resources are limited, and their processing takes a large amount of time.

The techniques chosen for evaluation are primitive, moderate and advanced respectively. The aim here is to enhance these algorithms in terms of similarity index and processing time, and to provide graphical comparison reports.
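
For illustration, three of the measures named above (Levenshtein’s edit distance, the Jaccard similarity coefficient and the cosine similarity measure) can be sketched in a few lines of Python. This is a minimal sketch and not the implementation evaluated in this work: the function names, the whitespace tokenization and the lowercasing step are assumptions made only for the example.

```python
import math
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (row-by-row dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def tokens(text: str) -> list:
    """Assumed tokenization: lowercase, split on whitespace."""
    return text.lower().split()


def jaccard(a: str, b: str) -> float:
    """Size of the shared vocabulary divided by the size of the combined one."""
    set_a, set_b = set(tokens(a)), set(tokens(b))
    if not set_a and not set_b:
        return 1.0  # two empty documents are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def cosine(a: str, b: str) -> float:
    """Cosine of the angle between the two word-count vectors."""
    vec_a, vec_b = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(vec_a[w] * vec_b[w] for w in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document shares nothing measurable
    return dot / (norm_a * norm_b)
```

All three functions here are purely syntactic: they compare characters or word counts and do not detect paraphrase. Edit distance is also quadratic in document length, which is one way the processing-time concern raised above shows up in practice.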

