title: Coderivative Document Recognition
author: Stijn Johannes de Reede
published in: 2008
appeared as: Master of Science thesis
Man-machine interaction group
Delft University of Technology


Knowledge management in large enterprises currently depends heavily on the active participation of employees. If an employee is unable or unwilling to share their knowledge, knowledge management fails. To counter this, the documents of all employees can be collected automatically and stored in a knowledge management application.
However, since every document changes during its lifecycle, many versions of each document are created, and thus collected and stored. A query put to a knowledge management application would then produce a result list polluted by these many versions, which reduces the accessibility and usability of the documents in the application. We address this problem by introducing the Cayman system. This software system comprises an implementation of a new algorithm that recognizes different versions of a document, or coderivative documents. It also includes a prototype application that collects documents and provides a way to fully integrate our algorithm with an existing, well-known knowledge management application.
We compare our algorithm with six other well-known algorithms using a real-life dataset. The algorithms are evaluated with several graphical methods and, most importantly, with one quantitative method. This enables a solid comparison of their performance. Our experiment shows that the newly introduced algorithm surpasses every other algorithm except one. Surprisingly, that one is the simplest baseline algorithm, which all algorithms should outperform.
Although our algorithm's performance is not yet optimal, coderivative document recognition performs well enough to be of practical use. The Cayman system is put to a practical test in a professional environment, where it succeeds in collecting documents, recognizing coderivatives, and making them accessible and reusable for employees.

