Establishment of equivalence between two degraded document images

With the advent of information technology, large volumes of hardcopy documents are being scanned and stored as document images. Owing to the age of the source document, the quality of the ink, and repeated photocopying of the same source, the resulting document images are degraded in quality. Degraded document images obtained from different sources are stored in different places, depending on the purposes for which the images are required. This leads to the storage of multiple copies of the same document image with varying degrees of degradation.

Establishing equivalence between two degraded document images using an Optical Character Recognizer (OCR) is not feasible, because OCR fails to recognize characters under degraded conditions. In this paper, a novel approach is proposed to establish equivalence between two degraded document images based on layout and content structure. Using projection profiles, the number of components and the positions at which they occur in the two document images are compared to establish layout equivalence. The components at the paragraph level are then compared using foreground density and an entropy quantifier to establish content structure equivalence. The efficacy of the proposed model is tested on a variety of degraded document images.
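
The two-stage comparison described above can be illustrated with a minimal Python sketch. It is not the implementation evaluated in this paper, and it rests on several assumptions: the images are already binarized NumPy arrays with foreground pixels set to 1, components are taken to be runs of non-empty rows in the horizontal projection profile, the entropy quantifier is approximated by the Shannon entropy of each component's column profile, and the function names and the tolerances pos_tol and feat_tol are hypothetical.

```python
import numpy as np

def split_components(binary):
    """Return (start, end) row ranges of blocks separated by blank rows.

    Assumption: a component is a maximal run of rows whose horizontal
    projection profile (row-wise foreground count) is non-zero.
    """
    on = binary.sum(axis=1) > 0
    padded = np.concatenate(([False], on, [False]))
    change = np.diff(padded.astype(int))
    starts = np.where(change == 1)[0]
    ends = np.where(change == -1)[0]
    return list(zip(starts.tolist(), ends.tolist()))

def block_features(binary, span):
    """Foreground density and a stand-in entropy value for one component."""
    top, bottom = span
    block = binary[top:bottom]
    density = float(block.mean())
    # Assumption: Shannon entropy of the normalized column profile
    # stands in for the entropy quantifier named in the text.
    cols = block.sum(axis=0).astype(float)
    p = cols[cols > 0] / cols.sum()
    entropy = float(-(p * np.log2(p)).sum())
    return density, entropy

def equivalent(img_a, img_b, pos_tol=0.05, feat_tol=0.1):
    """Layout check first, then content structure check per component.

    pos_tol and feat_tol are hypothetical tolerances, not values
    taken from the paper.
    """
    comps_a = split_components(img_a)
    comps_b = split_components(img_b)
    # Layout equivalence: same number of components ...
    if len(comps_a) != len(comps_b):
        return False
    for span_a, span_b in zip(comps_a, comps_b):
        # ... occurring at similar relative vertical positions.
        pos_a = span_a[0] / img_a.shape[0]
        pos_b = span_b[0] / img_b.shape[0]
        if abs(pos_a - pos_b) > pos_tol:
            return False
        # Content structure equivalence: similar density and entropy.
        dens_a, ent_a = block_features(img_a, span_a)
        dens_b, ent_b = block_features(img_b, span_b)
        if abs(dens_a - dens_b) > feat_tol or abs(ent_a - ent_b) > feat_tol:
            return False
    return True
```

Calling equivalent(img_a, img_b) on two binarized scans returns True only when both the layout check and the per-component content check pass. Comparing layout first means a mismatch in component count rejects a pair before any density or entropy values are computed.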

