An efficient line segmentation approach for handwritten Bangla document image

Text line segmentation plays a vital role in the overall performance of a document recognition system. Segmentation methods for offline handwritten Bangla documents are rarely found in the literature. Moreover, certain peculiarities of handwritten Bangla script, such as the widespread occurrence of ascenders and descenders and characters that appear only as an ascender or descender, pose unique difficulties for this task. Connected components that span several successive text lines are also common in unconstrained handwritten Bangla documents. In this article, we propose a novel and efficient approach for text line segmentation. We first smudge the input document image to blur out the white spaces between words while preserving the gaps between consecutive lines.
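The smudging idea can be illustrated with a minimal sketch. The abstract does not specify the smudging operator, so the horizontal box blur below is an assumption chosen for illustration; the function name `smudge_horizontally` and the `width` parameter are hypothetical. Because the blur acts only along rows, it fills gaps between words on the same line and leaves blank rows between lines untouched.

```python
import numpy as np

def smudge_horizontally(ink, width):
    """Box-blur each row of a binary ink mask (1 = ink, 0 = paper).

    Assumed stand-in for the smudging step: the blur is purely
    horizontal, so blank rows between text lines stay blank.
    """
    kernel = np.ones(width) / width
    # Convolve every row independently; mode="same" keeps the image width.
    return np.apply_along_axis(
        lambda row: np.convolve(row, kernel, mode="same"),
        1,
        ink.astype(float),
    )

# Two text rows, each holding two "words" separated by a gap.
page = np.zeros((5, 12))
page[1, 0:4] = 1
page[1, 8:12] = 1
page[3, 0:4] = 1
page[3, 8:12] = 1
smudged = smudge_horizontally(page, width=9)
```

With a window wider than the word gap, the gap pixels in rows 1 and 3 receive non-zero ink, while row 2, the gap between the two lines, remains entirely white.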

Next, we obtain an initial segmentation by shredding the image along the whitest pixels between consecutive smudged lines. Multi-line connected components are handled by thinning each component and then finding its most probable point of separation. Combining this with the initial segmentation gives the final output. The proposed approach has been evaluated on the Bangla portion of the ICDAR 2013 Handwriting Segmentation Contest dataset. The segmentation results show the efficiency of the proposed approach.
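As a rough illustration of the initial segmentation step, the sketch below picks one cut row between each pair of consecutive text bands in a smudged image. It is a simplification under stated assumptions, not the authors' shredding procedure: it uses a whole-row ink profile, so it assumes roughly horizontal lines, and the names `find_cut_rows` and `ink_threshold` are hypothetical.

```python
import numpy as np

def find_cut_rows(smudged, ink_threshold=0.0):
    """Return one cut row between each pair of consecutive text bands.

    A text band is a run of rows whose total ink exceeds ink_threshold.
    The cut is placed at the whitest row (lowest total ink) in the gap.
    """
    profile = smudged.sum(axis=1)
    is_text = profile > ink_threshold
    cuts = []
    prev_band_end = None  # index of the first row after the previous band
    row, n = 0, len(profile)
    while row < n:
        if is_text[row]:
            start = row
            while row < n and is_text[row]:
                row += 1
            if prev_band_end is not None:
                gap = profile[prev_band_end:start]
                cuts.append(prev_band_end + int(np.argmin(gap)))
            prev_band_end = row
        else:
            row += 1
    return cuts

# Two text bands (rows 1-2 and rows 5-6) separated by blank rows 3-4.
demo = np.zeros((9, 6))
demo[1:3, :] = 1.0
demo[5:7, :] = 1.0
cut_rows = find_cut_rows(demo)
```

A single global cut row per gap cannot separate components that touch two lines, which is why the approach described above treats multi-line connected components separately.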
