Automatic classification of packaging cartons according to their contents is a common industrial requirement. In this paper we present an Optical Character Recognition (OCR) system that segments and recognizes the sparse dot matrix text printed on cartons in order to classify them by their contents. The proposed solution is robust to non-uniform background illumination, shadow artifacts, inclined text, and text degraded by missing dots, among other distortions. We propose an efficient segmentation technique based on simple morphological operations that exploits the discrete nature of dot matrix text to distinguish it from other printed information.
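The following is a minimal sketch of how morphology can exploit the discreteness of dot matrix print. It is written in pure Python and assumes an already binarized image given as lists of 0/1. It is not the paper's exact pipeline. The 4-connectivity, the square structuring element, and the parameters `max_dot_area` and `r` are illustrative assumptions. Large connected components (solid print, logos, barcodes) are discarded because a dot matrix character is made of small isolated dots. A dilation then merges neighbouring dots into one blob per character.

```python
from collections import deque


def _neighbours(y, x, h, w):
    # 4-connected neighbours that lie inside the image.
    for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
        if 0 <= ny < h and 0 <= nx < w:
            yield ny, nx


def label_components(img):
    """4-connected labelling of a binary image; returns (labels, pixel lists)."""
    h, w = len(img), len(img[0])
    labels = [[-1] * w for _ in range(h)]
    comps = []
    for y in range(h):
        for x in range(w):
            if img[y][x] and labels[y][x] < 0:
                cid = len(comps)
                labels[y][x] = cid
                queue, pixels = deque([(y, x)]), []
                while queue:  # breadth-first flood fill
                    cy, cx = queue.popleft()
                    pixels.append((cy, cx))
                    for ny, nx in _neighbours(cy, cx, h, w):
                        if img[ny][nx] and labels[ny][nx] < 0:
                            labels[ny][nx] = cid
                            queue.append((ny, nx))
                comps.append(pixels)
    return labels, comps


def dilate(img, r):
    """Binary dilation with a (2r+1) x (2r+1) square structuring element."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x]:
                for yy in range(max(0, y - r), min(h, y + r + 1)):
                    for xx in range(max(0, x - r), min(w, x + r + 1)):
                        out[yy][xx] = 1
    return out


def segment_dot_matrix(binary, max_dot_area=4, r=1):
    """Hypothetical helper: character boxes (top, left, bottom, right), left to right."""
    h, w = len(binary), len(binary[0])
    # 1. Keep only small blobs (dots); continuous strokes form large components.
    _, comps = label_components(binary)
    dots = [[0] * w for _ in range(h)]
    for pixels in comps:
        if len(pixels) <= max_dot_area:
            for y, x in pixels:
                dots[y][x] = 1
    # 2. Dilate so dots separated by at most 2*r empty pixels merge per character.
    labels, _ = label_components(dilate(dots, r))
    # 3. Bound each merged blob using only its original dot pixels.
    boxes = {}
    for y in range(h):
        for x in range(w):
            if dots[y][x]:
                cid = labels[y][x]
                t, lf, b, rt = boxes.get(cid, (y, x, y, x))
                boxes[cid] = (min(t, y), min(lf, x), max(b, y), max(rt, x))
    return sorted(boxes.values(), key=lambda box: box[1])


# Usage: two dot characters plus a solid bar that should be ignored.
img = [[0] * 20 for _ in range(7)]
for y, x in [(1, 1), (1, 3), (3, 1), (3, 3), (1, 8), (3, 8), (5, 8)]:
    img[y][x] = 1
for x in range(12, 20):
    img[6][x] = 1
print(segment_dot_matrix(img))  # [(1, 1, 3, 3), (1, 8, 5, 8)]
```

In practice `max_dot_area` and `r` would depend on the print resolution and dot pitch. Illumination correction, binarization, and deskewing of inclined text are assumed to happen before this step.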
Each dot matrix character can be uniquely characterized by its pattern of dots. We extract this pattern and feed it as a feature vector to a trained Support Vector Machine (SVM) classifier. Combining these unique patterns with the SVM classifier results in high character recognition accuracy, which in turn leads to efficient carton classification. Finally, we report statistics on the character recognition and carton classification results.
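A dot pattern could be turned into a fixed-length feature vector roughly as sketched below. This is an illustrative assumption, not the paper's stated encoding. It assumes a 5 x 7 dot grid, a known dot `pitch` in pixels, and deskewed text. Each dot centre is snapped to its nearest grid cell relative to the character's top-left corner, which yields a 35-element binary vector. A missing dot clears one entry instead of corrupting the whole vector.

```python
def dot_pattern(dot_centres, box, pitch, rows=7, cols=5):
    """Hypothetical encoder: snap dot centres onto a rows x cols occupancy grid.

    dot_centres: list of (y, x) centres of the dots of one character.
    box: (top, left, bottom, right) of the character; only top/left are used.
    pitch: assumed spacing between adjacent dot positions, in pixels.
    Returns a flat, row-major list of rows * cols values in {0, 1}.
    """
    top, left = box[0], box[1]
    grid = [0] * (rows * cols)
    for y, x in dot_centres:
        # Nearest grid cell, clamped to the grid in case of jitter.
        r = min(max(int((y - top) / pitch + 0.5), 0), rows - 1)
        c = min(max(int((x - left) / pitch + 0.5), 0), cols - 1)
        grid[r * cols + c] = 1
    return grid


# Usage: the letter "L" with a dot pitch of 2 px, top-left corner at (10, 20).
dots = [(10 + 2 * r, 20) for r in range(7)] + [(22, 20 + 2 * c) for c in range(1, 5)]
vec = dot_pattern(dots, (10, 20, 22, 28), pitch=2)
print(len(vec), sum(vec))  # 35 11
```

Vectors of this kind can be stacked into a training matrix for any multi-class SVM implementation, for example scikit-learn's `sklearn.svm.SVC` via `fit(X, y)` and `predict(X)`. The kernel and hyperparameters are not specified here and would need tuning.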