Turkish OCR on mobile and scanned document images

Optical character recognition (OCR) systems are widely used to convert documents into digital form. Many commercial and open source OCR systems are available, but no benchmark for Turkish OCR exists. In this work, we first prepared two publicly available datasets for Turkish OCR: one of scanned document images and one of document images captured with mobile cameras.

Then, we evaluated the Turkish OCR performance of three popular open source OCR systems (Tesseract, CuneiForm, GOCR) on the datasets. Tesseract outperformed the other two on both datasets.
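The summary above does not say which metric was used to compare the systems. As an illustration only, the sketch below assumes a character error rate (CER): the Levenshtein edit distance between the ground-truth text and the OCR output, divided by the length of the ground truth. The function names `edit_distance` and `character_error_rate` are hypothetical and do not come from the original work.

```python
import unicodedata


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r from ref
                            curr[j - 1] + 1,      # insert h into ref
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """CER = edit distance / reference length (assumed metric, see lead-in)."""
    # Normalize to NFC so that precomposed and decomposed forms of Turkish
    # letters such as 'ş' or 'ğ' compare as the same character.
    ref = unicodedata.normalize("NFC", ref)
    hyp = unicodedata.normalize("NFC", hyp)
    if not ref:
        raise ValueError("reference text must not be empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Hypothetical example: an OCR engine that drops all Turkish diacritics.
    truth = "çığ düştü"
    output = "cig dustu"
    print(f"CER = {character_error_rate(truth, output):.3f}")
```

A metric of this kind treats Turkish-specific letters (ç, ğ, ı, ö, ş, ü) as distinct characters, so an engine that replaces them with their ASCII look-alikes is penalized for each one.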
