
Developing Tess2Speech Trainer

(g) Characters that have similar capital and small letters, such as handwritten 'A', 'C', 'J', 'K', 'M', 'N', 'O', 'P', 'S', 'U', 'V', 'W', 'X', 'Y', and 'Z'.

The workaround for this is to use the set_unicharset_properties.exe tool and to create the wordlist-dawg, bigram-dawg, and freq-dawg files when training manually. Another workaround is to increase the frequency of these commonly mis-classified characters in the training images.
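The dawg files are produced with Tesseract's command-line training tools. A minimal sketch of the commands involved, expressed as Java argument lists suitable for ProcessBuilder — the tool names follow the Tesseract 3.x training documentation, while the "hwr" language code and the file paths are purely illustrative assumptions:

```java
import java.util.List;

public class DawgCommands {
    // Build the command lines for the manual-training workaround.
    // Tool names are from the Tesseract 3.x training docs; "hwr" and the
    // file names are hypothetical placeholders for the user's language data.
    static List<List<String>> build(String lang) {
        return List.of(
            // word-dawg from the word list
            List.of("wordlist2dawg", lang + ".wordlist",
                    lang + ".word-dawg", lang + ".unicharset"),
            // freq-dawg from the most frequent words
            List.of("wordlist2dawg", lang + ".frequent-words",
                    lang + ".freq-dawg", lang + ".unicharset"),
            // set character properties on the unicharset
            List.of("set_unicharset_properties",
                    "-U", lang + ".unicharset",
                    "-O", lang + ".unicharset",
                    "--script_dir=langdata")
        );
    }
}
```

Each inner list can be handed to `new ProcessBuilder(cmd).start()`; Tess2Speech Trainer would run these on the desktop, where the training tools are available.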

4. Although the average character and word accuracies that I obtained are 99.18% and 97.58% respectively, there is still no guarantee that the trained data achieves those accuracies for ALL handwritings, since only 6 handwriting samples were used for the training images. The solution for this is to create an automated Tesseract trainer with which users can easily train and personalize Tesseract for their own handwriting. With this, anyone can train Tesseract, and if proper training is done for the user's handwriting, this will fulfill the objective of reaching at least 80% accuracy for all handwritings.

The training results using this application will be the same as those of the manual training method. The only disadvantage of using this application is that the user can only edit the box files. I provided default values for the dawg files, config file, and other optional files, since producing them would be too complicated for the user.

The default values are based on my initial experimental trained data that yielded the best results.

B..2 Problems Encountered in Developing Tess2Speech Trainer

1. Tesseract's training tools cannot run on a phone. As a workaround, I made Tess2Speech Trainer a desktop application that is bundled with the Tess2Speech mobile application.

2. Box editors already exist; the problem is that I needed to integrate one into my Tess2Speech Trainer application. To do this, I used jTessBoxEditor [32], an open-source box editor by VietOCR. Since it is open-source, I was able to edit its source code and integrate it into my application.
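The box files that the user edits through jTessBoxEditor follow Tesseract's plain-text format: one glyph per line, followed by its bounding box and page number, with coordinates measured from the bottom-left of the page. A minimal parser sketch for one such line (field names are mine; the format itself is Tesseract's):

```java
public class BoxLine {
    final String glyph;
    final int left, bottom, right, top, page;

    BoxLine(String glyph, int left, int bottom, int right, int top, int page) {
        this.glyph = glyph; this.left = left; this.bottom = bottom;
        this.right = right; this.top = top; this.page = page;
    }

    // Parse one line of a Tesseract .box file:
    // "<glyph> <left> <bottom> <right> <top> <page>"
    static BoxLine parse(String line) {
        String[] f = line.trim().split("\\s+");
        return new BoxLine(f[0],
                Integer.parseInt(f[1]), Integer.parseInt(f[2]),
                Integer.parseInt(f[3]), Integer.parseInt(f[4]),
                Integer.parseInt(f[5]));
    }
}
```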

C. Tess2Speech Mobile Application

C..1 Tess2Speech Results

Since Tess2Speech uses Tesseract as its OCR engine, I can use the .traineddata files produced by Tesseract's training on desktop computers using Tess2Speech Trainer. Because of this, I managed to create an Android application that fulfills the objectives I specified. The user can upload an image that contains handwriting (or computer-printed text), or write the handwritten text itself on the mobile screen through a canvas, and convert it to text or speech. The user can also personalize Tess2Speech by training on their own handwriting using Tess2Speech Trainer and use the output .traineddata files in the application.

The user can also download other .traineddata files from Tesseract’s home page, through updates, or by downloading from a host site.

The user can also save the converted text into a .txt or .pdf file, and the converted speech into a .wav file. Since there are many other possible functionalities, I tried to maximize the capability of Tess2Speech by adding features such as PDF-to-Image and vice versa, Ebook-to-Speech, and PDF-to-Speech.

C..2 Problems Encountered in Tess2Speech Application

1. Tesseract-OCR [5] is programmed in C++, which means it is not compatible with Java, the programming language I used. As a solution, I used tess-two [35], which contains tools for compiling Tesseract and Leptonica for use on the Android platform. It provides a Java API for accessing the natively-compiled Tesseract and Leptonica APIs.

2. Text-to-Speech systems on Android support only up to 4,000 characters at a time. As a solution, if a paragraph is more than 4,000 characters long, I chopped it into strings of at most 4,000 characters and saved them into an ArrayList. I then used this ArrayList as input, since the Text-to-Speech system accepts a list of strings. As a consequence, when saving a paragraph of more than 4,000 characters into a .wav file, the number of .wav files created will equal the number of elements in the ArrayList.
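The chunking step above can be sketched as a plain helper; the 4,000-character limit is the Android TTS constraint described in the text, and the class name is mine:

```java
import java.util.ArrayList;
import java.util.List;

public class TtsChunker {
    // Android's Text-to-Speech engines reject utterances longer than
    // roughly 4000 characters, so long paragraphs are split first.
    static final int MAX_TTS_LENGTH = 4000;

    // Split text into pieces of at most MAX_TTS_LENGTH characters,
    // in order, so their concatenation equals the original text.
    static List<String> chunk(String text) {
        List<String> pieces = new ArrayList<>();
        for (int i = 0; i < text.length(); i += MAX_TTS_LENGTH) {
            pieces.add(text.substring(i, Math.min(text.length(), i + MAX_TTS_LENGTH)));
        }
        return pieces;
    }
}
```

The size of the returned list is exactly the number of .wav files produced when synthesizing each piece separately, matching the consequence described above.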

3. Tesseract does not support PDF as an input, and Android cannot display PDF files for API levels below 21. As a solution, I used the PDFViewer library by Joan Zapata [38] in order to view PDFs on APIs below 21. I also examined the library and found a workaround for using a PDF as Tesseract input: I first converted the PDF to images using VuDroid [39], which is included in the PDFViewer library, before feeding them to Tesseract.

4. Since Android stylus functionalities like those in the Galaxy Note are proprietary, I needed to create my own canvas to support handwriting written on the mobile screen. As a workaround, I managed to create a canvas by imitating how MS Paint works, although writing is not as smooth and versatile as on the Galaxy Note. The performance of the canvas also depends on the hardware (screen capability).

5. Tess2Speech is very hardware-dependent. The quality of the image depends on the phone's camera (unless the image was scanned and then sent to the phone). Since images taken with a low-quality camera are usually noisy, it is very hard to get an image that Tess2Speech can read. As a solution, I implemented the capability in Tess2Speech to perform image pre-processing before converting the image to text or speech.

Figure 41: Image Pre-processing. (a) Raw image; (b) image processed only by Tesseract's Otsu binarization; (c) image processed by grayscaling, brightness and contrast adjustment, and Tesseract's Otsu binarization.
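The grayscaling and brightness/contrast adjustment used in panel (c) boil down to per-pixel arithmetic. A minimal sketch, assuming ARGB-packed pixels and the standard BT.601 luminance weights; the actual Tess2Speech implementation is not shown in the text and may differ:

```java
public class PreProcess {
    // Luminance grayscale of one ARGB pixel (ITU-R BT.601 weights).
    static int gray(int argb) {
        int r = (argb >> 16) & 0xFF;
        int g = (argb >> 8) & 0xFF;
        int b = argb & 0xFF;
        return (int) Math.round(0.299 * r + 0.587 * g + 0.114 * b);
    }

    // Brightness/contrast adjustment: out = clamp(contrast * in + brightness),
    // clamped to the valid 0..255 range.
    static int adjust(int gray, double contrast, int brightness) {
        int v = (int) Math.round(contrast * gray + brightness);
        return Math.max(0, Math.min(255, v));
    }
}
```

Raising contrast and brightness this way before Tesseract's own Otsu binarization helps separate faint handwriting from a noisy background, which is the effect Figure 41(c) illustrates.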

6. If the image is too large, processing can take a very long time. Large images with large text can also confuse Tesseract, since it will treat the letters as background and try to look for text inside the letters. As a solution, I implemented the capability to rescale the image to smaller dimensions before converting it to text or speech.
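The rescaling step reduces to a pure dimension calculation that preserves the aspect ratio; the maximum-side threshold below is an illustrative parameter of my own, not a value from the text:

```java
public class ImageRescaler {
    // Compute new (width, height) so the longest side fits within maxSide,
    // preserving the aspect ratio. Images already small enough are untouched.
    static int[] fitWithin(int width, int height, int maxSide) {
        int longest = Math.max(width, height);
        if (longest <= maxSide) {
            return new int[] { width, height };
        }
        double scale = (double) maxSide / longest;
        return new int[] {
            (int) Math.round(width * scale),
            (int) Math.round(height * scale)
        };
    }
}
```

On Android, the resulting dimensions would be fed to a bitmap-scaling call before the image is handed to Tesseract.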

7. Since de-skewing is hard to implement, I implemented an alternative that allows the user to manually rotate an image. I also implemented a crop functionality that allows the user to remove unnecessary parts of the image that could produce garbage text.
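Manual rotation in 90° steps needs no interpolation; it is just a pixel remap. A minimal sketch over a flat ARGB pixel array (how Tess2Speech performs the rotation internally is not specified in the text):

```java
public class Rotate {
    // Rotate a width×height pixel array 90° clockwise.
    // The result has dimensions height×width.
    static int[] rotate90(int[] pixels, int width, int height) {
        int[] out = new int[pixels.length];
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                // Source (x, y) maps to (height - 1 - y, x) in the result,
                // whose row stride is the new width, i.e. the old height.
                out[x * height + (height - 1 - y)] = pixels[y * width + x];
            }
        }
        return out;
    }
}
```

Calling it repeatedly gives 180° and 270° rotations, which is enough for the manual-rotation workaround described above.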

8. The trained data for handwritten text yields low accuracy when used to recognize computer-typed text. As a solution, since Tesseract can use multiple trained data files at the same time, the user can choose any combination of .traineddata files, including 'eng.traineddata', in the Tesseract Language settings of Tess2Speech.
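Tesseract accepts several languages in one initialization by joining their codes with '+' (e.g. "eng+fra" loads both eng.traineddata and fra.traineddata), which is how the combination chosen in the Language settings is passed in. A minimal sketch; the "hwr" handwriting traineddata name is a hypothetical example:

```java
import java.util.List;

public class LangSelector {
    // Build the language string Tesseract expects when several
    // .traineddata files should be loaded together, e.g. "hwr+eng".
    static String combine(List<String> languages) {
        return String.join("+", languages);
    }
}
```

The resulting string would be passed as the language argument when initializing the OCR engine, so the handwriting model and 'eng.traineddata' can be active simultaneously.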
