Copyright © 2013 IJEIR, All right reserved
Line Segmentation of Javanese Image of Manuscripts in
Javanese Scripts
Anastasia Rita Widiarti
Email: rita_widiarti@usd.ac.id
Agus Harjoko
Email: aharjoko@ugm.ac.id
Marsono
Email: marsono@ugm.ac.id
Sri Hartati
Email: shartati@ugm.ac.id
Abstract – Segmentation is an important stage in the automatic transliteration of a manuscript image. One commonly used approach to obtaining the image of each script from a manuscript image is to perform line segmentation first and then to segment the script images from each resulting line.
Line segmentation of images of manuscripts in Javanese scripts is often difficult because parts of scripts that belong to different lines fall within the same line area of the image, and scripts in different lines may even overlap. This paper proposes using a moving average to smooth the vertical projection curve of the manuscript image as an initial guide for line separation. Next, to separate parts of scripts that belong to different lines but lie in the same line area or in overlapping lines, the average height of the objects and the standard deviation of that height are used to estimate the height of a line image. The concept of pixel connectivity within an object is also used to separate script images in different but overlapping lines.
In a test on 4 images of manuscripts in Javanese scripts from different writers with different writing styles, the average percentage of correctness of line segmentation was 93.19% with a standard deviation of 4.55%. Line segmentation errors were mostly caused by a sandangan of a script in the previous line which joined the next line, and by scripts which were cut imperfectly. Nevertheless, the percentage of correctness suggests that the combination of image processing techniques used for line segmentation was relatively good.
Keywords – Connected Component, Javanese Manuscript Segmentation, Moving Average, Projection Profile.
I. INTRODUCTION
Segmentation of script images plays a major role in recognizing a document image. The success of the segmentation process influences the subsequent processes, especially script recognition. This is even more so when the manuscript to be recognized is handwritten, which is a challenge because manuscripts are written by many different writers and the writing style may follow a different gagrak.
Segmentation of script images can be done by first performing line segmentation of the manuscript image and then, once the line images are obtained, segmenting the script images from each line image. However, line segmentation of a manuscript image is often difficult because of fluctuating lines, overlapping components from different lines, and irregularity in the geometry of the lines, such as their height and width [1].
II. RELEVANT WORKS
Surinta and Chamchong [7] studied the segmentation of historical images handwritten on palm leaves. The process starts with a binarization stage using the Otsu method to separate objects from the background, continues with a line segmentation stage that applies the profile projection method, and ends with a segmentation stage that obtains the scripts using the histogram of the segmented images. They obtained an average percentage of correctness for scripts of 82.5%. Jindal et al. [8], working on horizontal segmentation of lines of overlapping printed Indian scripts, obtained segmentation accuracy levels of 96.45% to 99.79%.
Widiarti [9] studied the use of profile projection to obtain images of Javanese scripts from printed manuscripts in Javanese scripts, with a segmentation success rate of 86.78%. Profile projection works well in that case because printed manuscripts in Javanese scripts have clear and even uniform distances between lines and between scripts. In this paper, we propose new strategies for line segmentation of Javanese manuscripts. We use vertical projection for line image segmentation of manuscripts in Javanese scripts, combined with a popular curve-smoothing technique, the moving average method, and further combined with statistical information on the height of objects in the image and the concept of pixel connectivity within the same object.
III. STUDY OF THE CHARACTERISTICS AND RULES OF WRITING JAVANESE SCRIPT
Javanese scripts consist of two major script groups: basic Javanese scripts and derivative Javanese scripts. Basic Javanese scripts are the main Javanese scripts to which no punctuation marks or sandangan have been added; these main Javanese scripts are therefore called legena or wuda scripts, which means bare scripts. Fig. 1 shows the 20 legena Javanese scripts.
Fig. 1. Nglegena Javanese scripts [9]
Most Javanese scripts in use do not consist only of legena scripts, but also use various additions called sandangan. There are many kinds of sandangan, e.g. the sandangan swara for the vowel i, called wulu, which is written above the corresponding legena script, or the sandangan swara for the vowel u, called suku, which is written beneath the legena script.
From a study of where Javanese scripts and punctuation marks or sandangan are placed, information on the writing areas of Javanese scripts is obtained, as shown in Fig. 2.
Fig. 2. Writing area of Javanese script
Area 1 is the upper area or upper zone. This zone holds sandangan swara, i.e. wulu and pepet, and sandangan panyigeg, i.e. layar and cecak. Area 2 is the main area or main zone, so called because this is where the basic Javanese scripts, the legena scripts, are written. Area 3 is the lower area or lower zone. The lower zone is the place for suku, the lower part of taling, the cakra mandaswara script, cakra keret, as well as pasangan placed below. Fig. 3 shows examples of Javanese scripts and where they are placed.
Fig. 3. Writing zones of Javanese scripts
Fig. 4 shows the practice of Javanese script writing. Javanese scripts are written from left to right, and one Javanese script can take the form of:
1) Legena script only, i.e. the sa script in Fig. 4(a).
2) Legena script with sandangan placed in the upper zone, i.e. the se script in Fig. 4(b).
3) Legena script with sandangan placed in the lower zone, i.e. the su script in Fig. 4(c).
4) Legena script with sandangan in the upper and lower zones at once, i.e. the sur script in Fig. 4(d).
Fig. 4. Samples of script forms with combinations of the placement of the additions.
From a study of how Javanese scripts are written in manuscripts, problems related to line segmentation of manuscripts in Javanese scripts were found, as shown in Fig. 5, because:
1) There are scripts on different lines which overlap. Fig. 5(a) shows that part of a script image in the upper line touches part of a script image in the second line. The dashed line marks the line change.
2) The distance between scripts is not clear, so it is impossible to use vertical projection alone. Fig. 5(b) shows that part of a script image in the upper line is in the same position as a script image in the next line.
Fig. 5. Samples of problems in line segmentation of manuscripts in Javanese scripts.
IV. SOLUTION PROPOSAL
Manuscript segmentation began with finding the lines of scripts which formed the manuscript, then finding the scripts which formed each line. Once a binary script image was free of noise, line segmentation was performed.
The general working method for obtaining script image segmentation from manuscripts in Javanese scripts, within the limits set by the characteristics of the Javanese script writing method, is shown in Fig. 6.
To perform a crude line segmentation, the first tool used was vertical projection. Because some manuscript lines overlap, the result of the vertical projection had to be refined to obtain a curve that clearly reflected the distance between phases. The phases in the curve gave a clue to the location of the beginning and end of a line, because one phase corresponded to one line of the manuscript image.
Fig. 6. Framework of line segmentation stages
Moving average is a concept often used to refine curves. Equation (1) was adopted in this study to refine curves.
Because manuscripts in Javanese scripts often have no clear distance between lines, cutting the line images with the refined vertical projection still cut through several scripts. Every script that really belonged to the line range found was clearly in its line, but often there were also parts of scripts from the line below. To solve this, the concept of connectivity between pixels was applied so that the parts of the scripts could be discovered.
An abnormality appeared when the height of a part of a script, hereinafter called a very big object, exceeded the average height of standard scripts plus twice the standard deviation of that average height. If there was an abnormality in the object height, the object had to be two different objects from different lines which were connected, as shown in Fig. 5(a). In principle, the object might be made of parts of scripts in different lines, so it had to be cut or separated.
To calculate the height of every object which might make up a script, the distance between the highest position and
the lowest position of every object was calculated. After the height of every object was known, the average object height and the standard deviation of that height in the manuscript could be calculated.
The main principle for cutting two scripts that were in different line positions but connected was to cut the scripts exactly at the position given by the average height plus twice the standard deviation of the average height, measured from the beginning of the line. This convention was used because Javanese scripts generally consist of legena scripts with sandangan above and below them, as seen in Fig. 4(d). The average object height was taken as the average height of legena scripts, while the height of a sandangan was taken as the absolute value of the standard deviation of the average height.
To solve the problem of parts of a script from the next line, usually a sandangan in the upper zone of the script right below, appearing in the present line position, the first step was to look for the highest position of the sandangan. If the highest position found was at a distance greater than the average object height, the object had to be a sandangan in the upper zone of a script in the next line. This convention was based on the belief that a sandangan in the lower zone of a script is never at a distance greater than the average height of the main script. Therefore, the object could not be a sandangan in the lower zone of a script in the present line.
V. RESULT AND DISCUSSIONS
The study started by selecting the manuscripts which would become the testing data, with the requirement that the manuscripts should be written by different people, each with a different writing style. Table 1 shows a description of the data source and information on the data chosen as test data in this study, and Fig. 7 shows a sample image of a page of the script listed as number 3 in Table 1.
Table 1: Description of test data
No.   Catalogue Number / Book Information
1     W74 Pakem Ringgit Pur… lampahan) PB C 6 Javanese Language Javanese Script Roll 40 no. 6 [12]
2     S160 Serat Babad Jum… Sultan Kabanaran SK 1036 Javanese Language Javanese Script Rol 111 [12]
… it was discovered that there was … imaged, although in several lines …
scripts which overlap. Using the information on the characteristics, the first step was to project the data of the manuscript image vertically. From the result of the vertical projection, a curve as seen in Fig. 8 was made.
The peaks of the 16 curves in Fig. 8 gave information on the number of line images in the corresponding data, but the curve also showed that the boundaries between lines were not clear because of the high variation in the slopes of the curve. The vagueness of the distance between lines created a problem in line segmentation, because it made determining the cutting position difficult. One method of obtaining a clear gap between lines was to refine the curve with a moving average. The main principle of the moving average method in curve refining is to remap each curve value using the values around it. In this way, data that vary over a short range were expected to become more uniform, or more like the data around them.
Fig. 7. A section of an image of a manuscript in Javanese scripts, collection of Sonobudoyo museum with catalogue number PB A.57 175 Javanese Language Javanese Script Macapat Rol 153 no. 9 [12]
Fig. 8. Curve of the result of vertical projection
Fig. 9 shows a curve which was refined with slot size 5, meaning that each value is recomputed from the two values to the right and to the left of the point together with the point itself, and the refining was performed 6 (six) times.
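The refining step described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the authors' implementation: it assumes that "slot size 5" means a centred window of five values that are averaged, that the window is truncated at both ends of the profile, and that the input is a list of black-pixel counts per image row. The names `moving_average` and `smooth_profile`, as well as the sample profile, are hypothetical.

```python
def moving_average(values, window=5):
    """Smooth a 1-D profile with a centred moving average.

    Assumption: the window is truncated at both ends of the profile,
    so the output has the same length as the input.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        neighbourhood = values[lo:hi]
        smoothed.append(sum(neighbourhood) / len(neighbourhood))
    return smoothed


def smooth_profile(values, window=5, passes=6):
    """Apply the moving average repeatedly (6 passes, as in the text)."""
    result = list(values)
    for _ in range(passes):
        result = moving_average(result, window)
    return result


# Hypothetical projection profile: black-pixel counts per image row.
profile = [0, 2, 9, 14, 11, 3, 1, 0, 4, 12, 15, 10, 2, 0]
print(smooth_profile(profile))
```

Repeating a short window several times, rather than using one wide window, keeps each pass local while still flattening small fluctuations; whether the original study averaged or summed the five values is not recoverable from the text, and the shape of the curve is the same in both cases.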
Fig. 9. Curve from refining
The refined curve often did not drop to 0 (zero), the value which would mean that there was a gap between the lines, because in the manuscript there were parts of scripts in the upper or lower line that lay in the same range, overlapped, or were connected. One way to solve this was to determine which row had a significant data shift and to use this row as the reference for separating the lines. In this study, the line shift was determined by marking the row at which the result of subtracting the refined value from the initial value changed from negative to positive or vice versa. The line shifting information obtained from this test became the initial clue that there was a line change.
In addition, from the study of the height of every connected object in the manuscript image, the average object height and its standard deviation were determined. Table 2 shows a summary of the average object height and its standard deviation for the 4 (four) images of manuscripts in Javanese scripts used.
Table 2: Statistical data of object height on test image
No data   Total Object   Average Object Height   Standard Deviation of Average Object Height
1         204            62.505                  43.573
2         308            36.325                  24.402
3         442            44.267                  20.684
4         148            28.905                  19.206
Fig. 10. A section of manuscript …
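As a rough illustration of two steps described in this section, the sketch below marks the rows where the difference between the initial and the refined projection changes sign, and computes the height limit "average height plus twice the standard deviation" from the statistics in Table 2. It is a sketch under assumptions rather than the authors' code: the function names, the sample profiles and the sample object heights are hypothetical, and only the averages and standard deviations are taken from Table 2.

```python
# Average object height and its standard deviation per test image (Table 2).
TABLE_2 = {
    1: (62.505, 43.573),
    2: (36.325, 24.402),
    3: (44.267, 20.684),
    4: (28.905, 19.206),
}


def sign_change_rows(initial, refined):
    """Return row indices where (initial - refined) changes sign.

    Assumption: rows where the difference is exactly zero are skipped,
    so a change is reported at the first row with the opposite sign.
    """
    rows = []
    previous = 0
    for i, (a, b) in enumerate(zip(initial, refined)):
        diff = a - b
        if diff == 0:
            continue
        current = 1 if diff > 0 else -1
        if previous != 0 and current != previous:
            rows.append(i)
        previous = current
    return rows


def height_limit(mean_height, std_height, factor=2.0):
    """Height above which an object is treated as abnormally tall."""
    return mean_height + factor * std_height


def abnormal_objects(heights, mean_height, std_height):
    """Return indices of objects taller than the height limit."""
    limit = height_limit(mean_height, std_height)
    return [i for i, h in enumerate(heights) if h > limit]


# Hypothetical initial and refined projection values for five rows.
print(sign_change_rows([0, 5, 10, 5, 0], [2, 4, 6, 4, 2]))

# Height limit for each test image in Table 2.
for data_no, (mean_h, std_h) in TABLE_2.items():
    print(data_no, round(height_limit(mean_h, std_h), 3))

# Hypothetical object heights (in pixels) checked against test image 3.
sample_heights = [40, 52, 95, 31, 120]
print(abnormal_objects(sample_heights, *TABLE_2[3]))
```

Objects flagged by `abnormal_objects` correspond to the "very big objects" of Section IV, which are candidates for being two connected scripts from different lines.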
Fig. 10 shows a section of the script image in Fig. 7, and Fig. 11 shows a sample of the segmentation result for line images from the manuscript image. In Fig. 10, dashed lines show the limit of each line. The figure shows that, from the second line onward, there are parts of script images in different lines which share the same area or overlap.
Fig. 11.1 to Fig. 11.9, except Fig. 11.2a, show the segmentation results from the proposed model. Fig. 11.1 shows that the segmentation result for line 1 is very clean, meaning that the image was segmented well without connection to other lines. This was because there was a clear gap in the original image between the first line and the second line. Line segmentation of the next line, i.e. the second line, shows overlapping data. Fig. 11.2a, with dashed lines, shows that parts of the scripts in the third line enter the area of the second line. By using the average object height and its standard deviation, line separation or cutting could be done.
Fig. 11. The result of segmentation of the image in Fig. 10
By visual observation of each segmented line, the number of scripts which should be in a line and were still in that line could be counted. Table 3 shows the number of scripts in each line of the original image and in the segmentation result shown in Fig. 11.
Using a simple method of calculating the percentage of correctness, the percentage of correctness of line segmentation for the manuscript image in Fig. 11 was found as follows:
\[
\%\ \text{of correctness} = \frac{\text{Total script output}}{\text{Total script actual}} \times 100 = \frac{151}{159} \times 100 = 94.97 \qquad (2)
\]
The same experiment was applied to three other images of manuscripts in Javanese scripts. After the result of line
image segmentation on the four manuscript images used as testing material was obtained, the percentage of correctness of line image segmentation on all test data could be calculated. Table 4 shows a summary of the line image segmentation results for all manuscripts used as test data.
The calculation of the percentage of correctness on the four different manuscript images used in this study gives an average percentage of correctness of 93.19% with a standard deviation of 4.55%, as seen in Table 4. With an average percentage of correctness above 90%, it was concluded that the model suggested for line segmentation of images of manuscripts in Javanese scripts in this study was good.
Table 3: Observation data on the number of original scripts and output of segmentation from Figure 11.
Line No   Number of Script (Original)   Number of Script (Output)   Information
1         17                            17                          All scripts which should be in the line are in place
2         20                            20                          All scripts which should be in the line are in place
3         16                            16                          All scripts which should be in the line are in place
4         18                            18                          All scripts which should be in the line are in place
5         18                            18                          All scripts which should be in the line are in place
6         16                            11                          There were 3 sandangan which joined the previous line, and 2 scripts which were cut imperfectly
7         19                            18                          There was 1 which joined the previous line
8         19                            19                          All scripts which should be in the line are in place
9         16                            14                          There were 2 sandangan which joined the previous line
Total scripts   159                     151
Table 4: Summary of percentage of correctness of line segmentation of the four manuscript images
No Data                             Percentage of Correctness
1                                   87.36
2                                   98.10
3                                   92.31
4                                   94.97
Average percentage of correctness   93.19
Standard deviation                  4.55
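The figures in Tables 3 and 4 can be checked with a short calculation. The sketch below recomputes the percentage of correctness of Equation (2) from the per-line counts in Table 3, and the average and standard deviation of the four scores in Table 4. It assumes that the reported standard deviation is the sample standard deviation (dividing by n - 1), which is the variant that reproduces the reported 4.55; the function name `percentage_of_correctness` is hypothetical.

```python
from statistics import mean, stdev

# Scripts per line in the original image and in the output (Table 3).
original_counts = [17, 20, 16, 18, 18, 16, 19, 19, 16]
output_counts = [17, 20, 16, 18, 18, 11, 18, 19, 14]


def percentage_of_correctness(total_output, total_actual):
    """Equation (2): share of scripts that ended up in the right line."""
    return total_output / total_actual * 100


page_score = percentage_of_correctness(sum(output_counts), sum(original_counts))
print(f"{page_score:.2f}")  # 151 / 159 * 100, rounded to two decimals

# Percentage of correctness of the four test images (Table 4).
scores = [87.36, 98.10, 92.31, 94.97]
print(mean(scores))   # average percentage of correctness
print(stdev(scores))  # sample standard deviation (n - 1 in the denominator)
```

The mean of the four scores is 93.185, which Table 4 reports as 93.19, and the sample standard deviation is close to 4.55.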
VI. CONCLUSION AND FUTURE WORK
From the experimental test of the proposed segmentation model, it can be concluded that line segmentation of images of manuscripts in Javanese scripts can be done using the vertical projection of the manuscript image as a guide, combined with the moving average refining method and with the average object height and its standard deviation. The segmentation result may contain several overlapping images, but by using the label information of every connected object, the images of lines in Javanese scripts connected to it can be found. However, this study could be continued to solve the problems of separating or cutting overlapping scripts, and to determine whether the result of the line segmentation can be used well in script segmentation later.
REFERENCES
[1] A. Nicolaou, and B. Gatos. (2009). Handwritten Text Line Segmentation by Shredding Text into its Lines. 10th International Conference on Document Analysis and Recognition, IEEE. pp. 626-630. Available: http://www.cvc.uab.es/icdar2009/papers/3725a626.pdf.
[2] S. Palakollu, R. Dhir, and R. Rani. (2011). A New Technique for Line Segmentation of Handwritten Hindi Text. Special Issue of International Journal of Computer Applications (0975–8887) on Electronics, Information and Communication Engineering – ICEICE. Available: http://research.ijcaonline.org/iceice/number5/iceice033.pdf.
[3] G.S. Lehal, and C. Singh. (No Year). A Technique for Segmentation of Gurmukhi Text. Available: http://advancedcentrepunjabi.org/pdf/A%20technique%20for%20segmentation%20of%20gurmukhi%20text.pdf.
[4] N. Tripathy and U. Pal. (2006, Dec.). Handwriting segmentation of unconstrained Oriya text. Sadhana. Volume(31), 755–769. Available: http://www.ias.ac.in/sadhana/Pdf2006Dec/755.pdf.
[5] C. Weliwitage, A.L. Harvey, and A.B. Jennings. (2005). Handwritten Document Offline Text Line Segmentation. Proceedings of the Digital Imaging Computing: Techniques and Applications (DICTA 2005). Available: http://nguyendangbinh.org/Proceedings/DICTA/2005/data/27_cweliwitage_textsegment.pdf.
[6] F. Yin, and C. Liu. (2009). Handwritten Chinese text line segmentation by clustering with distance metric learning. Pattern Recognition. Volume(42), 3146-3157. Available: http://nlpr-web.ia.ac.cn/2009papers/gjkw/gk13.pdf.
[7] O. Surinta, and R. Chamchong. (2008). Image Segmentation of Historical Handwriting from Palm Leaf Manuscripts. Available: http://www.wbi.msu.ac.th/file/721/doc_57.pdf.
[8] M.K. Jindal, R.K. Sharma, and G.S. Lehal. (2007). Segmentation of Horizontally Overlapping Lines in Printed Indian Scripts. International Journal of Computational Intelligence Research. Volume (3), 277-286. Available: http://www.ijcir.info.
[9] A.R. Widiarti, “Segmentasi Citra Dokumen Teks Sastra Jawa Modern Mempergunakan Profil Proyeksi,” SIGMA Jurnal Sains dan Teknologi, vol. 10, 2007, pp. 167-176.
[10] Anonim. (No Year). Available: http://id.wikipedia.org/wiki/Aksara_JawaW.
[11] C.E. Efstathiou (No Year). Signal Smoothing Algorithms. Available: http://www.chem.uoa.gr/applets/appletsmooth/appl_smooth2.html.
[12] T.E. Behrend, Katalog Induk Naskah-naskah Nusantara Jilid I Museum Sonobudaya Yogyakarta. Jakarta: Djambatan, 1990.
AUTHOR’S PROFILE
Anastasia Rita Widiarti
received Master’s degree in Computer Science from Gadjah Mada University, Yogyakarta in 2006. Since 2000 she has been teaching at the Department of Informatics Engineering, at Sanata Dharma University in Yogyakarta. Her current research interests include Javanese manuscripts image analysis and pattern recognition.
Agus Harjoko
received the Ph.D. degree in computer science from the University of New Brunswick, Canada, in the field of image processing and computer vision. Since 1987 he has been teaching at the Gadjah Mada University in Yogyakarta.
Prof. Dr. Marsono
is lecturer in Department of Nusantara Literature, Faculty of Cultural Sciences, Gadjah Mada University in Indonesia.
Sri Hartati
received the Ph.D. … of New Brunswick … Intelligence. Since … Computer Science … Instrumentation … University in Yogyakarta.