
Line segmentation of javanese image of manuscripts in Javanese scripts.


Academic year: 2017


Copyright © 2013 IJEIR, All right reserved

Line Segmentation of Javanese Image of Manuscripts in Javanese Scripts

Anastasia Rita Widiarti

Email: rita_widiarti@usd.ac.id

Agus Harjoko

Email: aharjoko@ugm.ac.id

Marsono

Email: marsono@ugm.ac.id

Sri Hartati

Email: shartati@ugm.ac.id

Abstract–Segmentation is an important stage in the automatic transliteration of a manuscript image. A common approach to extracting script images from a manuscript image is to perform line segmentation first and then segment the script images within each resulting line.

Line segmentation of images of manuscripts in Javanese scripts is often difficult because script images that should not be on the same line may fall within one line area of the image, and images in different lines may even overlap. This paper proposes using a moving average to smooth the curve of the vertical projection of an image of a manuscript in Javanese scripts as an initial guide for line separation. Next, to separate parts of scripts that belong to different lines but lie in the same or overlapping line areas, the average object height and its standard deviation are used to estimate the height of a line image. The concept of pixel connectivity within an object is also used to separate script images in different but overlapping lines.

In a test on 4 images of manuscripts in Javanese scripts from different writers with different writing styles, the average percentage of correctness of line segmentation was 93.19%, with a standard deviation of 4.55%. Line segmentation mistakes were mostly caused by the sandangan of a script in the previous line joining the next line, and by scripts being cut imperfectly. Nevertheless, the percentage of correctness suggests that the combination of image processing techniques used for line segmentation performs relatively well.

Keywords–Connected Component, Javanese Manuscript Segmentation, Moving Average, Projection Profile.

I. INTRODUCTION

Segmentation of script images plays a major role in efforts to recognize a document image. The success of the segmentation process influences the subsequent processes, especially script recognition, all the more so if the manuscript to be recognized is handwritten. This is a challenge because manuscripts are written by many different writers, and writing styles may follow different gagrak.

Segmentation of script images can be done by first performing line segmentation of the manuscript image and then segmenting the script images from each resulting line. However, line segmentation of a manuscript image is often difficult because of fluctuating lines, overlapping components, and irregular line geometry, such as the height and width of the lines [1].

II. RELEVANT WORKS

[...] Surinta and Chamchong [7] studied the segmentation of historical images handwritten on palm leaves. Their method starts with a binarization stage that uses the Otsu method to separate objects from the background, continues with a line segmentation stage that applies the profile projection method, and ends with a segmentation stage that extracts scripts using the histogram of the segmented images. They obtained an average percentage of correctness for scripts of 82.5%. Jindal et al. [8], working on horizontal segmentation of overlapping lines in printed Indian scripts, achieved segmentation accuracy levels of 96.45% to 99.79%.

Widiarti [9] studied the use of profile projection to obtain images of Javanese scripts from printed manuscripts in Javanese scripts, with a segmentation success rate of 86.78%. Profile projection is well suited to that case because printed manuscripts in Javanese scripts have clear and even uniform distances between lines and between scripts. In this paper, we propose a new strategy for line segmentation of Javanese manuscripts. We use vertical projection for line image segmentation of manuscripts in Javanese scripts, combined with a popular curve-smoothing technique, the moving average method, and further combined with statistical information on the height of objects in the image and with the concept of pixel connectivity within the same object.

III. STUDY OF THE CHARACTERISTICS AND RULES OF WRITING JAVANESE SCRIPT

Javanese scripts consist of two major script groups: basic Javanese scripts and derivative Javanese scripts. Basic Javanese scripts are the main Javanese scripts to which no punctuation marks, or sandangan, have been added; these main scripts are therefore called legena or wuda scripts, which means bare scripts. Fig. 1 shows the 20 legena Javanese scripts.

Fig.1. Nglegena Javanese scripts [9]

Most Javanese writing does not consist of legena scripts only but also uses various additions, the sandangan. There are many kinds of sandangan, e.g. the sandangan swara for the vowel i, called wulu, which is written above the corresponding legena script, or the sandangan swara for the vowel u, called suku, which is written beneath the legena script.

From a study of the placement areas of Javanese scripts and their punctuation marks, or sandangan, information on where Javanese scripts are placed is obtained, as shown in Fig. 2.

Fig.2. Writing area of Javanese script

Area 1 is the upper area, or upper zone. This zone holds sandangan swara, i.e. wulu and pepet, and sandangan panyigeg, i.e. layar and cecak. Area 2 is the main area, or main zone. It is called the main zone because this is where the basic Javanese scripts, the legena scripts, are placed. Area 3 is the lower area, or lower zone. The lower zone is the place for suku, the lower part of taling, the cakra mandaswara script, cakra keret, and pasangan placed below. Fig. 3 shows examples of Javanese scripts and where the scripts are placed.

Fig.3. Writing zones of Javanese scripts

Fig. 4 shows the practice of Javanese script writing. Javanese scripts are written from left to right, and one Javanese script can take the form of:
1) Legena script only, i.e. the sa script in Fig. 4(a).
2) Legena script with sandangan placed in the upper zone, i.e. the se script in Fig. 4(b).
3) Legena script with sandangan placed in the lower zone, i.e. the su script in Fig. 4(c).
4) Legena script with sandangan in the upper and lower zones at once, i.e. the sur script in Fig. 4(d).

Fig.4. Samples of script forms with combination of the placement of the additions.

A study of how Javanese scripts are written in manuscripts revealed problems for line segmentation of manuscripts in Javanese scripts, as shown in Fig. 5:
1) There are scripts on different lines which overlap. Fig. 5(a) shows that part of a script image in the upper line touches part of a script image in the second line. The dashed line marks the line change.
2) The distance between scripts is not clear, so it is impossible to use vertical projection alone. Fig. 5(b) shows that part of a script image in the upper line is in the same position as a script image in the next line.

Fig.5. Samples of problems in line segmentation of manuscripts in Javanese scripts.

IV. SOLUTION PROPOSAL

Manuscript segmentation began with finding the lines of scripts which formed the manuscript, then finding the scripts which formed the script lines. Once a binary script image was free of noise, line segmentation [...]

The general working method for obtaining script image segmentation from manuscripts in Javanese scripts, within the constraints of the characteristics of the Javanese script writing method, is shown in Fig. 6.

To perform a crude line segmentation, the first tool used was vertical projection. Because some manuscript lines overlap, the result of the vertical projection had to be refined to obtain a curve that clearly reflected the distance between phases. The phases in the curve gave a clue to the location of the beginning and end of a line, because 1 phase represented 1 line of the manuscript image.

Fig.6. Framework of line segmentation stages

Moving average is a concept often used to refine curves. Equation (1) was adopted in this study to [...]

Because manuscripts in Javanese scripts often have no clear distance between lines, cutting line images with the refined vertical projection still cut through several scripts. Every script that really belonged to the line range found was clearly in its line, but there were often parts of scripts from the line below as well. To solve this, the concept of connectivity between pixels was applied so that the parts of the scripts could be discovered.

An abnormality appeared when the height of a part of a script, hereinafter called a very big object, exceeded the average height of standard scripts plus twice the standard deviation of the average height. If there was such an abnormality in the object height, the object had to be two different objects from different lines which were connected, as shown in Fig. 5(a). In principle, the object might consist of parts of scripts in different lines, so it had to be cut or separated.

To calculate the height of every object which might make up a script, the distance between the highest position to
the lowest position of every object was calculated. After the height of every object was known, the average object height and the standard deviation of the average height in the manuscript could be calculated.

The main principle for cutting two scripts that lay in different lines but were connected was to cut them exactly at the position whose distance from the beginning of the line equals the average height plus twice the standard deviation of the average height. This convention was used because Javanese scripts generally consist of legena scripts with sandangan above and below them, as seen in Fig. 4(d). The average object height was taken as the average height of legena scripts, while the height of a sandangan was taken as the absolute value of the standard deviation of the average height.

To solve the problem of parts of a script from the next line appearing in the present line, usually sandangan in the upper zone of the script right below, the first step was to look for the highest position of the sandangan. If the highest position found lay at a distance greater than the average object height, the object had to be a sandangan in the upper zone of a script in the next line. This convention was based on the belief that the sandangan of a script in the lower zone does not lie at a distance greater than the average height of the main script. Therefore, the object could not be a sandangan in the lower zone of a script in the present line.

V. RESULT AND DISCUSSIONS

The study started by selecting the manuscripts that would become the test data, with the requirement that the manuscripts be written by different people, each with a different writing style. Table 1 describes the data sources and the data chosen as test data in this study, and Fig. 7 shows a sample image of a page of script number 3 in Table 1.

Table 1: description of test data
No. | Catalogue Number / Bo[...] | Information
1 | W74 Pakem Ringgit Pur[...] lampahan) PB C 6, Javanese Language, Javanese Script, Roll 40 no. 6 [12] | [...]
2 | S160 Serat Babad Jum[...] Sultan Kabanaran SK 1036, Javanese Language, Javanese Script, Roll 111 [12] | [...]

[...] it was discovered that there was [...] imaged, although in several lines [...]
scripts which overlap. Using the information on these characteristics, the first step was to vertically project the data of the manuscript image containing the scripts. From the result of the vertical projection, a curve as seen in Fig. 8 was made.

The 16 peaks of the curve in Fig. 8 indicated the number of line images in the corresponding data, but the curve also showed that the boundaries between lines were not clear because of the high variation in the slopes of the curve. The vagueness of the distance between lines created a problem for line segmentation, because it made the cutting position difficult to determine. One way to obtain a clear gap between lines was to refine the curve with a moving average. The main principle of the moving average method in curve refining is to remap each curve value using the values around it. In this way, data that vary over a short range were expected to become more uniform, or more like the data around them.

Fig. 7. A section of an image of a manuscript in Javanese scripts, collection of Sonobudoyo museum with catalogue number PB A.57 175 Javanese Language Javanese Script Macapat Rol 153 no. 9 [12]

Fig.8. Curve of the result of vertical projection

Fig. 9 shows a curve refined with a slot size of 5, which means that each new value is the total of the two values on each side of the spot and the spot itself; the refining was performed 6 (six) times.
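As an illustration of the projection and smoothing steps described above, the following Python sketch computes a projection profile by counting object pixels in each image row and then applies a moving average with a window of 5 values six times. The function names, the toy image, the division of the window total by the number of values, and the shortening of the window at both ends of the profile are assumptions made for this sketch; the text only states the slot size and the number of passes.

```python
def projection_profile(binary_image):
    """Count object pixels (value 1) in every row of a binary image."""
    return [sum(row) for row in binary_image]


def moving_average(values, size=5):
    """Smooth a profile once with a centered window of `size` values.

    Assumption: near the two ends the window is truncated to the
    values that exist, and the window total is divided by their count.
    """
    half = size // 2
    smoothed = []
    for i in range(len(values)):
        window = values[max(0, i - half):i + half + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


def refine(values, size=5, passes=6):
    """Apply the moving average repeatedly (6 passes, as in the text)."""
    for _ in range(passes):
        values = moving_average(values, size)
    return values


# Toy 6x4 binary image: two text lines separated by an empty row.
image = [
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 1],
]
profile = projection_profile(image)   # [2, 4, 1, 0, 3, 4]
refined = refine(profile)
```

Repeating a small window several times gives a progressively smoother curve while keeping each single pass simple; a real manuscript profile has hundreds of rows, so the effect of the truncated windows at the ends is negligible there.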

Fig.9. Curve from refining

The valleys of the refined curve often did not reach 0 (zero), the value that would indicate a gap between lines, because in the manuscript there were parts of scripts from the upper or lower line that lay in the same range, overlapped, or were connected. One way to solve this was to determine which rows showed a significant shift in the data and to use those rows as the reference for separating the lines. In this study, a line shift was marked at each row where the result of subtracting the refined value from the initial value changed from negative to positive or vice versa. The line shift information obtained in this way became the initial clue that a line change had occurred.

In addition, from the study of the height of every object in the manuscript image, the average height of connected objects and its standard deviation were determined. Table 2 summarizes the average object height and its standard deviation for the 4 (four) images of manuscripts in Javanese scripts used.

Table 2: Statistical data of object height on test image
No data | Total Object | Average Object Height | Standard Deviation of Average Object Height
1 | 204 | 62.505 | 43.573
2 | 308 | 36.325 | 24.402
3 | 442 | 44.267 | 20.684
4 | 148 | 28.905 | 19.206

Fig.10. A section of manuscript [...]
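The two ideas in this part of the discussion can be sketched in Python as follows. The first helper marks the rows where the difference between the initial and the refined projection value changes sign; the second computes the height limit from Section IV (average object height plus twice the standard deviation) above which an object is treated as abnormal. The function names and the sample profiles are hypothetical and serve only to exercise the code; the averages and standard deviations are the Table 2 values.

```python
def line_shift_rows(original, refined):
    """Return row indices where (original - refined) changes sign.

    Zero differences are skipped, so a change is reported at the first
    row whose non-zero difference has the opposite sign of the last one.
    """
    rows = []
    previous_sign = 0
    for i, (o, r) in enumerate(zip(original, refined)):
        diff = o - r
        sign = (diff > 0) - (diff < 0)
        if sign != 0:
            if previous_sign != 0 and sign != previous_sign:
                rows.append(i)
            previous_sign = sign
    return rows


def abnormal_height_limit(mean_height, std_height):
    """Height above which an object is suspected to span two lines."""
    return mean_height + 2 * std_height


# Hypothetical profiles, only to exercise the helper.
original = [5, 9, 2, 0, 1, 8, 6]
refined = [6, 6, 4, 3, 3, 5, 5]
print(line_shift_rows(original, refined))   # [1, 2, 5]

# (average object height, standard deviation) per test image, from Table 2.
table2 = {1: (62.505, 43.573), 2: (36.325, 24.402),
          3: (44.267, 20.684), 4: (28.905, 19.206)}
limits = {n: round(abnormal_height_limit(m, s), 3)
          for n, (m, s) in table2.items()}
print(limits)
```

For test image 1, for example, the limit is 62.505 + 2 × 43.573 = 149.651, so under this reading only objects taller than about 150 pixels would be candidates for cutting.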

Fig. 10 shows a section of the script image in Fig. 7, and Fig. 11 shows a sample of the segmentation result, the images of lines extracted from the manuscript image. In Fig. 10, dashed lines show the limit of each line. The figure shows that, from the second line onward, there are parts of script images that belong to different lines but lie in the same area or overlap.

Fig. 11.1 to Fig. 11.9 show the segmentation results of the proposed model, except for Fig. 11.2a. Fig. 11.1 shows that the segmentation result for line 1 is very clean, meaning that the image was segmented well, without any connection to other lines. This was because the original image had a clear separation between the first line and the second line. Line segmentation of the following lines, starting with the second line, shows overlapping data. The dashed lines in Fig. 11.2a show that parts of the scripts in the third line enter the area of the second line. By using the average object height and its standard deviation, the lines could be separated or cut.

Fig.11. The result of segmentation of the image in Fig. 10

By visual observation of each segmented line, the number of scripts that should be in a line and were still in that line could be counted. Table 3 shows the number of scripts in each line of the original image and in the segmentation result seen in Fig. 11.

Using a simple method of calculating the percentage of correctness, the percentage of correctness of line segmentation for the manuscript image in Fig. 11 was found to be:

$$\%\ \text{of correctness} = \frac{\text{Total output script}}{\text{Total actual script}} \times 100 = \frac{151}{159} \times 100 = 94.97 \qquad (2)$$

The same experiment was applied to the three other images of manuscripts in Javanese scripts. After the result of line
image segmentation on the four manuscript images used as test material was obtained, the percentage of correctness of line image segmentation on all test data could be calculated. Table 4 summarizes the line image segmentation results for all manuscripts used as test data.

The percentages of correctness for the four different manuscript images used in this study give an average percentage of correctness of 93.19% with a standard deviation of 4.55%, as seen in Table 4. With an average percentage of correctness above 90%, it was concluded that the model proposed for line segmentation of images of manuscripts in Javanese scripts in this study was good.

Table 3: Observation data on the number of original scripts and output of segmentation from Figure 11.
Line No | Number of Script (Original) | Number of Script (Output) | Information
1 | 17 | 17 | All scripts which should be in certain lines are in place
2 | 20 | 20 | All scripts which should be in certain lines are in place
3 | 16 | 16 | All scripts which should be in certain lines are in place
4 | 18 | 18 | All scripts which should be in certain lines are in place
5 | 18 | 18 | All scripts which should be in certain lines are in place
6 | 16 | 11 | There were 3 sandangan which joined the previous line, and 2 scripts which were cut imperfectly
7 | 19 | 18 | There was 1 which joined the previous line
8 | 19 | 19 | All scripts which should be in the line are in place
9 | 16 | 14 | There were 2 sandangan which joined the previous line
Total scripts | 159 | 151 |

Table 4: Summary of percentage of correctness of line segmentation of the four manuscript images
No Data | Percentage of Correctness
1 | 87.36
2 | 98.10
3 | 92.31
4 | 94.97
Average percentage of correctness | 93.19
Average standard deviation | 4.55
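The figures in Table 3 and Table 4 can be checked with a few lines of Python. This sketch recomputes the percentage of correctness of Equation (2) from the per-line script counts and then the average and standard deviation of the four percentages. Using the sample standard deviation (`statistics.stdev`) is an assumption, chosen because it reproduces the reported 4.55%; the variable and function names are illustrative only.

```python
import statistics

# Per-line script counts from Table 3 (original, output).
original = [17, 20, 16, 18, 18, 16, 19, 19, 16]
output = [17, 20, 16, 18, 18, 11, 18, 19, 14]


def percent_correct(total_output, total_actual):
    """Percentage of correctness as defined in Equation (2)."""
    return total_output / total_actual * 100


fig11 = percent_correct(sum(output), sum(original))
print(round(fig11, 2))                      # 94.97

# Percentages of the four test images from Table 4.
percentages = [87.36, 98.10, 92.31, 94.97]
mean = statistics.mean(percentages)
spread = statistics.stdev(percentages)      # sample standard deviation
print(f"{mean:.3f} {spread:.2f}")           # 93.185 4.55
```

The mean of 93.185 matches the reported 93.19% after rounding to two decimals.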

VI. CONCLUSION AND FUTURE WORK

From the experimental test of the proposed segmentation model, it can be concluded that line segmentation of images of manuscripts in Javanese scripts can be done using the vertical projection of the manuscript image as a guide, combined with moving average refining and with the average object height and its standard deviation. The segmentation result may contain several overlapping images, but by using the label information of every connected object, the images of lines in Javanese scripts connected to it can be found. This study could be continued to resolve the remaining problems of separating or cutting overlapping scripts, and to examine whether the result of the line segmentation can be used well in script segmentation later.

REFERENCES

[1] A. Nicolaou, and B. Gatos. (2009). Handwritten Text Line Segmentation by Shredding Text into its Lines. 10th International Conference on Document Analysis and Recognition, IEEE. pp. 626-630. Available: http://www.cvc.uab.es/icdar2009/papers/3725a626.pdf.
[2] S. Palakollu, R. Dhir, and R. Rani. (2011). A New Technique for Line Segmentation of Handwritten Hindi Text. Special Issue of International Journal of Computer Applications (0975–8887) on Electronics, Information and Communication Engineering–ICEICE. Available: http://research.ijcaonline.org/iceice/number5/iceice033.pdf.
[3] G.S. Lehal, and C. Singh. (No Year). A Technique for Segmentation of Gurmukhi Text. Available: http://advancedcentrepunjabi.org/pdf/A%20technique%20for%20segmentation%20of%20gurmukhi%20text.pdf.
[4] N. Tripathy and U. Pal. (2006, Dec.). Handwriting segmentation of unconstrained Oriya text. Sadhana. Volume(31), 755–769. Available: http://www.ias.ac.in/sadhana/Pdf2006Dec/755.pdf.
[5] C. Weliwitage, A.L. Harvey, and A.B. Jennings. (2005). Handwritten Document Offline Text Line Segmentation. Proceedings of the Digital Imaging Computing: Techniques and Applications (DICTA 2005). Available: http://nguyendangbinh.org/Proceedings/DICTA/2005/data/27_cweliwitage_textsegment.pdf.
[6] F. Yin, and C. Liu. (2009). Handwritten Chinese text line segmentation by clustering with distance metric learning. Pattern Recognition. Volume(42), 3146-3157. Available: http://nlpr-web.ia.ac.cn/2009papers/gjkw/gk13.pdf.
[7] O. Surinta, and R. Chamchong. (2008). Image Segmentation of Historical Handwriting from Palm Leaf Manuscripts. Available: http://www.wbi.msu.ac.th/file/721/doc_57.pdf.
[8] M.K. Jindal, R.K. Sharma, and G.S. Lehal. (2007). Segmentation of Horizontally Overlapping Lines in Printed Indian Scripts. International Journal of Computational Intelligence Research. Volume (3), 277-286. Available: http://www.ijcir.info.
[9] A.R. Widiarti, “Segmentasi Citra Dokumen Teks Sastra Jawa Modern Mempergunakan Profil Proyeksi,” SIGMA Jurnal Sains dan Teknologi, vol. 10, 2007, pp. 167-176.
[10] Anonim. (No Year). Available: http://id.wikipedia.org/wiki/Aksara_JawaW.
[11] C.E. Efstathiou (No Year). Signal Smoothing Algorithms. Available: http://www.chem.uoa.gr/applets/appletsmooth/appl_smooth2.html.
[12] T.E. Behrend, Katalog Induk Naskah-naskah Nusantara Jilid I Museum Sonobudoyo Yogyakarta. Jakarta: Djambatan, 1990.

AUTHOR'S PROFILE

Anastasia Rita Widiarti
received a Master's degree in Computer Science from Gadjah Mada University, Yogyakarta, in 2006. Since 2000 she has been teaching at the Department of Informatics Engineering at Sanata Dharma University in Yogyakarta. Her current research interests include Javanese manuscript image analysis and pattern recognition.

Agus Harjoko
received the Ph.D. degree in computer science from the University of New Brunswick, Canada, in the field of image processing and computer vision. Since 1987 he has been teaching at the Gadjah Mada University in Yogyakarta.

Prof. Dr. Marsono
is a lecturer in the Department of Nusantara Literature, Faculty of Cultural Sciences, Gadjah Mada University in Indonesia.

Sri Hartati
received the Ph.D. [...] the University of New Brunswick [...] Intelligence. Since [...] Computer Science [...] Instrumentation [...] University in Yogyakarta.
