Published by Canadian Center of Science and Education
Research on Fragments Reassembly Based on Feature of Chinese Character and Template Matching
Gui-Sen Xu1,3, Yuan-Biao Zhang2,3, Yi Lin4, Yu-Jian Lin4 & Xin-Guang Lv2,3
1 Electrical and Information School, Jinan University, Zhuhai, China
2 Packaging Engineering Institute, Jinan University, Zhuhai, China
3 Key Laboratory of Product Packaging and Logistics of Guangdong Higher Education Institutes, Jinan University, Zhuhai 519070, China
4 International Business School, Jinan University, Zhuhai, China
Correspondence: Yuan-Biao Zhang, Packaging Engineering Institute, Jinan University, Zhuhai 519070, China.
E-mail: [email protected]
Received: June 26, 2014 Accepted: July 7, 2014 Online Published: July 29, 2014 doi:10.5539/cis.v7n3p92 URL: http://dx.doi.org/10.5539/cis.v7n3p92
Abstract
The technology of fragments reassembly is widely employed in many scientific fields, such as judicial evidence recovery, restoration of historic documents, accessing to military intelligence and so on, which is based on computer vision and pattern recognition. In this paper, an efficient method for Chinese fragments reassembly is presented. The proposed reassembly method is based on the feature of line spacing and Chinese characters’
feature that the same font has the same height. Considered the feature of English characters that English letters are connected components, this paper proposes a model for English fragments reassembly, which is based on template matching. Using the proposed methods on digitally scanned images of actual fragments of paper image prints has verified the robustness and reliability of two models.
Keywords: fragments reassembly, the feature of Chinese characters’ line spacing, template matching, connectivity of English letters
1. Introduction
Fragments reassembly arises in many scientific fields, such as judicial evidence recovery, restoration of historic documents, accessing to military intelligence and so on. The manual execution of fragments reassembly is very difficult, as it requires great amount of time, skill and effort. Specifically, if the number of fragments is very large, reassemble manually will be impossible to be accomplished in a short time. Therefore, the automation of such a work is very important and can lead to faster, more efficient, painting reassembly and to a significant reduction in the human effort involved.
Fragments reassembly is an application technology based on computer vision and pattern recognition, which can be divided into two parts. The first part is to access to information of fragments by image preprocessing. The other part is to reassemble fragments based on accessed information. Conventional methods utilized corner feature of edge, outline feature, area feature and so on to reassemble fragments (Jia et al., 2005; Wolfson, 1990;
Hori, 1999; Kong, 2001; Li, 1995; Gao, 2007). These methods are based on geometric characteristics of fragments. Therefore, they can’t recover fragments with similar shape. In this field, some scholars have made related researches. Otsu N. presented Otsu algorithm based on the principle of least square method, which solved the problem of searching optimal threshold by calculating the maximum variance between grey classes (Ostu, 1979). Luo Z. solved the problem of feature extraction of Chinese document, by exacting line spacing and the height of characters (Luo, 2012). Gao et al., solved the problem of labeling 8-connected region in the model of English fragments reassembly, by combining the advantages of the mark line based method and region growing method (Gao, 2007).
Considering those methods, which are based on geometric characteristics, can’t recover fragments with similar shape, this paper combined the advantages of above methods, and proposes subregional scoring method to optimize the extraction of Chinese line spacing. Because line spacing of English is not obvious, template matching method is proposed to reassemble English fragments, which is based on 8-connected region of English
process of based on th In course algorithm.
1, means a Finally, bi transforma spacing.
If a paper spacing. T line, can b paragraph some fragm cases caus method is more area appropriat into anoth more accu
f model is div he similarity o of locating lin The next step all the pixels a inary image m ation not only
r is divided int Therefore, acco be found. How
and the end o ments’ line spa sed large erro
presented to o as. And then th te accuracy. Fi her problem th urate.
vided into two of edges of bin ne spacing in f p is to compare are white, this matrix is trans greatly reduce
Fig to two parts, ording to this f wever, becaus of paragraph, c acing feature a ors to the resu optimize the ex
his paper com inally, the pro at compares s
Figure
Figure 3. Fra
sub-progress arized pixel m fragments, firs e the value of t pixel row will formed into a e the amount o
gure 1. Transfo the break edg feature of leftm
e the size of contain large b are very simila ult of classific xtraction of lin mpares the sim
oblem, which c similarity betw
e 2. Fragments
agments that v 93 including the matrixes.
stly, the proce the same pixel l be map into 1 a column vect of data process
ormation of bi ges must have most fragments
fragments is blank area, as s
ar, but not real cation. In orde ne spacing. Th milarity betwee compares sim ween those are
s containing la
very similar bu
classification ss of image bi row in binary 1; otherwise, t tor with 0 and sing,but fully
nary image the text line s, all other frag
so small, som shown in Figu lly matching, a er to solve th he first step is en the same ar ilarity between as of two vect
arge blank area
ut not really ma
process and r inarization is c y image matrix
this pixel row d 1, as shown y embodies the
with same hei gments, which me fragments,
ure 2. Another as shown in F his problem, su
to divide the reas of two fr n two fragmen tors. This met
a
atching
reassembly pro completed by x. If all number will be map in n in Figure 1.
e feature of the
ight and same locate on the from beginnin special case is igure 3. Those ubregional sco column vector ragments unde nts, is transfo thod led to a r
ocess Ostu rs are nto 0.
This e line
e line same ng of s that e two oring r into er the rmed result
After usin score was the simila fragments, which are have made Finally, ba similarity row. When specially. U the similar the feature 6, the line edge is 21p similar situ
2.2 Steps o Step 1: C
A Step 2: Div
g subregional also low. For t arity of other , which contain
very similar b e classification ased on result of edges of bi n reassembling Under this situ rity of edges o e that printed d
spacing is 28 px, we just ne uation, fragme
of Algorithm Comparing bin
( ) 1
A ki ;or viding bi Tinto
Figure 4. Sc scoring metho the case in Fig
areas is high ning a large nu but not really m n result more a
t of classificat inarized pixel g fragments, t uation, there is of binarized pix
documents hav 8px and the bre
ed to search fo ents can be rea
Figure 5. Fra
Figure 6. Spe
nary image m
A ki ( ) 0 . Fin o
a
areas, anchematic diagra od, the similar gure 2, only the h. Therefore,
umber of blank matching, can’
accurate.
tion, the fragm matrix. The n the case that b no black pixe xel matrix. Th ve fixed line sp
eak edge just or a fragments ssembling by t
agments that th
ecial case that t
matrixes Ai o nally, Ai is tr nd denoting as
ams of subregi rity between tw
e area containi subregional sc k, can be reass
’t be reassemb ments in the s next operation break edge loc el in the edge, s erefore, an im pacing and cha
located in blan row with 7px the same meth
he break edge
the break edge
of the first i
ransformed int s b1i b2i b
ional scoring m wo fragments
ing a large num coring method sembled togeth bled by mistak same row wer is to find the cated in blank
so fragments c mproved method
aracters distan nk area. Bacau
line spacing u hod.
located in blan
e located in bla
fragments, if to column vec
T
bai . Every are method
in Figure 3 w mber of blank d can ensure her. What’s mo ke. It can be sa
re reassembled correct order area is neede can’t be reassem
d is presented, nce. For instanc
use the line sp under break ed
nk area
ank area
f the first k r tor biT. ea contains H
as low, so the has a low scor that two mat ore, two fragm aid that this me d, according to of each fragm ed to be consid mbled accordi , which is base ce shown in F pacing above b dge. If encount
row all is 1,
/ a pixels(Hi total re, so tched ments, ethod o the ments dered ng to ed on igure break ering
then s the
Step 4: Ca
W th Step 5: Af
th to Step 6:A
th an to co 3. Model f Considerin Furthermo templates.
3.1 Buildin Found out pictures, th was implem
3.2 Extrac According within fra boundary sample fra Figure 9.
8-connecte Step 1:M Step 2:A
m
represent tolerable max alculating totalWithin the faul he same row, o fter classifying he similarity of o be reassembl After reassembl his paper takes nd bottom edg o the feature t
orrectly.
for English Fr ng English let ore, the numbe
So fragments ng of English C t the font type
he library of c mented, such a
ction of Letters g to this featu agments is ma of connected agments of L a Figure 8 and ed region.
Marked connec According to th
s the first
m
x numbers of di score of two f
lt tolerance
or they are in d g all fragments f edges of bina led according t ling each row, s each row as a ges. If the brea that printed do ragments Rea tters are comp er of English pu can be reassem Characters Tem of this Englis character templ
as Figure 7.
s
ure that every arked in the o component. F and H. The foll d figure 9 are ted componen he boundary lo
m
subregion i ifferent value b fragments
, if Zscore10 different row., reassembling arized pixel m to the feature t the next step i a fragment. We ak edge located ocuments have assembly
posed of 52 u unctuations is mbled accordin mplates Librar sh document, a lates can be se
Figure 7. Bina
English letter order. The nex Finally, all lett lowing is the e schematic di nt of L and H in
cation of B an 95
in fragments.
between two a
1 a
score sco
i
Z Z
00(a2), the fi g fragments be matrix. If the br that printed do is to extract the
e recover all ro d in blank area e fixed line sp
uppercase and small. Theref ng to those tem ry
and then found et up. Finally,
ary template of
r is an 8-conn xt operation is ters are extrac extraction proc iagrams, which n the order, as nd L’s 8-connec
represent areas.ore( )i . irst
i
fragmen elong to the sam reak edge locatcuments have e edge of top a ow fragments a, fragments n pacing. Finally
d lowercase le fore, it’s conve mplates.
d all pictures o the process of
f letter O
nected region, s to extract th cted in the sam cess of letter L
h is aimed to shown in Figu cted region, ex
ts fault tolera
nt and the firs me row in the ted in blank ar fixed characte and bottom of e
according to th need to be reas
y, all fragmen
etters and Eng enient to make
of characters. A f character tem
each of conn he whole lette
me method. F L and H. The re o better explai ure 8.
xtracted letter L
ance, which is
st
j
fragment aorder, accordi rea, fragments ers distance.
each row. And he similarity o ssembled accor nts are reassem
glish punctuat English chara
According to mplates binariz
nected compon er according to Figure 8 show
esults are show in the princip
L and H.
s the
(2)
are in ng to need d then of top rding mbled
tions.
acters
these ation
nents o the
s the wn in le of
3.3 Detect After extra vector{ }M
characters whether th transforma method is 3.3.1 Initia 3.3.2 Tran 3.3.3 For
End f
tion of Comple acting the lett
T, according to embodied in here is a same ation. If there i shown as follo alize the Param sform those bi
count=1,2,3, Transform tem If rows of
{MXOR}T
Add each Else
Add each End if If Sum Sum XO
Sum Sum
End if If Sum0
This cha End if for
ete Letters ters from fragm
o the model fo
{ }M T, which i template as th is the matched ow:
meters S inary images o
…,TP
mplates into co
{ }M T is equal
{ }M TXOR M{ mod
h value of {M
h value of {M
OR
mXOR
aracter is comp
Figure 8. Bin
Figure 9. Mar
ments, the firs or Chinese frag is convenient t he fragment, th d template, it s
100000 Sum
of letters into{M
olumn vectors;
l to rows of {M
del}T;
}T
MXOR and assi
}T
M and assign
plete.
nary image of L
rked image of
st step is to tr gments reassem
to compare th he next step is shows that the
000 , number
}T M .
;
model}T M
ign toSumXOR. n toSumXOR.
L and H
L and H
ransform binar mbly. This pro he characters a
to compare {
character is co
r of torn piece
ry images of l cess makes all and templates.
{ }M Tand templ omplete. Pseud
( ) es TP .
letters into co l the informatio
In order to d lates with the do algorithm o
lumn on of detect same of the
around bre Figure 10.
Figure 11,
In order to flowchart
eak edge. If th . Otherwise, it the area inside
o reassemble a of the model i
ere aren’t inco t means these e wireframe is
Figure 10. C
Figure 11. Inc all fragments i
s shown in Fig
omplete charac two fragments s the area aroun
Complete chata
complete chata in the same ro gure 12.
97
cters, it means s aren’t match nd break edge.
acters inside ar
acters inside ar ow, the searchi
these two frag hed, as shown
.
rea around brea
rea around bre ing procedure
gments are ma in Figure 11.
ak edge
eak edge is iteratively
atched, as show For Figure 10
applied. Algor wn in 0 and
rithm
4. Experim The perfor paper ima appendix
72 180px ( model for 13.
Figur ments and Re rmance of the age prints (CU
3 and append (CUMCM, 201
English fragm
re 12. Algorith esults
proposed met UMCM, 2013 dix 4 included 13). Model for ments reassemb
hm flowchart o
thods was eval ). There were d 209 fragmen r Chinese fragm bly was applie
f model for En
luated using d e Chinese fra nt images resp ments reassem ed to reassemb
nglish fragmen
digitally scanne agments and E pectively, and mbly was appli le appendix 4.
nts reassembly
ed images of a English fragm d each fragme ied to reassemb
. The results a y
actual fragmen ments. Furtherm
ent image had ble appendix 3 are shown in F
nts of more, size 3 and
igure
Table 1. A
4.1 Analys Table 1 sh reassembly the result o appendix 3 the reassem with the sa error in the
Application resu Mo Mo sis on Result of hows that the y time efficien of recovery. O 3 was too wid mbly of all row ame line spaci e process of cl
(a) Figure
ults of the mod odel for Chines odel for English
f Appendix 3 e recovery rate ncy is high. In One of the impo
de, as shown in ws. It can be se ing. Another r lassifying.
e 13. Results o
dels
se Fragments h Fragments
e of model fo the process of ortant reasons n Figure 14. T een that the m reason is that
Figure 14. Co 99 of appendix 3 a
Recover 88.57 96.17
or Chinese fra f reassembling
is the line spa This reason cau model is more s
the line spacin
ompare of line
and appendix 4
ry rate Co 7%
7%
agments reass g fragments in acing of the sec
used serious er suitable for rea ng of few frag
spacing
(b) 4
omput. time 3.03s 47.38s
sembly reache appendix 3, tw cond row in th rror in the pro assembling the gments was ve
ed 88.57% and wo reasons affe he last paragra ocess of compl document pri ery similar, cau
d the ected ph in eting nting using
4.2 Analys Table 1 sh higher. Th recognition The font i fragments, we only n based on te In the reas the feature Figure 15 extracted t and lower connectivi impact on
In this ex fragments region, ou Furthermo have the s again. Fina in daily lif 5. Conclus This pape reassemble that the sam English c characteris Therefore, had high re In daily li handwritte fragments recognizin method of English fra 2013).
As a futur calculation Acknowle The author Logistics Universitie the proje (No.2011B
sis on Result of hows that the he reason is t n. However, w in fragments o , Zapf Calligra need to set up
emplate match ssembling proc e of English ch .These Englis these characte rcase letters m ity of letters i accuracy, whi
xample, all fra can’t solve th ur model for ore, most fragm
ame shape. W ally, we get ma fe.
sion
r considered t e Chinese frag me font has th characters’ co stics, the prop , the models ar
ecovery rate an ife, the handw en documents, reassembly is ng Chinese ch f double elasti agments reasse re work, in ord n will be inves edgments
rs acknowledg of Guangdon es, the project ects of Zhuh B050102013 &
f Appendix 4 recovery rate that this mode we utilized met
of appendix 4 aphic 801’s tem
a new templat hing presented cess, occasion haracters’ 8-co h punctuation rs through con meet the requir
in our model.
ch caused by t
F agments were he reassembly r English do ments in daily When people to any fragments
the different f gments, the m he same height onnectivity is
posed models re very practic nd the robustn written docum we need to re s similar to th haracters and E
c mesh is use embly, method der to speed up stigated.
ge the financia ng Higher Ed of the Natura hai Science,
& 2012D05019
of model for el was based thod of exhaus 4 in this exam mplate library te library. It ca
in our paper c ally there wer nnected region ns didn’t meet
nnectivity, they rement of 8-co Therefore, in those special p
Figure 15. Not e 180 72px im
of these fragm cument fragm life have the s rn papers, they with same sha
features betwe model based on is proposed. I proposed. C s can accurate cal in daily life ness and reliabi ments are very ecover those f he method pre English letters d to recognize d of principal c p calculating,
al support of th ducation Instit al Science Fou Technology, 990033).
English fragm on template m stion in the pro mple was Zapf y was set up. I an be seen tha can be widely e re little mismat
n. However, no t the requirem y were divided onnected regio n the process punctuations, c
connected pun mages. The m ments. By extr ments reassem
similar shape.
y usually torn ape. Therefore
een Chinese c n the feature o In order to reas Compared w ely recover d e. After a pract ility of models y common. In fragments. In g esented in this
s. In model o e Chinese char curves algorith using intellige
his research by tutes , the Fu undation of Gu
Industry, Tr
ments is very h matching, whi ocess, caused th f Calligraphic f we reassemb at the English employed in d tched cases. D ot all English c ment of 8-conn d into two con on. And this p of reassemblin could be ignore
nctuations method based
acting the cha mbly can rec For example, papers in half e, it can be seen
characters and of line spacing
ssemble Englis ith traditiona document frag tical experimen s were verified n order to sav general, the m paper. The m of handwritten
racters (Chen, hm is used to re ent algorithm t
y the Key Labo undamental R uangdong Prov rade and In
high but the c ich has very he higher com 801. In order ble document w
document frag daily life.
Due to English characters are nected region nnected areas.
paper focused ng fragments ed.
on geometric aracters accord over the doc fragments pro f, then fold the n that our mod
d English char g and Chinese sh fragments, t al model bas gments with th
nt, it can be se d.
ve some impo method of hand main difference n Chinese frag
2009). In mo ecognize Engl to optimize the
oratory of Prod Research Fund vince (No.S201 nformation Te
computation co high rate of mputation cost.
r to reassembl with different gments reasse model is base
connected, su . When this p
But, all uppe d on the use o by the model
c characteristic ding to 8-conne cument accura oduced by shre em in half and del is very prac
racters. In ord characters’ fe the model base sed on geom he similar sh een that our mo ortant fragmen
dwritten docum e is the metho gments reassem odel of handwr
ish characters e process of m
duct Packaging ds for the Ce 12010008773) echnology Bu
ost is letter le all font, mbly ed on ch as paper rcase of the l, the
cs of ected ately.
edder d torn ctical
der to ature ed on metric apes.
odels nts of ments
od of mbly, ritten (Ma, model
g and entral , and ureau
101
Gao, H. B., & Wang, W. X. (2007). New connected component labeling algorithm for binary image. Journal of Computer Applications, 27(11), 2776-2777.
Gao, X. T., Sattar, F., Quddus, A., & Venkateswarlu, R. (2007). Local natural scale based contour corner detection using wavelet transform. Proceedings of Information, Communications and Signal Processing, 25(6), 329-333. http://dx.doi.org/10.1109/ICICS.2005.1689061
Hori, K., Imai, M., & Ogasawara, T. (1999). Joint detection for potsherds of broken earthenware. Computer Vision and Pattern Recognition, (2), 440-445. http://dx.doi.org/10.1109/CVPR.1999.784718
Jia, H. Y., Zhu, L. J., Zhou, Z. T., & Hu, D. W. (2005). A Shape Matching Method for Automatic Reassembly of Paper Fragments. Computer Simulation, 23(11), 180-183. http://dx.doi.org/10.3969/j.issn.100 6-9348.2006.11.046
Kong, W. X., & Kimia, B. B. (2001). On solving 2D and 3D puzzles using curve matching. Computer Vision and Pattern Recognition, (2), 583-590. http://dx.doi.org/10.1109/CVPR.2001.991015
Li, H., Manjunath, B. S., & Mitra, S. K. (1995). A contour-based approach to mutlisensor image registration.
Image Processing, 4(3), 320-334. http://dx.doi.org/10.1109/83.366480
Luo, Z. (2012). Semi-auto stitching of scrapped paper based on character characteristic. Computer Engineering and Applications, 48(5), 207-210. http://dx.doi.org/10.3778/j.issn.1002-8331.2012.05.060
Ma, C., & Yu, M. (2013). Analysis and extraction of structural features of off-line handwritten characters based on principal curves algorithm. Computer Engineering and Applications, 49(3), 202-206.
Otsu, N. (1979). An automatic threshold selection method based on discriminate and least squares criteria.
Denshi Tsushin Gakkai Ronbunshi, (63), 349-356.
Wolfson, H. J. (1990). On curve matching. Pattern Analysis and Machine Intelligence, 12(5), 483-489.
http://dx.doi.org/10.1109/34.55108 Copyrights
Copyright for this article is retained by the author(s), with first publication rights granted to the journal.
This is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).