In this chapter, a dewarping algorithm for curved surfaces in real scenes has been proposed by extending the document dewarping algorithm of Chapter 3. Since curved surfaces in real scenes contain many line segments (e.g., boundary lines of circular objects or barcode lines), the proposed algorithm yields improved rectification performance for a range of cases. Experimental results on the test dataset showed that the geometric measures of the proposed algorithm are higher than those of the conventional methods.
Figure 5.3: Results of the three methods. (a) Input image, (b) Result of [4], (c) Result of [4] for the salient object region, (d) Result of the proposed method.
Figure 5.4: Results of the three methods. (a) Input image, (b) Result of [4], (c) Result of [4] for the salient object region, (d) Result of the proposed method.
Figure 5.5: Results of the three methods. (a) Input image, (b) Result of [4], (c) Result of [4] for the salient object region, (d) Result of the proposed method.
Figure 5.6: Results of the three methods. (a) Input image, (b) Result of [1], (c) Result of [6], (d) Result of the proposed method.
Figure 5.7: Results of the three methods. (a) Input image, (b) Result of [1], (c) Result of [6], (d) Result of the proposed method.
Figure 5.8: Results of the three methods. (a) Input image, (b) Result of [1], (c) Result of [6], (d) Result of the proposed method.
Chapter 6
Conclusions
In this dissertation, new rectification methods for document and scene text images based on alignment properties have been proposed. For document image dewarping, the proposed method exploited line segment properties that hold in non-text as well as text regions, encoded these properties into a cost function, and obtained the optimal transformation parameters by minimizing the cost function. Since the proposed method considered the properties of non-text as well as text regions, it yielded improved OCR accuracy and geometric measures, especially for images that contain many non-text regions. For scene text rectification, two properties of rectified text images were encoded into the cost function, and the transformation parameters were likewise obtained by minimizing the cost function. Since the proposed method considered the alignment of characters as well as the glyph (low-rank) property, it yielded improved rectification performance for a range of cases. Experimental results on natural and synthetic images showed that the OCR accuracy of the proposed algorithm is higher than that of the conventional low-rank based method. In addition, the proposed alignment term for line segments was extended to curved surface dewarping in real scenes.
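To make the recurring recipe above concrete, the following is a minimal sketch rather than the dissertation's implementation: it replaces the full surface, camera-pose, and focal-length parameterization with a plain 2-D homography, uses synthetic line segments, and relies on NumPy/SciPy, so it only illustrates how alignment residuals can be encoded and handed to a nonlinear least-squares solver of the Levenberg-Marquardt type cited in [14, 15].

```python
# Sketch only: a 2-D homography stands in for the dissertation's surface,
# camera-pose, and focal-length parameters, and the segments are synthetic.
import numpy as np
from scipy.optimize import least_squares

def warp(h, pts):
    """Map an Nx2 array of points through the homography given by the 8-vector h."""
    H = np.append(h, 1.0).reshape(3, 3)
    q = np.c_[pts, np.ones(len(pts))] @ H.T
    return q[:, :2] / q[:, 2:3]

def alignment_residuals(h, segments):
    """One residual per segment: zero iff the warped segment is horizontal or vertical."""
    res = []
    for p, q in segments:
        a, b = warp(h, np.stack([p, q]))
        v = b - a
        res.append(np.sin(2.0 * np.arctan2(v[1], v[0])))
    return np.asarray(res)

# Toy data: axis-aligned segments observed under a known perspective distortion.
h_true = np.array([1.0, 0.03, 5.0, 0.05, 1.0, -3.0, 2e-4, 1e-4])
clean = [((x, y), (x + 90.0, y)) for x in (0.0, 120.0) for y in (0.0, 60.0, 120.0)] + \
        [((x, y), (x, y + 90.0)) for x in (0.0, 60.0, 120.0) for y in (0.0, 120.0)]
observed = [warp(h_true, np.array([p, q])) for p, q in clean]

h0 = np.array([1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0])  # start from the identity
sol = least_squares(alignment_residuals, h0, args=(observed,))
before = np.sum(alignment_residuals(h0, observed) ** 2)
print("alignment cost before: %.4f  after: %.4f" % (before, 2.0 * sol.cost))
```

In the dissertation itself, the analogous cost additionally contains the text-line and low-rank terms, is minimized over the actual surface and camera parameters, and prunes outlier text lines and segments between iterations.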
Bibliography
[1] B. S. Kim, H. I. Koo, and N. I. Cho, “Document dewarping via text-line based optimization,” Pattern Recognition, vol. 48, no. 11, pp. 3600–3614, 2015.
[2] R. G. von Gioi, J. Jakubowicz, J.-M. Morel, and G. Randall, “LSD: a line segment detector,” Image Processing On Line, vol. 2, pp. 35–55, 2012.
[3] S. S. Bukhari, F. Shafait, and T. M. Breuel, “Dewarping of document images using coupled-snakes,” in Proceedings of the International Workshop on Camera-Based Document Analysis and Recognition, 2009, pp. 34–41.
[4] Z. Zhang, X. Liang, A. Ganesh, and Y. Ma, “TILT: transform invariant low-rank textures,” in Asian Conference on Computer Vision, 2010, pp. 314–328.
[5] C. Yang, L. Zhang, H. Lu, X. Ruan, and M.-H. Yang, “Saliency detection via graph-based manifold ranking,” in IEEE Conference on Computer Vision and Pattern Recognition, 2013, pp. 3166–3173.
[6] Z. Zhang, X. Liang, and Y. Ma, “Unwrapping low-rank textures on generalized cylindrical surfaces,” in IEEE International Conference on Computer Vision, 2011, pp. 1347–1354.
[7] R. Smith, “An overview of the Tesseract OCR engine,” in International Conference on Document Analysis and Recognition, vol. 2, 2007, pp. 629–633.
[8] T. A. Tran, I. S. Na, and S. H. Kim, “Page segmentation using minimum homogeneity algorithm and adaptive mathematical morphology,” International Journal on Document Analysis and Recognition, vol. 19, no. 3, pp. 191–209, 2016.
[9] H. I. Koo and D. H. Kim, “Scene text detection via connected component clustering and nontext filtering,” IEEE Transactions on Image Processing, vol. 22, no. 6, pp. 2296–2305, 2013.
[10] L. O’Gorman, “The document spectrum for page layout analysis,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 15, no. 11, pp. 1162–1173, 1993.
[11] G. Nagy, “Twenty years of document image analysis in PAMI,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 1, pp. 38–62, 2000.
[12] H. I. Koo, “Text-line detection in camera-captured document images using the state estimation of connected components,” IEEE Transactions on Image Processing, vol. 25, no. 11, pp. 5358–5368, 2016.
[13] L.-Y. Duan, R. Ji, Z. Chen, T. Huang, and W. Gao, “Towards mobile document image retrieval for digital library,” IEEE Transactions on Multimedia, vol. 16, no. 2, pp. 346–359, 2014.
[14] K. Levenberg, “A method for the solution of certain non-linear problems in least squares,” Quarterly of Applied Mathematics, vol. 2, no. 2, pp. 164–168, 1944.
[15] D. W. Marquardt, “An algorithm for least-squares estimation of nonlinear parameters,” Journal of the Society for Industrial and Applied Mathematics, vol. 11, no. 2, pp. 431–441, 1963.
[16] Y. Lu, “Machine printed character segmentation—an overview,” Pattern Recognition, vol. 28, no. 1, pp. 67–80, 1995.
[17] R. G. Casey and E. Lecolinet, “A survey of methods and strategies in character segmentation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, no. 7, pp. 690–706, 1996.
[18] M. S. Brown, M. Sun, R. Yang, L. Yun, and W. B. Seales, “Restoring 2D content from distorted documents,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 11, 2007.
[19] L. Zhang, Y. Zhang, and C. Tan, “An improved physically-based method for geometric restoration of distorted document images,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 4, pp. 728–734, 2008.
[20] O. Samko, Y.-K. Lai, D. Marshall, and P. L. Rosin, “Virtual unrolling and information recovery from scanned scrolled historical documents,” Pattern Recognition, vol. 47, no. 1, pp. 248–259, 2014.
[21] G. Meng, Y. Wang, S. Qu, S. Xiang, and C. Pan, “Active flattening of curved document images via two structured beams,” in IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 3890–3897.
[22] Y.-C. Tsoi and M. S. Brown, “Multi-view document rectification using boundary,” in IEEE Conference on Computer Vision and Pattern Recognition, 2007, pp. 1–8.
[23] H. I. Koo, J. Kim, and N. I. Cho, “Composition of a dewarped and enhanced document image from two view images,” IEEE Transactions on Image Processing, vol. 18, no. 7, pp. 1551–1562, 2009.
[24] S. You, Y. Matsushita, S. Sinha, Y. Bou, and K. Ikeuchi, “Multiview rectification of folded documents,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017.
[25] C. L. Tan, L. Zhang, Z. Zhang, and T. Xia, “Restoring warped document images through 3D shape modeling,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 2, pp. 195–208, 2006.
[26] F. Courteille, A. Crouzil, J.-D. Durou, and P. Gurdjos, “Shape from shading for the digitization of curved documents,” Machine Vision and Applications, vol. 18, no. 5, pp. 301–316, 2007.
[27] L. Zhang, A. M. Yip, M. S. Brown, and C. L. Tan, “A unified framework for document restoration using inpainting and shape-from-shading,” Pattern Recognition, vol. 42, no. 11, pp. 2961–2978, 2009.
[28] Y. Takezawa, M. Hasegawa, and S. Tabbone, “Camera-captured document image perspective distortion correction using vanishing point detection based on Radon transform,” in International Conference on Pattern Recognition, 2016, pp. 3968–3974.
[29] B. Fu, M. Wu, R. Li, W. Li, Z. Xu, and C. Yang, “A model-based book dewarping method using text line detection,” in International Workshop on Camera-Based Document Analysis and Recognition, 2007, pp. 63–70.
[30] N. Stamatopoulos, B. Gatos, I. Pratikakis, and S. J. Perantonis, “A two-step dewarping of camera document images,” in International Workshop on Document Analysis Systems, 2008, pp. 209–216.
[31] C. Liu, Y. Zhang, B. Wang, and X. Ding, “Restoring camera-captured distorted document images,” International Journal on Document Analysis and Recognition, vol. 18, no. 2, pp. 111–124, 2015.
[32] B. Epshtein, “Determining document skew using inter-line spaces,” in IEEE International Conference on Document Analysis and Recognition, 2011, pp. 27–31.
[33] S. N. Srihari and V. Govindaraju, “Analysis of textual images using the Hough transform,” Machine Vision and Applications, vol. 2, no. 3, pp. 141–153, 1989.
[34] I. Bar-Yosef, N. Hagbi, K. Kedem, and I. Dinstein, “Fast and accurate skew estimation based on distance transform,” in IEEE International Workshop on Document Analysis Systems, 2008, pp. 402–407.
[35] C. Sun and D. Si, “Skew and slant correction for document images using gradient direction,” in IEEE International Conference on Document Analysis and Recognition, vol. 1, 1997, pp. 142–146.
[36] P. Clark and M. Mirmehdi, “Rectifying perspective views of text in 3D scenes using vanishing points,” Pattern Recognition, vol. 36, no. 11, pp. 2673–2686, 2003.
[37] J. Liang, D. DeMenthon, and D. Doermann, “Geometric rectification of camera-captured document images,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 4, pp. 591–605, 2008.
[38] S. Messelodi and C. M. Modena, “Automatic identification and skew estimation of text lines in real scene images,” Pattern Recognition, vol. 32, no. 5, pp. 791–810, 1999.
[39] G. K. Myers, R. C. Bolles, Q.-T. Luong, J. A. Herson, and H. B. Aradhye, “Rectification and recognition of text in 3-D scenes,” International Journal of Document Analysis and Recognition, vol. 7, no. 2-3, pp. 147–158, 2005.
[40] M. Bušta, T. Drtina, D. Helekal, L. Neumann, and J. Matas, “Efficient character skew rectification in scene text images,” in Asian Conference on Computer Vision, 2014, pp. 134–146.
[41] X. Zhang, Z. Lin, F. Sun, and Y. Ma, “Rectification of optical characters as transform invariant low-rank textures,” in IEEE International Conference on Document Analysis and Recognition, 2013, pp. 393–397.
[42] ——, “Transform invariant text extraction,” The Visual Computer, vol. 30, no. 4, pp. 401–415, 2014.
[43] B. Wang, C. Liu, and X. Ding, “A scheme for automatic text rectification in real scene images,” in SPIE/IS&T Electronic Imaging, 2015, pp. 94080M–94080M.
[44] C. Merino-Gracia, M. Mirmehdi, J. Sigut, and J. L. González-Mora, “Fast perspective recovery of text in natural scenes,” Image and Vision Computing, vol. 31, no. 10, pp. 714–724, 2013.
[45] B. Shi, X. Wang, P. Lyu, C. Yao, and X. Bai, “Robust scene text recognition with automatic rectification,” in IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 4168–4176.
[46] F. Dellaert, S. M. Seitz, C. E. Thorpe, and S. Thrun, “Structure from motion without correspondence,” in IEEE Conference on Computer Vision and Pattern Recognition, vol. 2, 2000, pp. 557–564.
[47] P. Sturm and B. Triggs, “A factorization based algorithm for multi-image projective structure and motion,” in European Conference on Computer Vision, 1996, pp. 709–720.
[48] W. Hong, A. Y. Yang, K. Huang, and Y. Ma, “On symmetry and multiple-view geometry: Structure, pose, and calibration from a single image,” International Journal of Computer Vision, vol. 60, no. 3, pp. 241–265, 2004.
[49] M. Park, K. Brocklehurst, R. T. Collins, and Y. Liu, “Deformed lattice detection in real-world images using mean-shift belief propagation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 10, pp. 1804–1816, 2009.
[50] H. I. Koo and N. I. Cho, “State estimation in a document image and its application in text block identification and text line extraction,” in European Conference on Computer Vision, 2010, pp. 421–434.
[51] H. Lee, E. Shechtman, J. Wang, and S. Lee, “Automatic upright adjustment of photographs with robust camera calibration,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, no. 5, pp. 833–844, 2014.
[52] J. An, H. I. Koo, and N. I. Cho, “Rectification of planar targets using line segments,” Machine Vision and Applications, vol. 28, no. 1, pp. 91–100, 2017.
[53] K. E. Atkinson, An introduction to numerical analysis. John Wiley & Sons, 2008.
[54] F. Shafait and T. M. Breuel, “Document image dewarping contest,” in Proceedings of the International Workshop on Camera-Based Document Analysis and Recognition, 2007, pp. 181–188.
[55] B. Gatos, I. Pratikakis, and K. Ntirogiannis, “Segmentation based recovery of arbitrarily warped document images,” in Proceedings of the International Conference on Document Analysis and Recognition, vol. 2, 2007, pp. 989–993.
[56] A. Masalovitch and L. Mestetskiy, “Usage of continuous skeletal image representation for document images de-warping,” in Proceedings of the International Workshop on Camera-Based Document Analysis and Recognition, 2007, pp. 45–53.
[57] V. Levenstein, “Binary codes capable of correcting spurious insertions and deletions of ones,” Problems of Information Transmission, vol. 1, no. 1, pp. 8–17, 1965.
[58] J. Ryu, H. I. Koo, and N. I. Cho, “Language-independent text-line extraction algorithm for handwritten documents,” IEEE Signal Processing Letters, vol. 21, no. 9, pp. 1115–1119, 2014.
[59] Z. Lin, M. Chen, and Y. Ma, “The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices,” arXiv preprint arXiv:1009.5055, 2010.
[60] C. Yao, X. Bai, W. Liu, Y. Ma, and Z. Tu, “Detecting texts of arbitrary orientations in natural images,” in IEEE Conference on Computer Vision and Pattern Recognition, 2012, pp. 1083–1090.
[61] D. Karatzas, F. Shafait, S. Uchida, M. Iwamura, L. G. i Bigorda, S. R. Mestre, J. Mas, D. F. Mota, J. A. Almazan, and L. P. de las Heras, “ICDAR robust reading competition,” in IEEE International Conference on Document Analysis and Recognition, 2013, pp. 1484–1493.
Abstract
For text images captured by a camera, optical character recognition (OCR) is very important for analyzing the captured scene. However, even after correct text region detection, character recognition in captured images is still considered a difficult problem. This is due to geometric distortions caused by the curl of the paper and the camera view, so the rectification of such text images is regarded as an essential preprocessing step for character recognition. Text image rectification methods, which restore a distorted captured image to a frontal view, have therefore been studied actively. Recently, research has focused mainly on the properties of well-rectified text. From this point of view, this dissertation addresses new alignment properties for text image rectification. These alignment properties are formulated as cost functions, and the rectification parameters are obtained by minimizing the cost functions. This dissertation is divided into three topics: document image dewarping, scene text rectification, and dewarping of curved surfaces in real scenes.
First, this dissertation proposes a document image dewarping method based on the alignment properties of text lines and line segments. Conventional text-line-based document image dewarping methods have difficulties when the document has a complex layout or contains only a small number of text lines, which is the case when the document contains many regions such as figures, graphs, or tables instead of text. For layout-robust document image dewarping, the proposed method therefore exploits not only aligned text lines but also line segments. Based on the assumption and observation that properly rectified line segments are still straight and that most of them are aligned horizontally or vertically, the proposed method formulates these properties and combines them with a text-line-based cost function. By minimizing the cost function, the proposed method estimates the rectification parameters such as the curl of the paper, the camera view, and the focal length. In addition, to handle outliers such as misdetected text lines and line segments with arbitrary directions, the proposed method is designed as an iterative process: in each step, outliers that do not satisfy the alignment properties are removed, and only the remaining text lines and line segments are used in the cost function optimization. Experimental results show that the proposed method is robust to a variety of layouts.
Second, this dissertation proposes a scene text rectification method. Conventional scene text rectification methods exploit properties of the intrinsic shapes of characters, such as horizontal/vertical strokes and symmetric forms. However, since these methods use only the properties of each individual character and do not consider the alignment of characters, they produce poorly aligned results for text consisting of several characters. To address this problem, the proposed method uses the alignment information of the characters. To be precise, not only the intrinsic character shapes but also the alignment properties are formulated as a cost function, and rectification is performed by minimizing this cost function. In addition, in order to formulate the alignment properties of characters, the proposed method also performs character segmentation, which separates the text into individual characters; the top and bottom lines of the text are then estimated by a least-squares fit with the RANSAC algorithm. That is, the overall algorithm alternates character segmentation, line estimation, and rectification. Since the proposed cost function is not convex and involves many variables, the augmented Lagrange multiplier method is used for its optimization. The proposed method was evaluated on real captured images as well as synthesized text images, and the experimental results show that it achieves higher recognition rates than conventional methods while also producing visually pleasing results.
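A minimal sketch of the RANSAC-based top/bottom line estimation mentioned above is given below; the thresholds, iteration count, and toy data are illustrative choices, not the dissertation's settings.

```python
# Sketch only: fit y = a*x + b to the top points of character boxes with a
# simple RANSAC loop followed by a least-squares refinement on the inliers.
import numpy as np

def ransac_line(x, y, n_iter=200, thresh=2.0, seed=0):
    """Return (slope, intercept) of the line supported by the largest inlier set."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(x), dtype=bool)
    for _ in range(n_iter):
        i, j = rng.choice(len(x), size=2, replace=False)
        if x[i] == x[j]:
            continue                      # skip degenerate vertical samples
        a = (y[j] - y[i]) / (x[j] - x[i])
        b = y[i] - a * x[i]
        inliers = np.abs(a * x + b - y) < thresh
        if inliers.sum() > best.sum():
            best = inliers
    return np.polyfit(x[best], y[best], 1)   # least-squares fit on the inliers

# Toy data: top points of character boxes along a slanted text line, one outlier.
rng = np.random.default_rng(1)
x = 15.0 * np.arange(12, dtype=float)
y = 0.2 * x + 40.0 + rng.normal(0.0, 0.5, size=x.size)
y[5] += 25.0                              # a grossly mis-segmented character
print("top line (slope, intercept):", ransac_line(x, y))
```

In the proposed method, such fits are interleaved with character segmentation and rectification as described above.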
Finally, the proposed method is extended to the dewarping of curved surfaces in real scenes.